## Motivation:

Earlier today I was talking to a researcher about how well a normal distribution can approximate a uniform distribution over an interval $$[a,b] \subset \mathbb{R}$$. I gave a few arguments for why I thought a normal distribution wouldn’t be a good approximation, but I didn’t have the exact answer off the top of my head, so I decided to find out. Although the following analysis involves nothing fancy, I consider it useful: it’s easily generalised to higher dimensions (i.e. multivariate uniform distributions), and we arrive at a result which I wouldn’t consider intuitive.

For those who appreciate numerical experiments, I wrote a small TensorFlow script to accompany this blog post.

## Statement of the problem:

We would like to minimise the KL divergence:

$$\mathcal{D}_{KL}(P \| Q) = \int_{-\infty}^\infty p(x) \ln \frac{p(x)}{q(x)}\,dx$$

where $$P$$ is the target uniform distribution and $$Q$$ is the approximating Gaussian:

$$p(x)= \frac{1}{b-a} \mathbb{1}_{[a,b]}(x) \implies p(x) = 0 \:\: \text{for} \:\: x \notin [a,b]$$

and

$$q(x)= \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2 \sigma^2}}$$

Now, given that $$\lim_{x \to 0^+} x\ln(x) = 0$$, if we assume that $$[a,b]$$ is fixed our loss may be expressed in terms of $$\mu$$ and $$\sigma$$:

\begin{split} \mathcal{L}(\mu,\sigma) & = \int_{a}^b p(x) \ln \frac{p(x)}{q(x)}dx \\
& = -\ln(b-a) + \frac{1}{2}\ln(2\pi\sigma^2)+\frac{\frac{1}{3}(b^3-a^3)-\mu(b^2-a^2)+\mu^2(b-a)}{2\sigma^2(b-a)} \end{split}
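The cubic terms come from the elementary integral of the quadratic in the Gaussian log-density:

$$\int_a^b (x-\mu)^2 \, dx = \frac{1}{3}(b^3-a^3) - \mu(b^2-a^2) + \mu^2(b-a)$$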

## Minimising with respect to $$\mu$$ and $$\sigma$$:

We can easily show that the mean and variance of the Gaussian which minimises $$\mathcal{L}(\mu,\sigma)$$ correspond to the mean and variance of a uniform distribution over $$[a,b]$$:

$$\frac{\partial}{\partial \mu} \mathcal{L}(\mu,\sigma) = \frac{2\mu}{2\sigma^2} - \frac{(b+a)}{2\sigma^2}= 0 \implies \mu = \frac{a+b}{2}$$

and, substituting $$\mu = \frac{a+b}{2}$$:

$$\frac{\partial}{\partial \sigma} \mathcal{L}(\mu,\sigma) = \frac{1}{\sigma}-\frac{\frac{1}{3}(b^2+a^2+ab)-\frac{1}{4}(b+a)^2}{\sigma^3} =0 \implies \sigma^2 = \frac{(b-a)^2}{12}$$
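As a numerical sanity check, we can minimise a discretised version of the divergence and recover the same optimum. This is a sketch using NumPy and SciPy rather than the TensorFlow script; the function and variable names are my own:

```python
import numpy as np
from scipy.optimize import minimize


def kl_uniform_gaussian(params, a, b, n=100001):
    """Midpoint-rule approximation of D_KL(U(a, b) || N(mu, sigma^2))."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                    # parametrise sigma > 0
    x = a + (np.arange(n) + 0.5) * (b - a) / n   # midpoints of n sub-intervals
    p = 1.0 / (b - a)
    log_q = -0.5 * np.log(2.0 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)
    return float(np.mean(p * (np.log(p) - log_q)) * (b - a))


res = minimize(kl_uniform_gaussian, x0=[0.0, 0.0], args=(2.0, 5.0))
mu_opt, sigma_opt = res.x[0], np.exp(res.x[1])
# mu_opt ≈ 3.5 = (2 + 5) / 2 and sigma_opt ≈ 0.866 = 3 / sqrt(12)
```

The optimiser recovers the mean and standard deviation of $$\mathcal{U}(2,5)$$, as the calculus above predicts.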

Although I wouldn’t have guessed this result, the careful reader will notice that it readily generalises to higher dimensions.

## Analysing the loss with respect to optimal Gaussians:

After entering the optimal values of $$\mu$$ and $$\sigma$$ into $$\mathcal{L}(\mu,\sigma)$$ and simplifying the resulting expression we have the following residual loss:

$$\mathcal{L}^* = \frac{1}{2}\Big(\ln \big(\frac{\pi}{6}\big)+1\Big) \approx 0.18$$

I find this result surprising because I didn’t expect the dependence on $$\Delta = b-a$$ to vanish. That said, my current intuition for this result is that if we went the other way and fitted $$\mathcal{U}(a,b)$$ to a given $$\mathcal{N}(\mu,\sigma^2)$$ we would obtain:

$$[a,b] = [\mu - \sqrt{3}\sigma, \mu + \sqrt{3}\sigma]$$

so this minimisation problem corresponds to a linear re-scaling of the uniform parameters in terms of $$\mu$$ and $$\sigma$$.
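One concrete way to see this scale-invariance is to evaluate the divergence at the optimal parameters for a few different intervals. Here is a pure-Python sketch (the function name is my own):

```python
import math


def residual_kl(a, b):
    """D_KL(U(a, b) || N(mu*, sigma*^2)) evaluated at the optimum
    mu* = (a + b) / 2, sigma*^2 = (b - a)^2 / 12."""
    mu = (a + b) / 2.0
    var = (b - a) ** 2 / 12.0
    moment = (b ** 3 - a ** 3) / 3.0 - mu * (b ** 2 - a ** 2) + mu ** 2 * (b - a)
    return (-math.log(b - a)
            + 0.5 * math.log(2.0 * math.pi * var)
            + moment / (2.0 * var * (b - a)))


# The residual divergence is identical for every interval (≈ 0.18):
vals = [residual_kl(a, b) for a, b in [(0.0, 1.0), (-3.0, 7.0), (2.5, 2.6)]]
```

Wide or narrow, every interval leaves exactly the same residual divergence behind.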

## Remark:

The reader may experiment with the following TensorFlow function which outputs the mean and variance of the approximating Gaussian given a uniform distribution on the interval $$[a,b]$$.
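The original script isn’t reproduced here, but a minimal sketch of such a function might look as follows, minimising the closed-form loss $$\mathcal{L}(\mu,\sigma)$$ by plain gradient descent (the initialisation, learning rate and step count are illustrative):

```python
import math

import tensorflow as tf


def fit_gaussian_to_uniform(a, b, steps=2000, lr=0.05):
    """Minimise the closed-form D_KL(U(a, b) || N(mu, sigma^2)) over
    mu and sigma by gradient descent; returns (mean, variance)."""
    mu = tf.Variable(float(a))   # deliberately poor initial guess
    sigma = tf.Variable(1.0)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            # Integral of (x - mu)^2 over [a, b]
            moment = (b ** 3 - a ** 3) / 3.0 - mu * (b ** 2 - a ** 2) + mu ** 2 * (b - a)
            loss = (-math.log(b - a)
                    + 0.5 * tf.math.log(2.0 * math.pi * sigma ** 2)
                    + moment / (2.0 * sigma ** 2 * (b - a)))
        d_mu, d_sigma = tape.gradient(loss, [mu, sigma])
        mu.assign_sub(lr * d_mu)
        sigma.assign_sub(lr * d_sigma)
    return float(mu.numpy()), float(sigma.numpy() ** 2)
```

For $$[a,b] = [0,1]$$ this should converge to a mean of $$\frac{1}{2}$$ and a variance of $$\frac{1}{12}$$, matching the closed-form solution.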