# The Central Limit Theorem

The Central Limit Theorem (CLT) says that for i.i.d. data with finite variance (normally distributed or otherwise), the distribution of the sample mean is approximately normal once the sample size is large. In the history of statistics, this has simplified the development of sampling methods, statistical tests, and statistical algorithms, and it is why statisticians give special importance to the normal distribution.

To be precise, the CLT can be stated as follows:
Let $\{X_{n}\}$ be a sequence of i.i.d. random variables with mean $\mu = 0$ and variance $\sigma^{2} = 1$. If $Z \sim N(0,1)$ and $S_{n} = \sum_{i=1}^{n} X_{i}$, we have $S_{n}/\sqrt{n} \longrightarrow Z$ in distribution as $n \longrightarrow \infty$, i.e. $\forall x \in \Re$, $\lim_{n \to \infty}P(S_{n}/\sqrt{n} \leq x) = \frac{1}{\sqrt{2 \pi}} \int^x_{-\infty} e^{-\frac{u^2}{2}}\,du$.
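The statement can be explored numerically. The sketch below is only an illustration under assumptions that are not part of the text above: it takes the $X_{i}$ to be uniform on $[-\sqrt{3}, \sqrt{3}]$ (which has mean 0 and variance 1), fixes $n = 30$, and compares the empirical distribution function of $S_{n}/\sqrt{n}$ with the standard normal one at a few points. The function names are made up for this example.

```python
import math
import random

def standard_normal_cdf(x):
    """Phi(x), written with the error function from the standard library."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sample_scaled_sum(n, rng):
    """Draw one value of S_n / sqrt(n) for X_i uniform on [-sqrt(3), sqrt(3)].

    This uniform law is an assumption of the example; it has mean 0 and
    variance 1, matching the hypotheses of the theorem.
    """
    half_width = math.sqrt(3.0)
    s_n = sum(rng.uniform(-half_width, half_width) for _ in range(n))
    return s_n / math.sqrt(n)

def empirical_cdf(samples, x):
    """Fraction of the samples that are <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [sample_scaled_sum(30, rng) for _ in range(10000)]

# Largest gap between the empirical and the limiting distribution function
# over a handful of evaluation points.
points = (-2.0, -1.0, 0.0, 1.0, 2.0)
max_gap = max(abs(empirical_cdf(samples, x) - standard_normal_cdf(x)) for x in points)
print(max_gap)
```

A small gap here is consistent with the theorem but does not prove it; the gap also contains Monte Carlo noise from the finite number of simulated sums.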

Lemma: Lévy’s continuity theorem states that convergence in distribution is equivalent to pointwise convergence of the corresponding characteristic functions.

To apply Lévy’s continuity theorem, we will need the following estimates on Taylor expansions of the exponential function:
a) $\forall u \geq 0, 0 \leq e^{-u} -1 +u \leq u^{2}/2$
b) $\forall t \in \Re, |e^{it} -1-it| \leq |t|^{2}/2$
c) $\forall t \in \Re, |e^{it} -1-it-(it)^{2}/2| \leq |t|^{3}/6$
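The three estimates can be sanity-checked on a grid of points. This is a rough numerical check, not a proof; the grid, the tolerance `tol` and the helper names are choices made for this sketch.

```python
import cmath
import math

def gap_a(u):
    """e^{-u} - 1 + u, the quantity bounded in estimate a)."""
    return math.exp(-u) - 1.0 + u

def gap_b(t):
    """|e^{it} - 1 - it|, the quantity bounded in estimate b)."""
    return abs(cmath.exp(1j * t) - 1 - 1j * t)

def gap_c(t):
    """|e^{it} - 1 - it - (it)^2 / 2|, the quantity bounded in estimate c)."""
    it = 1j * t
    return abs(cmath.exp(it) - 1 - it - it * it / 2)

# Grid on [-10, 10] with step 0.01; tol absorbs floating-point rounding.
grid = [k / 100.0 for k in range(-1000, 1001)]
tol = 1e-12

a_holds = all(-tol <= gap_a(u) <= u * u / 2 + tol for u in grid if u >= 0)
b_holds = all(gap_b(t) <= t * t / 2 + tol for t in grid)
c_holds = all(gap_c(t) <= abs(t) ** 3 / 6 + tol for t in grid)
print(a_holds, b_holds, c_holds)
```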

Now we may proceed with the proof:

i) Let $F$ be the characteristic function of the common distribution of the $\{X_{n}\}$. Then, by independence, for every $t \in \Re$ the characteristic function of $S_{n}/\sqrt{n}$ is given by $E(e^{itS_{n}/\sqrt{n}}) = [F(t/\sqrt{n})]^{n}$

ii) Consequently, our task is to prove that $\forall t \in \Re$, $\lim_{n \to \infty}[F(t/\sqrt{n})]^{n} = e^{-t^{2}/2}$
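For one concrete distribution the limit in ii) can be watched directly. The sketch below assumes, purely for illustration, that $X = \pm 1$ with probability $1/2$ each (mean 0, variance 1), whose characteristic function is $F(t) = \cos t$; the value of `t` and the list of sample sizes are arbitrary.

```python
import math

def rademacher_cf(t):
    """Characteristic function of X = +1 or -1 with probability 1/2 each."""
    return math.cos(t)

def cf_of_scaled_sum(t, n):
    """[F(t / sqrt(n))]^n, the characteristic function of S_n / sqrt(n)."""
    return rademacher_cf(t / math.sqrt(n)) ** n

t = 1.5
target = math.exp(-t * t / 2)  # characteristic function of N(0, 1) at t

# Absolute distance to the limit for increasing n.
errors = {n: abs(cf_of_scaled_sum(t, n) - target) for n in (10, 100, 1000, 10000)}
for n, err in errors.items():
    print(n, err)
```

The distances shrink as `n` grows, which is what pointwise convergence at this particular `t` looks like.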

iii) We begin our estimation by noting that $|[F(t/\sqrt{n})]^{n}-e^{-t^{2}/2}| \leq n |F(t/\sqrt{n})-e^{-t^{2}/2n}|$ since $|F(t/\sqrt{n})| \leq 1$ and $|e^{-t^{2}/2n}| \leq 1$
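The inequality in iii) rests on a standard telescoping identity, spelled out here for completeness. The symbols $a$ and $b$ are shorthand introduced for this note: $a = F(t/\sqrt{n})$ and $b = e^{-t^{2}/2n}$, so that $b^{n} = e^{-t^{2}/2}$. For complex numbers with $|a| \leq 1$ and $|b| \leq 1$,

$a^{n} - b^{n} = (a-b)\sum_{k=0}^{n-1} a^{n-1-k} b^{k}$

and therefore

$|a^{n} - b^{n}| \leq |a-b| \sum_{k=0}^{n-1} |a|^{n-1-k} |b|^{k} \leq n |a-b|$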

iv) Now, we may use the triangle inequality to show that $|[F(t/\sqrt{n})]^{n}-e^{-t^{2}/2}| \leq n |F(t/\sqrt{n})-(1-t^{2}/2n)| + n |(1-t^{2}/2n)-e^{-t^{2}/2n}|$

v) By our first estimate, letting $u = t^{2}/2n \geq 0$, we see that $n |(1-t^{2}/2n)-e^{-t^{2}/2n}| \leq \frac{n(t^{2}/2n)^{2}}{2} = t^{4}/8n$, which approaches $0$ as $n \longrightarrow \infty$.
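A quick numerical look at step v), as a sketch only: the value of `t` and the sample sizes are arbitrary, and the helper names are invented for this example. It evaluates the second term next to the bound $t^{4}/8n$.

```python
import math

def step_v_term(t, n):
    """n * |(1 - t^2 / 2n) - e^{-t^2 / 2n}|, the second term in iv)."""
    u = t * t / (2 * n)
    return n * abs((1 - u) - math.exp(-u))

def step_v_bound(t, n):
    """The bound t^4 / (8 n) obtained from estimate a)."""
    return t ** 4 / (8 * n)

t = 2.0
rows = [(n, step_v_term(t, n), step_v_bound(t, n)) for n in (1, 10, 100, 1000)]
for n, term, bound in rows:
    print(n, term, bound)
```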

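vi) For the first term, since $E[X] = 0$ and $E[X^{2}] = 1$ (where $X$ has the common distribution of the $\{X_{n}\}$), we note that $n |F(t/\sqrt{n})-(1-t^{2}/2n)|=n |E[e^{itX/\sqrt{n}}-(1+\frac{itX}{\sqrt{n}}+i^{2}t^{2}X^{2}/2n)]| \leq n E[|e^{itX/\sqrt{n}}-(1+\frac{itX}{\sqrt{n}}+i^{2}t^{2}X^{2}/2n)|]$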

For any $\delta > 0$ and positive integer $n$, let $A=A(\delta,n) = \{|X|>\delta\sqrt{n}\}$. Then, applying estimate b) on $A$ and estimate c) on $A^{c}$, $|e^{itX/\sqrt{n}}-(1+\frac{itX}{\sqrt{n}}+i^{2}t^{2}X^{2}/2n)| \leq (\frac{t^{2}X^{2}}{n}) I_{A} + (\frac{1}{6}\frac{|tX|^{3}}{n^{3/2}}) I_{A^{c}}$
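The two pieces of this bound come from the Taylor estimates b) and c). Writing $s = tX/\sqrt{n}$ (a shorthand introduced only for this note), the bound used on $A$ follows from the triangle inequality and estimate b):

$|e^{is} - 1 - is - (is)^{2}/2| \leq |e^{is} - 1 - is| + s^{2}/2 \leq s^{2}/2 + s^{2}/2 = s^{2} = \frac{t^{2}X^{2}}{n}$

while the bound used on $A^{c}$ is estimate c) applied directly:

$|e^{is} - 1 - is - (is)^{2}/2| \leq \frac{|s|^{3}}{6} = \frac{1}{6}\frac{|tX|^{3}}{n^{3/2}}$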

Consequently,

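$n E[|e^{itX/\sqrt{n}}-(1+\frac{itX}{\sqrt{n}}+i^{2}t^{2}X^{2}/2n)|] \leq n E[(\frac{t^{2}X^{2}}{n}) I_{A}] + n E[\frac{1}{6}\frac{|tX|^{3}}{n^{3/2}} I_{A^{c}}] \leq t^{2} E[X^{2} I_{A}] + \frac{\delta}{6} E[|t|^{3} |X|^{2}] =t^{2} E[X^{2} I_{A}] + \frac{\delta |t|^{3}}{6}$, where the second inequality uses $|X|^{3} \leq \delta\sqrt{n}\,|X|^{2}$ on $A^{c}$ and the final equality uses $E[X^{2}] = 1$.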

Now, given $\varepsilon > 0$, we first choose $\delta > 0$ so that $\frac{|t|^{3} \delta}{6} \leq \frac{\varepsilon}{2}$, and for this $\delta$ we choose $N$ so that for all $n \geq N$ (with $A = A(\delta,n)$) we have

$t^{2}E[X^{2} I_{A}] \leq \frac{\varepsilon}{2}$
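To see the two-step choice of $\delta$ and of the sample size concretely, here is a sketch for one specific law. It assumes, as an illustration only, that $X$ is Laplace distributed with scale $1/\sqrt{2}$ (mean 0, variance 1), for which the truncated second moment $E[X^{2} I_{\{|X| > c\}}] = e^{-c\sqrt{2}}(c^{2} + \sqrt{2}\,c + 1)$ has a closed form. The function names and the values of `t` and `eps` are made up for the example.

```python
import math

B = 1 / math.sqrt(2)  # Laplace scale; the variance 2 * B**2 equals 1

def truncated_second_moment(c):
    """E[X^2 ; |X| > c] for the Laplace law with scale B (closed form)."""
    return math.exp(-c / B) * (c * c + 2 * B * c + 2 * B * B)

def choose_delta(t, eps):
    """Largest delta with |t|^3 * delta / 6 <= eps / 2."""
    return 3 * eps / abs(t) ** 3

def choose_threshold(t, eps, delta):
    """Smallest n with t^2 * E[X^2 ; |X| > delta * sqrt(n)] <= eps / 2.

    The truncated moment is decreasing in the cutoff, so the inequality
    keeps holding for every larger n as well.
    """
    n = 1
    while t * t * truncated_second_moment(delta * math.sqrt(n)) > eps / 2:
        n += 1
    return n

t, eps = 1.0, 0.1
delta = choose_delta(t, eps)
threshold = choose_threshold(t, eps, delta)

# Both halves of the bound, evaluated at the threshold sample size.
tail_part = t * t * truncated_second_moment(delta * math.sqrt(threshold))
cubic_part = delta * abs(t) ** 3 / 6
total_bound = tail_part + cubic_part
print(delta, threshold, total_bound)
```

The order matters: $\delta$ is fixed first from $\varepsilon$ and $t$ alone, and only then is the sample size pushed up until the tail term is small.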

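The proof is completed by the dominated convergence theorem: since $X^{2} I_{A} \leq X^{2}$ and $E[X^{2}] = 1 < \infty$, we have $E[X^{2} I_{A}] \to 0$ as $n \to \infty$, so the tail term can indeed be made at most $\varepsilon/2$. But it’s important to note that: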
1) You can never collect an infinite amount of data.
2) A lot of data isn’t generated by stationary processes and so the i.i.d. assumption doesn’t necessarily hold. You can check this discussion for more details.

*This proof generalizes easily to i.i.d. random variables with finite mean $\mu$ and finite variance $\sigma^{2} > 0$: apply it to the normalized variables $(X_{i} - \mu)/\sigma$, whose scaled sum is $(S_{n} - n\mu)/(\sigma\sqrt{n})$.
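The normalization in the footnote can be sketched in code as well. The example below assumes, for illustration, exponential data with rate 2 (mean $1/2$, standard deviation $1/2$) and $n = 100$; the helper names are invented, and the comparison with the normal distribution function is only a rough simulation check.

```python
import math
import random

def standard_normal_cdf(x):
    """Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normalized_sum(n, mu, sigma, draw):
    """(S_n - n * mu) / (sigma * sqrt(n)), the scaled sum of (X_i - mu) / sigma."""
    s_n = sum(draw() for _ in range(n))
    return (s_n - n * mu) / (sigma * math.sqrt(n))

rng = random.Random(1)  # fixed seed for reproducibility
rate = 2.0              # assumed exponential rate: mean and std are both 1 / rate
mu, sigma = 1 / rate, 1 / rate

samples = [
    normalized_sum(100, mu, sigma, lambda: rng.expovariate(rate))
    for _ in range(5000)
]

def empirical_cdf(x):
    """Fraction of the simulated normalized sums that are <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)

points = (-2.0, -1.0, 0.0, 1.0, 2.0)
max_gap = max(abs(empirical_cdf(x) - standard_normal_cdf(x)) for x in points)
print(max_gap)
```

Because the exponential law is skewed, the agreement at moderate `n` is typically looser than for a symmetric law, which fits the earlier caveat that only finitely much data is ever available.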