The Central Limit Theorem (CLT) says that for independent draws from almost any distribution (normal or otherwise, as long as the variance is finite), the distribution of the sample mean is approximately normal once the sample is large enough. In the history of statistics, this has simplified the development of sampling methods, statistical tests, and statistical algorithms. This is why statisticians give special importance to the normal distribution.
To be precise, the CLT is stated in this manner:
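Let $X_1, X_2, \dots$ be a sequence of i.i.d. random variables with $\mathbb{E}[X_i] = 0$ and $\mathbb{E}[X_i^2] = 1$. If $S_n = X_1 + \dots + X_n$ and $Z \sim N(0, 1)$, we have $S_n/\sqrt{n} \to Z$ in distribution as $n \to \infty$, i.e.
$$\lim_{n \to \infty} P\left(\frac{S_n}{\sqrt{n}} \le x\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du \quad \text{for every } x \in \mathbb{R}.$$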
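The statement can be illustrated with a small simulation. The sketch below is only an illustration under assumptions of my own choosing: the $X_i$ are taken uniform on $[-\sqrt{3}, \sqrt{3}]$ (mean 0, variance 1, clearly not normal), and the function names, grid, and number of trials are hypothetical. It compares the empirical CDF of $S_n/\sqrt{n}$ with the standard normal CDF.

```python
import math
import numpy as np

def standard_normal_cdf(x):
    """CDF of N(0, 1), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simulate_normalized_sums(n, trials, seed=0):
    """Draw `trials` independent copies of S_n / sqrt(n).

    The X_i are uniform on [-sqrt(3), sqrt(3)]: an assumed example
    distribution with mean 0 and variance 1.
    """
    rng = np.random.default_rng(seed)
    half_width = math.sqrt(3.0)
    x = rng.uniform(-half_width, half_width, size=(trials, n))
    return x.sum(axis=1) / math.sqrt(n)

def max_cdf_gap(samples, grid):
    """Largest gap between the empirical CDF and the N(0, 1) CDF on `grid`."""
    samples = np.sort(samples)
    empirical = np.searchsorted(samples, grid, side="right") / samples.size
    exact = np.array([standard_normal_cdf(x) for x in grid])
    return float(np.max(np.abs(empirical - exact)))

if __name__ == "__main__":
    grid = np.linspace(-3.0, 3.0, 61)
    for n in (1, 5, 50):
        gap = max_cdf_gap(simulate_normalized_sums(n, trials=20_000), grid)
        print(f"n = {n:3d}: max CDF gap on grid = {gap:.4f}")
```

The gap should shrink as `n` grows, until it is dominated by the sampling noise of the finite number of trials.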
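Lemma: Lévy’s continuity theorem states that convergence in distribution is equivalent to pointwise convergence of the corresponding characteristic functions: $Y_n \to Y$ in distribution if and only if $\mathbb{E}[e^{itY_n}] \to \mathbb{E}[e^{itY}]$ for every $t \in \mathbb{R}$.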
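In order to use Lévy’s continuity theorem, we need the following two estimates on Taylor expansions of exponential functions. First, for every real $x \ge 0$,
$$\left| e^{-x} - (1 - x) \right| \le \frac{x^2}{2}.$$
Second, for every real $x$,
$$\left| e^{ix} - \left(1 + ix - \frac{x^2}{2}\right) \right| \le \min\left( \frac{|x|^3}{6},\; x^2 \right).$$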
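Taylor bounds of this kind are easy to sanity-check numerically. The sketch below assumes the two standard estimates $|e^{-x} - (1 - x)| \le x^2/2$ for $x \ge 0$ and $|e^{ix} - (1 + ix - x^2/2)| \le \min(|x|^3/6, x^2)$ for real $x$, and checks them on a grid. The function names, grid range, and tolerance are hypothetical choices, and a grid check is of course not a proof.

```python
import numpy as np

def real_exp_gap(x):
    """|e^{-x} - (1 - x)| for real x >= 0."""
    return np.abs(np.exp(-x) - (1.0 - x))

def complex_exp_gap(x):
    """|e^{ix} - (1 + ix - x^2/2)| for real x."""
    return np.abs(np.exp(1j * x) - (1.0 + 1j * x - 0.5 * x**2))

def bounds_hold(tol=1e-12):
    """Check both bounds on a grid; `tol` absorbs floating-point rounding."""
    x_pos = np.linspace(0.0, 20.0, 4001)
    x_all = np.linspace(-20.0, 20.0, 8001)
    first = np.all(real_exp_gap(x_pos) <= 0.5 * x_pos**2 + tol)
    second = np.all(
        complex_exp_gap(x_all)
        <= np.minimum(np.abs(x_all) ** 3 / 6.0, x_all**2) + tol
    )
    return bool(first), bool(second)

if __name__ == "__main__":
    print(bounds_hold())
```

The cubic bound is the sharper one for small $|x|$ and the quadratic bound takes over for large $|x|$, which is why the minimum of the two appears.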
Now we may proceed with the proof:
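i) Let $\varphi(t) = \mathbb{E}[e^{itX_1}]$ be the characteristic function of the common distribution of the $X_i$. Then for every $t \in \mathbb{R}$, by independence, the characteristic function of $S_n/\sqrt{n}$ is given by
$$\mathbb{E}\left[e^{itS_n/\sqrt{n}}\right] = \varphi\left(\frac{t}{\sqrt{n}}\right)^n.$$
ii) Consequently, since $e^{-t^2/2}$ is the characteristic function of $N(0, 1)$, our task is to prove that, for every $t \in \mathbb{R}$,
$$\lim_{n \to \infty} \varphi\left(\frac{t}{\sqrt{n}}\right)^n = e^{-t^2/2}.$$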
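The limit to be proved, $\varphi(t/\sqrt{n})^n \to e^{-t^2/2}$, can be watched numerically in a case where the characteristic function is known in closed form. The sketch below assumes, purely as an example, that the $X_i$ are Rademacher variables ($\pm 1$ with probability $1/2$ each, so mean 0 and variance 1), whose characteristic function is $\cos t$. The function names and the value of `t` are hypothetical.

```python
import math

def rademacher_cf(t):
    """Characteristic function of X with P(X = 1) = P(X = -1) = 1/2."""
    return math.cos(t)

def normalized_sum_cf(t, n):
    """Characteristic function of S_n / sqrt(n) for i.i.d. Rademacher X_i."""
    return rademacher_cf(t / math.sqrt(n)) ** n

def normal_cf(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-0.5 * t * t)

if __name__ == "__main__":
    t = 1.5
    for n in (1, 10, 100, 10_000):
        gap = abs(normalized_sum_cf(t, n) - normal_cf(t))
        print(f"n = {n:6d}: |phi(t/sqrt(n))^n - exp(-t^2/2)| = {gap:.2e}")
```

For this example the gap decays roughly like $1/n$ at a fixed $t$, in line with the pointwise convergence that Lévy’s continuity theorem requires.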
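iii) We begin our estimation by noting that since $|\varphi(t/\sqrt{n})| \le 1$ and $|e^{-t^2/(2n)}| \le 1$,
$$\left| \varphi\left(\frac{t}{\sqrt{n}}\right)^n - e^{-t^2/2} \right| \le n \left| \varphi\left(\frac{t}{\sqrt{n}}\right) - e^{-t^2/(2n)} \right|.$$
iv) Now, we may use the triangle inequality to show that
$$n \left| \varphi\left(\frac{t}{\sqrt{n}}\right) - e^{-t^2/(2n)} \right| \le n \left| \varphi\left(\frac{t}{\sqrt{n}}\right) - \left(1 - \frac{t^2}{2n}\right) \right| + n \left| \left(1 - \frac{t^2}{2n}\right) - e^{-t^2/(2n)} \right|.$$
v) For the second term, by our first estimate, letting $x = t^2/(2n)$, we see that
$$n \left| e^{-t^2/(2n)} - \left(1 - \frac{t^2}{2n}\right) \right| \le \frac{n}{2} \left(\frac{t^2}{2n}\right)^2 = \frac{t^4}{8n},$$
which approaches $0$ as $n \to \infty$.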
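The estimate in step iii) rests on an elementary inequality for complex numbers of modulus at most one, which is worth spelling out. A short derivation follows, where $a$ and $b$ are placeholder symbols introduced here and not taken from the original:

```latex
% For complex numbers a, b with |a| <= 1 and |b| <= 1, and any integer n >= 1:
\begin{align*}
a^n - b^n &= (a - b) \sum_{k=0}^{n-1} a^{k}\, b^{\,n-1-k}, \\
|a^n - b^n| &\le |a - b| \sum_{k=0}^{n-1} |a|^{k}\, |b|^{\,n-1-k} \le n\, |a - b|.
\end{align*}
```

Taking $a = \varphi(t/\sqrt{n})$ and $b = e^{-t^2/(2n)}$, so that $b^n = e^{-t^2/2}$, yields a bound of the form used in step iii).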
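vi) For the first term we note that, since $\mathbb{E}[X_1] = 0$ and $\mathbb{E}[X_1^2] = 1$,
$$n \left| \varphi\left(\frac{t}{\sqrt{n}}\right) - \left(1 - \frac{t^2}{2n}\right) \right| \le n\, \mathbb{E} \left| e^{itX_1/\sqrt{n}} - \left(1 + \frac{itX_1}{\sqrt{n}} - \frac{t^2 X_1^2}{2n}\right) \right|.$$
For any $\delta > 0$ and positive integer $n$, let $A_n = \{ |X_1| \le \delta\sqrt{n} \}$. Then, applying the second estimate with $x = tX_1/\sqrt{n}$ (the cubic bound on $A_n$ and the quadratic bound on its complement),
$$n\, \mathbb{E} \left| e^{itX_1/\sqrt{n}} - \left(1 + \frac{itX_1}{\sqrt{n}} - \frac{t^2 X_1^2}{2n}\right) \right| \le \frac{|t|^3}{6\sqrt{n}}\, \mathbb{E}\left[ |X_1|^3 \mathbf{1}_{A_n} \right] + t^2\, \mathbb{E}\left[ X_1^2 \mathbf{1}_{A_n^c} \right] \le \frac{\delta |t|^3}{6} + t^2\, \mathbb{E}\left[ X_1^2 \mathbf{1}_{\{|X_1| > \delta\sqrt{n}\}} \right].$$
Now, given $\varepsilon > 0$, we first choose $\delta$ so that $\delta |t|^3 / 6 < \varepsilon/2$, and for this $\delta$ we
choose $N$ so that if $n \ge N$ we have
$$t^2\, \mathbb{E}\left[ X_1^2 \mathbf{1}_{\{|X_1| > \delta\sqrt{n}\}} \right] < \frac{\varepsilon}{2}.$$
Such an $N$ exists by the dominated convergence theorem, because $X_1^2 \mathbf{1}_{\{|X_1| > \delta\sqrt{n}\}}$ is dominated by the integrable variable $X_1^2$ and tends to $0$ pointwise. Hence the first term is smaller than $\varepsilon$ for all $n \ge N$, which completes the proof. But it’s important to note that: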
1) You can never collect an infinite amount of data.
2) A lot of data isn’t generated by stationary processes and so the i.i.d. assumption doesn’t necessarily hold. You can check this discussion for more details.
*This proof generalizes easily to i.i.d. random variables with finite mean $\mu$ and finite, nonzero variance $\sigma^2$: simply apply it to the normalized variables $(X_i - \mu)/\sigma$.
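Written out, the normalization looks as follows. The symbols $\mu$, $\sigma$, $Y_i$ and $\bar{X}_n$ (the sample mean) are notation introduced here for illustration:

```latex
% Assumes 0 < \sigma^2 < \infty; the Y_i are the normalized variables.
\begin{align*}
Y_i &= \frac{X_i - \mu}{\sigma}, \qquad \mathbb{E}[Y_i] = 0, \quad \mathbb{E}[Y_i^2] = 1, \\
\frac{Y_1 + \cdots + Y_n}{\sqrt{n}}
  &= \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}
   = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma}
   \;\xrightarrow{d}\; N(0, 1).
\end{align*}
```

In other words, for large $n$ the sample mean $\bar{X}_n$ is approximately $N(\mu, \sigma^2/n)$, which is the informal form of the theorem.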