Cauchy Distribution

In my previous post, I gave a proof of the Central Limit Theorem
for i.i.d. data having finite mean and finite variance. As the Cauchy
distribution will show, the finite-variance requirement can’t be relaxed.

Definition: a random variable X is Cauchy distributed if the density of X is given by f(x) = \frac{1}{\pi (1+x^2)}

It’s easy to check that a Cauchy distributed random variable has no well-defined
expectation (E|X| is infinite) and that its second moment is infinite, so the
hypotheses of the Central Limit Theorem fail. We may now proceed to make the following claim:

If \{X_{n}\}  is a sequence of independent Cauchy distributed random variables,
then Y_{n} = \frac{1}{n}\sum_{i=1}^{n} X_{i} has a Cauchy distribution.

Lemma: If X is Cauchy distributed, then \varphi_{X}(t) = e^{-|t|}
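The lemma can be sanity-checked numerically. Although the Cauchy distribution has no mean, e^{itX} is bounded, so a Monte Carlo average of it is well behaved. The sketch below is only an illustration, not part of the proof; the helper name `empirical_cf`, the seed and the sample size are arbitrary choices of mine.

```python
import numpy as np

def empirical_cf(samples, t):
    """Monte Carlo estimate of E[exp(i t X)] from an array of samples."""
    return np.mean(np.exp(1j * t * samples))

rng = np.random.default_rng(0)        # fixed seed, arbitrary choice
x = rng.standard_cauchy(200_000)      # draws from the standard Cauchy law

for t in (0.5, 1.0, 2.0):
    est = empirical_cf(x, t)
    # The imaginary part should be near 0 and the real part near exp(-|t|).
    print(f"t={t}: estimate={est.real:.4f}, exp(-|t|)={np.exp(-abs(t)):.4f}")
```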

To prove the claim, we compute the characteristic function of Y_{n} and compare it to the characteristic function of a Cauchy distributed random variable. By independence of the X_{i}:

\varphi_{Y_{n}}(t) = \prod_{i=1}^n\varphi_{\frac{X_{i}}{n}}(t) = \prod_{i=1}^n\varphi_{X_{i}}(t/n)=(\varphi_{X_{1}}(t/n))^n=(e^{-\frac{|t|}{n}})^n=e^{-|t|}
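A quick simulation makes the consequence concrete: if Y_{n} is again standard Cauchy, the spread of the sample mean should not shrink as n grows. A standard Cauchy variable has quartiles at -1 and 1, so its interquartile range is 2. The following is a minimal sketch under my own choices of seed, replication count and the helper name `iqr_of_sample_means`.

```python
import numpy as np

def iqr_of_sample_means(n, reps, rng):
    """Interquartile range of `reps` sample means, each over n standard Cauchy draws."""
    means = rng.standard_cauchy((reps, n)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    return q75 - q25

rng = np.random.default_rng(1)        # fixed seed, arbitrary choice
for n in (1, 10, 100, 1000):
    # For finite-variance data this would shrink like 1/sqrt(n); here it should stay near 2.
    print(n, round(iqr_of_sample_means(n, 4000, rng), 3))
```

The interquartile range is used instead of the sample standard deviation because the latter has no finite limit for Cauchy data.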


The Central Limit Theorem

The Central Limit Theorem says that for i.i.d. data with finite variance (normal or otherwise), the sample mean is approximately normally distributed once the sample is large. In the history of statistics, this has simplified the development of sampling methods, statistical tests, and statistical algorithms. And this is why statisticians give special importance to the normal distribution.

To be precise, the CLT is stated in this manner:
Let \{X_{n}\} be a sequence of i.i.d. random variables with mean \mu = 0 and variance \sigma^{2} = 1. If Z \sim N(0,1) and S_{n} = \sum_{i=1}^{n} X_{i}, then S_{n}/\sqrt{n} \longrightarrow Z in distribution as n \longrightarrow \infty, i.e. \forall x \in \Re, \lim_{n \to \infty}P(S_{n}/\sqrt{n} \leq x) = \frac{1}{\sqrt{2 \pi}} \int^x_{-\infty} e^{-\frac{u^2}{2}}\,du
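To see the statement in action, one can estimate P(S_{n}/\sqrt{n} \leq x) by simulation and compare it with the normal distribution function. The sketch below assumes, purely for illustration, that X_{i} = E_{i} - 1 with E_{i} exponential of rate 1, which has mean 0 and variance 1; the function names, seed and sizes are my own choices.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def empirical_cdf_of_scaled_sum(n, reps, x, rng):
    """Estimate P(S_n / sqrt(n) <= x) for X_i = E_i - 1, E_i ~ Exp(1)."""
    samples = rng.exponential(1.0, size=(reps, n)) - 1.0   # mean 0, variance 1
    scaled = samples.sum(axis=1) / math.sqrt(n)
    return float(np.mean(scaled <= x))

rng = np.random.default_rng(2)        # fixed seed, arbitrary choice
for x in (-1.0, 0.0, 1.0):
    est = empirical_cdf_of_scaled_sum(200, 20_000, x, rng)
    print(f"x={x}: simulated={est:.3f}, normal={normal_cdf(x):.3f}")
```

The skewed exponential is a deliberate choice: the agreement at n = 200 is close but not exact, which shows that the theorem is a statement about the limit.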

Lemma (Lévy’s continuity theorem): convergence in distribution is equivalent to point-wise convergence of the corresponding characteristic functions.

To apply Lévy’s continuity theorem, we will need the following estimates on Taylor expansions of exponential functions:
a) \forall u \geq 0, 0 \leq e^{-u} -1 +u \leq u^{2}/2
b) \forall t \in \Re, |e^{it} -1-it| \leq |t|^{2}/2
c) \forall t \in \Re, |e^{it} -1-it-(it)^{2}/2| \leq |t|^{3}/6
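These three estimates are standard Taylor remainder bounds. A grid check is no substitute for a proof, but it is a cheap way to catch a misplaced exponent or constant. The grid ranges and the rounding tolerance `tol` below are arbitrary choices of mine.

```python
import numpy as np

u = np.linspace(0.0, 20.0, 2001)      # grid for estimate a), u >= 0
t = np.linspace(-20.0, 20.0, 4001)    # grid for estimates b) and c)
tol = 1e-12                           # slack for floating-point rounding

# a) 0 <= exp(-u) - 1 + u <= u^2 / 2
gap_a = np.exp(-u) - 1.0 + u
holds_a = bool(np.all(gap_a >= -tol) and np.all(gap_a <= u**2 / 2 + tol))

# b) |exp(it) - 1 - it| <= t^2 / 2
lhs_b = np.abs(np.exp(1j * t) - 1.0 - 1j * t)
holds_b = bool(np.all(lhs_b <= t**2 / 2 + tol))

# c) |exp(it) - 1 - it - (it)^2 / 2| <= |t|^3 / 6
lhs_c = np.abs(np.exp(1j * t) - 1.0 - 1j * t - (1j * t) ** 2 / 2)
holds_c = bool(np.all(lhs_c <= np.abs(t) ** 3 / 6 + tol))

print(holds_a, holds_b, holds_c)
```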

Now we may proceed with the proof:

i) let F be the characteristic function of the common distribution of the \{X_{n}\}. Then for every t \in \Re, the characteristic function of S_{n}/\sqrt{n} is given by E(e^{itS_{n}/\sqrt{n}}) = [F(t/\sqrt{n})]^{n}

ii) Consequently, our task is to prove that \forall t \in \Re, \lim_{n \to \infty}[F(t/\sqrt{n})]^{n} = e^{-t^{2}/2}
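For a concrete feel of this limit, take X = \pm 1 with probability 1/2 each (mean 0, variance 1), whose characteristic function is F(t) = \cos(t). This example is my own and not part of the proof; the function name is hypothetical.

```python
import math

def rademacher_cf_power(t, n):
    """[F(t / sqrt(n))]^n for X = +/-1 with probability 1/2 each, where F(t) = cos(t)."""
    return math.cos(t / math.sqrt(n)) ** n

t = 2.0
target = math.exp(-t**2 / 2)          # the claimed limit exp(-t^2 / 2)
errors = []
for n in (1, 10, 100, 10_000, 1_000_000):
    value = rademacher_cf_power(t, n)
    errors.append(abs(value - target))
    print(f"n={n}: value={value:.6f}, target={target:.6f}")
```

The gap to the target shrinks roughly like 1/n here, consistent with the t^{4}/n-type error terms that appear in the estimates of the proof.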

iii) We begin our estimation by noting that |[F(t/\sqrt{n})]^{n}-e^{-t^{2}/2}| \leq n |F(t/\sqrt{n})-e^{-t^{2}/2n}| since |F(t/\sqrt{n})| \leq 1 and |e^{-t^{2}/2n}| \leq 1
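The bound in iii) rests on an elementary inequality for complex numbers of modulus at most 1. Spelled out, with a = F(t/\sqrt{n}) and b = e^{-t^{2}/2n} (so that b^{n} = e^{-t^{2}/2}), it is a telescoping factorization followed by the triangle inequality:

```latex
a^{n} - b^{n} = (a - b)\sum_{k=0}^{n-1} a^{n-1-k}\, b^{k}
\quad\Longrightarrow\quad
|a^{n} - b^{n}| \leq |a - b| \sum_{k=0}^{n-1} |a|^{n-1-k}\, |b|^{k} \leq n\,|a - b|
\qquad \text{whenever } |a| \leq 1,\ |b| \leq 1.
```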

iv) Now, we may use the triangle inequality to show that |[F(t/\sqrt{n})]^{n}-e^{-t^{2}/2}| \leq n |F(t/\sqrt{n})-(1-t^{2}/2n)| + n |(1-t^{2}/2n)-e^{-t^{2}/2n}|

v) by our first estimate, letting u = t^{2}/2n \geq 0, we see that n |(1-t^{2}/2n)-e^{-t^{2}/2n}| \leq \frac{n(t^{2}/2n)^{2}}{2} = t^{4}/8n which approaches 0 as n \longrightarrow \infty.

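vi) for the first term, let X have the common distribution of the \{X_{n}\}. Since E[X] = 0 and E[X^{2}] = 1, we note that n |F(t/\sqrt{n})-(1-t^{2}/2n)|=n |E[e^{itX/\sqrt{n}}-(1+\frac{itX}{\sqrt{n}}+i^{2}t^{2}X^{2}/2n)]| \leq n E[|e^{itX/\sqrt{n}}-(1+\frac{itX}{\sqrt{n}}+i^{2}t^{2}X^{2}/2n)|]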

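For any \delta > 0 and positive integer n, let A=A(\delta,n) = \{|X|>\delta\sqrt{n}\}. Applying estimate b) on A (together with |i^{2}t^{2}X^{2}/2n| = t^{2}X^{2}/2n) and estimate c) on A^{c}, we get |e^{itX/\sqrt{n}}-(1+\frac{itX}{\sqrt{n}}+i^{2}t^{2}X^{2}/2n)| \leq (\frac{t^{2}X^{2}}{n}) I_{A} + (\frac{1}{6}\frac{|tX|^{3}}{n^{3/2}}) I_{A^{c}}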


Taking expectations, and using |X| \leq \delta\sqrt{n} on A^{c} together with E[X^{2}] = 1:

n E[|e^{itX/\sqrt{n}}-(1+\frac{itX}{\sqrt{n}}+i^{2}t^{2}X^{2}/2n)|] \leq n E[(\frac{t^{2}X^{2}}{n}) I_{A}] + n E[\frac{1}{6}\frac{|tX|^{3}}{n^{3/2}} I_{A^{c}}] \leq t^{2} E[X^{2} I_{A}] + \frac{\delta}{6} E[|t|^{3} |X|^{2}] =t^{2} E[X^{2} I_{A}] + \frac{\delta |t|^{3}}{6}

Now, given \varepsilon > 0 we first choose \delta > 0 so that \frac{|t|^{3} \delta}{6} \leq \frac{\varepsilon}{2}, and for this \delta we
choose N so that for all n \geq N we have

t^{2}E[X^{2} I_{A}] \leq \frac{\varepsilon}{2}
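The fact that such an N exists is where finite variance enters: the truncated second moment E[X^{2} I_{A}] must vanish as the threshold \delta\sqrt{n} grows. A small simulation illustrates this, again assuming (my choice, for illustration only) that X = E - 1 with E exponential of rate 1; the function name, \delta = 0.1 and the sample size are likewise arbitrary.

```python
import numpy as np

def truncated_second_moment(x, delta, n):
    """Sample estimate of E[X^2 * 1{|X| > delta * sqrt(n)}]."""
    mask = np.abs(x) > delta * np.sqrt(n)
    return float(np.mean(x**2 * mask))

rng = np.random.default_rng(3)              # fixed seed, arbitrary choice
x = rng.exponential(1.0, 200_000) - 1.0     # mean 0, variance 1
delta = 0.1
ns = (1, 10, 100, 1_000, 10_000)

# Same sample for every n, so the estimates can only decrease as the set A shrinks.
tails = [truncated_second_moment(x, delta, n) for n in ns]
for n, tail in zip(ns, tails):
    print(f"n={n}: E[X^2 I_A] ~ {tail:.4f}")
```

For a Cauchy variable the same quantity is infinite for every n, which is exactly where the argument breaks down.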

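The proof then follows from the dominated convergence theorem: X^{2} I_{A} \leq X^{2}, which is integrable, and I_{A} \longrightarrow 0 almost surely as n \longrightarrow \infty, so E[X^{2} I_{A}] \longrightarrow 0. But it’s important to note that: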
1) You can never collect an infinite amount of data.
2) A lot of data isn’t generated by stationary processes and so the i.i.d. assumption doesn’t necessarily hold. You can check this discussion for more details.

*This proof generalizes easily to i.i.d. random variables with finite mean \mu and finite variance \sigma^{2} > 0 if you standardize the variables, i.e. replace each X_{i} by (X_{i}-\mu)/\sigma.