The Erdős-Kac theorem:

The Erdős-Kac theorem states that the number of unique prime factors for large integers is Gaussian. While this theorem is developed from the Hardy-Ramanujan theorem it remains a rather unexpected result.

To be precise, if we sample \(x \sim U([1,n])\) then \(\nu(x)\) is asymptotically normal:

\begin{equation} P\big(\frac{\nu(x)- \log\log n}{\sqrt{\log\log n}} \geq \lambda\big) = \frac{1}{\sqrt{2 \pi}} \int_{\lambda}^{\infty} e^{-\frac{t^2}{2}}dt \end{equation}

for all \(\lambda \in \mathbb{R}\).

In our proof, we’ll use the theorem in probability that if all the moments of a random variable \(X\) are the same as that of a normal distribution, then \(X\) is normally distributed.

Proof:

If we randomly choose \(x \in [1,n]\), for any prime \(p\) let’s define the indicator variable \(X_p\) where \(X_p = 1\) if \(p|x\) and \(X_p=0\) otherwise. So the number of prime divisors of \(x\) less than or equal to \(M\) is approximately:

\begin{equation} X = \sum_{p \leq M} X_p \end{equation}

Now, let’s set \(M = n^{\frac{1}{s(n)}}\) where \(s(n) = \log\log\log n\) so that \(\lim\limits_{n \to \infty}\frac{s(n)}{\sqrt{\log\log n}}=0\). Then we have:

\begin{equation} \nu(x) - s(n) \leq X \leq \nu(x) \end{equation}

Furthermore, \(X_p \sim \text{Ber}(\frac{1}{p})\) so we may define:

\begin{equation} Y_p = \text{Ber}(\frac{1}{p}) \end{equation}

\begin{equation} Y = \sum_{p \leq M} Y_p \end{equation}

where we have:

\begin{equation} \mu = \mathbb{E}[Y] \sim \mathbb{E}[X] \end{equation}

\begin{equation} \sigma^2 = \text{Var}[Y] \sim \text{Var}[X] \end{equation}

We may use \(\mu\) and \(\sigma\) to normalise \(X\) and \(Y\),

\begin{equation} \overline{X} = \frac{X - \mu}{\sigma} \end{equation}

\begin{equation} \overline{Y} = \frac{Y - \mu}{\sigma} \end{equation}

and the remaining obstacle in the proof is to show that all moments converge:

\begin{equation} \forall k, \mathbb{E}[\overline{X}^k] = \mathbb{E}[\overline{Y}^k] \end{equation}

This may be accomplished by considering the factors of \(\mathbb{E}[X^k-Y^k]\) for distinct primes \(p_1, ..., p_r \leq M\) so we have:

\begin{equation} \mathbb{E}[X_{p_1}X_{p_2}…X_{p_r}-Y_{p_1}Y_{p_2}…Y_{p_r}] = \frac{1}{n} \big\lfloor \frac{n}{\prod_{i=1}^r p_i}\big\rfloor - \frac{1}{\prod_{i=1}^r p_i} \leq \mathcal{O}\big(\frac{1}{n}\big) \end{equation}

so there are \(M^k \leq n\) terms in the expansion of \(\overline{X}^k\) and as each term in the expected difference contributes \(\mathcal{O}\big(\frac{1}{n}\big)\) we have:

\begin{equation} \forall k, \mathbb{E}[\overline{X}^k-\overline{Y}^k] \leq \mathcal{O}(1) \end{equation}

Given that all moments converge, we may conclude that \(\overline{X}\) is asymptotically normal. QED.

References:

  1. Hardy, G. H.; Ramanujan, S. (1917), “The normal number of prime factors of a number n”, Quarterly Journal of Mathematics
  2. Erdős, Paul; Kac, Mark (1940). “The Gaussian Law of Errors in the Theory of Additive Number Theoretic Functions”. American Journal of Mathematics.
  3. Rényi, A.; Turán, P. (1958). “On a theorem of Erdös-Kac” . Acta Arithmetica.