An information-theoretic proof of the Erdős-Kac theorem

An information-theoretic proof of the Erdős-Kac theorem, which informally states that the number of distinct prime divisors of a typical large integer, suitably normalized, converges to a normal distribution.

Aidan Rocke https://github.com/AidanRocke
01-11-2022

Introduction:

From an information-theoretic perspective, in order to analyse the normal order of the number of prime divisors of a typical integer, we must first carefully define the Algorithmic Probability of observing a prime number, whether through multiplication or division.

Algorithmic Probability as the Universal A Priori Probability:

The notion of Algorithmic Probability serves as a Universal A Priori Probability for two reasons. First, although the Kolmogorov Complexity of a data structure \(X\) is not computable, its Minimum Description Length is independent of the choice of description language \(U\) due to the Invariance theorem [4]:

\[\begin{equation} \lvert K_U(X)-K_{U'}(X)\rvert \leq \text{Cst} \tag{1} \end{equation}\]

and so, in a precise sense, asymptotic Kolmogorov Complexity results are Universal. Second, Levin’s Coding theorem asserts that [3]:

\[\begin{equation} -\log_2 m(X) = K_U(X) + \mathcal{O}(1) \tag{2} \end{equation}\]

where \(m(X)\) is the Algorithmic Probability, i.e. the Universal A Priori Probability, of \(X\).

It is worth adding that if the Observable Universe is simulated by a Universal Turing Machine \(U\), then relative to this UTM no virtual machine is required to translate between universal description languages, so we may ignore the constant term:

\[\begin{equation} -\log_2 m(X) = K_U(X) \tag{3} \end{equation}\]

which we shall use in place of Levin’s theorem for the remainder of this analysis due to the physical nature of information.

The Algorithmic Probability of a prime number:

From a frequentist perspective, the typical probability that we observe a prime \(p \in \mathbb{P}\) may be defined as follows:

\[\begin{equation} \lim_{N \to \infty} P_{X \sim U([1,N])}(X \bmod p = 0) = \lim_{N \to \infty} \frac{1}{N} \cdot \Big\lfloor \frac{N}{p} \Big\rfloor = \frac{1}{p} \tag{4} \end{equation}\]
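As a quick numerical sanity check, the following Python sketch compares the exact frequency \(\frac{1}{N} \lfloor \frac{N}{p} \rfloor\) with its limit \(\frac{1}{p}\) (the values of \(N\) and \(p\) below are arbitrary choices for illustration):

```python
# Exact frequency of multiples of p in [1, N], versus the limit 1/p.
N = 10**6
for p in (2, 3, 5, 7, 11):
    freq = (N // p) / N  # P(X mod p == 0) for X ~ U([1, N])
    print(f"p = {p:2d}: frequency = {freq:.6f}, 1/p = {1 / p:.6f}")
```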

In order to define the entropy associated with this discrete probability distribution we may introduce the random variable \(x_p \in \{0,1\}\) defined as follows:

\[\begin{equation} \forall X \sim U([1,N]), \quad X \bmod p = 0 \implies x_p = 1 \tag{5} \end{equation}\]

and \(x_p = 0\) otherwise.

In light of (3), we may define the algorithmic information gained from observing a prime \(p \in \mathbb{P}\):

\[\begin{equation} K_U(x_p) = -\log_2 \frac{1}{p} = \log_2 p \tag{6} \end{equation}\]

so the prime numbers are algorithmically random. Moreover, Levin’s Coding theorem implies:

\[\begin{equation} -\log_2 m(x)^{m(x)} = -m(x) \cdot \log_2 m(x) \sim m(x) \cdot K_U(x) \tag{7} \end{equation}\]

Thus, from an information-theoretic perspective, Chebyshev’s theorem (1852) [11, 12]:

\[\begin{equation} H(x_{p_1},\dots,x_{p_{\pi(N)}}) = \sum_{p \leq N} H(x_p) \sim \sum_{p \leq N} \frac{1}{p} \cdot \log_2 p \sim \log_2 N \tag{8} \end{equation}\]

may be understood as the Expected Algorithmic Information gained from observing a prime number of unknown magnitude in the interval \([1,N] \subset \mathbb{N}\).
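The asymptotic (8) may be checked numerically; a minimal sketch, assuming the sympy library for prime generation (the agreement is up to an \(\mathcal{O}(1)\) offset, per Mertens’ First theorem):

```python
from math import log2
from sympy import primerange

# Compare the entropy sum in (8) with log2(N) for growing N.
for N in (10**3, 10**4, 10**5, 10**6):
    H = sum(log2(p) / p for p in primerange(2, N + 1))
    print(f"N = {N}: sum (log2 p)/p = {H:.2f}, log2 N = {log2(N):.2f}")
```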

As a direct consequence of the formula for the joint entropy (8), we may deduce that:

\[\begin{equation} P(x_p \land x_q) = P(x_p) \cdot P(x_q) = 2^{-K_U(x_p)} \cdot 2^{-K_U(x_q)} = \frac{1}{p} \cdot \frac{1}{q} \tag{9} \end{equation}\]

as the algorithmic randomness of the prime numbers (6) guarantees that \(x_p\) and \(x_q\) are independent events.
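A minimal numerical sketch of the independence claim (9), using the fact that for distinct primes \(p, q\), the events \(p \mid X\) and \(q \mid X\) hold simultaneously if and only if \(pq \mid X\) (the prime pairs below are arbitrary):

```python
# Divisibility by distinct primes p and q is asymptotically independent:
# p | X and q | X hold simultaneously iff pq | X.
N = 10**6
for p, q in ((2, 3), (3, 5), (5, 7)):
    joint = (N // (p * q)) / N
    product = ((N // p) / N) * ((N // q) / N)
    print(f"p = {p}, q = {q}: joint = {joint:.6f}, product = {product:.6f}")
```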

The Expected number of Unique Prime Divisors:

For any integer \(X \sim U([1,N])\), we may define its number of Unique Prime Divisors \(\omega(X) = \sum_{p \leq N} X_p\) where \(X_p = 1\) if \(X \bmod p = 0\) and \(X_p = 0\) otherwise. Thus, we may calculate the Expectation:

\[\begin{equation} \forall X \sim U([1,N]), \quad \mathbb{E}[\omega(X)] = \sum_{p \leq N} \big(1 \cdot P(X_p = 1) + 0 \cdot P(X_p = 0)\big) \sim \sum_{p \leq N} \frac{1}{p} \sim \ln \ln N \tag{10} \end{equation}\]

where we used Mertens’ Second theorem \(\sum_{p \leq N} \frac{1}{p} \sim \ln \ln N\).
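A short sketch, assuming sympy for prime generation and factorization, comparing \(\sum_{p \leq N} \frac{1}{p}\), \(\ln \ln N\), and the empirical mean of \(\omega(X)\) over a random sample (the sample size is arbitrary; the residual gap is the Mertens constant \(M \approx 0.26\)):

```python
import math
import random
from sympy import primefactors, primerange

# Compare sum_{p <= N} 1/p, ln ln N, and the empirical mean of omega(X).
N = 10**6
prime_sum = sum(1 / p for p in primerange(2, N + 1))
sample = [len(primefactors(random.randint(1, N))) for _ in range(2000)]
print(f"sum 1/p       = {prime_sum:.3f}")
print(f"ln ln N       = {math.log(math.log(N)):.3f}")
print(f"mean omega(X) = {sum(sample) / len(sample):.3f}")
```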

The standard deviation of \(\omega(X)\):

Given that the prime numbers are algorithmically random, the random variables \(X_p\) are independent, so the variance of \(\omega(X)\) is the sum of the variances of the \(X_p\):

\[\begin{equation} \forall X \sim U([1,N]), \quad \text{Var}[\omega(X)] = \sum_{p \leq N} \text{Var}[X_p] = \sum_{p \leq N} \big(\mathbb{E}[X_p^2] - \mathbb{E}[X_p]^2\big) \sim \sum_{p \leq N} \Big(\frac{1}{p} - \frac{1}{p^2}\Big) \sim \ln \ln N \tag{11} \end{equation}\]

since \(\sum_{p \leq N} \frac{1}{p^2} \leq \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}\) is bounded.
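The variance asymptotic (11) admits the same kind of numerical check (again assuming sympy):

```python
import math
from sympy import primerange

# Compare the variance sum in (11) with ln ln N.
N = 10**6
var_sum = sum(1 / p - 1 / p**2 for p in primerange(2, N + 1))
print(f"sum (1/p - 1/p^2) = {var_sum:.3f}")
print(f"ln ln N           = {math.log(math.log(N)):.3f}")
```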

The Erdős-Kac theorem:

In order to prove the Erdős-Kac theorem, it remains to show that \(\omega(X) = \sum_{p \leq N} X_p\) satisfies the Lindeberg condition for the Central Limit Theorem [5]:

\[\begin{equation} \Lambda_N(\epsilon) = \frac{1}{\text{Var}[\omega(X)]} \sum_{p \leq N} \Big\langle \big(X_p - \mathbb{E}[X_p]\big)^2: \big\lvert X_p - \mathbb{E}[X_p] \big\rvert \geq \epsilon \cdot \sqrt{\text{Var}[\omega(X)]} \Big\rangle \tag{12} \end{equation}\]

\[\begin{equation} \forall \epsilon > 0, \lim_{N \to \infty} \Lambda_N(\epsilon) = 0 \tag{13} \end{equation}\]

where \(\langle \alpha:\beta \rangle\) denotes the expectation of \(\alpha\) restricted to the event \(\beta\).

Given that \(\sigma_{\omega(X)}^2 \sim \ln \ln N\) diverges while \(\lvert X_p - \mathbb{E}[X_p] \rvert \leq 1\), the event \(\lvert X_p - \mathbb{E}[X_p] \rvert \geq \epsilon \cdot \sigma_{\omega(X)}\) is empty for every prime \(p\) as soon as \(\sigma_{\omega(X)} > \frac{1}{\epsilon}\), so the Lindeberg condition is satisfied:

\[\begin{equation} \forall \epsilon > 0, \exists N_\epsilon \in \mathbb{N}, \forall N \geq N_\epsilon, \Lambda_N(\epsilon) = 0 \tag{14} \end{equation}\]
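To get a sense of the threshold \(N_\epsilon\), note that \(\sigma_{\omega(X)} > \frac{1}{\epsilon}\) requires \(N > e^{e^{1/\epsilon^2}}\); a minimal sketch (the grid of \(\epsilon\) values is arbitrary):

```python
import math

# sigma > 1/eps  <=>  ln ln N > 1/eps^2  <=>  N > exp(exp(1/eps^2)).
# We report log10 of the threshold to avoid overflow for small eps.
for eps in (1.0, 0.5, 0.25):
    log10_threshold = math.exp(1 / eps**2) / math.log(10)
    print(f"eps = {eps}: need log10(N) > {log10_threshold:.3e}")
```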

Hence, the Central Limit Theorem allows us to deduce that:

\[\begin{equation} \forall X \sim U([1,N]), \frac{\omega(X)- \ln \ln N}{\sqrt{\ln \ln N}} \tag{15} \end{equation}\]

converges to the standard normal distribution \(\mathcal{N}(0,1)\) as \(N \rightarrow \infty\).
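A simulation sketch of (15), assuming sympy for factorization; note that the convergence rate is only \(\mathcal{O}\big(\frac{1}{\sqrt{\ln \ln N}}\big)\) [7], so the fit at computationally accessible \(N\) is rough:

```python
import math
import random
import statistics
from sympy import primefactors

# Sample the standardized statistic in (15) for X ~ U([1, N]).
N = 10**7
loglogN = math.log(math.log(N))
z = [(len(primefactors(random.randint(1, N))) - loglogN) / math.sqrt(loglogN)
     for _ in range(5000)]
print(f"sample mean  = {statistics.mean(z):.3f}  (standard normal: 0)")
print(f"sample stdev = {statistics.stdev(z):.3f}  (standard normal: 1)")
```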

Discussion:

This theorem is of great importance for two related reasons. First, it is equivalent to the proposition that the prime numbers are algorithmically random. Second, it is a canonical example of a theorem that is impossible to guess from empirical observations [10]. In fact, it is far from certain that Erdős and Kac would have proved the Erdős-Kac theorem if its precursor, the Hardy-Ramanujan theorem, had not been discovered first [1].

More generally, at a time of Big Data and the imminent supremacy of AI, this theorem forces the issue of determining how some scientists were able to formulate correct theories based on zero empirical evidence. While the Erdős-Kac theorem has the form of a statistical observation, the normal order of \(\omega(X)\) only begins to emerge for \(X \sim U([1,N])\) where \(N \geq 10^{100}\).
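To see how slowly the normal order emerges, we may tabulate the governing scale \(\sqrt{\ln \ln N}\):

```python
import math

# The scale sqrt(ln ln N) governing (15) grows extremely slowly.
for k in (9, 100, 1000, 10**6):  # N = 10^k
    loglogN = math.log(k * math.log(10))  # ln ln(10^k)
    print(f"N = 10^{k}: ln ln N = {loglogN:.2f}, "
          f"sqrt(ln ln N) = {math.sqrt(loglogN):.2f}")
```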

Within the current scientific paradigm, non-trivial scientific discoveries of this kind that are provably beyond the scope of scientific induction (and hence machine learning) do not have an adequate explanation.

References:

  1. Hardy, G. H.; Ramanujan, S. (1917). “The normal number of prime factors of a number n”. Quarterly Journal of Mathematics.

  2. Hutter, M. et al. (2007). “Algorithmic probability”. Scholarpedia, 2(8):2572.

  3. Levin, L. A. (1974). “Laws of information conservation (non-growth) and aspects of the foundation of probability theory”. Problems of Information Transmission, 10(3):206-210.

  4. Kolmogorov, A. N. (1965). “Three approaches to the quantitative definition of information”. Problems of Information Transmission, 1(1):1-7.

  5. Lindeberg, J. W. (1922). “Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung”. Mathematische Zeitschrift, 15:211-235.

  6. Erdős, P.; Kac, M. (1940). “The Gaussian Law of Errors in the Theory of Additive Number Theoretic Functions”. American Journal of Mathematics.

  7. Rényi, A.; Turán, P. (1958). “On a theorem of Erdös-Kac”. Acta Arithmetica.

  8. Grünwald, P.; Vitányi, P. (2010). “Shannon Information and Kolmogorov Complexity”.

  9. Fortnow, L. (2000). “Kolmogorov Complexity”.

  10. BubbleZ (https://mathoverflow.net/users/470546/bubblez). “Theorems that are essentially impossible to guess by empirical observation”. URL (version: 2021-12-29): https://mathoverflow.net/q/412762

  11. Chebyshev, P. L. (1852). “Mémoire sur les nombres premiers”. Journal de Mathématiques Pures et Appliquées, 17:366-390.

  12. Chebyshev, P. L. (1852). “Sur la totalité des nombres premiers inférieurs à une limite donnée”. Journal de Mathématiques Pures et Appliquées, 17:341-365.

Citation

For attribution, please cite this work as

Rocke (2022, Jan. 11). Kepler Lounge: An information-theoretic proof of the Erdős-Kac theorem. Retrieved from keplerlounge.com

BibTeX citation

@misc{rocke2022an,
  author = {Rocke, Aidan},
  title = {Kepler Lounge: An information-theoretic proof of the Erdős-Kac theorem},
  url = {keplerlounge.com},
  year = {2022}
}