Chebyshev’s theorem via Occam’s razor

An information-theoretic derivation of Chebyshev’s theorem (1852), an important precursor of the Prime Number Theorem.

Aidan Rocke https://github.com/AidanRocke
01-09-2022

The aim of this article is to derive Chebyshev’s theorem (1852) using an adaptation of Kontoyiannis’ original arguments [3].

For an integer \(Z\) sampled uniformly from the interval \([1,N]\), we may define its random prime factorisation in terms of the random variables \(X_p\):

\[\begin{equation} Z \sim U([1,N]), \qquad Z = \prod_{p \leq N} p^{X_p} \tag{1} \end{equation}\]
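As a quick sanity check, the random factorisation in (1) can be sampled directly. The helper `prime_exponents` below is a name of my own, a minimal trial-division sketch rather than anything from the original derivation:

```python
import random

def prime_exponents(z):
    """Exponents {p: X_p} of the prime factorisation of z, by trial division."""
    exps, p = {}, 2
    while p * p <= z:
        while z % p == 0:
            exps[p] = exps.get(p, 0) + 1
            z //= p
        p += 1
    if z > 1:                     # leftover factor is itself prime
        exps[z] = exps.get(z, 0) + 1
    return exps

N = 10**4
Z = random.randint(1, N)          # Z ~ U([1, N])
X = prime_exponents(Z)

product = 1
for p, e in X.items():
    product *= p**e
assert product == Z               # Z = prod_{p <= N} p^{X_p}, as in (1)
```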

While the description length of \(Z\) is not computable, we may calculate its Expected Kolmogorov Complexity using the relation [4]:

\[\begin{equation} \mathbb{E}[K_U(Z)] = H(Z) + \mathcal{O}(1) \tag{2} \end{equation}\]

In our case, the asymptotic formula for \(\mathbb{E}[K_U(Z)]\) may be explicitly calculated using the fact that almost all integers are algorithmically random:

\[\begin{equation} \mathbb{E}[K_U(Z)] \sim \mathbb{E}[\log_2 Z] = \frac{1}{N} \sum_{k=1}^N \log_2 k = \frac{\log_2 N!}{N} \sim \log_2 N \tag{3} \end{equation}\]
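Numerically, the average \(\frac{1}{N}\sum_{k \leq N} \log_2 k = \frac{\log_2 N!}{N}\) approaches \(\log_2 N\) only slowly (by Stirling’s formula the gap is roughly \(\log_2 e\)), which a short computation of my own makes visible:

```python
import math

def avg_log2(N):
    """(1/N) * sum_{k=1}^{N} log2(k) = log2(N!) / N."""
    return sum(math.log2(k) for k in range(1, N + 1)) / N

for N in (10**2, 10**3, 10**4, 10**5):
    ratio = avg_log2(N) / math.log2(N)
    print(N, ratio)               # ratio tends to 1 as N -> infinity
```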

Hence, taking the logarithm of (1) and expectations, then combining (2) with (3), the Shannon Entropy of \(Z\) simplifies to:

\[\begin{equation} \log_2 N \sim \sum_{p \leq N} \mathbb{E}[X_p] \cdot \log_2 p \tag{4} \end{equation}\]

where we may calculate \(\mathbb{E}[X_p]\) using what we know about the probability that \(Z\) is divisible by a prime \(p\):

\[\begin{equation} P(Z \bmod p = 0) \sim \frac{1}{p} \tag{5} \end{equation}\]
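For \(Z \sim U([1,N])\) this probability is exactly \(\lfloor N/p \rfloor / N\), which differs from \(1/p\) by less than \(1/N\). A two-line check of my own:

```python
N = 10**6
for p in (2, 3, 5, 97, 499):
    exact = (N // p) / N          # P(Z mod p = 0) for Z ~ U([1, N])
    print(p, exact, 1 / p)        # the two columns agree to within 1/N
```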

Using the Universal Distribution (5), we find:

\[\begin{equation} \mathbb{E}[X_p] = \sum_{k \geq 1} P(X_p \geq k) = \sum_{k \geq 1} P(Z \bmod p^k = 0) \sim \sum_{k \geq 1} \frac{1}{p^k} \tag{6} \end{equation}\]

which simplifies to:

\[\begin{equation} \mathbb{E}[X_p] \sim \frac{1}{p} \tag{7} \end{equation}\]
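The geometric series in (6) sums exactly to \(\frac{1}{p-1}\), which agrees with \(\frac{1}{p}\) to first order. The finite-\(N\) expectation can be computed exactly; the helper below is my own sketch:

```python
def expected_exponent(p, N):
    """E[X_p] = sum_{k>=1} P(p^k divides Z) = sum_{k>=1} floor(N/p^k)/N."""
    total, pk = 0.0, p
    while pk <= N:
        total += (N // pk) / N
        pk *= p
    return total

N = 10**6
for p in (2, 3, 5, 13):
    print(p, expected_exponent(p, N), 1 / (p - 1))   # nearly equal for large N
```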

Thus, we may deduce Chebyshev’s remarkable theorem in base-2:

\[\begin{equation} \sum_{p \leq N} \frac{1}{p} \cdot \log_2 p \sim \log_2 N \tag{8} \end{equation}\]
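Equation (8) is Mertens’ form of Chebyshev’s estimate: the difference \(\sum_{p \leq N} \frac{\log_2 p}{p} - \log_2 N\) remains bounded as \(N\) grows. A direct computation with a simple sieve, my own sketch rather than part of the derivation:

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

for N in (10**3, 10**4, 10**5):
    s = sum(math.log2(p) / p for p in primes_up_to(N))
    print(N, s - math.log2(N))    # bounded difference, not tending to 0
```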

What does this theorem mean? Given the Algorithmic Probability \(m(X)\) of an event \(X\), we may consider (8) in light of Levin’s Coding theorem [5]:

\[\begin{equation} -\log_2 m(X) = K_U(X) + \mathcal{O}(1) \implies -\log_2 m(X)^{m(X)} \sim m(X) \cdot K_U(X) \tag{9} \end{equation}\]

and the information-theoretic independence of distinct primes:

\[\begin{equation} \forall p,q \in \mathbb{P}, P(Z \bmod p = 0 \land Z \bmod q = 0) \sim \frac{1}{p} \cdot \frac{1}{q} \tag{10} \end{equation}\]
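The independence claim (10) can also be verified exactly: for distinct primes \(p, q\), the count of integers in \([1,N]\) divisible by both is \(\lfloor N/pq \rfloor\). A check of my own:

```python
N = 10**6
for p, q in ((2, 3), (3, 7), (5, 11)):
    joint = (N // (p * q)) / N    # P(p | Z and q | Z) for Z ~ U([1, N])
    print(p, q, joint, (1 / p) * (1 / q))   # agree to within 1/N
```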

so Chebyshev’s theorem tells us that the Algorithmic Probability that we observe a prime number in the interval \([1,N]\) is on the order of \(2^{-\log_2 N} = \frac{1}{N}\) as \(N \to \infty\).

References:

  1. P.L. Chebychev. Mémoire sur les nombres premiers. J. de Math. Pures Appl., 17:366–390, 1852.

  2. P.L. Chebychev. Sur la totalité des nombres premiers inférieurs à une limite donnée. J. de Math. Pures Appl., 17:341–365, 1852.

  3. I. Kontoyiannis. “Some information-theoretic computations related to the distribution of prime numbers.” In Festschrift in Honor of Jorma Rissanen, (P. Grunwald, P. Myllymaki, I. Tabus, M. Weinberger, B. Yu, eds.), pp. 135-143, Tampere University Press, May 2008.

  4. Peter Grünwald and Paul Vitányi. Shannon Information and Kolmogorov Complexity. 2010.

  5. L.A. Levin. Laws of information conservation (non-growth) and aspects of the foundation of probability theory. Problems of Information Transmission, 10(3):206-210, 1974.

Citation

For attribution, please cite this work as

Rocke (2022, Jan. 9). Kepler Lounge: Chebyshev's theorem via Occam's razor. Retrieved from keplerlounge.com

BibTeX citation

@misc{rocke2022chebyshev,
  author = {Rocke, Aidan},
  title = {Kepler Lounge: Chebyshev's theorem via Occam's razor},
  url = {keplerlounge.com},
  year = {2022}
}