Chebyshev’s theorem via Occam’s razor

An information-theoretic derivation of Chebyshev’s theorem (1852), an important precursor of the Prime Number Theorem.

Aidan Rocke

The aim of this article is to derive Chebyshev’s theorem (1852) using an adaptation of Kontoyiannis’ original arguments [3].

For an integer \(Z\) sampled uniformly at random from the interval \([1,N]\), we may define its random prime factorisation in terms of the random variables \(X_p\), where \(X_p\) is the exponent of the prime \(p\):

\[\begin{equation} Z \sim U([1,N]), \qquad Z = \prod_{p \leq N} p^{X_p} \tag{1} \end{equation}\]
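As a concrete illustration (a minimal Python sketch, not part of the derivation; the helper name `prime_exponents` and the choice \(N = 100\) are my own), we can sample \(Z\) uniformly from \([1,N]\) and recover the exponents \(X_p\) by trial division:

```python
import random

def prime_exponents(z, N):
    """Exponents X_p in z = prod_{p <= N} p^{X_p}, found by trial division."""
    exponents = {}
    for p in range(2, N + 1):
        # naive primality test, adequate for small N
        if all(p % q != 0 for q in range(2, int(p**0.5) + 1)):
            k = 0
            while z % p == 0:
                z //= p
                k += 1
            exponents[p] = k
    return exponents

N = 100
Z = random.randint(1, N)          # Z ~ U([1, N])
X = prime_exponents(Z, N)
product = 1
for p, k in X.items():
    product *= p**k
assert product == Z               # recovers Z = prod p^{X_p}
```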

While the description length of \(Z\) is not computable, we may calculate its Expected Kolmogorov Complexity using the relation [4]:

\[\begin{equation} \mathbb{E}[K_U(Z)] = H(Z) + \mathcal{O}(1) \tag{2} \end{equation}\]

In our case, the asymptotic formula for \(\mathbb{E}[K_U(Z)]\) may be explicitly calculated using the fact that almost all integers are algorithmically random:

\[\begin{equation} \mathbb{E}[K_U(Z)] \sim \mathbb{E}[\log_2 Z] = \frac{1}{N} \sum_{k=1}^N \log_2 k = \frac{\log_2 N!}{N} \sim \log_2 N \tag{3} \end{equation}\]
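A quick numerical check of (3) (a Python sketch; using `math.lgamma` to compute \(\log N!\) is my own choice). By Stirling’s formula \(\frac{\log_2 N!}{N} = \log_2 N - \log_2 e + o(1)\), so the average description length approaches \(\log_2 N\), though slowly:

```python
import math

def avg_description_length(N):
    """log2(N!)/N, computed via the log-gamma function: log N! = lgamma(N+1)."""
    return math.lgamma(N + 1) / math.log(2) / N

# The gap to log2(N) settles near log2(e) ~ 1.44, so the ratio tends to 1.
for N in (10**3, 10**6, 10**9):
    print(N, avg_description_length(N), math.log2(N))
```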

Hence, taking logarithms of (1) and applying linearity of expectation, the Shannon Entropy estimate (3) simplifies to:

\[\begin{equation} \log_2 N \sim \sum_{p \leq N} \mathbb{E}[X_p] \cdot \log_2 p \tag{4} \end{equation}\]

where we may calculate \(\mathbb{E}[X_p]\) using what we know about the Algorithmic Probability of a prime number:

\[\begin{equation} P(Z \bmod p = 0) \sim \frac{1}{p} \tag{5} \end{equation}\]

Using the Universal Distribution (5), we find:

\[\begin{equation} \mathbb{E}[X_p] = \sum_{k \geq 1} P(X_p \geq k) = \sum_{k \geq 1} P(Z \bmod p^k = 0) \sim \sum_{k \geq 1} \frac{1}{p^k} \tag{6} \end{equation}\]

which, being a geometric series with sum \(\frac{1}{p-1}\), simplifies to:

\[\begin{equation} \mathbb{E}[X_p] \sim \frac{1}{p} \tag{7} \end{equation}\]
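For uniform \(Z\) the expectation in (6) can be computed exactly, since \(P(p^k \mid Z) = \lfloor N/p^k \rfloor / N\); the result sits close to \(\frac{1}{p-1}\), which agrees with \(\frac{1}{p}\) to first order (a small Python check; the cutoff \(N = 10^6\) is arbitrary):

```python
def expected_exponent(p, N):
    """Exact E[X_p] for Z ~ U([1, N]): sum over k of floor(N/p^k)/N."""
    total, pk = 0.0, p
    while pk <= N:
        total += (N // pk) / N
        pk *= p
    return total

N = 10**6
for p in (2, 3, 5, 7, 11):
    # exact value vs. the geometric-series limit 1/(p-1) vs. the leading term 1/p
    print(p, expected_exponent(p, N), 1 / (p - 1), 1 / p)
```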

Thus, we may deduce Chebyshev’s remarkable theorem in base-2:

\[\begin{equation} \sum_{p \leq N} \frac{1}{p} \cdot \log_2 p \sim \log_2 N \tag{8} \end{equation}\]
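The estimate (8) can be checked numerically with a sieve of Eratosthenes (a Python sketch; the function name and the cutoffs are my own). The convergence is slow: by Mertens’ theorem the sum lags \(\log_2 N\) by an additive constant:

```python
import math

def chebyshev_sum(N):
    """Sum of log2(p)/p over primes p <= N, via a sieve of Eratosthenes."""
    is_prime = [True] * (N + 1)
    total = 0.0
    for p in range(2, N + 1):
        if is_prime[p]:
            total += math.log2(p) / p
            for multiple in range(p * p, N + 1, p):
                is_prime[multiple] = False
    return total

for N in (10**3, 10**4, 10**5):
    print(N, chebyshev_sum(N), math.log2(N))
```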

What does this theorem mean? Given the Algorithmic Probability \(m(X)\) of an event \(X\), we may consider (8) in light of Levin’s Coding theorem [5]:

\[\begin{equation} -\log_2 m(X) = K_U(X) + \mathcal{O}(1) \implies -\log_2 m(X)^{m(X)} \sim m(X) \cdot K_U(X) \tag{9} \end{equation}\]

and the information-theoretic independence of distinct primes:

\[\begin{equation} \forall p,q \in \mathbb{P}, P(Z \bmod p = 0 \land Z \bmod q = 0) \sim \frac{1}{p} \cdot \frac{1}{q} \tag{10} \end{equation}\]
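For uniform \(Z\) the independence in (10) is essentially exact when \(p \neq q\), since the integers divisible by both \(p\) and \(q\) are precisely the multiples of \(pq\) (a quick Python check; the values of \(p\), \(q\), and \(N\) are arbitrary):

```python
N = 10**5
p, q = 3, 7
# empirical joint probability of divisibility by both p and q
joint = sum(1 for z in range(1, N + 1) if z % p == 0 and z % q == 0) / N
print(joint, (1 / p) * (1 / q))  # floor(N/(p*q))/N vs 1/(p*q)
```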

so Chebyshev’s theorem tells us that the Algorithmic Probability of observing a prime number in the interval \([1,N]\) is on the order of \(2^{-\log_2 N} = \frac{1}{N}\) as \(N \to \infty\).


  1. P.L. Chebychev. Mémoire sur les nombres premiers. J. de Math. Pures Appl., 17:366–390, 1852.

  2. P.L. Chebychev. Sur la totalité des nombres premiers inférieurs à une limite donnée. J. de Math. Pures Appl., 17:341–365, 1852.

  3. I. Kontoyiannis. “Some information-theoretic computations related to the distribution of prime numbers.” In Festschrift in Honor of Jorma Rissanen (P. Grünwald, P. Myllymäki, I. Tabus, M. Weinberger, B. Yu, eds.), pp. 135–143, Tampere University Press, May 2008.

  4. Peter Grünwald and Paul Vitányi. Shannon Information and Kolmogorov Complexity. 2010.

  5. L.A. Levin. Laws of information conservation (non-growth) and aspects of the foundation of probability theory. Problems of Information Transmission, 10(3):206–210, 1974.


For attribution, please cite this work as

Rocke (2022, Jan. 9). Kepler Lounge: Chebyshev's theorem via Occam's razor. Retrieved from

BibTeX citation

  @misc{rocke2022chebyshev,
    author = {Rocke, Aidan},
    title = {Kepler Lounge: Chebyshev's theorem via Occam's razor},
    url = {},
    year = {2022}
  }