Processing math: 100%

An information-theoretic derivation of the Prime Number Theorem

An information-theoretic derivation of the Prime Number Theorem, using the Law of Conservation of Information.

Author

Affiliation

Aidan Rocke

 

Published

Jan. 11, 2022

Citation

Rocke, 2022

The only implicit assumption in this derivation is that all the information in the Universe is conserved as it is only in such Universes that Occam’s razor and its corollary, Maximum Entropy inference, are generally applicable.

The Maximum Entropy assumption:

If we know nothing about the distribution of primes, in the worst case we may assume that each prime less than or equal to N is drawn uniformly from [1,N]. So our source of primes is:

XU([1,N])

where H(X)=lnN is the Shannon entropy of the uniform distribution.

Average number of bits to encode a prime number:

Now, we may define the prime encoding of [1,N] as the binary sequence XN={xn}Nn=1 where xn=1 if n is prime and xn=0 otherwise. With no prior knowledge, given that each integer is either prime or not prime, we have 2N possible prime encodings in [1,N]N.

If there are π(N) primes less than or equal to N then the average number of bits per arrangement gives us the average amount of information gained from correctly identifying each prime in [1,N] as:

Sc=log2(2N)π(N)=Nπ(N)

and if we assume a maximum entropy distribution over the primes then we would expect that each prime is drawn from a uniform distribution. So we would have:

Sc=Nπ(N)lnN

so the average information gained from observing a prime number coincides with average distance between consecutive prime numbers.

Prime Encodings as the outcome of Bernoulli trials:

In fact, given the assumptions the expected information gained from observing each prime in [1,N] is on the order of:

N1k=11k|(k,k+1]|=N1k=11klnN

as there are k distinct ways to sample uniformly from [1,k] and a frequency of 1k associated with the event that kP. So we are effectively modelling each xkXN as a Bernoulli trial xkBer(1k).

This implies that the average number of bits per prime number is given by Nπ(N)lnN. Rearranging, we find:

π(N)N1lnN

which is in complete agreement with the Prime Number Theorem.

The Shannon source coding theorem and the algorithmic randomness of prime encodings:

By the Shannon source coding theorem, we may infer that π(N) primes can’t be compressed into fewer than π(N)lnN bits. Furthermore, as the expected Kolmogorov Complexity converges to the Shannon entropy for computable probability distributions:

E[K(XN)]π(N)lnNN

where the identification of E[K(XN)] with π(N)lnN is a direct consequence of the fact that K(XN) measures the information gained from observing XN and π(N)lnN measures the expected information gained from observing XN.

From a complementary perspective, the Asymptotic Equipartition Theorem, we may infer that the Algorithmic Probability of a prime number of unknown magnitude occurring in the interval [1,N] is on the order of 1N.

The Asymptotic Equipartition Theorem:

Given that prime encodings may be modeled as a sequence of Bernoulli trials, we may use the Asymptotic Equipartition Property to infer the Typical Probability of observing a prime ^xp[1,N]P of unknown magnitude. To be precise, given the prime encoding XN={xk}Nk=1:

lnP(^xp)lnP(x1,...,xN)N

for large N.

Given (4), this expression simplifies to:

lnP(x1,...,xN)N=lnNk=1P(xk)N=lnNk=11kN=lnN!NlnN

Hence, the Algorithmic Probability of a prime number of unknown magnitude in the interval [1,N] converges to:

lnP(^xp)lnNP(^xp)1N

It follows that the probability of observing all π(N) primes at {^xpi}π(N)i=1 where ^xpiU([1,N]) is given by:

P(ˆxp1,...,ˆxpπ(N))(1N)π(N)=eπ(N)lnN=eN+o(1)

Moreover, as each prime pN occurs with typical frequency 1p we may deduce that:

P(ˆxp1,...,ˆxpπ(N))π(N)n=11pn=eN+o(1)

As an immediate corollary of (10), we may recover the expected information gained using a second application of the Asymptotic Equipartition Property:

lnP(ˆxp1,...,ˆxpπ(N))ln(1N)π(N)π(N)lnNN

which provides us with an important insight into the incompressibility of prime encodings, as the last formula implies that prime numbers of unknown magnitude behave like independent random variables drawn from a uniform distribution.

References:

  1. Aidan Rocke (https://mathoverflow.net/users/56328/aidan-rocke), information-theoretic derivation of the prime number theorem, URL (version: 2021-04-08): https://mathoverflow.net/q/384109

  2. Patrick Billingsley. Prime Numbers and Brownian Motion. The American Mathematical Monthly. 1973.

  3. John A. Wheeler, 1990, “Information, physics, quantum: The search for links” in W. Zurek (ed.) Complexity, Entropy, and the Physics of Information. Redwood City, CA: Addison-Wesley.

  4. Andrew Chi-Chih Yao. Theory and applications of trapdoor functions. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, 1982.

  5. Marcus Hutter et al. (2007) Algorithmic probability. Scholarpedia, 2(8):2572.

  6. Rocke (2022, Jan. 3). Kepler Lounge: Occam’s razor. Retrieved from keplerlounge.com

  7. Rocke (2022, Jan. 7). Kepler Lounge: The Algorithmic Probability of a Prime Number. Retrieved from keplerlounge.com

  8. Peter Grünwald and Paul Vitányi. Shannon Information and Kolmogorov Complexity. 2010.

Footnotes

    Citation

    For attribution, please cite this work as

    Rocke (2022, Jan. 12). Kepler Lounge: An information-theoretic derivation of the Prime Number Theorem. Retrieved from keplerlounge.com

    BibTeX citation

    @misc{rocke2022an,
      author = {Rocke, Aidan},
      title = {Kepler Lounge: An information-theoretic derivation of the Prime Number Theorem},
      url = {keplerlounge.com},
      year = {2022}
    }