An information-theoretic derivation of the Prime Number Theorem, using the Law of Conservation of Information.
The only implicit assumption in this derivation is that all the information in the Universe is conserved, as it is only in such universes that Occam's razor and its corollary, Maximum Entropy inference, are generally applicable.
If we know nothing about the distribution of primes, in the worst case we may assume that each prime less than or equal to N is drawn uniformly from [1,N]. So our source of primes is:
$$X \sim U([1,N])$$
where $H(X) = \ln N$ is the Shannon entropy of the uniform distribution.
Now, we may define the prime encoding of $[1,N]$ as the binary sequence $X_N = \{x_n\}_{n=1}^N$ where $x_n = 1$ if $n$ is prime and $x_n = 0$ otherwise. With no prior knowledge, given that each integer is either prime or not prime, we have $2^N$ possible prime encodings in $[1,N] \subset \mathbb{N}$.
If there are $\pi(N)$ primes less than or equal to $N$, then the average number of bits per arrangement gives us the average amount of information gained from correctly identifying each prime in $[1,N]$ as:
$$S_c = \frac{\log_2(2^N)}{\pi(N)} = \frac{N}{\pi(N)}$$
and if we assume a maximum entropy distribution over the primes then we would expect that each prime is drawn from a uniform distribution. So we would have:
$$S_c = \frac{N}{\pi(N)} \sim \ln N$$
so the average information gained from observing a prime number coincides with the average distance between consecutive prime numbers.
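As a quick numeric sanity check (a minimal sketch in Python, not part of the original derivation; it assumes SymPy is installed), we can compare the average gap between consecutive primes up to $N$ with $\ln N$:

```python
# Compare the average gap between consecutive primes <= N with ln N.
import math
from sympy import primerange  # assumes SymPy is installed

for N in (10**3, 10**5, 10**6):
    ps = list(primerange(2, N + 1))
    # average distance between consecutive primes in [1, N]
    mean_gap = (ps[-1] - ps[0]) / (len(ps) - 1)
    print(f"N = {N:>7}   mean gap = {mean_gap:6.3f}   ln N = {math.log(N):6.3f}")
```

The two quantities agree only to leading order; the convergence is slow, as expected from the error terms in the Prime Number Theorem.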
In fact, given the assumptions, the expected information gained from observing each prime in $[1,N]$ is on the order of:
$$\sum_{k=1}^{N-1} \frac{1}{k} \cdot |(k,k+1]| = \sum_{k=1}^{N-1} \frac{1}{k} \approx \ln N$$
as there are $k$ distinct ways to sample uniformly from $[1,k]$ and a frequency of $\frac{1}{k}$ associated with the event that $k \in \mathbb{P}$. So we are effectively modelling each $x_k \in X_N$ as a Bernoulli trial $x_k \sim \text{Ber}\left(\frac{1}{k}\right)$.
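The approximation $\sum_{k=1}^{N-1} \frac{1}{k} \approx \ln N$ is just the growth rate of the harmonic series, which is easy to verify numerically (a short sketch, not part of the derivation):

```python
# The harmonic sum H_(N-1) = sum_{k=1}^{N-1} 1/k grows like ln N;
# the difference tends to the Euler-Mascheroni constant (about 0.5772).
import math

for N in (10**2, 10**4, 10**6):
    harmonic = sum(1.0 / k for k in range(1, N))
    print(f"N = {N:>7}   H_(N-1) = {harmonic:8.4f}   ln N = {math.log(N):8.4f}"
          f"   difference = {harmonic - math.log(N):6.4f}")
```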
This implies that the average number of bits per prime number is given by $\frac{N}{\pi(N)} \sim \ln N$. Rearranging, we find:
$$\frac{\pi(N)}{N} \sim \frac{1}{\ln N}$$
which is in complete agreement with the Prime Number Theorem.
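As a further sanity check (a minimal sketch, again assuming SymPy; `primepi` counts primes up to its argument), we can compare the prime density $\frac{\pi(N)}{N}$ with $\frac{1}{\ln N}$ directly:

```python
# Compare the prime density pi(N)/N with 1/ln N.
import math
from sympy import primepi  # assumes SymPy is installed

for N in (10**3, 10**5, 10**7):
    density = int(primepi(N)) / N
    print(f"N = {N:>8}   pi(N)/N = {density:8.5f}   1/ln N = {1 / math.log(N):8.5f}")
```

The agreement improves only slowly: the Prime Number Theorem asserts no more than that the ratio of the two sides tends to 1.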
By the Shannon source coding theorem, we may infer that $\pi(N)$ primes can't be compressed into fewer than $\pi(N) \cdot \ln N$ bits. Furthermore, as the expected Kolmogorov Complexity converges to the Shannon entropy for computable probability distributions:
$$\mathbb{E}[K(X_N)] \sim \pi(N) \cdot \ln N \sim N$$
where the identification of $\mathbb{E}[K(X_N)]$ with $\pi(N) \cdot \ln N$ is a direct consequence of the fact that $K(X_N)$ measures the information gained from observing $X_N$ and $\pi(N) \cdot \ln N$ measures the expected information gained from observing $X_N$.
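The asymptotic $\pi(N) \cdot \ln N \sim N$ underpinning this identification can itself be checked numerically (a sketch assuming SymPy's `primepi`):

```python
# Check that pi(N) * ln N grows like N.
import math
from sympy import primepi  # assumes SymPy is installed

for N in (10**3, 10**5, 10**7):
    estimate = int(primepi(N)) * math.log(N)
    print(f"N = {N:>8}   pi(N)*ln N = {estimate:12.1f}   ratio to N = {estimate / N:6.4f}")
```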
From a complementary perspective, that of the Asymptotic Equipartition Property, we may infer that the Algorithmic Probability of a prime number of unknown magnitude occurring in the interval $[1,N]$ is on the order of $\frac{1}{N}$.
Given that prime encodings may be modeled as a sequence of Bernoulli trials, we may use the Asymptotic Equipartition Property to infer the Typical Probability of observing a prime $\hat{x}_p \in [1,N] \cap \mathbb{P}$ of unknown magnitude. To be precise, given the prime encoding $X_N = \{x_k\}_{k=1}^N$:
$$-\ln P(\hat{x}_p) \sim \frac{-\ln P(x_1, \dots, x_N)}{N}$$
for large $N$.
Given the Bernoulli model $x_k \sim \text{Ber}\left(\frac{1}{k}\right)$ above, this expression simplifies to:
$$\frac{-\ln P(x_1, \dots, x_N)}{N} = \frac{-\ln \prod_{k=1}^N P(x_k)}{N} = \frac{-\ln \prod_{k=1}^N \frac{1}{k}}{N} = \frac{\ln N!}{N} \sim \ln N$$
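The last step is Stirling's approximation, $\ln N! = N \ln N - N + O(\ln N)$, so that $\frac{\ln N!}{N} \sim \ln N$. A quick check using `math.lgamma`, so that $N!$ never has to be computed explicitly:

```python
# (ln N!)/N approaches ln N; by Stirling the difference tends to 1,
# so the ratio of the two columns tends to 1.
import math

for N in (10**2, 10**4, 10**6):
    ln_factorial = math.lgamma(N + 1)  # ln(N!) via the log-gamma function
    print(f"N = {N:>7}   (ln N!)/N = {ln_factorial / N:8.4f}   ln N = {math.log(N):8.4f}")
```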
Hence, the Algorithmic Probability of a prime number of unknown magnitude in the interval $[1,N]$ converges to:
$$-\ln P(\hat{x}_p) \sim \ln N \implies P(\hat{x}_p) \sim \frac{1}{N}$$
It follows that the probability of observing all $\pi(N)$ primes at $\{\hat{x}_{p_i}\}_{i=1}^{\pi(N)}$ where $\hat{x}_{p_i} \sim U([1,N])$ is given by:
$$P(\hat{x}_{p_1}, \dots, \hat{x}_{p_{\pi(N)}}) \sim \left(\frac{1}{N}\right)^{\pi(N)} = e^{-\pi(N) \cdot \ln N} = e^{-N(1+o(1))}$$
Moreover, as each prime $p \in \mathbb{N}$ occurs with typical frequency $\frac{1}{p}$, we may deduce that:
$$P(\hat{x}_{p_1}, \dots, \hat{x}_{p_{\pi(N)}}) \sim \prod_{n=1}^{\pi(N)} \frac{1}{p_n} = e^{-N(1+o(1))}$$
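This last identity is equivalent to the statement that Chebyshev's function $\vartheta(N) = \sum_{p \leq N} \ln p$ grows like $N$, since $-\ln \prod_{n=1}^{\pi(N)} \frac{1}{p_n} = \vartheta(N)$. A numeric check (a sketch assuming SymPy's `primerange`):

```python
# Chebyshev's theta function: theta(N) = sum of ln p over primes p <= N.
# -ln of the product of 1/p over primes <= N equals theta(N), which grows like N.
import math
from sympy import primerange  # assumes SymPy is installed

for N in (10**3, 10**5, 10**6):
    theta = sum(math.log(p) for p in primerange(2, N + 1))
    print(f"N = {N:>7}   theta(N) = {theta:10.1f}   ratio to N = {theta / N:6.4f}")
```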
As an immediate corollary of these estimates, we may recover the expected information gained using a second application of the Asymptotic Equipartition Property:
$$-\ln P(\hat{x}_{p_1}, \dots, \hat{x}_{p_{\pi(N)}}) \sim -\ln \left(\frac{1}{N}\right)^{\pi(N)} \sim \pi(N) \cdot \ln N \sim N$$
which provides us with an important insight into the incompressibility of prime encodings, as the last formula implies that prime numbers of unknown magnitude behave like independent random variables drawn from a uniform distribution.
Aidan Rocke (https://mathoverflow.net/users/56328/aidan-rocke), information-theoretic derivation of the prime number theorem, URL (version: 2021-04-08): https://mathoverflow.net/q/384109
Patrick Billingsley. Prime Numbers and Brownian Motion. The American Mathematical Monthly. 1973.
John A. Wheeler, 1990, “Information, physics, quantum: The search for links” in W. Zurek (ed.) Complexity, Entropy, and the Physics of Information. Redwood City, CA: Addison-Wesley.
Andrew Chi-Chih Yao. Theory and applications of trapdoor functions. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, 1982.
Marcus Hutter et al. (2007) Algorithmic probability. Scholarpedia, 2(8):2572.
Rocke (2022, Jan. 3). Kepler Lounge: Occam’s razor. Retrieved from keplerlounge.com
Rocke (2022, Jan. 7). Kepler Lounge: The Algorithmic Probability of a Prime Number. Retrieved from keplerlounge.com
Peter Grünwald and Paul Vitányi. Shannon Information and Kolmogorov Complexity. 2010.
For attribution, please cite this work as
Rocke (2022, Jan. 12). Kepler Lounge: An information-theoretic derivation of the Prime Number Theorem. Retrieved from keplerlounge.com
BibTeX citation
@misc{rocke2022an,
  author = {Rocke, Aidan},
  title = {Kepler Lounge: An information-theoretic derivation of the Prime Number Theorem},
  url = {keplerlounge.com},
  year = {2022}
}