Information-theoretic interpretations of the Prime Number Theorem:

From an information-theoretic perspective, the Prime Number theorem states that:

$$1.$$ $$2^{\log_2 N}$$ messages may be transmitted using $$\pi(N)$$ prime numbers where each prime number, on average, encodes $$\ln(N)$$ bits of information:

$$2^{\log_2 N} \sim \pi(N) \cdot \ln(N)$$

$$2.$$ Each prime that is observed in the interval $$[1,N]$$ is essentially a surprise, sampled uniformly from $$[1,N]$$, with respect to any finite-state gambler:

$$\mathbb{E}[K(X_N)] \sim \pi(N) \cdot \ln(N) \sim N$$

where $$X_N = \{x_i\}_{i=1}^N$$ denotes the prime encoding of $$[1,N]$$ i.e. $$x_n = 1$$ if $$n$$ is prime and $$x_n = 0$$ otherwise.

Given the Prime Number theorem, we may readily deduce an approximation of Mertens’ theorem:

$$\sum_{n=1}^{\pi(N)} \frac{\ln p_n}{p_n} \sim \ln N + \mathcal{O}(\ln \ln N)$$

which follows immediately from the fact that $$\pi(p_n) = n \sim \frac{p_n}{\ln p_n}$$:

$$\sum_{n=1}^{\pi(N)} \frac{\ln p_n}{p_n} \approx \sum_{n=1}^{\pi(N)} \frac{1}{n} \sim \ln N + \mathcal{O}(\ln \ln N)$$

However, Mertens’ theorem is a bit more than a mere corollary of the Prime Number Theorem. It is a profound observation that raises important questions.

Information-theoretic corollaries of Mertens’ theorem:

Most mathematicians know Mertens’ first theorem in its familiar form:

$$\sum_{n=1}^{\pi(N)} \frac{\ln p_n}{p_n} = \ln N + \mathcal{O}(1)$$

but it recently occurred to me that Mertens’ theorem has an information-theoretic corollary:

$$X_i \sim U([1,p_i]) \implies \sum_{i=1}^{\pi(N)} P(X_i = p_i) \cdot H(X_i) = \ln N$$

so the prime numbers behave as if they were sampled uniformly without replacement from compact subsets of $$\mathbb{N}$$ and the weighted average of $$\pi(N)$$ primes encodes $$\ln N$$ bits of information.

However, with respect to a finite-state gambler that has no a priori knowledge of locations of the primes in $$[1,N]$$ nor their relative magnitudes it is accurate to say that each prime is sampled uniformly from $$[1,N]$$ using the frequency distribution $$f(X=p) = \frac{1}{p}$$ so we have:

$$\sum_{1 \leq p \leq N} f(X=p) = \sum_{p=1}^N \frac{1}{p} \approx \ln N$$

which is an un-normalised Zipfian distribution. This relation shall be investigated in more detail in the future.

References:

1. Mark B. Villarino. Mertens’ Proof of Mertens’ Theorem. 2005.

2. Sondow, Jonathan and Weisstein, Eric W. “Mertens Theorem.” From MathWorld–A Wolfram Web Resource. https://mathworld.wolfram.com/MertensTheorem.html