Information-theoretic interpretations of the Prime Number Theorem:

From an information-theoretic perspective, the Prime Number theorem states that:

\(1.\) \(2^{\log_2 N}\) messages may be transmitted using \(\pi(N)\) prime numbers where each prime number, on average, encodes \(\ln(N)\) bits of information:

\begin{equation} 2^{\log_2 N} \sim \pi(N) \cdot \ln(N) \end{equation}

\(2.\) Each prime that is observed in the interval \([1,N]\) is essentially a surprise, sampled uniformly from \([1,N]\), with respect to any finite-state gambler:

\begin{equation} \mathbb{E}[K(X_N)] \sim \pi(N) \cdot \ln(N) \sim N \end{equation}

where \(X_N = \{x_i\}_{i=1}^N\) denotes the prime encoding of \([1,N]\) i.e. \(x_n = 1\) if \(n\) is prime and \(x_n = 0\) otherwise.

Given the Prime Number theorem, we may readily deduce an approximation of Mertens’ theorem:

\begin{equation} \sum_{n=1}^{\pi(N)} \frac{\ln p_n}{p_n} \sim \ln N + \mathcal{O}(\ln \ln N) \end{equation}

which follows immediately from the fact that \(\pi(p_n) = n \sim \frac{p_n}{\ln p_n}\):

\begin{equation} \sum_{n=1}^{\pi(N)} \frac{\ln p_n}{p_n} \approx \sum_{n=1}^{\pi(N)} \frac{1}{n} \sim \ln N + \mathcal{O}(\ln \ln N) \end{equation}

However, Mertens’ theorem is a bit more than a mere corollary of the Prime Number Theorem. It is a profound observation that raises important questions.

Information-theoretic corollaries of Mertens’ theorem:

Most mathematicians know Mertens’ first theorem in its familiar form:

\begin{equation} \sum_{n=1}^{\pi(N)} \frac{\ln p_n}{p_n} = \ln N + \mathcal{O}(1) \end{equation}

but it recently occurred to me that Mertens’ theorem has an information-theoretic corollary:

\begin{equation} X_i \sim U([1,p_i]) \implies \sum_{i=1}^{\pi(N)} P(X_i = p_i) \cdot H(X_i) = \ln N \end{equation}

so the prime numbers behave as if they were sampled uniformly without replacement from compact subsets of \(\mathbb{N}\) and the weighted average of \(\pi(N)\) primes encodes \(\ln N\) bits of information.

However, with respect to a finite-state gambler that has no a priori knowledge of locations of the primes in \([1,N]\) nor their relative magnitudes it is accurate to say that each prime is sampled uniformly from \([1,N]\) using the frequency distribution \(f(X=p) = \frac{1}{p}\) so we have:

\begin{equation} \sum_{1 \leq p \leq N} f(X=p) = \sum_{p=1}^N \frac{1}{p} \approx \ln N \end{equation}

which is an un-normalised Zipfian distribution. This relation shall be investigated in more detail in the future.

References:

  1. Mark B. Villarino. Mertens’ Proof of Mertens’ Theorem. 2005.

  2. Sondow, Jonathan and Weisstein, Eric W. “Mertens Theorem.” From MathWorld–A Wolfram Web Resource. https://mathworld.wolfram.com/MertensTheorem.html