Information-theoretic interpretations of the Prime Number Theorem:

From an information-theoretic perspective, the Prime Number theorem states that:

\(1.\) \(2^{\log_2 N}\) messages may be transmitted using \(\pi(N)\) prime numbers where each prime number, on average, encodes \(\ln(N)\) bits of information:

\begin{equation} 2^{\log_2 N} \sim \pi(N) \cdot \ln(N) \end{equation}

\(2.\) Each prime that is observed in the interval \([1,N]\) is essentially a surprise, sampled uniformly from \([1,N]\), with respect to any finite-state gambler:

\begin{equation} \mathbb{E}[K(X_N)] \sim \pi(N) \cdot \ln(N) \sim N \end{equation}

where \(X_N = \{x_i\}_{i=1}^N\) denotes the prime encoding of \([1,N]\) i.e. \(x_n = 1\) if \(n\) is prime and \(x_n = 0\) otherwise.

Given the Prime Number theorem, we may readily deduce an approximation of Mertens’ theorem:

\begin{equation} \sum_{n=1}^{\pi(N)} \frac{\ln p_n}{p_n} \sim \ln N + \mathcal{O}(\ln \ln N) \end{equation}

which follows immediately from the fact that \(\pi(p_n) = n \sim \frac{p_n}{\ln p_n}\):

\begin{equation} \sum_{n=1}^{\pi(N)} \frac{\ln p_n}{p_n} \approx \sum_{n=1}^{\pi(N)} \frac{1}{n} \sim \ln N + \mathcal{O}(\ln \ln N) \end{equation}

However, Mertens’ theorem is a bit more than a mere corollary of the Prime Number Theorem. It is a profound observation that raises important questions.

Information-theoretic corollaries of Mertens’ theorem:

Most mathematicians know Mertens’ first theorem in its familiar form:

\begin{equation} \sum_{n=1}^{\pi(N)} \frac{\ln p_n}{p_n} = \ln N + \mathcal{O}(1) \end{equation}

but it recently occurred to me that Mertens’ theorem has an information-theoretic corollary:

\begin{equation} X_i \sim U([1,p_i]) \implies \sum_{i=1}^{\pi(N)} P(X_i = p_i) \cdot H(X_i) = \ln N \end{equation}

so the prime numbers behave as if they were sampled uniformly without replacement from compact subsets of \(\mathbb{N}\) and the weighted average of \(\pi(N)\) primes encodes \(\ln N\) bits of information.

However, with respect to a finite-state gambler that has no a priori knowledge of locations of the primes in \([1,N]\) nor their relative magnitudes it is accurate to say that each prime is sampled uniformly from \([1,N]\) using the frequency distribution \(f(X=p) = \frac{1}{p}\) so we have:

\begin{equation} \sum_{1 \leq p \leq N} f(X=p) = \sum_{p=1}^N \frac{1}{p} \approx \ln N \end{equation}

which is an un-normalised Zipfian distribution. This relation shall be investigated in more detail in the future.


  1. Mark B. Villarino. Mertens’ Proof of Mertens’ Theorem. 2005.

  2. Sondow, Jonathan and Weisstein, Eric W. “Mertens Theorem.” From MathWorld–A Wolfram Web Resource.