Introduction:

Building upon the work of Chebyshev, Shannon and Kontoyiannis, it may be demonstrated that Chebyshev’s asymptotic result:

\begin{equation} \ln N \sim \sum_{p \leq N} \frac{1}{p} \cdot \ln p \end{equation}

has a natural information-theoretic interpretation: the information content of a typical integer \(Z \sim U([1,N])\), sampled under the maximum entropy distribution, is asymptotically equivalent to the information content of all primes \(p \leq N\) weighted by their typical frequency. This result is established by showing that the correct information-theoretic interpretation of the entropy \(\ln p\) may be deduced from the author’s recent observation that the prime numbers have a uniform source distribution. Moreover, using Chaitin’s theorem that almost all integers are algorithmically random, we may extend this asymptotic result to the setting of algorithmic information theory and derive an upper-bound on the algorithmic information of a typical integer.

Definition: The Statistical Information of a Prime Number

Using the method of level sets and the method of Lagrange Multipliers, it was demonstrated in [1] that the Statistical Information of a prime \(p \in \mathbb{P}\) is asymptotically \(H(p) \sim \ln p\), since the primes have a uniform source distribution. Moreover, this measure of information content is in direct correspondence with the asymptotic density of the primes. Thus, the entropy \(H(p)\) is not invariant to the choice of base of the logarithm.

Theorem: All the Statistical Information in the integers is contained in the primes

The strategy in this section is to follow the sequence of information-theoretic arguments used by I. Kontoyiannis in [4], and then to apply the main result of [1], that the prime numbers have a uniform source distribution, in order to arrive at the correct information-theoretic interpretation of Chebyshev’s theorem.

Let’s sample a typical integer \(Z \sim U([1,N])\), under the maximum entropy distribution, and express it uniquely as:

\begin{equation} Z = \prod_{k=1}^{\pi(N)} p_k^{X_k} \end{equation}

This defines the random variables \(X_1,\ldots,X_{\pi(N)}\), where \(X_i\) is the exponent of the prime \(p_i\) in the factorisation of \(Z\), with distribution:

\begin{equation} P(X_i \geq k) = \frac{1}{N} \big\lfloor \frac{N}{p_i^k} \big\rfloor \approx \big(\frac{1}{p_i} \big)^k \end{equation}
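As a quick sanity check (illustrative only, not part of the argument), the exact probability \(\frac{1}{N}\big\lfloor \frac{N}{p_i^k} \big\rfloor\) may be compared against \(p_i^{-k}\) by direct counting. A minimal Python sketch, with an arbitrary choice of \(N\):

\begin{verbatim}
# Compare the exact probability P(X_i >= k) = floor(N / p^k) / N with the
# approximation (1/p)^k, by counting the multiples of p^k in [1, N].
N = 10**6
for p in [2, 3, 5, 7]:
    for k in [1, 2, 3]:
        exact = (N // p**k) / N       # fraction of integers in [1, N] divisible by p^k
        approx = (1 / p)**k
        print(f"p={p}, k={k}: exact={exact:.6f}, approx={approx:.6f}")
\end{verbatim}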

Likewise, for distinct primes \(p_i \neq p_j\), the exponents are asymptotically independent:

\begin{equation} P(X_i \geq k \land X_j \geq l) \approx \big(\frac{1}{p_i} \big)^k \cdot \big(\frac{1}{p_j}\big)^l \end{equation}

Now, we’ll note that:

\begin{equation} \mu_i = \mathbb{E}[X_i] = \sum_{k \geq 1} P(X_i \geq k) \leq \sum_{k \geq 1} \frac{1}{p_i^k} = \frac{1}{p_i -1} \end{equation}

and, among all distributions on the non-negative integers with mean \(\mu\), the geometric distribution has the maximal entropy, given by:

\begin{equation} h(\mu) = (\mu + 1)\cdot \ln(\mu + 1) - \mu \ln \mu \end{equation}
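This closed form may be checked directly against the entropy of a geometric distribution with mean \(\mu\), computed by summing \(-P(X=k)\ln P(X=k)\) term by term. A minimal Python sketch, with arbitrary values of \(\mu\):

\begin{verbatim}
# Check that h(mu) = (mu+1)*ln(mu+1) - mu*ln(mu) agrees with the entropy of a
# geometric distribution on {0, 1, 2, ...} with mean mu, summed term by term.
from math import log

def h(mu):
    return (mu + 1) * log(mu + 1) - mu * log(mu)

def geometric_entropy(mu, terms=500):
    q = 1 / (mu + 1)                  # success probability giving mean mu
    total = 0.0
    for k in range(terms):
        pk = q * (1 - q)**k           # P(X = k)
        if pk == 0.0:
            break
        total -= pk * log(pk)
    return total

for mu in [1.0, 0.5, 1 / (7 - 1)]:    # e.g. mu = 1/(p - 1) for p = 7
    print(f"mu={mu:.4f}: closed form = {h(mu):.6f}, direct sum = {geometric_entropy(mu):.6f}")
\end{verbatim}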

Since \(H(X_i) \leq h(\mu_i)\) and \(h\) is increasing in \(\mu\), this yields the lower-bound:

\begin{equation} \ln N = H(Z)=H(X_1,\ldots,X_{\pi(N)}) \leq \sum_{k=1}^{\pi(N)} H(X_k) \leq \sum_{k=1}^{\pi(N)} h(\mu_k) \leq \sum_{k=1}^{\pi(N)} h\big(\frac{1}{p_k-1}\big) \end{equation}

which simplifies to:

\begin{equation} \ln N \leq \sum_{p \leq N} \Big[ \frac{\ln p}{p-1} -\ln\big(1-\frac{1}{p}\big) \Big] \leq (1+\epsilon_N) \sum_{p \leq N} \frac{\ln p}{p} = (1+\epsilon_N) \cdot \xi_N \end{equation}

where \(\epsilon_N \to 0\) as \(N \to \infty\).

It is worth pointing out that the entropy term:

\begin{equation} \xi_N = \sum_{p \leq N} \frac{1}{p} \cdot \ln p \end{equation}

represents the statistical information of all prime numbers \(p \leq N\) weighted by their typical frequency \(\frac{1}{p}\).
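The lower-bound above is easy to check numerically. The following sketch (assuming the sympy library for prime generation; the values of \(N\) are arbitrary) compares \(\ln N\) with the sum \(\sum_{p \leq N} \big[\frac{\ln p}{p-1} - \ln(1-\frac{1}{p})\big]\) and with \(\xi_N\):

\begin{verbatim}
# Numerically check ln N <= sum over p <= N of [ln(p)/(p-1) - ln(1 - 1/p)],
# and compare that sum with xi_N = sum over p <= N of ln(p)/p.
from math import log
from sympy import primerange

for N in [10**3, 10**4, 10**5]:
    primes = list(primerange(2, N + 1))
    bound = sum(log(p) / (p - 1) - log(1 - 1 / p) for p in primes)
    xi_N = sum(log(p) / p for p in primes)
    print(f"N={N}: ln N = {log(N):.3f}, bound = {bound:.3f}, xi_N = {xi_N:.3f}")
\end{verbatim}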

So far we have managed to show that:

\begin{equation} \liminf_{N \to \infty} \frac{\xi_N}{\ln N} \geq 1 \end{equation}

and it remains to show that:

\begin{equation} \limsup_{N \to \infty} \frac{\xi_N}{\ln N} \leq 1 \end{equation}

In order to prove the upper-bound we shall make use of a theorem due to Erdős:

\begin{equation} \sum_{p \leq N} \ln p \leq 2N \end{equation}

as well as the fact that:

\begin{equation} \mu_i = \mathbb{E}[X_i] = \sum_{k \geq 1} P(X_i \geq k) = \sum_{k \geq 1} \frac{1}{N} \big\lfloor \frac{N}{p_i^k} \big\rfloor \geq \frac{1}{N} \big\lfloor \frac{N}{p_i} \big\rfloor \geq \frac{1}{p_i} - \frac{1}{N} \end{equation}
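Both facts are easy to verify numerically. A minimal sketch (again assuming sympy; the choices of \(N\) and of the sample primes are arbitrary):

\begin{verbatim}
# Check two ingredients of the upper-bound argument:
#   (i)  sum over p <= N of ln(p) is at most 2N;
#   (ii) mu_i = sum over k >= 1 of floor(N/p^k)/N is at least 1/p - 1/N.
from math import log
from sympy import primerange

N = 10**5
theta = sum(log(p) for p in primerange(2, N + 1))
print(f"sum of ln(p) = {theta:.1f}, 2N = {2 * N}")

for p in [2, 3, 101, 997]:            # a few sample primes p <= N
    mu, k = 0.0, 1
    while p**k <= N:
        mu += (N // p**k) / N
        k += 1
    print(f"p={p}: mu = {mu:.6f}, 1/p - 1/N = {1 / p - 1 / N:.6f}")
\end{verbatim}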

Now, since under the uniform distribution we have:

\begin{equation} P(Z) = \frac{1}{N} \leq \frac{1}{Z} \end{equation}

we may deduce:

\begin{equation} H(Z) = \mathbb{E}\big[\ln \frac{1}{P(Z)}\big] \geq \mathbb{E}\big[\ln Z\big] = \mathbb{E}\big[\ln \prod_{i=1}^{\pi(N)} p_i^{X_i}\big] = \sum_{i=1}^{\pi(N)} \mathbb{E}[X_i] \cdot \ln p_i \end{equation}

and using the lower-bound on \(\mu_i\) together with Erdős’ upper-bound on \(\sum_{p \leq N} \ln p\), we have:

\begin{equation} \ln N \geq \sum_{p \leq N} \big(\frac{1}{p}-\frac{1}{N}\big) \cdot \ln p = \xi_N - \frac{1}{N}\sum_{p \leq N} \ln p \geq \xi_N -2 \end{equation}

which yields the upper-bound:

\begin{equation} \frac{\xi_N}{\ln N} \leq 1 + \frac{2}{\ln N} \end{equation}
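Both the identity \(\mathbb{E}[\ln Z] = \sum_{i} \mathbb{E}[X_i] \cdot \ln p_i\) used above and the resulting two-sided comparison of \(\xi_N\) with \(\ln N\) can be checked numerically. The sketch below (assuming sympy for factorisation and prime generation; the values of \(N\) are arbitrary) first verifies the identity exactly for a small \(N\) by factoring every integer in \([1,N]\), and then tabulates \(\xi_N\) against \(\ln N\):

\begin{verbatim}
# Part 1: for a small N, verify that the average of ln(z) over z in [1, N]
# equals the sum over primes p <= N of E[X_p] * ln(p), where X_p denotes the
# exponent of p in the factorisation of Z.
from math import log
from sympy import factorint, primerange

N = 2000
lhs = sum(log(z) for z in range(1, N + 1)) / N          # E[ln Z]
exponent_sums = {p: 0 for p in primerange(2, N + 1)}
for z in range(2, N + 1):
    for p, a in factorint(z).items():
        exponent_sums[p] += a
rhs = sum((total / N) * log(p) for p, total in exponent_sums.items())
print(f"E[ln Z] = {lhs:.6f}, sum of E[X_p]*ln(p) = {rhs:.6f}")

# Part 2: tabulate xi_N against ln N, illustrating xi_N <= ln N + 2 and the
# slow convergence of xi_N / ln N towards 1.
for N in [10**3, 10**4, 10**5, 10**6]:
    xi_N = sum(log(p) / p for p in primerange(2, N + 1))
    print(f"N={N}: xi_N = {xi_N:.3f}, ln N = {log(N):.3f}, ratio = {xi_N / log(N):.4f}")
\end{verbatim}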

When combined with the lower-bound, we may deduce that:

\begin{equation} \ln N \sim \sum_{p \leq N} \frac{1}{p} \cdot \ln p \end{equation}

so all the Statistical Information in the integers is contained in the prime numbers.

Definition: The algorithmic information of an integer

The algorithmic information of an integer \(n \in \mathbb{N}\) is \(K_U(n) \sim \log_2(n)\) asymptotically almost everywhere. This follows immediately from the fact that almost all strings are incompressible [6].
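The counting argument behind this fact is worth making explicit: there are at most \(2^{n-c}-1\) binary programs of length less than \(n-c\), so at most a \(2^{-c}\) fraction of the \(n\)-bit strings can satisfy \(K_U(x) < n - c\). A minimal illustration (the choice of \(n\) is arbitrary):

\begin{verbatim}
# Counting bound: there are at most 2^(n-c) - 1 binary programs of length
# less than n - c, so at most that many n-bit strings x have K_U(x) < n - c.
n = 64
for c in [1, 4, 8, 16]:
    compressible = 2**(n - c) - 1     # upper bound on the number of such strings
    fraction = compressible / 2**n
    print(f"c={c}: at most {fraction:.2e} of all {n}-bit strings have K_U(x) < {n - c}")
\end{verbatim}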

Theorem: All the Algorithmic Information in the integers is contained in the primes

The following section presumes basic knowledge of algorithmic information theory, with which some mathematicians may not be familiar. This powerful theory, which combines Shannon’s theory of communication with Turing’s theory of computation, is carefully introduced in references [5] and [6].

As the information-theoretic formulation of Chebyshev’s relation:

\begin{equation} \ln N \sim \sum_{p \leq N} \frac{1}{p} \cdot \ln p \end{equation}

is not invariant to the choice of base of the logarithm, it is natural to ask whether a similar asymptotic relation holds within the setting of algorithmic information theory.

Given the definition of the algorithmic information of an integer, it follows that almost all integers are algorithmically random. Hence, for \(Z \sim U([1,N])\) the following relation holds asymptotically almost everywhere:

\begin{equation} K_U(Z) \sim \log_2 Z \sim \sum_{p \leq Z} \frac{1}{p} \cdot \log_2 p \end{equation}

This implies that the typical frequency with which a large integer \(N\) is divisible by a prime \(p\) converges to the typical frequency with which an integer \(Z \sim U([1,N])\) is divisible by \(p\), so all the algorithmic information in the integers is contained in the prime numbers.

Theorem: An upper-bound on the algorithmic information of a typical integer

If we consider the difference between the statistical information and algorithmic information of integers, we may derive an important upper-bound on the algorithmic information of a typical integer:

\begin{equation} \forall X \sim U([1,N]), \sup K_U(X) \sim \mathbb{E}[K_U(X)] = \mathcal{O}(H(X)) \end{equation}

which shows that the Expectation Operator serves as a bridge between two distinct theories of information.

Proof:

If we analyse the RHS of (21), we’ll first note that:

\begin{equation} H(X) \sim \ln N \end{equation}

which follows from the theorem concerning the Statistical Information of integers.

As for the expectation \(\mathbb{E}[K_U(X)]\), if we use the definition of the algorithmic information of an integer and apply Stirling’s log-factorial approximation we have:

\begin{equation} \mathbb{E}[K_U(X)] = \sum_{x=1}^N \frac{1}{N} \cdot K_U(x) \sim \frac{1}{N} \cdot \sum_{x=1}^N \log_2(x) = \frac{1}{N} \cdot \log_2(N!) \sim \log_2(N) \end{equation}

and the combination of (22) and (23) yields the RHS:

\begin{equation} \mathbb{E}[K_U(X)] \sim \frac{\ln N}{\ln 2} \implies \mathbb{E}[K_U(X)] = \mathcal{O}(H(X)) \end{equation}
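The Stirling step in (23) may also be checked numerically, using \(\ln(N!) = \ln \Gamma(N+1)\); the difference between \(\frac{1}{N}\log_2(N!)\) and \(\log_2(N)\) tends to \(\log_2 e\), which is negligible relative to \(\log_2(N)\). A minimal sketch, with arbitrary values of \(N\):

\begin{verbatim}
# Check that (1/N) * log2(N!) approaches log2(N), as used in the Stirling step.
from math import lgamma, log, log2

for N in [10**2, 10**4, 10**6, 10**8]:
    avg = lgamma(N + 1) / (N * log(2))    # (1/N) * log2(N!), via ln(N!) = lgamma(N+1)
    print(f"N={N}: (1/N)*log2(N!) = {avg:.3f}, log2(N) = {log2(N):.3f}")
\end{verbatim}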

As for the LHS, we’ll note that \(\forall X \sim U([1,N]), K_U(X) \leq \log_2(N) + \mathcal{O}(1)\), which allows us to deduce the desired asymptotic relation. QED.

References:

  1. Aidan Rocke. An information-theoretic upper bound on prime gaps. arXiv, 2021.

  2. P.L. Chebyshev. Mémoire sur les nombres premiers. J. de Math. Pures Appl., 17:366–390, 1852.

  3. P.L. Chebyshev. Sur la totalité des nombres premiers inférieurs à une limite donnée. J. de Math. Pures Appl., 17:341–365, 1852.

  4. I. Kontoyiannis. Some information-theoretic computations related to the distribution of prime numbers. In Festschrift in Honor of Jorma Rissanen (P. Grünwald, P. Myllymäki, I. Tabus, M. Weinberger, B. Yu, eds.), pp. 135–143, Tampere University Press, May 2008.

  5. Peter Grünwald and Paul Vitányi. Shannon Information and Kolmogorov Complexity. 2010.

  6. Lance Fortnow. Kolmogorov Complexity. 2000.

  7. Peter D. Grünwald. The Minimum Description Length Principle. MIT Press, 2007.

  8. A. N. Kolmogorov. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1):1–7, 1965.

  9. G. J. Chaitin. On the length of programs for computing finite binary sequences: Statistical considerations. Journal of the ACM, 16(1):145–159, 1969.

  10. R. J. Solomonoff. A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1–22 and 224–254, 1964.

  11. Marcus Hutter et al. Algorithmic probability. Scholarpedia, 2(8):2572, 2007.