Original Motivations:

This investigation into the law governing the distribution of primes was originally motivated by two mysteries. The first mystery concerned Wigner’s observation that mathematical laws have remarkable generalisation power in the natural sciences. Given that these laws are generally constrained to satisfy Occam’s razor, I observed that Occam’s razor would be generally applicable in a Universe where information is conserved [1]. It is at that moment that I understood the importance of the Law of Conservation of Information in Quantum Mechanics, otherwise known as the principle that the von Neumann entropy is invariant to Unitary transformations.

In parallel, I was talking to a physicist about the technology required to experimentally verify the Simulation Hypothesis. Upon further reflection, I realised that we may investigate mathematical signatures of the Simulation Hypothesis via mathematical laws that must be consistent with the Law of Conservation of Information. On the same day, I realised why the prime numbers were distributed as they were.

The key idea was that the prime numbers may be treated as experimental data, as they are not the invention of man.

The Monte Carlo Hypothesis:

What exactly is the Monte Carlo Hypothesis?

The Prime Number Theorem says how the prime numbers are distributed but not why. On the other hand, an information-theoretic derivation of the Prime Number Theorem indicates that they are distributed in this way because prime encodings are algorithmically random and the prime numbers have a maximum entropy distribution. We may proceed with the derivation using principles of Classical Information Theory.

If we know nothing about the primes in the worst case we may assume that each prime number less than or equal to \(N\) is drawn uniformly from \([1,N]\). So our source of primes is:

\begin{equation} X \sim U([1,N]) \end{equation}

where \(H(X) = \ln(N)\) is the Shannon entropy of the uniform distribution.

Now, given a strictly increasing integer sequence of length \(N\), \(U_N = \{u_i\}_{i=1}^N\), where \(u_i = i\) we may define the prime encoding of \(U_N\) as the binary sequence \(X_N = \{x_i\}_{i=1}^N\) where \(x_i = 1\) if \(u_i\) is prime and \(x_i = 0\) otherwise. With no prior knowledge, given that each integer is either prime or not prime, we have \(2^N\) possible prime encodings of \([1,N] \subset \mathbb{N}\).

If there are \(\pi(N)\) primes less than or equal to \(N\) then the average number of bits per arrangement gives us the average amount of information gained from correctly identifying each prime number in \(U_N\) as:

\begin{equation} S_c = \frac{\log_2 (2^N)}{\pi(N)} = \frac{N}{\pi(N)} \end{equation}

Furthermore, if we assume a maximum entropy distribution over the primes then we would expect that each prime is drawn from a uniform distribution as in (1) so we would have:

\begin{equation} S_c = \frac{N}{\pi(N)} \sim \ln(N) \end{equation}

As for why the natural logarithm appears in (3), we may first note that the base of the logarithm in the Shannon entropy may be freely chosen without changing its properties. Moreover, given the assumptions if we define \((k,k+1] \subset [1,N]\) the average distance between consecutive primes is given by the sum of weighted distances \(l\):

\begin{equation} \sum_{k=1}^{N-1} \frac{1}{k} \cdot |(k,k+1]| = \sum_{k=1}^{N-1} \frac{1}{k} \approx \sum_{l=1}^M l \cdot P_l \approx \ln(N) \end{equation}

where \(P_l = \frac{1}{l} \cdot \sum_{k = \frac{l \cdot (l-1)}{2}}^{\frac{l \cdot (l-1)}{2}+l-1} \frac{1}{k+1}\) and \(M = \frac{\sqrt{1+8\cdot(N+1)}-1}{2}\).

This is consistent with the maximum entropy assumption in (1) as there are \(k\) distinct ways to sample uniformly from \([1,k]\) and a frequency of \(\frac{1}{k}\) associated with the event that a prime lies in \((k-1,k]\). In fact, the computation (4) is consistent with Boltzmann’s notion of entropy as a measure of possible arrangements.

There is another useful interpretation of (4). If we break \(\sum_{k=1}^N \frac{1}{k}\) into \(\pi(N)\) disjoint blocks of size \([p_k,p_{k+1}]\) where \(p_k, p_{k+1} \in \mathbb{P}\):

\begin{equation} \sum_{k=1}^N \frac{1}{k} \approx \sum_{k=1}^{\pi(N)} \sum_{n=p_k}^{p_{k+1}} \frac{1}{n} = \sum_{k=1}^{\pi(N)} (p_{k+1}-p_k) \cdot P(p_k) \approx \ln(N) \end{equation}

where \(P(p_k) = \frac{1}{p_{k+1}-p_k} \sum_{b=p_k}^{p_{k+1}} \frac{1}{b}\).

Given (5), we may note that \(\ln(N)\) also approximates the expected number of observations per prime where \(P(p_k)\) may be interpreted as the probability of a successful observation in a frequentist sense. This is consistent with John Wheeler’s it from bit interpretation of entropy, where the entropy measures the average number of bits(i.e. yes/no questions) per prime number.

Now, we may note that by combining (3), (4), and (5) we have:

\begin{equation} \pi(N) \sim \frac{N}{\ln(N)} \end{equation}

which happens to be equivalent to the Prime Number Theorem.

By the Shannon source coding theorem, we may infer that \(\pi(N)\) primes can’t be compressed into fewer than \(\pi(N) \cdot \ln(N)\) bits so this result tells us something about the incompressibility of the primes. In fact, there is a strong connection between maximum entropy distributions and incompressible signals. This may be clarified by the following insight:

\begin{equation} \mathbb{E}[K(X_N)] \sim \pi(N) \cdot \ln(N) \sim N \end{equation}

where the implicit assumption is that for any computable probability distribution, the expected value of the Kolmogorov Complexity equals the Shannon entropy.

Information-theoretic intuitions for the Monte Carlo Hypothesis:

From an information-theoretic perspective, the Monte Carlo Hypothesis (7) tells us that all the information in the integers is contained in the prime numbers. This may be deduced from Chaitin’s theorem that almost all integers are incompressible:

\begin{equation} \forall N \in \mathbb{N}, K_U(N) \sim \log_2(N) \end{equation}

Given that all integers have a unique prime factorisation, for large integers \(N \in \mathbb{N}\):

\begin{equation} N = \prod_{k=1}^n p_k^{\alpha_k} \implies K_U(N) \sim \sum_{k=1}^n \alpha_k \cdot \log_2 p_k \end{equation}

so all the information in the integers is contained in the prime numbers.

Experimentally verifiable consequences of the Monte Carlo Hypothesis:

Given that the Monte Carlo Hypothesis tells us that prime encodings are algorithmically random, we may deduce that:

  1. Prime encodings \(X_N \in \{0,1\}^N\) should pass a weighted next-bit test which would imply that one-way functions exist.

  2. Integer factorisation, and therefore the RSA function, is a one-way function and therefore \(P \neq NP\).

Both of these corollaries are experimentally verifiable using automated program synthesis. In fact, they are direct consequences of the fact that all the information in the integers is contained in the primes and leads us to a refinement of the original hypothesis:

\begin{equation} \mathbb{E}[K^{\text{poly}}(X_N)] \sim \pi(N) \cdot \ln(N) \sim N \end{equation}

so prime encodings are algorithmically random relative to a Universal Compressor that can simulate machine learning models with polynomial-time computational complexity.

P vs NP and the Computational Complexity of Physical Reality:

If we consider that laws governing the distribution of primes form the mathematical basis for the Simulation Hypothesis then the Riemann Hypothesis is a problem in physics. Meanwhile, the Minimum Energy Principle indicates that all of physics(including the primes) may be efficiently simulated relative to a Universal Quantum computer. However, this would require defining an entire research program in theoretical physics that is beyond the scope of the present survey.

In fact, further investigations would require important innovations in applied physics as they involve engineering such a computer.


  1. Aidan Rocke. https://keplerlounge.com/physics/2021/07/11/reasonable-effectiveness.html. 2021.
  2. Aidan Rocke (https://mathoverflow.net/users/56328/aidan-rocke), information-theoretic derivation of the prime number theorem, URL (version: 2021-02-20): https://mathoverflow.net/q/384109
  3. Scott Aaronson. NP-complete Problems and Physical Reality. 2005.
  4. A. N. Kolmogorov Three approaches to the quantitative definition of information. Problems of Information and Transmission, 1(1):1–7, 1965
  5. G. J. Chaitin On the length of programs for computing finite binary sequences: Statistical considerations. Journal of the ACM, 16(1):145–159, 1969.
  6. R. J. Solomonoff A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1–22 and 224–254, 1964.
  7. David Deutsch. Quantum theory, the Church–Turing principle and the universal quantum computer. 1985.
  8. John A. Wheeler, 1990, “Information, physics, quantum: The search for links” in W. Zurek (ed.) Complexity, Entropy, and the Physics of Information. Redwood City, CA: Addison-Wesley.
  9. Igor V. Volovich. Number Theory as the Ultimate Physical Theory. Pleiades Publishing. 2010.
  10. Iftach Haitner and Salil Vadhan. The Many Entropies in One-Way Functions. 2017.