…we did not really believe that we picked up signals from another civilisation, but obviously the idea had crossed our minds. - Jocelyn Bell Burnell

Motivation:

The problem of communicating with extraterrestrial civilisations was first carefully considered by Gauss in his Pythagorean proposal. Ever since, the challenge of devising foolproof mathematical signatures for such communication has gripped the minds of generations of physicists and mathematicians.

The essential difficulty lies in engineering a physical signal whose semantics are invariant to all possible representations. In a precise sense, this signal must be transformation-invariant, where the transformations are arbitrary to the extent that they don’t change the phase-space dimension of the processes responsible for generating that signal. What can be shown is that such signals must be incompressible, or what algorithmic information theorists call algorithmically random.

Strictly speaking, such physical signals are impossible to engineer; in practice, they may be engineered if and only if they are pseudo-random. In fact, I shall demonstrate that the distribution of prime numbers is unique in satisfying this property relative to civilisations that operate within the Turing limit.

This allows us to devise an algorithm that is unique in satisfying the criterion of representation invariance.

A protocol for interstellar communication:

If we are to intercept a signal from an advanced extraterrestrial civilisation, there are two possibilities:

  1. The civilisation operates within the Turing limit.
  2. The civilisation operates beyond the Turing limit.

Civilisations beyond the Turing limit (i.e. capable of hyper-computation) are theoretically possible if they master principles of Planck-scale engineering. However, we would not be able to communicate with them, as they would use fundamentally different scientific methods and they would know everything that can be known about the distribution of prime numbers. On the other hand, what we have in common with civilisations within the Turing limit is that we can’t solve the halting problem. In particular, we can’t prove the Riemann hypothesis, so we shall both perceive an algorithmically random distribution of prime numbers.

Now, how might a civilisation encode the distribution of prime numbers? A clever method would be to use the electromagnetic frequency band of pulsars, which are of scientific interest to any civilisation that has not mastered principles of Planck-scale engineering.

In particular, it is sufficient to use two-frequency pulses: one frequency for ones and a neighbouring frequency for zeroes. This would allow a binary encoding of the locations of the prime numbers in a sequence of increasing integers, as in the sketch below.
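
Here is a minimal sketch of such an encoding in Python; the sieve and the bit-to-frequency mapping follow the description above, while the specific frequency values (chosen near the hydrogen line) are illustrative assumptions rather than part of any proposed standard.

```python
# Minimal sketch: prime encoding of [1, N] and a two-frequency pulse mapping.
# The frequency values are illustrative assumptions, not part of any standard.

def prime_encoding(N):
    """Binary sequence x_1, ..., x_N with x_i = 1 iff i is prime (sieve of Eratosthenes)."""
    is_prime = [False, False] + [True] * (N - 1)          # indices 0..N
    for p in range(2, int(N ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, N + 1, p):
                is_prime[m] = False
    return [1 if is_prime[i] else 0 for i in range(1, N + 1)]


def to_pulses(bits, f_zero=1420.0, f_one=1421.0):
    """Map each bit to one of two neighbouring frequencies (MHz, illustrative values)."""
    return [f_one if b else f_zero for b in bits]


if __name__ == "__main__":
    bits = prime_encoding(30)
    print(bits)            # 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, ...
    print(to_pulses(bits)[:6])
```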

But why might the distribution of prime numbers be algorithmically random? This will be explored in more detail in the next section. The bigger picture is that algorithmic randomness is not observer-independent because the scientific method itself is not observer-independent.

An information-theoretic derivation of the prime number theorem, or the incompressibility of the prime numbers:

If we know nothing about the primes, then in the worst case we may assume that each prime number less than or equal to \(N\) is drawn uniformly from \([1,N]\). So our source of primes is:

\begin{equation} X \sim U([1,N]) \tag{1} \end{equation}

where \(H(X) = \ln(N)\) is the Shannon entropy of the uniform distribution.

Now, given a strictly increasing integer sequence of length \(N\), \(U_N = \{u_i\}_{i=1}^N\) where \(u_i = i\), we may define the prime encoding of \(U_N\) as the binary sequence \(X_N = \{x_i\}_{i=1}^N\) where \(x_i = 1\) if \(u_i\) is prime and \(x_i = 0\) otherwise. With no prior knowledge, given that each integer is either prime or not prime, we have \(2^N\) possible prime encodings (i.e. arrangements of the primes) in \([1,N] \subset \mathbb{N}\).

If there are \(\pi(N)\) primes less than or equal to \(N\), then spreading the \(\log_2(2^N) = N\) bits needed to specify an arrangement over the \(\pi(N)\) primes gives us the average amount of information gained from correctly identifying each prime number in \(U_N\):

\begin{equation} S_c = \frac{\log_2 (2^N)}{\pi(N)}= \frac{N}{\pi(N)} \tag{2} \end{equation}

Furthermore, if we assume a maximum entropy distribution over the primes, then each prime is effectively drawn from a uniform distribution as in (1), so we would expect:

\begin{equation} S_c = \frac{N}{\pi(N)} \sim \ln(N) \tag{3} \end{equation}

As for why the natural logarithm appears in (3), we may first note that the base of the logarithm in the Shannon entropy may be freely chosen without changing its properties. Moreover, given these assumptions, if we partition \([1,N]\) into intervals \((k,k+1] \subset [1,N]\), the average distance between consecutive primes is given by the sum of weighted distances \(l\):

\begin{equation} \sum_{k=1}^{N-1} \frac{1}{k} \lvert (k,k+1] \rvert = \sum_{k=1}^{N-1} \frac{1}{k} \approx \sum_{l=1}^\lambda l \cdot P_l \approx \ln(N) \tag{4} \end{equation}

where \(P_l = \frac{1}{l} \cdot \sum_{k= \frac{l \cdot (l-1)}{2}}^{\frac{l\cdot (l-1)}{2}+l-1} \frac{1}{k+1}\) and \(\lambda = \frac{\sqrt{1+8(N+1)}-1}{2}\).

This is consistent with the maximum entropy assumption in (1) as there are \(k\) distinct ways to sample uniformly from \([1,k]\) and a frequency of \(\frac{1}{k}\) associated with the event that a prime lies in \((k-1,k]\). The computation (4) is also consistent with Boltzmann’s notion of entropy as a measure of possible arrangements.
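
The computation (4) is easy to check numerically; the sketch below compares the harmonic sum, the weighted-distance sum \(\sum_l l \cdot P_l\) and \(\ln(N)\), with the caveat that \(\lambda\) is rounded down to an integer (an assumption, since the closed form above need not yield one).

```python
# Numerical check of (4): harmonic sum vs. sum of weighted distances vs. ln(N).
import math

def check_eq_4(N):
    harmonic = sum(1.0 / k for k in range(1, N))                 # sum_{k=1}^{N-1} 1/k
    lam = int((math.sqrt(1 + 8 * (N + 1)) - 1) / 2)              # lambda, rounded down
    weighted = 0.0
    for l in range(1, lam + 1):
        start = l * (l - 1) // 2
        P_l = sum(1.0 / (k + 1) for k in range(start, start + l)) / l
        weighted += l * P_l                                      # l * P_l
    return harmonic, weighted, math.log(N)

for N in (10**3, 10**5):
    print(check_eq_4(N))
```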

There is another useful interpretation of (4). If we break \(\sum_{k=1}^{N} \frac{1}{k}\) into \(\pi(N)\) disjoint blocks \([p_k,p_{k+1}]\) where \(p_k,p_{k+1} \in \mathbb{P}\) and \(p_1 = 1\):

\begin{equation} \sum_{k=1}^{N} \frac{1}{k} \approx \sum_{k=1}^{\pi(N)} \sum_{b=p_k}^{p_{k+1}} \frac{1}{b} = \sum_{k=1}^{\pi(N)} (p_{k+1}-p_k)\cdot P(p_k) \approx \ln(N) \tag{5} \end{equation}

where \(P(p_k)= \frac{1}{(p_{k+1}-p_k)} \sum_{b=p_k}^{p_{k+1}} \frac{1}{b}\). So we see that (4) also approximates the expected number of observations per prime, where \(P(p_k)\) may be interpreted as the probability of a successful observation in a frequentist sense. This is consistent with John Wheeler’s it from bit interpretation of entropy [5], where the entropy measures the average number of bits (i.e. yes/no questions) per prime number.
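
The block decomposition (5) can be checked in the same spirit; the sketch below treats the blocks as half-open intervals \([p_k, p_{k+1})\) so that they are genuinely disjoint (an assumption about the intended bounds, since the text calls the blocks disjoint), regroups the harmonic sum over prime gaps with \(p_1 = 1\), and compares the result to \(\ln(N)\).

```python
# Numerical check of (5): the harmonic sum regrouped over half-open prime-gap blocks
# [p_k, p_{k+1}), with p_1 = 1, compared against ln(N).
import math

def primes_up_to(N):
    sieve = [False, False] + [True] * (N - 1)
    for p in range(2, int(N ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [i for i, is_p in enumerate(sieve) if is_p]

def check_eq_5(N):
    ps = [1] + primes_up_to(N)                       # p_1 = 1 followed by the primes <= N
    blocks = 0.0
    for p_k, p_next in zip(ps, ps[1:]):
        gap = p_next - p_k
        P_pk = sum(1.0 / b for b in range(p_k, p_next)) / gap   # P(p_k) over [p_k, p_{k+1})
        blocks += gap * P_pk                         # (p_{k+1} - p_k) * P(p_k)
    harmonic = sum(1.0 / k for k in range(1, N + 1))
    return harmonic, blocks, math.log(N)

print(check_eq_5(10**5))
```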

Now, (3), (4) and (5) together imply that the average number of bits per prime number is:

\begin{equation} \frac{N}{\pi(N)} \sim \ln(N) \end{equation}

which is in complete agreement with the predictions of the prime number theorem.
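
This agreement is easy to probe empirically; the following sketch computes \(\pi(N)\) with a simple sieve and compares \(N/\pi(N)\) with \(\ln(N)\) for a few values of \(N\).

```python
# Empirical check: N / pi(N) tracks ln(N) as N grows.
import math

def pi(N):
    """Count the primes <= N with a sieve of Eratosthenes."""
    sieve = [False, False] + [True] * (N - 1)
    for p in range(2, int(N ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return sum(sieve)

for N in (10**4, 10**5, 10**6):
    print(N, N / pi(N), math.log(N))
```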

The Shannon source coding theorem, and the incompressibility of the primes:

By the Shannon source coding theorem, we may also infer that the \(\pi(N)\) primes less than or equal to \(N\) can’t be compressed into fewer than \(\pi(N) \cdot \ln(N)\) bits. So the primes are incompressible. In fact, given that the expected Kolmogorov Complexity equals the Shannon entropy for computable probability distributions, for a prime encoding of length \(N\) we must asymptotically obtain:

\begin{equation} \mathbb{E}[K(X_N)] \sim \pi(N) \cdot \ln(N) \sim N \end{equation}

where \(K(\cdot)\) is the Kolmogorov Complexity. But why might the distribution of prime numbers be incompressible?

The primes are incompressible for two fundamental reasons. We may first note that they are dimensionally independent, which implies that every integer has a unique prime factorisation. Since any mathematical system that is sufficiently powerful to formulate a general theory of computation must contain arithmetic, and the integers have a unique representation in terms of the primes, it follows that all other types in such a system are derived types relative to the prime numbers. The prime numbers are unique in that they have an irreducible representation, whereas all other mathematical objects have a compositional structure.

For a deeper insight into why the prime numbers appear to behave in an algorithmically random manner, one may consider the fact that if the human brain may be simulated by a Turing machine, then algorithmic randomness is not observer-independent. This is a reasonable proposition if you consider the all-or-nothing operation of neurons, as well as the observation that all organisms have finite sensory and behavioural state spaces.

Testable predictions:

I have colleagues in the machine learning community who believe that their powerful machine learning methods may be used to predict the distribution of primes with much greater accuracy than any of the statistical algorithms developed so far. However, my model implies that, independently of the amount of data and computational resources at their disposal, if the best machine learning model predicts the next \(N\) primes to be located at \(\{a_i\}_{i=1}^N \subset \mathbb{N}\), then for large \(N\) their model’s statistical performance will converge to an accuracy that is no better than:

\begin{equation} \frac{1}{N}\sum_{i=1}^N \frac{1}{a_i} \end{equation}

and the scenario where they predict all \(N\) primes accurately occurs with a frequency of:

\begin{equation} \prod_{i=1}^N \frac{1}{a_i} \end{equation}
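
As a concrete illustration of these two quantities, the sketch below evaluates them for a hypothetical model’s guesses \(\{a_i\}\); the example guesses are simply the five primes that follow 100 and stand in for whatever a machine learning model might output.

```python
# Sketch: the claimed accuracy ceiling and the frequency of an all-correct run,
# evaluated for a hypothetical model's predicted prime locations a_1..a_N.
import math

def accuracy_ceiling(predictions):
    """(1/N) * sum_i 1/a_i"""
    return sum(1.0 / a for a in predictions) / len(predictions)

def all_correct_frequency(predictions):
    """prod_i 1/a_i"""
    return math.prod(1.0 / a for a in predictions)

# Hypothetical guesses: the five primes that follow 100 (a best-case prediction).
guesses = [101, 103, 107, 109, 113]
print(accuracy_ceiling(guesses))        # ~ 0.0094
print(all_correct_frequency(guesses))   # ~ 7e-11
```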

It is worth noting that this information-theoretic analysis yields a statistical model and testable predictions that are not reducible to the Prime Number Theorem. What is even more interesting is that this statistical model was not, and could not have been, attained by machine learning methods, yet it may be empirically tested by such methods.

An invariance theorem for algorithmically random data:

Let’s suppose we have a prime encoding, as defined in the previous section, that has been beamed to us by an alien civilisation. How would we recognise it? Given this data, the challenge of predicting the location of the prime numbers may be converted into the sequential prediction problem:

\begin{equation} \exists f \in F_{\theta}, X_{n+1} = f \circ X_n \end{equation}

where \(X_n\) is a k-bit binary encoding of a natural signal. So if we define \(X_N^\text{train} = \{X_i\}_{i=1}^N\), \(X_N^\text{test} = \{X_i\}_{i=N+1}^{2N}\), and \(X_{n+1}^i = f \circ X_n^i \implies \delta_{\hat{f}(X_{n}^i),X_{n+1}^i} = 1\), then for \(X_n\) sampled i.i.d. from an incompressible discrete random variable \(X\) and any solution \(\hat{f}\) of the empirical risk minimisation problem:

\begin{equation} \hat{f} = \arg\max_{f \in F_{\theta}} \frac{1}{Nk} \sum_{n=1}^{N} \sum_{i=1}^k \delta_{f(X_{n}^i),X_{n+1}^i} \end{equation}

we would observe, for large \(N\):

\begin{equation} \sum_i \Big(p \cdot \frac{\delta_{Y_i,1} \cdot \phi_i}{N_1} + (1-p) \cdot \frac{\delta_{Y_i,0} \cdot \phi_i}{N_0} \Big) \leq \frac{1}{2} \end{equation}

where \(Y_i\) denotes the ground-truth label (the bit to be predicted) and we used:

\begin{equation} \hat{Y}_i :=\hat{f}(X_i) \end{equation}

\begin{equation} \phi_i := \delta_{\hat{Y}_i, Y_i} \end{equation}

\begin{equation} N_1 = \sum_i \delta_{\hat{Y}_i,1} \end{equation}

\begin{equation} N_0 = \sum_i \delta_{\hat{Y}_i,0} \end{equation}

\begin{equation} p = \frac{1}{N} \sum_i \delta_{Y_i,1} \end{equation}

\begin{equation} 1-p = \frac{1}{N} \sum_i \delta_{Y_i,0} \end{equation}

and this result would hold for any transformation \(T\) applied to the data that does not change the phase-space dimension of the data-generating process.
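
To make this concrete, here is a toy sketch under stated assumptions: the hypothesis class \(F_{\theta}\) is replaced by the four maps from one bit to the next (purely for illustration), the incompressible source is approximated by a cryptographic random generator, and the fitted predictor is scored with the metric defined above. The resulting score should stay at or below roughly \(\frac{1}{2}\).

```python
# Toy empirical risk minimisation on (pseudo-)incompressible bits, scored with the
# metric defined above: p * TP/N_1 + (1 - p) * TN/N_0.
import secrets

F_THETA = [lambda x: 0, lambda x: 1, lambda x: x, lambda x: 1 - x]   # toy hypothesis class

def score(y_true, y_pred):
    """p * TP/N_1 + (1 - p) * TN/N_0, guarding against empty denominators."""
    N = len(y_true)
    p = sum(y_true) / N
    n1 = max(sum(y_pred), 1)                        # predicted ones
    n0 = max(N - sum(y_pred), 1)                    # predicted zeroes
    tp = sum(1 for t, h in zip(y_true, y_pred) if t == h == 1)
    tn = sum(1 for t, h in zip(y_true, y_pred) if t == h == 0)
    return p * tp / n1 + (1 - p) * tn / n0

bits = [secrets.randbits(1) for _ in range(200_000)]
X, Y = bits[:-1], bits[1:]                          # predict the next bit from the current one
half = len(Y) // 2

# Empirical risk minimisation on the first half of the data ...
f_hat = max(F_THETA, key=lambda f: sum(f(x) == y for x, y in zip(X[:half], Y[:half])))
# ... and evaluation on the held-out second half: the score stays at or below ~1/2.
print(score(Y[half:], [f_hat(x) for x in X[half:]]))
```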

Discussion:

At this point, a good scientist may remark that my analysis of the prime number theorem was purely information-theoretic: I did not make use of a single result in number theory, and yet I somehow obtained the exact form of the Prime Number Theorem. How is this possible?

I suspect this means that the Prime Number Theorem is a fundamental theorem in algorithmic information theory that was first discovered by number theorists.

References:

  1. Dániel Schumayer and David A. W. Hutchinson. Physics of the Riemann Hypothesis. arXiv. 2011.
  2. Doron Zagier. Newman’s short proof of the Prime Number Theorem. The American Mathematical Monthly, Vol. 104, No. 8 (Oct., 1997), pp. 705-708.
  3. Peter D. Grünwald. The Minimum Description Length Principle. MIT Press. 2007.
  4. M. Li and P. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Graduate Texts in Computer Science. Springer. 1997.
  5. Peter Shor. Shannon’s noiseless coding theorem. Lecture notes. 2010.
  6. Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie. The Elements of Statistical Learning. Springer. 2001.
  7. AlphaGo Zero. Kepler Lounge. 2019.
  8. Andrew Chi-Chih Yao. Theory and applications of trapdoor functions. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, 1982.
  9. Sullivan, Walter (September 29, 1968). “The Universe Is Not Ours Alone; From the edge of the galaxy”. The New York Times.
  10. Alexander L. Zaitsev. Rationale for METI. arXiv. 2011.
  11. Jérôme Durand-Lose. Black hole computation: implementations with signal machines. 2008.