Introduction:

Using Karl Friston’s Free Energy Principle and an entropic formulation of normal numbers, we demonstrate that normal sequences admit no efficient neural code because they are not compressible. This observation is important because the human brain can only consider sequential data that is finite-state compressible.

This raises the question of whether all that is observable is all there is, and whether the human brain is equivalent to the human mind.

Predictive Coding and Efficient Neural Codes:

While there are multiple accounts of how organisms construct efficient neural codes at the computational level [8], Karl Friston has formulated an ecologically realistic account known as the Free Energy Principle. According to this principle, an organism finds efficient representations of its observations that diminish its long-term average surprise relative to its environment.

Mathematically, Friston’s principle involves the minimisation of the entropy of an organism’s sensory states, which may be formulated as follows:

\begin{equation} H(y) = \lim_{T \to \infty} \frac{1}{T} \int_{0}^T -\ln p(y|m) dt \end{equation}

where \(m\) refers to the organism’s model of its environment and \(y\) represents the organism’s sensory input.
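
To make equation (1) concrete, the following Python sketch (a toy illustration rather than part of Friston’s formalism) estimates the time-averaged surprise for a binary sensory stream; the environment probability `p_env` and model probability `p_model` are hypothetical parameters introduced purely for this example.

```python
import math
import random

def average_surprise(p_env: float, p_model: float, T: int = 100_000) -> float:
    """Monte Carlo estimate of (1/T) * sum_t -ln p(y_t | m) for a binary
    environment that emits 1 with probability p_env, scored by a model m
    that assigns probability p_model to observing a 1."""
    total = 0.0
    for _ in range(T):
        y = 1 if random.random() < p_env else 0   # sensory sample y_t
        p = p_model if y == 1 else 1.0 - p_model  # p(y_t | m)
        total += -math.log(p)                     # surprise in nats
    return total / T

# A well-calibrated model attains the environment's entropy (~0.325 nats);
# an agnostic model pays the full cross-entropy penalty (~ln 2 nats).
print(average_surprise(p_env=0.9, p_model=0.9))
print(average_surprise(p_env=0.9, p_model=0.5))
```

A model that matches the environment attains the environment’s entropy, while a mismatched model incurs an additional cross-entropy penalty; minimising (1) therefore favours accurate generative models.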

From the vantage point of an organism minimising the entropy in (1), an observation (or percept) corresponds to a neural code that diminishes its long-term uncertainty concerning its environment. Thus, a Fristonian organism that is Bayes-optimal would tend to classify unpredictable sequences of sensory input as uninformative.

Assuming that biological neural information processing may be modelled by finite-state machines, this raises the question of whether a finite-state incompressible sequence can be physically meaningful. Given the ubiquity of normal numbers, potentially important physical relationships may escape the analysis of a biological organism minimising its free-energy bound on surprise. In particular, it is worth noting that Archimedes’ constant \(\pi\), which serves as a Rosetta Stone in both mathematics and physics, is conjectured to be a normal number based on strong experimental evidence [2].
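
As a rough empirical illustration of this incompressibility claim, the sketch below compresses a pseudo-random bit sequence, used as a stand-in for the binary digits of a normal number (by Borel’s theorem almost every real number is normal), alongside a periodic sequence for contrast. zlib is used here merely as a convenient proxy for a bounded-memory compressor; the sequence length and seed are arbitrary choices.

```python
import random
import zlib

n_bits = 8_000_000  # one megabyte worth of binary digits

random.seed(0)
# Stand-in for a normal sequence: i.i.d. fair-coin digits packed 8 per byte.
normal_like = random.getrandbits(n_bits).to_bytes(n_bits // 8, "big")
# A highly structured sequence for contrast: the block 01 repeated.
periodic = bytes([0b01010101]) * (n_bits // 8)

for name, data in [("normal-like", normal_like), ("periodic", periodic)]:
    compressed = len(zlib.compress(data, 9))
    print(f"{name:12s}: {len(data)} bytes -> {compressed} bytes "
          f"(ratio {compressed / len(data):.4f})")
```

The pseudo-random stream resists compression (ratio close to 1), whereas the periodic stream collapses to a tiny fraction of its original size.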

As modern Artificial Intelligence systems are modelled upon biological neural networks, this analysis is of general interest to AI researchers who would like to understand the epistemic limits of deep learning.

An Entropic Formulation of Normal Numbers:

Without loss of generality, let \(\Sigma = \{0,1\}\) be a finite alphabet and \(\Sigma^{\infty}\) be the set of all infinite sequences that may be drawn from that alphabet. Let \(S \in \Sigma^{\infty}\) be such a sequence and, for each \(a \in \Sigma\), let \(N_S(a,n)\) denote the number of times the digit \(a\) appears in the first \(n\) digits of the sequence \(S\).

We say that \(S\) is simply normal if:

\begin{equation} \forall a \in \Sigma, \lim_{n \to \infty} \frac{N_S(a,n)}{n} = \frac{1}{2} \end{equation}

Now, let \(w\) denote any finite string in \(\Sigma^*\) and let \(N_S(w,n)\) be the number of times the string \(w\) appears as a substring in the first \(n\) digits of the sequence \(S\). \(S\) is normal if, for all finite strings \(w \in \Sigma^*\):

\begin{equation} \lim_{n \to \infty} \frac{N_S(w,n)}{n} = \frac{1}{2^{|w|}} \end{equation}
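
Definition (3) can be probed empirically. The sketch below, again using pseudo-random digits as a stand-in for a normal sequence, computes the empirical block frequencies \(N_S(w,n)/n\) for short blocks and compares them to \(2^{-|w|}\); the helper `block_frequencies` is introduced here only for illustration, and the case \(|w| = 1\) corresponds to simple normality (2).

```python
import random
from collections import Counter

def block_frequencies(s: str, k: int) -> dict:
    """Empirical frequency N_S(w, n) / n of every length-k block w among the
    first n digits of the binary string s (overlapping occurrences)."""
    n = len(s)
    counts = Counter(s[i:i + k] for i in range(n - k + 1))
    return {w: c / n for w, c in counts.items()}

# Stand-in for a normal sequence: one million i.i.d. fair-coin digits.
random.seed(0)
s = "".join(random.choice("01") for _ in range(1_000_000))

for k in (1, 2, 3):  # k = 1 corresponds to simple normality
    freqs = block_frequencies(s, k)
    expected = 1 / 2 ** k
    deviation = max(abs(f - expected) for f in freqs.values())
    print(f"|w| = {k}: expected {expected:.4f}, largest deviation {deviation:.5f}")
```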

It is worth pointing out that an information-theoretic formulation of normal numbers in base-2 may be derived using the Asymptotic Equipartition Property [10]. If we construct messages of length \(N\) using the alphabet \(\Sigma = \{a,b\}\), and the asymptotic frequency of each character is given by:

\begin{equation} P(a)=P(b) = \frac{1}{2} \end{equation}

then for large \(N\), a typical message will consist of approximately \(\frac{N}{2}\) occurrences of \(a\) and \(\frac{N}{2}\) occurrences of \(b\). Using Stirling’s approximation, the number of such messages is given by:

\begin{equation} \frac{N!}{\big(\lfloor \frac{N}{2} \rfloor !\big)^2} \sim \frac{N^N}{\left(\frac{N}{2}\right)^N} = 2^{N \cdot S} \end{equation}

where \(S\) is the Shannon entropy per letter:

\begin{equation} S = -\frac{1}{2}\log_2 \frac{1}{2} -\frac{1}{2}\log_2 \frac{1}{2} = 1 \end{equation}

Therefore, the typical information gained from observing such a message of length \(N\) is on the order of \(N\) bits. From this entropic formulation of normal numbers we may deduce that there is no efficient neural code for normal numbers, as they are not compressible.
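
As a quick numerical check of the counting argument in (5) and (6), the following sketch (standard library only) computes \(\frac{1}{N}\log_2 \binom{N}{N/2}\) for increasing \(N\) and shows it approaching \(S = 1\) bit per letter.

```python
import math

# log2 of the number of balanced binary messages, C(N, N/2), per letter;
# by the Stirling estimate above this should approach S = 1 bit as N grows.
for N in (10, 100, 1_000, 10_000):
    bits_per_letter = math.log2(math.comb(N, N // 2)) / N
    print(f"N = {N:>6}: log2 C(N, N/2) / N = {bits_per_letter:.4f}")
```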

References

  1. Patrick Billingsley. Prime Numbers and Brownian Motion. The American Mathematical Monthly. 1973.

  2. Francisco J. Aragón Artacho, David H. Bailey, Jonathan M. Borwein, Peter B. Borwein. Walking on real numbers. 2012.

  3. Yiftach Dayan, Arijit Ganguly, Barak Weiss. Random walks on tori and normal numbers in self similar sets. arXiv. 2020.

  4. Alan Edelman, Brian D. Sutton. From Random Matrices to Stochastic Operators. arXiv. 2006.

  5. William Bialek, Naftali Tishby. Predictive Information. arXiv. 1999.

  6. Karl Friston. The Free Energy Principle: a rough guide to the brain? Cell Press. 2009.

  7. S. P. Strong, Roland Koberle, Rob R. de Ruyter van Steveninck, William Bialek. Entropy and Information in Neural Spike Trains. Physical Review Letters. 1998.

  8. William Bialek, Rob R. de Ruyter van Steveninck, Naftali Tishby. Efficient representation as a design principle for neural coding and computation. arXiv. 2007.

  9. Noga Zaslavsky, Charles Kemp, Terry Regier, Naftali Tishby. Efficient compression in color naming and its evolution. 2018.

  10. Edward Witten. A Mini-Introduction To Information Theory. arXiv. 2019.