In the following analysis, we derive a strong form of Levin’s Coding theorem from the Physical Church-Turing thesis, which forms a natural bridge between Universal Data Compression and Universal Simulation via the Universal Wave Function.
Over the course of the last decade, important evidence has accumulated in favour of the Multiverse formalism of Quantum Mechanics, in which the Observable Universe is simulated by the Universal Wave Function. Meanwhile, the scientific community has consistently observed the unreasonable effectiveness of Deep Neural Networks for Universal Data Compression. As Levin’s Universal Distribution is the precise mathematical formulation of the Physical Church-Turing thesis and the Universal Wave Function is the most compelling Universal Turing Machine for the Observable Universe, it is natural to explore the implications of this correspondence.
While a number of scientists, including Harold Erbin and Daniel Persson, are making progress on the neural network-Quantum Field Theory correspondence, the objective of the present analysis is to deduce a strong form of Levin’s Coding theorem as a corollary of the fact that Expected Kolmogorov Complexity equals Shannon Entropy.
Let’s suppose that i.i.d. data \(x_{1:n} \in \mathcal{A}^{(n)}\) are generated by sampling from the distribution \(P_X\) on the finite set \(\mathcal{A}\). Then it may be demonstrated that Expected Kolmogorov Complexity equals Shannon Entropy up to an additive constant:
\[\begin{equation} \sum_{x_{1:n} \in \mathcal{A}^{(n)}} P(X^{(n)}=x_{1:n}) \cdot K_U(X^{(n)} = x_{1:n}) = H(X^{(n)}) + \mathcal{O}(1) \tag{1} \end{equation}\]
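As a quick numerical sanity check of (1), the sketch below (assuming Python with numpy and the standard-library lzma module; the four-letter source distribution is an arbitrary illustrative choice) compares \(H(X^{(n)}) = n \cdot H(X)\) with the average length of losslessly compressed i.i.d. samples. The compressed length is only a crude, machine-dependent upper bound on \(K_U\), not the Kolmogorov complexity itself.

```python
import lzma

import numpy as np

rng = np.random.default_rng(0)

# Illustrative source distribution P_X on a four-letter alphabet.
alphabet = np.array([0, 1, 2, 3], dtype=np.uint8)
p = np.array([0.5, 0.25, 0.125, 0.125])

# Shannon entropy H(X) in bits per symbol, so H(X^(n)) = n * H(X) for i.i.d. data.
H = float(-np.sum(p * np.log2(p)))

n = 10_000     # length of each i.i.d. sequence x_{1:n}
trials = 20    # number of sampled sequences used to estimate the expectation

compressed_bits = []
for _ in range(trials):
    x = rng.choice(alphabet, size=n, p=p)
    # The lzma codelength upper-bounds K_U(x_{1:n}) only up to compressor overhead.
    compressed_bits.append(8 * len(lzma.compress(x.tobytes())))

print(f"H(X^(n)) = n * H(X)    = {n * H:.1f} bits")
print(f"mean compressed length = {np.mean(compressed_bits):.1f} bits")
```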
Now, by definition:
\[\begin{equation} H(X^{(n)}) = -\sum_{x_{1:n} \in \mathcal{A}^{(n)}} P(X^{(n)}=x_{1:n}) \cdot \log_2 P(X^{(n)}=x_{1:n}) \tag{2} \end{equation}\]
where \(P\) is a Universal Distribution that holds for our particular Observable Universe, so that: \[\begin{equation} -\log_2 P(X^{(n)}=x_{1:n}) = K_U(X^{(n)}=x_{1:n}) + \mathcal{O}(1) \tag{3} \end{equation}\]
and Unitarity is guaranteed through an Oracle that identifies \(U\) with the Universal Wave Function, since \(U\) is a computer that simulates the Observable Universe.
If we carefully consider the Asymptotic Equipartition Theorem,
\[\begin{equation} \lim_{n \to \infty} P\big(x_{1:n} \in \mathcal{A_{\epsilon}}^{(n)}\big) = \lim_{n \to \infty} P\Big(\Big\lvert \frac{1}{n} \sum_{i=1}^{n} \log_2 \frac{1}{P(X_i)}-H(X) \Big\rvert \leq \epsilon \Big) = 1 \tag{4} \end{equation}\]
we may define a natural measure on the atypical set \(E_n = \mathcal{A}^{(n)} \setminus \mathcal{A_{\epsilon}}^{(n)}\), namely the average probability of an atypical string:
\[\begin{equation} \mu_n = \frac{1}{\lvert E_n \rvert} \sum_{x_{1:n} \in E_n} P(X^{(n)}=x_{1:n}) \tag{5} \end{equation}\]
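The convergence in (4) is straightforward to observe numerically. A minimal sketch (Python with numpy; the Bernoulli source and the tolerance \(\epsilon = 0.05\) are illustrative assumptions) estimates the probability that the empirical entropy rate of a sample falls within \(\epsilon\) of \(H(X)\):

```python
import numpy as np

rng = np.random.default_rng(1)

p = np.array([0.7, 0.3])            # illustrative source distribution P_X on A = {0, 1}
H = float(-np.sum(p * np.log2(p)))  # H(X) in bits per symbol
eps = 0.05                          # tolerance defining the typical set A_eps^(n)
trials = 1_000

for n in [10, 100, 1_000, 5_000]:
    # Sample `trials` i.i.d. sequences of length n and compute the empirical entropy rate
    # (1/n) * sum_i log2(1 / P(X_i)) of each one.
    x = rng.choice(len(p), size=(trials, n), p=p)
    rates = -np.log2(p[x]).mean(axis=1)
    typical_fraction = float(np.mean(np.abs(rates - H) <= eps))
    print(f"n = {n:5d}:  estimated P(x_1:n is typical) = {typical_fraction:.3f}")
```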
Thus, we may observe that \(P(E_n) = \mu_n \cdot \lvert E_n \rvert \to 0\) exponentially fast (by a Chernoff bound for i.i.d. data), so \(\lim_{n \to \infty} \mu_n = 0\) and there must exist an exponent \(\alpha > 1\) such that as \(n \to \infty\):
\[\begin{equation} \mu_n = o\big(\lvert E_n \rvert^{-\alpha} \big) \tag{6} \end{equation}\]
Moreover, given that \(K_U(X^{(n)}=x_{1:n}) \leq \log_2 |E_n|\) for \(x_{1:n} \in E_n\):
\[\begin{equation} \sum_{x_{1:n} \in E_n} P(X^{(n)}=x_{1:n}) \cdot K_U(X^{(n)}=x_{1:n}) \leq \mu_n \cdot |E_n| \cdot \log_2 |E_n| = o\big(\lvert E_n \rvert^{1-\alpha} \cdot \log_2 \lvert E_n \rvert \big) \to 0 \tag{7} \end{equation}\]
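For a binary source the atypical set may be enumerated exactly via type classes, which makes the decay in (5)-(7) concrete. A minimal sketch (Python; the source distribution \(p_1 = 0.3\) and \(\epsilon = 0.1\) are illustrative assumptions, and \(\log_2 |E_n|\) stands in for the complexity bound on atypical strings):

```python
from math import comb, log2

# Illustrative binary source P_X and tolerance epsilon.
p1, p0 = 0.3, 0.7
H = -(p1 * log2(p1) + p0 * log2(p0))
eps = 0.1

for n in [50, 100, 200, 400, 800]:
    P_atypical = 0.0     # P(E_n) = mu_n * |E_n|
    size_atypical = 0    # |E_n|
    for k in range(n + 1):
        # Every string with k ones has the same probability, so type classes suffice.
        logp = k * log2(p1) + (n - k) * log2(p0)   # log2 P(X^(n) = x_{1:n})
        if abs(-logp / n - H) > eps:               # the whole type class is atypical
            P_atypical += comb(n, k) * 2.0 ** logp
            size_atypical += comb(n, k)
    mu_n = P_atypical / size_atypical
    # The bound from (7) on the atypical contribution to the expected complexity.
    bound = mu_n * size_atypical * log2(size_atypical)
    print(f"n = {n:4d}:  |E_n| = {size_atypical:.3e}   mu_n = {mu_n:.3e}   "
          f"mu_n * |E_n| * log2|E_n| = {bound:.3e}")
```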
Hence, combining (3), (4) and (7), we may deduce:
\[\begin{equation} \forall x_{1:n} \in \mathcal{A_{\epsilon}}^{(n)}, K_U(X^{(n)}=x_{1:n}) \sim \log_2 |\mathcal{A_{\epsilon}}^{(n)}| \implies \mathbb{E}[K_U(X^{(n)})] \sim \log_2 |\mathcal{A_{\epsilon}}^{(n)}| \tag{8} \end{equation}\]
Finally, due to the Asymptotic Equipartition Theorem we may determine that:
\[\begin{equation} H(X^{(n)}) \sim \mathbb{E}[K_U(X^{(n)})] \sim \log_2 |\mathcal{A_{\epsilon}}^{(n)}| \tag{9} \end{equation}\]
which is a more precise asymptotic form of the theorem that Expected Kolmogorov Complexity equals Shannon Entropy than what may be found in modern information theory textbooks.
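Proposition (9) may likewise be checked by counting the typical set exactly for a binary source: \(\frac{1}{n}\log_2 |\mathcal{A_{\epsilon}}^{(n)}|\) approaches \(H(X)\) up to a residual gap controlled by \(\epsilon\). A minimal sketch (Python; the source distribution and \(\epsilon = 0.02\) are illustrative assumptions):

```python
from math import comb, log2

# Same illustrative binary source; a smaller tolerance tightens the asymptotic agreement.
p1, p0 = 0.3, 0.7
H = -(p1 * log2(p1) + p0 * log2(p0))
eps = 0.02

for n in [50, 200, 800, 3200]:
    # |A_eps^(n)|: the number of strings whose empirical entropy rate lies within eps of H(X).
    typical_size = sum(
        comb(n, k)
        for k in range(n + 1)
        if abs(-(k * log2(p1) + (n - k) * log2(p0)) / n - H) <= eps
    )
    print(f"n = {n:5d}:  log2|A_eps^(n)| / n = {log2(typical_size) / n:.4f}   H(X) = {H:.4f}")
```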
If we carefully consider propositions (8) and (9), we may deduce:
\[\begin{equation} \forall x_{1:n} \in \mathcal{A_{\epsilon}}^{(n)}, -\log_2 P(X^{(n)}=x_{1:n}) \sim \log_2 |\mathcal{A_{\epsilon}}^{(n)}| \tag{10} \end{equation}\]
\[\begin{equation} \forall x_{1:n} \in \mathcal{A_{\epsilon}}^{(n)}, K_U(X^{(n)}=x_{1:n}) \sim \log_2 |\mathcal{A_{\epsilon}}^{(n)}| \tag{11} \end{equation}\]
Thus, we may derive Levin’s Coding theorem for typical sequences of i.i.d. data:
\[\begin{equation} -\log_2 P(X^{(n)}=x_{1:n}) \sim K_U(X^{(n)}=x_{1:n}) \sim \log_2 |\mathcal{A_{\epsilon}}^{(n)}| \tag{12} \end{equation}\]
which may be readily generalised beyond the i.i.d. setting (e.g. to stationary ergodic data) via the Shannon-McMillan-Breiman form of the Asymptotic Equipartition Theorem.
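To close with a concrete illustration of (12), the sketch below (Python with numpy and lzma; the binary source is an illustrative assumption, the lzma codelength is only a crude upper-bound proxy for \(K_U\), and \(n \cdot H(X)\) stands in for \(\log_2 |\mathcal{A_{\epsilon}}^{(n)}|\) via (9)) compares the three quantities for individual sampled sequences, which are typical with high probability:

```python
import lzma

import numpy as np

rng = np.random.default_rng(2)

p = np.array([0.7, 0.3])            # illustrative binary source P_X
H = float(-np.sum(p * np.log2(p)))  # H(X) in bits per symbol
n = 20_000

for trial in range(3):
    # Sampled sequences are typical with high probability, so the three quantities in (12)
    # should all be of order n * H(X).
    x = rng.choice(len(p), size=n, p=p).astype(np.uint8)
    neg_log_p = float(-np.log2(p[x]).sum())          # -log2 P(X^(n) = x_{1:n})
    k_proxy = 8 * len(lzma.compress(x.tobytes()))    # crude upper-bound proxy for K_U(x_{1:n})
    print(f"trial {trial}:  -log2 P = {neg_log_p:.0f} bits   "
          f"compressed length = {k_proxy} bits   n * H(X) = {n * H:.0f} bits")
```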
Max Tegmark. The Multiverse Hierarchy. In "Universe or Multiverse?". Cambridge University Press. 2007.
Tor Lattimore and Marcus Hutter. No Free Lunch versus Occam's Razor in Supervised Learning. arXiv. 2011.
A.N. Kolmogorov. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1):1–7, 1965.
L.A. Levin. Laws of information conservation (non-growth) and aspects of the foundation of probability theory. Problems of Information Transmission, 10(3):206–210, 1974.
Peter Grünwald and Paul Vitányi. Shannon Information and Kolmogorov Complexity. arXiv. 2010.
David J.C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press. 2003. ISBN 0-521-64298-1.
Harold Erbin, Vincent Lahoche and Dine Ousmane Samary. Renormalization in the neural network-quantum field theory correspondence. NeurIPS 2022 workshop: Machine Learning and the Physical Sciences. 2022.
Rocke (2022, Jan. 5). Kepler Lounge: Revisiting the unreasonable effectiveness of mathematics. Retrieved from keplerlounge.com
For attribution, please cite this work as
Rocke (2023, April 19). Kepler Lounge: Lesser known miracles of Levin's Universal Distribution. Retrieved from keplerlounge.com
BibTeX citation
@misc{rocke2023lesser,
  author = {Rocke, Aidan},
  title = {Kepler Lounge: Lesser known miracles of Levin's Universal Distribution},
  url = {keplerlounge.com},
  year = {2023}
}