Occam’s razor

An information-theoretic formulation of Occam’s razor as the Universal A Priori Probability.

Aidan Rocke https://github.com/AidanRocke


Given a scientific dataset and a collection of predictive models proposed by a number of data scientists, we may define Occam’s razor as a Universal distribution on this collection of predictive models. But, the notion of a Universal distribution wasn’t made precise until Solomonoff developed the theory of Algorithmic Probability which has become known as Solomonoff Induction. This theory has a number of important implications.

This is important from an epistemological point of view for the following reason made precise by Einstein:

The supreme task of the physicist is to arrive at those universal elementary laws from which the cosmos can be built up by pure deduction. There is no logical path to these laws; only intuition, resting on sympathetic understanding of experience, can reach them. In this methodological uncertainty, one might suppose that there were any number of possible systems of theoretical physics all equally well justified; and this opinion is no doubt correct, theoretically. But the development of physics has shown that at any given moment, out of all conceivable constructions, a single one has always proved itself decidedly superior to all the rest. Nobody who has really gone deeply into the matter will deny that in practice the world of phenomena uniquely determines the theoretical system, in spite of the fact that there is no logical bridge between phenomena and their theoretical principles; this is what Leibnitz described so happily as a “pre-established harmony.”-Einstein

One notable exception to Einstein’s analysis is Quantum Theory, which today has a number of incompatible formulations ranging from De Broglie-Bohm theory, Everettian Quantum Mechanics, the Von Neumann-Wigner formulation of Quantum Mechanics, and Gerard’t Hooft’s Cellular Automaton approach to Quantum Mechanics. However, the exception proves the rule and it is reasonable to suppose that the present state of Quantum Mechanics is merely temporary.

On another level, the theory of Algorithmic Probability has important consequences for the way humans prove theorems which remain poorly understood.

The Universal Distribution:

There are two reasons why the Algorithmic Probability allows us to define a Universal A Priori Probability. First, although Kolmogorov Complexity is not computable for any data structure \(X\) its Minimum Description Length is independent of the choice of description language \(U\) due to the Invariance theorem [4]:

\[\begin{equation} \lvert K_U(X)-K_{U'}(X)\rvert \leq \text{Cst} \end{equation}\]

and so in a precise sense, asymptotic Kolmogorov Complexity results are Universal. Second, Levin’s Coding theorem, asserts that [3]:

\[\begin{equation} -\log_2 m(X) = K_U(X) + \mathcal{O}(1) \end{equation}\]

where \(m(X)\) is the Algorithmic Probability of \(X\).


One may ask what such a theory is good for. On the one hand, Algorithmic Probability is a key pillar of Deep Mind’s scientific framework for developing Artificial General Intelligence [6]. On another hand, there is an important collection of non-trivial results in number theory that could not have been deduced from empirical observations. Given that Algorithmic Probability is not computable, might some scientists such as Einstein and Ramanujan have developed the ability to perform approximate Solomonoff Induction if they were not performing scientific induction?

While this question has been partially addressed by Schmidhuber’s invention of the Gödel machine, which he calls an Optimal Ordered Problem Solver(OOPS), it is worth noting that human mathematicians do not behave like Gödel machines in their approach to theorem proving.

Thus we have a genuine scientific mystery that leaves something to the imagination.


  1. Marcus Hutter et al. (2007) Algorithmic probability. Scholarpedia, 2(8):2572.
  2. R. J. Solomonoff. A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1–22 and 224–254, 1964.
  3. L.A. Levin. Laws of information conservation (non-growth) and aspects of the foundation of probability theory. Problems Information Transmission, 10(3):206-210, 1974.
  4. Kolmogorov A.N. Three approaches to the quantitative definition of information. Problems Inform. Transmission, 1(1):1–7, 1965.
  5. BubbleZ (https://mathoverflow.net/users/470546/bubblez), Theorems that are essentially impossible to guess by empirical observation, URL (version: 2021-12-29): https://mathoverflow.net/q/412762
  6. Shane Legg. Machine Super Intelligence. 2008.
  7. Schmidhuber, J. (2005). Gödel machines: Self-referential universal problem solvers making provably optimal self-improvements. In Artificial general intelligence. Springer, in press.


For attribution, please cite this work as

Rocke (2022, Jan. 3). Kepler Lounge: Occam's razor. Retrieved from keplerlounge.com

BibTeX citation

  author = {Rocke, Aidan},
  title = {Kepler Lounge: Occam's razor},
  url = {keplerlounge.com},
  year = {2022}