Revisiting the unreasonable effectiveness of mathematics
Introduction:
Sixty-two years after Eugene Wigner’s highly influential essay on the unreasonable effectiveness of mathematics in the natural sciences, I think it may be time for a reappraisal. On balance, given important theoretical advances in algorithmic information theory and quantum computation, it appears that the remarkable effectiveness of mathematics in the natural sciences is quite reasonable.
By effectiveness, I am specifically referring to Wigner’s observation that mathematical laws have remarkable generalisation power.
An information-theoretic perspective:
What I have retained from my discussions with physicists and other natural scientists is that the same mathematical laws with remarkable generalisation power are also constrained by Occam’s razor. Given two computable theories, a physicist ought to choose the simpler one that yields negligible experimental error, as Einstein explicitly stated:
It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience. (Einstein, 1933)
In fact, from an information-theoretic perspective, the remarkable generalisation power of mathematical laws in the natural sciences is a direct consequence of the effectiveness of Occam’s razor.
The Law of Conservation of Information:
From an information-theoretic perspective, a Universe where Occam’s razor is generally applicable is one where information is generally conserved. This law of conservation of information, which dates back to von Neumann, essentially states that the von Neumann entropy is invariant under unitary transformations. This is meaningful within the framework of Everettian Quantum Mechanics, as a density matrix may be assigned to the state of the Universe. In this way, information is conserved as we run a simulation of the Universe forwards or backwards in time.
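To make the invariance claim concrete, here is a short derivation sketch. The symbols are my own notation rather than anything fixed earlier in this post: \rho is a density matrix with eigenvalues \lambda_i and orthonormal eigenvectors |i\rangle, and U is any unitary operator.

```latex
\begin{align*}
S(\rho) &= -\operatorname{Tr}(\rho \log \rho) = -\sum_i \lambda_i \log \lambda_i,
  \qquad \rho = \sum_i \lambda_i \, |i\rangle\langle i| \\
\rho' &= U \rho\, U^\dagger = \sum_i \lambda_i \, \bigl(U|i\rangle\bigr)\bigl(\langle i|U^\dagger\bigr),
  \qquad U^\dagger U = I \\
S(\rho') &= -\sum_i \lambda_i \log \lambda_i = S(\rho)
\end{align*}
```

The middle line is the whole argument: because U preserves inner products, the vectors U|i\rangle are again orthonormal, so \rho' has exactly the same eigenvalues as \rho, and the entropy depends only on those eigenvalues.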
It follows that the Law of Conservation of Information has the nontrivial implication that fundamental physical laws are generally time-reversible. Moreover, given that Occam’s razor has an appropriate formulation within algorithmic information theory as the Minimum Description Length principle, this information-theoretic analysis generally presumes that the Universe itself may be simulated by a Universal Turing Machine.
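As a rough illustration of the description-length idea, the sketch below uses the size of a zlib-compressed string as a stand-in for description length. This is an assumption made purely for illustration: Kolmogorov complexity is uncomputable, and a general-purpose compressor only gives a crude upper bound on it. The names `description_length`, `lawful` and `lawless` are hypothetical.

```python
import random
import zlib


def description_length(data: bytes) -> int:
    """Compressed size in bytes: a crude, computable upper bound on
    the (uncomputable) Kolmogorov complexity of `data`."""
    return len(zlib.compress(data, 9))


# A sequence generated by a short rule ("repeat 0123456789").
lawful = b"0123456789" * 1000

# A sequence of the same length drawn from a pseudo-random generator.
# Strictly speaking it also has a short description (the generator plus
# its seed), but zlib cannot discover that, so it looks incompressible.
rng = random.Random(0)
lawless = bytes(rng.randrange(256) for _ in range(len(lawful)))

print("lawful: ", len(lawful), "bytes raw,", description_length(lawful), "compressed")
print("lawless:", len(lawless), "bytes raw,", description_length(lawless), "compressed")
```

Data governed by a simple law compresses to a small fraction of its raw size, whereas data with no regularity the compressor can find does not compress at all. Under the Minimum Description Length principle, the first kind of data is exactly the kind for which a short theory exists.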
The Physical Church-Turing thesis:
The research of David Deutsch (and others) on the Physical Church-Turing thesis explains how a universal quantum computer may simulate the laws of physics. This is consistent with the general belief that Quantum Mechanics may be used to simulate all of physics, which is why the most important contributions to the Physical Church-Turing thesis have come from theories of quantum computation.
More importantly, the Physical Church-Turing thesis provides us with a credible explanation for the remarkable effectiveness of mathematics in the natural sciences.
What is truly remarkable:
If we view the scientific method as an algorithmic search procedure then there is no reason, a priori, to suspect that a particular inductive bias should be particularly powerful. This much was established by David Wolpert in his No Free Lunch theorems [10].
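A toy version of the No Free Lunch result can be checked by brute force. The sketch below is only an illustration under my own assumptions, not Wolpert and Macready's general construction: the search space has four points, objective values are binary, and performance is measured as the best value found within a fixed evaluation budget. The names `fixed_order`, `adaptive` and `average_best` are hypothetical.

```python
from itertools import product

DOMAIN = [0, 1, 2, 3]   # candidate points in the search space
VALUES = [0, 1]         # possible objective values
# Every function f: DOMAIN -> VALUES, stored as a tuple so f[x] is the value at x.
ALL_FUNCTIONS = list(product(VALUES, repeat=len(DOMAIN)))


def fixed_order(f, budget):
    """Evaluate points 0, 1, 2, ... regardless of what is observed."""
    return max(f[x] for x in DOMAIN[:budget])


def adaptive(f, budget):
    """Pick the next point based on the last observed value, never revisiting."""
    unvisited = list(DOMAIN)
    seen = [f[unvisited.pop(0)]]
    while len(seen) < budget:
        # After a 1, try the smallest unvisited point; after a 0, the largest.
        x = unvisited.pop(0) if seen[-1] == 1 else unvisited.pop()
        seen.append(f[x])
    return max(seen)


def average_best(strategy, budget):
    """Mean best value found, averaged uniformly over every possible function."""
    return sum(strategy(f, budget) for f in ALL_FUNCTIONS) / len(ALL_FUNCTIONS)


for budget in range(1, len(DOMAIN) + 1):
    print(budget, average_best(fixed_order, budget), average_best(adaptive, budget))
```

Both strategies score identically at every budget once we average over all sixteen functions. An inductive bias can only pay off if the functions we actually encounter are not uniformly distributed, which is the sense in which the success of Occam’s razor tells us something about the Universe.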
On the other hand, the history of the natural sciences indicates that Occam’s razor is remarkably effective. The effectiveness of this inductive bias has recently been used to explain the generalisation power of deep neural networks [11].
References:

[1] Eugene Wigner. The Unreasonable Effectiveness of Mathematics in the Natural Sciences. 1960.

[2] David Deutsch. Quantum theory, the Church-Turing principle and the universal quantum computer. 1985.

[3] Peter D. Grünwald. The Minimum Description Length Principle. MIT Press. 2007.

[4] A. N. Kolmogorov. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1):1–7, 1965.

[5] G. J. Chaitin. On the length of programs for computing finite binary sequences: Statistical considerations. Journal of the ACM, 16(1):145–159, 1969.

[6] R. J. Solomonoff. A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1–22 and 224–254, 1964.

[7] Michael Nielsen. Interesting problems: The Church-Turing-Deutsch Principle. 2004. https://michaelnielsen.org/blog/interestingproblemsthechurchturingdeutschprinciple/

[8] Marcus Hutter et al. Algorithmic probability. Scholarpedia, 2(8):2572, 2007.

[9] Albert Einstein and Leopold Infeld. The Evolution of Physics. Edited by C. P. Snow. Cambridge University Press. 1938.

[10] D. H. Wolpert and W. G. Macready. No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation, 1, 67, 1997.

[11] Guillermo Valle Pérez, Chico Camargo, Ard Louis. Deep learning generalizes because the parameter-function map is biased towards simple functions. 2019.