Motivation:

Over the last three years, through applying machine learning methods to various problems in science and discussing them with other scientists, I have repeatedly encountered the question of whether the scientific method is observer-independent. The question appears to date back to Planck, who supplied a useful intuition:

Science can’t solve the ultimate mystery of nature. And that is because, in the last analysis, we ourselves are a part of the mystery that we are trying to solve. – Max Planck

In my attempts to address Planck’s claim, I derived a direct correspondence between incompressibility, epistemic uncertainty, and incompleteness. In particular, I found a mathematical inequality from which we may deduce that science is not observer-independent.

However, perhaps the biggest challenge involves formulating the problem of scientific measurement and scientific induction in a clear and mathematically precise manner.

The general problem of scientific measurement:

The general problem of the mathematical sciences involves describing the behaviour of dynamical systems in terms of the solutions to certain classes of differential equations. Any partial derivative may be expressed as a ratio of two observables that generally carry distinct units, and, more generally, the chain rule of multivariable calculus implements a product of scientific units. Dimensional independence is therefore an essential requirement of any system of fundamental units $$U=\{u_i\}_{i=1}^n$$.

All other scientific types are derived from $$U$$, so if $$\mathcal{C}$$ denotes the set of composite scientific types:

$$\forall c \in \mathcal{C}, \exists \alpha_i \in \mathbb{Z}, c = \prod_{i=1}^n u_i^{\alpha_i}$$

Furthermore, if we define the set (a lattice of rank $$n$$, since the coefficients $$\alpha_i$$ range over the integers):

$$\text{span}(\log U) = \big\{ \sum_{i=1}^n \alpha_i \log u_i \,\big\lvert\, u_i \in U, \alpha_i \in \mathbb{Z} \big\}$$

the $$u_i$$ are dimensionally independent in the sense that $$\forall u_i,u_{j\neq i} \in U, \log u_i \perp \log u_{j \neq i}$$, i.e. for each $$j$$:

$$\sum_{i=1}^n \alpha_i \log u_i = \log u_j \iff \alpha_j = 1 \land \alpha_{i \neq j} = 0$$

It follows that all scientific types are derived types relative to $$U$$, and from the uniqueness of this representation we may deduce that every natural signal has a unique factorisation in terms of $$U$$. This result is closely related to the Buckingham Pi theorem, and the International System of Units is a concrete instance of it. The SI base units are defined in terms of the fundamental constants $$\hbar, c, e, k_B, \dots$$ for two reasons. First, these constants are dimensionally independent, as desired. Second, they are constants, so these definitions are not expected to change.
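In code, this exponent-vector view of composite types can be sketched as follows. This is a minimal illustration, not a standard library: the dictionary representation and the `compose` helper are assumptions, with base-unit names taken from the SI.

```python
from collections import Counter

# Sketch: a composite scientific type as a map from base-unit names to
# integer exponents, so multiplying quantities adds exponent vectors --
# i.e. addition in span(log U).
BASE_UNITS = ["m", "kg", "s", "A", "K", "mol", "cd"]  # the SI base units

def compose(*types):
    """Multiply composite types by adding their exponent vectors."""
    total = Counter()
    for t in types:
        total.update(t)          # Counter.update adds counts (exponents)
    return {u: e for u, e in total.items() if e != 0}

velocity = {"m": 1, "s": -1}                 # m / s
acceleration = compose(velocity, {"s": -1})  # m / s^2
force = compose({"kg": 1}, acceleration)     # kg * m / s^2, the newton

assert all(u in BASE_UNITS for u in force)   # every type is derived from U
print(force)  # {'kg': 1, 'm': 1, 's': -2}
```

The unique factorisation of a derived type is exactly the uniqueness of its exponent dictionary.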

Paradoxically, since all the relations in science are defined in terms of the fundamental constants, these constants represent the epistemic limits of all the quantitative sciences. For this reason, high-precision measurements of the fundamental constants not only advance the frontiers of high-precision engineering but also provide fundamental tests of quantum field theory. As Poincaré pointed out, scientists only discover the relations between things and not the things themselves. I am digressing a little, but I shall return to this point later on.

A key reason why modern science has been so successful is that only a small number of observables are dimensionally independent. This insight leads to a different approach to scientific discovery, one equivalent in power to the approach of classifying and analysing systems of differential equations that most scientists are familiar with.

The general problem of scientific induction:

In principle, any physical system may be modelled as a discrete dynamical system:

$$\forall k \in \mathbb{N}, x_{k+1} = T(x_k)$$

where, without loss of generality, $$x_k \in S \subset \mathbb{R}^n$$, $$k$$ is a discrete time index and $$T:S \rightarrow S$$ is the dynamic map. This representation is epistemologically sound, as data collected from dynamical systems always comes in discrete-time samples.

Within this context, we may represent data as evaluations of functions of the state $$x_k$$, known as observables. In fact, if $$g: S \rightarrow \mathbb{R}$$ is an observable associated with the system, then the collection of all such observables forms a vector space, since sums and scalar multiples of observables are again observables.

Now, the Koopman operator $$\Psi$$ is a linear operator on this vector space, given by:

$$\Psi g(x) = g \circ T(x)$$

which in a discrete setting implies:

$$\Psi g(x_k) = g \circ T(x_k) = g(x_{k+1})$$

where the linearity of the Koopman operator follows from the linearity of composition in its first argument:

$$\Psi (g_1 + g_2)(x) = (g_1 + g_2) \circ T(x) = g_1 \circ T(x) + g_2 \circ T(x) = \Psi g_1(x) + \Psi g_2(x)$$

So we may think of the Koopman operator as lifting dynamics of the state space to the space of observables.
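The defining identity $$\Psi g(x_k) = g(x_{k+1})$$ can be checked numerically on a toy system. The logistic map $$T$$ and the observable $$g(x)=x^2$$ below are illustrative assumptions, chosen only to make the lifting concrete:

```python
# A numerical check of the identity Psi g(x_k) = g(x_{k+1}).
def T(x):
    """Dynamic map: the logistic map at r = 4."""
    return 4.0 * x * (1.0 - x)

def g(x):
    """An observable on the state space S = [0, 1]."""
    return x * x

def koopman(h):
    """The Koopman operator: (Psi h)(x) = h(T(x))."""
    return lambda x: h(T(x))

# Generate a short trajectory x_0, ..., x_5.
x, trajectory = 0.2, [0.2]
for _ in range(5):
    x = T(x)
    trajectory.append(x)

# Psi g evaluated at x_k agrees with g evaluated at x_{k+1}.
for k in range(5):
    assert koopman(g)(trajectory[k]) == g(trajectory[k + 1])
```

The point of the lifting is that `koopman` acts linearly on observables even though `T` is nonlinear on states.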

In this setting of Koopman operators, we may formulate the general problem of scientific induction as the challenge of discovering a machine learning model $$M$$ that approximates the action of $$\Psi$$ (for example, via its eigenfunctions), so for a dataset of snapshot pairs $$X=\{(X_i, X_{i+1})\}$$ we are looking for $$M$$ such that:

$$\hat{X}_{i+1} = M(X_i)$$

$$\lVert \hat{X}_{i+1} - X_{i+1} \rVert \approx 0$$
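For the simplest model class, linear maps fitted by least squares on snapshot pairs, this search is the core of Dynamic Mode Decomposition. A minimal sketch, where the rotation matrix used as ground truth is an assumed toy system:

```python
import numpy as np

# Least-squares linear model on snapshot pairs (X_i, X_{i+1}).
theta = 0.1
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # assumed toy dynamics

# Simulate a trajectory x_{k+1} = A x_k.
rng = np.random.default_rng(0)
x = rng.standard_normal(2)
snapshots = [x]
for _ in range(50):
    x = A @ x
    snapshots.append(x)

X  = np.stack(snapshots[:-1], axis=1)   # columns are X_i
Xp = np.stack(snapshots[1:],  axis=1)   # columns are X_{i+1}

# Fit M minimising ||Xp - M X|| in the least-squares sense.
M = Xp @ np.linalg.pinv(X)
print(np.allclose(M, A, atol=1e-8))     # True: the dynamics are recovered
```

For genuinely nonlinear systems one fits the linear model on a dictionary of observables rather than on the raw states; the least-squares step is unchanged.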

In general the system under study is complex and $$M$$ has bounded resources, so the statistical and epistemic uncertainty associated with $$S$$ relative to $$M$$ is best handled by a family of probabilistic models $$P_M$$. So the general problem involves discovering a machine learning model $$\Omega \in P_M$$ such that the conditional entropy satisfies:

$$H(P(X_{i+1} \lvert X_i)) \approx 0$$

where this condition encodes both the statistical and inductive-generalisation aspects of scientific induction.

On the equivalence of incompressibility and incompleteness in the natural sciences:

Given our formulation of scientific induction, the Minimum Description Length principle implies that the Kolmogorov Complexity $$K(\cdot)$$ of our dataset $$X$$ is approximated by the best two-part code:

$$K(X) \approx \min_{P \in P_M} \big( K(X \lvert P) + K(P) \big)$$

so there is an implicit tradeoff between model accuracy and model complexity. Now, let’s suppose $$\exists \Omega \in P_M$$ such that:

$$K(X|\Omega) + K(\Omega) = \min_{P \in P_M}(K(X \lvert P) + K(P))$$

As $$\Omega$$ is a probabilistic program, we may represent the expected Kolmogorov Complexity as:

$$\mathbb{E}[K(X)] = \mathbb{E}[K(X|\Omega)] + \mathbb{E}[K(\Omega)]$$

where the expectation is calculated with respect to the distributions defined by $$\Omega$$ and $$P(X)$$. Now, since the expected Kolmogorov Complexity equals the Shannon entropy (up to an additive constant), we find:

$$\mathbb{E}[K(X)] = H(X|\Omega) + H(\Omega)$$

where the implicit assumption is that, in the regime of repeatable scientific experiments, ergodic assumptions are satisfied.
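The accuracy–complexity tradeoff behind this two-part code can be illustrated numerically. In the sketch below, the polynomial model family, the 16-bit cost per parameter, the synthetic quadratic data, and the Gaussian residual code are all assumptions made for the illustration:

```python
import numpy as np

# Two-part code: total bits = K(X|P) (residual cost) + K(P) (parameter cost).
rng = np.random.default_rng(1)
t = np.linspace(-1.0, 1.0, 200)
X = 1.0 - 2.0 * t + 3.0 * t**2 + 0.05 * rng.standard_normal(t.size)

def description_length(degree, bits_per_param=16):
    coeffs = np.polyfit(t, X, degree)
    residual = X - np.polyval(coeffs, t)
    # K(X|P): Gaussian codelength for the residuals (a differential-entropy
    # surrogate, so it may be negative; only differences between models matter).
    data_bits = 0.5 * t.size * np.log2(2 * np.pi * np.e * residual.var() + 1e-12)
    # K(P): a crude cost for encoding the polynomial's coefficients.
    model_bits = bits_per_param * (degree + 1)
    return data_bits + model_bits

best = min(range(8), key=description_length)
print(best)  # 2: the quadratic wins the accuracy-complexity tradeoff
```

Degrees above two keep shrinking the residual term, but by less than the 16 bits each extra coefficient costs, so the two-part code selects the true model order.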

From the entropy decomposition $$\mathbb{E}[K(X)] = H(X|\Omega) + H(\Omega)$$ we may deduce the following lower bound:

$$\mathbb{E}[K(X)] \geq H(\Omega)$$

This lower bound addresses Planck’s claim, as it shows that science is not observer-independent. In particular, the structure underlying the fundamental axioms embedded in $$\Omega$$ is beyond the knowledge of $$\Omega$$. To be precise, the epistemic uncertainty of $$\Omega$$ concerning itself is exactly the Minimum Description Length of $$\Omega$$ relative to a universal computer. As any computational process is ultimately a mathematical description of a physical process, we may identify such a universal computer with the universe itself. Such an identification is consistent with the implications of the Physical Church-Turing thesis.

Finally, since $$\Omega$$ implicitly contains a theory of computation, and all such theories require arithmetic, an optimal encoding of arithmetic is beyond the knowledge of $$\Omega$$, independently of the computational resources available to $$\Omega$$.

Discussion:

I will note that we may derive an upper bound on the expected Kolmogorov Complexity by a direct application of the Shannon source-coding theorem, assuming that the dataset $$X$$ consists of $$N$$ samples at each time step:

$$\mathbb{E}[K(X)] \leq N \cdot H(X)$$
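This bound can be checked empirically for an i.i.d. source: coding each sample with its ideal codelength $$-\log_2 p(x)$$ uses about $$N \cdot H(X)$$ bits in total. The four-symbol alphabet and its distribution below are assumptions; because the distribution is dyadic, a Huffman code attains these codelengths exactly.

```python
import math
import random

# Ideal codelengths -log2 p(x) for an i.i.d. source: the total is about
# N * H(X) bits, which upper-bounds K(X) up to the constant cost of
# describing the code itself.
random.seed(0)
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
H = -sum(p * math.log2(p) for p in probs.values())   # 1.75 bits/symbol

N = 100_000
sample = random.choices(list(probs), weights=list(probs.values()), k=N)
total_bits = sum(-math.log2(probs[s]) for s in sample)

print(H)               # 1.75
print(total_bits / N)  # close to 1.75 for large N
```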

However, the reader might have a bigger question: why might the prime numbers be beyond the epistemic limits of scientific knowledge?

My intuition for why the primes are incompressible rests upon two reasons that are very similar to the reasons why the fundamental constants represent the epistemic limits of the totality of science. First, they are dimensionally independent. By this I mean that we may define the set (a lattice of countably infinite rank, since the $$\alpha_i$$ range over the integers):

$$\text{span}(\log \mathbb{P}) = \Big\{\sum_{i=1}^\infty \alpha_i \log p_i \lvert p_i \in \mathbb{P}, \alpha_i \in \mathbb{Z}\Big\}$$

where $$\log \mathbb{Q}_+ \subset \text{span}(\log \mathbb{P})$$, and $$\forall p_j, p_{i \neq j} \in \mathbb{P}, \log p_{i \neq j} \perp \log p_j$$ since:

$$\sum_{i=1}^\infty \alpha_i \log p_i = \log p_j \iff \alpha_j = 1 \land \alpha_{i \neq j} = 0$$

which implies that prime factorisations are unique.
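This unique factorisation, and the fact that multiplying rationals adds their exponent vectors over the primes, can be checked mechanically. The helper functions below are assumptions made for the sketch:

```python
from fractions import Fraction

def prime_exponents(n):
    """Factor a positive integer into a {prime: exponent} map by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def log_vector(q):
    """Integer exponent vector of a positive rational q: its image in span(log P)."""
    vec = prime_exponents(q.numerator)
    for p, e in prime_exponents(q.denominator).items():
        vec[p] = vec.get(p, 0) - e
    return vec

# Unique factorisation: 12 = 2^2 * 3, and 9/10 = 2^-1 * 3^2 * 5^-1.
assert log_vector(Fraction(12)) == {2: 2, 3: 1}
assert log_vector(Fraction(9, 10)) == {2: -1, 3: 2, 5: -1}

# Multiplication of rationals is addition of exponent vectors.
u, v = log_vector(Fraction(12)), log_vector(Fraction(9, 10))
total = {p: u.get(p, 0) + v.get(p, 0) for p in set(u) | set(v)}
assert log_vector(Fraction(12) * Fraction(9, 10)) == total
```

This is the precise sense in which the positive rationals embed in $$\text{span}(\log \mathbb{P})$$: each maps to a unique integer vector over the primes.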

Second, any mathematical system that is sufficiently powerful to formulate a general theory of computation must contain arithmetic. Since all data types are derived from the integers, which have a unique representation in terms of the primes, it follows that all the other types in such a system are derived types relative to the prime numbers.

References:

1. Bernard Koopman. Hamiltonian Systems and Transformations in Hilbert Space. 1931.
2. Steven L. Brunton. Notes on Koopman Operator Theory. 2019.
3. Peter D. Grünwald. The Minimum Description Length Principle. MIT Press. 2007.
4. M. Li and P. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Graduate Texts in Computer Science. Springer. 1997.
5. B. Fine and G. Rosenberger. Number Theory: An Introduction via the Distribution of Primes. 2007.
6. B. Jack Copeland. “The Church-Turing Thesis”. The Stanford Encyclopedia of Philosophy (Summer 2020 Edition), Edward N. Zalta (ed.). URL: https://plato.stanford.edu/archives/sum2020/entries/church-turing/.
7. Henri Poincaré. Science et Hypothèse. Champs sciences. 2014.
8. Bertrand Russell. Human Knowledge: Its Scope and Limits. 2009.