The usual touchstone, whether that which someone asserts is merely his persuasion — or at least his subjective conviction, that is, his firm belief — is betting. It often happens that someone propounds his views with such positive and uncompromising assurance that he seems to have entirely set aside all thought of possible error. A bet disconcerts him. Sometimes it turns out that he has a conviction which can be estimated at a value of one ducat, but not of ten. For he is very willing to venture one ducat, but when it is a question of ten he becomes aware, as he had not previously been, that it may very well be that he is in error.-Immanuel Kant


Although there are three equivalent paths to algorithmic randomness, these are not all equally intuitive as this depends very much on the particular problem being considered and the particular audience that is being addressed. In particular, I would argue that Schnorr’s martingale characterisation is the most intuitively useful to a scientist with an understanding of machine learning as it may be formulated as a game of chance.

The objective of this article is to develop this intuition into a rigorous formulation using precise notions from the theory of machine learning.

Asymptotic incompressibility and effective betting strategies:

Given a binary sequence \(X_N = \{x_i\}_{i=1}^N\), we say that \(X_N\) is asymptotically incompressible if given the subsequence \(X_k\) and \(N >> k\) on average we would not profit by gambling on the \(N-k\) terms in \(X_N\) based on the partial knowledge provided by \(X_k\). If \(X_N\) satisfies these assumptions then we may state that for large \(N\):

\begin{equation} \mathbb{E}[K(X_N)] \sim N \end{equation}

Moreover, if we may assume that there exists a deterministic function \(f\) such that:

\begin{equation} x_{n+1} = f \circ x_n \end{equation}

an effective betting strategy is reducible to approximating \(f\) using a machine learning algorithm provided with the dataset \(X_k\). The non-existence of such a strategy may be precisely formulated via an invariance theorem for algorithmically random data.

An invariance theorem for algorithmically random data:

In consequence, (1) implies that for a particular sample \(X_N \sim X\) it may be possible to find \(f \in F_{\theta}\) such that \(\forall n \in [1,2N-1], f(x_n) = x_{n+1}\) on the condition that the Kolmogorov Complexity:

\begin{equation} K(f) \sim 2N \end{equation}

as this would amount to memorisation and not discovering regularities in \(X_N\) since \(X\) is algorithmically random relative to \(F_{\theta}\). Moreover, given that

\begin{equation} f(x_n)=x_{n+1} \implies \delta_{f(x_n),x_{n+1}} = 1 \end{equation}

for any candidate solution \(\hat{f} \in F_{\theta}\) and for large \(N\), the expected performance on the test set is approximated by:

\begin{equation} \frac{1}{N-1} \sum_{n=N}^{2N-1} \delta_{\hat{f}(x_n),x_{n+1}} \leq \frac{1}{2} \end{equation}

Furthermore, it may be shown that the expected performance is invariant to transformations \(T\) that don’t change the phase-space dimension of the processes responsible for generating the signal \(X_N\):

\begin{equation} X’_N = T \circ X_N \end{equation}

Thus we have an invariance theorem for algorithmically random data which may be used to perform empirical tests using machine learning to determine whether a dataset was sampled from an algorithmically random process.


In the event that the binary sequence \(X_N\) doesn’t have parity in terms of the number of ones and zeroes, we may use the true-positive rate which generalises (5) in the setting of imbalanced data.


  1. Aidan Rocke (, An invariance theorem for algorithmically random data in statistical learning, URL (version: 2021-02-22):
  2. Peter Grünwald and Paul Vitányi. Shannon Information and Kolmogorov Complexity. 2010.
  3. Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie. The Elements of Statistical Learning. Springer. 2001.
  4. Andrew Chi-Chih Yao. Theory and applications of trapdoor functions. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, 1982.