## Compressibility implies learnability:

Let’s suppose we have a dynamical system:

$$\forall x_n \in X,\quad x_{n+1} = f(x_n)$$

from which we may collect samples:

$$X_N = \{x_n\}_{n=1}^N$$
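As a concrete sketch, the sampling procedure above can be simulated by iterating a fixed map. The logistic map used here is a hypothetical choice of $$f$$ for illustration only; any state-transition map would do.

```python
# Sketch: collecting the data-stream X_N from a dynamical system
# x_{n+1} = f(x_n). The logistic map is a stand-in for f.

def f(x, r=3.5):
    """One step of a hypothetical state-transition map (logistic family)."""
    return r * x * (1.0 - x)

def collect_samples(x1, N):
    """Return the data-stream X_N = {x_1, ..., x_N} by iterating f."""
    samples = [x1]
    for _ in range(N - 1):
        samples.append(f(samples[-1]))
    return samples

X_N = collect_samples(0.2, 100)
```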

We say that the data-stream $$X_N$$ is compressible if its algorithmic information content is bounded:

$$\exists k, C \in \mathbb{N}\ \forall N \in \mathbb{N},\quad K(X_N) \leq k \cdot \log(C)$$

where $$\log(C)$$ denotes the average number of bits required to encode a state in the state-space $$X$$, and $$k$$ is the maximum number of samples required by the optimal learning algorithm $$\mathcal{A}$$ to produce a predictive model $$M$$ that can reconstruct $$X_N$$ with negligible error. It follows that the sample-complexity of $$\mathcal{A}$$ is bounded by $$k$$, and hence that the state-transition map $$f$$ is PAC-learnable.
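A minimal sketch of a learner $$\mathcal{A}$$ with bounded sample-complexity, under the strong assumption that $$f$$ belongs to a known one-parameter family (here the hypothetical logistic family $$r \cdot x(1-x)$$, with $$r$$ chosen in the non-chaotic regime): a single consecutive pair of samples suffices to recover the parameter, so $$k = 2$$ independently of $$N$$.

```python
# Sketch: a learner A whose sample complexity is k = 2, assuming f is known
# to lie in the one-parameter logistic family r*x*(1-x). Illustrative only.

def f(x, r=2.9):
    """The true (hidden) state-transition map; r = 2.9 is non-chaotic."""
    return r * x * (1.0 - x)

def learn_r(samples):
    """Estimate r from one consecutive pair: x_{n+1} = r * x_n * (1 - x_n)."""
    x0, x1 = samples[0], samples[1]
    return x1 / (x0 * (1.0 - x0))

def reconstruct(x1, r_hat, N):
    """The predictive model M: replay the learned map to rebuild X_N."""
    out = [x1]
    for _ in range(N - 1):
        out.append(r_hat * out[-1] * (1.0 - out[-1]))
    return out

# Collect a stream of N = 100 samples, learn from only the first k = 2,
# then reconstruct the whole stream with negligible error.
stream = [0.2]
for _ in range(99):
    stream.append(f(stream[-1]))
r_hat = learn_r(stream[:2])
rebuilt = reconstruct(stream[0], r_hat, len(stream))
err = max(abs(a - b) for a, b in zip(stream, rebuilt))
```

Note that in the chaotic regime of the logistic family this reconstruction would degrade, since rounding error in $$\hat{r}$$ is amplified exponentially; the non-chaotic choice keeps the reconstruction error negligible, matching the definition of $$M$$ above.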

Moreover, if we define the expectation:

$$\mathbb{E}[K(X_N)] = \lim_{n \to \infty} \frac{1}{n} \sum_{N=1}^n K(X_N)$$

then we may express the compressibility bound above succinctly by the asymptotic compressibility relation:

$$\mathbb{E}[K(X_N)] \leq k \cdot \log(C)$$

which may be understood to mean that the average size of a PAC-learnable machine learning model $$M$$ that approximates $$f$$ is not greater than $$k \cdot \log(C)$$.
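Since $$K$$ is uncomputable, a practical way to witness this bound is to use an off-the-shelf compressor as a crude upper-bound proxy for $$K$$. The sketch below (an assumption of this illustration, not part of the argument) compresses a quantized, eventually-periodic stream with `zlib`: the compressed size is small and essentially independent of $$N$$, as the relation predicts for compressible streams.

```python
import zlib

def approx_K(data: bytes) -> int:
    """Compressed size in bytes: a computable upper bound standing in for K."""
    return len(zlib.compress(data, 9))

# A hypothetical period-2 orbit, quantized to one byte per state.
short_stream = bytes([17, 201] * 50)    # N = 100 states
long_stream  = bytes([17, 201] * 500)   # N = 1000 states

k_short = approx_K(short_stream)
k_long = approx_K(long_stream)
# The proxy for K(X_N) barely grows as N increases tenfold.
```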

Note that the asymptotic compressibility relation implicitly assumes that $$f$$ simulates an ergodic process; indeed, $$f$$ would not be learnable otherwise.

## State-transition maps that are not learnable are not compressible:

On the other hand, suppose that $$X_N$$ is Schnorr-random, so that no martingale succeeds on it and the sample-complexity of $$\mathcal{A}$$ scales with $$N$$:

$$K(X_N) \sim N \quad \text{as } N \to \infty$$

hence the sample-complexity is unbounded and therefore $$f$$ is not learnable.

From this growth rate we may deduce the asymptotic incompressibility relation:

$$\mathbb{E}[K(X_N)] \sim N$$

so we may conclude that $$X_N$$ is generally incompressible.
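The same compression proxy makes this regime visible. A seeded PRNG stream stands in for a Schnorr-random $$X_N$$ here (illustrative only: a seeded PRNG is of course compressible in principle, but a generic compressor like `zlib` cannot exploit that structure), and the compressed size stays at least as large as $$N$$.

```python
import random
import zlib

random.seed(0)
N = 4096
# A pseudorandom byte stream standing in for an incompressible X_N.
stream = bytes(random.randrange(256) for _ in range(N))

compressed = zlib.compress(stream, 9)
# No shorter description is found: the output is not smaller than the input.
```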

## References:

1. Shai Shalev-Shwartz and Shai Ben-David. *Understanding Machine Learning: From Theory to Algorithms*. Cambridge University Press, 2014.

2. Sham Kakade. *On the Sample Complexity of Reinforcement Learning*. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.

3. Leslie Valiant. *Probably Approximately Correct: Nature's Algorithms for Learning and Prospering in a Complex World*. Basic Books, 2013.

4. Peter D. Grünwald. *The Minimum Description Length Principle*. MIT Press, 2007.

5. C. P. Schnorr. "A unified approach to the definition of a random sequence". *Mathematical Systems Theory*, 1971.

6. A. N. Kolmogorov. "Three approaches to the quantitative definition of information". *Problems of Information Transmission*, 1(1):1–7, 1965.