Motivation:

In this brief article, I show how Shannon’s original coding theorem, whose focus was fault-tolerant communication, may be extended to theories of computation in order to formulate an average-case Occam’s razor.

Prefix-free Turing Machines:

Let’s suppose that the source and program alphabets are binary, \(\Sigma_1 = \Sigma_2 = \{0,1\}\), and that \(X\) is a Bernoulli random variable:

\begin{equation} P(X=1) = p \end{equation}

\begin{equation} P(X=0) = 1-p \end{equation}

and, without loss of generality, let’s suppose that strings in \(\Sigma_1^*\) and \(\Sigma_2^*\) encode binary representations of computer programs, with the restriction that the set of valid programs forms a prefix-free subset of \(\Sigma_2^*\).
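
To make the prefix-free restriction concrete, here is a minimal Python sketch over a small, hypothetical set of codewords (the set \(\{0, 10, 110, 111\}\) is purely illustrative and not part of the construction above):

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a proper prefix of another."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

# A hypothetical prefix-free set over {0,1}: no codeword begins another.
print(is_prefix_free({"0", "10", "110", "111"}))   # True
# Adding "1" breaks the property, since "1" is a prefix of "10".
print(is_prefix_free({"0", "1", "10", "110"}))     # False
```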

Prefix-free Kolmogorov Complexity:

In this setting, we may define prefix-free Kolmogorov Complexity:

\begin{equation} K: \Sigma_1^* \rightarrow \mathbb{N} \end{equation}

\begin{equation} \forall x_{1:k} \in \Sigma_1^k, \quad K(x_{1:k}) = \min_{p \in \Sigma_2^*} \{l(p): U(p)= x_{1:k}\} \end{equation}

where \(U\) is understood to be a prefix-free Universal Turing Machine.
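
Although \(K\) itself is uncomputable, any concrete description method upper-bounds it. As a rough, hedged illustration (zlib is merely a computable stand-in compressor, not the universal machine \(U\) above), a general-purpose compressor already separates a highly regular string from a typical random one:

```python
import random
import zlib

def compressed_bits(s: str) -> int:
    """Bit-length of the zlib encoding of s -- a crude, computable
    upper bound on its information content, not K(s) itself."""
    return 8 * len(zlib.compress(s.encode("ascii")))

N = 10_000
regular = "0" * N                                             # has a very short description
random_str = "".join(random.choice("01") for _ in range(N))   # almost surely incompressible

print(compressed_bits(regular))     # small relative to N
print(compressed_bits(random_str))  # on the order of N bits
```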

Given that the set of shortest programs is prefix-free, it constitutes a uniquely decodable code and therefore the lengths \(K(\cdot)\) satisfy the Kraft inequality:

\begin{equation} \sum_{x \in \Sigma_1^*} 2^{-K(x)} \leq 1 \end{equation}
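
For a small, hypothetical prefix-free code (the same illustrative set as above), the Kraft sum can be checked directly:

```python
# Kraft sum for a hypothetical prefix-free code over {0,1}.
codewords = ["0", "10", "110", "111"]
kraft_sum = sum(2 ** -len(w) for w in codewords)
print(kraft_sum)  # 0.5 + 0.25 + 0.125 + 0.125 = 1.0 <= 1
```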

The algorithmic complexity of randomly generated strings:

Given that almost all strings are incompressible, in order to determine the expected algorithmic information content of \(X^N\) it is necessary and sufficient for us to count its typical outcomes in \(\Sigma_1^N\) when \(N\) is large. For large \(N\), this number of typical strings is well-approximated by:

\begin{equation} \frac{N!}{(p \cdot N)! ((1-p) \cdot N)!} \sim \frac{N^N}{(p \cdot N)^{p \cdot N} ((1-p) \cdot N)^{(1-p) \cdot N}} = 2^{N \cdot H(X)} \end{equation}

where \(H(X)\) is the Shannon entropy of our Bernoulli variable \(X\):

\begin{equation} H(X) = -p \cdot \log_2 p - (1-p)\log_2 (1-p) \end{equation}
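
The approximation above is easy to sanity-check numerically; the following sketch compares \(\log_2\) of the exact binomial coefficient with \(N \cdot H(X)\) for an arbitrary choice of \(p\) and \(N\) (the specific values are illustrative only):

```python
from math import comb, log2

def binary_entropy(p: float) -> float:
    """Shannon entropy H(X) of a Bernoulli(p) variable, in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

p, N = 0.3, 1000          # illustrative values only
k = round(p * N)          # number of ones in a typical string

exact = log2(comb(N, k))          # log2 of the exact binomial coefficient
approx = N * binary_entropy(p)    # the 2^(N*H(X)) approximation

print(exact, approx)  # the two agree up to lower-order O(log N) terms
```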

Since almost all of the probability mass of \(X^N\) concentrates on these \(\approx 2^{N \cdot H(X)}\) typical strings, each of which can be specified by an index of about \(N \cdot H(X)\) bits, and since almost all of them admit no substantially shorter description, it follows that:

\begin{equation} \mathbb{E}[K(X^N)] \approx N \cdot H(X) \end{equation}
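
As a rough empirical check of this average-case statement (again using zlib purely as a computable stand-in that only upper-bounds description length, so the match is approximate), the mean compressed length of Bernoulli(\(p\)) strings should land in the vicinity of \(N \cdot H(X)\), above it by the compressor’s overhead:

```python
import random
import zlib
from math import log2

def binary_entropy(p: float) -> float:
    return -p * log2(p) - (1 - p) * log2(1 - p)

p, N, trials = 0.1, 50_000, 20
total_bits = 0
for _ in range(trials):
    # Sample x_{1:N} with P(X=1) = p and pack 8 bits per byte before compressing.
    bits = [1 if random.random() < p else 0 for _ in range(N)]
    packed = bytes(
        sum(b << i for i, b in enumerate(bits[j:j + 8]))
        for j in range(0, N, 8)
    )
    total_bits += 8 * len(zlib.compress(packed, 9))

print(total_bits / trials)    # average compressed length in bits (an upper bound)
print(N * binary_entropy(p))  # N * H(X), roughly 23,450 bits for these values
```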

References:

  1. A. N. Kolmogorov, “Three approaches to the quantitative definition of information”, Problems of Information Transmission, vol. 1, no. 1, pp. 1–7, 1965.

  2. C. E. Shannon, “A Mathematical Theory of Communication”, Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July and October 1948.

  3. D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press, 2003.