In this treatise we develop the thesis of J.L. Kelly and T.M. Cover that the financial value of side information equals the mutual information between the outcome of a wager and the side information. From this principle, we may deduce that a market is efficient to the degree that it reflects, i.e. compresses, all available information. It follows that the intelligent investor must operate within a theoretically sound framework, such as the Kolmogorov Structure Function, for data-driven encoding of probability distributions.

Let’s assume that \(m\) horses compete in a race, and that the \(i\)-th horse wins with probability \(p_i\). A one-dollar bet on horse \(i\) pays \(o_i\) dollars if horse \(i\) wins, and \(0\) dollars otherwise.

There are two ways of describing odds:

\(\alpha\) for 1: The gambler puts down one dollar before the race and receives \(\alpha\) dollars if his horse wins, and nothing otherwise.

\(\beta\) to 1: The gambler pays one dollar after the race if his horse loses, and will pick up \(\beta\) dollars after the race if his horse wins.

It follows that a bet at \(\beta\) to 1 odds is equivalent to a bet at \(\alpha\) for 1 odds if \(\beta=\alpha -1\).

We shall assume that the gambler distributes all of his wealth across the horses:

Let \(b_i\) be the fraction of the gambler’s wealth invested in horse \(i\), where \(b_i \geq 0\) and \(\sum_i b_i = 1\).

If horse \(i\) wins the race, the gambler will receive \(o_i\) times the amount of wealth bet on horse \(i\).

Thus, at the end of the race the payoff factor for horse \(i\) is \(b_i \cdot o_i\) which occurs with probability \(p_i\).

The wealth at the end of the race is a random variable, and the gambler must decide which functional of this random variable to maximise. This leads to a natural question concerning the optimal investment strategy.

Should the intelligent investor aim to maximise the expected return or maximise the expected growth rate of their investment?

Let’s consider repeated gambles on the same horse race. As the gambler can reinvest his money, his wealth is the product \(S_n\) of the gains of each race:

\[\begin{equation} S_n = \prod_{i=1}^n S(X_i) \tag{1} \end{equation}\]

where \(S(X)=b(X)o(X)\), the wealth relative, is the factor by which the gambler’s wealth grows if horse \(X\) wins.
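As a toy numerical illustration (the probabilities, odds and bet fractions below are made up), \(S_n\) can be simulated directly as a product of wealth relatives:

```python
import random

# Hypothetical 3-horse race: win probabilities p, o-for-1 odds o,
# and bet fractions b with sum(b) = 1 (here, proportional gambling b = p).
p = [0.5, 0.3, 0.2]
o = [2.0, 4.0, 8.0]
b = [0.5, 0.3, 0.2]

def run_races(n, seed=0):
    """Return S_n = prod_i S(X_i), where S(X) = b(X) * o(X)."""
    rng = random.Random(seed)
    s = 1.0
    for _ in range(n):
        x = rng.choices(range(3), weights=p)[0]  # index of the winning horse
        s *= b[x] * o[x]
    return s

print(run_races(100))
```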

The doubling rate of a horse race is given by:

\[\begin{equation} W(\vec{b},\vec{p}) = \mathbb{E}[\log S(X)] = \sum_{k=1}^m p_k \log b_k o_k \tag{2} \end{equation}\]

where this definition is justified by the Doubling rate theorem.

Let the race outcomes \(X_1, X_2, \dots, X_n\) be i.i.d. with distribution \(\vec{p}\). Then the wealth of the gambler using betting strategy \(\vec{b}\) grows exponentially at the rate \(W(\vec{b},\vec{p})\):

\[\begin{equation} S_n \doteq 2^{n W(\vec{b},\vec{p})} \tag{3} \end{equation}\]

where \(\doteq\) denotes equality to first order in the exponent.

\(\log S(X_1), \log S(X_2), \dots, \log S(X_n)\) are i.i.d. random variables, so by the Weak Law of Large Numbers,

\[\begin{equation} \frac{1}{n}\log S_n \rightarrow \mathbb{E}[\log S(X)] \tag{4} \end{equation}\]

and therefore,

\[\begin{equation} S_n \doteq 2^{nW(\vec{b},\vec{p})} \tag{5} \end{equation}\]

and since the gambler’s wealth grows as \(2^{nW(\vec{b},\vec{p})}\), we aim to maximize the exponent \(W(\vec{b},\vec{p})\) over all choices of portfolio \(\vec{b}\).
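We can check this numerically: by the law of large numbers, the empirical exponent \(\frac{1}{n}\log_2 S_n\) should approach \(W(\vec{b},\vec{p})\). A minimal sketch, with hypothetical probabilities and odds:

```python
import math
import random

p = [0.5, 0.3, 0.2]  # hypothetical win probabilities
o = [2.0, 4.0, 8.0]  # o-for-1 odds
b = [0.5, 0.3, 0.2]  # bet fractions (here b = p)

# Doubling rate W(b, p) = sum_k p_k log2(b_k o_k), as in eq. (2)
W = sum(pk * math.log2(bk * ok) for pk, bk, ok in zip(p, b, o))

# Empirical growth exponent (1/n) log2(S_n) over n i.i.d. races
rng = random.Random(0)
n = 200_000
winners = rng.choices(range(3), weights=p, k=n)
empirical = sum(math.log2(b[x] * o[x]) for x in winners) / n

print(W, empirical)  # the two exponents should be close
```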

The maximal doubling rate over all choices of portfolio \(\vec{b}\) is given by:

\[\begin{equation} W^*(\vec{p}) = \max_{\vec{b}}W(\vec{b},\vec{p})=\max_{\vec{b}}\sum_{i=1}^m p_i \log b_i o_i \tag{6} \end{equation}\]

Using the method of Lagrange Multipliers, we may maximise \(W(\vec{b},\vec{p})\) as a function of \(\vec{b}\) subject to the constraint \(\sum_{i} b_i =1\):

\[\begin{equation} J(\vec{b})=\sum_i p_i \log b_i o_i + \lambda \big(\sum_i b_i - 1\big) \tag{7} \end{equation}\]

Differentiating with respect to \(b_i\) yields,

\[\begin{equation} \forall i, \frac{\partial J}{\partial b_i} = \frac{p_i}{b_i} + \lambda \tag{8} \end{equation}\]

and we find:

\[\begin{equation} \forall i, \frac{\partial J}{\partial b_i} = 0 \implies b_i = -\frac{p_i}{\lambda} \tag{9} \end{equation}\]

which implies:

\[\begin{equation} \sum_i b_i = 1 \implies \lambda = -1 \;\text{and}\; b_i = p_i \tag{10} \end{equation}\]

and therefore \(\vec{b}=\vec{p}\) is a stationary point of \(J(\vec{b})\).

Furthermore, we may demonstrate that \(\vec{b}=\vec{p}\) actually maximises \(W(\vec{b},\vec{p})\), so we have:

\[\begin{equation} W^*(\vec{p}) = W(\vec{p},\vec{p}) \tag{11} \end{equation}\]

The optimal doubling rate is given by:

\[\begin{equation} W^*(\vec{p}) = \sum_i p_i \log o_i - H(\vec{p}) \tag{12} \end{equation}\]

and is achieved by the proportional gambling scheme \(\vec{b}^* = \vec{p}\).

Using the KL-Divergence, we find:

\[\begin{equation} W(\vec{b},\vec{p}) = \sum_i p_i \log b_i o_i = \sum_i p_i \log \big(\frac{b_i}{p_i} p_i o_i\big) = \sum_i p_i \log o_i - H(\vec{p}) - D(\vec{p}\,\Vert\,\vec{b}) \leq \sum_i p_i \log o_i - H(\vec{p}) \tag{13} \end{equation}\]

with equality if and only if \(\vec{b} = \vec{p}\).
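The decomposition in (13) is easy to verify numerically. The sketch below (with made-up probabilities and uniform 3-for-1 odds) checks that \(W(\vec{b},\vec{p})\) equals the upper bound minus the KL divergence for random portfolios, with equality at \(\vec{b}=\vec{p}\):

```python
import math
import random

p = [0.5, 0.3, 0.2]  # hypothetical win probabilities
o = [3.0, 3.0, 3.0]  # uniform 3-for-1 odds

def W(b):
    """Doubling rate sum_i p_i log2(b_i o_i)."""
    return sum(pi * math.log2(bi * oi) for pi, bi, oi in zip(p, b, o))

def kl(p, b):
    """KL divergence D(p || b) in bits."""
    return sum(pi * math.log2(pi / bi) for pi, bi in zip(p, b))

H = -sum(pi * math.log2(pi) for pi in p)
bound = sum(pi * math.log2(oi) for pi, oi in zip(p, o)) - H

rng = random.Random(1)
for _ in range(5):
    raw = [rng.random() for _ in range(3)]
    b = [r / sum(raw) for r in raw]               # random portfolio on the simplex
    assert abs(W(b) - (bound - kl(p, b))) < 1e-9  # decomposition (13)
    assert W(b) <= bound + 1e-9                   # the bound itself

print(W(p), bound)  # equal: proportional gambling attains the bound
```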

Let’s suppose the gambler has some information that is relevant to the outcome of the gamble. What is the value of this side information? Based on our prior analysis, the measure of the value of information is the increase in the doubling rate due to that information.

In fact, we may derive a connection between the increase in the doubling rate and the mutual information.

To make this connection precise, we may define:

Horse \(X \in \{1, \dots, m\}\) wins the race with probability \(p(x)\) and pays odds of \(o(x)\)-for-1.

Let \((X,Y)\) have joint probability mass function \(p(x,y)\).

Let \(b(x\lvert y) \geq 0\), \(\sum_x b(x \lvert y) = 1\) be an arbitrary conditional betting strategy depending on the side information \(Y\), where \(b(x \lvert y)\) is the proportion of wealth bet on horse \(x\) when \(y\) is observed.

Finally, let \(b(x) \geq 0, \sum_x b(x) = 1\) denote the unconditional betting scheme. It follows that the unconditional and conditional doubling rates are:

\[\begin{equation} W^*(X) = \max_{b(x)} \sum_x p(x) \log b(x) o(x) \tag{14} \end{equation}\]

\[\begin{equation} W^*(X \lvert Y) = \max_{b(x\lvert y)} \sum_{x,y} p(x,y) \log b(x \lvert y) o(x) \tag{15} \end{equation}\]

and let:

\[\begin{equation} \Delta W = W^*(X \lvert Y) - W^*(X) \tag{16} \end{equation}\]

then we observe that for \((X_i,Y_i)_{i=1}^n\) i.i.d. horse races, wealth grows as \(2^{n W^*(X \lvert Y)}\) with side information and as \(2^{n W^*(X)}\) without it.

In this sense, a market is efficient to the degree that its odds already compress, i.e. reflect, all available information.

The increase \(\Delta W\) in doubling rate due to side information \(Y\) for a horse race \(X\) is given by:

\[\begin{equation} \Delta W = I(X;Y) \tag{17} \end{equation}\]

The maximum \(W^*(X \lvert Y)\) with side information \(Y\) is achieved by conditionally proportional gambling, i.e. \(b^*(x\lvert y) = p(x \lvert y)\). Therefore, \(W^*(X \lvert Y) = \max_{b(x\lvert y)} \mathbb{E}[\log S] = \max_{b(x\lvert y)} \sum_{x,y} p(x,y) \log b(x \lvert y) o(x)\), which simplifies to:

\[\begin{equation} W^*(X \lvert Y) = \sum_{x,y} p(x,y) \log p(x \lvert y) o(x) = \sum_{x,y} p(x,y) \log o(x) - H(X|Y) \tag{18} \end{equation}\]

On the other hand, without side information the optimal doubling rate is:

\[\begin{equation} W^*(X) = \sum_x p(x) \log o(x) - H(X) \tag{19} \end{equation}\]

so the increase in doubling rate due to the presence of side information \(Y\) is given by:

\[\begin{equation} \Delta W = W^*(X|Y) - W^*(X) = H(X) - H(X|Y) = I(X;Y) \tag{20} \end{equation}\]

Hence, the increase in doubling rate equals the mutual information between the side information and the horse race.
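For a concrete check of \(\Delta W = I(X;Y)\), consider a hypothetical two-horse race with fair 2-for-1 odds and a binary signal \(Y\) correlated with the winner:

```python
import math

# Hypothetical joint pmf p(x, y): horse X in {0, 1}, signal Y in {0, 1}
pxy = {(0, 0): 0.4, (0, 1): 0.1,
       (1, 0): 0.1, (1, 1): 0.4}
o = {0: 2.0, 1: 2.0}  # fair 2-for-1 odds

px = {x: pxy[(x, 0)] + pxy[(x, 1)] for x in (0, 1)}  # marginal of X
py = {y: pxy[(0, y)] + pxy[(1, y)] for y in (0, 1)}  # marginal of Y

# W*(X) = sum_x p(x) log2 o(x) - H(X), as in eq. (19)
H_X = -sum(v * math.log2(v) for v in px.values())
W_star = sum(px[x] * math.log2(o[x]) for x in px) - H_X

# W*(X|Y) = sum_{x,y} p(x,y) log2 o(x) - H(X|Y), as in eq. (18)
H_XgY = -sum(v * math.log2(v / py[y]) for (x, y), v in pxy.items())
W_star_cond = sum(v * math.log2(o[x]) for (x, y), v in pxy.items()) - H_XgY

delta_W = W_star_cond - W_star
I = H_X - H_XgY  # mutual information I(X;Y) = H(X) - H(X|Y)
print(delta_W, I)  # identical, per eq. (20)
```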

Given the natural correspondence between machine learning and information theory, it is possible to build upon the theory we have introduced and develop a theoretically sound engineering framework for portfolio optimisation. In fact, we may leverage the Kolmogorov Structure Function \(\Phi\) for modelling stochastic signals sampled from a financial market.

Let’s suppose that \(x_{1:n} \sim f(X)\), where \(X\) is a finite dataset defined using the Asymptotic Equipartition Theorem and \(f\) is a probability distribution such that \(\sum_x f(x) = 1\). Then Levin’s Coding Theorem tells us that there is a natural relationship between the algorithmic probability \(P(X)\) and the Kolmogorov Complexity \(K_U(X)\):

\[\begin{equation} -\log_2 P(X) = K_U(X) + \mathcal{O}(1) \tag{21} \end{equation}\]

so optimal inference is a natural consequence of lossless data compression.

Furthermore, if we use the fact that the Expected Kolmogorov Complexity equals the Shannon Entropy (up to a constant), then we may define \(\Phi\), which quantifies the information in \(x_{1:n} \sim f(X)\) as a superposition of algorithmic and ergodic components:

\[\begin{equation} \Phi(x_{1:n}) := \mathbb{E}[-\log_2 P(x_{1:n}|X)P(X)] = H(x_{1:n}) + K_U(X) + \mathcal{O}(1) \tag{22} \end{equation}\]

Here, \(K_U(X)\) is the description length of the intrinsic manifold the data is drawn from, which describes the algorithmic structure of the stochastic signal, whereas \(H(x_{1:n}) \approx n \cdot H(x)\) determines the magnitude of the dataset \(X\) from which \(x_{1:n}\) is sampled.

Besides allowing us to model stochastic processes using a two-part description, in practical terms \(K_U(X)\) is modelled by the hidden layers of a deep neural network with low Kolmogorov Complexity, whereas the Asymptotic Equipartition Theorem tells us that a dataset of magnitude \(|X| \sim 2^{H(x_{1:n})}\) is necessary and sufficient for supervised learning.
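As a toy illustration of the claim about dataset magnitude (for a hypothetical Bernoulli source rather than market data), we can compare \(2^{nH}\) with the raw count \(2^n\) of binary sequences:

```python
import math

# Bernoulli(q) source: per-symbol Shannon entropy H(q) in bits
q, n = 0.1, 100
H = -q * math.log2(q) - (1 - q) * math.log2(1 - q)

print(n * H)         # H(x_{1:n}) ~ n H(x): bits to encode a typical sequence
print(2 ** (n * H))  # |X| ~ 2^{n H}: magnitude of the typical set
print(2 ** n)        # total number of length-n binary sequences, vastly larger
```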

Hence, given that \(H(x_{1:n}) = \log_2 |X| + \mathcal{O}(1)\) we may deduce that (22) may be reformulated as:

\[\begin{equation} \Phi(x_{1:n}) := \log_2 |X| + K_U(X) + \mathcal{O}(1) \tag{23} \end{equation}\]

where \(X\) is the dataset from which \(x_{1:n}\) is sampled. It is worth adding that this result may be readily generalised to non-stationary distributions.

At this point, the reader may surmise that if markets exhibited homeostatic behaviour then financial markets would be completely efficient and it would be nearly impossible to generate a substantial profit. In fact, we would have a perfect information game stuck in a Nash Equilibrium.

Fortunately, outside the five-year plan of an authoritarian government, regime shifts occur all the time for a variety of reasons. If markets are allowed to evolve, they will evolve as cultures change; some markets will collapse and new markets will emerge to replace them.

Our goal, from the perspective of portfolio optimisation, is then to detect these subtle changes before any other hedge fund does, using machine learning methods.

Terence Tao. Structure and Randomness in Combinatorics. 2007.

Lance Fortnow. The Structure of Data and Machine Learning. 2022.

Vereshchagin, N. and Vitányi, P. Kolmogorov’s Structure Functions and Model Selection. IEEE Transactions on Information Theory. 2004.

Schmidhuber, J. Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability. Neural Networks. 1997.

Guillermo Valle-Pérez, Chico Camargo, Ard Louis. Deep Learning Generalizes Because the Parameter-Function Map Is Biased Towards Simple Functions. arXiv. 2018.

Daniel Bernoulli. Exposition of a New Theory on the Measurement of Risk. 1738.

Kelly, J. L. A New Interpretation of Information Rate. Bell System Technical Journal. 1956.

T.M. Cover and J.A. Thomas. Elements of Information Theory. Wiley. 2006.

For attribution, please cite this work as

Rocke (2022, Dec. 15). Kepler Lounge: Gambling and Data Compression. Retrieved from keplerlounge.com

BibTeX citation

@misc{rocke2022gambling, author = {Rocke, Aidan}, title = {Kepler Lounge: Gambling and Data Compression}, url = {keplerlounge.com}, year = {2022} }