Abstract:

The main contribution of this article is to show that, for adaptive agents, the Causal Path Entropy and Empowerment are equivalent only in deterministic environments. In non-deterministic environments, it is shown that, unlike Empowerment, the Causal Path Entropy generally under-estimates the number of intrinsic options available to an agent. In fact, it is shown that the difference between Causal Path Entropy and Empowerment can’t be increased without diminishing Empowerment.

Causal Path Entropy for adaptive agents:

We shall start by defining the notion of Causal Path Entropy, introduced in [1], show how it simplifies to a conditional Shannon entropy for digital organisms, and extend it so that it explicitly accounts for actions taken by an organism.

For any open thermodynamic system, such as a biological organism, we may treat the phase-space paths $x(t)$ taken by the system over a time interval $[0,\tau]$ as microstates and partition them into macrostates using the equivalence relation:

\begin{equation} x(t) \sim x'(t) \iff x(0)=x'(0) \end{equation}

As a result, we can identify each macrostate $X_i$ with a present system state $x(0)$.

We may then define the Causal Path Entropy of a macrostate $X_i$ associated with the present system state $x(0)$ as the path integral:

\begin{equation} S_c(X_i,\tau)=-k_B \int_{x(t)} P(x(t)|x(0))\ln P(x(t)|x(0)) Dx(t) \end{equation}

where $k_B$ is the Boltzmann constant. It must be noted that in order to calculate $S_c(X_i,\tau)$ we need the state-transition probability distribution $P(x(t)|x(0))$, which corresponds to an exact simulator of the agent’s environment. Given that this is generally unknown to the agent at the instant $t=0$, macrostates are generally unknown to the agent as well, and therefore it’s more epistemically sound to denote the Causal Path Entropy as:

\begin{equation} S_c(x(0))=-k_B \int_{x(t)} p(x(t)|x(0))\ln p(x(t)|x(0)) Dx(t) \end{equation}

where $p(x(t)|x(0))$ denotes a subjective state-transition probability distribution.

Now, if the organism is digital (i.e. simulated by a Turing machine) we may drop the Boltzmann constant, and a discrete phase space implies that (3) simplifies to the Shannon ‘path entropy’ over the state $x_n$ reached after $n$ discrete time steps from $x_0 = x(0)$:

\begin{equation} S_c(x_0)=-\sum_{x_n} p(x_n|x_0)\ln p(x_n|x_0) \end{equation}
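As a minimal illustration (the probabilities below are made up for the example, not taken from [1]), the discrete path entropy is simply the Shannon entropy of the agent’s subjective distribution over the states reachable from $x_0$:

```python
import numpy as np

def path_entropy(p_xn_given_x0):
    """Shannon 'path entropy' -sum p ln p (in nats) of a distribution over final states."""
    p = np.asarray(p_xn_given_x0, dtype=float)
    p = p[p > 0]                       # adopt the convention 0 ln 0 = 0
    return float(-np.sum(p * np.log(p)))

# Hypothetical subjective distribution p(x_n | x_0) over four final states.
print(path_entropy([0.5, 0.25, 0.125, 0.125]))   # ~1.213 nats
```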

In order for the calculation of Causal Path Entropy to be useful for a digital organism we must explicitly account for its agency, which is determined by its capacity for rational action in the world. If $\mathcal{A}$ denotes a discrete action space and $\mathcal{X}$ denotes a discrete state space, then by the product rule:

\begin{equation} p(x_n|x_0)= \frac{p(x_n,a_{1:n}|x_0)}{p(a_{1:n}|x_n,x_0)} \end{equation}

where $a_{1:n}=(a_1,\ldots,a_n)\in\mathcal{A}^n$ is an n-tuple of actions and $x_n\in\mathcal{X}$ is the state reached after $n$ time steps.

Now, we may note that the numerator of (5) may be expressed in terms of the agent’s conditional distribution $w(a_{1:n}|x_0)$ over n-step action sequences:

\begin{equation} p(x_n,a_{1:n}|x_0)= w(a_{1:n}|x_0)p(x_n|a_{1:n},x_0) \end{equation}

By combining (5) and (6) we have:

\begin{equation} p(x_n|x_0)= \frac{w(a_{1:n}|x_0)p(x_n|a_{1:n},x_0)}{p(a_{1:n}|x_n,x_0)} \end{equation}
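Although the text does not write it this way, it may help to note that (7) is consistent with simply marginalising (6) over action sequences,

\begin{equation*} p(x_n|x_0)=\sum_{a_{1:n}\in\mathcal{A}^n} w(a_{1:n}|x_0)\, p(x_n|a_{1:n},x_0), \end{equation*}

which is the form one uses in practice to compute the state distribution induced by a given action distribution $w$.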

Substituting (7) into (4), and maximising over the agent’s action distribution $w$ so as to account for its agency, the Causal Path Entropy becomes:

\begin{equation} S_c(x_0) = \max\limits_{w} \mathbb{E} \big[ \ln \big( \frac{p(a_{1:n}|x_n,x_0)}{w(a_{1:n}|x_0)p(x_n|a_{1:n},x_0)}\big)\big] \end{equation}

Hence, we have:

\begin{equation} S_c(x_0) = \max\limits_{w} \big[H(a_{1:n}|x_0)-H(a_{1:n}|x_n,x_0) +H(x_n|a_{1:n},x_0)\big] \end{equation}
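For a fixed action distribution $w$, the bracketed term in (9) collapses to $H(x_n|x_0)$, since $H(a_{1:n}|x_0) - H(a_{1:n}|x_n,x_0) = I(a_{1:n};x_n|x_0) = H(x_n|x_0) - H(x_n|a_{1:n},x_0)$. Below is a small Python check of this identity on a synthetic joint distribution over action sequences and final states (all numbers are randomly generated, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def H(p):
    """Shannon entropy in nats of a (possibly multi-dimensional) distribution."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Synthetic joint p(a_{1:n}, x_n | x_0): 5 action sequences x 4 final states.
joint = rng.random((5, 4))
joint /= joint.sum()

w   = joint.sum(axis=1)                    # w(a_{1:n} | x_0)
p_x = joint.sum(axis=0)                    # p(x_n | x_0)
H_a_given_x = H(joint) - H(p_x)            # H(a_{1:n} | x_n, x_0)
H_x_given_a = H(joint) - H(w)              # H(x_n | a_{1:n}, x_0)

lhs = H(w) - H_a_given_x + H_x_given_a     # bracketed term of (9)
print(np.isclose(lhs, H(p_x)))             # True: equals H(x_n | x_0)
```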

Equation (9) shall be useful in analysing the difference between the Causal Path Entropy and Empowerment, which we shall now introduce.

Empowerment:

We shall introduce the n-step Empowerment as was done in [3], where it is defined as the maximal mutual information, conditional on a starting state $x_0$, between a sequence of actions $a_{1:n}$ and the final state $x_n$ reached:

\begin{equation} \xi(x_0) = \max\limits_{w} I(a_{1:n},x_n|x_0)=\max\limits_{w} \mathbb{E} \big[ \ln \big( \frac{p(a_{1:n},x_n|x_0)}{w(a_{1:n}|x_0)p(x_n|x_0)}\big)\big] \end{equation}

Hence, (10) may be expressed as the difference of two conditional Shannon entropies:

\begin{equation} \xi(x_0) = \max\limits_{w} \big[H(a_{1:n}|x_0)-H(a_{1:n}|x_n,x_0) \big] \end{equation}
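Since (11) is the capacity of the ‘channel’ $p(x_n|a_{1:n},x_0)$ from action sequences to final states, it can be computed exactly for small discrete environments with the Blahut–Arimoto algorithm (the variational scheme of [3] is only needed when this is intractable). A minimal sketch, with a made-up one-step channel:

```python
import numpy as np

def mutual_info(w, P):
    """I(a_{1:n}; x_n | x_0) in nats for action distribution w and channel P."""
    q = w @ P                                        # induced p(x_n | x_0)
    mask = P > 0
    return float(np.sum((w[:, None] * P)[mask] * np.log((P / q)[mask])))

def empowerment(P, iters=500):
    """max_w I(a_{1:n}; x_n | x_0) via the Blahut-Arimoto algorithm.
    P[k, j] = p(x_n = j | a_{1:n} = k, x_0)."""
    P = np.asarray(P, dtype=float)
    w = np.full(P.shape[0], 1.0 / P.shape[0])        # start from a uniform action distribution
    for _ in range(iters):
        q = w @ P
        d = np.array([np.sum(row[row > 0] * np.log(row[row > 0] / q[row > 0]))
                      for row in P])                 # D(P_k || q) for each action sequence
        w = w * np.exp(d)
        w /= w.sum()
    return mutual_info(w, P), w

# Toy channel (made-up numbers): actions 0 and 1 are deterministic, action 2 is noisy.
channel = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.5, 0.5]])
cap, w_star = empowerment(channel)
print(cap, w_star)     # capacity is strictly below ln 3 because action 2 is noisy
```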

Analysis of equivalence:

If we combine (9) and (11) we find that the Causal Path Entropy at $x_0$ may be expressed in terms of the Empowerment at $x_0$:

\begin{equation} S_c(x_0) = \xi(x_0) + \max\limits_{w} \big[H(x_n|a_{1:n},x_0)\big] \end{equation}

Therefore, in order to have equivalence we must have:

\begin{equation} H(x_n|a_{1:n},x_0) = 0 \end{equation}

which is true if and only if $\forall x_n, a_{1:n}:\; p(x_n|a_{1:n},x_0)=0 \oplus p(x_n|a_{1:n},x_0)=1$ (where $\oplus$ denotes XOR), i.e. $x_n$ is a deterministic function of $(a_{1:n},x_0)$, and this is the case only in deterministic environments.

It must be noted that in deterministic environments (12) simplifies to:

\begin{equation} S_c(x_0) = \xi(x_0) = \ln N_{x_0} \end{equation}

where $N_{x_0}$ represents the number of intrinsic options (distinct reachable final states) available at $x_0$.
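As a concrete (hypothetical) illustration of (14): consider a deterministic one-step world in which an agent in a corner cell of a grid has four actions, two of which are blocked by walls and leave it in place, so that only $N_{x_0}=3$ distinct successor states are reachable. Any $w$ that induces a uniform distribution over the three reachable successors (e.g. $1/3$ on each unblocked action and $1/6$ on each blocked one) achieves the maximum, giving

\begin{equation*} S_c(x_0)=\xi(x_0)=-\sum_{i=1}^{3}\tfrac{1}{3}\ln\tfrac{1}{3}=\ln 3 \approx 1.0986 \ \text{nats}. \end{equation*}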

Discussion:

Whenever $H(x_n|a_{1:n},x_0) > 0$ we must have $S_c(x_0) > \xi(x_0)$, so the Causal Path Entropy provides intrinsic compensation for:

  1. Exploring unpredictable environments.
  2. Exploring unknown environments.
  3. Unreliable actuators.
  4. Unreliable sensors.

To be precise, maximisation of $H(x_n|a_{1:n},x_0)$ corresponds to making actions maximally uninformative about the terminal state $x_n$. It follows that the Causal Path Entropy does not accurately measure an agent’s number of intrinsic options. This is especially clear if we use (10) and (12) to re-formulate the Empowerment of the agent at $x_0$:

\begin{equation} \xi(x_0) = S_c(x_0) - \max\limits_{w} \big[H(x_n|a_{1:n},x_0)\big] = \max\limits_{w} \big[H(x_n|x_0) -H(x_n|a_{1:n},x_0)\big] \end{equation}

From (15) we deduce that in non-deterministic environments the difference between Causal Path Entropy and Empowerment, i.e. $\max\limits_{w} H(x_n|a_{1:n},x_0)$, can’t be increased without diminishing Empowerment.
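To make the trade-off concrete, consider a made-up one-step environment (not taken from the references) with one deterministic action and one noisy action. Sweeping over the probability $q$ assigned to the noisy action shows that the action distribution maximising the path-entropy term differs from the one maximising the mutual-information (Empowerment) term:

```python
import numpy as np

# Hypothetical one-step environment (numbers are made up):
#   action 0 -> state 0 deterministically
#   action 1 -> states 1 or 2, each with probability 1/2 (environment noise)
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5]])       # P[k, j] = p(x_n = j | a = k, x_0)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def terms(q):
    """Return (H(x_n|x_0), I(a; x_n|x_0)) under the action distribution w = (1-q, q)."""
    w = np.array([1 - q, q])
    H_x = H(w @ P)                                   # path-entropy term of S_c
    H_x_given_a = np.sum(w * np.array([H(row) for row in P]))
    return H_x, H_x - H_x_given_a                    # second term is the Empowerment objective

qs = np.linspace(1e-6, 1 - 1e-6, 100001)
vals = np.array([terms(q) for q in qs])
q_S, q_xi = qs[np.argmax(vals[:, 0])], qs[np.argmax(vals[:, 1])]

print(q_S,  terms(q_S))     # q ~ 2/3: H(x_n|x_0) = ln 3 ~ 1.099, but I ~ 0.64
print(q_xi, terms(q_xi))    # q ~ 1/2: H(x_n|x_0) ~ 1.040, and I = ln 2 ~ 0.693
```

Weighting the noisy action more heavily raises $H(x_n|a_{1:n},x_0)$ and hence the path entropy, but it simultaneously lowers the mutual information, in line with (15).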

References:

  1. Wissner-Gross, A. D. & Freer, C. E. (2013) Causal Entropic Forces. Physical Review Letters.
  2. Salge, C., Glackin, C. & Polani, D. (2013) Empowerment – An Introduction. arXiv.
  3. Mohamed, S. & Rezende, D. J. (2015) Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning. arXiv.