# Bellman’s Principle of Optimality and the Hamilton-Jacobi-Bellman equation

Any dynamical system that has a variational formulation admits an optimal controller within the framework of Hamilton-Jacobi-Bellman theory.

Aidan Rocke https://github.com/AidanRocke
11-08-2022

## Motivation:

Any dynamical system that has a variational formulation admits an optimal controller within the framework of Hamilton-Jacobi-Bellman theory. In particular, the Hamilton-Jacobi-Bellman equation applies to deep reinforcement learning systems, since stochastic gradient descent implicitly performs variational inference [3].

An important consequence of this theory is that the scalar field property of the Bellman equation in Reinforcement Learning doesn’t depend upon an anthropic principle.

## Optimal Non-linear control:

Given a non-linear dynamical system:

$\begin{equation} \dot{x}(t) = f(x(t),u(t),t) \tag{1} \end{equation}$

we may design a controller $$u(t)$$ for $$x(t)$$ that minimizes a cost $$J$$:

$\begin{equation} J(x(t),u(t),t_0,t_f)=\int_{t_0}^{t_f} \mathcal{L}(x(\tau),u(\tau)) d\tau \tag{2} \end{equation}$

where the optimal controller $$\hat{u}(t)$$ is defined relative to a value function:

$\begin{equation} V(x(t_0),t_0,t_f) = \min_{u(t)} J(x(t),u(t),t_0,t_f) \tag{3} \end{equation}$

that depends only upon the initial state $$x(t_0)$$ and the time horizon $$[t_0, t_f]$$.
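As an illustration of the cost functional (2) and the value function (3), consider the hypothetical scalar system $$\dot{x} = u$$ with running cost $$\mathcal{L}(x,u) = x^2 + u^2$$ (not taken from the text above). The sketch below integrates the cost of linear feedback controllers $$u = -kx$$ and shows that the gain $$k = 1$$, the known optimum for this problem, attains the smallest cost:

```python
import numpy as np

def simulate_cost(k, x0=1.0, t_f=10.0, dt=1e-3):
    """Euler-integrate xdot = u under the feedback u = -k*x and
    accumulate the running cost L(x, u) = x^2 + u^2 as in eq. (2)."""
    x, J = x0, 0.0
    for _ in range(int(t_f / dt)):
        u = -k * x
        J += (x**2 + u**2) * dt   # Riemann sum for the integral cost
        x += u * dt               # Euler step of the dynamics
    return J

# For this problem the optimal gain is k = 1, giving V(x0) = x0^2 = 1;
# suboptimal gains incur a strictly larger cost J.
print(simulate_cost(1.0))   # ~1.0
print(simulate_cost(2.0))   # ~1.25
print(simulate_cost(0.5))   # ~1.25
```

Here the value function $$V(x_0) = \min_k J$$ is recovered by comparing candidate controllers, which is exactly the minimization in (3) restricted to a one-parameter family.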

## Hamilton-Jacobi-Bellman equation:

From a dynamical systems perspective, we may derive the following evolution equation relative to the optimal controller $$\hat{u}(t)$$:

$\begin{equation} -\frac{\partial V}{\partial t} = \min_{u(t)} \big(\big(\frac{\partial V}{\partial x}\big)^T f(x(t),u(t)) + \mathcal{L}(x(t),u(t)) \big) \tag{4} \end{equation}$

where the additive decomposition of $$V$$ is given by Bellman’s Principle of Optimality:

$\begin{equation} V(x(t_0),t_0,t_f) = V(x(t_0),t_0,t) + V(x(t),t,t_f) \tag{5} \end{equation}$

## Lemma: Bellman’s Principle of Optimality:

Starting from the Bellman equation for Markov Decision Processes,

$\begin{equation} V^{\pi}(s_0) = \mathbb{E}_{\pi}\big[\sum_{k=0}^{\infty} \gamma^k r_k \lvert s = s_0 \big] = \mathbb{E}_{\pi}\big[\sum_{k=0}^{t-1} \gamma^k r_k + \sum_{k=t}^{\infty} \gamma^k r_k \lvert s=s_0\big] \tag{6} \end{equation}$

which yields the Bellman principle of optimality:

$\begin{equation} V^{\pi}(s_0) = V^{\pi}(s_0 \rightarrow s_t) + V^{\pi}(s_t \rightarrow s_{\infty}) \tag{7} \end{equation}$
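The decomposition in (7) can be checked numerically. The sketch below uses a hypothetical three-state deterministic MDP under a fixed policy (an assumption for illustration, not drawn from the text) and verifies that the discounted return from $$s_0$$ splits into the rewards collected before time $$t$$ plus the discounted tail from the state reached at time $$t$$:

```python
import numpy as np

# A tiny deterministic MDP under a fixed policy: state s -> (s+1) % 3,
# with per-state rewards r(s); gamma is the discount factor.
rewards = np.array([1.0, 0.0, 2.0])
gamma = 0.9

def value(s0, horizon=500):
    """V^pi(s0) = sum_k gamma^k r_k, truncated at a long horizon."""
    s, V = s0, 0.0
    for k in range(horizon):
        V += gamma**k * rewards[s]
        s = (s + 1) % 3
    return V

# Bellman's principle (eq. 7): the return splits at any time t into the
# discounted rewards before t plus gamma^t times the value of s_t.
t, s0 = 4, 0
head = sum(gamma**k * rewards[(s0 + k) % 3] for k in range(t))
tail = gamma**t * value((s0 + t) % 3)
print(value(s0), head + tail)   # the two numbers agree
```

The same identity with $$t = 1$$ is the familiar one-step Bellman equation $$V^{\pi}(s_0) = r(s_0) + \gamma V^{\pi}(s_1)$$.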

## Deriving the Hamilton-Jacobi-Bellman equation:

Using the total derivative and setting $$t:= t_0$$, we have:

$\begin{equation} \frac{d}{dt} V(x(t),t,t_f) = \frac{\partial V}{\partial t} + \big(\frac{\partial V}{\partial x}\big)^T\frac{dx}{dt} = \min_{u(t)} \frac{d}{dt}\big(\int_t^{t_f} \mathcal{L}(x(\tau),u(\tau))d\tau\big) = \min_{u(t)}\big(-\mathcal{L}(x(t),u(t))\big) \tag{8} \end{equation}$

which yields the evolution equation:

$\begin{equation} -\frac{\partial V}{\partial t} = \min_{u(t)} \big(\big(\frac{\partial V}{\partial x}\big)^T f(x(t),u(t)) + \mathcal{L}(x(t),u(t)) \big) \tag{9} \end{equation}$

## Discussion:

In principle, this theory may be generalised to stochastic dynamical systems. However, for information-processing systems the appropriate theory is variational inference.

From the vantage point of Algorithmic Probability theory, variational inference is sound as the Shannon Entropy of a random variable corresponds to the Minimum Description Length of its associated probability distribution.

## References:

1. Rocke (2022, Oct. 6). Kepler Lounge: Kolmogorov’s theory of Algorithmic Probability. Retrieved from keplerlounge.com
2. Rocke (2022, Nov. 4). Kepler Lounge: Money as a scalar field via Reinforcement Learning. Retrieved from keplerlounge.com
3. Chaudhari, P. & Soatto, S. Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks. arXiv. 2018.