# Bellman’s Principle of Optimality and the Hamilton-Jacobi-Bellman equation

Any dynamical system that has a variational formulation admits an optimal controller within the framework of Hamilton-Jacobi-Bellman theory.

Aidan Rocke https://github.com/AidanRocke
11-08-2022

## Motivation:

Any dynamical system that has a variational formulation admits an optimal controller within the framework of Hamilton-Jacobi-Bellman theory. In particular, the Hamilton-Jacobi-Bellman equation applies to deep reinforcement learning systems as stochastic gradient descent implicitly performs variational inference [3].

An important consequence of this theory is that the scalar field property of the Bellman equation in Reinforcement Learning doesn’t depend upon an anthropic principle.

## Optimal Non-linear control:

Given a non-linear dynamical system:

$$$\dot{x} = f(x(t),u(t),t) \tag{1}$$$

we may design a controller $$u(t)$$ for $$x(t)$$ that minimizes a cost $$J$$:

$$$J(x(t),u(t),t_0,t_f)=\int_{t_0}^{t_f} \mathcal{L}(x(\tau),u(\tau)) d\tau \tag{2}$$$

where the optimal controller $$\hat{u}(t)$$ is defined relative to a value function:

$$$V(x(t_0),t_0,t_f) = \min_{u(t)} J(x(t),u(t),t_0,t_f) \tag{3}$$$

that depends only upon the initial state $$x(t_0)$$ and the time horizon $$[t_0,t_f]$$, and not upon any particular choice of controller.
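To make (2) and (3) concrete, the following sketch evaluates a discretised cost $$J$$ and approximates the minimisation in (3) by a scan over feedback gains. The toy system $$\dot{x} = u$$, the running cost $$\mathcal{L}(x,u) = x^2 + u^2$$ and the linear feedback family $$u = -k \cdot x$$ are all assumptions made for illustration:

```python
import numpy as np

def cost(k, x0=1.0, dt=1e-3, tf=10.0):
    """Euler discretisation of J = ∫ (x² + u²) dτ for ẋ = u with u = -k·x."""
    x, J = x0, 0.0
    for _ in range(int(tf / dt)):
        u = -k * x
        J += (x**2 + u**2) * dt   # rectangle rule for the running cost
        x += u * dt               # Euler step of the dynamics ẋ = u
    return J

# Approximate the minimisation in (3) by a scan over feedback gains.
gains = np.linspace(0.2, 3.0, 29)
costs = [cost(k) for k in gains]
best_k = float(gains[int(np.argmin(costs))])
print(best_k, min(costs))
```

For this particular problem the optimal gain is known to be $$k = 1$$ with $$J = x_0^2$$, and the scan recovers it up to discretisation error.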

## Hamilton-Jacobi-Bellman equation:

From a dynamical systems perspective, we may derive the following evolution equation relative to the optimal controller $$\hat{u}(t)$$:

$$$-\frac{\partial V}{\partial t} = \min_{u(t)} \big(\big(\frac{\partial V}{\partial x}\big)^T f(x(t),u(t)) + \mathcal{L}(x(t),u(t)) \big) \tag{4}$$$

where the additivity of $$V$$ along optimal trajectories follows from Bellman’s Principle of Optimality:

$$$V(x(t_0),t_0,t_f) = V(x(t_0),t_0,t) + V(x(t),t,t_f) \tag{5}$$$

## Lemma: Bellman’s Principle of Optimality:

We start from the Bellman equation for Markov Decision Processes:

$$$V^{\pi}(s_0) = \mathbb{E}_{\pi}\big[\sum_{k=0}^{\infty} \gamma^k r_k \big\lvert s = s_0 \big] = \mathbb{E}_{\pi}\big[\sum_{k=0}^{t-1} \gamma^k r_k + \sum_{k=t}^{\infty} \gamma^k r_k \big\lvert s=s_0\big] \tag{6}$$$

and splitting the total discounted return at time $$t$$ yields the Bellman principle of optimality:

$$$V^{\pi}(s_0) = V^{\pi}(s_0 \rightarrow s_t) + V^{\pi}(s_t \rightarrow s_{\infty}) \tag{7}$$$
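The decomposition in (7) is easy to check numerically. The sketch below assumes a trivial chain that emits reward $$r_k = 1$$ at every step with $$\gamma = 0.9$$, truncates the infinite horizon, and verifies that the discounted return splits at an arbitrary time $$t$$. Note that the tail term keeps the $$\gamma^k$$ weights of $$s_0$$'s clock, as in (6):

```python
gamma = 0.9
rewards = [1.0] * 200   # truncated horizon; gamma**200 is negligible

def value(rs, g, start=0):
    """Discounted return sum_{k >= start} g^k * r_k, in s_0's clock."""
    return sum(g**k * r for k, r in enumerate(rs) if k >= start)

t = 3
total = value(rewards, gamma)           # V(s_0)
head = value(rewards[:t], gamma)        # V(s_0 -> s_t): rewards before time t
tail = value(rewards, gamma, start=t)   # V(s_t -> s_infinity)
print(total, head + tail)               # the two sides of (7) agree
```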

## Deriving the Hamilton-Jacobi-Bellman equation:

Taking the total derivative of $$V$$ along the optimal trajectory, and setting $$t := t_0$$, we have:

$$$\frac{d}{dt} V(x(t),t,t_f) = \frac{\partial V}{\partial t} + \big(\frac{\partial V}{\partial x}\big)^T\frac{dx}{dt} = \min_{u(t)} \frac{d}{dt}\big(\int_t^{t_f} \mathcal{L}(x(\tau),u(\tau))d\tau\big) = \min_{u(t)}\big(-\mathcal{L}(x(t),u(t))\big) \tag{8}$$$

which, after substituting $$\frac{dx}{dt} = f(x(t),u(t))$$ and rearranging, yields the evolution equation:

$$$-\frac{\partial V}{\partial t} = \min_{u(t)} \big(\big(\frac{\partial V}{\partial x}\big)^T f(x(t),u(t)) + \mathcal{L}(x(t),u(t)) \big) \tag{9}$$$
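A minimal numerical sketch of (9): backward induction on a state grid, assuming the illustrative problem $$\dot{x} = u$$ with $$\mathcal{L}(x,u) = x^2 + u^2$$, whose stationary solution is known to be $$V(x) = x^2$$ with $$\hat{u}(x) = -x$$:

```python
import numpy as np

dt, tf = 0.02, 4.0
xs = np.linspace(-2.0, 2.0, 161)   # state grid
us = np.linspace(-3.0, 3.0, 121)   # candidate controls for the min in (9)

V = np.zeros_like(xs)              # terminal condition V(x, t_f) = 0
for _ in range(int(tf / dt)):
    # One backward step: V(x, t) = min_u [ L(x,u)·dt + V(x + f(x,u)·dt, t + dt) ],
    # with V evaluated off-grid by linear interpolation.
    Q = np.array([(xs**2 + u**2) * dt + np.interp(xs + u * dt, xs, V)
                  for u in us])
    V = Q.min(axis=0)

i = int(np.argmin(np.abs(xs - 1.0)))
print(V[i])   # close to the analytic value V(1) = 1 for a long horizon
```

The grid, control set and horizon here are arbitrary choices; for this finite-horizon problem the analytic solution is $$V(x,t) = \tanh(t_f - t)\,x^2$$, so the value at $$x = 1$$ approaches $$1$$ as the horizon grows.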

## Discussion:

In principle, this theory may be generalised to stochastic dynamical systems. However, in the case of information-processing systems, the appropriate theory is variational inference.

From the vantage point of Algorithmic Probability theory, variational inference is sound because the Shannon Entropy of a random variable corresponds to the expected Minimum Description Length of its outcomes.
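To make that correspondence concrete, the sketch below (with an assumed dyadic distribution) compares the Shannon Entropy with the expected length of a Shannon code, which assigns $$\lceil -\log_2 p \rceil$$ bits to an outcome of probability $$p$$ and is always within one bit of the entropy:

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # illustrative distribution

# Shannon entropy in bits.
H = -sum(q * math.log2(q) for q in p.values())
# Expected description length under a Shannon code: ceil(-log2 p) bits per outcome.
L = sum(q * math.ceil(-math.log2(q)) for q in p.values())

print(H, L)   # for dyadic probabilities the code is exactly optimal: H = L = 1.75
```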

## References:

1. Rocke, A. (2022, Oct. 6). Kepler Lounge: Kolmogorov’s theory of Algorithmic Probability. Retrieved from keplerlounge.com
2. Rocke, A. (2022, Nov. 4). Kepler Lounge: Money as a scalar field via Reinforcement Learning. Retrieved from keplerlounge.com
3. Chaudhari, P., & Soatto, S. (2018). Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks. arXiv.

### Citation

Rocke (2022, Nov. 8). Kepler Lounge: Bellman's Principle of Optimality and the Hamilton-Jacobi-Bellman equation. Retrieved from keplerlounge.com
@misc{rocke2022bellman,
  author = {Rocke, Aidan},
  title = {Bellman's Principle of Optimality and the Hamilton-Jacobi-Bellman equation},
  url = {keplerlounge.com},
  year = {2022}
}