Any dynamical system that has a variational formulation admits an optimal controller within the framework of Hamilton-Jacobi-Bellman theory. In particular, the Hamilton-Jacobi-Bellman equation applies to deep reinforcement learning systems as stochastic gradient descent implicitly performs variational inference [3].
An important consequence of this theory is that the scalar field property of the Bellman equation in Reinforcement Learning doesn’t depend upon an anthropic principle.
Given a non-linear dynamical system:
\[\begin{equation} \dot{x} = f(x(t),u(t),t) \tag{1} \end{equation}\]
we may design a controller \(u(t)\) for \(x(t)\) that minimizes a cost \(J\):
\[\begin{equation} J(x(t),u(t),t_0,t_f)=\int_{t_0}^{t_f} \mathcal{L}(x(\tau),u(\tau)) d\tau \tag{2} \end{equation}\]
where the optimal controller \(\hat{u}(t)\) is defined relative to a value function:
\[\begin{equation} V(x(t_0),t_0,t_f) = \min_{u(t)} J(x(t),u(t),t_0,t_f) \tag{3} \end{equation}\]
which depends only upon the initial state \(x(t_0)\) and the initial and final times, and not upon any particular choice of controller.
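To make (2) and (3) concrete, here is a minimal numerical sketch (not part of the original derivation) that approximates \(V\) by direct single shooting: time is discretized, the control is treated as a piecewise-constant sequence, and the cost \(J\) is minimized over that sequence. The dynamics \(f(x,u)=u\) and running cost \(\mathcal{L}(x,u)=x^2+u^2\) are illustrative assumptions, not choices made in the text.

```python
# Minimal sketch: approximate V(x0, t0, tf) = min_u J by direct single shooting.
# The dynamics f and running cost L are illustrative assumptions (xdot = u, L = x^2 + u^2);
# for these particular choices the exact value is tanh(tf - t0) * x0**2.
import numpy as np
from scipy.optimize import minimize

def f(x, u):
    return u                       # illustrative dynamics: xdot = u

def L(x, u):
    return x**2 + u**2             # illustrative running cost

def cost_J(u_seq, x0, dt):
    """Euler-discretized cost J for a piecewise-constant control sequence."""
    x, J = x0, 0.0
    for u in u_seq:
        J += L(x, u) * dt
        x += f(x, u) * dt
    return J

def value_function(x0, t0=0.0, tf=2.0, N=40):
    """Approximate V(x0, t0, tf) by minimizing J over the control sequence."""
    dt = (tf - t0) / N
    res = minimize(cost_J, np.zeros(N), args=(x0, dt), method="L-BFGS-B")
    return res.fun, res.x          # approximate optimal cost and controller

V_hat, u_hat = value_function(x0=1.0)
print(f"V(1.0, 0, 2) ≈ {V_hat:.4f}")   # ≈ tanh(2) ≈ 0.964
```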
From a dynamical systems perspective, we may derive the following evolution equation relative to the optimal controller \(\hat{u}(t)\):
\[\begin{equation} -\frac{\partial V}{\partial t} = \min_{u(t)} \big(\big(\frac{\partial V}{\partial x}\big)^T f(x(t),u(t)) + \mathcal{L}(x(t),u(t)) \big) \tag{4} \end{equation}\]
where the derivation relies upon Bellman's Principle of Optimality, which expresses the additivity of \(V\) across time:
\[\begin{equation} V(x(t_0),t_0,t_f) = V(x(t_0),t_0,t) + V(x(t),t,t_f) \tag{5} \end{equation}\]
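One way to read (5) is through the dynamic-programming recursion that underlies it: for any intermediate time \(t \in [t_0, t_f]\),

\[ V(x(t_0),t_0,t_f) = \min_{u} \Big( \int_{t_0}^{t} \mathcal{L}(x(\tau),u(\tau))\, d\tau + V(x(t),t,t_f) \Big) \]

so that along the optimally controlled trajectory the total cost splits into the cost accumulated on \([t_0,t]\) plus the optimal cost-to-go \(V(x(t),t,t_f)\) from the state reached at time \(t\); this is the decomposition expressed in (5).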
The same structure appears in discrete time. Starting from the value function for Markov Decision Processes,
\[\begin{equation} V^{\pi}(s_0) = \mathbb{E}_{\pi}\big[\sum_{k=0}^{\infty} \gamma^k r_k \lvert s = s_0 \big] = \mathbb{E}_{\pi}\big[\sum_{k=0}^{t-1} \gamma^k r_k + \sum_{k=t}^{\infty} \gamma^k r_k \lvert s=s_0\big] \tag{6} \end{equation}\]
we obtain the Bellman principle of optimality:
\[\begin{equation} V^{\pi}(s_0) = V^{\pi}(s_0 \rightarrow s_t) + V^{\pi}(s_t \rightarrow s_{\infty}) \tag{7} \end{equation}\]
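As a discrete illustration (again, not from the original derivation), the recursion behind (6) and (7) can be computed directly by iterative policy evaluation on a toy MDP; the transition kernel, rewards, policy and discount below are arbitrary illustrative choices.

```python
# Minimal sketch: iterative policy evaluation for a toy 3-state, 2-action MDP.
# P, R, pi and gamma are arbitrary illustrative choices, not taken from the text.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9

# P[a, s, s'] = probability of moving from state s to s' under action a
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],   # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],   # action 1
])
R = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 2.0]])          # R[s, a]: expected reward
pi = np.full((n_states, n_actions), 0.5)                     # uniformly random policy

V = np.zeros(n_states)
for _ in range(500):
    V_new = np.zeros(n_states)
    for s in range(n_states):
        for a in range(n_actions):
            # one-step decomposition: immediate reward plus discounted value of the
            # successor state -- the discrete form of the split in (6) and (7)
            V_new[s] += pi[s, a] * (R[s, a] + gamma * P[a, s] @ V)
    V = V_new

print("V^pi =", V)   # replacing the average over pi with a max over a gives value iteration
```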
Returning to the continuous-time problem, setting \(t := t_0\) and taking the total derivative of \(V\) along the optimally controlled trajectory, we have:
\[\begin{equation} \frac{d}{dt} V(x(t),t,t_f) = \frac{\partial V}{\partial t} + \big(\frac{\partial V}{\partial x}\big)^T\frac{dx}{dt} = \frac{d}{dt}\big(\int_t^{t_f} \mathcal{L}(x(\tau),\hat{u}(\tau))d\tau\big) = -\mathcal{L}(x(t),\hat{u}(t)) \tag{8} \end{equation}\]
Rearranging, and noting that the optimal controller \(\hat{u}(t)\) attains the minimum over admissible controls, this yields the evolution equation:
\[\begin{equation} -\frac{\partial V}{\partial t} = \min_{u(t)} \big(\big(\frac{\partial V}{\partial x}\big)^T f(x(t),u(t)) + \mathcal{L}(x(t),u(t)) \big) \tag{9} \end{equation}\]
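As a quick check on (9), consider the illustrative scalar system used in the numerical sketch above, \(\dot{x} = u\) with \(\mathcal{L}(x,u) = x^2 + u^2\), on an infinite horizon so that \(V\) has no explicit time dependence. Equation (9) reduces to

\[ 0 = \min_{u} \big( V'(x)\, u + x^2 + u^2 \big) \]

The minimizing control is \(u = -\tfrac{1}{2}V'(x)\), which gives \(x^2 - \tfrac{1}{4}V'(x)^2 = 0\); taking the positive-definite solution yields \(V(x) = x^2\) and \(\hat{u}(x) = -x\). This agrees with the direct-shooting sketch above, whose exact value \(\tanh(t_f - t_0)\, x_0^2\) approaches \(x_0^2\) as \(t_f \to \infty\).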
In principle, this theory may be generalised to stochastic dynamical systems. However, in the case of information-processing systems, the appropriate theory is variational inference.
From the vantage point of Algorithmic Probability theory, variational inference is sound as the Shannon Entropy of a random variable corresponds to the Minimum Description Length of its associated probability distribution.
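One concrete reading of that correspondence (an illustration, not from the original text): the Shannon entropy of a distribution lower-bounds, and matches to within one bit, the expected length of an optimal prefix-free code for samples drawn from it. A minimal sketch using Shannon code lengths \(\lceil -\log_2 p \rceil\):

```python
# Minimal sketch: Shannon entropy vs. expected Shannon-code length for an
# arbitrary illustrative distribution. In general H(p) <= E[len] < H(p) + 1.
import math

p = [0.5, 0.25, 0.125, 0.125]                        # illustrative distribution
entropy = -sum(q * math.log2(q) for q in p)           # H(p) in bits
expected_code_len = sum(q * math.ceil(-math.log2(q)) for q in p)

print(f"H(p) = {entropy:.3f} bits")
print(f"E[Shannon code length] = {expected_code_len:.3f} bits")
```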
For attribution, please cite this work as
Rocke (2022, Nov. 8). Kepler Lounge: Bellman's Principle of Optimality and the Hamilton-Jacobi-Bellman equation. Retrieved from keplerlounge.com
BibTeX citation
@misc{rocke2022bellman,
  author = {Rocke, Aidan},
  title = {Kepler Lounge: Bellman's Principle of Optimality and the Hamilton-Jacobi-Bellman equation},
  url = {keplerlounge.com},
  year = {2022}
}