Learning Lagrangians from Data

A few years ago, I watched in awe as Michael Schmidt gave a presentation on building a robot scientist based on his much-cited paper, “Distilling Free-Form Natural Laws from Experimental Data”. However, what wasn’t clear to me was how his algorithm discovered Lagrangians “without prior knowledge of physics”.

Fastforward several years in my maths degree and several mathematical physics courses; I decided to read this paper a second time. Upon a second reading, it became clear to me that M. Schmidt and H. Lipson had to use a fitness function that directly incorporated knowledge of Lagrangians in order to find the variational ODEs of interest. In fact, in a carefully argued paper, researchers at MSRI provide a reasonable candidate fitness function for deducing Lagrangians from data as follows[2]:

  1. \mathcal{L} of a system with generalised coordinates x_i, \dot{x_i} (i=1,...,d) solves the Euler-Lagrange equations:

    \frac{d}{dt} (\frac{ \partial \mathcal{L}}{ \partial \dot{x_i}}) -\frac{\partial \mathcal{L}}{\partial x_i}=0

  2. By the chain rule \forall t \in [0,T] , we may develop this equation into:

    EL_i(\mathcal{L},t) := \sum\limits_{j=1}^{d} \frac{ \partial^2 \mathcal{L}}{ \partial x_j \partial x_i}\frac{d x_j}{dt} +\sum\limits_{j=1}^{d} \frac{ \partial^2 \mathcal{L}}{ \partial \dot{x_j} \partial \dot{x_i}}\frac{d \dot{x_j}}{dt} \frac{\partial \mathcal{L}}{\partial x_i}=0

  3. From this, Hillar and Sommer derive the following plausible fitness function:

    LFit(f) := -\frac{1}{N} \sum\limits_{k=1}^{N} log(1+\sum\limits_{j=1}^{d}|EL_i(\mathcal{L},t)|)

Meanwhile, we still have some questions to answer:

  1. What’s a reasonable form for candidate Lagrangians?
  2. Does our fitness function place sufficient constraints on the Lagrangian?

I have no doubt that Hillar & Sommer probably had reasonable answers to these questions but it’s D. Hillis et al. who provide a thorough response in their easy-to-read paper[1]. To the first question, Taylor’s theorem provides a reasonable answer:

  1. There exists an approximate polynomial representation for any Lagrangian, and we may assume that the Lagrangian may be represented by a polynomial in the coordinates and velocities.
  2. In order to place bounds on memory and computation, Hillis et al. use a reduced polynomial representation. i.e. there is a maximum power(m) of any coordinate for any coordinate/velocity and there’s a maximum degree(p) for any combination of coordinates and velocities.
  3. As an example, for a single variable \theta with m=2,p=3 we have:

    \mathcal{L}(\theta,\dot{\theta}) = c_1 \dot{\theta}^2+c_2 \theta+c_3 \theta^2+c_4 \theta \dot{\theta}^2 

Using Nelder-Mead to minimise a carefully chosen fitness function in terms of the coefficients c_i , Hillis et al. manage to find suitable Lagrangians for a variety of simple physical systems including the Duffing oscillator, Penning trap and Coupled harmonic oscillator. In fact, they go beyond using polynomial representations and in their Generalisation section explain how they use genetic programming to evolve complex functions based on simple functions(sine,cosine…etc.) and use an appropriate search algorithm that optimises the fitness score and also minimises the size of the ‘phylogenetic trees’.

From the results provided by M. Schmidt & H. Lipson, it appears that they used a very similar approach. However, unlike Hillis, their paper is very badly written and they don’t share the code they used. At best it might be considered a piece of marketing which would later be used to boost M. Schmidt’s data mining company, Nutonian.

Deception aside, I believe that it’s possible to produce innovative research in this area if researchers continue in the tradition of Hillis and Hillar. In particular, I think it might be possible to develop algorithms that find variational PDEs that describe a system’s behaviour and then use these variational PDEs to automatically classify physical systems. This, I believe, would be an important contribution to the physics community.

Note 1: It’s quite easy to search for Lagrangians with a restricted polynomial representation using Python with Scipy and autograd.

Note 2: Hillis et al. made their code available on Github. Although I don’t use Clojure, I learned a lot from going through the structure of their codebase.


1.D. Hillis et al. “An algorithm for discovering Lagrangians automatically from data” PeerJ 20. October 2015
2.  C. Hillar et al. “Comment on the article “Distilling Free-Form Natural Laws from Experimental Data”” Arxiv. 26 October 2012
3. M. Schmidt and H. Lipson.”Distilling Free-Form Natural Laws from Experimental Data” Science. 3 April 2009