Learning Lagrangians from Data

A few years ago, I watched in awe as Michael Schmidt gave a presentation on building a robot scientist based on his much-cited paper, “Distilling Free-Form Natural Laws from Experimental Data”. However, what wasn’t clear to me was how his algorithm discovered Lagrangians “without prior knowledge of physics”.

Fastforward several years in my maths degree and several mathematical physics courses; I decided to read this paper a second time. Upon a second reading, it became clear to me that M. Schmidt and H. Lipson had to use a fitness function that directly incorporated knowledge of Lagrangians in order to find the variational ODEs of interest. In fact, in a carefully argued paper, researchers at MSRI provide a reasonable candidate fitness function for deducing Lagrangians from data as follows[2]:

  1. \mathcal{L} of a system with generalised coordinates x_i, \dot{x_i} (i=1,...,d) solves the Euler-Lagrange equations:

    \frac{d}{dt} (\frac{ \partial \mathcal{L}}{ \partial \dot{x_i}}) -\frac{\partial \mathcal{L}}{\partial x_i}=0

  2. By the chain rule \forall t \in [0,T] , we may develop this equation into:

    EL_i(\mathcal{L},t) := \sum\limits_{j=1}^{d} \frac{ \partial^2 \mathcal{L}}{ \partial x_j \partial x_i}\frac{d x_j}{dt} +\sum\limits_{j=1}^{d} \frac{ \partial^2 \mathcal{L}}{ \partial \dot{x_j} \partial \dot{x_i}}\frac{d \dot{x_j}}{dt} \frac{\partial \mathcal{L}}{\partial x_i}=0

  3. From this, Hillar and Sommer derive the following plausible fitness function:

    LFit(f) := -\frac{1}{N} \sum\limits_{k=1}^{N} log(1+\sum\limits_{j=1}^{d}|EL_i(\mathcal{L},t)|)

Meanwhile, we still have some questions to answer:

  1. What’s a reasonable form for candidate Lagrangians?
  2. Does our fitness function place sufficient constraints on the Lagrangian?

I have no doubt that Hillar & Sommer probably had reasonable answers to these questions but it’s D. Hillis et al. who provide a thorough response in their easy-to-read paper[1]. To the first question, Taylor’s theorem provides a reasonable answer:

  1. There exists an approximate polynomial representation for any Lagrangian, and we may assume that the Lagrangian may be represented by a polynomial in the coordinates and velocities.
  2. In order to place bounds on memory and computation, Hillis et al. use a reduced polynomial representation. i.e. there is a maximum power(m) of any coordinate for any coordinate/velocity and there’s a maximum degree(p) for any combination of coordinates and velocities.
  3. As an example, for a single variable \theta with m=2,p=3 we have:

    \mathcal{L}(\theta,\dot{\theta}) = c_1 \dot{\theta}^2+c_2 \theta+c_3 \theta^2+c_4 \theta \dot{\theta}^2 

Using Nelder-Mead to minimise a carefully chosen fitness function in terms of the coefficients c_i , Hillis et al. manage to find suitable Lagrangians for a variety of simple physical systems including the Duffing oscillator, Penning trap and Coupled harmonic oscillator. In fact, they go beyond using polynomial representations and in their Generalisation section explain how they use genetic programming to evolve complex functions based on simple functions(sine,cosine…etc.) and use an appropriate search algorithm that optimises the fitness score and also minimises the size of the ‘phylogenetic trees’.

From the results provided by M. Schmidt & H. Lipson, it appears that they used a very similar approach. However, unlike Hillis, their paper is very badly written and they don’t share the code they used. At best it might be considered a piece of marketing which would later be used to boost M. Schmidt’s data mining company, Nutonian.

Deception aside, I believe that it’s possible to produce innovative research in this area if researchers continue in the tradition of Hillis and Hillar. In particular, I think it might be possible to develop algorithms that find variational PDEs that describe a system’s behaviour and then use these variational PDEs to automatically classify physical systems. This, I believe, would be an important contribution to the physics community.

Note 1: It’s quite easy to search for Lagrangians with a restricted polynomial representation using Python with Scipy and autograd.

Note 2: Hillis et al. made their code available on Github. Although I don’t use Clojure, I learned a lot from going through the structure of their codebase.


1.D. Hillis et al. “An algorithm for discovering Lagrangians automatically from data” PeerJ 20. October 2015
2.  C. Hillar et al. “Comment on the article “Distilling Free-Form Natural Laws from Experimental Data”” Arxiv. 26 October 2012
3. M. Schmidt and H. Lipson.”Distilling Free-Form Natural Laws from Experimental Data” Science. 3 April 2009

perturbations that preserve constants of motion

simulation of three body problem with random initial conditions


Within the context of the gravitational n-body problem I’m interested in perturbations of the initial conditions  \{p_i(0),\dot{p}_i(0)\}_{i=1}^n    which leave all the constants of motion unchanged.

It’s clear to me that the linear momentum(P) is preserved under any transformation of the initial position vectors and energy(H) is preserved under any isometry applied to initial position vectors. However, when I add the constraint of conserving angular momentum(L), the general nature of these transformations which would leave (H,L, P) unchanged is not clear.

Granted, time translations would preserve the constants of motion but in a trivial manner. Now, I’m not sure whether the approach I’ve taken can be considerably improved but within the context of the three body problem in the plane, I transformed this problem into an optimisation problem as follows:

  1. We first calculate the constants of motion which are the linear momentum, energy and the angular momentum prior to perturbing the position of one of the n bodies. Let’s suppose that these constants are given by C_1, C_2, C_3 . 
  2. Assuming our bodies are restricted to move in the plane, there are a total of 4n scalars that determine the initial conditions for this n-body system. It follows that the perturbation of one body(q ) generally requires searching for a 4n-2  dimensional vector x  for positions and velocities. This can be done using Nelder-Mead, a derivative-free method, provided that we define a cost function. 
  3. The functions to be minimised are obtained directly from the linear momentum (P), angular momentum (L) and Hamiltonian (H) equations to obtain a single smooth cost function:\begin{aligned} C(x,q) =(P(x,q)-C_1)^2+(H(x,q)-C_2)^2+(L(x,q)-C_3)^2 \end{aligned}
  4. In order to increase the probability of convergence, the Nelder-Mead algorithm was randomly seeded for each perturbed position q .

I think this method can be easily extended to n-bodies.

point masses and Gauss’ Law of Gravity

My final year project involves searching for stable orbits for the gravitational n-body problem where we use Newton’s point mass approximation to describe the force field around masses assuming that these are approximately spherical. Now given that this model uses point masses we can’t model rotations. In fact, it’s generally assumed that they are negligible. 

However, I wondered whether this assumption can break down at some point. Consider the following:

  1. On the scale of millenia, the surface of our planet actually behaves like a fluid.
  2. If the distance between two planetary bodies is small and we assume zero rotation, this would imply a growing deviation from sphericity over time. It follows that the point-mass approximation would no longer make sense.
  3. In fact, our bodies would look something like blobs of gooey paint slowly dripping towards each other. 

While it’s true that the probability of zero rotation is close to zero the challenge of dealing with non-spherical masses is something that Newton wouldn’t have handled very well with his point-mass approximation. However, this is not quite the end of Newton’s model. 

Given that the gravitational potential in his model holds for point masses a natural solution would be to integrate over the volume enclosed by a compact body. This is exactly what Gauss did when he discovered the divergence theorem from which the Gauss Law of Gravity emerges as an immediate corollary. As this handles the case where our masses aren’t spherical the rest of my blog post will focus on the statement and proof of this theorem. 

Gauss Law of Gravitation:

Given U  as the gravitational potential and \vec{g} = -\nabla U  as the gravitational field, Gauss’ Law of Gravitation states:

\begin{aligned} \int_{V} (\nabla \cdot \vec{g}) dV =\int_{\partial V} \vec{g} \cdot d \vec{A}  \end{aligned}

In order to arrive at this result, I’ll first need to introduce two useful definitions:

flux:The flux \phi of a vector field through a surface is the integral of the scalar product of the field in a given point times the infinitesimal oriented surface at that point:

\begin{aligned} \phi =  \int_{\partial V} \vec{g} \cdot d \vec{A} \end{aligned}

where \vec{g} denotes the gravitational field, \partial V the boundary of our mass and d \vec{A} is our infinitesimal oriented surface.

divergence:The divergence represents volume density of the outward flux of a vector field from an infinitesimal volume around a given point:

\begin{aligned} \nabla \cdot \vec{g} \end{aligned}

The flux and divergence of a field are actually related in a local manner. Consider a box with dimensions \delta x_1,\delta x_2,\delta x_3 . Using Cartesian coordinates, let’s suppose that (x_1,x_2,x_3) denotes a corner on this box. Then the flux of \vec{g} projected onto \hat{x_3} is given by:

\begin{aligned} \phi_3 =(g_3(x_1,x_2,x_3+\delta x_3)-g_3(x_1,x_2,x_3)) \delta x_1 \delta x_2 \end{aligned}

For very small \delta x_3 ,

\begin{aligned} g_3(x_1,x_2,x_3+\delta x_3)-g_3(x_1,x_2,x_3) \approx \frac{\partial g_3}{\partial x_3} \delta x_3 \end{aligned}

where we get equality in the limit as \delta x_3 tends to zero.

As a result we have \forall i \in \{1,2,3 \}  :

\begin{aligned} \phi_i \approx \frac{\partial g_i}{\partial x_i} \delta x_1 \delta x_2 \delta x_3 =\frac{\partial g_i}{\partial x_i} \delta V  \end{aligned}

\begin{aligned} \phi =\sum_{i=1}^{3} \phi_i \end{aligned}

Now, we may deduce that the flux and divergence are related as follows:

\begin{aligned} \frac{\phi}{\delta V}\approx  \sum_{i=1}^{3} \frac{\partial g_i}{\partial x_i} \end{aligned}

\begin{aligned}\lim_{\delta V \to 0} \frac{\phi}{\delta V} = \nabla \cdot \vec{g} =  \sum_{i=1}^{3} \frac{\partial g_i}{\partial x_i} \end{aligned}

If we take an arbitrary solid and slice it into compact volumes V_i with associated surfaces \partial V_i we have:

\begin{aligned} \phi =\sum_{i=1}^{N} \phi_i =\sum_{i=1}^{N} \int_{\partial V_i} \vec{g_i} \cdot d \vec{A} =\int_{\partial V} \vec{g} \cdot d \vec{A} \end{aligned}

In addition, using the relationship between flux and divergence we have:

\begin{aligned} \phi =\lim_{N \to \infty} \lim_{V_i \to 0} \sum_{i=1}^{N} \nabla \cdot \vec{g_i} V_i = \int_{V} \nabla \cdot \vec{g} dV \end{aligned}

By equating the last two expressions we conclude the proof of this marvellous result by Gauss which is succinctly summarised on Wolfram as follows:

in the absence of the creation or destruction of matter, the density within a region of space can change only by having it flow into or away from the region through its boundary

At this point it’s natural to ask how can we describe the gravitational field which is defined in terms of the gravitational potential. In general, for non-spherical masses we can no longer use the usual inverse radius gravitational potential due to Newton.

In some sense we’ll need an infinite sum of perturbations in the gravitational potential which can be done using spherical harmonics, another wonderful mathematical invention that has many applications. This shall be the subject of my next blog post. 



Stirling’s log-factorial approximation

In my statistical physics course, I have frequently encountered the following approximation due to Stirling:

\ln(N!) \approx N \ln (N) -N

It’s very useful but my professor didn’t explain how good the approximation was. The derivation I found turns out to be very simple and so I can present it in a few lines here:

  1. Note that:\int \ln(x) dx = x \ln (x) -x \quad (1)
  2. Now, if we defineS = \sum_{n=1}^{N} \ln (n) \quad (2)
    we have an upper-Riemann sum with \Delta x =1 .
  3. So we basically have the following approximation:
    S = \sum_{n=1}^{N} \ln (n \Delta x)\Delta x \approx \int_{1}^{N} \ln(x) dx \quad (3)
  4. By the intermediate value theorem,\forall n \in\mathbb{N} \thinspace \exists c_n \in (n-1,n), S' =  \sum_{n=1}^{N} \ln (c_n \Delta x)\Delta x =\int_{1}^{N} \ln(x) dx \quad(4) where \Delta x =1 as defined previously.
  5. Let’s check how good this approximation is:|S - S'| = | \sum_{n=1}^{N} \ln (n) - \sum_{n=1}^{N} \ln (c_n) | \Rightarrow |S - S'| \leq | \sum_{n=1}^{N} \ln (n) -\ln (n+1)| = \ln (N+1) \quad (5)
  6. This error grows very slowly. In fact, if N = 10^{24} i.e. the number of molecules in a glass of water,| \ln(N!) - (N \ln (N) -N)| < 60 which is a minuscule error relative to the number of molecules.

Note: A pdf version of this blog post is available here.

The Quadrupedal Conjecture

Last summer, I asked myself several related questions concerning the locomotion of quadrupeds:

1) What if the cheetah had more than four legs?
2) Why don’t hexapods gallop?
3) Have we ever had F1 cars with more than 4 wheels?

I found all these related questions interesting but after more reflection I summarised my problem mathematically:

Could four legs be optimal for rapid linear and quasi-linear locomotion on rough planar surfaces?

After searching for papers on this subject and finding none, I decided to name this problem ‘The Quadrupedal Conjecture’ and it may be informally described as saying that four legs allow a creature to travel fastest on a nearly-flat surface. However, I thought it might be interesting to tackle a simpler problem first which we may call the ‘Little Polypedal Conjecture’:

Can we show that given a polyped with 2 n legs having pre-defined mass and energy, that there exists N such that for 2 N limbs or greater, its maximum linear velocity on a rough planar surface would be reduced?

I believe that there is an elegant solution to the above problem and I think that the quadrupedal conjecture has an elegant solution as well. But it is by no means guaranteed that a solution to such a problem would be simple. Finding the right degree of abstraction would be one of the main challenges as there are many different ways of approaching this problem with varying degrees of realism.

First, I must say that this is not primarily a problem in biology although this question has attracted the attention of biologists on the biology stackexchange. Second, I think this is a question that would heavily involve mathematical physics although this question has been controversial on the physics stackexchange. Further, I think that the solution to this problem would be affected by mass scaling. By this I mean that the optimal number of legs would probably vary with the range of masses available to the polyped.

Finally, I think that this question is highly relevant to roboticists who build legged robots as a thorough investigation of this question would probably lead to better models for polypedal locomotion.

That’s all I can reasonably say for now.

Note: I thought I’d add a touch of melodrama to this blog post as a friend of mine told me that my blog posts can be a bit dry.

Modeling soft actuators

Last summer I worked in the Insect Robotics Lab and got introduced to soft robotics via the maggot robot that is in development there. I worked on C. Elegans movement analysis which led me to the OpenWorm project but the idea of soft robots and soft actuators in particular stuck in my mind. The potential for developing better robot joints was very interesting and given that the field is still in its infancy, quite exciting!

So, just last week the idea occurred to me that it might be interesting to model soft robots. But, where to start? I contacted Adam Stokes, who once worked in the Whiteside lab at Harvard which originated the idea of soft robots, and he referred me to Dylan Ross of the Insect Robotics Lab. After brain-storming with Dylan today we came to the conclusion that it might not be a bad idea to start with modeling soft actuators.

-soft actuator illustration taken from Cornell University

A soft actuator(aka PneuNet) basically has three parts:

  • a pressure-pump that injects
  • fluid into a
  • visco-elastic container(often made of silicon)

Now, this leads to interesting questions that engineers in this field haven’t looked into much detail yet such as:

i) How does the PneuNet behave with variable pressure and/or variable number of cycles?
ii) How does the PneuNet behave with variable material properties(elasticity, viscosity…) assuming fixed pressure?
iii) Assuming a well-defined goal for the PneuNet how can we optimize the energy use and material cost?

These are not easy questions to answer and in fact it turns out that I have a lot of reading to do. Assuming that I’m going to use fine mesh models to describe the theoretical dynamics of this system I would need to read a book on elasticity like Theory of Elasticity by Landau and Lifschitz. And, in order to model the fluid I’ll have to read a book on computational fluid dynamics. Then I’ll need to take into account the fact that the material properties will change over time with the growing number of cycles. This means that I’ll need a tight feedback loop.

The ideal setup would have a webcam which would analyze the variation in the geometry of the visco-elastic container when inflated in real-time, sensors to detect variation in temperature and pressure of the fluid, and of course a supercomputer to solve numerical differential equations. This might be the most challenging project that I have yet to take on.

My goal for next week is to 3D print the visco-elastic container to begin simple experiments and finish reading half of Theory of Elasticity which isn’t a very thick book to be honest.