# Automatic Differentiation via Contour Integration

## Introduction:

Given the usefulness of partial derivatives for closed-loop control, it is natural to ask how might large branching structures in the brain and other biological systems compute derivatives. After some reflection I realised that an important result in complex analysis due to Cauchy, the Cauchy Integral Formula, may be used to compute derivatives with a simple forward propagation of signals using a Monte Carlo method.

In this article I introduce Cauchy’s formula, explain how Cauchy’s formula has a natural interpretation as an expected value, and demonstrate its reliability using concrete applications. These include gradient descent(also discovered by Cauchy), computing partial derivatives, and simulating a Spring Pendulum using the Euler-Lagrange equations.

Furthermore, I also address more conceptual issues towards the end of the article in the discussion section, where I provide a natural physical interpretation of complex numbers in terms of waves.

**Note:** While reading this article, you may find it helpful to experiment with functions in the following Jupyter notebook.

## Cauchy’s Integral Formula for derivatives:

## Derivation of the formula:

The Cauchy Integral Formula is defined as follows for functions where is assumed to be differentiable and is a simple closed piecewise smooth and positively oriented curve in :

\begin{equation} f’(z_0) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(z)}{(z-z_0)^2} dz \end{equation}

An important consequence of this definition is that (1) is applicable to any holomorphic function which includes polynomials with complex coefficients, trigonometric functions and hyperbolic functions. The keyword here is polynomials. Due to Taylor’s theorem, any differentiable real-valued function may be approximated by polynomials so Cauchy’s method is applicable to any function that is differentiable.

In order to apply (1) numerically it is convenient to derive a simpler formulation using the unit disk as a boundary. By letting we obtain:

\begin{equation} \forall z_0 \in A, f’(z_0) = \frac{1}{2\pi} \int_{0}^{2\pi} f(z_0+e^{i\theta}) \cdot e^{-i\theta} d\theta \end{equation}

and if we are mainly interested in functions of a real variable then and is of the form:

\begin{equation} A = \{x + e^{i\theta}: x, \theta \in \mathbb{R}\} \end{equation}

## Deterministic computation of derivatives:

Given (2), if denotes the angular frequency in the discrete case, then denotes the sampling frequency and we may implement this integration procedure in Julia as follows:

```
function nabla(f, x::Float64, delta::Float64)
N = round(Int,2*pi/delta)
thetas = vcat(1:N)*delta
## collect arguments and rotations:
rotations = map(theta -> exp(-im*theta),thetas)
arguments = x .+ conj.(rotations)
## calculate expectation:
expectation = 1.0/N*real(sum(map(f,arguments).*rotations))
return expectation
end
```

If you are familiar with Julia you may notice that, as with any finite summation, the order of the operations is permutation invariant. This may be implemented in a tree-like structure where a large number of local computations occur in parallel. However, given that most biological networks are inherently noisy some scientists may object to the fact that this program is deterministic.

In the next section I explain why this is generally a non-issue. On the contrary, random sampling using intrinsic noise allows cheap, fast and reliable Monte Carlo estimates of an integral.

## Cauchy’s formula as a Monte Carlo method:

### Interpretation as an Expected Value:

If we define the real-valued function:

\begin{equation} g: \theta \mapsto \text{Re}(f(x+e^{i\theta})\cdot e^{-i\theta}) \end{equation}

By the intermediate-value theorem,

\begin{equation} \forall x \in A \exists \theta^* \in [0,2\pi], g(\theta^*) = \text{Re}(f’(x)) \end{equation}

It follows that we may interpret as an expected value and the factor as the value of a Probability Density Function of a continuous uniform distribution. This may serve as a basis for cheap and stochastic gradient estimates using Monte Carlo methods.

### Monte Carlo estimates of the gradient:

It is worth noting that (2) isn’t a high-dimensional integral. Nevertheless, a biological network may be inherently noisy and face severe computation constraints so a Monte Carlo method may be both useful and a lot more biologically plausible.

Using half the number of samples, this may be implemented in Julia simply by adding one line:

```
function mc_nabla(f, x::Float64, delta::Float64)
N = round(Int,2*pi/delta)
## sample with only half the number of points:
sample = rand(1:N,round(Int,N/2))
thetas = sample*delta
## collect arguments and rotations:
rotations = map(theta -> exp(-im*theta),thetas)
arguments = x .+ conj.(rotations)
## calculate expectation:
expectation = 2.0/N*real(sum(map(f,arguments).*rotations))
return expectation
end
```

I’d like to add that by sampling randomly using intrinsic noise we are taking into account the epistemological uncertainty of the biological network, that doesn’t generally know how to sample optimally for arbitrary functions . So if we assume bounded computational resources this would generally be a better procedure to follow even in the absence of intrinsic noise.

## Practical Applications:

### Performing gradient descent:

Given the great importance of gradient descent in machine learning, the reader might wonder whether Cauchy’s definition of the complex derivative may be used to perform gradient descent. The answer is yes:

```
function gradient_descent(f,x_p::Float64,alpha::Float64)
## 100 steps
for i=1:100
x_n = x_p - alpha*nabla(f,x_p,delta)
x_p = x_n
end
return x_p
end
```

…which may be tested on concrete problems such as this one:

```
## function:
g(x) = (x-1)^2 + (x-2)^4 + (x-3)^6
## initial value:
x_p = 5.0
## learning rate:
alpha = 0.01
x_min = gradient_descent(g,x_p,alpha)
```

The reader should find a minimum-value around 2.17 which Wolfram Alpha agrees with. A good sign.

### Computing Partial Derivatives:

We may also use Cauchy’s formula to compute partial derivatives of functions of several variables such as:

\begin{equation} q(x,y,z) = x + y^2 + \cos(z) \end{equation}

even if (2) is only defined for functions of a single complex variable. Indeed, note that under quite general circumstances we may introduce the auxiliary functions:

\begin{equation} f(\lambda,x) = f(\lambda \cdot e_i + x \odot (1_n -e_i)) \end{equation}

and use (7) to exchange the order in which in the limits are taken:

\begin{equation} \lim_{\lambda \to\hat{x_i}} \frac{\partial}{\partial \lambda} \lim_{x\to\hat{x}} \tilde{f}(\lambda,x)= \lim_{x \to \hat{x}} \frac{\partial f}{\partial x_i} \end{equation}

so we have:

\begin{equation} \lim_{\lambda \to\hat{x_i}} \frac{\partial \tilde{f}(\lambda,x=\hat{x})}{\partial \lambda} = \lim_{x \to \hat{x}} \frac{\partial f}{\partial x_i} \end{equation}

where is a function of a single variable.

In Julia, this mathematial analysis then allows us to define the partial derivative as follows:

```
function partial_nabla(f, i::Int64, X::Array{Float64,1},delta::Float64)
## f:= the function to be differentiated
## i:= partial differentiation with respect to this index
## X:= where the partial derivative is evaluated
## delta:= the sampling frequency
N = length(X)
kd(i,n) = [j==i for j in 1:n]
f_i = x -> f(x*kd(i,N) .+ X.*(ones(N)-kd(i,N)))
return nabla(f_i,X[i],delta)
end
```

Given that the main purpose of the brain is to generate movements and consider their implications, we are now ready to introduce the most important practical application of the partial derivative for organisms capable of counterfactual reasoning.

## Using partial derivatives to simulate physical systems:

If a physical system is conservative and describable by Cartesian coordinates the evolution of this system is completely defined by its Lagrangian:

\begin{equation} \mathcal{L}(x_1,…,x_n,\dot{x_1},…,\dot{x_n}) = T(\dot{x_1},…,\dot{x_n})-V(x_1,…,x_n) \end{equation}

where is the kinetic energy of the system and is the potential energy of that system. (Although this representation may seem strange, the reader should take my word for it that this will allow us to simplify very complicated calculations.)

We may then simulate the evolution of this system using the Euler-Lagrange equations:

\begin{equation} \forall i, \frac{\partial \mathcal{L}}{\partial x_i} = \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{x_i}} \end{equation}

where is the force acting on the object centered at and is the momentum due to respectively.

For concreteness, let’s consider the Lagrangian of a one-dimensional Hookean spring with mass and stiffness :

\begin{equation} T(\dot{x}) = \frac{1}{2}m\dot{x}^2 \end{equation}

\begin{equation} V(x) = \frac{1}{2}kx^2 \end{equation}

\begin{equation} \mathcal{L}(x,\dot{x}) = T(\dot{x}) - V(x) = \frac{1}{2}m\dot{x}^2-\frac{1}{2}kx^2 \end{equation}

In the Julia Language we may describe this system as follows:

```
M, K = 1.0, 2.0, 1.0
T(X) = 0.5*M*(X[2]^2)
V(X) = 0.5*K*X[1]^2
L(X) = T(X)-V(X)
```

Given initial conditions, we may simulate this system using repeated computations of the elastic force on ,

\begin{equation} \frac{\partial \mathcal{L}}{\partial x} = -kx \end{equation}

The full simulation in Julia for 100 time-steps using the leapfrog method may then be defined:

```
N = 100
delta = (2*pi)/1000
dt = 0.1
## simulate the pendulum system:
for i=1:N-1
## update position:
Z[i+1,1] = Z[i,1] + Z[i,2]*dt + 0.5*(partial_nabla(L,1,Z[i,:],delta))*dt^2
## update velocity:
Z[i+1,2] = Z[i,2] + 0.5*(partial_nabla(L,1,Z[i,:],delta)+partial_nabla(L,1,Z[i+1,:],delta))*dt
end
```

The algorithmic procedure for more complex physical systems is similar and I invite the reader to explore the physics simulations notebook where I also consider the simple pendulum and the spring pendulum.

## Discussion:

At this point the reader might still have questions. I can’t address all questions but today I will try to address the physical interpretation of complex numbers.

It’s worth noting that complex numbers, have both magnitude and an angle . Therefore multiplication by complex numbers has a geometric interpretation as re-scaling by and rotation by . This leads to a natural physical interpretation.

For any we may define the frequency which specifies a sampling rate and may be physically interpreted as the propagation of a wave whose amplitude corresponds to . So complex numbers may be represented in biological networks through the propagation of waves.

## References:

- L.D. Landau & E.M. Lifshitz. Mechanics ( Volume 1 of A Course of Theoretical Physics ). Pergamon Press 1969.
- W. Rudin. Real and complex analysis. McGraw-Hill. 3rd ed.1986.
- Aidan Rocke. AutoDiff(2020).GitHub repository, https://github.com/AidanRocke/AutoDiff
- Aidan Rocke. Twitter Thread. https://twitter.com/bayesianbrain/status/1202650626653597698
- Duncan E. Donohue,Giorgio A. Ascoli. A Comparative Computer Simulation of Dendritic Morphology. PLOS Biology. June 6, 2008.
- Michael London & Michael Häusser. Dendritic Computation. Annu. Rev. Neurosci. 2005. 28:503–32
- WARREN S. MCCULLOCH AND WALTER PITTS. A LOGICAL CALCULUS OF THE IDEAS IMMANENT IN NERVOUS ACTIVITY. Bulletin of Mothemnticnl Biology Vol. 52, No. l/2. pp. 99-115. 1990.