 Might dendritic trees be used to compute partial derivatives?(image taken from )

## Introduction:

Given the usefulness of partial derivatives for closed-loop control, it is natural to ask how might large branching structures in the brain and other biological systems compute derivatives. After some reflection I realised that an important result in complex analysis due to Cauchy, the Cauchy Integral Formula, may be used to compute derivatives with a simple forward propagation of signals using a Monte Carlo method.

In this article I introduce Cauchy’s formula, explain how Cauchy’s formula has a natural interpretation as an expected value, and demonstrate its reliability using concrete applications. These include gradient descent(also discovered by Cauchy), computing partial derivatives, and simulating a Spring Pendulum using the Euler-Lagrange equations.

Furthermore, I also address more conceptual issues towards the end of the article in the discussion section, where I provide a natural physical interpretation of complex numbers in terms of waves.

## Derivation of the formula:

The Cauchy Integral Formula is defined as follows for functions $f: A \rightarrow \mathbb{C}$ where $f$ is assumed to be differentiable $\forall z \in A$ and $\gamma$ is a simple closed piecewise smooth and positively oriented curve in $A$:

\begin{equation} f’(z_0) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(z)}{(z-z_0)^2} dz \end{equation}

An important consequence of this definition is that (1) is applicable to any holomorphic function which includes polynomials with complex coefficients, trigonometric functions and hyperbolic functions. The keyword here is polynomials. Due to Taylor’s theorem, any differentiable real-valued function may be approximated by polynomials so Cauchy’s method is applicable to any function that is differentiable.

In order to apply (1) numerically it is convenient to derive a simpler formulation using the unit disk as a boundary. By letting $z=z_0+e^{i\theta}$ we obtain:

\begin{equation} \forall z_0 \in A, f’(z_0) = \frac{1}{2\pi} \int_{0}^{2\pi} f(z_0+e^{i\theta}) \cdot e^{-i\theta} d\theta \end{equation}

and if we are mainly interested in functions of a real variable then $\text{Im}(z_0)=0$ and $A$ is of the form:

\begin{equation} A = \{x + e^{i\theta}: x, \theta \in \mathbb{R}\} \end{equation}

## Deterministic computation of derivatives:

Given (2), if $\Delta \theta= \frac{2\pi}{N}$ denotes the angular frequency in the discrete case, then $\frac{\Delta \theta}{2\pi}=\frac{1}{N}$ denotes the sampling frequency and we may implement this integration procedure in Julia as follows:

	function nabla(f, x::Float64, delta::Float64)

N = round(Int,2*pi/delta)
thetas = vcat(1:N)*delta

## collect arguments and rotations:
rotations = map(theta -> exp(-im*theta),thetas)
arguments = x .+ conj.(rotations)

## calculate expectation:
expectation = 1.0/N*real(sum(map(f,arguments).*rotations))

return expectation

end


If you are familiar with Julia you may notice that, as with any finite summation, the order of the operations is permutation invariant. This may be implemented in a tree-like structure where a large number of local computations occur in parallel. However, given that most biological networks are inherently noisy some scientists may object to the fact that this program is deterministic.

In the next section I explain why this is generally a non-issue. On the contrary, random sampling using intrinsic noise allows cheap, fast and reliable Monte Carlo estimates of an integral.

## Cauchy’s formula as a Monte Carlo method:

### Interpretation as an Expected Value:

If we define the real-valued function:

\begin{equation} g: \theta \mapsto \text{Re}(f(x+e^{i\theta})\cdot e^{-i\theta}) \end{equation}

By the intermediate-value theorem,

\begin{equation} \forall x \in A \exists \theta^* \in [0,2\pi], g(\theta^*) = \text{Re}(f’(x)) \end{equation}

It follows that we may interpret $f'$ as an expected value and the factor $\frac{1}{2\pi}$ as the value of a Probability Density Function of a continuous uniform distribution. This may serve as a basis for cheap and stochastic gradient estimates using Monte Carlo methods.

### Monte Carlo estimates of the gradient:

It is worth noting that (2) isn’t a high-dimensional integral. Nevertheless, a biological network may be inherently noisy and face severe computation constraints so a Monte Carlo method may be both useful and a lot more biologically plausible.

Using half the number of samples, this may be implemented in Julia simply by adding one line:

	function mc_nabla(f, x::Float64, delta::Float64)

N = round(Int,2*pi/delta)

## sample with only half the number of points:
sample = rand(1:N,round(Int,N/2))
thetas = sample*delta

## collect arguments and rotations:
rotations = map(theta -> exp(-im*theta),thetas)
arguments = x .+ conj.(rotations)

## calculate expectation:
expectation = 2.0/N*real(sum(map(f,arguments).*rotations))

return expectation

end


I’d like to add that by sampling randomly using intrinsic noise we are taking into account the epistemological uncertainty of the biological network, that doesn’t generally know how to sample $x \in A$ optimally for arbitrary functions $f$. So if we assume bounded computational resources this would generally be a better procedure to follow even in the absence of intrinsic noise.

## Practical Applications:

Given the great importance of gradient descent in machine learning, the reader might wonder whether Cauchy’s definition of the complex derivative may be used to perform gradient descent. The answer is yes:

function gradient_descent(f,x_p::Float64,alpha::Float64)

## 100 steps
for i=1:100

x_n = x_p - alpha*nabla(f,x_p,delta)
x_p = x_n

end

return x_p

end


…which may be tested on concrete problems such as this one:

## function:
g(x) = (x-1)^2 + (x-2)^4 + (x-3)^6

## initial value:
x_p = 5.0

## learning rate:
alpha = 0.01



The reader should find a minimum-value around 2.17 which Wolfram Alpha agrees with. A good sign.

### Computing Partial Derivatives:

We may also use Cauchy’s formula to compute partial derivatives of functions of several variables such as:

\begin{equation} q(x,y,z) = x + y^2 + \cos(z) \end{equation}

even if (2) is only defined for functions of a single complex variable. Indeed, note that under quite general circumstances we may introduce the auxiliary functions:

\begin{equation} f(\lambda,x) = f(\lambda \cdot e_i + x \odot (1_n -e_i)) \end{equation}

and use (7) to exchange the order in which in the limits are taken:

\begin{equation} \lim_{\lambda \to\hat{x_i}} \frac{\partial}{\partial \lambda} \lim_{x\to\hat{x}} \tilde{f}(\lambda,x)= \lim_{x \to \hat{x}} \frac{\partial f}{\partial x_i} \end{equation}

so we have:

\begin{equation} \lim_{\lambda \to\hat{x_i}} \frac{\partial \tilde{f}(\lambda,x=\hat{x})}{\partial \lambda} = \lim_{x \to \hat{x}} \frac{\partial f}{\partial x_i} \end{equation}

where $\tilde{f}(\lambda,x=\hat{x})$ is a function of a single variable.

In Julia, this mathematial analysis then allows us to define the partial derivative as follows:

function partial_nabla(f, i::Int64, X::Array{Float64,1},delta::Float64)

## f:= the function to be differentiated
## i:= partial differentiation with respect to this index
## X:= where the partial derivative is evaluated
## delta:= the sampling frequency

N = length(X)

kd(i,n) = [j==i for j in 1:n]

f_i = x -> f(x*kd(i,N) .+ X.*(ones(N)-kd(i,N)))

return nabla(f_i,X[i],delta)

end


Given that the main purpose of the brain is to generate movements and consider their implications, we are now ready to introduce the most important practical application of the partial derivative for organisms capable of counterfactual reasoning.

## Using partial derivatives to simulate physical systems: A plot of angular velocity against the angle for a Hookean spring

If a physical system is conservative and describable by $N$ Cartesian coordinates $\{x_i\}_{i=1}^n \in \mathbb{R}^n$ the evolution of this system is completely defined by its Lagrangian:

\begin{equation} \mathcal{L}(x_1,…,x_n,\dot{x_1},…,\dot{x_n}) = T(\dot{x_1},…,\dot{x_n})-V(x_1,…,x_n) \end{equation}

where $T(\dot{x_1},...,\dot{x_n})$ is the kinetic energy of the system and $V(x_1,...,x_n)$ is the potential energy of that system. (Although this representation may seem strange, the reader should take my word for it that this will allow us to simplify very complicated calculations.)

We may then simulate the evolution of this system using the Euler-Lagrange equations:

\begin{equation} \forall i, \frac{\partial \mathcal{L}}{\partial x_i} = \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{x_i}} \end{equation}

where $\frac{\partial \mathcal{L}}{\partial x_i}$ is the force acting on the object centered at $x_i$ and $\frac{\partial \mathcal{L}}{\partial \dot{x_i}}$ is the momentum due to $x_i$ respectively.

For concreteness, let’s consider the Lagrangian of a one-dimensional Hookean spring with mass $m$ and stiffness $k$:

\begin{equation} T(\dot{x}) = \frac{1}{2}m\dot{x}^2 \end{equation}

\begin{equation} V(x) = \frac{1}{2}kx^2 \end{equation}

\begin{equation} \mathcal{L}(x,\dot{x}) = T(\dot{x}) - V(x) = \frac{1}{2}m\dot{x}^2-\frac{1}{2}kx^2 \end{equation}

In the Julia Language we may describe this system as follows:

M, K = 1.0, 2.0, 1.0

T(X) = 0.5*M*(X^2)
V(X) = 0.5*K*X^2

L(X) = T(X)-V(X)


Given initial conditions, we may simulate this system using repeated computations of the elastic force on $x$,

\begin{equation} \frac{\partial \mathcal{L}}{\partial x} = -kx \end{equation}

The full simulation in Julia for 100 time-steps using the leapfrog method may then be defined:

N = 100

delta = (2*pi)/1000

dt = 0.1

## simulate the pendulum system:
for i=1:N-1

## update position:
Z[i+1,1] = Z[i,1] + Z[i,2]*dt + 0.5*(partial_nabla(L,1,Z[i,:],delta))*dt^2

## update velocity:
Z[i+1,2] = Z[i,2] + 0.5*(partial_nabla(L,1,Z[i,:],delta)+partial_nabla(L,1,Z[i+1,:],delta))*dt

end


The algorithmic procedure for more complex physical systems is similar and I invite the reader to explore the physics simulations notebook where I also consider the simple pendulum and the spring pendulum.

## Discussion:

At this point the reader might still have questions. I can’t address all questions but today I will try to address the physical interpretation of complex numbers.

It’s worth noting that complex numbers, $re^{i\theta} \in \mathbb{C}$ have both magnitude $r$ and an angle $\theta$. Therefore multiplication by complex numbers has a geometric interpretation as re-scaling by $r$ and rotation by $\theta$. This leads to a natural physical interpretation.

For any $\Delta \theta \in [0, 2\pi)$ we may define the frequency $f = \frac{\Delta \theta}{2\pi}$ which specifies a sampling rate and may be physically interpreted as the propagation of a wave whose amplitude corresponds to $r$. So complex numbers may be represented in biological networks through the propagation of waves.

1. L.D. Landau & E.M. Lifshitz. Mechanics ( Volume 1 of A Course of Theoretical Physics ). Pergamon Press 1969.
2. W. Rudin. Real and complex analysis. McGraw-Hill. 3rd ed.1986.
3. Aidan Rocke. AutoDiff(2020).GitHub repository, https://github.com/AidanRocke/AutoDiff