<p>Kepler Lounge: The math journal of Aidan Rocke</p>

<h1>An alternative definition for the Partial Derivative</h1>
<p>2020-01-20</p>
<h2 id="general-idea">General idea:</h2>
<p>Let’s suppose we are given <script type="math/tex">f:\mathcal{M} \rightarrow \mathbb{R}</script> where <script type="math/tex">\mathcal{M}</script> is a compact subset of <script type="math/tex">\mathbb{R}^n</script> and <script type="math/tex">\forall i \in [1,n], \frac{\partial{f}}{\partial{x_i}}</script> is continuous. Now, instead of computing partial derivatives of this function of several variables, we would like to compute equivalent derivatives of <script type="math/tex">n</script> functions of a single variable. How should we proceed?</p>
<p>We note that if <script type="math/tex">e_i</script> denotes the ith standard basis vector, we may define:</p>
<p>\begin{equation}
\frac{\partial{f}}{\partial{x_i}} = \lim_{n \to \infty} n \cdot \big(f(x+\frac{1}{n}\cdot e_i)-f(x)\big) = \lim_{n \to \infty}f_n^i
\end{equation}</p>
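<p>As a quick numerical illustration (my own sketch, not part of the original argument), take <script type="math/tex">f(x_1,x_2)=x_1^2 x_2</script>, for which <script type="math/tex">\frac{\partial f}{\partial x_1}(2,3) = 2 \cdot 2 \cdot 3 = 12</script>:</p>

```julia
## f_n^i(x) = n*(f(x + e_i/n) - f(x)) should approach the partial derivative:
f(x) = x[1]^2 * x[2]

function f_n(f, x::Vector{Float64}, i::Int, n::Int)
    e = zeros(length(x)); e[i] = 1.0   ## the i-th standard basis vector
    return n * (f(x + e/n) - f(x))
end

[f_n(f, [2.0, 3.0], 1, n) for n in (10, 100, 1000)]   ## approaches 12.0
```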
<p>This allows us to introduce the following equivalence:</p>
<p>\begin{equation}
\lim_{x_{j \neq i} \to c_j} \frac{\partial f}{\partial x_i} = \frac{\partial}{\partial x_i} \lim_{x_{j \neq i} \to c_j} f \equiv \lim_{x_{j \neq i} \to c_j} \lim_{n \to \infty} f_n^i = \lim_{n \to \infty} \lim_{x_{j \neq i} \to c_j} f_n^i
\end{equation}</p>
<p>and we can show that these limits are interchangeable due to the Moore-Osgood theorem since:</p>
<p>\begin{equation}
\forall n \in \mathbb{N}, \lim_{x_{j \neq i} \to c_j} f_n^i(x)
\end{equation}</p>
<p>exists because <script type="math/tex">f</script> is continuous (having continuous partial derivatives), and if we define <script type="math/tex">g_i :=\frac{\partial f}{\partial x_i}</script> we can show that:</p>
<p>\begin{equation}
\lim_{n \to \infty} f_n^i = g_i
\end{equation}</p>
<p>uniformly. Since (4) may not be completely obvious, it warrants a demonstration; in fact, the definition that interests us depends on the correctness of this proof.</p>
<h2 id="proof-of-uniform-convergence">Proof of uniform convergence:</h2>
<p>By the Heine-Cantor theorem, since <script type="math/tex">\mathcal{M}</script> is compact and <script type="math/tex">g_i</script> is assumed to be continuous, <script type="math/tex">g_i</script> is uniformly continuous. It follows that <script type="math/tex">\forall \epsilon > 0 \exists N \in \mathbb{N} \forall n \geq N \forall x \in \mathcal{M}</script>:</p>
<p>\begin{equation}
\lvert f_n^i(x)-g_i(x) \rvert = \Big\lvert \frac{f(x+\frac{1}{n}\cdot e_i)-f(x)}{\frac{1}{n}} - \frac{\partial f}{\partial x_i} \Big\rvert < \epsilon
\end{equation}</p>
<p>Furthermore, by the Mean Value Theorem (5) simplifies to:</p>
<p>\begin{equation}
\exists \alpha \in (0, \frac{1}{n}), \lvert g_i(x+\alpha \cdot e_i) - g_i(x) \rvert < \epsilon
\end{equation}</p>
<p>where the inequality holds whenever <script type="math/tex">\frac{1}{n}</script> is smaller than the <script type="math/tex">\delta</script> furnished by uniform continuity; since this choice of <script type="math/tex">N</script> is independent of <script type="math/tex">x</script>, the convergence is uniform and this concludes our proof.</p>
<h2 id="definition">Definition:</h2>
<p>Given the following extrema:</p>
<p>\begin{equation}
m=\min_{x\in \mathcal{M}} \lvert \langle x,e_i \rangle \rvert
\end{equation}</p>
<p>\begin{equation}
M=\max_{x\in \mathcal{M}} \lvert \langle x,e_i \rangle \rvert
\end{equation}</p>
<p>we may define:</p>
<p>\begin{equation}
\forall \lambda \in [m,M] \forall x \in \mathcal{M}, \tilde{f}(\lambda, x) = f(\lambda\cdot e_i + x \odot(1_n - e_i)) \tag{*}
\end{equation}</p>
<p>where <script type="math/tex">\odot</script> denotes the Hadamard product.</p>
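<p>As a concrete illustration of (*) (my own, with <script type="math/tex">n=2</script> and <script type="math/tex">i=2</script>): if <script type="math/tex">f(x_1,x_2)=x_1+x_2^2</script>, then</p>

<p>\begin{equation}
\tilde{f}(\lambda, x) = f\big(\lambda \cdot e_2 + x \odot (1_2 - e_2)\big) = f(x_1,\lambda) = x_1 + \lambda^2 \tag{**}
\end{equation}</p>

<p>so that, holding <script type="math/tex">x</script> fixed, <script type="math/tex">\frac{\partial \tilde{f}}{\partial \lambda} = 2\lambda</script> recovers <script type="math/tex">\frac{\partial f}{\partial x_2} = 2x_2</script> at <script type="math/tex">\lambda = x_2</script>.</p>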
<p>Now, due to the hypotheses on <script type="math/tex">f</script>, (2) is valid and so we may define the partial derivatives of <script type="math/tex">f</script> for all <script type="math/tex">i \in [1,n]</script> using (*):</p>
<p>\begin{equation}
\lim_{\lambda \to\hat{x_i}} \frac{\partial}{\partial \lambda} \lim_{x\to\hat{x}} \tilde{f}(\lambda,x)= \lim_{x \to \hat{x}} \frac{\partial f}{\partial x_i}
\end{equation}</p>
<p>or simply,</p>
<p>\begin{equation}
\lim_{\lambda \to\hat{x_i}} \frac{\partial \tilde{f}(\lambda,x=\hat{x})}{\partial \lambda} = \lim_{x \to \hat{x}} \frac{\partial f}{\partial x_i}
\end{equation}</p>
<p>where <script type="math/tex">\tilde{f}(\lambda,x=\hat{x})</script> is a function of a single variable.</p>

<p>Aidan Rocke</p>

<h1>Automatic Differentiation via Contour Integration</h1>
<p>2020-01-16</p>
<center><img src="https://github.com/Kepler-Lounge/Kepler-Lounge.github.io/blob/master/_images/dendrites.png?raw=true" width="75%" height="75%" align="middle" /></center>
<center>Might dendritic trees be used to compute partial derivatives? (image taken from [5])</center>
<h2 id="introduction">Introduction:</h2>
<p>Given the usefulness of partial derivatives for closed-loop control, it is natural to ask how large branching structures in the brain
and other biological systems might compute derivatives. After some reflection I realised that an important result in complex analysis due to Cauchy,
the Cauchy Integral Formula, may be used to compute derivatives with a simple forward propagation of signals using a Monte Carlo method.</p>
<p>In this article I introduce Cauchy’s formula, explain how it has a natural interpretation as an expected value,
and demonstrate its reliability using concrete applications. These include gradient descent (also discovered by Cauchy),
computing partial derivatives, and simulating a spring pendulum using the Euler-Lagrange equations.</p>
<p>Furthermore, I also address more conceptual issues towards the end of the article in the discussion section, where I provide a natural
physical interpretation of complex numbers in terms of waves.</p>
<p><strong>Note:</strong> While reading this article, you may find it helpful to experiment with functions in the following <a href="https://github.com/AidanRocke/AutoDiff/blob/master/cauchy_tutorial.ipynb">Jupyter notebook</a>.</p>
<h2 id="cauchys-integral-formula-for-derivatives">Cauchy’s Integral Formula for derivatives:</h2>
<h2 id="derivation-of-the-formula">Derivation of the formula:</h2>
<p>The Cauchy Integral Formula for the first derivative reads as follows for functions <script type="math/tex">f: A \rightarrow \mathbb{C}</script>, where <script type="math/tex">f</script> is assumed to be differentiable <script type="math/tex">\forall z \in A</script> and
<script type="math/tex">\gamma</script> is a simple closed, piecewise-smooth and positively oriented curve in <script type="math/tex">A</script> enclosing <script type="math/tex">z_0</script>:</p>
<p>\begin{equation}
f’(z_0) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(z)}{(z-z_0)^2} dz
\end{equation}</p>
<p>An important consequence of this definition is that (1) is applicable to any holomorphic function, which includes polynomials with complex coefficients, trigonometric
functions and hyperbolic functions. The keyword here is polynomials: by Taylor’s theorem, any sufficiently smooth real-valued function may be locally approximated by polynomials,
so Cauchy’s method is applicable to a very broad class of differentiable functions.</p>
<p>In order to apply (1) numerically it is convenient to derive a simpler formulation using the unit disk as a boundary. By letting <script type="math/tex">z=z_0+e^{i\theta}</script> we obtain:</p>
<p>\begin{equation}
\forall z_0 \in A, f’(z_0) = \frac{1}{2\pi} \int_{0}^{2\pi} f(z_0+e^{i\theta}) \cdot e^{-i\theta} d\theta
\end{equation}</p>
<p>and if we are mainly interested in functions of a real variable then <script type="math/tex">\text{Im}(z_0)=0</script> and <script type="math/tex">A</script> is of the form:</p>
<p>\begin{equation}
A = \{x + e^{i\theta}: x, \theta \in \mathbb{R}\}
\end{equation}</p>
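<p>For completeness, here is the substitution behind (2) spelled out: on the unit circle <script type="math/tex">z = z_0 + e^{i\theta}</script>, so <script type="math/tex">z - z_0 = e^{i\theta}</script> and <script type="math/tex">dz = ie^{i\theta} d\theta</script>, hence</p>

<p>\begin{equation}
\frac{1}{2\pi i} \int_{\gamma} \frac{f(z)}{(z-z_0)^2} dz = \frac{1}{2\pi i} \int_{0}^{2\pi} \frac{f(z_0+e^{i\theta})}{e^{2i\theta}} \cdot ie^{i\theta} d\theta = \frac{1}{2\pi} \int_{0}^{2\pi} f(z_0+e^{i\theta}) \cdot e^{-i\theta} d\theta \tag{2'}
\end{equation}</p>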
<h2 id="deterministic-computation-of-derivatives">Deterministic computation of derivatives:</h2>
<p>Given (2), if <script type="math/tex">\Delta \theta= \frac{2\pi}{N}</script> denotes the angular step size in the discrete case, then <script type="math/tex">\frac{\Delta \theta}{2\pi}=\frac{1}{N}</script> is the corresponding sampling weight, and we may implement this integration procedure in Julia as follows:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">function</span><span class="nf"> nabla</span><span class="x">(</span><span class="n">f</span><span class="x">,</span> <span class="n">x</span><span class="o">::</span><span class="kt">Float64</span><span class="x">,</span> <span class="n">delta</span><span class="o">::</span><span class="kt">Float64</span><span class="x">)</span>
<span class="n">N</span> <span class="o">=</span> <span class="n">round</span><span class="x">(</span><span class="kt">Int</span><span class="x">,</span><span class="mi">2</span><span class="o">*</span><span class="nb">pi</span><span class="o">/</span><span class="n">delta</span><span class="x">)</span>
<span class="n">thetas</span> <span class="o">=</span> <span class="n">vcat</span><span class="x">(</span><span class="mi">1</span><span class="o">:</span><span class="n">N</span><span class="x">)</span><span class="o">*</span><span class="n">delta</span>
<span class="c">## collect arguments and rotations: </span>
<span class="n">rotations</span> <span class="o">=</span> <span class="n">map</span><span class="x">(</span><span class="n">theta</span> <span class="o">-></span> <span class="n">exp</span><span class="x">(</span><span class="o">-</span><span class="nb">im</span><span class="o">*</span><span class="n">theta</span><span class="x">),</span><span class="n">thetas</span><span class="x">)</span>
<span class="n">arguments</span> <span class="o">=</span> <span class="n">x</span> <span class="o">.+</span> <span class="n">conj</span><span class="o">.</span><span class="x">(</span><span class="n">rotations</span><span class="x">)</span>
<span class="c">## calculate expectation: </span>
<span class="n">expectation</span> <span class="o">=</span> <span class="mf">1.0</span><span class="o">/</span><span class="n">N</span><span class="o">*</span><span class="n">real</span><span class="x">(</span><span class="n">sum</span><span class="x">(</span><span class="n">map</span><span class="x">(</span><span class="n">f</span><span class="x">,</span><span class="n">arguments</span><span class="x">)</span><span class="o">.*</span><span class="n">rotations</span><span class="x">))</span>
<span class="k">return</span> <span class="n">expectation</span>
<span class="k">end</span>
</code></pre></div></div>
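<p>As a sanity check (my own, not from the original notebook), a condensed restatement of <code>nabla</code> should recover <script type="math/tex">\frac{d}{dx}\sin(x)\big|_{x=0} = \cos(0) = 1</script>:</p>

```julia
## condensed form of the nabla function above, used here as a sanity check:
function nabla(f, x::Float64, delta::Float64)
    N = round(Int, 2*pi/delta)
    thetas = (1:N) .* delta
    rotations = exp.(-im .* thetas)      ## the e^{-i*theta} factors
    arguments = x .+ conj.(rotations)    ## the points x + e^{i*theta}
    return real(sum(f.(arguments) .* rotations)) / N
end

nabla(sin, 0.0, 2*pi/1000)        ## ≈ 1.0
nabla(x -> x^2, 3.0, 2*pi/1000)   ## ≈ 6.0
```

Since the integrand is analytic, the N-point trapezoidal rule on the circle is extremely accurate, so both checks agree with the exact derivatives to many digits.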
<p>If you are familiar with Julia you may notice that, as with any finite summation, the result is invariant under permutation of the terms. The sum may therefore be evaluated in a tree-like structure where a large number of local computations occur in parallel. However, given that most biological networks are inherently noisy, some scientists may object that this program is deterministic.</p>
<p>In the next section I explain why this is generally a non-issue. On the contrary, random sampling using intrinsic noise allows cheap, fast and reliable Monte Carlo estimates of an integral.</p>
<h2 id="cauchys-formula-as-a-monte-carlo-method">Cauchy’s formula as a Monte Carlo method:</h2>
<h3 id="interpretation-as-an-expected-value">Interpretation as an Expected Value:</h3>
<p>Consider the real-valued function:</p>
<p>\begin{equation}
g: \theta \mapsto \text{Re}(f(x+e^{i\theta})\cdot e^{-i\theta})
\end{equation}</p>
<p>Then, by the intermediate-value theorem,</p>
<p>\begin{equation}
\forall x \in A \exists \theta^* \in [0,2\pi], g(\theta^*) = \text{Re}(f’(x))
\end{equation}</p>
<p>It follows that we may interpret <script type="math/tex">\text{Re}(f'(x))</script> as an expected value, with the factor <script type="math/tex">\frac{1}{2\pi}</script> playing the role of the probability density function of a continuous uniform distribution on <script type="math/tex">[0,2\pi]</script>. This may serve as a basis for cheap stochastic gradient estimates using Monte Carlo methods.</p>
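<p>Stated explicitly: if <script type="math/tex">\theta</script> is drawn uniformly from <script type="math/tex">[0,2\pi]</script>, then</p>

<p>\begin{equation}
\text{Re}(f'(x)) = \int_{0}^{2\pi} g(\theta) \cdot \frac{1}{2\pi} d\theta = \mathbb{E}[g(\theta)] \tag{5'}
\end{equation}</p>

<p>so averaging samples of <script type="math/tex">g</script> at uniformly random angles yields an unbiased estimate of the derivative.</p>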
<h3 id="monte-carlo-estimates-of-the-gradient">Monte Carlo estimates of the gradient:</h3>
<p>It is worth noting that (2) isn’t a high-dimensional integral. Nevertheless, a biological network may be inherently noisy and face severe computational constraints, so a Monte Carlo method may be both useful and far more biologically plausible.</p>
<p>Using half the number of samples, this may be implemented in Julia simply by adding one line:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">function</span><span class="nf"> mc_nabla</span><span class="x">(</span><span class="n">f</span><span class="x">,</span> <span class="n">x</span><span class="o">::</span><span class="kt">Float64</span><span class="x">,</span> <span class="n">delta</span><span class="o">::</span><span class="kt">Float64</span><span class="x">)</span>
<span class="n">N</span> <span class="o">=</span> <span class="n">round</span><span class="x">(</span><span class="kt">Int</span><span class="x">,</span><span class="mi">2</span><span class="o">*</span><span class="nb">pi</span><span class="o">/</span><span class="n">delta</span><span class="x">)</span>
<span class="c">## sample with only half the number of points: </span>
<span class="n">sample</span> <span class="o">=</span> <span class="n">rand</span><span class="x">(</span><span class="mi">1</span><span class="o">:</span><span class="n">N</span><span class="x">,</span><span class="n">round</span><span class="x">(</span><span class="kt">Int</span><span class="x">,</span><span class="n">N</span><span class="o">/</span><span class="mi">2</span><span class="x">))</span>
<span class="n">thetas</span> <span class="o">=</span> <span class="n">sample</span><span class="o">*</span><span class="n">delta</span>
<span class="c">## collect arguments and rotations: </span>
<span class="n">rotations</span> <span class="o">=</span> <span class="n">map</span><span class="x">(</span><span class="n">theta</span> <span class="o">-></span> <span class="n">exp</span><span class="x">(</span><span class="o">-</span><span class="nb">im</span><span class="o">*</span><span class="n">theta</span><span class="x">),</span><span class="n">thetas</span><span class="x">)</span>
<span class="n">arguments</span> <span class="o">=</span> <span class="n">x</span> <span class="o">.+</span> <span class="n">conj</span><span class="o">.</span><span class="x">(</span><span class="n">rotations</span><span class="x">)</span>
<span class="c">## calculate expectation: </span>
<span class="n">expectation</span> <span class="o">=</span> <span class="mf">2.0</span><span class="o">/</span><span class="n">N</span><span class="o">*</span><span class="n">real</span><span class="x">(</span><span class="n">sum</span><span class="x">(</span><span class="n">map</span><span class="x">(</span><span class="n">f</span><span class="x">,</span><span class="n">arguments</span><span class="x">)</span><span class="o">.*</span><span class="n">rotations</span><span class="x">))</span>
<span class="k">return</span> <span class="n">expectation</span>
<span class="k">end</span>
</code></pre></div></div>
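<p>A quick stochastic check (mine): with a condensed restatement of <code>mc_nabla</code>, the estimate of <script type="math/tex">\frac{d}{dx}\sin(x)\big|_{x=0}=1</script> should be close to 1 up to Monte Carlo noise:</p>

```julia
## condensed form of the mc_nabla function above:
function mc_nabla(f, x::Float64, delta::Float64)
    N = round(Int, 2*pi/delta)
    sample = rand(1:N, round(Int, N/2))   ## half the points, drawn with replacement
    thetas = sample .* delta
    rotations = exp.(-im .* thetas)
    arguments = x .+ conj.(rotations)
    return 2.0/N * real(sum(f.(arguments) .* rotations))
end

mc_nabla(sin, 0.0, 2*pi/1000)   ## ≈ 1.0, up to sampling noise
```

With 500 random angles the estimate fluctuates slightly around the true value from run to run.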
<p>I’d like to add that by sampling randomly using intrinsic noise we are taking into account the epistemic uncertainty of the biological network, which doesn’t generally
know how to sample <script type="math/tex">x \in A</script> optimally for an arbitrary function <script type="math/tex">f</script>. So if we assume bounded computational resources, this would generally be a better procedure to follow even in the absence of intrinsic noise.</p>
<h2 id="practical-applications">Practical Applications:</h2>
<h3 id="performing-gradient-descent">Performing gradient descent:</h3>
<p>Given the great importance of gradient descent in machine learning, the reader might wonder whether Cauchy’s definition of the complex derivative may be used
to perform gradient descent. The answer is yes (note that the sampling step <code>delta</code> is assumed to be defined beforehand, e.g. <code>delta = (2*pi)/1000</code>):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> gradient_descent</span><span class="x">(</span><span class="n">f</span><span class="x">,</span><span class="n">x_p</span><span class="o">::</span><span class="kt">Float64</span><span class="x">,</span><span class="n">alpha</span><span class="o">::</span><span class="kt">Float64</span><span class="x">)</span>
<span class="c">## 100 steps</span>
<span class="k">for</span> <span class="n">i</span><span class="o">=</span><span class="mi">1</span><span class="o">:</span><span class="mi">100</span>
<span class="n">x_n</span> <span class="o">=</span> <span class="n">x_p</span> <span class="o">-</span> <span class="n">alpha</span><span class="o">*</span><span class="n">nabla</span><span class="x">(</span><span class="n">f</span><span class="x">,</span><span class="n">x_p</span><span class="x">,</span><span class="n">delta</span><span class="x">)</span>
<span class="n">x_p</span> <span class="o">=</span> <span class="n">x_n</span>
<span class="k">end</span>
<span class="k">return</span> <span class="n">x_p</span>
<span class="k">end</span>
</code></pre></div></div>
<p>…which may be tested on concrete problems such as this one:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">## function:</span>
<span class="n">g</span><span class="x">(</span><span class="n">x</span><span class="x">)</span> <span class="o">=</span> <span class="x">(</span><span class="n">x</span><span class="o">-</span><span class="mi">1</span><span class="x">)</span><span class="o">^</span><span class="mi">2</span> <span class="o">+</span> <span class="x">(</span><span class="n">x</span><span class="o">-</span><span class="mi">2</span><span class="x">)</span><span class="o">^</span><span class="mi">4</span> <span class="o">+</span> <span class="x">(</span><span class="n">x</span><span class="o">-</span><span class="mi">3</span><span class="x">)</span><span class="o">^</span><span class="mi">6</span>
<span class="c">## initial value: </span>
<span class="n">x_p</span> <span class="o">=</span> <span class="mf">5.0</span>
<span class="c">## learning rate: </span>
<span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.01</span>
<span class="n">x_min</span> <span class="o">=</span> <span class="n">gradient_descent</span><span class="x">(</span><span class="n">g</span><span class="x">,</span><span class="n">x_p</span><span class="x">,</span><span class="n">alpha</span><span class="x">)</span>
</code></pre></div></div>
<p>The reader should find a minimum near <script type="math/tex">x \approx 2.17</script>, which Wolfram Alpha agrees with. A good sign.</p>
<h3 id="computing-partial-derivatives">Computing Partial Derivatives:</h3>
<p>We may also use Cauchy’s formula to compute partial derivatives of functions of several variables such as:</p>
<p>\begin{equation}
q(x,y,z) = x + y^2 + \cos(z)
\end{equation}</p>
<p>even if (2) is only defined for functions of a single complex variable. Indeed, note that under quite <a href="https://keplerlounge.com/applied-math/2020/01/20/partial-derivative.html">general circumstances</a> we may introduce the auxiliary function:</p>
<p>\begin{equation}
\tilde{f}(\lambda,x) = f(\lambda \cdot e_i + x \odot (1_n -e_i))
\end{equation}</p>
<p>and use (7) to exchange the order in which the limits are taken:</p>
<p>\begin{equation}
\lim_{\lambda \to\hat{x_i}} \frac{\partial}{\partial \lambda} \lim_{x\to\hat{x}} \tilde{f}(\lambda,x)= \lim_{x \to \hat{x}} \frac{\partial f}{\partial x_i}
\end{equation}</p>
<p>so we have:</p>
<p>\begin{equation}
\lim_{\lambda \to\hat{x_i}} \frac{\partial \tilde{f}(\lambda,x=\hat{x})}{\partial \lambda} = \lim_{x \to \hat{x}} \frac{\partial f}{\partial x_i}
\end{equation}</p>
<p>where <script type="math/tex">\tilde{f}(\lambda,x=\hat{x})</script> is a function of a single variable.</p>
<p>In Julia, this mathematical analysis then allows us to define the partial derivative as follows:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> partial_nabla</span><span class="x">(</span><span class="n">f</span><span class="x">,</span> <span class="n">i</span><span class="o">::</span><span class="kt">Int64</span><span class="x">,</span> <span class="n">X</span><span class="o">::</span><span class="kt">Array</span><span class="x">{</span><span class="kt">Float64</span><span class="x">,</span><span class="mi">1</span><span class="x">},</span><span class="n">delta</span><span class="o">::</span><span class="kt">Float64</span><span class="x">)</span>
<span class="c">## f:= the function to be differentiated</span>
<span class="c">## i:= partial differentiation with respect to this index</span>
<span class="c">## X:= where the partial derivative is evaluated</span>
<span class="c">## delta:= the sampling frequency</span>
<span class="n">N</span> <span class="o">=</span> <span class="n">length</span><span class="x">(</span><span class="n">X</span><span class="x">)</span>
<span class="n">kd</span><span class="x">(</span><span class="n">i</span><span class="x">,</span><span class="n">n</span><span class="x">)</span> <span class="o">=</span> <span class="x">[</span><span class="n">j</span><span class="o">==</span><span class="n">i</span> <span class="k">for</span> <span class="n">j</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">n</span><span class="x">]</span>
<span class="n">f_i</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-></span> <span class="n">f</span><span class="x">(</span><span class="n">x</span><span class="o">*</span><span class="n">kd</span><span class="x">(</span><span class="n">i</span><span class="x">,</span><span class="n">N</span><span class="x">)</span> <span class="o">.+</span> <span class="n">X</span><span class="o">.*</span><span class="x">(</span><span class="n">ones</span><span class="x">(</span><span class="n">N</span><span class="x">)</span><span class="o">-</span><span class="n">kd</span><span class="x">(</span><span class="n">i</span><span class="x">,</span><span class="n">N</span><span class="x">)))</span>
<span class="k">return</span> <span class="n">nabla</span><span class="x">(</span><span class="n">f_i</span><span class="x">,</span><span class="n">X</span><span class="x">[</span><span class="n">i</span><span class="x">],</span><span class="n">delta</span><span class="x">)</span>
<span class="k">end</span>
</code></pre></div></div>
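<p>A quick check (mine, not from the notebook): writing <script type="math/tex">q</script> so that it takes a single vector argument, we expect <script type="math/tex">\frac{\partial q}{\partial y}(1,2,3) = 2 \cdot 2 = 4</script>. Using condensed restatements of <code>nabla</code> and <code>partial_nabla</code>:</p>

```julia
## condensed forms of nabla and partial_nabla from above:
function nabla(f, x::Float64, delta::Float64)
    N = round(Int, 2*pi/delta)
    thetas = (1:N) .* delta
    rotations = exp.(-im .* thetas)
    return real(sum(f.(x .+ conj.(rotations)) .* rotations)) / N
end

function partial_nabla(f, i::Int64, X::Array{Float64,1}, delta::Float64)
    N = length(X)
    kd(i, n) = [j == i for j in 1:n]   ## Kronecker-delta mask for coordinate i
    f_i = x -> f(x*kd(i, N) .+ X.*(ones(N) - kd(i, N)))
    return nabla(f_i, X[i], delta)
end

## q(x, y, z) = x + y^2 + cos(z), taking one vector argument:
q(X) = X[1] + X[2]^2 + cos(X[3])

partial_nabla(q, 2, [1.0, 2.0, 3.0], 2*pi/1000)   ## ≈ 4.0
```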
<p>Given that the main purpose of the brain is to generate movements and consider their implications, we are now ready to
introduce the most important practical application of the partial derivative for organisms capable of counterfactual
reasoning.</p>
<h2 id="using-partial-derivatives-to-simulate-physical-systems">Using partial derivatives to simulate physical systems:</h2>
<center><img src="https://raw.githubusercontent.com/AidanRocke/AutoDiff/master/images/spring_phase_portrait.png" width="75%" height="75%" align="middle" /></center>
<center>A plot of angular velocity against the angle for a Hookean spring</center>
<p>If a physical system is conservative and describable by <script type="math/tex">n</script> Cartesian coordinates <script type="math/tex">\{x_i\}_{i=1}^n</script>, the evolution of this system is completely
defined by its Lagrangian:</p>
<p>\begin{equation}
\mathcal{L}(x_1,\dots,x_n,\dot{x}_1,\dots,\dot{x}_n) = T(\dot{x}_1,\dots,\dot{x}_n)-V(x_1,\dots,x_n)
\end{equation}</p>
<p>where <script type="math/tex">T(\dot{x_1},...,\dot{x_n})</script> is the kinetic energy of the system and <script type="math/tex">V(x_1,...,x_n)</script> is the potential energy
of that system. (Although this representation may seem strange, the reader should take my word for it that this will
allow us to simplify very complicated calculations.)</p>
<p>We may then simulate the evolution of this system using the Euler-Lagrange equations:</p>
<p>\begin{equation}
\forall i, \frac{\partial \mathcal{L}}{\partial x_i} = \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{x_i}}
\end{equation}</p>
<p>where <script type="math/tex">\frac{\partial \mathcal{L}}{\partial x_i}</script> is the force acting on the object centered at <script type="math/tex">x_i</script> and <script type="math/tex">\frac{\partial \mathcal{L}}{\partial \dot{x_i}}</script>
is the momentum conjugate to <script type="math/tex">x_i</script>.</p>
<p>For concreteness, let’s consider the Lagrangian of a one-dimensional Hookean spring with mass <script type="math/tex">m</script> and stiffness <script type="math/tex">k</script>:</p>
<p>\begin{equation}
T(\dot{x}) = \frac{1}{2}m\dot{x}^2
\end{equation}</p>
<p>\begin{equation}
V(x) = \frac{1}{2}kx^2
\end{equation}</p>
<p>\begin{equation}
\mathcal{L}(x,\dot{x}) = T(\dot{x}) - V(x) = \frac{1}{2}m\dot{x}^2-\frac{1}{2}kx^2
\end{equation}</p>
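<p>For the record, plugging this Lagrangian into the Euler-Lagrange equation recovers Hooke’s law, which is the force computed repeatedly in the simulation:</p>

<p>\begin{equation}
\frac{\partial \mathcal{L}}{\partial x} = -kx, \qquad \frac{d}{dt} \frac{\partial \mathcal{L}}{\partial \dot{x}} = \frac{d}{dt}\left(m\dot{x}\right) = m\ddot{x} \implies m\ddot{x} = -kx \tag{14'}
\end{equation}</p>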
<p>In the Julia Language we may describe this system as follows:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">M</span><span class="x">,</span> <span class="n">K</span> <span class="o">=</span> <span class="mf">1.0</span><span class="x">,</span> <span class="mf">2.0</span>
<span class="n">T</span><span class="x">(</span><span class="n">X</span><span class="x">)</span> <span class="o">=</span> <span class="mf">0.5</span><span class="o">*</span><span class="n">M</span><span class="o">*</span><span class="x">(</span><span class="n">X</span><span class="x">[</span><span class="mi">2</span><span class="x">]</span><span class="o">^</span><span class="mi">2</span><span class="x">)</span>
<span class="n">V</span><span class="x">(</span><span class="n">X</span><span class="x">)</span> <span class="o">=</span> <span class="mf">0.5</span><span class="o">*</span><span class="n">K</span><span class="o">*</span><span class="n">X</span><span class="x">[</span><span class="mi">1</span><span class="x">]</span><span class="o">^</span><span class="mi">2</span>
<span class="n">L</span><span class="x">(</span><span class="n">X</span><span class="x">)</span> <span class="o">=</span> <span class="n">T</span><span class="x">(</span><span class="n">X</span><span class="x">)</span><span class="o">-</span><span class="n">V</span><span class="x">(</span><span class="n">X</span><span class="x">)</span>
</code></pre></div></div>
<p>Given initial conditions, we may simulate this system using repeated computations of the elastic force on <script type="math/tex">x</script>,</p>
<p>\begin{equation}
\frac{\partial \mathcal{L}}{\partial x} = -kx
\end{equation}</p>
<p>The full simulation in Julia for 100 time-steps using the leapfrog method may then be defined:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">N</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">delta</span> <span class="o">=</span> <span class="x">(</span><span class="mi">2</span><span class="o">*</span><span class="nb">pi</span><span class="x">)</span><span class="o">/</span><span class="mi">1000</span>
<span class="n">dt</span> <span class="o">=</span> <span class="mf">0.1</span>
<span class="c">## example initial position and velocity: </span>
<span class="n">Z</span> <span class="o">=</span> <span class="n">zeros</span><span class="x">(</span><span class="n">N</span><span class="x">,</span><span class="mi">2</span><span class="x">)</span>
<span class="n">Z</span><span class="x">[</span><span class="mi">1</span><span class="x">,</span><span class="o">:</span><span class="x">]</span> <span class="o">=</span> <span class="x">[</span><span class="mf">1.0</span><span class="x">,</span> <span class="mf">0.0</span><span class="x">]</span>
<span class="c">## simulate the spring system: </span>
<span class="k">for</span> <span class="n">i</span><span class="o">=</span><span class="mi">1</span><span class="o">:</span><span class="n">N</span><span class="o">-</span><span class="mi">1</span>
<span class="c">## update position: </span>
<span class="n">Z</span><span class="x">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="x">,</span><span class="mi">1</span><span class="x">]</span> <span class="o">=</span> <span class="n">Z</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="mi">1</span><span class="x">]</span> <span class="o">+</span> <span class="n">Z</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="mi">2</span><span class="x">]</span><span class="o">*</span><span class="n">dt</span> <span class="o">+</span> <span class="mf">0.5</span><span class="o">*</span><span class="x">(</span><span class="n">partial_nabla</span><span class="x">(</span><span class="n">L</span><span class="x">,</span><span class="mi">1</span><span class="x">,</span><span class="n">Z</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="o">:</span><span class="x">],</span><span class="n">delta</span><span class="x">))</span><span class="o">*</span><span class="n">dt</span><span class="o">^</span><span class="mi">2</span>
<span class="c">## update velocity:</span>
<span class="n">Z</span><span class="x">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="x">,</span><span class="mi">2</span><span class="x">]</span> <span class="o">=</span> <span class="n">Z</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="mi">2</span><span class="x">]</span> <span class="o">+</span> <span class="mf">0.5</span><span class="o">*</span><span class="x">(</span><span class="n">partial_nabla</span><span class="x">(</span><span class="n">L</span><span class="x">,</span><span class="mi">1</span><span class="x">,</span><span class="n">Z</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="o">:</span><span class="x">],</span><span class="n">delta</span><span class="x">)</span><span class="o">+</span><span class="n">partial_nabla</span><span class="x">(</span><span class="n">L</span><span class="x">,</span><span class="mi">1</span><span class="x">,</span><span class="n">Z</span><span class="x">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="x">,</span><span class="o">:</span><span class="x">],</span><span class="n">delta</span><span class="x">))</span><span class="o">*</span><span class="n">dt</span>
<span class="k">end</span>
</code></pre></div></div>
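<p>For reference, here is a self-contained version of the above (initial conditions <script type="math/tex">x(0)=1, \dot{x}(0)=0</script> are my own choice) which also checks that the leapfrog scheme approximately conserves the total energy <script type="math/tex">T+V</script>:</p>

```julia
## self-contained leapfrog simulation of the Hookean spring,
## with an energy-conservation sanity check:
function nabla(f, x::Float64, delta::Float64)
    n = round(Int, 2*pi/delta)
    thetas = (1:n) .* delta
    rotations = exp.(-im .* thetas)
    return real(sum(f.(x .+ conj.(rotations)) .* rotations)) / n
end

function partial_nabla(f, i::Int64, X::Array{Float64,1}, delta::Float64)
    n = length(X)
    kd(i, m) = [j == i for j in 1:m]
    f_i = x -> f(x*kd(i, n) .+ X.*(ones(n) - kd(i, n)))
    return nabla(f_i, X[i], delta)
end

M, K = 1.0, 2.0
T(X) = 0.5*M*X[2]^2
V(X) = 0.5*K*X[1]^2
L(X) = T(X) - V(X)
energy(X) = T(X) + V(X)

N, delta, dt = 100, (2*pi)/1000, 0.1
Z = zeros(N, 2)
Z[1, :] = [1.0, 0.0]   ## initial position and velocity
for i = 1:N-1
    a_i = partial_nabla(L, 1, Z[i, :], delta)/M   ## acceleration = force/mass
    Z[i+1, 1] = Z[i, 1] + Z[i, 2]*dt + 0.5*a_i*dt^2
    a_next = partial_nabla(L, 1, Z[i+1, :], delta)/M
    Z[i+1, 2] = Z[i, 2] + 0.5*(a_i + a_next)*dt
end

energy(Z[1, :]), energy(Z[N, :])   ## the two energies should nearly agree
```

For the harmonic oscillator the leapfrog (velocity Verlet) scheme keeps the energy error bounded and small, which is one reason it is preferred over naive Euler integration here.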
<p>The algorithmic procedure for more complex physical systems is similar and I invite the reader to explore
the <a href="https://github.com/AidanRocke/AutoDiff/blob/master/physics_simulations.ipynb">physics simulations notebook</a>
where I also consider the simple pendulum and the spring pendulum.</p>
<h2 id="discussion">Discussion:</h2>
<p>At this point the reader might still have questions. I can’t address all of them, but I will try to address the physical interpretation of complex numbers.</p>
<p>It’s worth noting that complex numbers <script type="math/tex">re^{i\theta} \in \mathbb{C}</script> have both a magnitude <script type="math/tex">r</script> and an angle <script type="math/tex">\theta</script>. Multiplication by a complex number therefore has a geometric interpretation as re-scaling by <script type="math/tex">r</script> and rotation by <script type="math/tex">\theta</script>. This leads to a natural physical interpretation.</p>
<p>For any <script type="math/tex">\Delta \theta \in [0, 2\pi)</script> we may define the frequency <script type="math/tex">f = \frac{\Delta \theta}{2\pi}</script> which specifies a sampling rate and may be physically interpreted as the propagation of a wave whose amplitude corresponds to <script type="math/tex">r</script>. So complex numbers may be represented in biological networks through the propagation of waves.</p>
<h2 id="references">References:</h2>
<ol>
<li>L.D. Landau & E.M. Lifshitz. Mechanics (Volume 1 of A Course of Theoretical Physics). Pergamon Press, 1969.</li>
<li>W. Rudin. Real and Complex Analysis. McGraw-Hill, 3rd ed., 1986.</li>
<li>Aidan Rocke. AutoDiff (2020). GitHub repository, https://github.com/AidanRocke/AutoDiff</li>
<li>Aidan Rocke. Twitter thread. https://twitter.com/bayesianbrain/status/1202650626653597698</li>
<li>Duncan E. Donohue & Giorgio A. Ascoli. A Comparative Computer Simulation of Dendritic Morphology. PLOS Biology, June 6, 2008.</li>
<li>Michael London & Michael Häusser. Dendritic Computation. Annu. Rev. Neurosci. 2005, 28:503–532.</li>
<li>Warren S. McCulloch & Walter Pitts. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biology, Vol. 52, No. 1/2, pp. 99–115, 1990.</li>
</ol>Aidan RockeMight dendritic trees be used to compute partial derivatives?(image taken from [5])Scaling Laws for Dendritic Computation2020-01-12T00:00:00+00:002020-01-12T00:00:00+00:00/neural-computation/2020/01/12/scaling-laws-dendrites<center><img src="https://github.com/Kepler-Lounge/Kepler-Lounge.github.io/blob/master/_images/dendrites.png?raw=true" width="75%" height="75%" align="middle" /></center>
<center>Dendrite morphologies exhibit considerable variation(taken from [8])</center>
<h2 id="introduction">Introduction:</h2>
<p>Due to the compositional structure of neural computation, if dendritic trees in single neurons couldn’t compute interesting
functions the brain wouldn’t be able to generate any complex behaviour. Now, in order to figure out what dendrites can compute
we need to figure out the expressive power, computational complexity and function spaces associated with dendritic
computation. Eventually we would also need to take dynamics into account to determine when exactly these computations
should happen.</p>
<p>As function space identification is very hard we may reduce the complexity of our task by identifying scaling laws for the
time-complexity and expressive power of dendritic computation.</p>
<h2 id="dendrites-as-functions-computable-on-binary-trees">Dendrites as functions computable on binary trees:</h2>
<p>To a first-order approximation the typical dendrite has the morphology of a random binary tree [6]. However, as shown in the
illustration this allows considerable diversity in the range of dendrite morphologies. So how should we proceed with our analysis?</p>
<p>It’s worth noting that every random binary tree is embedded in a complete binary tree i.e. a binary tree where each node has two
children. Furthermore, if functions computable on a binary tree are the composition of simpler functions computable at each node
then any function computable on a random binary tree is also computable on a complete binary tree.</p>
<p>Complete binary trees are also special in the sense that, among binary trees of a given depth, functions computable on complete binary trees have maximal expressive power.
Granting the embedding argument above, to figure out what functions are computable by dendrites we may focus on the analysis of functions
computable on complete binary trees.</p>
<h2 id="the-expressiveness-of-functions-computable-on-trees">The expressiveness of functions computable on trees:</h2>
<p>Let’s define a function computable on a <script type="math/tex">k</script>-ary tree as a composition of simpler computable functions defined at each node, so that a function of this kind defined on a binary tree of depth <script type="math/tex">N</script> receives <script type="math/tex">2^{N-1}</script> inputs at its leaves:</p>
<center><img src="https://raw.githubusercontent.com/Kepler-Lounge/Kepler-Lounge.github.io/master/_images/binary_tree.png?token=AH6QLOH24RZMFKQES65LJJ26ESOWA" width="75%" height="75%" align="middle" /></center>
<center>A tree function</center>
<p>Intuitively, if there are <script type="math/tex">k^n</script> functions at the nth level the expressiveness for <script type="math/tex">F_k^N</script> on a tree of depth <script type="math/tex">N</script> should grow on the order of:</p>
<p>\begin{equation}
\sim k^N
\end{equation}</p>
<h3 id="probabilistic-argument-using-kolmogorov-complexity">Probabilistic argument using Kolmogorov Complexity:</h3>
<p>If <script type="math/tex">F_k^N</script> is a composition of functions in <script type="math/tex">S</script> where <script type="math/tex">\lvert S \rvert = \frac{k^{N}-1}{k-1}</script> and <script type="math/tex">K(\cdot)</script> denotes Kolmogorov Complexity then we may define:</p>
<p>\begin{equation}
Q = \min_{f_i \in S} K(f_i)
\end{equation}</p>
<p>and we may show that for almost all <script type="math/tex">F_k^N</script> we must have:</p>
<p>\begin{equation}
K(F_k^N) \geq \frac{Q}{2} \cdot k^{N-1}
\end{equation}</p>
<p><strong>Proof:</strong></p>
<p>Let’s suppose each <script type="math/tex">f_i \in S</script> has an encoding as a binary string so <script type="math/tex">\forall i, f_i \in \{0,1\}^*</script>. If we compress each <script type="math/tex">f_i</script> then <script type="math/tex">F_k^N</script> is reduced to a program of length greater than:</p>
<p>\begin{equation}
n= Qk^{N -1}
\end{equation}</p>
<p>Now, the number of programs of length less than or equal to <script type="math/tex">\frac{n}{2}</script> is given by:</p>
<p>\begin{equation}
\sum_{l=1}^{\frac{n}{2}} 2^l \leq 2^{\frac{n}{2}+1}-1
\end{equation}</p>
<p>and so, using the principle of maximum entropy (i.e. the uniform distribution) [4], we find that:</p>
<p>\begin{equation}
\lim_{n \to \infty} P(K(F_k^N) \geq \frac{n}{2}) \geq \lim_{n \to \infty} 1 - \frac{2^{\frac{n}{2}}}{2^n} = 1
\end{equation}</p>
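<p>The counting bound behind this argument can be checked directly. The following Python sketch (illustrative only) computes the upper bound on the fraction of length-<script type="math/tex">n</script> binary strings that admit a description of length at most <script type="math/tex">n/2</script>:</p>

```python
# Counting bound behind equation (6): out of 2**n binary strings of
# length n, at most sum_{l=1}^{n//2} 2**l = 2**(n//2 + 1) - 2 can be
# produced by a program of length <= n/2, so the compressible fraction
# shrinks like 2**(-n/2).
def compressible_fraction_bound(n: int) -> float:
    short_programs = 2 ** (n // 2 + 1) - 2   # programs of length <= n/2
    return short_programs / 2 ** n

fractions = [compressible_fraction_bound(n) for n in (10, 20, 40)]
# The bound decreases monotonically toward zero as n grows.
assert fractions[0] > fractions[1] > fractions[2]
assert fractions[-1] < 1e-5
```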
<p>At this point the careful reader may remark that the typical dendritic tree doesn’t form a complete binary tree. I suspect the reason for this is
that besides computational considerations dendritic trees must take into account spatial, energetic and developmental constraints.</p>
<p>One reason for node sparsity may be the fact that for complete binary trees computational cost scales exponentially with tree depth.</p>
<h2 id="energy-as-a-robust-proxy-measure-for-total-computational-cost">Energy as a robust proxy measure for total computational cost:</h2>
<p>Consider that software running on given hardware must consume energy in order to perform computations so energy consumption scales with computational cost.
It follows that energy is a robust proxy measure for computational cost. We may also observe that if functions are defined at the nodes of dendritic trees and the energy consumption, per neural action potential, at each node is bounded by <script type="math/tex">C</script> joules then we may model the computational cost of evaluating functions computable on dendritic trees.</p>
<p>In particular, it may be easily shown that the total computational cost of evaluating functions defined on complete binary trees with depth <script type="math/tex">N</script> generally scales with:</p>
<p>\begin{equation}
\sim 2^N \cdot \text{Joules}
\end{equation}</p>
<h2 id="the-asymptotic-time-complexity-of-functions-computable-on-trees">The asymptotic time-complexity of functions computable on trees:</h2>
<p>Given a function <script type="math/tex">F_2^N</script> defined on a complete binary tree with depth <script type="math/tex">N</script> we may ask what is the fastest way to evaluate this function. In other
words, what is the fastest way to perform inference using <script type="math/tex">F_2^N</script>?</p>
<p>If we proceed in a sequential manner, starting with the base of the tree, the temporal cost or time-complexity will scale with the energy cost so
we will have:</p>
<p>\begin{equation}
{Time}(F_2^N) \sim \mathcal{O}(2^N)
\end{equation}</p>
<p>and if we note that the dimension of the input to <script type="math/tex">F_2^N</script> equals the number of leaves at its base, <script type="math/tex">2^{N-1}</script>, we can say that serial computation yields:</p>
<p>\begin{equation}
{Time}(F_2^N(n)) \sim \mathcal{O}(n)
\end{equation}</p>
<p>However, if we assume parallel execution of functions at each level of <script type="math/tex">F_2^N</script> then the time-complexity scales with tree depth:</p>
<p>\begin{equation}
{Time}(F_2^N) \sim \mathcal{O}(N)
\end{equation}</p>
<p>which yields a time-complexity that scales logarithmically with dimension of the inputs:</p>
<p>\begin{equation}
{Time}(F_2^N(n)) \sim \mathcal{O}(\ln n)
\end{equation}</p>
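<p>To make the parallel case concrete, here is a Python sketch that evaluates a complete binary tree level by level; summation stands in for the unknown node functions, which is an illustrative assumption rather than a biological claim:</p>

```python
import math

def tree_eval(inputs, node_fn):
    """Evaluate a complete-binary-tree function level by level.

    Each pass combines adjacent pairs with node_fn, standing in for one
    level of nodes firing in parallel; the number of passes therefore
    equals the tree depth, log2(len(inputs)).
    """
    level, steps = list(inputs), 0
    while len(level) > 1:
        level = [node_fn(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps

# With 2**10 leaf inputs, only 10 parallel steps are needed.
value, steps = tree_eval(range(1024), lambda a, b: a + b)
assert steps == math.log2(1024) == 10
assert value == sum(range(1024))
```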
<p>It turns out that such an algorithm for evaluating <script type="math/tex">F_2^N</script> is not only highly efficient, it is also biologically plausible [1].</p>
<h2 id="how-algorithmic-information-scales-with-compute-time">How algorithmic information scales with compute time:</h2>
<p>Given equations (3) and (10) we may deduce a very interesting scaling law:</p>
<p>\begin{equation}
K(F_2^N) \sim 2^{{Time}(F_2^N)}
\end{equation}</p>
<p>which means that the complexity of the functions <script type="math/tex">F_2^N</script> that are evaluated grows exponentially in the running time. This is comparable
to running all <script type="math/tex">2^N</script> programs of length <script type="math/tex">N</script> within <script type="math/tex">\sim N</script> seconds.</p>
<p>This may scale exponentially with energy (7) so there may be an inherent tradeoff between evaluating functions with great expressive power and
minimising energy consumption. From a biological perspective, node-sparsity may be a practical necessity.</p>
<h2 id="discussion">Discussion:</h2>
<p>The above analysis raises a question that may be explored through computer simulations on carefully chosen benchmarks. To be precise, what functions defined on binary trees do we obtain if we simultaneously maximise expressive power and minimise energy consumption? I think such functions are both biologically plausible and take maximal advantage of the dendritic tree structure.</p>
<p>My hunch is that dendritic trees are a form of parse tree where the nodes represent mathematical expressions. I also think that dendritic trees
implement some kind of temporal logic and that this timing information comes from neural dynamics. For these reasons I think we need to unify
the dynamical systems and computer science perspectives of single-neuron computation.</p>
<p>Finally, the reader may note that I have said nothing about the time complexity of learning in dendritic trees. That is a problem for another day.</p>
<h2 id="references">References:</h2>
<ol>
<li>Michael London & Michael Häusser. Dendritic Computation. Annu. Rev. Neurosci. 2005. 28:503–32</li>
<li>M. Li and P. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Graduate Texts in Computer Science. Springer, New York, second edition, 1997.</li>
<li>Roozbeh Farhoodi, Khashayar Filom, Ilenna Simone Jones, Konrad Paul Kording. On functions computed on trees. arXiv. 2019.</li>
<li>Edwin Jaynes. Information Theory and Statistical Mechanics. The Physical Review. Vol. 106, No. 4, 620–630. May 15, 1957.</li>
<li>Cuntz H, Borst A, Segev I. Optimization principles of dendritic structure. Theor Biol Med Model. 2007 Jun 8;4:21.</li>
<li>Alexandra Vormberg, Felix Effenberger, Julia Muellerleile, Hermann Cuntz. Universal features of dendrites through centripetal branch ordering. PLOS Biology. 2017.</li>
<li>Hermann Cuntz, Friedrich Forstner, Alexander Borst, Michael Häusser. One Rule to Grow Them All: A General Theory of Neuronal Branching and Its Practical Application. PLOS Biology. August 5, 2010.</li>
<li>Duncan E. Donohue, Giorgio A. Ascoli. A Comparative Computer Simulation of Dendritic Morphology. PLOS Biology. June 6, 2008.</li>
<li>Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology. Vol. 52, No. 1/2, pp. 99–115. 1990.</li>
<li>Ujfalussy BB, Makara, Lengyel, Branco. Global and Multiplexed Dendritic Computations under In Vivo-like Conditions. Neuron. 2018 Nov 7.</li>
<li>Lars Buesing & Wolfgang Maass. A Spiking Neuron as Information Bottleneck. Neural Comput. 2010 Aug.</li>
<li>Anthony Zador & Barak A. Pearlmutter. VC Dimension of an Integrate-and-Fire Neuron Model. 1996.</li>
<li>Vladimir I Arnold. Representation of continuous functions of three variables by the superposition of continuous functions of two variables. Collected Works: Representations of Functions, Celestial Mechanics and KAM Theory, 1957–1965, pages 47–133, 2009.</li>
</ol>Aidan RockeDendrite morphologies exhibit considerable variation(taken from [8])Power Towers in Complex Networks2019-11-01T00:00:00+00:002019-11-01T00:00:00+00:00/complex/2019/11/01/power-towers-complex<p>In this short note I’d like to introduce a conceptual model for the emergence of higher-level abstractions in complex networks that allows us
to approximately quantify the number of constraints on a complex system. By higher-level abstraction I mean a system whose dynamics are consistent
with but not reducible to their elementary parts.</p>
<p>Let’s suppose we have a population of organisms capable of interaction and replication that is identified with <script type="math/tex">S</script> so <script type="math/tex">\lvert S \rvert = N</script> measures
the population. Now, if every subset of <script type="math/tex">S</script> may be identified with a clique of individual organisms we may say that:</p>
<p>\begin{equation}
\begin{split}
C_0 = \text{Pow}(S) \\
\lvert C_0 \rvert = 2^N - 1
\end{split}
\end{equation}</p>
<p>where <script type="math/tex">\text{Pow}</script> is used to define the space of possible undirected relations between organisms and this doesn’t include the empty set because nature abhors a vacuum.</p>
<p>We can also have cliques of cliques so:</p>
<p>\begin{equation}
\begin{split}
C_1 = \text{Pow} \circ \text{Pow} \circ S \\
\lvert C_1 \rvert = 2^{\lvert C_0 \rvert} - 1 = 2^{2^N - 1} - 1
\end{split}
\end{equation}</p>
<p>where the elements of <script type="math/tex">C_1</script> represent possible interactions between communities of organisms. So the elements of the power set represent higher-order objects
with more complex interactions.</p>
<p>Furthermore, in general we have:</p>
<p>\begin{equation}
\begin{split}
C_n = \text{Pow}^n \circ S \\
\lvert C_n \rvert = 2^{\lvert C_{n-1} \rvert} - 1
\end{split}
\end{equation}</p>
<p>and I think this idea formally captures to some degree what we mean by emergence in complex networks.</p>
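<p>A short Python sketch makes the growth rate tangible: even for a population of three organisms, only the first few levels of the hierarchy have sizes that are practical to write down:</p>

```python
# The recursion |C_n| = 2**|C_{n-1}| - 1 starting from |C_0| = 2**N - 1.
# Python's arbitrary-precision integers handle the first few levels;
# beyond that the counts form a power tower and become unrepresentable.
def clique_counts(N: int, levels: int):
    counts = [2 ** N - 1]                  # |C_0| = |Pow(S)| - 1
    for _ in range(levels):
        counts.append(2 ** counts[-1] - 1)
    return counts

counts = clique_counts(3, 2)   # |C_0|, |C_1|, |C_2| for N = 3
assert counts == [7, 127, 2 ** 127 - 1]
```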
<p>If each element of <script type="math/tex">C_n</script> is identified with an equation we may say with some confidence that the number of
constraints on a complex system grows super-exponentially in a manner that is most naturally expressed using tetration.</p>Aidan RockeIn this short note I’d like to introduce a conceptual model for the emergence of higher-level abstractions in complex networks that allows us to approximately quantify the number of constraints on a complex system. By higher-level abstraction I mean a system whose dynamics are consistent with but not reducible to their elementary parts.Derivation of the isoperimetric inequality from the ideal gas equation, part I2019-09-23T00:00:00+00:002019-09-23T00:00:00+00:00/nonlinear/elasticity/2019/09/23/isoperimetry<h2 id="introduction">Introduction:</h2>
<p>Why is it that whenever balloons are inflated they converge towards the shape of a sphere regardless of their initial geometry?</p>
<p>On one level this may be a purely geometrical problem due to thermodynamic constraints on a body with finite surface area.
This suggests the necessity of solving a global optimisation problem. However, the sequence of deformations undergone may be
facilitated by the elastic material the balloons are made of.</p>
<p>In this article I consider the contribution of the latter by analysing the problem in two dimensions and demonstrate that a minimal
surface may be entirely due to local mechanical instabilities.</p>
<h2 id="the-role-of-material-properties">The role of material properties:</h2>
<p>Let’s consider an object that is only allowed to extend in one dimension. If you were to elongate such an object it would assume
a roughly cylindrical shape.</p>
<p>It follows that we must pay careful attention to the material properties of the balloon.</p>
<h2 id="reasonable-assumptions">Reasonable assumptions:</h2>
<p>A two-dimensional balloon <script type="math/tex">\mathcal{B} \in \mathbb{R}^2</script> is essentially an elastic loop that initially has a perimeter of length:</p>
<p>\begin{equation}
\lvert \partial \mathcal{B}(t=0) \rvert = l_0
\end{equation}</p>
<p>Furthermore, we may make the following reasonable assumptions:</p>
<ol>
<li>
<p>The balloon contains an astronomical number of gas particles that collectively satisfy the ideal gas equation.</p>
</li>
<li>
<p>The balloon is surrounded by a heat bath.</p>
</li>
<li>
<p>The balloon itself is made of a macroscopic number of elastic filaments of equal length.</p>
</li>
</ol>
<p>Furthermore, we may assume that the mechanical behaviour of the balloon is largely driven by energy-minimisation processes that I shall
detail in the next couple of sections.</p>
<h2 id="isobaric-inflation-as-a-consequence-of-energy-minimisation">Isobaric inflation as a consequence of energy minimisation:</h2>
<p>If we consider the force required to elongate the elastic boundary of the balloon we may define an associated potential energy function:</p>
<p>\begin{equation}
U((\lvert \partial \mathcal{B}(t) \rvert - l_0)^2) \geq 0
\end{equation}</p>
<p>such that:</p>
<p>\begin{equation}
U(\cdot) = 0 \iff \lvert \partial \mathcal{B}(t) \rvert = l_0
\end{equation}</p>
<p>Now, if we consider that physical systems tend to minimise potential energy we may infer that the balloon would tend to increase in volume
without increasing <script type="math/tex">\lvert \partial \mathcal{B}(t) \rvert</script>, the length of its perimeter.</p>
<p>In the case of inflation, after accumulating a pressure difference with respect to its environment the evolution of <script type="math/tex">\partial \mathcal{B}(t)</script>
would be guided by an approximately isobaric process provided that <script type="math/tex">\lvert \partial \mathcal{B}(t) \rvert \leq l_0</script>:</p>
<p>\begin{equation}
PV = nRT
\end{equation}</p>
<p>\begin{equation}
\frac{\Delta V}{V} = \frac{\Delta T}{T}
\end{equation}</p>
<p>We can go further with this type of reasoning. Not only does the elastic membrane constrain the type of thermodynamic processes that is likely to guide
inflation; it also constrains the mechanism for modifying the geometry of the balloon.</p>
<h2 id="local-deformations-of-elastic-filaments-lead-to-minimal-surfaces">Local deformations of elastic filaments lead to minimal surfaces:</h2>
<p>If we assume that the balloon contains an ideal gas that may be modelled as an astronomical number of Newtonian particles, it’s reasonable to suppose
that equal pressure is applied to equal areas. Now, if this is the case we may consider pressure-driven deformations of <script type="math/tex">\partial \mathcal{B}</script> that
exploit a local mechanism that is operational everywhere on the boundary. What might such a mechanism look like?</p>
<p>Under a coarse-grained approximation, the boundary <script type="math/tex">\partial \mathcal{B}</script> consists of a large chain of cylindrical elastic rods. If each individual rod is much longer
than the characteristic length where bending occurs, any amount of bending will guarantee tensile stress.</p>
<p>It follows that the elastic membrane <script type="math/tex">\partial \mathcal{B}</script> will try, as much as possible, to increase the enclosed volume while minimising the elongation globally.
This global minimisation happens by minimising the bending angle locally. No global coordination is required.</p>
<p>Another way of understanding this process is that deformations of the elastic membrane are mainly driven by local mechanical instabilities that lead to
a global minimisation of potential energies.</p>
<h2 id="a-polygonal-approximation-to-two-dimensional-elastic-boundaries">A polygonal approximation to two-dimensional elastic boundaries:</h2>
<p>One approach to modelling the activity of elastic boundaries is to approximate them as polygons with <script type="math/tex">N</script> sides of equal length where <script type="math/tex">N</script> is large.
Given that the sum of the interior angles <script type="math/tex">\theta_i \in (0,2\pi)</script> must add up to <script type="math/tex">(N-2)\cdot \pi</script> we may define the potential energy:</p>
<p>\begin{equation}
U = \frac{1}{2} \sum_{i=1}^N (\theta_i - \pi \cdot \big(\frac{N-2}{N}\big))^2
\end{equation}</p>
<p>\begin{equation}
\sum_{i=1}^N \theta_i = \pi \cdot (N-2)
\end{equation}</p>
<p>where:</p>
<p>\begin{equation}
\frac{\partial U}{\partial \theta_i} = \theta_i - \pi \cdot \big(\frac{N-2}{N}\big)
\end{equation}</p>
<p>\begin{equation}
\Delta \theta_i \propto \frac{\partial U}{\partial \theta_i}
\end{equation}</p>
<p>and we find that if we choose the local update with <script type="math/tex">\lambda \in (0,1)</script>:</p>
<p>\begin{equation}
\begin{split}
\theta_i^{t+1} & = \theta_i^{t} - \Delta \theta_i \\
& = \theta_i^{t} - \lambda \frac{\partial U}{\partial \theta_i} \\
& = (1-\lambda) \cdot \theta_i^t + \lambda \cdot \pi \cdot \big(\frac{N-2}{N}\big)
\end{split}
\end{equation}</p>
<p>and we can show that <script type="math/tex">\lim\limits_{t \to \infty} \theta_i^t = \pi \cdot \big(\frac{N-2}{N}\big)</script> very quickly since:</p>
<p>\begin{equation}
x_{n+1} = (1-\lambda) \cdot x_n + \lambda \cdot \alpha \implies x_{n+1} - \alpha = (1-\lambda) \cdot (x_n - \alpha)
\end{equation}</p>
<p>\begin{equation}
\frac{(x_{n+1}-\alpha)^2}{(x_n - \alpha)^2} = (1-\lambda)^2
\end{equation}</p>
<p>so if we define:</p>
<p>\begin{equation}
\epsilon_{n+1}^2 = (x_{n+1}-\alpha)^2
\end{equation}</p>
<p>\begin{equation}
\epsilon_{n}^2 = (x_{n}-\alpha)^2
\end{equation}</p>
<p>we find that:</p>
<p>\begin{equation}
\lim_{n \to \infty} \epsilon_{n+1}^2 = \epsilon_1^2 \cdot \prod_{n=1}^\infty \frac{\epsilon_{n+1}^2}{\epsilon_{n}^2} = \lim_{n \to \infty} \epsilon_1^2 \cdot (1-\lambda)^{2n} = 0
\end{equation}</p>
<p>so we have exponentially fast convergence to a regular polygon and hence, for large <script type="math/tex">N</script>, to a circular geometry.</p>
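<p>The update rule (10) is straightforward to simulate. The following Python sketch, with hypothetical choices of <script type="math/tex">N</script> and <script type="math/tex">\lambda</script>, perturbs the interior angles of a polygon and confirms the exponential contraction predicted by (12):</p>

```python
import math
import random

def relax(N: int, lam: float, steps: int):
    """Apply the local update (10) to randomly perturbed interior angles."""
    target = (N - 2) * math.pi / N          # regular-polygon interior angle
    # Random angles, rescaled so they sum to (N - 2)*pi as required by (7).
    theta = [random.uniform(0.5, 1.5) * target for _ in range(N)]
    scale = (N * target) / sum(theta)
    theta = [t * scale for t in theta]
    errors = []
    for _ in range(steps):
        errors.append(max(abs(t - target) for t in theta))
        theta = [(1 - lam) * t + lam * target for t in theta]
    return errors

random.seed(0)
errors = relax(N=100, lam=0.25, steps=20)
# Each update contracts the worst-case error by a factor (1 - lambda),
# so after ~20 steps the polygon is essentially regular.
assert errors[-1] < 1e-2 * errors[0]
```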
<h2 id="discussion">Discussion:</h2>
<p>In this article I propose the existence of a local mechanical instability present everywhere in a closed elastic membrane with aspherical geometry. Surprisingly, the net action of this instability leads to exponentially fast convergence to the global minimum. But, this analysis may be refined.</p>
<p>The above analysis is entirely based on phenomenological studies of rubber bands by alternately dropping and manipulating rubber bands on a table. This led me to a useful
phenomenological model which may help simulate kinematics but doesn’t approximate the forces involved i.e. dynamics.</p>Aidan RockeIntroduction:Predicting doubling times during brain organoid development2019-09-06T00:00:00+00:002019-09-06T00:00:00+00:00/brain/organoids/2019/09/06/organoids-I<center><img src="https://raw.githubusercontent.com/Kepler-Lounge/Kepler-Lounge.github.io/master/_images/spherical_organoid.jpg?token=AH6QLODM3JMBDEAGYUQJWOS5PNX7I" width="75%" height="75%" align="middle" /></center>
<center>A spherical brain organoid grown in Berkeley [3]</center>
<h2 id="motivation">Motivation:</h2>
<p>Let’s suppose we have a lab which uses brain organoids to investigate human brain development. By seeding appropriate extracellular matrices (ECMs)
with thousands of human pluripotent stem cells (hPSCs) we may grow non-vascularized brain organoids. These tend to develop into spheroids for reasons that I try to
explain below.</p>
<p>Now, if the brain organoid’s spherical surface applies diffusion constraints on the transport of oxygen and nutrients to all cells in the interior, we may ask
how much time is required for the volume of a spherical brain organoid with radius <script type="math/tex">r</script> to double. The value of this analysis is that if we can estimate the
expected doubling time with reasonable confidence, we may predict the time to maturation.</p>
<p>Furthermore, I propose predicting doubling times during brain organoid development as a fundamental challenge that could advance first-principles
approaches to understanding organoid development.</p>
<p><strong>Caveat:</strong> Spherical Brain Organoids aren’t directly comparable to the human brain but they may be likened to the hydrogen atom for human brain development.</p>
<h2 id="assumptions">Assumptions:</h2>
<p>In order to proceed with our analysis a number of assumptions are necessary. The following are considered sufficient:</p>
<ol>
<li>
<p>An insignificant fraction of cells (< 5%) die before the spherical brain organoid has attained maximal volume, implying that the spherical organoid hasn’t grown too large.</p>
</li>
<li>
<p>During development, the distribution of each cell type converges to an equilibrium distribution where the distribution of each cell type (neurons, glia, oligodendrocytes)
is unimodal and tightly concentrated around its mean. Furthermore, we assume that the equilibrium distribution is isotropic i.e. spatially homogeneous.</p>
</li>
<li>
<p>Spherical symmetry is maintained via efficient mechanisms for cell signalling that coordinate the entire resource allocation process.</p>
</li>
<li>
<p>The packing density of cells is invariant to slight perturbations of the spherical geometry and therefore if the brain organoid’s geometry is denoted by <script type="math/tex">\mathcal{B}</script>:</p>
<p>\begin{equation}
\text{Mass}(\mathcal{B}) \approx k_1 \cdot \text{Vol}(\mathcal{B}) \approx k_2 \cdot N
\end{equation}</p>
<p>where <script type="math/tex">N</script> is the total number of cells and <script type="math/tex">k_1</script> and <script type="math/tex">k_2</script> are constants.</p>
</li>
<li>
<p>Half the organoid volume is exposed to air and the other half is embedded in ECM. Symmetry of this sort is necessary for our analytical arguments to be plausible.</p>
</li>
</ol>
<h2 id="a-rational-account-for-the-spherical-shape-of-brain-organoids">A rational account for the spherical shape of brain organoids:</h2>
<p>The isoperimetric inequality states that given a compact Euclidean manifold <script type="math/tex">\mathcal{B} \in \mathbb{R}^3</script> whose boundary has fixed area <script type="math/tex">\text{Vol}(\partial \mathcal{B})</script>, the volume <script type="math/tex">\text{Vol}(\mathcal{B})</script> satisfies the following inequality:</p>
<p>\begin{equation}
\text{Vol}(\mathcal{B}) \leq \frac{1}{6 \sqrt{\pi}} \cdot \text{Vol}(\partial \mathcal{B})^{3/2}
\end{equation}</p>
<p>where we have equality if and only if <script type="math/tex">\mathcal{B}</script> is a sphere.</p>
<p>Given the uniqueness of the sphere it’s reasonable to suppose that this shape isn’t an accident and that it’s probably advantageous to the brain organoid.
Here I posit two possible advantages in terms of energy loss and cell signalling.</p>
<ol>
<li>
<p>Minimisation of energy loss:</p>
<p>If heat is mainly lost by means of conduction via the boundary of the brain organoid then it would be advantageous to the brain organoid if this surface was
minimal.</p>
</li>
<li>
<p>Efficient cell signalling:</p>
<p>If we assume that the cells in an embryoid body communicate by means of some complex network and that the packing density of cells is isotropic then it’s sufficient
to minimise the average euclidean distance between cells. This minimisation process yields the sphere.</p>
</li>
</ol>
<p>At this point a mathematical biologist might remark that brain organoids aren’t vascularized and therefore resource allocation must be diffusion-constrained. Surely a flat
disk-like morphology would be more appropriate? The error in this argument is that it fails to consider that resource allocation is at the service of coordinating the
developmental process. Whatever is ideal for cell signalling shall constrain how resource allocation operates.</p>
<h2 id="the-expected-number-of-doubling-episodes-during-brain-organoid-development">The expected number of doubling episodes during brain organoid development:</h2>
<p>Before trying to estimate doubling times it might be instructive to analyse a related question. If <script type="math/tex">M_{\mathcal{B}}</script> is the mass of a spherical brain organoid,
<script type="math/tex">\rho_{\text{brain}}</script> is the average density of a human brain and the vast majority of cell divisions are symmetric:</p>
<p>\begin{equation}
M_{\mathcal{B}} \approx N_0 \cdot \overline{m_c} \cdot 2^D
\end{equation}</p>
<p>\begin{equation}
M_{\mathcal{B}} \approx \frac{4}{3} \pi r^3 \cdot \rho_{\text{brain}}
\end{equation}</p>
<p>\begin{equation}
\rho_{\text{brain}} \approx \frac{1400 g}{1260 \text{cm}^3} \approx \frac{1.1 \cdot 10^{-3} g}{1 \text{mm}^3}
\end{equation}</p>
<p>where <script type="math/tex">\overline{m_c}</script> is the average mass of a mature cell, <script type="math/tex">D</script> is the average number of cell divisions and <script type="math/tex">N_0</script> is the number of cells seeded per
embryoid body.</p>
<p>By equating (2) and (3) we find that:</p>
<p>\begin{equation}
D(N_0,r) = \frac{1}{\ln 2} \cdot \ln \big(\frac{4 r^3 \cdot \rho_{\text{brain}}}{N_0 \cdot \overline{m_c}}\big)
\end{equation}</p>
<p>Now, if we make the reasonable assumption that the mass of a eukaryotic cell is bounded between one nanogram and a thousand nanograms we may infer that [2]:</p>
<p>\begin{equation}
\overline{m_c} \approx 10^2 \text{ng} = 10^{-7} \text{grams}
\end{equation}</p>
<p>so we have:</p>
<p>\begin{equation}
D(N_0,r) \approx \frac{1}{\ln 2} \cdot \big(\ln(4r^3 \cdot \rho_{\text{brain}}) - \ln (N_0) + 7\ln(10)\big)
\end{equation}</p>
<p>and if we use the bounds from [1]:</p>
<p>\begin{equation}
5000 \leq N_0 \leq 10000
\end{equation}</p>
<p>\begin{equation}
1.5 \text{mm} \leq r \leq 2.5 \text{mm}
\end{equation}</p>
<p>we find that:</p>
<p>\begin{equation}
3.89 \leq D(N_0,r) \leq 7.10
\end{equation}</p>
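<p>For concreteness, here is a Python sketch of equation (5) under the stated assumptions; the result is quite sensitive to the assumed average cell mass <script type="math/tex">\overline{m_c}</script>, so the numbers below are indicative only:</p>

```python
import math

RHO_BRAIN = 1.1e-3   # g per mm^3, from equation (4)
M_CELL = 1e-7        # assumed average cell mass: 10^2 ng = 1e-7 g

def doubling_episodes(n0: int, r_mm: float) -> float:
    """Expected number of doubling episodes D(N0, r) from equation (5)."""
    return math.log2(4 * r_mm ** 3 * RHO_BRAIN / (n0 * M_CELL))

# Evaluating at the extreme corners of the bounds (8) and (9):
d_low = doubling_episodes(10_000, 1.5)   # most cells seeded, smallest radius
d_high = doubling_episodes(5_000, 2.5)   # fewest cells seeded, largest radius
assert round(d_low, 2) == 3.89
assert round(d_high, 2) == 7.10
```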
<h2 id="estimating-the-doubling-times-during-brain-organoid-development">Estimating the doubling times during brain organoid development:</h2>
<p><strong>Disclaimer:</strong> In the analysis that follows we don’t make any assumptions on the proportion of cell divisions that are symmetric. This makes it more
robust than the previous analysis on the expected number of doubling episodes during brain organoid development.</p>
<p>Given the formula for the volume of a sphere, if <script type="math/tex">V_n</script> denotes the volume of a spheroid with radius <script type="math/tex">r_n \leq r_{\text{max}}</script> where <script type="math/tex">r_{\text{max}} = 2.5 \text{mm}</script> we have:</p>
<p>\begin{equation}
V_{n+1} = 2 \cdot V_n \implies r_{n+1} = 2^{\frac{1}{3}} \cdot r_n
\end{equation}</p>
<p>and given that brain organoids aren’t vascularized they must be diffusion-constrained. In this scenario, it’s reasonable to assume that:</p>
<p>\begin{equation}
\text{growth rate} \sim \text{metabolic rate} \sim \frac{\text{vol}(\partial \mathcal{B})}{\text{vol}(\mathcal{B})} \approx \frac{4 \pi r^2}{\frac{4}{3} \pi r^3} = \frac{3}{r}
\end{equation}</p>
<p>where we made the implicit assumption that during the elapsed time for doubling we have an approximate equality of the following averages:</p>
<p>\begin{equation}
\langle \text{growth rate of cell population} \rangle \approx \langle \text{growth rate of organoid volume} \rangle
\end{equation}</p>
<p>Now, given (13) if we denote the growth rate by <script type="math/tex">g_r</script> we have:</p>
<p>\begin{equation}
\frac{3k}{2^{\frac{1}{3}} \cdot r_n} \leq g_r \leq \frac{3k}{r_n}
\end{equation}</p>
<p>where <script type="math/tex">k</script> is an unknown constant.</p>
<p>It follows that if the volume of the brain organoid is currently <script type="math/tex">V_n</script> the expected doubling time <script type="math/tex">T_{n}</script> must be approximately:</p>
<p>\begin{equation}
T_n \cdot g_r = V_{n+1}
\end{equation}</p>
<p>\begin{equation}
V_{n+1} = 2 \cdot V_n = \frac{8}{3} \pi r_n^3
\end{equation}</p>
<p>using (15) we find that the doubling time must be in the interval:</p>
<p>\begin{equation}
\frac{8}{9k} \pi r_n^4 \leq T_n \leq \frac{8 \cdot 2^{\frac{1}{3}}}{9k} \pi r_n ^4
\end{equation}</p>
<p>and if our uncertainty over <script type="math/tex">T_n</script> is expressed as a uniform distribution on this interval the expected doubling time is given by:</p>
<p>\begin{equation}
\mathbb{E}[T_n] = \frac{4 \pi r_n^4}{9k} \cdot (1+2^{\frac{1}{3}})
\end{equation}</p>
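<p>Equations (18) and (19) are easy to check numerically. In the Python sketch below the unknown constant <script type="math/tex">k</script> is left as a free parameter:</p>

```python
import math

def doubling_time_bounds(r_mm: float, k: float = 1.0):
    """Interval (18) for the doubling time T_n, in units where k = 1."""
    lower = (8 / (9 * k)) * math.pi * r_mm ** 4
    upper = lower * 2 ** (1 / 3)
    return lower, upper

def expected_doubling_time(r_mm: float, k: float = 1.0) -> float:
    """Equation (19): the midpoint of the uniform interval (18)."""
    return (4 * math.pi * r_mm ** 4 / (9 * k)) * (1 + 2 ** (1 / 3))

lo, hi = doubling_time_bounds(2.0)
# The expectation of a uniform distribution is the interval midpoint,
# and E[T_n] grows like r^4, so doubling slows sharply with size.
assert abs(expected_doubling_time(2.0) - (lo + hi) / 2) < 1e-9
assert expected_doubling_time(2.0) / expected_doubling_time(1.0) == 16.0
```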
<h2 id="discussion">Discussion:</h2>
<p>I must clarify that this theoretical analysis represents just the first attempt at a first-principles approach to predicting the time
required for a brain organoid to double its volume. The main objective of this analysis was to advance concepts that are useful for
analysing the development of brain organoids. This includes the metabolic activity of cells, their packing density, mechanisms for cell
signalling and equilibrium distributions over cell types at the terminal phase of development.</p>
<p>To validate the formulas that predict the expected waiting time for a spherical brain organoid to double its volume, we may use tools from data analysis. Specifically, we may use a combination of computer vision and non-linear regression to infer a functional relationship between the doubling time and potentially relevant variables.</p>
<p>If the fourth derivative, with respect to radius, of the interpolated doubling-time curve is a positive constant, consistent with the prediction that <script type="math/tex">T_n \propto r_n^4</script>, then my theoretical analysis is broadly correct.</p>
<p><strong>Acknowledgements:</strong> I would like to thank <a href="https://bradly-alicea.weebly.com">Bradly Alicea</a> for constructive feedback on this theoretical analysis.</p>
<h2 id="references">References:</h2>
<ol>
<li>Yakoub AM, Sadek M. Development and Characterization of Human Cerebral Organoids: An Optimized Protocol. 2018.</li>
<li>Haifei Zhang. Cell. http://soft-matter.seas.harvard.edu/index.php/Cell. 2009.</li>
<li>Modeling a neurodevelopmental disorder with human brain organoids: a new way to study conditions such as epilepsy and autism. https://neuroscience.berkeley.edu/modeling-neurodevelopmental-disorder-human-brain-organoids-new-way-study-conditions-epilepsy-autism/. 17/09/2018.</li>
</ol>Aidan RockeA spherical brain organoid grown in Berkeley [3]How neuroscientists can help address climate change2019-09-01T00:00:00+00:002019-09-01T00:00:00+00:00/neuroscience/2019/09/01/neuro4climate<center><img src="https://raw.githubusercontent.com/Kepler-Lounge/Kepler-Lounge.github.io/master/_images/flights.png?token=AH6QLOF7DV633JQD5LEK53K5SNMFO" width="75%" height="75%" align="middle" /></center>
<center>Total passengers carried by planes has grown by a factor of eight in the last 40 years (source: World Bank Data Bank)</center>
<h2 id="introduction">Introduction:</h2>
<p>The following analysis arose from a simple question. Why does MathOverflow, an Internet forum for mathematicians, thrive while the analogous forum for neuroscientists doesn’t? What is the nature of the problem, and how should it be addressed?</p>
<p>But, let’s start with an easier question. Why are all the neuroscientists on Twitter?</p>
<h2 id="why-are-all-the-neuroscientists-on-twitter">Why are all the neuroscientists on Twitter?:</h2>
<p>When Twitter was created with a constraint of 140 characters per tweet, I doubt that the Twitter product team expected their platform to be heavily used by scientists. You can’t render MathJax/LaTeX and Twitter isn’t ideal for expressing subtleties, but it has many desirable features for networking:</p>
<ol>
<li>Information exchange is efficient. A tweet represents <script type="math/tex">\sim 10^{-5}</script> kg CO2.</li>
<li>The message-length constraint incents quasi-synchronous exchanges.</li>
<li>All the scientists are already there for other reasons: sports, politics…etc.</li>
</ol>
<p>On any given day scientists on Twitter will share their preprints, explain how they achieved their results, and effectively conduct Q&A sessions on their research. Some scientists even joke that it has become the place for peer-review. I think it’s fair to say that Twitter has brought great value to the scientific community by allowing frictionless communication between scientists across the globe; scientists who probably wouldn’t communicate with each other unless they met at a conference. Crucially, relative to science conferences Twitter has a relatively small carbon footprint.</p>
<p>It almost seems as though Twitter should be a public good, except that it isn’t. This got me thinking about a public forum for neuroscientists, a bit like the <a href="https://mathoverflow.net/">MathOverflow</a> for mathematicians.</p>
<h2 id="the-psychology-and-neuroscience-stackexchange-and-its-limits">The Psychology and Neuroscience stackexchange and its limits:</h2>
<p>What drew me to the Psychology and Neuroscience stack-exchange was that it had several functionalities that weren’t available on Twitter:</p>
<ol>
<li>You can easily find whether an identical/related question was asked.</li>
<li>Latex is available for mathematical formulas.</li>
<li>Shared tags for easy discoverability of posts.</li>
</ol>
<p>But, as I started making regular use of the forum I noticed sexist and racist behaviour at all user-reputation levels, including the moderator level:</p>
<ol>
<li><a href="https://psychology.stackexchange.com/questions/8277/how-to-interpret-a-bbc-news-article-on-the-effect-of-race-on-intelligence/8286#8286">How to interpret a BBC news article on the effect of race on intelligence?</a></li>
<li><a href="https://psychology.stackexchange.com/questions/20652/iq-gap-by-race-truth-or-myth/20664#20664">IQ gap by race, truth or myth?</a></li>
<li><a href="https://psychology.stackexchange.com/questions/10701/is-the-logic-of-herrnsteins-syllogism-sound-and-are-its-premises-true/17269#17269">Is the logic of “Herrnstein’s syllogism” sound, and are its premises true?</a></li>
</ol>
<p>On balance, the current moderators have nurtured an environment where racist and sexist views can coexist with research-level neuroscience questions, thus offering those views legitimacy. These questions are also terribly outdated, hailing back to a time when intelligence tests were used to justify colonial mentalities, i.e. a right-to-rule.</p>
<p>Having said this, I am not here to incite outrage and would stop short of labelling them as racist/sexist. We should be skeptical of the desire to punish as it often prevents us from seeing the bigger picture.</p>
<p>There are a couple of problems with stack-exchange forums that make sexism and racism difficult to tackle:</p>
<ol>
<li>Users can create one or more accounts under a pseudonym.</li>
<li>A moderator of a stack-exchange forum may be one of several accounts controlled by a single user.</li>
</ol>
<p>This allows the possibility of sockpuppeting at the moderator level, a fault that is exploitable on forums where users might want to share racist/sexist viewpoints. For the above reasons, I am not sure the neuroscience and psychology stack exchange is salvageable in its current form.</p>
<p>Finally, I’d like to address the view that scientists should be free to do research of a sexist/racist nature. First, if you had to list the twenty most important problems in neuroscience you would be hard-pressed to find any that require torturing data to ‘discover’ differences in intelligence between people of different gender or race. Second, there have always been ethical limits on scientific inquiry.</p>
<p>In a globalized world, scientists are free to pursue their scientific interests provided that it benefits a multicultural and inclusive society.</p>
<h2 id="the-sociology-of-neuroscience">The sociology of neuroscience:</h2>
<p>Besides sexist and racist behaviour, the Psychology and Neuroscience stack-exchange faces distinctive challenges that MathOverflow, the Physics stack-exchange and the Theoretical Computer Science stack-exchange do not.</p>
<p>In neuroscience unlike math, physics or theoretical computer science the fundamental concepts are still in development. This is partly due to the complexity of the brain, possibly the most complex object in the universe, and partly due to a relative lack of data. This greater degree of uncertainty in neuroscience encourages much greater specialisation. In fact, it’s fair to say that neuroscience has many tribes that don’t share a common language.</p>
<p>This has a couple of consequences:</p>
<ol>
<li>The probability that a research-level question will be answered on a forum lacking a critical-mass of researchers with diverse research backgrounds is small.</li>
<li>A researcher or masters student is more likely to address another specialist directly via email.</li>
</ol>
<p>For these reasons, the ratio of research-level questions to lower-tier questions will be biased towards the latter, and the Psychology and Neuroscience stack-exchange is unlikely to be dominated by research-level queries.</p>
<p>In contrast, neuroscience conferences are much better places for exchanging scientific information and building trust. This is partly because communication isn’t bandwidth-limited, partly because face-to-face communication builds trust, and partly because a conference effectively directs the collective intelligence of scientists via synchronous communication. These are the types of forums that really matter to neuroscientists.</p>
<p>No wonder neuroscientists fly to hundreds of conferences per year. But, at what cost?</p>
<h2 id="the-carbon-footprint-of-neuroscience-conferences">The carbon footprint of neuroscience conferences:</h2>
<p>Plane flights emit on the order of ~0.1 kg of CO2 per person per km. This explains why, at the level of universities, the carbon footprint of academic conferences may represent up to a third of a university’s carbon budget. At the level of the individual scientist the situation is much worse, easily representing more than 40% of their carbon footprint.</p>
<p>Most scientists I have met care about the environment and wouldn’t deviate significantly from the European average of ~10 tonnes of CO2 per person per year. However, if we take into account that planes emit ~0.1 kg of CO2 per person per km, a scientist can easily add six tonnes of CO2 to their carbon footprint: they simply have to take three return flights from San Francisco to Berlin, which represents about 60,000 km of plane flight in total.</p>
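<p>The arithmetic above can be checked with a short back-of-the-envelope script; the one-way distance is a hypothetical round figure, not a measured value:</p>

```python
# Back-of-the-envelope flight footprint, using the ~0.1 kg CO2
# per person per km figure from the text.
KG_CO2_PER_KM = 0.1
one_way_km = 10_000    # hypothetical round figure for San Francisco-Berlin
return_flights = 3

total_km = return_flights * 2 * one_way_km       # ~60,000 km in total
tonnes_co2 = total_km * KG_CO2_PER_KM / 1000.0   # roughly six tonnes of CO2
```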
<p>Does this mean that scientists should travel less? Probably. But, this doesn’t mean scientists should attend fewer conferences.</p>
<p>In the same way that Twitter has allowed scientists to communicate seamlessly across the Atlantic, I believe virtual reality may replace most brick-and-mortar conferences in the next five years.</p>
<h2 id="are-virtual-science-conferences-possible">Are virtual science conferences possible?</h2>
<p>Regarding the feasibility of virtual science conferences, we are not talking about science fiction. Virtual Reality is a technology which is already on the market in the form of the Oculus Rift S and the HTC Vive.</p>
<p>More than a technology, VR realises the vision of philosophers and mathematicians dating back a thousand years, who believed that the world we perceive is a construction of the mind:</p>
<blockquote>
<p>Nothing of what is visible, apart from light and color, can be perceived by pure sensation, but only by discernment, inference, and recognition, in addition to sensation. -Alhazen</p>
</blockquote>
<p>From this perspective, robust VR requires progress on multi-sensory integration theories in order to understand how our senses can be tricked. The challenge is to find the right priors over multi-sensory data streams. On this front, we must recognise the important contributions made by behavioural and perceptual neuroscientists to VR
research and development.</p>
<p>If VR technologists can solve multi-party face-to-face interaction in the same way that Twitter has solved global public messaging, this would remove the need for almost all brick-and-mortar conferences. In the process, neuroscientists will make a historic contribution to addressing climate change.</p>
<h2 id="discussion">Discussion:</h2>
<p>While I think that behavioural and perceptual neuroscientists will play a crucial role in realising the vision of VR, and consequently in making most flights unnecessary, I believe that the broader community of neuroscientists must do better at communicating their role in addressing climate change. This will have the effect of unifying neuroscientists around neuroscience-driven solutions for climate change.</p>
<p>I also think this is only the beginning. There are other substantial ways neuroscientists can help address climate change.</p>
<h2 id="references">References:</h2>
<ol>
<li>WorldBank Data Bank. Air transport, passengers carried. https://data.worldbank.org/indicator/IS.AIR.PSGR. 01/09/2019.</li>
<li>Amanda Thompson. Scientific Racism: The Justification of Slavery and Segregated Education in America. 2003.</li>
<li>Cesare V. Parise , Marc O. Ernst. Noise, multisensory integration, and previous response in perceptual disambiguation. PLOS Biology. 2017.</li>
<li>Fast Company. How Much Energy Does a Tweet Consume? https://www.fastcompany.com/1620676/how-much-energy-does-tweet-consume. 19/04/2010. 01/09/2019.</li>
</ol>Aidan RockeTotal passengers carried by planes has grown by a factor of eight in the last 40 years(source: Data Bank)A comparison of credit-assignment models in mathematics and biology2019-08-27T00:00:00+00:002019-08-27T00:00:00+00:00/complex/networks/2019/08/27/authorship-I<h2 id="introduction">Introduction:</h2>
<p>In the world of science, scientists are rewarded for the quality of their publications. But sometimes they are also rewarded for the relative ordering of author names, what we may call the first-author model. This incents different kinds of citation behaviour, and these distinct credit-assignment models probably lead to different citation networks.</p>
<p>For mathematicians and physicists, who adhere to the alphabetical ordering of author names, the system incents scientists to find brilliant collaborators. On the other hand, if relative author ordering matters, as is the case with biologists, we might expect scientists to prioritise both finding brilliant collaborators and securing first-authorship, probably not in equal measure.</p>
<p>To a first-order approximation, we may understand the difference between these two types of credit-assignment systems by comparing the number of alphabetical
orderings with the number of first-author orderings as a function of <script type="math/tex">N</script>, the number of co-authors.</p>
<h2 id="alphabetical-order">Alphabetical order:</h2>
<p>Traditionally, in math and physics a group of <script type="math/tex">N</script> researchers that co-author a paper use alphabetical orderings by default so we have:</p>
<p>\begin{equation}
\forall N \in \mathbb{N}, A(N)=1
\end{equation}</p>
<p>where <script type="math/tex">A(\cdot)</script> stands for the number of alphabetical orders as a function of <script type="math/tex">N</script>. Although some information may be lost by adhering to alphabetical ordering one of its
advantages is that it reduces the risk of internal friction within the group of authors.</p>
<h2 id="an-upper-bound-on-author-orderings">An upper-bound on author orderings:</h2>
<p>In a world as complex as ours, each author might have their own metric so <script type="math/tex">3^{N \choose 2}</script> orderings are possible where each author is a node in a fully-connected
graph and it’s assumed that there are three possible labels <script type="math/tex">% <![CDATA[
\{<,>,=\} %]]></script> for each edge in the graph.</p>
<p>However, most of these orderings aren’t linear orders. To obtain a linear order, all authors must agree on a single metric. How does this consensus emerge? Politics? Meritocracy? I have no idea. In any case, if <script type="math/tex">F(\cdot)</script> is the number of first-author orderings it’s reasonable to believe that:</p>
<p>\begin{equation}
\forall N \in \mathbb{N}, F(N) \ll 3^{N \choose 2}
\end{equation}</p>
<p>where <script type="math/tex">3^{N \choose 2}</script> represents a maximally diverse number of orderings.</p>
<h2 id="first-author-orderings">First-author orderings:</h2>
<p>If no ties between authors are possible then <script type="math/tex">F(\cdot)</script> is simply the number of Hamiltonian paths in the fully-connected graph with <script type="math/tex">N</script> nodes, so we have:</p>
<p>\begin{equation}
F(N) \geq N!
\end{equation}</p>
<p>But, if we allow ties then for each of the <script type="math/tex">N-1</script> edges in a hamiltonian path there are two options, <script type="math/tex">% <![CDATA[
\{<,=\} %]]></script>. So in general we have:</p>
<p>\begin{equation}
F(N) = 2^{N-1} \cdot N!
\end{equation}</p>
<p>In a group of <script type="math/tex">N</script> co-authors we might deduce that the fraction of orderings where a particular author comes first is given by:</p>
<p>\begin{equation}
\frac{F(N-1)}{F(N)} = \frac{1}{2N}
\end{equation}</p>
<p>so there is a risk that the degree of selfish behaviour increases with the number of co-authors, because an author may need to do more work to convince the other co-authors that they contributed more than the rest.</p>
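<p>The claimed ratio can be verified directly with exact integer arithmetic. A minimal sketch, where the helper <code>F</code> simply implements the counting formula above:</p>

```python
from math import factorial

def F(n):
    """Number of first-author orderings with ties: 2**(n-1) * n!."""
    return 2 ** (n - 1) * factorial(n)

# Check that F(N-1)/F(N) = 1/(2N) for small N, i.e. F(N-1) * 2N == F(N).
for n in range(2, 12):
    assert F(n - 1) * 2 * n == F(n)
```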
<p>Furthermore, we may intuit that <script type="math/tex">2^{N-1} \cdot N! \ll 3^{N \choose 2}</script>, but we can make this comparison precise using:</p>
<p>\begin{equation}
\forall e \leq A \leq B, \frac{A}{B} \leq \frac{\ln A}{\ln B}
\end{equation}</p>
<p>Using the above inequality we find that:</p>
<p>\begin{equation}
\frac{2^{N-1} \cdot N!}{3^{N \choose 2}} \sim \frac{2^{N-1} (\frac{N}{e})^{N}}{3^{\frac{N^2}{2}}} \leq \frac{2N \ln N}{\frac{N^2}{2} \ln 3} < \frac{4 \ln N}{N}
\end{equation}</p>
<p>so the extent to which first-author orderings can capture a diversity of views vanishes faster than <script type="math/tex">\frac{4 \ln N}{N}</script>. From this analysis I can infer that the first-author
model is more suitable for small numbers of authors.</p>
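<p>The vanishing ratio can also be checked numerically for small <script type="math/tex">N</script>; this is a sanity check of the asymptotic bound, not a proof:</p>

```python
from math import comb, factorial, log

# Check that 2^(N-1) * N! / 3^(N choose 2) stays below 4 ln(N) / N.
for n in range(3, 31):
    ratio = (2 ** (n - 1) * factorial(n)) / (3 ** comb(n, 2))
    assert ratio < 4 * log(n) / n
```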
<h2 id="discussion">Discussion:</h2>
<p>At this point I must acknowledge that this constitutes the beginning of a mathematical analysis which must be refined. How can we model the outcome of sequential self-centered
behaviour under both paradigms?</p>
<p>What kind of citation dynamics do these different credit-assignment models encourage? What if authors regularly co-author papers together? These are questions to be addressed
in a future article.</p>Aidan RockeIntroduction:A simple proof of Euler’s product formula2019-08-21T00:00:00+00:002019-08-21T00:00:00+00:00/number/theory/2019/08/21/euler_1<h2 id="introduction">Introduction:</h2>
<p>The Euler product formula states that if <script type="math/tex">\zeta(s)</script> is the Riemann zeta function and <script type="math/tex">p</script> is prime:</p>
<p>\begin{equation}
\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} = \prod_{p} \frac{1}{1-p^{-s}}
\end{equation}</p>
<p>holds for all <script type="math/tex">s \in \mathbb{C}</script> for which the series defining <script type="math/tex">\zeta(s)</script> converges absolutely, i.e. <script type="math/tex">\text{Re}(s) > 1</script>.</p>
<h2 id="proof">Proof:</h2>
<p>Every positive integer <script type="math/tex">n \in \mathbb{N^*}</script> has a unique prime factorization:</p>
<p>\begin{equation}
\forall n \in \mathbb{N^*} \exists c_p \in \mathbb{N}, n = \prod_p p^{c_p}
\end{equation}</p>
<p>where <script type="math/tex">% <![CDATA[
\sum c_p < \infty %]]></script>.</p>
<p>Furthermore, we note that:</p>
<p>\begin{equation}
\prod_p \frac{1}{1-p^{-s}} = \prod_p \big(\sum_{c_p = 0}^\infty p^{-c_p s} \big)
\end{equation}</p>
<p>due to elementary properties of geometric series.</p>
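<p>Before completing the proof, a quick numerical sanity check of the identity at <script type="math/tex">s=2</script>, where the series converges absolutely to <script type="math/tex">\pi^2/6</script>. The sieve helper and the truncation limits below are arbitrary choices made for illustration:</p>

```python
import math

def primes_up_to(n):
    """Simple sieve of Eratosthenes returning all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, is_p in enumerate(sieve) if is_p]

s = 2.0
# Truncated zeta sum and truncated Euler product; both should be
# close to zeta(2) = pi^2 / 6.
zeta_sum = sum(1.0 / n ** s for n in range(1, 100_000))
euler_prod = 1.0
for p in primes_up_to(10_000):
    euler_prod *= 1.0 / (1.0 - p ** (-s))
```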
<p>In the formal expansion of this product of geometric series, each term has a unique prime factorization and every possible prime factorization occurs exactly once. It follows that if <script type="math/tex">\sum n^{-s}</script> converges absolutely we may rearrange the sum however we wish, and so:</p>
<p>\begin{equation}
\zeta(s) = \prod_{p} \frac{1}{1-p^{-s}}
\end{equation}</p>
<p>provided that the hypotheses on <script type="math/tex">s</script> are satisfied.</p>Aidan RockeIntroduction:Almost all simple graphs are small world networks2019-07-31T00:00:00+00:002019-07-31T00:00:00+00:00/graphs/2019/07/31/small-worlds<h2 id="introduction">Introduction:</h2>
<p>Two days ago, while thinking about brain networks, it occurred to me that almost all simple graphs are small world networks in the sense that if <script type="math/tex">G_N</script> is a simple graph with <script type="math/tex">N</script> nodes sampled from the Erdös-Rényi random graph distribution with probability half then when <script type="math/tex">N</script> is large:</p>
<p>\begin{equation}
\mathbb{E}[d(v_i,v_j)] \leq \log_2 N
\end{equation}</p>
<p>My strategy for proving this was to show that when <script type="math/tex">N</script> is large, <script type="math/tex">\forall v_i \in G_N</script> there exists a chain of distinct nodes of length <script type="math/tex">\log_2 N</script> originating from <script type="math/tex">v_i</script> almost surely. This implies that:</p>
<p>\begin{equation}
\forall v_i, v_j \in G_N, d(v_i,v_j) \leq \log_2 N
\end{equation}</p>
<p>almost surely when <script type="math/tex">N</script> is large.</p>
<p>Now, by using the above method of proof I managed to show that almost all simple graphs are <em>very small</em> in the sense that:</p>
<p>\begin{equation}
\mathbb{E}[d(v_i,v_j)] \leq \log_2\log_2 N
\end{equation}</p>
<p>when <script type="math/tex">N</script> tends to infinity. We can actually do even better.</p>
<p>Using my proof that <a href="https://keplerlounge.com/math/2019/07/02/connected-graphs.html">almost all simple graphs are connected</a>, I can show that almost all simple graphs have diameter 2. However, I think there is more value in going through my original proof which in my opinion provides greater insight into the problem.</p>
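<p>The diameter-2 claim is easy to check by simulation. A minimal sketch using only the standard library; the sample size and seed are arbitrary choices:</p>

```python
import random
from collections import deque

def sample_er_graph(n, p=0.5, seed=0):
    """Adjacency lists for one sample from the Erdos-Renyi model G(n, p)."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def eccentricity(adj, src):
    """Largest BFS distance from src (graph assumed connected)."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

adj = sample_er_graph(200)
diameter = max(eccentricity(adj, v) for v in range(len(adj)))
# For n = 200 and p = 1/2, the sampled graph has diameter 2
# with overwhelming probability.
```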
<h2 id="degrees-of-separation-and-the-neighborhood-of-a-node">Degrees of separation and the neighborhood of a node:</h2>
<p>We may think of degrees of separation as a sequence of ‘hops’ between the neighborhoods of distinct nodes <script type="math/tex">v_i</script>. Given a node <script type="math/tex">v_i</script> we may define <script type="math/tex">\mathcal{N}(v_i)</script>
as follows:</p>
<p>\begin{equation}
\mathcal{N}(v_i) = \{v_j \in G_N: \overline{v_i v_j} \in G_N \}
\end{equation}</p>
<p>where <script type="math/tex">G_N = (V,E)</script> is a graph with <script type="math/tex">N</script> nodes.</p>
<p>Now, given the E-R model we can say that <script type="math/tex">v_i \neq v_j</script> implies:</p>
<p>\begin{equation}
P(v_k \notin \mathcal{N}(v_i) \land v_k \notin \mathcal{N}(v_j)) = P(v_k \notin \mathcal{N}(v_i)) \cdot P(v_k \notin \mathcal{N}(v_j)) = \frac{1}{4}
\end{equation}</p>
<p>and by induction:</p>
<p>\begin{equation}
P(v_k \notin \mathcal{N}(v_{1}) \land … \land v_k \notin \mathcal{N}(v_{n})) = \frac{1}{2^n}
\end{equation}</p>
<p>It follows that if there is a chain of distinct nodes <script type="math/tex">\overline{v_1 ... v_n}</script>, then since <script type="math/tex">v_k</script> is adjacent to at least one node in the chain with probability <script type="math/tex">1-\frac{1}{2^n}</script>, we can say that:</p>
<p>\begin{equation}
P(d(v_1,v_k) \leq n) \geq 1- \frac{1}{2^n}
\end{equation}</p>
<h2 id="almost-all-simple-graphs-are-very-small-world-networks">Almost all simple graphs are very small world networks:</h2>
<h3 id="a-chain-of-distinct-nodes-v_i_i1log_2log_2-n-exists-almost-surely">A chain of distinct nodes <script type="math/tex">\{v_i\}_{i=1}^{\log_2\log_2 N}</script> exists almost surely:</h3>
<p>The probability that there exists a chain of nodes of length <script type="math/tex">\log_2\log_2 N</script>:</p>
<p>\begin{equation}
\overline{v_1 … v_{\log_2\log_2 N}}
\end{equation}</p>
<p>such that <script type="math/tex">v_i = v_j \iff i=j</script> is given by:</p>
<p>\begin{equation}
P(\overline{v_1 … v_{\log_2\log_2 N}} \in G_N) = \prod_{k=1}^{\log_2\log_2 N} \big(1-\frac{1}{2^{N-k}} \big) \geq \big(1- \frac{\log_2 N}{2^N}\big)^{\log_2\log_2 N}
\end{equation}</p>
<p>and we note that:</p>
<p>\begin{equation}
\lim\limits_{N \to \infty} \big(1- \frac{\log_2 N}{2^N}\big)^{\log_2\log_2 N} = 1
\end{equation}</p>
<p>this guarantees the existence of a chain of distinct nodes of length <script type="math/tex">\log_2\log_2 N</script> originating from any <script type="math/tex">v_i \in G_N</script> almost surely.</p>
<h3 id="given-that-overlinev_1--v_log_2log_2-n-exists-almost-surely-we-may-deduce-that-forall-i-in-1log_2log_2-n-dv_iv_k-leq-log_2log_2-n-almost-surely">Given that <script type="math/tex">\overline{v_1 ... v_{\log_2\log_2 N}}</script> exists almost surely we may deduce that <script type="math/tex">\forall i \in [1,\log_2\log_2 N], d(v_i,v_k) \leq \log_2\log_2 N</script> almost surely:</h3>
<p>If <script type="math/tex">\overline{v_1 ... v_{\log_2\log_2 N}}</script> exists we have:</p>
<p>\begin{equation}
\forall v_k \in G_N, P(d(v_1,v_k) \leq \log_2\log_2 N) \geq 1 - \frac{1}{2^{\log_2\log_2 N}} = 1 - \frac{1}{\log_2 N}
\end{equation}</p>
<p>and so we have:</p>
<p>\begin{equation}
\forall v_k \in G_N, \lim\limits_{N \to \infty} P(d(v_1,v_k) \leq \log_2\log_2 N) = 1
\end{equation}</p>
<h2 id="discussion">Discussion:</h2>
<p>I must say that initially I found the above result quite surprising and I think it partially explains why small world networks frequently occur in nature.
Granted, in natural settings the graph is typically embedded in some kind of Euclidean space so in addition to the degrees of separation we must consider
the Euclidean distance. But, I suspect that in real-world networks with small world effects the Euclidean distance plays a marginal role.</p>
<p>In particular, I believe that wherever small-world networks prevail the Euclidean distance is dominated by ergodic dynamics between nodes. There is probably
some kind of stochastic communication process between the nodes.</p>Aidan RockeIntroduction: