Jekyll2019-06-18T12:01:13+00:00/feed.xmlKepler Lounge
The math journal of Aidan Rocke
Understanding Pólya, Hardy and Littlewood’s definition of similarly-ordered2019-06-15T00:00:00+00:002019-06-15T00:00:00+00:00/probability/2019/06/15/monotone<h2 id="motivating-the-monotone-relation">Motivating the monotone relation:</h2>
<p>Hardy, Pólya and Littlewood made precise the notion of <em>similarly-ordered</em> in [2] as follows:</p>
<p><strong>Definition 1</strong>. Two functions <script type="math/tex">f : X → \mathbb{R},</script> and <script type="math/tex">g : X → \mathbb{R},</script> are said to be <em>similarly
ordered</em>, in short <script type="math/tex">f \propto g,</script> if</p>
<p>\begin{equation}
\forall x,y \in X, (f(x) − f(y))(g(x) − g(y)) \geq 0
\end{equation}</p>
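<p>As a quick numerical illustration (a hypothetical helper, not from the original), the defining inequality can be checked directly on sampled pairs of points:</p>

```python
import itertools

def similarly_ordered(f, g, points):
    """Check (f(x) - f(y)) * (g(x) - g(y)) >= 0 for all sampled pairs."""
    return all((f(x) - f(y)) * (g(x) - g(y)) >= 0
               for x, y in itertools.product(points, repeat=2))

xs = [i / 10 for i in range(-20, 21)]
# x and x**3 increase together, so they are similarly ordered:
print(similarly_ordered(lambda x: x, lambda x: x**3, xs))  # True
# x and -x move in opposite directions:
print(similarly_ordered(lambda x: x, lambda x: -x, xs))    # False
```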
<p>Interestingly, within the context of high-dimensional probability an identical definition occurred to me for the following reasons:</p>
<ol>
<li>
<p><script type="math/tex">P \propto Q \land Q \propto Z \implies P \propto Z</script> and we may actually show that <script type="math/tex">\propto</script> forms an equivalence relation over probability distributions.</p>
</li>
<li>
<p>Given a very complex distribution <script type="math/tex">P</script>, I have personally found it very useful to study simpler distributions <script type="math/tex">Q</script> which satisfy <script type="math/tex">P \propto Q</script> in order to deduce key properties of <script type="math/tex">P</script>.</p>
</li>
<li>
<p>Given the last two statements, a central problem I have encountered in probability involves demonstrating that <script type="math/tex">P \propto Q</script> where <script type="math/tex">Q</script> is a well-understood probability distribution.</p>
</li>
</ol>
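<p>Point (1) can be probed numerically on a small example (a sketch with made-up bell curves, not a proof): three mean-zero densities of different widths rise and fall together, so each pair is similarly ordered.</p>

```python
import math
import itertools

def similarly_ordered(f, g, pts):
    # Definition 1 checked on sampled pairs
    return all((f(x) - f(y)) * (g(x) - g(y)) >= 0
               for x, y in itertools.product(pts, repeat=2))

# three unnormalised mean-zero bell curves of different widths:
P = lambda x: math.exp(-x**2 / 2)
Q = lambda x: math.exp(-x**2)
Z = lambda x: math.exp(-x**2 / 8)

pts = [i / 7 for i in range(-20, 21)]
print(similarly_ordered(P, Q, pts),
      similarly_ordered(Q, Z, pts),
      similarly_ordered(P, Z, pts))  # True True True
```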
<p>Given (1), we might ask whether <script type="math/tex">\propto</script> is the appropriate choice of notation. This is the <a href="https://mathoverflow.net/questions/333990/symbol-for-monotone-relationship-between-two-probability-distributions">first question that occurred to me</a>. But, there is a more fundamental question.</p>
<p>Does <script type="math/tex">f \propto g</script> imply that <script type="math/tex">f</script> and <script type="math/tex">g</script> have the same level sets? What if <script type="math/tex">g</script> is constant on <script type="math/tex">X</script>?</p>
<h2 id="level-sets-and-monotone-probability-distributions">Level sets and monotone probability distributions:</h2>
<h3 id="defining-level-sets">Defining level sets:</h3>
<p>For the purpose of this discussion let’s assume that <script type="math/tex">P</script> and <script type="math/tex">Q</script> are continuous probability distributions defined on <script type="math/tex">\mathbb{R}^n</script> and that they share the same domain:</p>
<p>\begin{equation}
\text{dom}(P) = \text{dom}(Q)
\end{equation}</p>
<p>We may define the level sets of <script type="math/tex">P</script> as follows:</p>
<p>\begin{equation}
\forall y \in [0,1], \quad l_y^P = \{x \in \mathbb{R}^n:P(x)=y\}
\end{equation}</p>
<p>where some level sets may be empty:</p>
<p>\begin{equation}
\forall \alpha \in [0,1], \quad \big(\forall x \in \mathbb{R}^n, P(x) \neq \alpha\big) \implies l_\alpha^P = \emptyset
\end{equation}</p>
<p>and so <script type="math/tex">\lvert l_\alpha^P \rvert = 0</script>.</p>
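<p>For concreteness, here is a sketch (hypothetical code, with the standard normal density in one dimension) of which level sets are empty: <script type="math/tex">l_y^P</script> is nonempty exactly when <script type="math/tex">0 < y \leq 1/\sqrt{2\pi}</script>.</p>

```python
import math

peak = 1 / math.sqrt(2 * math.pi)  # maximum of the standard normal density

def level_set_1d(y, tol=1e-9):
    """Solve P(x) = y for the standard normal density P; empty above the peak."""
    if y <= 0 or y > peak + tol:
        return []          # l_y is empty
    if abs(y - peak) < tol:
        return [0.0]       # the level set degenerates to a single point
    r = math.sqrt(-2 * math.log(y * math.sqrt(2 * math.pi)))
    return [-r, r]

print(level_set_1d(0.5))  # [] -- 0.5 exceeds the peak value ~0.3989
print(level_set_1d(0.2))  # two symmetric solutions
```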
<p>Given the assumed properties of <script type="math/tex">P</script> and our definition of level sets we may deduce the following:</p>
<ol>
<li>
<p>Due to the assumption that <script type="math/tex">P</script> and <script type="math/tex">Q</script> are continuous, every level set of <script type="math/tex">P</script> forms a connected component.</p>
</li>
<li>
<p>We may now define the level sets of a probability distribution <script type="math/tex">P</script> as follows:</p>
<p>\begin{equation}
L[P] = \bigcup_{z \in [0,1]} l_z^P
\end{equation}</p>
<p>which forms a connected manifold.</p>
</li>
<li>
<p>If <script type="math/tex">P</script> and <script type="math/tex">Q</script> have the same level sets then we have:</p>
<p>\begin{equation}
L[P] = L[Q]
\end{equation}</p>
<p>but this doesn’t guarantee that <script type="math/tex">P = Q</script>.</p>
</li>
</ol>
<h3 id="a-clever-work-around">A clever work-around:</h3>
<p>Now, a clever mathematician might say that we don’t actually need to say more about level sets. In order to guarantee that <script type="math/tex">f \propto g</script> and
that they have the same level sets we may simply replace (1) with:</p>
<p><strong>Definition 2</strong>. Two functions <script type="math/tex">f : X → \mathbb{R},</script> and <script type="math/tex">g : X → \mathbb{R},</script> are said to be <em>similarly
ordered</em>, in short <script type="math/tex">f \sim g,</script> if</p>
<p>\begin{equation}
\forall x, y \in X, \text{sgn}(f(x) − f(y)) = \text{sgn}(g(x) − g(y))
\end{equation}</p>
<p>However, while this second definition certainly guarantees that <script type="math/tex">f \sim g</script> implies that <script type="math/tex">f</script> and <script type="math/tex">g</script> have the same level sets, it hides the fundamental structures on which <em>monotonicity</em> is founded: level sets. Does this make life any easier when, given that <script type="math/tex">P \sim Q</script>, we attempt to show that <script type="math/tex">P</script> and <script type="math/tex">Q</script> share the
same level sets?</p>
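<p>The gap between the two definitions is visible on the degenerate case raised earlier: a constant <script type="math/tex">g</script> satisfies Definition 1 against any <script type="math/tex">f</script>, but fails the sign-matching condition of Definition 2. A sketch (hypothetical helpers):</p>

```python
import itertools

def sgn(t):
    return (t > 0) - (t < 0)

def def1(f, g, pts):  # Definition 1: products of differences are nonnegative
    return all((f(x) - f(y)) * (g(x) - g(y)) >= 0
               for x, y in itertools.product(pts, repeat=2))

def def2(f, g, pts):  # Definition 2: signs of differences must match
    return all(sgn(f(x) - f(y)) == sgn(g(x) - g(y))
               for x, y in itertools.product(pts, repeat=2))

pts = [0, 1, 2, 3]
f = lambda x: x
g = lambda x: 7          # constant on X
print(def1(f, g, pts))   # True: the product is always zero
print(def2(f, g, pts))   # False: sgn mismatch wherever f varies
```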
<h3 id="a-better-definition">A better definition:</h3>
<p>Due to the conceptual importance of level sets, I think it’s important to explicitly define <em>monotonicity</em> in terms of level sets:</p>
<p><strong>Definition 3</strong>. Two continuous probability distributions <script type="math/tex">P : \mathbb{R}^n \rightarrow [0,1],</script> and <script type="math/tex">Q : \mathbb{R}^n \rightarrow [0,1],</script> are said to be <em>monotone</em>, in short <script type="math/tex">P \bowtie Q,</script> if whenever <script type="math/tex">w,x,y,z \in [0,1]</script> satisfy:</p>
<p>\begin{equation}
(w-x) \cdot (y-z) \neq 0
\end{equation}</p>
<p>\begin{equation}
\lvert l_w^P \rvert \cdot \lvert l_x^P \rvert \cdot \lvert l_y^Q \rvert \cdot \lvert l_z^Q \rvert \neq 0
\end{equation}</p>
<p>then <script type="math/tex">P</script> and <script type="math/tex">Q</script> necessarily satisfy the following inequality:</p>
<p>\begin{equation}
\forall (\alpha,\beta) \in (l_w^P \cup l_y^Q) \times (l_x^P \cup l_z^Q) , (P(\beta)-P(\alpha)) \cdot (Q(\beta)-Q(\alpha)) > 0
\end{equation}</p>
<p>It can easily be shown that <script type="math/tex">P \bowtie Q</script> implies both that <script type="math/tex">P</script> and <script type="math/tex">Q</script> have the same level sets and that they are <em>similarly-ordered</em>, whereas the converse is false.</p>
<p>The value of the last definition will become clear in our proof that <script type="math/tex">P \bowtie Q</script> implies that <script type="math/tex">P</script> and <script type="math/tex">Q</script> have the same level sets.</p>
<h2 id="p-bowtie-q-implies-that-p-and-q-have-the-same-level-sets"><script type="math/tex">P \bowtie Q</script> implies that <script type="math/tex">P</script> and <script type="math/tex">Q</script> have the same level sets:</h2>
<p>Let’s suppose <script type="math/tex">P \bowtie Q</script> and without loss of generality let’s suppose there exists <script type="math/tex">l_z^Q \neq \emptyset</script> such that <script type="math/tex">P</script> is not level on <script type="math/tex">l_z^Q</script>. It follows that <script type="math/tex">P</script> varies on <script type="math/tex">l_z^Q</script> and, due to the continuity of <script type="math/tex">P</script>, there exist <script type="math/tex">\lambda \in [0,1]</script> and <script type="math/tex">\epsilon > 0</script> such that:</p>
<p>\begin{equation}
\lvert l_z^Q \cap l_{\lambda + \epsilon}^P \rvert \cdot \lvert l_z^Q \cap l_{\lambda}^P \rvert \cdot \lvert l_z^Q \cap l_{\lambda - \epsilon}^P \rvert \neq 0
\end{equation}</p>
<p>so whatever the sign of the nonzero difference:</p>
<p>\begin{equation}
Q(\beta)-Q(\alpha) \neq 0
\end{equation}</p>
<p>we may set <script type="math/tex">l_w^P := l_{\lambda}^P</script>, sample <script type="math/tex">\alpha</script> from <script type="math/tex">l_{\lambda}^P</script>, and sample <script type="math/tex">\beta</script> from either <script type="math/tex">l_{\lambda + \epsilon}^P</script> or <script type="math/tex">l_{\lambda - \epsilon}^P</script>, so we have:</p>
<p>\begin{equation}
\forall (\alpha, \beta) \in l_{\lambda}^P \times l_{\lambda + \epsilon}^P, P(\beta)-P(\alpha)=\epsilon > 0
\end{equation}</p>
<p>or</p>
<p>\begin{equation}
\forall (\alpha, \beta) \in l_{\lambda}^P \times l_{\lambda-\epsilon}^P, P(\beta)-P(\alpha)=-\epsilon < 0
\end{equation}</p>
<p>one of which contradicts (10) and so our proof is complete.</p>
<h2 id="discussion">Discussion:</h2>
<p>I have actually considered Pólya, Hardy and Littlewood’s definition and example (problem 236 on page 168 of their book on inequalities), but given that my definition of <em>monotone</em> implies <em>similarly-ordered</em>, I am looking for examples where the strictly weaker ‘similarly-ordered’ criterion might be more pragmatic in certain settings.
Under what circumstances might equivalence of level sets be unnecessary? Some mathematicians have suggested that I should call my definition <em>strictly similarly-ordered</em>,
but that’s a mouthful, and it still isn’t clear to me what the marginal value of the weaker criterion is.</p>
<p>I’d actually be very interested in learning more about the motivations of Hardy, Pólya and Littlewood. Even a modern expository article on this subject would be very useful, as I consider this idea very powerful for solving problems.</p>
<h3 id="references">References:</h3>
<ol>
<li>Heinz J. Skala. On the characterization of certain similarly ordered super-additive functionals. 1996.</li>
<li>G. H. Hardy, J. E. Littlewood and G. Pólya. Inequalities. Cambridge Univ. Press, Cambridge (1934).</li>
</ol>Aidan RockeA constructive proof of the Vitali Covering Lemma2019-06-14T00:00:00+00:002019-06-14T00:00:00+00:00/real/analysis/2019/06/14/vitali<h2 id="theorem">Theorem:</h2>
<p>Let <script type="math/tex">\{B_i\}_{i=1}^n</script> be a finite collection of balls in <script type="math/tex">\mathbb{R}^d</script>. Then there exists a sub-collection of balls <script type="math/tex">\{B_{j_i}\}_{i=1}^m</script>
that are disjoint and satisfy:</p>
<p>\begin{equation}
\bigcup_{i=1}^n B_i \subseteq \bigcup_{i=1}^m 3 \cdot B_{j_i}
\end{equation}</p>
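<p>As a numerical sanity check of the statement (a hypothetical Python sketch, not part of the demonstration), we can run the greedy selection in one dimension, where balls are intervals <script type="math/tex">(c-r, c+r)</script>: repeatedly keep the largest remaining ball and discard every ball it meets, then verify that the 3x dilates of the kept balls cover all the originals.</p>

```python
def vitali_subcollection(balls):
    """balls: list of (center, radius). Greedy selection by decreasing radius."""
    remaining = sorted(balls, key=lambda b: -b[1])
    chosen = []
    while remaining:
        c, r = remaining[0]
        chosen.append((c, r))
        # discard every ball meeting the chosen one: |c - c2| <= r + r2
        remaining = [(c2, r2) for c2, r2 in remaining[1:]
                     if abs(c - c2) > r + r2]
    return chosen

balls = [(0.0, 1.0), (1.5, 0.5), (4.0, 1.0), (4.5, 0.3), (9.0, 0.2)]
chosen = vitali_subcollection(balls)
# every original ball B(c, r) lies inside some dilate B(c0, 3*r0):
covered = all(any(abs(c - c0) + r <= 3 * r0 for c0, r0 in chosen)
              for c, r in balls)
print(chosen, covered)
```

Each discarded ball meets a chosen ball of at least its radius, so its center is within <code>2*r0</code> of the chosen center, which is exactly why the factor 3 suffices.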
<h2 id="demonstration">Demonstration:</h2>
<p>Let’s define the collection:</p>
<p>\begin{equation}
\mathcal{B_1} := \{B_i\}_{i=1}^n
\end{equation}</p>
<p>where we re-index the balls <script type="math/tex">B_i</script> so we have:</p>
<p>\begin{equation}
\mathcal{B_{1,1}} := B_1
\end{equation}</p>
<p>\begin{equation}
Vol(\mathcal{B_{1,i}}) \geq Vol(\mathcal{B_{1,i+1}})
\end{equation}</p>
<p>and given <script type="math/tex">\mathcal{B_1}</script> we may define:</p>
<p>\begin{equation}
C_1 = \{\mathcal{B_{1,j}}: \mathcal{B_{1,j}} \cap \mathcal{B_{1,1}} \neq \emptyset \}
\end{equation}</p>
<p>so, since every ball in <script type="math/tex">C_1</script> has radius at most that of <script type="math/tex">\mathcal{B_{1,1}}</script>, the triangle inequality gives:</p>
<p>\begin{equation}
\bigcup C_1 \subseteq 3 \cdot \mathcal{B_{1,1}}
\end{equation}</p>
<p>Now, using <script type="math/tex">\mathcal{B_i}</script> and <script type="math/tex">C_i</script> we may construct the following:</p>
<p>\begin{equation}
\mathcal{B_{i+1}} = \mathcal{B_i} \setminus C_i
\end{equation}</p>
<p>\begin{equation}
\lvert \mathcal{B_{i+1}} \rvert < \lvert \mathcal{B_{i}} \rvert
\end{equation}</p>
<p>\begin{equation}
Vol(\mathcal{B_{i,j}}) \geq Vol(\mathcal{B_{i,j+1}})
\end{equation}</p>
<p>and by induction we have:</p>
<p>\begin{equation}
\bigcup_{i=1}^n B_i \subseteq \bigcup_{j=1}^m 3 \cdot \mathcal{B_{j,1}}
\end{equation}</p>
<p>where <script type="math/tex">m</script> is the smallest integer such that <script type="math/tex">\lvert \mathcal{B_{m+1}} \rvert = 0</script>.</p>Aidan RockeTheorem:False Dichotomies2019-05-21T00:00:00+00:002019-05-21T00:00:00+00:00/logic/2019/05/21/false-dichotomies<p>My philosophy of science, if I have one, can be summarised by the principle that we should ensure that our intellectual
constructs aren’t merely diversions. One way I apply this principle is by trying to work out my own solution to problems
before reading the accepted scientific solution. If the previous approach isn’t applicable, I try and determine
empirically and/or analytically whether we are trying to fit circular pegs into square holes.</p>
<p>This is frequently the case with dichotomies, which are very often mere figments of the imagination, and to illustrate
my point I shall provide a few examples.</p>
<ol>
<li>
<p>All organisms are either terrestrial or not terrestrial.</p>
<p>What about amphibians?</p>
</li>
<li>
<p>Humans have free will or they don’t have free will.</p>
<p>Here we are assuming that ‘free will’ is a scientifically useful notion although it is grounded in introspection and not empirical observation.
We can define the spatial freedom of a Newtonian particle in some sense but the meaning of ‘free will’ is sufficiently flexible to survive any
experimental test.</p>
<p>Free will is a metaphysical idea and therefore outside the domain of science.</p>
</li>
<li>
<p>Benjamin is a good person or he isn’t a good person.</p>
<p>Most people are complicated characters that don’t fit into simplistic Disney categories. Benjamin might be a great scientist but
a nasty football player on weekends. If you ask his mates that play football they’ll tell you that he’s an ass and if you ask his
scientific colleagues they’ll tell you that he’s a great person.</p>
<p>Which account is true? On one level it depends who you ask. On another level, the ‘good person’ category is much too simplistic to
describe people.</p>
</li>
<li>
<p>Anne is conscious or not conscious.</p>
<p>This intellectual construct is similar to ‘free will’ in the sense that it isn’t something we can observe empirically. We can have a
vague notion of an internal mental model of ourselves in our environment, but so does a fish or a rat. So how does consciousness set us
apart from any organism capable of adapting to its environment?</p>
<p>In fairness to human knowledge, consciousness and free will are part of a pre-scientific and anthropocentric view of the Universe.</p>
</li>
<li>
<p>An elephant is either less than 100 m long or more than 100 m long.</p>
<p>In this case we are trying to ascribe a length to an object that has three dimensions so there isn’t a unique method for measuring an
elephant. In fact, there is an infinite number of ways to measure the length of an object with more than one dimension.</p>
<p>It follows that in this circumstance, like the others, our intellectual construct is merely a diversion.</p>
</li>
</ol>
<p>The reader might wonder what stimulated this reflection. Well, a couple of weeks ago I reflected upon Luitzen Brouwer’s criticism of the law of
excluded middle. This law basically states that for any proposition, either that proposition is true or its negation is true. Intuitively
it makes sense, but I have provided five concrete examples where the law isn’t applicable.</p>
<p>Once in a while it’s useful to reconsider the things we take for granted.</p>Aidan RockeMimesis as random graph coloring, Part I2019-04-29T00:00:00+00:002019-04-29T00:00:00+00:00/self-organisation/2019/04/29/mimetic-I<h2 id="motivation">Motivation:</h2>
<p>While reading ‘The physics of brain network structure, function and control’ by Chris Lynn and Dani Bassett I learned that the statistical mechanics
of complex networks by Réka Albert & Albert-László Barabási was an essential reference [1,2]. But I didn’t know much graph theory and even less about random graphs, which play a central role in that reference. In order to develop an intuition about random graphs I decided to map this idea to a phenomenon I observed on a daily basis at all levels of society: mimetic behavior.</p>
<p>Here, I propose a simple and tractable mechanism for mimetic desire [3]. When we change our beliefs, we do so not because of their intrinsic value. Our desire to switch from belief <script type="math/tex">A</script> to belief <script type="math/tex">B</script> is proportional to the number of adherents of belief <script type="math/tex">B</script> that we know. Technically, I modeled the problem of two conflicting beliefs that propagate through a network with <script type="math/tex">N</script> nodes in a decentralised manner.</p>
<p>Using vertex notation, two individuals <script type="math/tex">v_i</script> and <script type="math/tex">v_j</script> with identical beliefs are connected with probability <script type="math/tex">q</script>, and <script type="math/tex">1-q</script> otherwise. <script type="math/tex">v_i</script> changes its belief
with a probability proportional to the number of nodes connected to <script type="math/tex">v_i</script> that have opposing views. Two key motivating questions are:</p>
<ol>
<li>Under what circumstances does a belief get completely wiped out?</li>
<li>Under what circumstances does a belief completely dominate (i.e. wipe out) all other beliefs?</li>
</ol>
<p>In the scenario where there are only two possible beliefs these two questions are equivalent and I show that on average it’s sufficient that
<script type="math/tex">q > 1-q</script> and that initially, one belief has a greater number of adherents than the other.</p>
<h2 id="representation-of-the-problem">Representation of the problem:</h2>
<h3 id="virtual-weights-as-a-representation-of-potential-connections">Virtual weights as a representation of potential connections:</h3>
<p>Nodes carrying the first belief were assigned to the set of red vertices, <script type="math/tex">R</script>, and nodes carrying the second belief were assigned to the
set of blue vertices, <script type="math/tex">B</script>. However, I wasn’t satisfied with this representation.</p>
<p>After some reflection, I chose <script type="math/tex">+1</script> and <script type="math/tex">-1</script> as labels. The reason being that a change of belief using this representation would be equivalent
to multiplication by <script type="math/tex">-1</script>. As a result, the <script type="math/tex">N</script> vertices could be represented by an N-dimensional vector:</p>
<p>\begin{equation}
\vec{v} \in \{-1,1\}^N
\end{equation}</p>
<p>where <script type="math/tex">N = \lvert R \rvert + \lvert B \rvert</script>.</p>
<p>Using this representation, between each pair of vertices we may define a virtual weight matrix <script type="math/tex">W</script>:</p>
<p>\begin{equation}
w_{ij} = v_i \cdot v_j
\end{equation}</p>
<p>where <script type="math/tex">w_{ij}=+1</script> implies identical beliefs and we have <script type="math/tex">w_{ij}=-1</script> otherwise.</p>
<p>Now, we note that <script type="math/tex">W</script> may be conveniently decomposed as follows:</p>
<p>\begin{equation}
W= W^+ + W^-
\end{equation}</p>
<p>where <script type="math/tex">W^-</script> denotes potential connections between nodes of different color and <script type="math/tex">W^+</script> denotes potential connections between nodes
of identical colors.</p>
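<p>In NumPy (an illustrative sketch, not the author’s code), the virtual weight matrix is a single outer product and the decomposition is two masks:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
v = rng.choice([-1, 1], size=N)   # belief labels in {-1, +1}

W = np.outer(v, v)                # w_ij = v_i * v_j
W_plus = np.where(W > 0, W, 0)    # potential same-color connections
W_minus = np.where(W < 0, W, 0)   # potential different-color connections

print(np.array_equal(W, W_plus + W_minus))  # True: the decomposition is exact
```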
<h3 id="modelling-the-adjacency-matrix-as-a-combination-of-random-matrices">Modelling the adjacency matrix as a combination of random matrices:</h3>
<p>In order to simulate variations in connectivity we may assume that nodes of the same color are connected with probability:</p>
<p>\begin{equation}
\frac{1}{2} < q < 1
\end{equation}</p>
<p>and nodes of different color are connected with probability <script type="math/tex">1-q</script>.</p>
<p>Given <script type="math/tex">W</script> we may therefore construct the adjacency matrix <script type="math/tex">A</script> by sampling random matrices:</p>
<p>\begin{equation}
M_1, M_2 \sim \mathcal{U}([0,1])^{N \times N}
\end{equation}</p>
<p>\begin{equation}
M^+ = 1_{[0,q)} \circ M_1
\end{equation}</p>
<p>\begin{equation}
M^- = 1_{(1-q,1]} \circ M_2
\end{equation}</p>
<p>where <script type="math/tex">1_{[0,q)}</script> denotes the characteristic function over the set <script type="math/tex">[0,q)</script> and then we compute the Hadamard products:</p>
<p>\begin{equation}
A^+ = M^+ \circ W^+
\end{equation}</p>
<p>\begin{equation}
A^- = M^- \circ W^-
\end{equation}</p>
<p>so the adjacency matrix is given by <script type="math/tex">A = A^+ + A^-</script>.</p>
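<p>The sampling of the adjacency matrix can be sketched in NumPy as follows (hypothetical code; the names <code>M_plus</code>, <code>M_minus</code> are mine):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
N, q = 8, 0.7
v = rng.choice([-1, 1], size=N)
W = np.outer(v, v)

# characteristic functions 1_[0,q) and 1_(1-q,1] applied to uniform samples:
M_plus = (rng.uniform(size=(N, N)) < q).astype(float)
M_minus = (rng.uniform(size=(N, N)) < 1 - q).astype(float)

# Hadamard products: same-color edges survive with prob. q, opposite with 1-q
A = M_plus * np.where(W > 0, W, 0) + M_minus * np.where(W < 0, W, 0)
print(np.unique(A))  # entries are among -1, 0 and +1
```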
<p>Now, in order to simulate changes of belief we simply use a majority vote:</p>
<p>\begin{equation}
p(v_i^{n+1}=v_i^n) = \frac{\bar{N_i}}{N_i}
\end{equation}</p>
<p>\begin{equation}
p(v_i^{n+1}=-1 \cdot v_i^n) = 1- \frac{\bar{N_i}}{N_i}
\end{equation}</p>
<p>\begin{equation}
\bar{N_i} = \lvert \{ j \neq i : A_{ij} > 0 \} \rvert
\end{equation}</p>
<p>\begin{equation}
N_i = \bar{N_i} + \lvert \{ j : A_{ij} < 0 \} \rvert
\end{equation}</p>
<p>where <script type="math/tex">\bar{N_i}</script> denotes the number of connections between <script type="math/tex">v_i</script> and nodes sharing the same belief, without
counting a connection of <script type="math/tex">v_i</script> to itself.</p>
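<p>On a small hand-built adjacency matrix the vote probabilities are easy to verify (a hypothetical sketch, where the diagonal encodes each node’s connection to itself):</p>

```python
import numpy as np

# toy adjacency matrix: row i lists node i's connections (+1 same color,
# -1 opposite color, 0 no edge); the diagonal is the self-connection.
A = np.array([
    [ 1,  1,  1, -1],
    [ 1,  1,  0, -1],
    [ 1,  0,  1,  0],
    [-1, -1,  0,  1],
], dtype=float)

same = (A > 0).sum(axis=1) - 1      # N-bar_i: same-color neighbours, self excluded
total = same + (A < 0).sum(axis=1)  # N_i: all neighbours of node i
p_keep = same / total               # p(v_i keeps its color) = N-bar_i / N_i
print(p_keep)  # node 0 keeps its color with probability 2/3
```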
<h2 id="simulation">Simulation:</h2>
<p>Putting everything together with Julia we may create the following mimesis function, which takes as input the number of nodes in
the random graph <script type="math/tex">n</script>, the number of iterates of the system <script type="math/tex">N</script>, the fraction of nodes that are red <script type="math/tex">p</script>, and finally the probability <script type="math/tex">q</script> that nodes of the same color are connected:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="k">function</span><span class="nf"> mimesis</span><span class="x">(</span><span class="n">n</span><span class="o">::</span><span class="kt">Int64</span><span class="x">,</span><span class="n">N</span><span class="o">::</span><span class="kt">Int64</span><span class="x">,</span><span class="n">p</span><span class="o">::</span><span class="kt">Float64</span><span class="x">,</span><span class="n">q</span><span class="o">::</span><span class="kt">Float64</span><span class="x">)</span>
<span class="sb">``</span><span class="err">`</span>
<span class="n">n</span><span class="x">:</span> <span class="n">number</span> <span class="n">of</span> <span class="n">nodes</span> <span class="k">in</span> <span class="n">the</span> <span class="n">random</span> <span class="n">graph</span>
<span class="n">N</span><span class="x">:</span> <span class="n">number</span> <span class="n">of</span> <span class="n">iterations</span> <span class="n">of</span> <span class="n">the</span> <span class="n">random</span> <span class="n">graph</span> <span class="n">coloring</span> <span class="n">process</span>
<span class="n">p</span><span class="x">:</span> <span class="n">the</span> <span class="n">fraction</span> <span class="n">of</span> <span class="n">nodes</span> <span class="n">that</span> <span class="n">are</span> <span class="n">red</span>
<span class="n">q</span><span class="x">:</span> <span class="n">the</span> <span class="n">probability</span> <span class="n">that</span> <span class="n">nodes</span> <span class="n">of</span> <span class="n">the</span> <span class="n">same</span> <span class="n">color</span> <span class="n">form</span> <span class="n">connections</span>
<span class="sb">``</span><span class="err">`</span>
<span class="c">## initialisation:</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">zeros</span><span class="x">(</span><span class="n">N</span><span class="x">,</span><span class="n">n</span><span class="x">)</span>
<span class="n">color_ratio</span> <span class="o">=</span> <span class="n">zeros</span><span class="x">(</span><span class="n">N</span><span class="x">)</span>
<span class="c">## generate positive terms:</span>
<span class="n">v1</span> <span class="o">=</span> <span class="x">(</span><span class="n">rand</span><span class="x">(</span><span class="n">n</span><span class="x">)</span> <span class="o">.<</span> <span class="n">p</span><span class="x">)</span><span class="o">*</span><span class="mf">1.0</span>
<span class="c">## generate negative terms:</span>
<span class="n">v2</span> <span class="o">=</span> <span class="n">ones</span><span class="x">(</span><span class="n">n</span><span class="x">)</span> <span class="o">-</span> <span class="n">v1</span> <span class="c">## negative terms</span>
<span class="n">v</span><span class="x">[</span><span class="mi">1</span><span class="x">,:]</span> <span class="o">=</span> <span class="n">v1</span> <span class="o">-</span> <span class="n">v2</span>
<span class="n">color_ratio</span><span class="x">[</span><span class="mi">1</span><span class="x">]</span> <span class="o">=</span> <span class="n">sum</span><span class="x">((</span><span class="n">v</span><span class="x">[</span><span class="mi">1</span><span class="x">,:]</span> <span class="o">.></span> <span class="mi">0</span><span class="x">)</span><span class="o">*</span><span class="mf">1.0</span><span class="x">)</span><span class="o">/</span><span class="n">n</span>
<span class="k">for</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="x">:</span><span class="n">N</span><span class="o">-</span><span class="mi">1</span>
<span class="n">W</span> <span class="o">=</span> <span class="n">zeros</span><span class="x">(</span><span class="n">n</span><span class="x">,</span><span class="n">n</span><span class="x">)</span>
<span class="k">for</span> <span class="n">j</span> <span class="o">=</span><span class="mi">1</span><span class="x">:</span><span class="n">n</span>
<span class="k">for</span> <span class="n">k</span> <span class="o">=</span><span class="mi">1</span><span class="x">:</span><span class="n">n</span>
<span class="c">## ignore self-connections</span>
<span class="n">W</span><span class="x">[</span><span class="n">j</span><span class="x">,</span><span class="n">k</span><span class="x">]</span> <span class="o">=</span> <span class="n">v</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="n">j</span><span class="x">]</span><span class="o">*</span><span class="n">v</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="n">k</span><span class="x">]</span><span class="o">*</span><span class="x">(</span><span class="n">j</span> <span class="o">!=</span> <span class="n">k</span><span class="x">)</span><span class="o">*</span><span class="mf">1.0</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">if</span> <span class="n">abs</span><span class="x">(</span><span class="n">sum</span><span class="x">(</span><span class="n">v</span><span class="x">[</span><span class="n">i</span><span class="x">,:]))</span> <span class="o"><</span> <span class="n">n</span>
<span class="c">## split W into positive and negative parts...</span>
<span class="n">Z1</span> <span class="o">=</span> <span class="x">(</span><span class="n">W</span> <span class="o">.<</span> <span class="mi">0</span><span class="x">)</span><span class="o">*</span><span class="mf">1.0</span> <span class="c">## vertices with different colors</span>
<span class="n">Z2</span> <span class="o">=</span> <span class="n">W</span> <span class="o">+</span> <span class="n">Z1</span> <span class="c">## vertices with the same color</span>
<span class="c">## sample random zero-one matrices for new connections:</span>
<span class="n">M1</span> <span class="o">=</span> <span class="x">(</span><span class="n">rand</span><span class="x">(</span><span class="n">n</span><span class="x">,</span><span class="n">n</span><span class="x">)</span> <span class="o">.<</span> <span class="n">q</span><span class="x">)</span><span class="o">*</span><span class="mf">1.0</span>
<span class="n">M2</span> <span class="o">=</span> <span class="x">(</span><span class="n">rand</span><span class="x">(</span><span class="n">n</span><span class="x">,</span><span class="n">n</span><span class="x">)</span> <span class="o">.<</span> <span class="mi">1</span><span class="o">-</span><span class="n">q</span><span class="x">)</span><span class="o">*</span><span class="mf">1.0</span>
<span class="c">## use Hadamard products to construct new adjacency matrices:</span>
<span class="n">A1</span> <span class="o">=</span> <span class="n">M1</span> <span class="o">.*</span> <span class="n">Z2</span>
<span class="n">A2</span> <span class="o">=</span> <span class="n">M2</span> <span class="o">.*</span> <span class="n">Z1</span>
<span class="n">A</span> <span class="o">=</span> <span class="n">A1</span> <span class="o">-</span> <span class="n">A2</span>
<span class="c">## generate probabilities for color transformations:</span>
<span class="n">M_hat</span> <span class="o">=</span> <span class="x">[</span><span class="n">sum</span><span class="x">(</span><span class="n">A</span><span class="x">[</span><span class="n">i</span><span class="x">,:]</span> <span class="o">.></span> <span class="mi">0</span><span class="x">)</span> <span class="k">for</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="x">:</span><span class="n">n</span><span class="x">]</span>
<span class="n">M</span> <span class="o">=</span> <span class="n">M_hat</span> <span class="o">+</span> <span class="x">[</span><span class="n">sum</span><span class="x">(</span><span class="n">A</span><span class="x">[</span><span class="n">i</span><span class="x">,:]</span> <span class="o">.<</span> <span class="mi">0</span><span class="x">)</span> <span class="k">for</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="x">:</span><span class="n">n</span><span class="x">]</span>
<span class="n">P</span> <span class="o">=</span> <span class="x">[</span><span class="n">M_hat</span><span class="x">[</span><span class="n">i</span><span class="x">]</span><span class="o">/</span><span class="x">(</span><span class="n">M</span><span class="x">[</span><span class="n">i</span><span class="x">]</span><span class="o">+</span><span class="mf">1.0</span><span class="x">)</span> <span class="k">for</span> <span class="n">i</span> <span class="o">=</span><span class="mi">1</span><span class="x">:</span><span class="n">n</span><span class="x">]</span>
<span class="c">## zero-one vectors based on P:</span>
<span class="n">p_q</span> <span class="o">=</span> <span class="x">(</span><span class="n">rand</span><span class="x">(</span><span class="n">n</span><span class="x">)</span> <span class="o">.<</span> <span class="n">P</span><span class="x">)</span><span class="o">*</span><span class="mf">1.0</span>
<span class="n">Q1</span> <span class="o">=</span> <span class="n">p_q</span> <span class="o">.*</span> <span class="n">v</span><span class="x">[</span><span class="n">i</span><span class="x">,:]</span>
<span class="n">Q2</span> <span class="o">=</span> <span class="o">-</span><span class="mf">1.0</span><span class="o">*</span><span class="x">(</span><span class="n">ones</span><span class="x">(</span><span class="n">n</span><span class="x">)</span> <span class="o">.-</span> <span class="n">p_q</span><span class="x">)</span> <span class="o">.*</span> <span class="n">v</span><span class="x">[</span><span class="n">i</span><span class="x">,:]</span>
<span class="n">v</span><span class="x">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="x">,:]</span> <span class="o">=</span> <span class="n">Q1</span> <span class="o">+</span> <span class="n">Q2</span>
<span class="n">color_ratio</span><span class="x">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="x">]</span> <span class="o">=</span> <span class="n">sum</span><span class="x">((</span><span class="n">v</span><span class="x">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="x">,:]</span> <span class="o">.></span> <span class="mi">0</span><span class="x">)</span><span class="o">*</span><span class="mf">1.0</span><span class="x">)</span><span class="o">/</span><span class="n">n</span>
<span class="k">else</span>
<span class="n">v</span><span class="x">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="x">,:]</span> <span class="o">=</span> <span class="n">v</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="mi">1</span><span class="x">]</span><span class="o">*</span><span class="n">ones</span><span class="x">(</span><span class="n">n</span><span class="x">)</span>
<span class="n">color_ratio</span><span class="x">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="x">]</span> <span class="o">=</span> <span class="n">v</span><span class="x">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="x">,</span><span class="mi">1</span><span class="x">]</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">return</span> <span class="n">color_ratio</span>
<span class="k">end</span>
</code></pre></div></div>
<p>For example, the above function may be called as follows:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">c_ratio</span> <span class="o">=</span> <span class="n">mimesis</span><span class="x">(</span><span class="mi">10</span><span class="x">,</span> <span class="mi">100</span><span class="x">,</span> <span class="mf">0.1</span><span class="x">,</span><span class="mf">0.6</span><span class="x">);</span>
</code></pre></div></div>
<h2 id="analysis">Analysis:</h2>
<h3 id="the-expected-number-of-neighbors">The expected number of neighbors:</h3>
<p>If we denote the number of red vertices at instant <script type="math/tex">n</script> by <script type="math/tex">\alpha_n</script> and the number of blue vertices by <script type="math/tex">\beta_n</script> we may observe that the expected number of
neighbors is given by:</p>
<p>\begin{equation}
\langle N(v_i \in R) \rangle = q \cdot (\alpha_n -1) + (1-q) \cdot \beta_n
\end{equation}</p>
<p>\begin{equation}
\langle N(v_i \in B) \rangle = q \cdot (\beta_n -1) + (1-q) \cdot \alpha_n
\end{equation}</p>
<p>Using the above equations we may define:</p>
<p>\begin{equation}
\langle \alpha_{n+1} \rangle = \alpha_n \left( \frac{q \cdot (\alpha_n -1)}{q \cdot (\alpha_n -1) + (1-q) \cdot \beta_n} \right) + \beta_n \left(\frac{(1-q) \cdot \alpha_n}{q \cdot (\beta_n -1) + (1-q) \cdot \alpha_n} \right)
\end{equation}</p>
<p>and we may deduce that <script type="math/tex">\langle \beta_{n+1} \rangle = N - \langle \alpha_{n+1} \rangle</script>. It must be noted that this is an approximation
that works particularly well when <script type="math/tex">% <![CDATA[
0< q < 0.2 %]]></script> or when <script type="math/tex">% <![CDATA[
0.8 \leq q < 1.0 %]]></script>.</p>
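<p>The mean-field update above is easy to iterate directly. The following is a small illustrative sketch (in Python rather than the Julia used elsewhere in this post; the function name and parameter choices are mine):</p>

```python
def mean_field_alpha(alpha0, N, q, steps):
    """Iterate the mean-field update for the expected number of red vertices.

    alpha0: initial red count, N: total number of vertices,
    q: probability of an edge between same-coloured vertices.
    """
    alpha = float(alpha0)
    for _ in range(steps):
        beta = N - alpha
        # <alpha_{n+1}>: each vertex keeps its colour in proportion to the
        # expected number of same-coloured neighbours
        alpha = (alpha * q * (alpha - 1) / (q * (alpha - 1) + (1 - q) * beta)
                 + beta * (1 - q) * alpha / (q * (beta - 1) + (1 - q) * alpha))
    return alpha

print(mean_field_alpha(60, 100, 0.8, 200))
```

<p>Starting from a red majority with <script type="math/tex">q > 1/2</script>, the iteration drifts towards the all-red absorbing state, consistent with the analysis below.</p>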
<h3 id="alpha_n--beta_n-implies-that-limlimits_n-to-infty-langle-alpha_n-rangle--n"><script type="math/tex">\alpha_n > \beta_n</script> implies that <script type="math/tex">\lim\limits_{n \to \infty} \langle \alpha_n \rangle = N</script>:</h3>
<p>Assuming that <script type="math/tex">q > 1-q</script>, a simple calculation shows that:</p>
<p>\begin{equation}
\langle \alpha_{n+1} \rangle - \alpha_n \geq 0 \iff \alpha_n \geq \beta_n
\end{equation}</p>
<p>and since:</p>
<p>\begin{equation}
\langle \alpha_{n+1} \rangle - \alpha_n= 0 \iff \beta_n = 0
\end{equation}</p>
<p>we may deduce that:</p>
<p>\begin{equation}
\lim\limits_{n \to \infty} \langle \alpha_{n} \rangle = N
\end{equation}</p>
<h3 id="analysis-of-delta-alpha">Analysis of <script type="math/tex">\Delta \alpha</script>:</h3>
<p>Using the fact that <script type="math/tex">\beta_n = N - \alpha_n</script> we may derive the following continuous-space variant of <script type="math/tex">\Delta \alpha_n = \langle \alpha_{n+1} \rangle - \alpha_n</script>:</p>
<p>\begin{equation}
\Delta \alpha(\alpha,q) = \frac{q \cdot (\alpha^2 -\alpha)}{2q \cdot \alpha + N \cdot (1-q) - \alpha - q} + \frac{(1-q) \cdot \alpha \cdot (N-\alpha)}{q \cdot N -2 \cdot q \cdot \alpha + \alpha - q} - \alpha
\end{equation}</p>
<p>Assuming that <script type="math/tex">N</script> is held constant (e.g. <script type="math/tex">N=100</script>) I obtained the following graph for various values of <script type="math/tex">q</script>:</p>
<center><img src="https://raw.githubusercontent.com/Kepler-Lounge/blog_images/master/_images/evolution.png" width="75%" height="75%" align="middle" /></center>
<p>It’s interesting to note that the curves are not symmetric under <script type="math/tex">q \mapsto 1-q</script>, which is somewhat surprising. Specifically, the behaviour of <script type="math/tex">\Delta \alpha</script>
when <script type="math/tex">q=0.2</script> doesn’t mirror the behaviour of <script type="math/tex">\Delta \alpha</script> when <script type="math/tex">q=0.8</script>.</p>
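<p>This asymmetry is easy to check numerically. Below is an illustrative Python sketch of <script type="math/tex">\Delta \alpha</script> (I subtract <script type="math/tex">\alpha</script> explicitly so that the function returns the increment <script type="math/tex">\langle \alpha_{n+1} \rangle - \alpha</script>; the names are mine):</p>

```python
def delta_alpha(alpha, q, N=100):
    """Mean-field increment <alpha_{n+1}> - alpha as a function of alpha and q."""
    t1 = q * (alpha ** 2 - alpha) / (2 * q * alpha + N * (1 - q) - alpha - q)
    t2 = (1 - q) * alpha * (N - alpha) / (q * N - 2 * q * alpha + alpha - q)
    return t1 + t2 - alpha

# same alpha, mirrored q: the increments differ in both sign and magnitude
print(delta_alpha(60, 0.8), delta_alpha(60, 0.2))
```

<p>For <script type="math/tex">\alpha = 60</script> the increment is roughly <script type="math/tex">+2.4</script> at <script type="math/tex">q=0.8</script> but roughly <script type="math/tex">-9.4</script> at <script type="math/tex">q=0.2</script>, so the two regimes are clearly not mirror images.</p>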
<h2 id="generalising-to-the-case-of-more-than-two-beliefs">Generalising to the case of more than two beliefs:</h2>
<p>After going through less satisfactory options, it occurred to me that in order to label <script type="math/tex">n</script> colors we may use the nth roots of unity:</p>
<p>\begin{equation}
S_n = \{e^{i \frac{2 k \pi}{n}} : k \in [0,n-1] \}
\end{equation}</p>
<p>In fact, we may note that <script type="math/tex">-1</script> and <script type="math/tex">+1</script> correspond to <script type="math/tex">S_2</script> and similarly we may define:</p>
<p>\begin{equation}
w_{ij}= v_i \cdot \bar{v_j}
\end{equation}</p>
<p>where <script type="math/tex">\bar{v_j}</script> indicates complex conjugation and <script type="math/tex">w_{ij}=1</script> implies that <script type="math/tex">v_i</script> and <script type="math/tex">v_j</script> have the same color.</p>
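<p>This colour test is straightforward to verify numerically; the following Python sketch (the helper names are mine) encodes colours as roots of unity and checks <script type="math/tex">w_{ij}</script>:</p>

```python
import cmath

def color(k, n):
    """The k-th of the n-th roots of unity, used as a colour label."""
    return cmath.exp(2j * cmath.pi * k / n)

def same_color(vi, vj, tol=1e-9):
    # w_ij = v_i * conj(v_j); it equals 1 precisely when the colours agree
    return abs(vi * vj.conjugate() - 1) < tol

print(same_color(color(2, 5), color(2, 5)), same_color(color(2, 5), color(3, 5)))
```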
<p>I believe these last two equations, in addition to everything else I shared here, are a sufficiently strong basis to model mimetic behavior in networks where there are a lot more than two competing belief systems.</p>
<h3 id="note">Note:</h3>
<p>The random graph colouring representation used here was developed independently of the random graph colouring conventions established by other mathematicians [4,5]. However, I must credit the original authors for the idea of random graph colouring, as I probably wouldn’t have thought of using it to model mimetic behaviour if
combinatorialists hadn’t discussed the idea with me in the past. For an online discussion of the differences between my representation and the conventional ones, I refer the reader to <a href="https://mathoverflow.net/questions/330375/comparison-of-random-graph-colouring-representations">the following MathOverflow question</a>.</p>
<h2 id="references">References:</h2>
<ol>
<li>Chris Lynn & Dani Bassett. The physics of brain network structure, function and control. 2019.</li>
<li>Réka Albert & Albert-László Barabási. Statistical mechanics of complex networks. 2002.</li>
<li>René Girard. Le Bouc émissaire. 1986.</li>
<li>P. Erdős and A. Rényi. On the evolution of random graphs. 1960.</li>
<li>G. Grimmett and C. McDiarmid. On colouring random graphs. 1975.</li>
</ol>Aidan RockeMotivation:Probability in High Dimension Part II2019-04-21T00:00:00+00:002019-04-21T00:00:00+00:00/probability/2019/04/21/high-dimension-prob-2<h2 id="motivation">Motivation:</h2>
<p>A couple weeks ago I was working on a problem that involved the expected value of a ratio of two random variables:</p>
<p>\begin{equation}
\mathbb{E}\big[\frac{X_n}{Z_n}\big] \approx \frac{\mu_{X_n}}{\mu_{Z_n}} - \frac{\mathrm{Cov}(X_n,Z_n)}{\mu_{Z_n}^2} + \frac{\mathrm{Var(Z_n)}\mu_{X_n}}{\mu_{Z_n}^3}
\end{equation}</p>
<p>where <script type="math/tex">Z_n</script> was defined as a sum of <script type="math/tex">n</script> i.i.d. random variables with a symmetric distribution centred at zero.</p>
<p>Everything about this approximation worked fine in computer simulations where <script type="math/tex">n</script> was large but mathematically there appeared to be a problem since:</p>
<p>\begin{equation}
\mathbb{E}\big[Z_n\big] = 0
\end{equation}</p>
<p>Given that (2) didn’t appear to be an issue in simulation, I went through the code several times to check whether there was an error
but found none. After thinking about the problem for a bit longer, it occurred to me to formalise the problem and analyse:</p>
<p>\begin{equation}
P(\sum_{n=1}^N a_n = 0)
\end{equation}</p>
<p>where <script type="math/tex">a_n</script> are i.i.d. random variables with a uniform distribution centred at zero so <script type="math/tex">\mathbb{E}[a_i]=0</script>. We may think of this as a measure-theoretic phenomenon in high-dimensional spaces where <script type="math/tex">N \in \mathbb{N}</script> is our dimension and <script type="math/tex">\vec{a} \in \mathbb{R}^N</script> is a random vector.</p>
<p>Now, while in a <a href="https://keplerlounge.com/probability/2019/04/20/high-dimension-prob-1.html">previous article</a> I analysed this problem as an infinite series for the special case of <script type="math/tex">a_i \sim \mathcal{U}(\{-1,1\})</script>, for the more general case of
<script type="math/tex">a_i \sim \mathcal{U}([-N,N])</script> where <script type="math/tex">[-N,N] \subset \mathbb{Z}</script> it occurred to me that modelling this problem as a random walk on <script type="math/tex">\mathbb{Z}</script> might be an effective
approach.</p>
<h2 id="a-random-walk-on-mathbbz">A random walk on <script type="math/tex">\mathbb{Z}</script>:</h2>
<p>Let’s suppose <script type="math/tex">a_i \sim \mathcal{U}([-N,N])</script> where <script type="math/tex">[-N,N] \subset \mathbb{Z}</script>. We may then define:</p>
<p>\begin{equation}
S_n = \sum_{i=1}^n a_i
\end{equation}</p>
<p>Due to the i.i.d. assumption we have:</p>
<p>\begin{equation}
\mathbb{E}\big[S_n\big]= n \cdot \mathbb{E}\big[a_i\big]=0
\end{equation}</p>
<p>We may now define:</p>
<p>\begin{equation}
u_n = P(S_n=0)
\end{equation}</p>
<p>and ask whether <script type="math/tex">u_n</script> is decreasing. In other words, what is the probability that we observe the expected value as <script type="math/tex">n</script> becomes large?</p>
<h2 id="small-and-large-deviations">Small and Large deviations:</h2>
<p>It’s useful to observe the following nested structure:</p>
<p>\begin{equation}
\forall k \in [0,N], \{\lvert S_n \rvert \leq k\} \subset \{\lvert S_n \rvert \leq k+1 \}
\end{equation}</p>
<p>From (7), we may deduce that:</p>
<p>\begin{equation}
P(\lvert S_n \rvert \leq N) + P(\lvert S_n \rvert > N) = 1
\end{equation}</p>
<p>So we are now ready to define the probability of a ‘small’ deviation:</p>
<p>\begin{equation}
\alpha_n = P(\lvert S_n \rvert \leq N)
\end{equation}</p>
<p>as well as the probability of ‘large’ deviations:</p>
<p>\begin{equation}
\beta_n = P(\lvert S_n \rvert > N)
\end{equation}</p>
<p>Additional motivation for analysing <script type="math/tex">\alpha_n</script> and <script type="math/tex">\beta_n</script> arises from:</p>
<p>\begin{equation}
P(S_{n+1}=0 \mid \lvert S_n \rvert > N) = 0
\end{equation}</p>
<p>\begin{equation}
P(S_{n+1}=0 \mid \lvert S_n \rvert \leq N) = \frac{1}{2N+1}
\end{equation}</p>
<p>Furthermore, by the law of total probability we have:</p>
<p>\begin{equation}
\begin{split}
P(S_{n+1}=0) & = P(S_{n+1}=0 \mid \lvert S_n \rvert \leq N) \cdot P(\lvert S_n \rvert \leq N) + P(S_{n+1}=0 \mid \lvert S_n \rvert > N) \cdot P(\lvert S_n \rvert > N) \\
& = P(S_{n+1}=0 \mid \lvert S_n \rvert \leq N) \cdot P(\lvert S_n \rvert \leq N) \\
& = \frac{P(\lvert S_n \rvert \leq N)}{2N+1}
\end{split}
\end{equation}</p>
<h2 id="a-remark-on-symmetry">A remark on symmetry:</h2>
<p>It’s useful to note the following alternative definitions of <script type="math/tex">\alpha_n</script> and <script type="math/tex">\beta_n</script> that emerge
due to symmetries intrinsic to the problem:</p>
<p>\begin{equation}
\beta_n = P(\lvert S_n \rvert > N) = 2 \cdot P(S_n > N) = 2 \cdot P(S_n < -N)
\end{equation}</p>
<p>\begin{equation}
\alpha_n = P(\lvert S_n \rvert \leq N) = 1-2 \cdot P(S_n > N)=1-2 \cdot P(S_n < -N)
\end{equation}</p>
<h2 id="the-case-of-n1-and-n2">The case of <script type="math/tex">n=1</script> and <script type="math/tex">n=2</script>:</h2>
<p>Given that <script type="math/tex">S_0=0</script>:</p>
<p>\begin{equation}
P(S_1=0)=\frac{P(\lvert S_0 \rvert \leq N)}{2N+1}= \frac{1}{2N+1}
\end{equation}</p>
<p>As for the case of <script type="math/tex">n=2</script>:</p>
<p>\begin{equation}
P(\lvert S_1 \rvert \leq N) =1 \implies P(S_2=0) = \frac{1}{2N+1}
\end{equation}</p>
<h2 id="the-case-of-n3">The case of <script type="math/tex">n=3</script>:</h2>
<p>The case of <script type="math/tex">n=3</script> requires that we calculate:</p>
<p>\begin{equation}
P(S_3=0)=\frac{P(\lvert S_2 \rvert \leq N)}{2N+1}
\end{equation}</p>
<p>\begin{equation}
\begin{split}
P(S_{2} > N) & = \sum_{i=1}^N P(S_{2} > N \mid S_1 = i) \cdot P( S_1 = i) \\
& = \frac{1}{2N+1} \sum_{i=1}^N \frac{i}{2N+1} \\
& = \frac{N \cdot (N+1)}{2 \cdot (2N+1)^2}
\end{split}
\end{equation}</p>
<p>since, conditional on <script type="math/tex">S_1 = i</script> with <script type="math/tex">1 \leq i \leq N</script>, exactly <script type="math/tex">i</script> of the <script type="math/tex">2N+1</script> equally likely steps carry the walk above <script type="math/tex">N</script>. Using (19) together with (15) we may derive <script type="math/tex">P(\lvert S_{2} \rvert \leq N)</script>:</p>
<p>\begin{equation}
\begin{split}
P(\lvert S_{2} \rvert \leq N) & = 1 - 2 \cdot P(S_{2} > N) \\
& = 1- \frac{N \cdot (N+1)}{(2N+1)^2} \\
& = \frac{3N^2+3N+1}{(2N+1)^2} \sim \frac{3}{4}
\end{split}
\end{equation}</p>
<p>and so for <script type="math/tex">n=3</script> we have:</p>
<p>\begin{equation}
\begin{split}
P(S_{3} = 0) & = P(S_{3} = 0 | \lvert S_2 \rvert \leq N) \cdot P(\lvert S_2 \rvert \leq N) \\
& = \frac{3N^2+3N+1}{(2N+1)^3} \sim \frac{3}{8N}
\end{split}
\end{equation}</p>
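<p>These small-<script type="math/tex">n</script> probabilities can be cross-checked by computing the exact distribution of <script type="math/tex">S_n</script> via repeated convolution. The following Python sketch (the helper is mine) uses exact rationals, and yields <script type="math/tex">P(S_3=0) = \frac{3N^2+3N+1}{(2N+1)^3}</script>, the form I obtain when redoing the count by hand:</p>

```python
from fractions import Fraction

def walk_pmf(N, n):
    """Exact pmf of S_n = a_1 + ... + a_n with a_i uniform on {-N, ..., N}."""
    step = {k: Fraction(1, 2 * N + 1) for k in range(-N, N + 1)}
    pmf = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for s, p in pmf.items():
            for k, q in step.items():
                nxt[s + k] = nxt.get(s + k, Fraction(0)) + p * q
        pmf = nxt
    return pmf

N = 3
print([walk_pmf(N, n)[0] for n in (1, 2, 3)])
```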
<h2 id="average-drift-or-why-ps_nk--ps_n--k1">Average drift or why <script type="math/tex">P(S_n=k) > P(S_n = k+1)</script>:</h2>
<p>It’s useful to note that we may decompose <script type="math/tex">n</script> into:</p>
<p>\begin{equation}
n = \hat{n} + n_z
\end{equation}</p>
<p>where <script type="math/tex">\hat{n}</script> is the number of positive and negative terms and <script type="math/tex">n_z</script> is the number of zero terms, which contribute nothing to the sum.</p>
<p>For the above reason, it’s convenient to decompose <script type="math/tex">S_n</script> into:</p>
<p>\begin{equation}
S_n = S_n^+ + S_n^{-}
\end{equation}</p>
<p>where <script type="math/tex">S_n^+</script> defines the sum of the positive terms and <script type="math/tex">S_n^{-}</script> defines the sum
of the negative terms.</p>
<p>By grouping the terms in the manner of (23) we may observe that the magnitude of a nonzero step is uniform on <script type="math/tex">\{1, \dots, N\}</script>, so that for large <script type="math/tex">N</script> the average positive/negative step length is approximately:</p>
<p>\begin{equation}
\Delta = \frac{N}{2}
\end{equation}</p>
<p>so that if <script type="math/tex">\tau</script> positive steps and <script type="math/tex">\hat{n}-\tau</script> negative steps are taken:</p>
<p>\begin{equation}
\mathbb{E}[S_n^+] = \tau \cdot \Delta
\end{equation}</p>
<p>\begin{equation}
\mathbb{E}[S_n^-] = (\hat{n}-\tau) \cdot (-\Delta)
\end{equation}</p>
<p>\begin{equation}
\mathbb{E}[S_n] = \mathbb{E}[S_n^+] + \mathbb{E}[S_n^-] = \Delta \cdot (2\tau-\hat{n})
\end{equation}</p>
<p>and we note that:</p>
<p>\begin{equation}
\mathbb{E}[S_n] \geq 0 \implies \tau \geq \lfloor \frac{\hat{n}}{2} \rfloor
\end{equation}</p>
<p>Furthermore, due to symmetry:</p>
<p>\begin{equation}
P(\lvert S_n \rvert =k) > P(\lvert S_n \rvert =k+1) \iff P(S_n =k) > P(S_n =k+1)
\end{equation}</p>
<p>so it suffices to demonstrate <script type="math/tex">P(S_n =k) > P(S_n =k+1)</script>.</p>
<p>In order to proceed with our demonstration we choose <script type="math/tex">\tau \in [\lfloor \frac{\hat{n}}{2} \rfloor + 1,\hat{n}-1]</script> and find
that <script type="math/tex">P</script> has a monotone relationship with the binomial distribution:</p>
<p>\begin{equation}
P(S_{n} = \lfloor \Delta \cdot (2\tau-\hat{n}) \rfloor) \propto {\hat{n} \choose \tau} \frac{1}{2^{\hat{n}}}
\end{equation}</p>
<p>where <script type="math/tex">\tau \geq \lfloor \frac{\hat{n}}{2} \rfloor</script> implies that:</p>
<p>\begin{equation}
\forall k \geq 0, \frac{P(S_n=k)}{P(S_n=k+1)} \sim \frac{(\tau+1)!(\hat{n}-\tau-1)!}{\tau!(\hat{n}-\tau)!} = \frac{\tau+1}{\hat{n}-\tau} > 1
\end{equation}</p>
<p>which holds for all <script type="math/tex">n_z \leq n</script>.</p>
<p><strong>Note:</strong> I wrote a <a href="https://gist.github.com/AidanRocke/a4898097ce572bc8bc5a977fcbda6ed8">julia function that provides experimental evidence</a> for equation (30).</p>
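<p>The claim that <script type="math/tex">P(S_n=k) > P(S_n=k+1)</script> can also be checked exactly rather than experimentally; here is a small Python sketch of the same check (mine), computing the distribution of <script type="math/tex">S_n</script> by exact convolution:</p>

```python
from fractions import Fraction

def walk_pmf(N, n):
    """Exact pmf of S_n for i.i.d. steps uniform on {-N, ..., N}."""
    step = {k: Fraction(1, 2 * N + 1) for k in range(-N, N + 1)}
    pmf = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for s, p in pmf.items():
            for k, q in step.items():
                nxt[s + k] = nxt.get(s + k, Fraction(0)) + p * q
        pmf = nxt
    return pmf

# P(S_n = k) is strictly decreasing in k >= 0 (here N = 2, n = 4):
pmf = walk_pmf(2, 4)
print(all(pmf[k] > pmf[k + 1] for k in range(4 * 2)))
```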
<h2 id="proof-that-u_n-is-decreasing">Proof that <script type="math/tex">u_n</script> is decreasing:</h2>
<p>Given (13) we may derive the following ratio:</p>
<p>\begin{equation}
\frac{u_{n+1}}{u_n} = \frac{P(\lvert S_n \rvert \leq N)}{(2N+1) \cdot P(S_n = 0)}
\end{equation}</p>
<p>So in order to prove that <script type="math/tex">u_n</script> is decreasing we must show that:</p>
<p>\begin{equation}
P(\lvert S_n \rvert \leq N) < (2N+1) \cdot P(S_n=0)
\end{equation}</p>
<p>and we note that this follows immediately from (31) since:</p>
<p>\begin{equation}
P(\lvert S_n \rvert \leq N) = 2 \sum_{k=1}^N P(S_n=k) + P(S_n=0) < (2N+1) \cdot P(S_n=0)
\end{equation}</p>
<h2 id="proof-that-limlimits_n-to-infty-u_n--limlimits_n-to-infty-alpha_n--0">Proof that <script type="math/tex">\lim\limits_{n \to \infty} u_n = \lim\limits_{n \to \infty} \alpha_n = 0</script>:</h2>
<p>Now, given (34) we may define:</p>
<p>\begin{equation}
\forall N \in \mathbb{N}, q_n = \frac{P(\lvert S_n \rvert \leq N)}{(2N+1)P(S_n=0)} < 1
\end{equation}</p>
<p>We may easily show that <script type="math/tex">q_n</script> is decreasing, so that each <script type="math/tex">q_n \leq q_1</script> with <script type="math/tex">q_1</script> strictly less than one, and therefore:</p>
<p>\begin{equation}
\lim_{n \to \infty} \frac{P(S_{n+1}=0)}{P(S_1=0)} = \prod_{n=1}^\infty \frac{P(S_{n+1}=0)}{P(S_n=0)} = \prod_{n=1}^\infty q_n = 0
\end{equation}</p>
<p>so we may deduce that <script type="math/tex">u_n</script> decreases exponentially fast and that:</p>
<p>\begin{equation}
\lim_{n \to \infty} u_n = \lim_{n \to \infty} P(S_{n+1}=0) = \frac{0}{2N+1}=0
\end{equation}</p>
<p>Likewise, given that:</p>
<p>\begin{equation}
\alpha_n = P(\lvert S_n \rvert \leq N) = (2N+1) \cdot P(S_{n+1}=0)
\end{equation}</p>
<p>we may conclude that the probability of a small deviation vanishes as <script type="math/tex">n</script> becomes large, so that large deviations become typical:</p>
<p>\begin{equation}
\lim_{n \to \infty} \alpha_n = \lim_{n \to \infty} (2N+1) \cdot P(S_{n+1}=0) = 0
\end{equation}</p>
<p>\begin{equation}
\lim_{n \to \infty} \beta_n = \lim_{n \to \infty} P(\lvert S_n \rvert > N) = \lim_{n \to \infty} 1 - \alpha_n = 1
\end{equation}</p>
<p>One interpretation of the last two limits is that the mass of the discrete hypercube moves away from the centre and towards the corners
which is a concentration-of-measure phenomenon.</p>
<h2 id="discussion">Discussion:</h2>
<p>I find it quite surprising that random structures, in this case a random walk, are useful for analysing high-dimensional systems.
Indeed, I have to say that for such a general result thirty four equations isn’t much. But, what about the case of uniform distributions
on closed intervals of the form <script type="math/tex">[-N,N] \subset \mathbb{R}</script>?</p>
<p>It’s useful to note that <script type="math/tex">[-N,N]^n \subset \mathbb{R}^n</script> defines the hypercube with volume <script type="math/tex">(2N)^n</script> and I suspect that in the continuous
setting, hypercube geometry and convex analysis might be particularly insightful.</p>Aidan RockeMotivation:Probability in High Dimension Part I2019-04-20T00:00:00+00:002019-04-20T00:00:00+00:00/probability/2019/04/20/high-dimension-prob-1<h2 id="motivation">Motivation:</h2>
<p>A couple weeks ago I was working on a problem that involved the expected value of a ratio of two random variables:</p>
<p>\begin{equation}
\mathbb{E}\big[\frac{X_n}{Z_n}\big] \approx \frac{\mu_{X_n}}{\mu_{Z_n}} - \frac{\mathrm{Cov}(X_n,Z_n)}{\mu_{Z_n}^2} + \frac{\mathrm{Var(Z_n)}\mu_{X_n}}{\mu_{Z_n}^3}
\end{equation}</p>
<p>where <script type="math/tex">Z_n</script> was a sum of <script type="math/tex">n</script> i.i.d. random variables with a symmetric distribution centred at zero.</p>
<p>Everything about this approximation worked fine in computer simulations where <script type="math/tex">n</script> was large but mathematically there appeared to be a problem since:</p>
<p>\begin{equation}
\mathbb{E}\big[Z_n\big] = 0
\end{equation}</p>
<p>Given that (2) didn’t appear to be an issue in simulation, I went through the code several times to check whether there was an error
but found none. After thinking about the problem for a bit longer it occurred to me to formalise the problem and analyse:</p>
<p>\begin{equation}
P(\sum_{n=1}^N a_n = 0)
\end{equation}</p>
<p>where <script type="math/tex">a_n</script> are i.i.d. random variables with a uniform distribution centred at zero so <script type="math/tex">\mathbb{E}[a_i]=0</script>. My intuition suggested that under
relatively weak assumptions:</p>
<p>\begin{equation}
\lim_{N \to \infty} P(\sum_{n=1}^N a_n = 0) = 0
\end{equation}</p>
<p>We may think of this as a measure-theoretic phenomenon in high-dimensional spaces where <script type="math/tex">N \in \mathbb{N}</script> is our dimension and <script type="math/tex">\vec{a} \in \mathbb{R}^N</script> is a random vector.</p>
<h2 id="analysis-of-a-special-case">Analysis of a special case:</h2>
<p>Given that (3) is a very general problem, I decided to start by analysing the special case of <script type="math/tex">a_i \sim \mathcal{U}(\{-1,1\})</script> where:</p>
<p>\begin{equation}
\forall n \in \mathbb{N}, P(a_n=1)=P(a_n=-1)=\frac{1}{2}
\end{equation}</p>
<p>\begin{equation}
S_0 = \{ (a_n)_{n=1}^N \in \{-1,1\}^N : \sum_n a_n = 0\}
\end{equation}</p>
<p>Knowing that <script type="math/tex">S_0</script> is non-empty only if we have parity of positive and negative terms, we may deduce that:</p>
<p>\begin{equation}
S_0 \neq \emptyset \iff N \in 2\mathbb{N}
\end{equation}</p>
<p>For the above reason, I focused my analysis on the following sequence:</p>
<p>\begin{equation}
u_N = P(\sum_{n=1}^{2N} a_n = 0)= \frac{\binom{2N}{N}}{2^{2N}} = \frac{(2N)!}{2^{2N}(N!)^2}
\end{equation}</p>
<h2 id="proof-that-u_n-is-decreasing">Proof that <script type="math/tex">u_N</script> is decreasing:</h2>
<p>We can demonstrate that <script type="math/tex">u_N</script> is strictly decreasing by considering the ratio:</p>
<p>\begin{equation}
\frac{u_{N+1}}{u_N}=\frac{\frac{(2N+2)!}{2^{2N+2}((N+1)!)^2}}{\frac{(2N)!}{2^{2N}(N!)^2}}=\frac{(2N+2)(2N+1)}{4(N+1)^2}=\frac{2N+1}{2N+2} < 1
\end{equation}</p>
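<p>This ratio identity can be confirmed with exact arithmetic; a quick Python sketch (standard library only):</p>

```python
from fractions import Fraction
from math import comb

def u(N):
    """u_N = P(S_{2N} = 0) for i.i.d. +/-1 steps, as an exact rational."""
    return Fraction(comb(2 * N, N), 4 ** N)

# check u_{N+1} / u_N = (2N+1) / (2N+2) for the first few N:
print(all(u(N + 1) / u(N) == Fraction(2 * N + 1, 2 * N + 2) for N in range(1, 10)))
```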
<p>Now, with (9) we have what is necessary to show that:</p>
<p>\begin{equation}
\lim_{N \to \infty} u_N = 0
\end{equation}</p>
<h2 id="analysis-of-the-limit-limlimits_n-to-infty-u_n">Analysis of the limit <script type="math/tex">\lim\limits_{N \to \infty} u_N</script>:</h2>
<p>Using (9) we may derive a recursive definition of <script type="math/tex">u_N</script>:</p>
<p>\begin{equation}
u_{N+1}=\frac{2N+1}{2N+2} \cdot u_N
\end{equation}</p>
<p>and given that <script type="math/tex">u_0=1</script> we have:</p>
<p>\begin{equation}
u_{N}=\prod_{n=0}^{N-1} \frac{2n+1}{2n+2}= \frac{1}{2} \cdot \frac{3}{4} \cdot \frac{5}{6} \cdot …
\end{equation}</p>
<p>At this point we can make the useful observation:</p>
<p>\begin{equation}
\lim_{N \to \infty} u_N = 0 \implies \lim_{N \to \infty} - \ln u_N = \infty
\end{equation}</p>
<h2 id="proof-that-limlimits_n-to-infty-u_n0">Proof that <script type="math/tex">\lim\limits_{N \to \infty} u_N=0</script>:</h2>
<p>By combining (12) and (13) we find that:</p>
<p>\begin{equation}
-\ln u_N = -\ln \prod_{n=0}^{N-1} \frac{2n+1}{2n+2}= \sum_{n=0}^{N-1} \ln \frac{2n+2}{2n+1}= \sum_{n=0}^{N-1} \ln \big(1+\frac{1}{2n+1}\big)
\end{equation}</p>
<p>We note that when <script type="math/tex">n\in \mathbb{N}</script> is large:</p>
<p>\begin{equation}
\ln \big(1+\frac{1}{n}\big) \approx \frac{1}{n}
\end{equation}</p>
<p>Now, from (15) it follows that:</p>
<p>\begin{equation}
\sum_{n=1}^\infty \frac{1}{2n+1} = \infty \implies \sum_{n=0}^{\infty} \ln \big(1+\frac{1}{2n+1}\big) = \infty
\end{equation}</p>
<p>Combining (14) and (16) we may conclude that (10) is indeed true. In other words, when <script type="math/tex">N</script> is large we observe the expected value
with vanishing probability.</p>
<h2 id="discussion">Discussion:</h2>
<p>A natural question that follows is whether the above method may be used to handle other cases. Let’s consider <script type="math/tex">a_i \sim \mathcal{U}(\{-1,0,1\})</script> where:</p>
<p>\begin{equation}
\forall n \in \mathbb{N}, P(a_n=1)=P(a_n=0)=P(a_n=-1)=\frac{1}{3}
\end{equation}</p>
<p>so we may define:</p>
<p>\begin{equation}
u_N = P(\sum_{n=1}^{4N} a_n = 0)= \frac{(4N)!}{3^{4N}} \sum_{k=0}^{2N} \frac{1}{(k!)^2 (4N-2k)!}
\end{equation}</p>
<p>where <script type="math/tex">k</script> counts the number of <script type="math/tex">+1</script> terms (equivalently, the number of <script type="math/tex">-1</script> terms).</p>
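<p>As a sanity check on the combinatorics, a closed form of this kind can be compared against a direct dynamic-programming count. The Python sketch below is mine; the multinomial sum counts sequences with <script type="math/tex">k</script> terms equal to <script type="math/tex">+1</script> and <script type="math/tex">k</script> terms equal to <script type="math/tex">-1</script> out of <script type="math/tex">m</script> terms:</p>

```python
from fractions import Fraction
from math import factorial

def p_zero_dp(m):
    """P(a_1 + ... + a_m = 0) for a_i uniform on {-1, 0, 1}, exactly."""
    pmf = {0: Fraction(1)}
    for _ in range(m):
        nxt = {}
        for s, p in pmf.items():
            for k in (-1, 0, 1):
                nxt[s + k] = nxt.get(s + k, Fraction(0)) + p / 3
        pmf = nxt
    return pmf[0]

def p_zero_multinomial(m):
    # k = number of +1 terms = number of -1 terms; the rest are zeros
    return sum(Fraction(factorial(m), factorial(k) ** 2 * factorial(m - 2 * k) * 3 ** m)
               for k in range(m // 2 + 1))

print(all(p_zero_dp(m) == p_zero_multinomial(m) for m in range(1, 9)))
```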
<p>I actually tried to analyse the combinatorics of this sequence but quickly realised that even if I managed to show that this sequence converged to zero,
it wasn’t clear how this method would manage to handle the most general setting, the case of all integer dimensions <script type="math/tex">N \in \mathbb{N}</script>, and it didn’t
appear to be very effective in terms of the number of calculations per case.</p>
<p>In order to make progress, I decided to model this problem from a <a href="https://keplerlounge.com/probability/2019/04/21/high-dimension-prob-2.html">different perspective</a>.</p>Aidan RockeMotivation:Kinematics of a random walk on the special linear group2019-04-02T00:00:00+00:002019-04-02T00:00:00+00:00/mathematics/2019/04/02/kinematics-special-linear-group<center><img src="https://raw.githubusercontent.com/Kepler-Lounge/blog_images/master/_images/sequence_1.jpg" width="50%" height="50%" align="middle" /></center>
<center>A stochastic sequence with symmetry hiding in plain sight</center>
<h2 id="introduction">Introduction:</h2>
<p>A few days ago I decided to analyse the symmetries of the two-thirds power law [1] and this analysis naturally led to the following kinematic sequence:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{equation}\begin{bmatrix}\ddot{x}_{n+1}\\\dot{x}_{n+1}\end{bmatrix} = M_n \cdot \begin{bmatrix}\ddot{x}_{n}\\\dot{x}_{n}\end{bmatrix}= \begin{bmatrix}a & b\\c & d\end{bmatrix} \cdot \begin{bmatrix}\ddot{x}_{n}\\\dot{x}_{n}\end{bmatrix} = \begin{bmatrix}a\ddot{x}_n + b\dot{x}_n\\c\ddot{x}_n + d\dot{x}_n\end{bmatrix}\end{equation} %]]></script>
<script type="math/tex; mode=display">% <![CDATA[
\begin{equation}\begin{bmatrix}\ddot{y}_{n+1}\\\dot{y}_{n+1}\end{bmatrix} = M_n \cdot \begin{bmatrix}\ddot{y}_{n}\\\dot{y}_{n}\end{bmatrix}= \begin{bmatrix}a & b\\c & d\end{bmatrix} \cdot \begin{bmatrix}\ddot{y}_{n}\\\dot{y}_{n}\end{bmatrix} = \begin{bmatrix}a\ddot{y}_n + b\dot{y}_n\\c\ddot{y}_n + d\dot{y}_n\end{bmatrix}\end{equation} %]]></script>
<p>where <script type="math/tex">M_n \in SL(2, \mathbb{R})</script> is a volume-preserving transformation and the position is updated using:</p>
<p>\begin{equation}
x_{n+1} = x_n + \dot{x}_n\cdot \Delta t + \frac{1}{2} \ddot{x}_n \cdot \Delta t^2
\end{equation}</p>
<p>Now, in order to make sure that <script type="math/tex">ad-bc=1</script> I decided to use the trigonometric identity:</p>
<p>\begin{equation}
\cos^2(\theta) + \sin^2(\theta) = 1
\end{equation}</p>
<p>so that we only have to sample three random numbers <script type="math/tex">\alpha, \beta,\theta \in \mathbb{R}</script>:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{equation}
M_n = \begin{bmatrix}\frac{\cos(2 \pi \theta)}{\alpha} & \beta \cdot \sin(2 \pi \theta) \\ -\frac{\sin(2 \pi \theta)}{\beta}& \alpha \cdot \cos(2 \pi \theta)\end{bmatrix}
\end{equation} %]]></script>
<p>For the rest of the discussion we shall assume that <script type="math/tex">\alpha,\beta \sim (-1)^{\operatorname{Bern}(0.5)} \cdot U(0.1,10)</script> and <script type="math/tex">\theta \sim U(0,1)</script>.</p>
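<p>That this parametrisation really lands in <script type="math/tex">SL(2, \mathbb{R})</script> is a one-line check, since the determinant reduces to <script type="math/tex">\cos^2(2\pi\theta) + \sin^2(2\pi\theta) = 1</script> for any non-zero <script type="math/tex">\alpha, \beta</script>. Here is a Python sketch (the specific sample values are mine):</p>

```python
import math

def M(alpha, beta, theta):
    """Candidate SL(2, R) matrix from the trigonometric parametrisation."""
    c, s = math.cos(2 * math.pi * theta), math.sin(2 * math.pi * theta)
    return [[c / alpha, beta * s],
            [-s / beta, alpha * c]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

print(abs(det2(M(3.7, -0.4, 0.23)) - 1.0) < 1e-12)
```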
<p>Now, the key question I have is whether:</p>
<p>\begin{equation}
\mathbb{E}\big[\frac{\Delta y_n}{\Delta x_n}\big] = \text{Cst}
\end{equation}</p>
<p>i.e. whether the expected value of the rate of change is constant.</p>
<h2 id="using-a-symmetry-to-simplify-calculations">Using a symmetry to simplify calculations:</h2>
<h3 id="a-tale-of-two-branching-processes">A tale of two branching processes:</h3>
<p>The following diagram, derived from the first figure, is a particularly useful method for visualising the trajectory of our stochastic sequence:</p>
<center><img src="https://raw.githubusercontent.com/Kepler-Lounge/blog_images/master/_images/sequence_2.jpg" width="75%" height="75%" align="middle" /></center>
<center>A tale of two branching processes</center>
<p>If we use <script type="math/tex">\Sigma_{1}^n</script> and <script type="math/tex">\Sigma_{2}^n</script> to denote random variables associated
with the first and second kinds of branching processes, we may simplify (1) so we have:</p>
<p>\begin{equation}
\ddot{x_n} = \ddot{x_0} \cdot c_0 \cdot \Sigma_{2}^n + \dot{x_0} \cdot d_0 \cdot \Sigma_{2}^n = q_1 \Sigma_{2}^n
\end{equation}</p>
<p>\begin{equation}
\dot{x_n} = \ddot{x_0} \cdot a_0 \cdot \Sigma_{1}^n + \dot{x_0} \cdot b_0 \cdot \Sigma_{1}^n = q_2 \Sigma_{1}^n
\end{equation}</p>
<p>Similarly, we find that for <script type="math/tex">\ddot{y}_n</script> and <script type="math/tex">\dot{y}_n</script> we have:</p>
<p>\begin{equation}
\ddot{y_n} = \ddot{y_0} \cdot c_0 \cdot \Sigma_{2}^n + \dot{y_0} \cdot d_0 \cdot \Sigma_{2}^n = q_3 \Sigma_{2}^n
\end{equation}</p>
<p>\begin{equation}
\dot{y_n} = \ddot{y_0} \cdot a_0 \cdot \Sigma_{1}^n + \dot{y_0} \cdot b_0 \cdot \Sigma_{1}^n = q_4 \Sigma_{1}^n
\end{equation}</p>
<h3 id="analysis-of-the-rate-of-change">Analysis of the rate of change:</h3>
<p>Given equation (3) we may deduce that:</p>
<p>\begin{equation}
\frac{\Delta y_n}{\Delta x_n} = \frac{y_{n+1}-y_n}{x_{n+1}-x_n} = \frac{\dot{y_n} \Delta t + \frac{1}{2} \ddot{y_n} \Delta t^2}{\dot{x_n} \Delta t + \frac{1}{2} \ddot{x_n} \Delta t^2} = \frac{\dot{y_n} + h\ddot{y_n}}{\dot{x_n} + h \ddot{x_n}}
\end{equation}</p>
<p>where <script type="math/tex">h = \frac{\Delta t}{2}</script>.</p>
<p>Now, using equations (7), (8), (9) and (10) we find that:</p>
<p>\begin{equation}
\frac{\Delta y_n}{\Delta x_n} = \frac{\dot{y_n} + h\ddot{y_n}}{\dot{x_n} + h \ddot{x_n}} = \frac{q_4 \Sigma_{1}^n + h \cdot q_3 \Sigma_{2}^n}{q_2 \Sigma_{1}^n + h \cdot q_1 \Sigma_{2}^n}
\end{equation}</p>
<h2 id="an-experimental-observation">An experimental observation:</h2>
<h3 id="expected-values-of-sigma_1n-and-sigma_2n">Expected values of <script type="math/tex">\Sigma_{1}^n</script> and <script type="math/tex">\Sigma_{2}^n</script>:</h3>
<p>It’s useful to note that, given that the matrices <script type="math/tex">M_n</script> are independent and:</p>
<p>\begin{equation}
\forall n \in \mathbb{N}, \mathbb{E}[M_n] = 0
\end{equation}</p>
<p>we may deduce that:</p>
<p>\begin{equation}
\mathbb{E}[\Sigma_{1}^n] =\mathbb{E}[\Sigma_{2}^n]= 0
\end{equation}</p>
<h3 id="numerical-experiments-with-fracdelta-y_ndelta-x_n">Numerical experiments with <script type="math/tex">\frac{\Delta y_n}{\Delta x_n}</script>:</h3>
<p>My intuition told me from the beginning that (12) might be useful for analysing the expected value of <script type="math/tex">\frac{\Delta y_n}{\Delta x_n}</script>. In fact,
numerical experiments suggest:</p>
<p>\begin{equation}
\frac{\Delta y_n}{\Delta x_n} \approx \frac{q_4}{q_2}
\end{equation}</p>
<p>To be precise, <a href="https://gist.github.com/AidanRocke/33c1d5268d8f8c3b395cc81ba6397f47">numerical experiments</a> show that the sign of <script type="math/tex">\frac{q_4}{q_2}</script> agrees with the sign of <script type="math/tex">\frac{\Delta y_n}{\Delta x_n}</script> 100% of the time, and that more than 70% of the time the two quantities agree to within a factor of 1.5.</p>
<h2 id="analysis">Analysis:</h2>
<p>If we take the limit as <script type="math/tex">h \rightarrow 0</script>:</p>
<p>\begin{equation}
\lim_{h \to 0} \frac{\Delta y_n}{\Delta x_n} = \lim_{h \to 0} \frac{q_4 \Sigma_1^n+h \cdot q_3 \cdot \Sigma_2^n}{q_2 \Sigma_1^n+h \cdot q_1 \cdot \Sigma_2^n} = \frac{q_4}{q_2}
\end{equation}</p>
<p>so it appears that what I observed numerically depends on <script type="math/tex">h</script>, and it’s still not clear to me how to calculate <script type="math/tex">\mathbb{E}\big[\frac{\Delta y_n}{\Delta x_n}\big]</script> directly, which was my original question.</p>
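<p>Since this is just a statement about the displayed ratio, the limit can be sanity-checked numerically by plugging arbitrary nonzero stand-ins for <script type="math/tex">q_1, \dots, q_4</script>, <script type="math/tex">\Sigma_1^n</script> and <script type="math/tex">\Sigma_2^n</script> into it (the specific values below are hypothetical; any nonzero choice behaves the same way):</p>

```python
# Sanity check: with the q_i and the sums held fixed at arbitrary nonzero
# values, the ratio (q4*S1 + h*q3*S2) / (q2*S1 + h*q1*S2) tends to q4/q2
# as h -> 0. All numbers below are hypothetical stand-ins.
q1, q2, q3, q4 = 0.7, -1.3, 2.1, 0.4   # arbitrary nonzero constants
S1, S2 = 1.9, -0.8                      # stand-ins for Sigma_1^n, Sigma_2^n

def ratio(h):
    return (q4 * S1 + h * q3 * S2) / (q2 * S1 + h * q1 * S2)

for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, ratio(h), q4 / q2)  # ratio(h) approaches q4/q2
```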
<h2 id="conjecture">Conjecture:</h2>
<p>While I’m still looking for a closed-form expression for <script type="math/tex">\mathbb{E}\big[\frac{\Delta y_n}{\Delta x_n}\big]</script>, my previous analysis leads me to conjecture that, for any random matrices <script type="math/tex">M_i</script> sampled i.i.d.:</p>
<p>\begin{equation}
\lim_{h \to 0} \frac{\Delta y_n}{\Delta x_n} = \frac{q_4}{q_2}
\end{equation}</p>
<p>which is a general result I didn’t expect in advance.</p>
<p>Now, given that there is strong numerical evidence for (6) regardless of the magnitude of the step size <script type="math/tex">h</script>, I wonder whether we can show:</p>
<p>\begin{equation}
\lim_{h \to 0} \frac{\Delta y_n}{\Delta x_n} = \mathbb{E}\big[\frac{\Delta y_n}{\Delta x_n} \big]
\end{equation}</p>
<h1 id="references">References:</h1>
<ol>
<li>D. Huh & T. Sejnowski. Spectrum of power laws for curved hand movements. 2015.</li>
</ol>Aidan RockeA stochastic sequence with symmetry hiding in plain sightThe true cost of AlphaGo Zero2019-03-24T00:00:00+00:002019-03-24T00:00:00+00:00/artificial/intelligence/2019/03/24/alpha-go-zero<h1 id="motivation">Motivation:</h1>
<p>Rich Sutton’s <a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">Bitter Lesson</a> for AI essentially argues that we
should focus on meta-methods that scale well with compute instead of trying to understand the structure and function of biological minds.
The latter, according to him, are <em>endlessly complex</em> and therefore unlikely to scale. Furthermore, Rich Sutton (who works at Deep Mind) considers Deep Mind’s work on AlphaGo
an exemplary model of AI research.</p>
<p>After reading <a href="https://twitter.com/shimon8282/status/1106534178676506624">Shimon Whiteson’s detailed rebuttal</a> as well as <a href="https://twitter.com/SussilloDavid/status/1106643708626137089">David Sussillo’s reflection on loss functions</a>, I think it’s time to re-evaluate
the scientific value of AlphaGo Zero research and the carbon footprint of a win-at-any-cost research culture. In particular, I’d like to address the following questions:</p>
<ol>
<li>What kinds of real problems can be solved with AlphaGo Zero algorithms?</li>
<li>What is the true cost of AlphaGo Zero? (i.e. the carbon footprint)</li>
<li>Does Google’s carbon offsetting scheme accomplish more than virtue signalling?</li>
<li>Should AI researchers be noting their carbon footprint in their publications?</li>
<li>Finally, might energetic constraints present an opportunity for better AI research?</li>
</ol>
<p>I haven’t seen these questions addressed in one manuscript but I believe they are related and timely, hence this article. Now, I’d like to add that we can’t seriously
entertain notions of <em>safe</em> AI without carefully developing <em>environmentally friendly</em> AI, especially when the only thing we know for certain is that we will do exponentially more FLOPs (i.e. computations) in the future.</p>
<p><strong>Note:</strong> This post builds on the analyses of <a href="https://medium.com/@karpathy/alphago-in-context-c47718cb95a5">Andrej Karpathy</a> and <a href="https://www.yuzeh.com/data/agz-cost.html">Dan Huang</a>.</p>
<h2 id="alphago-zeros-contribution-to-humanity">AlphaGo Zero’s contribution to humanity:</h2>
<center><img src="https://raw.githubusercontent.com/Kepler-Lounge/blog_images/master/_images/netflix_alphago.jpeg" width="75%" height="75%" align="middle" /></center>
<center>Deep Mind's AlphaGo movie on Netflix</center>
<p>Before measuring the carbon footprint of AlphaGo Zero it’s a good idea to remind ourselves of the types of environments this ‘meta-method’ can handle:</p>
<ol>
<li>The environments must have <em>deterministic</em> dynamics which simplifies planning considerably.</li>
<li>The environment must be <em>fully-observable</em> which rules out large and complex environments.</li>
<li>A <em>perfect simulator</em> must be available to the agent which rules out any biologically-plausible environments.</li>
<li>Evaluation is <em>simple</em> and <em>objective</em>: win/lose. For biological organisms all rewards are <em>internal</em> and <em>subjective</em>.</li>
<li>Static state-spaces and action-spaces: so we can’t generalise…not even to <script type="math/tex">N \times N</script> Go where <script type="math/tex">N \neq 19</script>.</li>
</ol>
<p>These constraints effectively rule out the application of AlphaGo Zero’s algorithms to any practical problem in robotics because perfect simulators are non-existent. But it may be used to solve two-player, perfect-information board games, which is a historic achievement and a great publicity stunt for Google, assuming that the carbon footprint of this
project is reasonable. This consideration is doubly important when you take into account the influence of Deep Mind on modern AI research culture.</p>
<p>However, before estimating the metric tonnes of CO2 blasted into the atmosphere by Deep Mind let’s consider a related question. How much would it cost an entity outside of Google to replicate this type of research?</p>
<h2 id="the-cost-of-alphago-zero-in-us-dollars">The cost of AlphaGo Zero in US dollars:</h2>
<p>In ‘Mastering the game of Go without human knowledge’ [2], the authors ran both a three-day experiment and
a forty-day experiment. Let’s start with the three-day experiment.</p>
<h3 id="the-three-day-experiment">The three day experiment:</h3>
<ol>
<li>Over 72 hours, ~ 5 million games were played.</li>
<li>Each move of self-play used ~ 0.4 seconds of computer time and each self-play machine consisted of 4 TPUs.</li>
<li>How many self-play machines <script type="math/tex">N_{SP}</script> were used?</li>
<li>
<p>If the average game has 200 moves we have:</p>
<p>\begin{equation}
\frac{72\cdot 60 \cdot 60 \cdot N_{SP}}{200 \cdot 5 \cdot 10^6} \approx 0.4 \implies N_{SP} \approx 1500
\end{equation}</p>
</li>
<li>
<p>Given that each self-play machine contained 4 TPUs we have:</p>
<p>\begin{equation}
N_{TPU} = 4 \cdot N_{SP} \approx 6000
\end{equation}</p>
</li>
</ol>
<p>If we use <a href="https://cloud.google.com/tpu/docs/pricing">Google’s TPU pricing</a> as of March 2019, the cost for an organisation outside of
Google to replicate this experiment is therefore:</p>
<p>\begin{equation}
\text{Cost} > 6000 \cdot 72 \cdot 4.5 \approx 2 \cdot 10^6 \quad \text{US dollars}
\end{equation}</p>
<h3 id="the-forty-day-experiment">The forty day experiment:</h3>
<p>For the forty day experiment one thing that’s different is that the policy network has twice as many layers so,
as Dan Huang pointed out in <a href="https://www.yuzeh.com/data/agz-cost.html">his article</a>, it’s reasonable to infer
that twice the amount of time was used per move. So ~ 0.8 seconds rather than ~ 0.4 seconds.</p>
<ol>
<li>Over 40 days, ~ 29 million games were played.</li>
<li>Each move of self-play used ~ 0.8 seconds of computer time where each self-play machine consisted of 4 TPUs.</li>
<li>How many self-play machines <script type="math/tex">N_{SP}</script> were used?</li>
<li>
<p>If the average game has 200 moves we have:</p>
<p>\begin{equation}
\frac{40 \cdot 24 \cdot 3600 \cdot N_{SP}}{200 \cdot 29 \cdot 10^6} \approx 0.8 \implies N_{SP} \approx 1300
\end{equation}</p>
</li>
<li>
<p>Given that each self-play machine contained 4 TPUs we have:</p>
<p>\begin{equation}
N_{TPU} = 4 \cdot N_{SP} \approx 5000
\end{equation}</p>
</li>
</ol>
<p>If we use <a href="https://cloud.google.com/tpu/docs/pricing">Google’s TPU pricing</a> as of March 2019, the cost for an organisation outside of
Google to replicate this experiment is therefore:</p>
<p>\begin{equation}
\text{Cost} > 5000 \cdot 960 \cdot 4.5 \approx 2 \cdot 10^7 \quad \text{US dollars}
\end{equation}</p>
<p>It goes without saying that this is well outside the budget of any AI lab in academia.</p>
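<p>Both back-of-envelope estimates can be reproduced in a few lines of Python. The 200-moves-per-game figure and the $4.50 per TPU-hour price are the same assumptions used above; this is a sketch, not an official costing:</p>

```python
def replication_cost(days, games, sec_per_move, moves_per_game=200,
                     tpus_per_machine=4, usd_per_tpu_hour=4.5):
    """Back-of-envelope replication cost for the AlphaGo Zero experiments."""
    hours = days * 24
    total_moves = games * moves_per_game
    # Self-play machines needed to generate all moves in the allotted time:
    n_sp = total_moves * sec_per_move / (hours * 3600)
    n_tpu = tpus_per_machine * n_sp
    cost = n_tpu * hours * usd_per_tpu_hour
    return round(n_sp), round(n_tpu), cost

# Three-day experiment: ~5 million games at ~0.4 s/move
print(replication_cost(3, 5e6, 0.4))    # ~1500 machines, ~6000 TPUs, ~$2M
# Forty-day experiment: ~29 million games at ~0.8 s/move
print(replication_cost(40, 29e6, 0.8))  # ~1300 machines, ~5000 TPUs, ~$2e7
```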
<h2 id="the-true-cost-of-googles-40-day-experiment">The true cost of Google’s 40 day experiment:</h2>
<center><img src="https://raw.githubusercontent.com/Kepler-Lounge/blog_images/master/_images/climate_cartoon.jpeg" width="75%" height="75%" align="middle" /></center>
<center>Climate agreements since 1990: progress or the illusion thereof</center>
<p>This in my opinion is the more important calculation. While it’s not at all clear that AI research will ‘save the world’ in the long term,
in the short term what is certain is that compute-intensive AI experiments have a non-trivial carbon footprint. So I think it would be wise
to use our energy budget carefully and, realistically, the only way to do this is to calculate the carbon footprint of any AI research project and place it on
the front page of your research paper. Meanwhile, let’s proceed with the calculation.</p>
<p>The nature of this calculation involves first converting TPU-hours into kilowatt-hours (kWh) and then converting this value to metric tonnes of CO2:</p>
<ol>
<li>~5000 TPUs were used for 960 hours.</li>
<li>~40 Watts per TPU according to [7].</li>
<li>
<p>This means that we have:</p>
<p>\begin{equation}
\text{kWh} = \frac{5000 \cdot 960 \cdot 40}{1000} \approx 1.9 \cdot 10^5
\end{equation}</p>
</li>
<li>
<p>This is approximately 23 American homes’ electricity for a year according to the <a href="https://www.epa.gov/energy/greenhouse-gas-equivalencies-calculator">EPA</a>.</p>
</li>
<li>In the USA, where Google Cloud TPUs are located, we have ~0.5 kg of CO2 per kWh, so AlphaGo Zero was responsible for releasing approximately 96 tonnes of CO2 into the atmosphere.</li>
</ol>
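<p>The arithmetic above can be sketched in a few lines of Python; the 0.5 kg of CO2 per kWh grid intensity is the rough US average assumed above:</p>

```python
# Convert TPU-hours to kilowatt-hours, then to metric tonnes of CO2.
n_tpus = 5000          # self-play TPUs (40-day experiment)
hours = 960            # 40 days
watts_per_tpu = 40     # per-chip power draw, from the TPU paper
kg_co2_per_kwh = 0.5   # rough US grid carbon intensity (assumption)

kwh = n_tpus * hours * watts_per_tpu / 1000
tonnes_co2 = kwh * kg_co2_per_kwh / 1000
print(kwh, tonnes_co2)  # 192000.0 kWh, 96.0 tonnes
```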
<p>To appreciate the significance of 96 tonnes of CO2 over 40 days…this is approximately equivalent to 1000 hours of air travel and also approximately the carbon footprint of
23 American homes for a <em>year</em>. Relatively speaking, this is a large footprint for a board game ‘experiment’ that lasts 40 days.</p>
<p>Is this reasonable? At this point a Googler might start talking to me about Google’s carbon offsetting scheme.</p>
<h2 id="googles-carbon-offsetting-scheme">Google’s carbon offsetting scheme:</h2>
<center><img src="https://raw.githubusercontent.com/Kepler-Lounge/blog_images/master/_images/charts.png" width="100%" height="100%" align="middle" /></center>
<center>Google's carbon offsetting strategy in 2018: ~92% wind and ~8% solar</center>
<p>I don’t have much time for this section because Google’s carbon offsetting scheme is basically a joke but let’s break it down anyway:</p>
<ol>
<li>
<p><a href="https://cloud.google.com/renewable-energy/">According to Google</a>, the Google Cloud is supposedly 100% sustainable because Google purchases an equal amount of renewable energy for the total amount of energy used by their Cloud infrastructure.</p>
</li>
<li>
<p>If you check <a href="https://www.blog.google/outreach-initiatives/environment/meeting-our-match-buying-100-percent-renewable-energy/">the charts of Urs Hölze</a>, the Senior VP of technical infrastructure at Google, this means that they buy a lot of wind(~ 92%) and some solar(~ 8%).</p>
</li>
<li>
<p>Let’s suppose we can take these points at face value. Does this carbon offsetting scheme actually work out?</p>
</li>
</ol>
<p>David J.C. MacKay, a giant of 20th century machine learning, would probably be rolling in his grave right now because he spent the last part of his life carefully assessing
the potential contribution of wind and solar to humanity’s energy budget [8]. He was in fact <em>Scientific Advisor to the Department of Energy and Climate Change</em>
and his essential contribution was to explain that the fundamental limits of wind and solar energy technologies aren’t technological; we are talking about hard physical limits. I will refer the reader to ‘Sustainable Energy – without the hot air’ by David J.C. MacKay, which is <a href="https://www.withouthotair.com/">freely available online</a>, rather
than repeat his thorough calculations here.</p>
<p>Unfortunately, no combination of wind and solar energy can provide energy security for a country with the USA’s energy requirements. In the best case scenario, Google’s carbon offsetting scheme is thinly veiled virtue signalling. What then are the serious clean energy solutions?</p>
<p>Past the year 2050 it’s possible to make a strong case for nuclear fusion as being necessary for human civilisation to
continue. Between now and the day we figure out how to engineer reliable nuclear fusion reactors we should use our energy budget wisely.</p>
<h2 id="boltzmanns-razor">Boltzmann’s razor:</h2>
<center><img src="https://raw.githubusercontent.com/Kepler-Lounge/blog_images/master/_images/boltzmann.jpg" width="75%" height="75%" align="middle" /></center>
<center>Boltzmann's theory complements Darwin's theory in many ways</center>
<p>According to various sources the human brain uses ~20 Watts, which is incredibly efficient compared to the ~200 kilowatts used by 5000 TPUs. In other words, AlphaGo Zero was ten thousand times less energy efficient than a human being for a comparable result. I don’t see how this is a strong argument for <em>scalability</em> at all.</p>
<p>The human brain isn’t an outlier. All biological organisms are energy efficient because they must first survive the second law of thermodynamics which
is a minimum energy principle. Now, there are two ways organisms perform computations in an economical manner that I am aware of:</p>
<ol>
<li>
<p>Morphological computation:</p>
<p>a. If you check the work of Tad McGeer [9] you will realise that it’s possible to build a walking robot without any electronics that simply exploits the laws of classical mechanics. It does computations by virtue of having a body. Some researchers might say that this is an instance of <em>embodied cognition</em> [12].</p>
<p>b. Romain Brette and his collaborators have been working on a <a href="http://romainbrette.fr/neuroscience-of-a-swimming-neuron/">project that involves a <em>swimming neuron</em></a>. This
is an organism, the Paramecium, that has a single cell yet it’s capable of navigation, hunting, and procreation in very complex environments. How does the Paramecium do this? What is the reward function? Is it doing reinforcement learning?</p>
</li>
<li>
<p>The role of development:</p>
<p>a. If you consider any growing organism you will realise that its <em>state space</em> and <em>action space</em> are rapidly changing. This should make learning very hard. Yet, development is in some sense a form of curriculum learning and makes learning simpler.</p>
<p>b. I must add that during development the brain of the organism is rapidly changing. Shouldn’t this make learning impossible?</p>
</li>
</ol>
<p>Morphospaces and developmental trajectories are fundamentally physical considerations. In some fundamental way organisms succeed in reorganizing physics locally. Termites in the desert construct mounds whose physical behavior is consistent with but not reducible to the physics of sand. Birds build nests whose physics isn’t reducible to its constituent parts. The resulting systems do <em>computations</em> in an economical manner by taking <em>thermodynamics</em> into
account.</p>
<p>This is why energy efficiency is both a challenge and opportunity. It will force researchers to recognize the importance of understanding the biophysics of organisms at every scale where such biophysics contributes to survival. If I may distill this into a single principle I would call it <em>Boltzmann’s razor</em>:</p>
<p><em>Given two comparably effective intelligent systems focus on the research and development of those systems which consume less energy.</em></p>
<p>Naturally, the more economical system would be capable of accomplishing more tasks given the same amount of energy.</p>
<h2 id="discussion">Discussion:</h2>
<p>Of the AI researchers I have discussed the above issues with I noted a bimodal distribution. Roughly 30% agreed with me and roughly 70% pushed back really
hard. Among the counter-arguments of the second group I remember the following:</p>
<ol>
<li>If you force AI researchers to reduce their carbon footprint you will <em>kill</em> AI research.</li>
<li>Why do you care about what Google does? It’s their own money and they can do whatever they want with it.</li>
<li>You’re not a real AI researcher anyway. Why do you care about things outside your field?</li>
</ol>
<p>I think these are all terrible arguments. Regarding the ad hominem: like many master’s students, I’m 1.5 years away from starting a PhD. I have already met a potential PhD supervisor that I have been in touch with since 2017. I will add that last year I worked as a consultant on an object detection project, where I engineered a state-of-the-art object detection system inspired by Polygon-RNN for a Central European computer vision company using only one NVIDIA GTX 1080 Ti [10]. Part of this system is <a href="https://github.com/AidanRocke/vertex_prediction">on Github</a>.</p>
<p>So not only do I know what I’m talking about but I have experience building reliable systems in a resourceful manner. In fact, resourcefulness is a direct implication of <em>Boltzmann’s razor</em>.</p>
<h1 id="references">References:</h1>
<ol>
<li>R. Sutton. The Bitter Lesson. 2019.</li>
<li>D. Silver et al. Mastering the game of Go without human knowledge. 2017.</li>
<li>A. Karpathy. AlphaGo, in context. 2017.</li>
<li>D. Huang. How much did AlphaGo Zero cost? 2018.</li>
<li>The Twitter Thread of Shimon Whiteson: https://twitter.com/shimon8282/status/1106534178676506624</li>
<li>This Tweet by David Sussillo: https://twitter.com/SussilloDavid/status/1106643708626137089</li>
<li>Google Inc. In-Datacenter Performance Analysis of a Tensor Processing Unit. 2017.</li>
<li>D. J. C. MacKay. Sustainable Energy – without the hot air. 2008.</li>
<li>T. McGeer. Passive Dynamic Walking. 1990.</li>
<li>L. Castrejon et al. Annotating Object Instances with a Polygon-RNN. 2017.</li>
<li>D. B. Chklovskii & C. F. Stevens. Wiring optimization in the brain. 2000.</li>
<li>G. Montufar et al. A Theory of Cheap Control in Embodied Systems. 2014.</li>
</ol>Aidan RockeMotivation:Understanding the two-thirds power law2019-03-23T00:00:00+00:002019-03-23T00:00:00+00:00/biomechanics/2019/03/23/two-thirds-law<center><img src="https://raw.githubusercontent.com/Kepler-Lounge/blog_images/master/_images/galoisnotes.jpg" width="75%" height="75%" align="middle" /></center>
<h2 id="introduction">Introduction:</h2>
<p>If you consider the above scribbles of Évariste Galois, who developed Galois theory, you will note that some of the scribbles appear random. Yet, upon closer inspection, none of the scribbles are completely random. Many of them are rather smooth, which would be improbable if the trajectories were generated
by some kind of Brownian-type motion.</p>
<p>This isn’t really surprising if you consider the biomechanical constraints on handwritten text. In fact, some scientists have attempted to distill this observation into
a physical law known as the two-thirds power law which I analyse here. Briefly speaking, here’s a breakdown of my analysis:</p>
<ol>
<li>I provide a mathematical description of the law and describe how it may be used as a discriminative model.</li>
<li>We may also use this equation as a generative model if we consider symmetries of the equation. <a href="https://gist.github.com/AidanRocke/33c1d5268d8f8c3b395cc81ba6397f47">Here is the code.</a></li>
<li>The limitations of the ‘law’ are considered and arguments are given to shift focus on plausible generative models.</li>
</ol>
<p>In spite of its limitations I think that the <script type="math/tex">2/3</script> power law is a very good starting point for understanding biomechanical constraints
on realistic drawing tasks.</p>
<h2 id="description-of-the-law">Description of the law:</h2>
<h3 id="brief-description">Brief description:</h3>
<p>The <script type="math/tex">2/3</script> power law for the motion of the endpoint of the human upper-limb during drawing motion may be formulated as follows:</p>
<p>\begin{equation}
v(t) = K \cdot k(t)^\beta
\end{equation}</p>
<p>where <script type="math/tex">k(t)</script> is the instantaneous curvature of the path and the <script type="math/tex">2/3</script> law is satisfied when <script type="math/tex">\beta \approx -\frac{1}{3}</script>. By taking logarithms
of both sides of the equation we have:</p>
<p>\begin{equation}
\ln v(t) = \ln K - \frac{1}{3} \ln k(t)
\end{equation}</p>
<h3 id="frenet-serret-formulas">Frenet-Serret formulas:</h3>
<p>To clarify what we mean by instantaneous curvature <script type="math/tex">k(t)</script> in (2) it’s necessary to use a moving reference frame, aka Frenet-Serret frame, where
in two dimensions our reference frame is described by the unit vector tangent to the curve and a unit vector normal to the curve.</p>
<p>With this moving frame we may define the curvature of regular curves (i.e. curves whose derivatives never vanish) parametrized by time as follows:</p>
<p>\begin{equation}
k(t) = \frac{\lvert \ddot{x}\dot{y} - \ddot{y}\dot{x} \rvert}{(\dot{x}^2 + \dot{y}^2)^{3/2}} = \frac{\lvert \ddot{x}\dot{y} - \ddot{y}\dot{x} \rvert}{v^3(t)}
\end{equation}</p>
<p>Now, if we denote:</p>
<p>\begin{equation}
\alpha(t) = \lvert \ddot{x}\dot{y} - \ddot{y}\dot{x} \rvert
\end{equation}</p>
<p>we have:</p>
<p>\begin{equation}
\ln v(t) = \frac{1}{3} \ln \alpha(t) - \frac{1}{3} \ln k(t)
\end{equation}</p>
<p>and we note that our law is satisfied when <script type="math/tex">\forall t, \alpha(t)=K</script>. Given that this is a linear equation we may use this equation as a discriminative model by
performing a linear regression analysis on drawing data.</p>
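<p>As a concrete check of this discriminative use, consider an ellipse <script type="math/tex">x = a\cos t, y = b\sin t</script>: a short calculation gives <script type="math/tex">\alpha(t) = ab</script>, a constant, so a log-log regression of speed against curvature should recover a slope of exactly <script type="math/tex">-1/3</script>. A stdlib-only Python sketch (the ellipse and sample count are arbitrary choices):</p>

```python
import math

# Ellipse x = a*cos(t), y = b*sin(t): here alpha(t) = |x'' y' - y'' x'| = a*b
# is constant, so v = K * k^(-1/3) holds exactly and the regression slope
# of log v against log k should be -1/3.
a, b = 3.0, 1.0
ts = [2 * math.pi * i / 1000 for i in range(1, 1000)]

log_v, log_k = [], []
for t in ts:
    xd, yd = -a * math.sin(t), b * math.cos(t)      # velocities
    xdd, ydd = -a * math.cos(t), -b * math.sin(t)   # accelerations
    v = math.hypot(xd, yd)
    k = abs(xdd * yd - ydd * xd) / v**3             # Frenet-Serret curvature
    log_v.append(math.log(v))
    log_k.append(math.log(k))

# Least-squares slope of log v against log k:
n = len(ts)
mv, mk = sum(log_v) / n, sum(log_k) / n
slope = (sum((lk - mk) * (lv - mv) for lk, lv in zip(log_k, log_v))
         / sum((lk - mk) ** 2 for lk in log_k))
print(slope)  # -1/3 up to floating-point error
```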
<h3 id="parallelograms">Parallelograms:</h3>
<p>If we focus on <script type="math/tex">(4)</script> we may note that this value corresponds to the determinant of a particular matrix:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{equation}
H = \begin{bmatrix}\ddot{x} & \ddot{y}\\\dot{x} & \dot{y}\end{bmatrix}
\end{equation} %]]></script>
<p>Furthermore, we may note that this determinant may be identified with the area <script type="math/tex">K</script> of a parallegram with the following vertices:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{equation}
\begin{split}
O & = (0,0) \\
A & = (\ddot{x}, \dot{x})\\
B & = (\ddot{y}, \dot{y}) \\
C & = A + B
\end{split}
\end{equation} %]]></script>
<p>This formulation is useful as invariants of <script type="math/tex">\lvert \ddot{x}\dot{y} - \ddot{y}\dot{x} \rvert=K</script> now correspond to volume-preserving transformations applied to the above parallelogram.</p>
<h2 id="generative-modelling-via-invariants">Generative modelling via Invariants:</h2>
<h3 id="invariance-via-volume-preserving-transforms">Invariance via volume-preserving transforms:</h3>
<p>Let’s first note that if we always have:</p>
<p>\begin{equation}
\lvert \ddot{x}\dot{y} - \ddot{y}\dot{x} \rvert=K
\end{equation}</p>
<p>for some <script type="math/tex">K \in \mathbb{R}</script> then we must have:</p>
<p>\begin{equation}
\lvert \ddot{x}(0)\dot{y}(0) - \ddot{y}(0)\dot{x}(0) \rvert=K
\end{equation}</p>
<p>Now, given that</p>
<p>\begin{equation}
\mathcal{M} = \{ M \in \mathbb{R}^{2 \times 2}: \det(M)=1 \}
\end{equation}</p>
<p>are volume-preserving transformations, we may use <script type="math/tex">M \in \mathcal{M}</script> to simulate arbitrary trajectories that satisfy <script type="math/tex">(2)</script>. We may
think of this as the Jacobian of a linear, hence differentiable, transformation.</p>
<h3 id="computer-simulation">Computer simulation:</h3>
<p>In order to simulate these trajectories, we note that:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{equation}\begin{bmatrix}\ddot{x}_{n+1}\\\dot{x}_{n+1}\end{bmatrix} = M \cdot \begin{bmatrix}\ddot{x}_{n}\\\dot{x}_{n}\end{bmatrix}= \begin{bmatrix}a & b\\c & d\end{bmatrix} \cdot \begin{bmatrix}\ddot{x}_{n}\\\dot{x}_{n}\end{bmatrix} = \begin{bmatrix}a\ddot{x}_n + b\dot{x}_n\\c\ddot{x}_n + d\dot{x}_n\end{bmatrix}\end{equation} %]]></script>
<p>where the position is updated using:</p>
<p>\begin{equation}
x_{n+1} = x_n + \dot{x}_n\cdot \Delta t + \frac{1}{2} \ddot{x}_n \cdot \Delta t^2
\end{equation}</p>
<p>and in order to make sure that <script type="math/tex">ad-bc=1</script> we may use the trigonometric identity:</p>
<p>\begin{equation}
\cos^2(\theta) + \sin^2(\theta) = 1
\end{equation}</p>
<p>so we have:</p>
<p>\begin{equation}
ad = \cos^2(\theta)
\end{equation}</p>
<p>\begin{equation}
bc = -\sin^2(\theta)
\end{equation}</p>
<p>and as a result we have a generative variant of the 2/3 power law. Ok, but are these ‘scribbles’ ecologically plausible? I don’t think so, which is why I call <a href="https://gist.github.com/AidanRocke/33c1d5268d8f8c3b395cc81ba6397f47">the main Julia function I used to simulate these trajectories ‘crazy paths’</a>.</p>
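<p>Here is a minimal Python sketch of this scheme. Note that applying the <em>same</em> unit-determinant matrix <script type="math/tex">M</script> to both state vectors multiplies <script type="math/tex">\ddot{x}\dot{y} - \ddot{y}\dot{x}</script> by <script type="math/tex">\det(M) = 1</script>, so the invariant is conserved exactly; the choice <script type="math/tex">a = d = \cos\theta, b = -c = \sin\theta</script> below is one convenient way to satisfy the two trigonometric conditions (step size and initial conditions are arbitrary):</p>

```python
import math, random

def step(state, M, dt):
    """Advance one (acc, vel, pos) triple: apply M to (acc, vel),
    then update the position with the kinematic rule from the text."""
    acc, vel, pos = state
    new_acc = M[0][0] * acc + M[0][1] * vel
    new_vel = M[1][0] * acc + M[1][1] * vel
    new_pos = pos + vel * dt + 0.5 * acc * dt**2
    return new_acc, new_vel, new_pos

random.seed(0)
dt = 0.01
x = (1.0, 0.0, 0.0)   # (x'', x', x), arbitrary initial conditions
y = (0.0, 1.0, 0.0)   # (y'', y', y)
K0 = abs(x[0] * y[1] - y[0] * x[1])  # initial value of |x'' y' - y'' x'|

xs, ys = [], []
for _ in range(1000):
    theta = random.uniform(-0.5, 0.5)
    # a = d = cos(theta), b = -c = sin(theta), so det(M) = 1.
    M = [[math.cos(theta), math.sin(theta)],
         [-math.sin(theta), math.cos(theta)]]
    x, y = step(x, M, dt), step(y, M, dt)
    xs.append(x[2]); ys.append(y[2])

K = abs(x[0] * y[1] - y[0] * x[1])
print(K0, K)  # the invariant stays at its initial value
```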
<h2 id="criticism">Criticism:</h2>
<ol>
<li>The <script type="math/tex">2/3</script> law is a pretty weak discriminative model because as shown by [2] the exponent varies with the viscosity of the drawing medium and as shown by [1] the exponent also depends on the complexity of the shape drawn.</li>
<li>The <script type="math/tex">2/3</script> law is an even weaker generative model as it completely ignores environmental cues. The output ‘scribbles’ aren’t the result of any plausible interaction of an agent with an ecologically realistic environment.</li>
</ol>
<p>This point becomes even clearer when you consider the underlying minimum-jerk theory that is supposed to justify this ‘law’. A literal interpretation of jerk minimisation would
imply that humans should mainly draw straight lines. However, there’s certainly a tradeoff between energy minimisation and the expressiveness of the figure drawn, since drawing is an activity that involves <em>communicating</em> a particular message.</p>
<h1 id="references">References:</h1>
<ol>
<li>D. Huh & T. Sejnowski. Spectrum of power laws for curved hand movements. 2015.</li>
<li>M. Zago et al. The speed-curvature power law of movements: a reappraisal. 2017.</li>
<li>U. Maoz et al. Noise and the two-thirds power law. 2006.</li>
<li>M. Richardson & T. Flash. Comparing Smooth Arm Movements with the Two-Thirds Power Law and the Related Segmented-Control Hypothesis. 2002.</li>
</ol>Aidan RockeWhat is the exact value of culture?2019-03-20T00:00:00+00:002019-03-20T00:00:00+00:00/culture/2019/03/20/value-culture<p>As someone who thinks about the origins of intelligence everyday, the importance of culture has grown on me over time. I can’t overstate its importance.</p>
<p>Some people have asked me what the exact value of culture is, expecting this question to stump me. But that’s an absurd question, as without culture it would be impossible to ask such questions. Without culture there’s no language, art, political systems, science or technology. An essential question, therefore, is: what are the minimal conditions
for culture to emerge in a particular species? Can complex cultures develop in animals without the capacity for language?</p>
<p>I wouldn’t say that there can be no intelligent behaviour without culture, but I can confidently say, based on empirical evidence and analysis, that without culture
there would be a strict upper bound on the kinds of intelligent systems that are possible.</p>
<h3 id="note-i-would-normally-have-a-list-of-references-here-but-this-time-i-would-advise-the-curious-reader-to-gather-their-own-data-and-try-various-thought-experiments">Note: I would normally have a list of references here but this time I would advise the curious reader to gather their own data and try various thought experiments.</h3>Aidan RockeAs someone who thinks about the origins of intelligence everyday, the importance of culture has grown on me over time. I can’t overstate its importance.