Jekyll2019-08-21T16:51:13+00:00/feed.xmlKepler Lounge The math journal of Aidan Rocke A simple proof of Euler’s product formula2019-08-21T00:00:00+00:002019-08-21T00:00:00+00:00/number/theory/2019/08/21/euler_1<h2 id="introduction">Introduction:</h2> <p>The Euler product formula states that if <script type="math/tex">\zeta(s)</script> is the Riemann zeta function and the product is taken over the primes <script type="math/tex">p</script>, then:</p> <p>\begin{equation} \zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} = \prod_{p} \frac{1}{1-p^{-s}} \end{equation}</p> <p>holds for all <script type="math/tex">s \in \mathbb{C}</script> for which the series defining <script type="math/tex">\zeta(s)</script> is absolutely convergent.</p> <h2 id="proof">Proof:</h2> <p>Every positive integer <script type="math/tex">n \in \mathbb{N^*}</script> has a unique prime factorization:</p> <p>\begin{equation} \forall n \in \mathbb{N^*} \exists c_p \in \mathbb{N}, n = \prod_p p^{c_p} \end{equation}</p> <p>where <script type="math/tex">% <![CDATA[ \sum c_p < \infty %]]></script>.</p> <p>Furthermore, we note that:</p> <p>\begin{equation} \prod_p \frac{1}{1-p^{-s}} = \prod_p \big(\sum_{c_p = 0}^\infty p^{-c_p s} \big) \end{equation}</p> <p>due to elementary properties of geometric series.</p> <p>In the formal expansion of (3) we note that each term has a unique prime factorization and that every possible prime factorization occurs once. 
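As a numerical sanity check of (1), both sides can be truncated and compared at <script type="math/tex">s = 2</script>, where the common value is <script type="math/tex">\pi^2/6</script> (a sketch, not part of the proof; the truncation points are arbitrary choices):

```python
# Compare a truncated zeta series with a truncated Euler product at s = 2.
# Both should approximate zeta(2) = pi^2 / 6 = 1.6449...

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

s = 2
zeta_sum = sum(n ** -s for n in range(1, 100_000))   # partial sum of n^{-s}
euler_product = 1.0
for p in primes_up_to(1_000):                        # partial product over p <= 1000
    euler_product *= 1.0 / (1.0 - p ** -s)

print(zeta_sum, euler_product)  # both close to 1.6449...
```

Tightening either truncation moves both estimates closer to <script type="math/tex">\pi^2/6 \approx 1.6449340</script>.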
It follows that if <script type="math/tex">\sum n^{-s}</script> converges absolutely we may rearrange the sum however we wish and so:</p> <p>\begin{equation} \zeta(s) = \prod_{p} \frac{1}{1-p^{-s}} \end{equation}</p> <p>provided that the hypotheses on <script type="math/tex">s</script> are satisfied.</p>Aidan RockeIntroduction:Almost all simple graphs are small world networks2019-07-31T00:00:00+00:002019-07-31T00:00:00+00:00/graphs/2019/07/31/small-worlds<h2 id="introduction">Introduction:</h2> <p>Two days ago, while thinking about brain networks, it occurred to me that almost all simple graphs are small world networks in the sense that if <script type="math/tex">G_N</script> is a simple graph with <script type="math/tex">N</script> nodes sampled from the Erdös-Rényi random graph distribution with probability half then when <script type="math/tex">N</script> is large:</p> <p>\begin{equation} \mathbb{E}[d(v_i,v_j)] \leq \log_2 N \end{equation}</p> <p>My strategy for proving this was to show that when <script type="math/tex">N</script> is large, <script type="math/tex">\forall v_i \in G_N</script> there exists a chain of distinct nodes of length <script type="math/tex">\log_2 N</script> originating from <script type="math/tex">v_i</script> almost surely. This implies that:</p> <p>\begin{equation} \forall v_i, v_j \in G_N, d(v_i,v_j) \leq \log_2 N \end{equation}</p> <p>almost surely when <script type="math/tex">N</script> is large.</p> <p>Now, by using the above method of proof I managed to show that almost all simple graphs are <em>very small</em> in the sense that:</p> <p>\begin{equation} \mathbb{E}[d(v_i,v_j)] \leq \log_2\log_2 N \end{equation}</p> <p>when <script type="math/tex">N</script> tends to infinity. We can actually do even better.</p> <p>Using my proof that <a href="https://keplerlounge.com/math/2019/07/02/connected-graphs.html">almost all simple graphs are connected</a>, I can show that almost all simple graphs have diameter 2. 
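The diameter-2 claim can be probed empirically with a quick simulation (a rough sketch, not part of the argument; the graph size and trial count are arbitrary): sample <script type="math/tex">G(N, 1/2)</script> graphs and check that every pair of vertices is either adjacent or shares a common neighbour.

```python
import random

def sample_er_graph(n, rng):
    """Erdős–Rényi G(n, 1/2): each possible edge is present independently w.p. 1/2."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < 0.5:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def diameter_at_most_2(adj):
    """True iff every pair of vertices is adjacent or has a common neighbour."""
    n = len(adj)
    return all(
        j in adj[i] or adj[i] & adj[j]
        for i in range(n) for j in range(i + 1, n)
    )

rng = random.Random(0)
trials = [diameter_at_most_2(sample_er_graph(100, rng)) for _ in range(10)]
print(all(trials))
```

For <script type="math/tex">N = 100</script> the probability that a given pair has neither an edge nor a common neighbour is <script type="math/tex">\frac{1}{2}(3/4)^{98}</script>, so all trials report success with overwhelming probability.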
However, I think there is more value in going through my original proof which in my opinion provides greater insight into the problem.</p> <h2 id="degrees-of-separation-and-the-neighborhood-of-a-node">Degrees of separation and the neighborhood of a node:</h2> <p>We may think of degrees of separation as a sequence of ‘hops’ between the neighborhoods of distinct nodes <script type="math/tex">v_i</script>. Given a node <script type="math/tex">v_i</script> we may define <script type="math/tex">\mathcal{N}(v_i)</script> as follows:</p> <p>\begin{equation} \mathcal{N}(v_i) = \{v_j \in G_N: \overline{v_i v_j} \in G_N \} \end{equation}</p> <p>where <script type="math/tex">G_N = (V,E)</script> is a graph with <script type="math/tex">N</script> nodes.</p> <p>Now, given the E-R model we can say that <script type="math/tex">v_i \neq v_j</script> implies:</p> <p>\begin{equation} P(v_k \notin \mathcal{N}(v_i) \land v_k \notin \mathcal{N}(v_j)) = P(v_k \notin \mathcal{N}(v_i)) \cdot P(v_k \notin \mathcal{N}(v_j)) = \frac{1}{4} \end{equation}</p> <p>and by induction:</p> <p>\begin{equation} P(v_k \notin \mathcal{N}(v_{1}) \land … \land v_k \notin \mathcal{N}(v_{n})) = \frac{1}{2^n} \end{equation}</p> <p>It follows that if there is a chain of distinct nodes <script type="math/tex">\overline{v_1 ... 
v_n}</script> we can say that:</p> <p>\begin{equation} P(d(v_1,v_k) \leq n) = 1- \frac{1}{2^n} \end{equation}</p> <h2 id="almost-all-simple-graphs-are-very-small-world-networks">Almost all simple graphs are very small world networks:</h2> <h3 id="a-chain-of-distinct-nodes-v_i_i1log_2log_2-n-exists-almost-surely">A chain of distinct nodes <script type="math/tex">\{v_i\}_{i=1}^{\log_2\log_2 N}</script> exists almost surely:</h3> <p>The probability that there exists a chain of nodes of length <script type="math/tex">\log_2\log_2 N</script>:</p> <p>\begin{equation} \overline{v_1 … v_{\log_2\log_2 N}} \end{equation}</p> <p>such that <script type="math/tex">v_i = v_j \iff i=j</script> is given by:</p> <p>\begin{equation} P(\overline{v_1 … v_{\log_2\log_2 N}} \in G_N) = \prod_{k=1}^{\log_2\log_2 N} \big(1-\frac{1}{2^{N-k}} \big) \geq \big(1- \frac{\log_2 N}{2^N}\big)^{\log_2\log_2 N} \end{equation}</p> <p>and we note that:</p> <p>\begin{equation} \lim\limits_{N \to \infty} \big(1- \frac{\log_2 N}{2^N}\big)^{\log_2\log_2 N} = 1 \end{equation}</p> <p>This limit guarantees the existence of a chain of distinct nodes of length <script type="math/tex">\log_2\log_2 N</script> originating from any <script type="math/tex">v_i \in G_N</script> almost surely.</p> <h3 id="given-that-overlinev_1--v_log_2log_2-n-exists-almost-surely-we-may-deduce-that-forall-i-in-1log_2log_2-n-dv_iv_k-leq-log_2log_2-n-almost-surely">Given that <script type="math/tex">\overline{v_1 ... v_{\log_2\log_2 N}}</script> exists almost surely we may deduce that <script type="math/tex">\forall i \in [1,\log_2\log_2 N], d(v_i,v_k) \leq \log_2\log_2 N</script> almost surely:</h3> <p>If <script type="math/tex">\overline{v_1 ... 
v_{\log_2\log_2 N}}</script> exists we have:</p> <p>\begin{equation} \forall \{v_i\}_{i=1}^n, v_k \in G_N, P(d(v_1,v_k) \leq \log_2\log_2 N) = 1 - \frac{1}{2^{\log_2\log_2 N}} = 1 - \frac{1}{\log_2 N} \end{equation}</p> <p>and so we have:</p> <p>\begin{equation} \lim\limits_{N \to \infty} \forall \{v_i\}_{i=1}^n, v_k \in G_N, P(d(v_1,v_k) \leq \log_2\log_2 N) = 1 \end{equation}</p> <h2 id="discussion">Discussion:</h2> <p>I must say that initially I found the above result quite surprising and I think it partially explains why small world networks frequently occur in nature. Granted, in natural settings the graph is typically embedded in some kind of Euclidean space so in addition to the degrees of separation we must consider the Euclidean distance. But, I suspect that in real-world networks with small world effects the Euclidean distance plays a marginal role.</p> <p>In particular, I believe that wherever small-world networks prevail the Euclidean distance is dominated by ergodic dynamics between nodes. There is probably some kind of stochastic communication process between the nodes.</p>Aidan RockeIntroduction:Fractional Cartesian Products2019-07-08T00:00:00+00:002019-07-08T00:00:00+00:00/set/theory/2019/07/08/fractional_cartesian<h2 id="introduction">Introduction:</h2> <p>Recently, I wondered whether we could define hypercubes with non-integer dimension. It occurred to me that this would require a generalisation of the usual Cartesian Product to fractional dimensions.</p> <p>A few Google searches indicated that previous work [1], [2] has been done on this subject by Ron C. Blei. However, I usually try to develop my own ideas first as this sometimes yields a perspective that is particularly insightful. 
For this problem I decided to start by considering hypercube volumes.</p> <h2 id="hypercube-volumes">Hypercube volumes:</h2> <p>If the volume of a regular hypercube with integer dimension is given by:</p> <p>\begin{equation} \forall n \in \mathbb{N}, \text{Vol}([-1,1]^n) = 2^n \end{equation}</p> <p>then I think we may define the volume of hypercubes with non-integer dimension as follows:</p> <p>\begin{equation} \forall x \in \mathbb{R}_+ \setminus \mathbb{N}, \text{Vol}([-1,1]^x) = 2^x \end{equation}</p> <p>but the challenge is how should we define <script type="math/tex">[-1,1]^x</script> analytically so that this hypercube reduces to the usual hypercube when <script type="math/tex">x \in \mathbb{N}</script>. I think this requires a suitable representation of the Cartesian Product.</p> <p>One idea that occurred to me was to represent Cartesian Products as multipartite graphs.</p> <h2 id="references">References:</h2> <ol> <li>Ron C Blei. Fractional cartesian products of sets. 1979.</li> <li>Ron Blei, Fuchang Gao. Combinatorial dimension in fractional Cartesian products. 2005.</li> </ol>Aidan RockeIntroduction:The number of ways to partition a graph2019-07-05T00:00:00+00:002019-07-05T00:00:00+00:00/graph/theory/2019/07/05/graph-partition<h2 id="introduction">Introduction:</h2> <p>Let’s suppose we have a graph with <script type="math/tex">N</script> vertices. How many ways can these vertices be wired to each other assuming that these vertices are distinct and each vertex <script type="math/tex">v_i</script> may be connected to at most <script type="math/tex">N-1</script> distinct vertices? Alternately, let’s consider the set <script type="math/tex">G_N</script> of simple graphs with <script type="math/tex">N</script> vertices. This set may correspond to the set of potential social networks among a community of <script type="math/tex">N</script> individuals.</p> <p>What is the cardinality of <script type="math/tex">G_N</script>? 
We may show that:</p> <p>\begin{equation} \lvert G_N \rvert = \sum_{k=0}^{N \choose 2} { {N \choose 2} \choose k} = 2^{N \choose 2} \end{equation}</p> <p>and we may note that <script type="math/tex">\lvert G_N \rvert</script> very quickly becomes astronomical:</p> <p>\begin{equation} \forall N &gt; 50, \lvert G_N \rvert &gt; 10^{368} \end{equation}</p> <p>which is many times greater than the number of atoms in the universe.</p> <h2 id="a-few-observations">A few observations:</h2> <h3 id="lvert-g_n-rvert-grows-more-than-exponentially-fast"><script type="math/tex">\lvert G_N \rvert</script> grows more than exponentially fast:</h3> <p>It’s worth noting that <script type="math/tex">\lvert G_N \rvert</script> grows more than exponentially fast as a function of <script type="math/tex">N</script> since:</p> <p>\begin{equation} \frac{\lvert G_{N+1} \rvert}{\lvert G_{N} \rvert} = 2^N \end{equation}</p> <p>so we have:</p> <p>\begin{equation} \lvert G_{N+1} \rvert = 2^N \cdot \lvert G_{N} \rvert \end{equation}</p> <p>and this means that whenever we add a vertex <script type="math/tex">\widehat{v_{N+1}}</script> to a network with <script type="math/tex">N</script> vertices the number of possible networks grows by a factor of <script type="math/tex">2^N</script>. The reason for this is that when a new vertex <script type="math/tex">\widehat{v_{N+1}}</script> is added to a graph with <script type="math/tex">N</script> vertices there are <script type="math/tex">N</script> possible new edges between <script type="math/tex">\widehat{v_{N+1}}</script> and the existing set of vertices.</p> <p>Another way to think about (3) is that given a graph with <script type="math/tex">N</script> vertices an additional vertex <script type="math/tex">\widehat{v_{N+1}}</script> adds <script type="math/tex">N</script> bits of information. 
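Both the recurrence and the bound for <script type="math/tex">N > 50</script> can be verified with exact integer arithmetic (a quick sketch):

```python
from math import comb

def num_graphs(n):
    """Number of simple labelled graphs on n vertices: 2^(n choose 2)."""
    return 2 ** comb(n, 2)

# The recurrence |G_{N+1}| = 2^N * |G_N|: a new vertex contributes N possible edges.
assert all(num_graphs(n + 1) == 2 ** n * num_graphs(n) for n in range(1, 30))

# For N = 51 the count is 2^1275, a 384-digit number, so |G_N| > 10^368 for N > 50.
print(len(str(num_graphs(51))))  # 384
```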
Between any two vertices we have either a connection or we don’t so:</p> <p>\begin{equation} \log_2(\lvert G_{N+1} \rvert) - \log_2(\lvert G_{N} \rvert) = N \end{equation}</p> <h3 id="probabilistic-analysis">Probabilistic analysis:</h3> <p>Now, let’s consider the probability of a connection between a random pair of vertices <script type="math/tex">(v_i,v_j)</script> in a graph <script type="math/tex">\Gamma_N</script> sampled uniformly from <script type="math/tex">G_N</script>:</p> <p>\begin{equation} P(\overline{v_iv_j} \in \Gamma_N) = \frac{ {N \choose 2} }{2^{N \choose 2}} \end{equation}</p> <p>and this probability goes down exponentially quickly since:</p> <p>\begin{equation} \frac{P(\overline{v_iv_j} \in \Gamma_{N+1})}{P(\overline{v_lv_k} \in \Gamma_N)} = \frac{(N+1) \cdot 2^{-N}}{N-1} \approx 2^{-N} \end{equation}</p> <p>Within the context of social networks, if we suppose that a connection between any pair of individuals occurs with probability half then the probability of a connection between a randomly chosen pair of individuals drops off to zero exponentially fast as the size of the network, i.e. number of individuals, grows.</p> <p>We can make one more relatively simple observation that is also useful. Given the symmetry of binomial coefficients, the probability that a randomly chosen graph <script type="math/tex">\Gamma_N \sim G_N</script> has more than half the maximum number of edges, <script type="math/tex">{N \choose 2}</script>, is also <script type="math/tex">\frac{1}{2}</script>. 
This doesn’t contradict the last observation since the number of possible edges grows quadratically <script type="math/tex">\sim \frac{N^2}{2}</script> while the number of possible vertices grows linearly <script type="math/tex">\sim N</script>.</p> <h2 id="discussion">Discussion:</h2> <p>This analysis actually preceded my last article on <a href="https://keplerlounge.com/math/2019/07/02/connected-graphs.html">simple graphs that are connected</a> and due to the rapid growth of <script type="math/tex">\lvert G_N \rvert</script> we may ask what does a typical graph look like.</p> <p>We know that almost all simple graphs are connected but generally speaking what are the properties of almost all simple graphs?</p>Aidan RockeIntroduction:Almost all simple graphs are connected2019-07-02T00:00:00+00:002019-07-02T00:00:00+00:00/math/2019/07/02/connected-graphs<h2 id="introduction">Introduction:</h2> <p>Recently, I wondered whether given the set of graphs with <script type="math/tex">N</script> distinguishable vertices, <script type="math/tex">G_N</script>, whether most of these graphs might be connected. This set may correspond to the state space of a biological network whose connectivity varies over time such as a brain. The relevance of connectivity here is that it guarantees a path between different fundamental nodes.</p> <p>Initially, I thought we might need to derive the asymptotic formula for the number of connected graphs with <script type="math/tex">N</script> vertices. It turns out that there’s a much simpler approach using the Erdős–Rényi random graph model. I realised this after discussing a related question with <a href="https://mathoverflow.net/questions/334936/asymptotic-formula-for-the-number-of-connected-graphs">mathematicians on the MathOverflow</a>.</p> <h2 id="demonstration">Demonstration:</h2> <p><a href="https://www.math.u-psud.fr/~fouquet/">Olivier Fouquet</a> and lambda made very helpful remarks regarding the connection with random graphs. 
In particular, I would like to point out lambda’s remark that:</p> <blockquote> <p>…the Erdős–Rényi random graph model with edge probability 1/2 gives the uniform distribution on labelled graphs</p> </blockquote> <p>Using this insight we may proceed as follows:</p> <p>Let’s first note that the Erdős–Rényi random graph model with edge probability 1/2 gives the uniform distribution on labelled graphs since each pair of vertices is either joined by an edge or not. It follows that given a graph with <script type="math/tex">N</script> vertices the probability that every subset of <script type="math/tex">k</script> vertices, <script type="math/tex">V \subset \{v_i\}_{i=1}^N</script> with <script type="math/tex">\lvert V \rvert=k</script>, is joined to some common vertex <script type="math/tex">v_l \notin V</script> is at least:</p> <p>\begin{equation} 1 - {N \choose k}\big(1-\frac{1}{2^k} \big)^{N-k} \end{equation}</p> <p>Now, we would like to show that:</p> <p>\begin{equation} \lim\limits_{N \to \infty}{N \choose k}\big(1-\frac{1}{2^k} \big)^{N-k}=0 \end{equation}</p> <p>Let’s first note that:</p> <p>\begin{equation} {N \choose k}=\frac{N!}{k!(N-k)!} \leq N^k \end{equation}</p> <p>\begin{equation} \big(1-\frac{1}{2^k} \big)^{N-k} \propto \big(1-\frac{1}{2^k} \big)^N \sim e^{-\frac{N}{2^k}} \end{equation}</p> <p>and taking logarithms we find that for fixed <script type="math/tex">k \in \mathbb{N}</script>:</p> <p>\begin{equation} \ln \big( N^k e^{-\frac{N}{2^k}} \big) = k \ln N - \frac{N}{2^k} \to -\infty \end{equation}</p> <p>since <script type="math/tex">\frac{\ln N}{N} \to 0</script>. It follows that <script type="math/tex">{N \choose k}\big(1-\frac{1}{2^k}\big)^{N-k} \to 0</script>, so we may conclude that a simple graph is connected with probability 1 as <script type="math/tex">N \to \infty</script>.</p> <p>As a corollary we may deduce that for large <script type="math/tex">N</script> the number of connected graphs <script type="math/tex">K_N</script> is given by:</p> <p>\begin{equation} \lvert K_N \rvert \sim 2^{N \choose 2} \end{equation}</p> <h2 id="discussion">Discussion:</h2> <p>I’m quite satisfied with this demonstration using probabilistic arguments as it’s a lot simpler than 
the approach proposed in [1] and [2]. However, I must say that [1] and [2] contain interesting insights and methods that I haven’t seen before. For this reason both of these publications are on my reading list.</p> <h1 id="references">References:</h1> <ol> <li>E. Bender, E. Canfield &amp; B. McKay. The Asymptotic Number of Labeled Connected Graphs with a Given Number of Vertices and Edges. 1990.</li> <li>Example II.15 in Flajolet and Sedgewick, Analytic Combinatorics. 2009.</li> </ol>Aidan RockeIntroduction:A constructive proof of the Vitali Covering Lemma2019-06-14T00:00:00+00:002019-06-14T00:00:00+00:00/real/analysis/2019/06/14/vitali<h2 id="theorem">Theorem:</h2> <p>Let <script type="math/tex">\{B_i\}_{i=1}^n</script> be a finite collection of balls in <script type="math/tex">\mathbb{R}^d</script>. Then there exists a sub-collection of balls <script type="math/tex">\{B_{j_i}\}_{i=1}^m</script> that are disjoint and satisfy:</p> <p>\begin{equation} \bigcup_{i=1}^n B_i \subseteq \bigcup_{i=1}^m 3 \cdot B_{j_i} \end{equation}</p> <h2 id="demonstration">Demonstration:</h2> <p>Let’s define the collection:</p> <p>\begin{equation} \mathcal{B_1} := \{ B_i \}_{i=1}^n \end{equation}</p> <p>such that we re-index <script type="math/tex">B_i</script> so we have:</p> <p>\begin{equation} \mathcal{B_{1,1}} := B_1 \end{equation}</p> <p>\begin{equation} Vol(\mathcal{B_{1,i}}) \geq Vol(\mathcal{B_{1,i+1}}) \end{equation}</p> <p>and given <script type="math/tex">\mathcal{B_1}</script> we may define:</p> <p>\begin{equation} C_1 = \{\mathcal{B_{1,j}}: \mathcal{B_{1,j}} \cap \mathcal{B_{1,1}} \neq \emptyset \} \end{equation}</p> <p>so we have:</p> <p>\begin{equation} \bigcup C_1 \subseteq 3 \cdot \mathcal{B_{1,1}} \end{equation}</p> <p>Now, using <script type="math/tex">\mathcal{B_i}</script> and <script type="math/tex">C_i</script> we may construct the following:</p> <p>\begin{equation} \mathcal{B_{i+1}} = \mathcal{B_i} \setminus C_i \end{equation}</p> <p>\begin{equation} \lvert \mathcal{B_{i+1}} \rvert &lt; \lvert 
\mathcal{B_{i}} \rvert \end{equation}</p> <p>\begin{equation} Vol(\mathcal{B_{i,j}}) &gt; Vol(\mathcal{B_{i,j+1}}) \end{equation}</p> <p>and by induction we have:</p> <p>\begin{equation} \bigcup_{i=1}^n B_i \subseteq \bigcup_{j=1}^m 3 \cdot \mathcal{B_{j,1}} \end{equation}</p> <p>where <script type="math/tex">m</script> is the smallest integer such that <script type="math/tex">\lvert \mathcal{B_{m+1}} \rvert = 0</script>.</p>Aidan RockeTheorem:False Dichotomies2019-05-21T00:00:00+00:002019-05-21T00:00:00+00:00/logic/2019/05/21/false-dichotomies<p>My philosophy of science, if I have one, can be summarised by the principle that we should ensure that our intellectual constructs aren’t merely diversions. One way I apply this principle is by trying to work out my own solution to problems before reading the accepted scientific solution. If the previous approach isn’t applicable, I try and determine empirically and/or analytically whether we are trying to fit circular pegs into square holes.</p> <p>This is frequently the case with dichotomies, which are very often mere figments of your imagination, and to illustrate my point I shall provide a few examples.</p> <ol> <li> <p>All organisms are either terrestrial or not terrestrial.</p> <p>What about amphibians?</p> </li> <li> <p>Humans have free will or they don’t have free will.</p> <p>Here we are assuming that ‘free will’ is a scientifically useful notion although it is grounded in introspection and not empirical observation. We can define the spatial freedom of a Newtonian particle in some sense but the meaning of ‘free will’ is sufficiently flexible to survive any experimental test.</p> <p>Free will is a metaphysical idea and therefore outside the domain of science.</p> </li> <li> <p>Benjamin is a good person or he isn’t a good person.</p> <p>Most people are complicated characters that don’t fit into simplistic Disney categories. Benjamin might be a great scientist but a nasty football player on weekends. 
If you ask his mates that play football they’ll tell you that he’s an ass and if you ask his scientific colleagues they’ll tell you that he’s a great person.</p> <p>Which account is true? On one level it depends who you ask. On another level, the ‘good person’ category is much too simplistic to describe people.</p> </li> <li> <p>Anne is conscious or not conscious.</p> <p>This intellectual construct is similar to ‘free will’ in the sense that it isn’t something we can observe empirically. We can have a vague notion of an internal mental model of ourself in our environment but so does a fish or a rat. So how does consciousness set us apart from any organism capable of adapting to its environment?</p> <p>In fairness to human knowledge, consciousness and free will are part of a pre-scientific and anthropocentric view of the Universe.</p> </li> <li> <p>An elephant is either less than 100 m long or more than 100 m long.</p> <p>In this case we are trying to ascribe a length to an object that has three dimensions so there isn’t a unique method for measuring an elephant. In fact, there is an infinite number of ways to measure the length of an object with more than one dimension.</p> <p>It follows that in this circumstance, like the others, our intellectual construct is merely a diversion.</p> </li> </ol> <p>The reader might wonder what stimulated this reflection. Well, a couple weeks ago I reflected upon Luitzen Brouwer’s criticism of the law of excluded middle. This law basically states that for any proposition, either that proposition is true or its negation is true. Intuitively, it makes sense but I provided five concrete examples where the law isn’t applicable.</p> <p>Once in a while it’s useful to reconsider the things we take for granted.</p>Aidan RockeMy philosophy of science, if I have one, can be summarised by the principle that we should ensure that our intellectual constructs aren’t merely diversions. 
One way I apply this principle is by trying to work out my own solution to problems before reading the accepted scientific solution. If the previous approach isn’t applicable, I try and determine empirically and/or analytically whether we are trying to fit circular pegs into square holes.What if we could simulate the human brain?2019-05-14T00:00:00+00:002019-05-14T00:00:00+00:00/neuroscience/2019/05/14/brains<center><img src="https://raw.githubusercontent.com/Kepler-Lounge/Kepler-Lounge.github.io/master/_images/frog.jpg" width="75%" height="75%" align="middle" /></center> <center>Relative to what is really going on in the Universe we might as well be frogs. </center> <h2 id="introduction">Introduction:</h2> <p>As an increasing amount of money is being allocated to computational neuroscience I sometimes wonder whether neuroscience theory will manage to keep up with progress in neuroscience simulations. The recent progress in deep learning, an example of <a href="https://t.co/4Pp4MkZPtu">neuromorphic computation</a>, appears to suggest otherwise. Basically, we are in a situation where highly-nonlinear connectionist models work quite well and we have no idea why. 
I must add that compared to the human brain these connectionist models are quite simple.</p> <p>In this context, I gathered five questions which I would personally like to have answered in a world where we have access to increasingly realistic and embodied (in VR perhaps) human brain simulations.</p> <p>Caveat: When scientists say ‘what you can’t build you can’t understand’ they mean that a simulation is necessary but they don’t imply that it’s sufficient.</p> <h2 id="what-if-we-could-simulate-the-human-brain">What if we could simulate the human brain?:</h2> <ol> <li> <p>Would our understanding of the brain validate the good regulator theorem?</p> <p>This theorem which dates back to a paper by Conant &amp; Ashby [1] states that:</p> <blockquote> <p>Every good regulator of a system must be a model of that system.</p> </blockquote> <p>In simple English this theorem is saying that because the organism and its environment form a coupled dynamical system they must in some sense form mirror images of each other. I must add that providing a sound mathematical basis for this theorem remains an <a href="https://mathoverflow.net/questions/327012/rigorous-proof-of-the-good-regulator-theorem">open problem</a>. The reader might also be interested in the recent review by Daniel McNamee and Daniel Wolpert [2].</p> <p>I think this is a very important problem as it directly links brains to behaviour and behaviour with respect to the environment in particular. In fact, I would argue that it provides a sensible path to a non-anthropomorphic definition of intelligent behaviour that potentially applies to all organisms.</p> </li> <li> <p>Would we understand how uncertainty is represented and computed in the human brain?</p> <p>In very complex environments that may or may not be deterministic, the epistemic and statistical uncertainty of the organism implies a probabilistic knowledge representation. 
It follows that from an algorithmic perspective, intelligent reasoning comes down to having good models and algorithms for uncertainty representation and computation.</p> <p>I have organised a <a href="https://github.com/Kepler-Lounge/Uncertainty_in_the_brain">list of papers on this subject</a> though I think this would also require answering <a href="https://psychology.stackexchange.com/questions/23248/what-sources-of-randomness-does-the-brain-use-for-sampling">an essential sub-question</a>:</p> <blockquote> <p>What sources of randomness does the brain use for sampling?</p> </blockquote> <p>This sub-question remains an open problem.</p> </li> <li> <p>Would we have a detailed understanding of the wiring optimisation problem and how it relates to neurogenesis?</p> <p>Another fundamental problem that interests me and that I would easily rank among the 23 most important unsolved problems in neuroscience is the wiring optimisation problem, which dates back to S. Ramón y Cajal who postulated that brains are arranged to minimise wire length. I must add that this problem can be approached from different perspectives and each perspective is essentially a different formulation of the wiring optimisation problem.</p> <p>In [9], Dmitri Chklovskii and Charles Stevens formulate this problem as follows:</p> <blockquote> <p>Wiring a brain presents formidable problems because of the extremely large number of connections: a microliter of cortex contains approximately 10^5 neurons, 10^9 synapses, and 4 km of axons, with 60% of the cortical volume being taken up with “wire”, half of this by axons and the other half by dendrites. [1] Each cortical neighborhood must have exactly the right balance of components; if too many cell bodies were present in a particular mm cube, for example, insufficient space would remain for the axons, dendrites and synapses. 
Here we ask “What fraction of the cortical volume should be wires (axons + dendrites)?”</p> </blockquote> <p>I am motivated by its potential impact on different areas of neuroscience. In particular, the areas of developmental neuroscience, network neuroscience and biophysics (i.e. the energetic constraints on information processing in human brains).</p> </li> <li> <p>Would we understand what makes the brain energy-efficient and use this understanding to build neuromorphic computers and advanced neural prostheses?</p> <p>Although Google DeepMind recently accomplished an amazing feat by building AlphaGo Zero which could defeat the world’s best human Go players, an even more amazing fact is that the human brain uses ~20 watts compared to the 200 kilowatts used by 5000 TPUs to power AlphaGo Zero during training [24]. In other words, AlphaGo Zero was ten thousand times less energy efficient than a human being for a comparable result. How is this possible?</p> <p>At present neuroscientists and computer scientists still have very little idea but I think that understanding neurogenesis and wiring trade-offs in the human brain shall be key. I also think that it will require building models of computation that relate computational complexity to thermodynamic costs of computation.</p> <p>Once we have a good theory of biologically-plausible and energy-efficient computer architecture I expect that we shall have a revolution in neuromorphic computing which shall lead to advanced neural prostheses. For more information, I highly recommend the <a href="https://cacm.acm.org/magazines/2019/4/235577-neural-algorithms-and-computing-beyond-moores-law/fulltext">review of Brad Aimone on ‘Neural Algorithms and Computing Beyond Moore’s Law’</a>.</p> </li> <li> <p>Will we automatically derive a theory of collective intelligence (e.g. economics, social networks, etc.)?</p> <p>My understanding of the history of statistical mechanics suggests otherwise but this leads me to a related question. 
Is there a coarse-grained model for every collective intelligence model that isn’t coarse? In other words, I suspect that due to small-world phenomena we will be able to approximate the actual model up to epsilon accuracy while giving up significant amounts of information.</p> </li> </ol> <h2 id="open-ended-discussion">Open-ended discussion:</h2> <p>I don’t know how many of these questions we will be able to answer in the next thirty years. But, my hope is that I will be able to work on all of them within the next 15 years. Meanwhile, I look forward to hearing from other scientists working on behaviour, cognition, and/or neuroscience.</p> <p>Most of the problems posed above are listed as <a href="https://github.com/Kepler-Lounge/theoretical_neuroscience/issues">issues on Github</a>. The reader is welcome to open an issue and add a different problem provided that they link to a question asked on a stackexchange site. Reasonable candidates include:</p> <ol> <li><a href="https://psychology.stackexchange.com/">The Psychology and Neuroscience stackexchange</a>: for the formulation of problems in Neuroscience and/or Cognitive science</li> <li><a href="https://biology.stackexchange.com/">The Biology stackexchange</a>: for questions concerning biology including biophysics</li> <li><a href="https://stats.stackexchange.com/">The CrossValidated stackexchange</a>: for statistical questions that might concern Markov models and/or machine learning</li> <li><a href="https://cstheory.stackexchange.com/">The Theoretical Computer Science stackexchange</a>: for algorithmic analysis of computational models</li> <li><a href="https://mathoverflow.net/">The MathOverflow</a>: for mathematical insights into computational models</li> </ol> <p>My rationale is that this would allow other scientists to participate in gathering intelligence on the problem, i.e. 
references, as well as consider different formulations of the problem, potentially relevant open-source software and open-access datasets.</p> <h1 id="references">References:</h1> <ol> <li>Roger C. Conant and W. Ross Ashby, Every good regulator of a system must be a model of that system, International Journal of Systems Science 1 (1970), 89–97.</li> <li>D. McNamee and D. Wolpert. Internal Models in Biological Control. Annual Review of Control, Robotics, and Autonomous Systems. 2019.</li> <li>U. Maoz et al. Noise and the two-thirds power law. 2006.</li> <li>M. Richardson &amp; T. Flash. Comparing Smooth Arm Movements with the Two-Thirds Power Law and the Related Segmented-Control Hypothesis. 2002.</li> <li>Wei Ji Ma, J. Beck, P. Latham &amp; A. Pouget. Bayesian inference with probabilistic population codes. Nature Neuroscience. 2006.</li> <li>Andre Longtin. Neuronal noise. Scholarpedia. 2013.</li> <li>D. Dold et al. Stochasticity from function - why the Bayesian brain may need no noise. Arxiv. 2018.</li> <li>R. Cannon, C. O’Donnell, M. Nolan. Stochastic Ion Channel Gating in Dendritic Neurons: Morphology Dependence and Probabilistic Synaptic Activation of Dendritic Spikes. PLOS. 2010.</li> <li>D. Chklovskii, C. Stevens. Wiring optimization in the brain. NIPS. 2000.</li> <li>D. Van Essen. A tension-based theory of morphogenesis and compact wiring in the nervous system. Nature. 1997.</li> <li>G. Shepherd, A. Stepanyants, I. Bureau, D. Chklovskii and K. Svoboda. Geometric and functional organization of cortical circuits. Nature Neuroscience. 2005.</li> <li>M. Kaiser &amp; C. Hilgetag. Nonoptimal Component Placement, but Short Processing Paths, due to Long-Distance Projections in Neural Systems. PLOS. 2006.</li> <li>A. Stepanyants, L. Martinez, A. Ferecskó, and Z. Kisvárday. The fractions of short- and long-range connections in the visual cortex. PNAS. 2008.</li> <li>Q. Wen, A. Stepanyants, G. Elston, A. Grosberg, and D. Chklovskii. 
Maximization of the connectivity repertoire as a statistical principle governing the shapes of dendritic arbors. PNAS. 2009.</li> <li>C. Cherniak. Neural Wiring Optimization. 2011.</li> <li>E. Bullmore, O. Sporns. The economy of brain network organization. Nat Rev Neurosci. 2012.</li> <li>M. Hofman. Evolution of the human brain: when bigger is better. Frontiers in Neuroanatomy. 2014.</li> <li>A. Gushchin, A. Tang. Total Wiring Length Minimization of C. elegans Neural Network: A Constrained Optimization Approach. PLOS. 2015.</li> <li>J. Niven. Neuronal energy consumption: biophysics, efficiency and evolution. 2016.</li> <li>I. Wang &amp; T. Clandinin. The Influence of Wiring Economy on Nervous System Evolution. Current Biology. 2016.</li> <li>S. Srinivasan, C. Stevens. Scaling principles of distributed circuits. biorxiv. 2018.</li> <li>J. Stiso &amp; D. Bassett. Spatial Embedding Imposes Constraints on the Network Architectures of Neural Systems. Arxiv. 2018.</li> <li>D. Silver et al. Mastering the game of Go without human knowledge. 2017.</li> <li>Aidan Rocke. The true cost of AlphaGo Zero. Kepler Lounge. 2019.</li> </ol>Aidan RockeRelative to what is really going on in the Universe we might as well be frogs.Mimesis as random graph coloring, Part I2019-04-29T00:00:00+00:002019-04-29T00:00:00+00:00/self-organisation/2019/04/29/mimetic-I<h2 id="motivation">Motivation:</h2> <p>While reading ‘The physics of brain network structure, function and control’ by Chris Lynn and Dani Bassett I learned that ‘Statistical mechanics of complex networks’ by Réka Albert &amp; Albert-László Barabási was an essential reference [1,2]. But, I didn’t know much graph theory and even less about random graphs, which play a central role in that reference. 
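For a concrete feel for the objects involved, here is a minimal Python sketch (an illustration added for intuition; the code in this post itself is in Julia) that samples an Erdös-Rényi random graph G(n, p) as an adjacency matrix and checks that the mean degree concentrates around p(n-1):

```python
import random

def erdos_renyi(n, p, seed=0):
    """Sample a G(n, p) random graph as a symmetric 0-1 adjacency matrix."""
    rng = random.Random(seed)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # each of the n*(n-1)/2 possible edges appears independently w.p. p
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A

A = erdos_renyi(200, 0.5)
mean_degree = sum(map(sum, A)) / 200
print(round(mean_degree, 1))  # concentrates near p*(n-1) = 99.5
```

With n = 200 and p = 1/2 the empirical mean degree lands within a few edges of p(n-1) = 99.5; this kind of concentration is what makes averaged analyses of random graphs tractable.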
In order to develop an intuition about random graphs I decided to map them onto a phenomenon I observed on a daily basis at all levels of society: mimetic behavior.</p> <p>Here, I propose a simple and tractable mechanism for mimetic desire. When we change our beliefs, we do so not because of their intrinsic value. Our desire to switch from belief <script type="math/tex">A</script> to belief <script type="math/tex">B</script> is proportional to the number of adherents of belief <script type="math/tex">B</script> that we know. Technically, I modeled the problem of two conflicting beliefs that propagate through a network with <script type="math/tex">N</script> nodes in a decentralised manner.</p> <p>Using vertex notation, two individuals <script type="math/tex">v_i</script> and <script type="math/tex">v_j</script> with identical beliefs are connected with probability <script type="math/tex">q</script>, and with probability <script type="math/tex">1-q</script> otherwise. <script type="math/tex">v_i</script> changes its belief with a probability proportional to the number of nodes connected to <script type="math/tex">v_i</script> that have opposing views. Two key motivating questions are:</p> <ol> <li>Under what circumstances does a belief get completely wiped out?</li> <li>Under what circumstances does a belief completely dominate (i.e. 
wipe out) all other beliefs?</li> </ol> <p>In the scenario where there are only two possible beliefs these two questions are equivalent, and I show that on average it’s sufficient that <script type="math/tex">q > 1-q</script> and that, initially, one belief has a greater number of adherents than the other.</p> <h2 id="representation-of-the-problem">Representation of the problem:</h2> <h3 id="virtual-weights-as-a-representation-of-potential-connections">Virtual weights as a representation of potential connections:</h3> <p>Nodes carrying the first belief were assigned to the set of red vertices, <script type="math/tex">R</script>, and nodes carrying the second belief were assigned to the set of blue vertices, <script type="math/tex">B</script>. However, I wasn’t satisfied with this representation.</p> <p>After some reflection, I chose <script type="math/tex">+1</script> and <script type="math/tex">-1</script> as labels. The reason is that, with this representation, a change of belief is equivalent to multiplication by <script type="math/tex">-1</script>. 
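Before formalising this, the mechanism can be sketched end-to-end. The following toy simulation (a simplified Python sketch of the rule described above, added for illustration; it is not the Julia implementation given later in this post) resamples connections at each step — same-belief pairs with probability q, opposing pairs with probability 1 - q — and flips each node with probability equal to the fraction of its sampled neighbours that disagree:

```python
import random

def mimesis_step(v, q, rng):
    """One update of the two-belief dynamic on nodes with beliefs +/-1.

    Same-belief pairs are connected with probability q, opposing pairs
    with probability 1 - q; each node then flips its belief with
    probability equal to the fraction of its neighbours that disagree."""
    n = len(v)
    new_v = list(v)
    for i in range(n):
        same = opposed = 0
        for j in range(n):
            if j == i:
                continue  # no self-connections
            if v[i] == v[j] and rng.random() < q:
                same += 1
            elif v[i] != v[j] and rng.random() < 1 - q:
                opposed += 1
        total = same + opposed
        if total > 0 and rng.random() < opposed / total:
            new_v[i] = -v[i]  # a change of belief is multiplication by -1
    return new_v

rng = random.Random(1)
v = [1] * 35 + [-1] * 15  # 70% red (+1), 30% blue (-1)
for _ in range(100):
    v = mimesis_step(v, 0.9, rng)
print(sum(x > 0 for x in v) / len(v))  # fraction of red nodes after 100 steps
```

With q = 0.9 and an initial red majority, the red fraction typically climbs to 1, in line with the claim that q > 1 - q plus an initial majority suffices on average.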
As a result, the <script type="math/tex">N</script> vertices could be represented by an N-dimensional vector:</p> <p>\begin{equation} \vec{v} \in \{-1,1\}^N \end{equation}</p> <p>where <script type="math/tex">N= \lvert R \rvert + \lvert B \rvert</script>.</p> <p>Using this representation, between each pair of vertices we may define a virtual weight matrix <script type="math/tex">W</script>:</p> <p>\begin{equation} w_{ij} = v_i \cdot v_j \end{equation}</p> <p>where <script type="math/tex">w_{ij}=+1</script> implies identical beliefs and we have <script type="math/tex">w_{ij}=-1</script> otherwise.</p> <p>Now, we note that <script type="math/tex">W</script> may be conveniently decomposed as follows:</p> <p>\begin{equation} W= W^+ + W^- \end{equation}</p> <p>where <script type="math/tex">W^-</script> denotes potential connections between nodes of different colors and <script type="math/tex">W^+</script> denotes potential connections between nodes of identical colors.</p> <h3 id="modelling-the-adjacency-matrix-as-a-combination-of-random-matrices">Modelling the adjacency matrix as a combination of random matrices:</h3> <p>In order to simulate variations in connectivity we may assume that nodes of the same color are connected with probability:</p> <p>\begin{equation} \frac{1}{2} &lt; q &lt; 1 \end{equation}</p> <p>and nodes of different colors are connected with probability <script type="math/tex">1-q</script>.</p> <p>Given <script type="math/tex">W</script> we may therefore construct the adjacency matrix <script type="math/tex">A</script> by sampling random matrices:</p> <p>\begin{equation} M_1, M_2 \sim \mathcal{U}([0,1])^{N \times N} \end{equation}</p> <p>\begin{equation} M^+ = 1_{[0,q)} \circ M_1 \end{equation}</p> <p>\begin{equation} M^- = 1_{[0,1-q)} \circ M_2 \end{equation}</p> <p>where <script type="math/tex">1_{[0,q)}</script> denotes the characteristic function of the set <script type="math/tex">[0,q)</script>, applied entrywise, so that entries of <script type="math/tex">M^-</script> equal <script type="math/tex">1</script> with probability <script type="math/tex">1-q</script>, and then we compute the Hadamard 
products:</p> <p>\begin{equation} A^+ = M^+ \circ W^+ \end{equation}</p> <p>\begin{equation} A^- = M^- \circ W^- \end{equation}</p> <p>so the adjacency matrix is given by <script type="math/tex">A = A^+ + A^-</script>.</p> <p>Now, in order to simulate belief updates we use the following probabilistic voting rule:</p> <p>\begin{equation} p(v_i^{n+1}=v_i^n) = \frac{\bar{N_i}}{N_i} \end{equation}</p> <p>\begin{equation} p(v_i^{n+1}=-1 \cdot v_i^n) = 1- \frac{\bar{N_i}}{N_i} \end{equation}</p> <p>\begin{equation} \bar{N_i} = \lvert A(i,-) &gt; 0 \rvert -1 \end{equation}</p> <p>\begin{equation} N_i = \bar{N_i} + \lvert A(i,-) &lt; 0 \rvert \end{equation}</p> <p>where <script type="math/tex">\lvert A(i,-) > 0 \rvert -1</script> denotes the number of connections between <script type="math/tex">v_i</script> and nodes sharing the same belief, without counting a connection of <script type="math/tex">v_i</script> to itself.</p> <h2 id="simulation">Simulation:</h2> <p>Putting everything together with Julia, we may create the following mimesis function which takes as input the number of nodes in the random graph <script type="math/tex">n</script>, the number of iterates of the system <script type="math/tex">N</script>, the fraction of nodes that are red <script type="math/tex">p</script>, and finally the probability <script type="math/tex">q</script> that two nodes of the same color are connected:</p> <div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>function mimesis(n::Int64, N::Int64, p::Float64, q::Float64)
    """
    n: number of nodes in the random graph
    N: number of iterations of the random graph coloring process
    p: the fraction of nodes that are red
    q: the probability that nodes of the same color form connections
    """
    ## initialisation:
    v = zeros(N, n)
    color_ratio = zeros(N)

    ## generate positive terms:
    v1 = (rand(n) .&lt; p)*1.0
    ## generate negative terms:
    v2 = ones(n) - v1

    v[1,:] = v1 - v2
    color_ratio[1] = sum((v[1,:] .&gt; 0)*1.0)/n

    for i = 1:N-1
        W = zeros(n, n)
        for j = 1:n
            for k = 1:n
                ## virtual weights; ignore self-connections:
                W[j,k] = v[i,j]*v[i,k]*(j != k)*1.0
            end
        end

        if abs(sum(v[i,:])) &lt; n
            ## split W into positive and negative parts:
            Z1 = (W .&lt; 0)*1.0 ## vertices with different colors
            Z2 = W + Z1       ## vertices with the same color

            ## sample random zero-one matrices for new connections:
            M1 = (rand(n, n) .&lt; q)*1.0
            M2 = (rand(n, n) .&lt; 1-q)*1.0

            ## use Hadamard products to construct new adjacency matrices:
            A1 = M1 .* Z2
            A2 = M2 .* Z1
            A = A1 - A2

            ## generate probabilities for color transformations:
            M_hat = [sum(A[j,:] .&gt; 0) for j = 1:n]
            M = M_hat + [sum(A[j,:] .&lt; 0) for j = 1:n]
            P = [M_hat[j]/(M[j]+1.0) for j = 1:n]

            ## zero-one vectors based on P:
            p_q = (rand(n) .&lt; P)*1.0
            Q1 = p_q .* v[i,:]
            Q2 = -1.0*(ones(n) .- p_q) .* v[i,:]

            v[i+1,:] = Q1 + Q2
            color_ratio[i+1] = sum((v[i+1,:] .&gt; 0)*1.0)/n
        else
            ## consensus has been reached:
            v[i+1,:] = v[i,1]*ones(n)
            color_ratio[i+1] = (v[i+1,1] &gt; 0)*1.0
        end
    end

    return color_ratio
end
</code></pre></div></div> <p>The above function may be called as follows, for example:</p> <div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>c_ratio = mimesis(10, 100, 0.1, 0.6);
</code></pre></div></div> <h2 id="analysis">Analysis:</h2> <h3 id="the-expected-number-of-neighbors">The expected number of neighbors:</h3> <p>If we denote the number of red vertices at instant <script type="math/tex">n</script> by <script type="math/tex">\alpha_n</script> and the number of blue vertices by <script type="math/tex">\beta_n</script> we may observe that the expected number of neighbors is given by:</p> <p>\begin{equation} \langle N(v_i \in R) \rangle = q \cdot 
(\alpha_n -1) + (1-q) \cdot \beta_n \end{equation}</p> <p>\begin{equation} \langle N(v_i \in B) \rangle = q \cdot (\beta_n -1) + (1-q) \cdot \alpha_n \end{equation}</p> <p>Using the above equations we may define:</p> <p>\begin{equation} \langle \alpha_{n+1} \rangle = \alpha_n \left( \frac{q \cdot (\alpha_n -1)}{q \cdot (\alpha_n -1) + (1-q) \cdot \beta_n} \right) + \beta_n \left(\frac{(1-q) \cdot \alpha_n}{q \cdot (\beta_n -1) + (1-q) \cdot \alpha_n} \right) \end{equation}</p> <p>and we may deduce that <script type="math/tex">\langle \beta_{n+1} \rangle = N - \langle \alpha_{n+1} \rangle</script>. It must be noted that this is an approximation that works particularly well when <script type="math/tex">% <![CDATA[ 0< q < 0.2 %]]></script> or when <script type="math/tex">% <![CDATA[ 0.8 \leq q < 1.0 %]]></script>.</p> <h3 id="alpha_n--beta_n-implies-that-limlimits_n-to-infty-langle-alpha_n-rangle--n"><script type="math/tex">\alpha_n > \beta_n</script> implies that <script type="math/tex">\lim\limits_{n \to \infty} \langle \alpha_n \rangle = N</script>:</h3> <p>Assuming that <script type="math/tex">q > 1-q</script>, a simple calculation shows that:</p> <p>\begin{equation} \langle \alpha_{n+1} \rangle - \alpha_n \geq 0 \iff \alpha_n \geq \beta_n \end{equation}</p> <p>and since:</p> <p>\begin{equation} \langle \alpha_{n+1} \rangle - \alpha_n= 0 \iff \beta_n = 0 \end{equation}</p> <p>we may deduce that:</p> <p>\begin{equation} \lim\limits_{n \to \infty} \langle \alpha_{n} \rangle = N \end{equation}</p> <h3 id="analysis-of-delta-alpha">Analysis of <script type="math/tex">\Delta \alpha</script>:</h3> <p>Using the fact that <script type="math/tex">\beta_n = N - \alpha_n</script> we may derive the following continuous-space variant of <script type="math/tex">\Delta \alpha_n = \langle \alpha_{n+1} \rangle - \alpha_n</script>:</p> <p>\begin{equation} \Delta \alpha(\alpha,q) = \frac{q \cdot (\alpha^2 -\alpha)}{2q \cdot \alpha + N \cdot (1-q) - \alpha - q} + \frac{(1-q) \cdot 
\alpha \cdot (N-\alpha)}{q \cdot N -2 \cdot q \cdot \alpha + \alpha - q} \end{equation}</p> <p>Assuming that <script type="math/tex">N</script> is a constant (ex. <script type="math/tex">N=100</script>), I obtained the following graph for various values of <script type="math/tex">q</script>:</p> <center><img src="https://raw.githubusercontent.com/Kepler-Lounge/blog_images/master/_images/evolution.png" width="75%" height="75%" align="middle" /></center> <p>It’s interesting to note that the curve doesn’t behave in a symmetric manner, which is a little surprising. Specifically, the behavior of <script type="math/tex">\Delta \alpha</script> when <script type="math/tex">q=0.2</script> doesn’t resemble the behavior of <script type="math/tex">\Delta \alpha</script> when <script type="math/tex">q=0.8</script>.</p> <h2 id="generalising-to-the-case-of-more-than-two-beliefs">Generalising to the case of more than two beliefs:</h2> <p>After going through less satisfactory options, it occurred to me that in order to label <script type="math/tex">n</script> colors we may use the nth roots of unity:</p> <p>\begin{equation} S_n = \{e^{i \frac{2 k \pi}{n}} : k \in [0,n-1] \} \end{equation}</p> <p>In fact, we may note that <script type="math/tex">-1</script> and <script type="math/tex">+1</script> correspond to <script type="math/tex">S_2</script> and similarly we may define:</p> <p>\begin{equation} w_{ij}= v_i \cdot \bar{v_j} \end{equation}</p> <p>where <script type="math/tex">\bar{v_j}</script> indicates complex conjugation and <script type="math/tex">w_{ij}=1</script> implies that <script type="math/tex">v_i</script> and <script type="math/tex">v_j</script> have the same color.</p> <p>I believe these last two equations, in addition to everything else I shared here, provide a sufficiently strong basis for modelling mimetic behavior in networks with many more than two competing belief systems.</p> <h3 id="note">Note:</h3> <p>The random graph colouring representation 
used here was developed independently of the random graph colouring conventions established by other mathematicians [4,5]. But I must credit the original authors for the idea of random graph colouring, as I probably wouldn’t have thought of using it to model mimetic behaviour had combinatorialists not discussed the idea with me in the past. For an online discussion of the differences between my representation and the conventional random graph coloring representations, I refer the reader to <a href="https://mathoverflow.net/questions/330375/comparison-of-random-graph-colouring-representations">the following MathOverflow question</a>.</p> <h2 id="references">References:</h2> <ol> <li>Chris Lynn &amp; Dani Bassett. The physics of brain network structure, function and control. 2019.</li> <li>Réka Albert &amp; Albert-László Barabási. Statistical mechanics of complex networks. 2002.</li> <li>René Girard. Le Bouc émissaire. 1986.</li> <li>P. Erdös and A. Rényi. On the evolution of random graphs. 1960.</li> <li>G. Grimmett and C. McDiarmid. On colouring random graphs. 1975.</li> </ol>Aidan RockeMotivation:Probability in High Dimension Part II2019-04-21T00:00:00+00:002019-04-21T00:00:00+00:00/probability/2019/04/21/high-dimension-prob-2<h2 id="motivation">Motivation:</h2> <p>A couple of weeks ago I was working on a problem that involved the expected value of a ratio of two random variables:</p> <p>\begin{equation} \mathbb{E}\big[\frac{X_n}{Z_n}\big] \approx \frac{\mu_{X_n}}{\mu_{Z_n}} - \frac{\mathrm{Cov}(X_n,Z_n)}{\mu_{Z_n}^2} + \frac{\mathrm{Var}(Z_n)\mu_{X_n}}{\mu_{Z_n}^3} \end{equation}</p> <p>where <script type="math/tex">Z_n</script> was defined as a sum of <script type="math/tex">n</script> i.i.d. 
random variables with a symmetric distribution centred at zero.</p> <p>Everything about this approximation worked fine in computer simulations where <script type="math/tex">n</script> was large, but mathematically there appeared to be a problem since:</p> <p>\begin{equation} \mathbb{E}\big[Z_n\big] = 0 \end{equation}</p> <p>Given that (2) didn’t appear to be an issue in simulation, I went through the code several times to check whether there was an error but found none. After thinking about the problem for a bit longer, it occurred to me to formalise the problem and analyse:</p> <p>\begin{equation} P(\sum_{i=1}^N a_i = 0) \end{equation}</p> <p>where <script type="math/tex">a_i</script> are i.i.d. random variables with a uniform distribution centred at zero so <script type="math/tex">\mathbb{E}[a_i]=0</script>. We may think of this as a measure-theoretic phenomenon in high-dimensional spaces where <script type="math/tex">N \in \mathbb{N}</script> is our dimension and <script type="math/tex">\vec{a} \in \mathbb{R}^N</script> is a random vector.</p> <p>Now, while in a <a href="https://keplerlounge.com/probability/2019/04/20/high-dimension-prob-1.html">previous article</a> I analysed this problem as an infinite series for the special case of <script type="math/tex">a_i \sim \mathcal{U}(\{-1,1\})</script>, for the more general case of <script type="math/tex">a_i \sim \mathcal{U}([-N,N])</script> where <script type="math/tex">[-N,N] \subset \mathbb{Z}</script> it occurred to me that modelling this problem as a random walk on <script type="math/tex">\mathbb{Z}</script> might be an effective approach.</p> <h2 id="a-random-walk-on-mathbbz">A random walk on <script type="math/tex">\mathbb{Z}</script>:</h2> <p>Let’s suppose <script type="math/tex">a_i \sim \mathcal{U}([-N,N])</script> where <script type="math/tex">[-N,N] \subset \mathbb{Z}</script>. We may then define:</p> <p>\begin{equation} S_n = \sum_{i=1}^n a_i \end{equation}</p> <p>Due to the i.i.d. 
assumption we have:</p> <p>\begin{equation} \mathbb{E}\big[S_n\big]= n \cdot \mathbb{E}\big[a_i\big]=0 \end{equation}</p> <p>We may now define:</p> <p>\begin{equation} u_n = P(S_n=0) \end{equation}</p> <p>and ask whether <script type="math/tex">u_n</script> is decreasing. In other words, what is the probability that we observe the expected value as <script type="math/tex">n</script> becomes large?</p> <h2 id="small-and-large-deviations">Small and Large deviations:</h2> <p>It’s useful to observe the following nested structure:</p> <p>\begin{equation} \forall k \in [0,N], \{\lvert S_n \rvert \leq k\} \subset \{\lvert S_n \rvert \leq k+1 \} \end{equation}</p> <p>From (7), we may deduce that:</p> <p>\begin{equation} P(\lvert S_n \rvert \leq N) + P(\lvert S_n \rvert &gt; N) = 1 \end{equation}</p> <p>So we are now ready to define the probability of a ‘small’ deviation:</p> <p>\begin{equation} \alpha_n = P(\lvert S_n \rvert \leq N) \end{equation}</p> <p>as well as the probability of ‘large’ deviations:</p> <p>\begin{equation} \beta_n = P(\lvert S_n \rvert &gt; N) \end{equation}</p> <p>Additional motivation for analysing <script type="math/tex">\alpha_n</script> and <script type="math/tex">\beta_n</script> arises from:</p> <p>\begin{equation} P(S_{n+1}=0 \mid \lvert S_n \rvert &gt; N) = 0 \end{equation}</p> <p>\begin{equation} P(S_{n+1}=0 \mid \lvert S_n \rvert \leq N) = \frac{1}{2N+1} \end{equation}</p> <p>Furthermore, by the law of total probability we have:</p> <p>\begin{equation} \begin{split} P(S_{n+1}=0) &amp; = P(S_{n+1}=0 \mid \lvert S_n \rvert \leq N) \cdot P(\lvert S_n \rvert \leq N) + P(S_{n+1}=0 \mid \lvert S_n \rvert &gt; N) \cdot P(\lvert S_n \rvert &gt; N) \\ &amp; = P(S_{n+1}=0 \mid \lvert S_n \rvert \leq N) \cdot P(\lvert S_n \rvert \leq N) \\ &amp; = \frac{P(\lvert S_n \rvert \leq N)}{2N+1} \end{split} \end{equation}</p> <h2 id="a-remark-on-symmetry">A remark on symmetry:</h2> <p>It’s useful to note the following alternative definitions of <script 
type="math/tex">\alpha_n</script> and <script type="math/tex">\beta_n</script> that emerge due to symmetries intrinsic to the problem:</p> <p>\begin{equation} \beta_n = P(\lvert S_n \rvert &gt; N) = 2 \cdot P(S_n &gt; N) = 2 \cdot P(S_n &lt; -N) \end{equation}</p> <p>\begin{equation} \alpha_n = P(\lvert S_n \rvert \leq N) = 1-2 \cdot P(S_n &gt; N)=1-2 \cdot P(S_n &lt; -N) \end{equation}</p> <h2 id="the-case-of-n1-and-n2">The case of <script type="math/tex">n=1</script> and <script type="math/tex">n=2</script>:</h2> <p>Given that <script type="math/tex">S_0=0</script>:</p> <p>\begin{equation} P(S_1=0)=\frac{P(\lvert S_0 \rvert \leq N)}{2N+1}= \frac{1}{2N+1} \end{equation}</p> <p>As for the case of <script type="math/tex">n=2</script>:</p> <p>\begin{equation} P(\lvert S_1 \rvert \leq N) =1 \implies P(S_2=0) = \frac{1}{2N+1} \end{equation}</p> <h2 id="the-case-of-n3">The case of <script type="math/tex">n=3</script>:</h2> <p>The case of <script type="math/tex">n=3</script> requires that we calculate:</p> <p>\begin{equation} P(S_3=0)=\frac{P(\lvert S_2 \rvert \leq N)}{2N+1} \end{equation}</p> <p>\begin{equation} \begin{split} P(S_{2} &gt; N) &amp; = \sum_{i=1}^N P(S_{2} &gt; N \mid S_1 = i) \cdot P( S_1 = i) \\ &amp; = \frac{1}{2N+1} \big(\frac{1}{2N+1} + … + \frac{N}{2N+1}\big) \\ &amp; = \frac{N \cdot (N+1)}{2 \cdot (2N+1)^2} \end{split} \end{equation}</p> <p>and using (19) we may derive <script type="math/tex">P(\lvert S_{2} \rvert \leq N)</script>:</p> <p>\begin{equation} \begin{split} P(\lvert S_{2} \rvert \leq N) &amp; = 1 - 2 \cdot P(S_{2} &gt; N) \\ &amp; = 1- \frac{N \cdot (N+1)}{(2N+1)^2} \\ &amp; = \frac{3N^2+3N+1}{(2N+1)^2} \sim \frac{3}{4} \end{split} \end{equation}</p> <p>and so for <script type="math/tex">n=3</script> we have:</p> <p>\begin{equation} \begin{split} P(S_{3} = 0) &amp; = P(S_{3} = 0 | \lvert S_2 \rvert \leq N) \cdot P(\lvert S_2 \rvert \leq N) \\ &amp; = \frac{3N^2+3N+1}{(2N+1)^3} \sim \frac{3}{8N} \end{split} \end{equation}</p> <h2 
id="average-drift-or-why-ps_nk--ps_n--k1">Average drift or why <script type="math/tex">P(S_n=k) > P(S_n = k+1)</script>:</h2> <p>It’s useful to note that we may decompose <script type="math/tex">n</script> into:</p> <p>\begin{equation} n = \hat{n} + n_z \end{equation}</p> <p>where <script type="math/tex">\hat{n}</script> counts the positive and negative terms while <script type="math/tex">n_z</script> counts the zero terms, which contribute nothing to the sum.</p> <p>For the above reason, it’s convenient to decompose <script type="math/tex">S_n</script> into:</p> <p>\begin{equation} S_n = S_n^+ + S_n^{-} \end{equation}</p> <p>where <script type="math/tex">S_n^+</script> denotes the sum of the positive terms and <script type="math/tex">S_n^{-}</script> denotes the sum of the negative terms.</p> <p>By grouping the terms in the manner of (23) we may observe that when <script type="math/tex">\hat{n}</script> is large the average positive/negative step length is approximately:</p> <p>\begin{equation} \Delta = \frac{N}{2} \end{equation}</p> <p>so that if <script type="math/tex">\tau</script> positive steps and <script type="math/tex">\hat{n}-\tau</script> negative steps are taken:</p> <p>\begin{equation} \mathbb{E}[S_n^+] = \tau \cdot \Delta \end{equation}</p> <p>\begin{equation} \mathbb{E}[S_n^-] = (\hat{n}-\tau) \cdot (-\Delta) \end{equation}</p> <p>\begin{equation} \mathbb{E}[S_n] = \mathbb{E}[S_n^+] + \mathbb{E}[S_n^-] = \Delta \cdot (2\tau-\hat{n}) \end{equation}</p> <p>and we note that:</p> <p>\begin{equation} \mathbb{E}[S_n] \geq 0 \implies \tau \geq \lfloor \frac{\hat{n}}{2} \rfloor \end{equation}</p> <p>Furthermore, due to symmetry:</p> <p>\begin{equation} P(\lvert S_n \rvert =k) &gt; P(\lvert S_n \rvert =k+1) \iff P(S_n =k) &gt; P(S_n =k+1) \end{equation}</p> <p>so it suffices to demonstrate <script type="math/tex">P(S_n =k) > P(S_n =k+1)</script>.</p> <p>In order to proceed with our demonstration we choose <script type="math/tex">\tau \in [\lfloor 
\frac{\hat{n}}{2} \rfloor + 1,\hat{n}-1]</script> and find that <script type="math/tex">P</script> has a monotone relationship with the binomial distribution:</p> <p>\begin{equation} P(S_{n} = \lfloor \Delta \cdot (2\tau-\hat{n}) \rfloor) \propto {\hat{n} \choose \tau} \frac{1}{2^{\hat{n}}} \end{equation}</p> <p>where <script type="math/tex">\tau \geq \lfloor \frac{\hat{n}}{2} \rfloor</script> implies that:</p> <p>\begin{equation} \forall k \geq 0, \frac{P(S_n=k)}{P(S_n=k+1)} \sim \frac{(\tau+1)!(\hat{n}-\tau-1)!}{\tau!(\hat{n}-\tau)!} = \frac{\tau+1}{\hat{n}-\tau} &gt; 1 \end{equation}</p> <p>which holds for all <script type="math/tex">n_z \leq n</script>.</p> <p><strong>Note:</strong> I wrote a <a href="https://gist.github.com/AidanRocke/a4898097ce572bc8bc5a977fcbda6ed8">Julia function that provides experimental evidence</a> for equation (30).</p> <h2 id="proof-that-u_n-is-decreasing">Proof that <script type="math/tex">u_n</script> is decreasing:</h2> <p>Given (13) we may derive the following ratio:</p> <p>\begin{equation} \frac{u_{n+1}}{u_n} = \frac{P(\lvert S_n \rvert \leq N)}{(2N+1) \cdot P(S_n = 0)} \end{equation}</p> <p>So in order to prove that <script type="math/tex">u_n</script> is decreasing we must show that:</p> <p>\begin{equation} P(\lvert S_n \rvert \leq N) &lt; (2N+1) \cdot P(S_n=0) \end{equation}</p> <p>and we note that this follows immediately from (31), since each of the <script type="math/tex">2N</script> nonzero terms below is strictly smaller than <script type="math/tex">P(S_n=0)</script>:</p> <p>\begin{equation} P(\lvert S_n \rvert \leq N) = 2 \sum_{k=1}^N P(S_n=k) + P(S_n=0) &lt; (2N+1) \cdot P(S_n=0) \end{equation}</p> <h2 id="proof-that-limlimits_n-to-infty-u_n--limlimits_n-to-infty-alpha_n--0">Proof that <script type="math/tex">\lim\limits_{n \to \infty} u_n = \lim\limits_{n \to \infty} \alpha_n = 0</script>:</h2> <p>Now, given (34) we may define, for <script type="math/tex">n \geq 2</script>:</p> <p>\begin{equation} \forall N \in \mathbb{N}, q_n = \frac{P(\lvert S_n \rvert \leq N)}{(2N+1)P(S_n=0)} &lt; 1 \end{equation}</p> <p>We may easily show that <script type="math/tex">q_n</script> is decreasing and 
therefore:</p> <p>\begin{equation} \lim_{n \to \infty} \frac{P(S_{n+1}=0)}{P(S_1=0)} = \prod_{n=1}^\infty \frac{P(S_{n+1}=0)}{P(S_n=0)} = \prod_{n=1}^\infty q_n = 0 \end{equation}</p> <p>so we may deduce that <script type="math/tex">u_n</script> decreases exponentially fast and that:</p> <p>\begin{equation} \lim_{n \to \infty} u_n = \lim_{n \to \infty} P(S_{n+1}=0) = 0 \end{equation}</p> <p>Likewise, given that:</p> <p>\begin{equation} \alpha_n = P(\lvert S_n \rvert \leq N) = (2N+1) \cdot P(S_{n+1}=0) \end{equation}</p> <p>we may conclude that small deviations become exponentially unlikely, so large deviations dominate as <script type="math/tex">n</script> becomes large:</p> <p>\begin{equation} \lim_{n \to \infty} \alpha_n = \lim_{n \to \infty} (2N+1) \cdot P(S_{n+1}=0) = 0 \end{equation}</p> <p>\begin{equation} \lim_{n \to \infty} \beta_n = \lim_{n \to \infty} P(\lvert S_n \rvert &gt; N) = \lim_{n \to \infty} 1 - \alpha_n = 1 \end{equation}</p> <p>One interpretation of the last two limits is that the mass of the discrete hypercube moves away from the centre and towards the corners, which is a concentration-of-measure phenomenon.</p> <h2 id="discussion">Discussion:</h2> <p>I find it quite surprising that random structures, in this case a random walk, are useful for analysing high-dimensional systems. Indeed, I have to say that for such a general result thirty-four equations isn’t much. But what about the case of uniform distributions on closed intervals of the form <script type="math/tex">[-N,N] \subset \mathbb{R}</script>?</p> <p>It’s useful to note that <script type="math/tex">[-N,N]^n \subset \mathbb{R}^n</script> defines the hypercube with volume <script type="math/tex">(2N)^n</script> and I suspect that in the continuous setting, hypercube geometry and convex analysis might be particularly insightful.</p>Aidan RockeMotivation:
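The limits above can be checked exactly for small N: the law of S_n is just the n-fold convolution of the uniform step distribution on {-N, ..., N}, from which u_n, alpha_n and beta_n can be read off. The experiments linked earlier use Julia; what follows is a minimal Python sketch of the same idea, using exact rational arithmetic:

```python
from collections import defaultdict
from fractions import Fraction

def walk_distribution(N, n):
    """Exact law of S_n = a_1 + ... + a_n, where each a_i is drawn
    uniformly from {-N, ..., N}, computed by n successive convolutions."""
    step = Fraction(1, 2 * N + 1)
    dist = {0: Fraction(1)}  # S_0 = 0 with probability 1
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for s, p in dist.items():
            for a in range(-N, N + 1):
                nxt[s + a] += p * step
        dist = dict(nxt)
    return dist

def u(N, n):
    """u_n = P(S_n = 0)."""
    return walk_distribution(N, n).get(0, Fraction(0))

def alpha(N, n):
    """alpha_n = P(|S_n| <= N); the large-deviation probability is 1 - alpha_n."""
    return sum(p for s, p in walk_distribution(N, n).items() if abs(s) <= N)

if __name__ == "__main__":
    N = 3
    print("u_n:    ", [float(u(N, n)) for n in range(1, 7)])
    print("alpha_n:", [float(alpha(N, n)) for n in range(1, 7)])
```

For N = 3 this gives u_1 = u_2 = 1/(2N+1) = 1/7, after which u_n decreases strictly while alpha_n shrinks towards 0 and beta_n = 1 - alpha_n climbs towards 1, in line with the limits derived above.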