How does a global code emerge from local codes derived from MaxEnt inference, given that a global encoding is necessary for the emergence of synchronization in large neural networks?

Let’s suppose we would like to analyse information loss, or efficient
coding, in the construction of global codes from local codes within
UMAP. Might we characterize such a measure as an *Information
Distance* between pairs of nodes in the original k-NN graph? If so,
what are its properties?

As we shall show, this distance has an ultrametric property that
allows us to characterize signal attenuation within sparse embeddings of
complex networks. If we model Synesthesia using Efficient Coding
together with Manifold Learning algorithms such as UMAP, this
allows us to address a fundamental problem in theoretical neuroscience
known as *Brette’s Dilemma*:

How do global codes emerge from local codes given that a global encoding is necessary for the emergence of synchronization in large neural networks?

The theoretical constraints on this problem are fundamental for addressing how the notion of time emerges within biological organisms.

If we define \(B_1(x) \subset V\) as the set of nearest neighbors of \(x \in V\), we may define its indicator function:

\[\begin{equation} 1_{B_1(x)}(y)= \begin{cases} 1, & y \in B_1(x)\\ 0, & y \notin B_1(x) \end{cases} \tag{1} \end{equation}\]

where \(y \in B_1(x)\) implies that \(y\) is one degree of separation from \(x\).
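As a concrete illustration, the indicator function in (1) can be sketched in Python on a toy graph; the adjacency lists below are an illustrative assumption, not part of any particular UMAP k-NN graph.

```python
# Toy undirected graph given by adjacency sets (illustrative assumption).
adjacency = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a", "d"},
    "d": {"c"},
}

def indicator_B1(x, y):
    """Return 1 if y is one degree of separation from x, else 0."""
    return 1 if y in adjacency[x] else 0
```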

Moreover, if we observe that the graph \(G = (V,E)\) is naturally partitioned into pairwise-disjoint connected components:

\[\begin{equation} V = \bigcup_{i=1}^n V_i \tag{2} \end{equation}\]

where \(V_i \cap V_{j \neq i} = \emptyset\), then we may define the degree-of-separation measure \(l(\cdot,\cdot)\) between pairs of nodes using Dijkstra’s algorithm, so that:

\[\begin{equation} \forall x,y \in V_i, l(x,y) \leq |V_i| \tag{3} \end{equation}\]

\[\begin{equation} \forall x \in V_i \forall y \in V_{j \neq i}, l(x,y) = \infty \tag{4} \end{equation}\]

More generally, \(\forall x,y \in V, l(x,y)\) may be defined using an all-pairs shortest path algorithm such as the Floyd-Warshall algorithm.
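As a sketch, the degree-of-separation measure \(l(\cdot,\cdot)\) can be computed by breadth-first search, which coincides with Dijkstra’s algorithm on unit edge weights; the two-component toy graph below is an illustrative assumption.

```python
from collections import deque
import math

# Toy graph with two components (illustrative assumption):
# V_1 = {1, 2, 3} and V_2 = {4, 5}.
adjacency = {
    1: [2], 2: [1, 3], 3: [2],
    4: [5], 5: [4],
}

def l(x, y):
    """Shortest-path length between x and y; math.inf across components."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return dist[u]
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf  # x and y lie in different components
```

In line with equations (3) and (4), \(l\) is bounded by the component size within a component and infinite across components.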

Given our constructive approach to the degree-of-separation measure, we may extend our definition of a ball as follows:

\[\begin{equation} B_n(x) := \{v \in V: l(x,v) \leq n \} \tag{5} \end{equation}\]

as well as its boundary:

\[\begin{equation} \partial B_n(x) := \{v \in V: l(x,v) = n \} \tag{6} \end{equation}\]

Hence, we may note that the balls are nested:

\[\begin{equation} B_n(x) \subset B_{n+1}(x) \tag{7} \end{equation}\]

\[\begin{equation} \partial B_n(x) \cap \partial B_{n+1}(x) = \emptyset \tag{8} \end{equation}\]

which implies a natural hierarchical structure.
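The nested balls and their boundaries can be computed directly from breadth-first distances; a minimal sketch, where the path graph below is an illustrative assumption:

```python
from collections import deque

# Toy path graph 0 - 1 - 2 - 3 (illustrative assumption).
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def distances_from(x):
    """All degree-of-separation distances from x via breadth-first search."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def ball(x, n):
    """B_n(x) = {v : l(x, v) <= n}."""
    return {v for v, d in distances_from(x).items() if d <= n}

def boundary(x, n):
    """∂B_n(x) = {v : l(x, v) = n}."""
    return {v for v, d in distances_from(x).items() if d == n}
```

The nesting of equation (7) and the disjointness of boundaries in equation (8) then hold by construction.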

Given this tree-like construction of nested balls, it is natural to ask whether we may model signal attenuation between pairs of nodes using an information-theoretic distance measure. How might we construct such a measure?

If we consider the probability \(P(x_i \rightarrow x_{i+1})\) to be an approximation of Algorithmic Probability, then the minimal length of the program \(f\) that satisfies:

\[\begin{equation} f \circ x_i = x_{i+1} \tag{9} \end{equation}\]

\[\begin{equation} f \circ x_{i+1} = x_i \tag{10} \end{equation}\]

\[\begin{equation} P(f \circ x_i = x_{i+1}) = P(f \circ x_{i+1} = x_i) = P(x_i \rightarrow x_{i+1}) \tag{11} \end{equation}\]

is given by Levin’s Coding theorem:

\[\begin{equation} -\log_2 P(x_i \rightarrow x_{i+1}) = K_U(f) - \mathcal{O}(1) \tag{12} \end{equation}\]

Hence, the Universal Cognitive Distance measure of Charles Bennett,
Péter Gács, Ming Li, Paul Vitányi and Wojciech Zurek, which they
formulate as the *Information Distance*:

\[\begin{equation} ID(x,y) = \min \{|p|: p(x) = y \land p(y) = x\} \tag{13} \end{equation}\]

simplifies to:

\[\begin{equation} ID(x_i,x_{i+1}) = -\log_2 P(x_i \rightarrow x_{i+1}) + \mathcal{O}(1) \tag{14} \end{equation}\]

where the constant term may be ignored as it does not depend upon the edge \(x_i x_{i+1} \in G\). Furthermore, since each edge is either present or absent:

\[\begin{equation} \forall x_i, x_{i+1} \in V, ID(x_i,x_{i+1}) = \begin{cases} 0, & x_i x_{i+1} \in G\\ \infty, & x_i x_{i+1} \notin G \end{cases} \tag{15} \end{equation}\]

which may be made precise by taking limits.

This definition may be naturally extended to any pair of nodes \(\alpha, \beta \in B_n(x)\):

\[\begin{equation} \forall \alpha, \beta \in B_n(x), d(\alpha,\beta) := \min \big(-\log_2 \prod_{i=0}^{\eta-1} P(x_i \rightarrow x_{i+1}) \big) \tag{16} \end{equation}\]

where \(x_0 = \alpha, x_{\eta} = \beta\) and \(l(\alpha,\beta) \leq \eta \leq |B_n(x)|\).

We may note that this distance measure is well-defined when \(\alpha\) and \(\beta\) belong to the same component; when they do not, we adopt the convention:

\[\begin{equation} \forall \alpha, \beta \in V, l(\alpha,\beta) = \infty \implies d(\alpha,\beta) = \infty \tag{17} \end{equation}\]
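A minimal sketch of the resulting distance \(d(\alpha,\beta)\): minimizing \(-\log_2\) of the product of transition probabilities along a path is a shortest-path problem under edge weights \(-\log_2 P\), solvable with Dijkstra’s algorithm, with infinity across components. The transition probabilities below are illustrative assumptions.

```python
import math
import heapq

# Symmetric transition probabilities on edges (illustrative assumption);
# {"d", "e"} forms a separate component.
P = {
    ("a", "b"): 0.5, ("b", "c"): 0.25, ("a", "c"): 0.0625,
    ("d", "e"): 0.5,
}

def neighbours(u):
    """Yield (v, P(u -> v)) for every edge incident to u."""
    for (i, j), p in P.items():
        if u == i:
            yield j, p
        elif u == j:
            yield i, p

def d(alpha, beta):
    """Dijkstra with edge weights -log2 P; math.inf if beta is unreachable."""
    best = {alpha: 0.0}
    heap = [(0.0, alpha)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == beta:
            return cost
        if cost > best.get(u, math.inf):
            continue  # stale heap entry
        for v, p in neighbours(u):
            if p == 0:
                continue  # -log2(0) = infinity; edge effectively absent
            new = cost - math.log2(p)
            if new < best.get(v, math.inf):
                best[v] = new
                heapq.heappush(heap, (new, v))
    return math.inf
```

Note that the two-hop path a → b → c costs \(1 + 2 = 3\) bits, which beats the direct edge at \(-\log_2 0.0625 = 4\) bits, so the minimum over paths matters.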

Finally, if we assume that k-NN and sphere packing are locally equivalent, then for the nearest-neighbor parameter \(k \sim \log n\) [3]:

\[\begin{equation} P(x_i \rightarrow x_{i+1}) = \begin{cases} 1, & x_{i+1} \in B_1(x_i)\\ 0, & x_{i+1} \notin B_1(x_i) \end{cases} \tag{18} \end{equation}\]

we may expect the following ultrametric property on the boundary of any ball \(B_n(x)\) satisfying \(|\partial B_n(x)| \geq 2\):

\[\begin{equation} \forall y,z \in \partial B_n(x), d(x,y) \sim d(x,z) \iff d(x,y) \lesssim \max \{d(x,z),d(y,z)\} \tag{19} \end{equation}\]

which means that the strength of the signal decays smoothly.
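As a numerical sanity check: with 0/1 transition probabilities restricted to nearest neighbours, the degree-of-separation metric \(l\) stands in for \(d\), and the boundary ultrametric property can be verified exhaustively on a small graph. The 3×3 grid graph below is an illustrative assumption.

```python
from collections import deque
from itertools import combinations

def distances_from(x, adjacency):
    """All shortest-path lengths from x via breadth-first search."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def check_boundary_ultrametric(adjacency, x, n):
    """For all y, z on ∂B_n(x): l(x,y) = l(x,z) = n and the strong
    triangle inequality l(x,y) <= max(l(x,z), l(y,z)) holds."""
    dist_x = distances_from(x, adjacency)
    boundary = [v for v, d in dist_x.items() if d == n]
    for y, z in combinations(boundary, 2):
        d_yz = distances_from(y, adjacency)[z]
        assert dist_x[y] == dist_x[z] == n
        assert dist_x[y] <= max(dist_x[z], d_yz)
    return True

# 3x3 grid graph (illustrative assumption).
grid = {(i, j): [(i + di, j + dj)
                 for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= i + di < 3 and 0 <= j + dj < 3]
        for i in range(3) for j in range(3)}
```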

The theoretical limits we have identified for UMAP provide us with a framework for further experimental analysis.

[1] L. McInnes, J. Healy, and J. Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426, 2018.

[2] A. Rocke. Kepler Lounge: The Law of Conservation of Information within UMAP. May 12, 2023. Retrieved from keplerlounge.com

[3] G. C. Linderman, G. Mishne, Y. Kluger, and S. Steinerberger. Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science. 2017.

[4] K. Friston. The free-energy principle: a unified brain theory? Nature Reviews Neuroscience 11, 127–138 (2010). https://doi.org/10.1038/nrn2787

[5] L. K. Loh and M. Bartulovic. Efficient Coding Hypothesis and an Introduction to Information Theory. 2014.

For attribution, please cite this work as

Rocke (2023, May 27). Kepler Lounge: Efficient Coding and Signal Attenuation within UMAP. Retrieved from keplerlounge.com

BibTeX citation

@misc{rocke2023efficient,
  author = {Rocke, Aidan},
  title = {Kepler Lounge: Efficient Coding and Signal Attenuation within UMAP},
  url = {keplerlounge.com},
  year = {2023}
}