Efficient Coding and Signal Attenuation within UMAP

How does a global code emerge from local codes derived from MaxEnt inference, given that a global encoding is necessary for the emergence of synchronization in large neural networks?

Aidan Rocke https://github.com/AidanRocke
05-27-2023

Efficient Coding within UMAP:

Let’s suppose we would like to analyse information loss, or efficient coding, in the construction of global codes from local codes within UMAP. Can we characterize such a measure as an Information Distance between pairs of nodes in the original k-NN graph? If so, what are its properties?

As we shall show, this distance has an ultrametric property that allows us to characterize signal attenuation within sparse embeddings of complex networks. If we model Synesthesia using Efficient Coding, implemented via Manifold Learning algorithms such as UMAP, this allows us to address a fundamental problem in theoretical neuroscience known as Brette’s Dilemma:

How do global codes emerge from local codes, given that a global encoding is necessary for the emergence of synchronization in large neural networks?

The theoretical constraints on this problem are fundamental for addressing how the notion of time emerges within biological organisms.

Nearest Neighbors and Degrees of Separation:

If we define \(B_1(x) \subset V\) as the nearest neighbors of \(x \in V\), we may assert that:

\[\begin{equation} 1_{B_1(x)}(y) = \begin{cases} 1, & y \in B_1(x)\\ 0, & y \notin B_1(x) \end{cases} \tag{1} \end{equation}\]

where \(y \in B_1(x)\) implies that \(y\) is one degree of separation from \(x\).
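As a concrete illustration, the following is a minimal sketch of the indicator function in equation (1) on a symmetrized k-NN graph. The point cloud X and the parameter choices are hypothetical, and scikit-learn’s kneighbors_graph stands in for UMAP’s own graph construction:

import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                        # a toy point cloud in R^3
A = kneighbors_graph(X, n_neighbors=5, mode='connectivity')
A = A.maximum(A.T)                                   # symmetrize the k-NN relation

def indicator_B1(x: int, y: int) -> int:
    """Equation (1): 1 if y is within one degree of separation of x, else 0."""
    return int(A[x, y] != 0)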

Moreover, if we observe that the graph \(G = (V,E)\) is naturally partitioned into pairwise-disjoint connected components:

\[\begin{equation} V = \bigcup_{i=1}^n V_i \tag{2} \end{equation}\]

where \(V_i \cap V_{j \neq i} = \emptyset\), then we may define the degree-of-separation measure \(l(\cdot,\cdot)\) between pairs of nodes using Dijkstra’s algorithm so that:

\[\begin{equation} \forall x,y \in V_i, l(x,y) \leq |V_i| \tag{3} \end{equation}\]

\[\begin{equation} \forall x \in V_i \forall y \in V_{j \neq i}, l(x,y) = \infty \tag{4} \end{equation}\]

More generally, \(\forall x,y \in V, l(x,y)\) may be defined using an all-pairs shortest path algorithm such as the Floyd-Warshall algorithm.
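Continuing the sketch above, \(l(\cdot,\cdot)\) may be computed with an all-pairs shortest-path routine on the symmetrized k-NN graph; scipy returns np.inf between distinct components, which matches equation (4):

from scipy.sparse.csgraph import connected_components, shortest_path

n_comp, labels = connected_components(A, directed=False)
l = shortest_path(A, method='D', unweighted=True)    # hop counts via Dijkstra

# Equation (4): infinite separation across distinct components.
assert np.all(np.isinf(l[labels[:, None] != labels[None, :]]))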

Hierarchical structure and the Information Distance:

Given our constructive approach to the degree-of-separation measure, we may extend our definition of a ball as follows:

\[\begin{equation} B_n(x) := \{v \in V: l(x,v) \leq n \} \tag{5} \end{equation}\]

as well as its boundary:

\[\begin{equation} \partial B_n(x) := \{v \in V: l(x,v) = n \} \tag{6} \end{equation}\]

Hence, we may note that the balls are nested:

\[\begin{equation} B_n(x) \subset B_{n+1}(x) \tag{7} \end{equation}\]

\[\begin{equation} \partial B_n(x) \cap \partial B_{n+1}(x) = \emptyset \tag{8} \end{equation}\]

which implies a natural hierarchical structure.
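Given the hop-count matrix l from the previous sketch, the balls of equation (5) and the boundaries of equation (6) are simple level sets, and the properties (7) and (8) may be checked directly:

def ball(l, x, n):
    """Equation (5): B_n(x) = {v : l(x,v) <= n}."""
    return set(np.flatnonzero(l[x] <= n))

def boundary(l, x, n):
    """Equation (6): the boundary of B_n(x), {v : l(x,v) = n}."""
    return set(np.flatnonzero(l[x] == n))

assert ball(l, 0, 2) <= ball(l, 0, 3)                # equation (7): nesting
assert not boundary(l, 0, 2) & boundary(l, 0, 3)     # equation (8): disjointness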

Given this tree-like construction of nested balls, it is natural to ask whether we may model signal attenuation between pairs of nodes using an information-theoretic distance measure. How might we construct such a measure?

If we consider the probability \(P(x_i \rightarrow x_{i+1})\) to be an approximation of Algorithmic Probability, then the length of the minimal program \(f\) that satisfies:

\[\begin{equation} f \circ x_i = x_{i+1} \tag{9} \end{equation}\]

\[\begin{equation} f \circ x_{i+1} = x_i \tag{10} \end{equation}\]

\[\begin{equation} P(f \circ x_i = x_{i+1}) = P(f \circ x_{i+1} = x_i) = P(x_i \rightarrow x_{i+1}) \tag{11} \end{equation}\]

is given by Levin’s Coding theorem:

\[\begin{equation} -\log_2 P(x_i \rightarrow x_{i+1}) = K_U(f) - \mathcal{O}(1) \tag{12} \end{equation}\]

Hence, the Universal Cognitive Distance measure of Charles Bennett, Péter Gács, Ming Li, Paul Vitányi and Wojciech Zurek, which they formulate as the Information Distance:

\[\begin{equation} ID(x,y) = \min \{|p|: p(x) = y \land p(y) = x\} \tag{13} \end{equation}\]

simplifies to:

\[\begin{equation} ID(x_i,x_{i+1}) = -\log_2 P(x_i \rightarrow x_{i+1}) + \mathcal{O}(1) \tag{14} \end{equation}\]

where the constant term may be ignored as it doesn’t depend upon the edge \(x_i x_{i+1} \in E\). Furthermore, since each edge is either present or absent:

\[\begin{equation} \forall x_i, x_{i+1} \in V, ID(x_i,x_{i+1}) = \begin{cases} 0, & x_i x_{i+1} \in E\\ \infty, & x_i x_{i+1} \notin E \end{cases} \tag{15} \end{equation}\]

which may be made precise by taking limits.
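To make the limiting values in equation (15) concrete, we may treat a present edge as \(P \rightarrow 1\) and an absent edge as \(P \rightarrow 0\), so the bit-cost \(-\log_2 P\) tends to \(0\) and \(\infty\) respectively. A toy check (not UMAP’s actual edge weights):

import numpy as np

with np.errstate(divide='ignore'):
    for p in (1.0, 0.999999, 1e-12, 0.0):
        print(p, -np.log2(p))    # tends to 0.0 for present edges, inf for absent ones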

This definition may be naturally extended to any pair of nodes \(\alpha, \beta \in B_n(x)\):

\[\begin{equation} \forall \alpha, \beta \in B_n(x), d(\alpha,\beta) := \min_{x_0, \dots, x_\eta} \big(-\log_2 \prod_{i=0}^{\eta-1} P(x_i \rightarrow x_{i+1}) \big) \tag{16} \end{equation}\]

where the minimum ranges over paths in \(B_n(x)\) with \(x_0 = \alpha\), \(x_{\eta} = \beta\) and \(l(\alpha,\beta) \leq \eta \leq |B_n(x)|\).

We may note that this distance measure is well-defined when \(\alpha\) and \(\beta\) belong to the same component; when they do not, we adopt the convention:

\[\begin{equation} \forall \alpha, \beta \in V, l(\alpha,\beta) = \infty \implies d(\alpha,\beta) = \infty \tag{17} \end{equation}\]
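In computational terms, minimizing \(-\log_2\) of a product of edge probabilities over paths is an ordinary shortest-path problem with edge weights \(-\log_2 P(x_i \rightarrow x_{i+1})\), so \(d(\cdot,\cdot)\) in equation (16) may be computed with Dijkstra’s algorithm. A sketch on a hypothetical probability matrix standing in for UMAP’s fuzzy edge weights:

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# Hypothetical symmetric edge probabilities on a four-node path graph.
P = np.array([[0.0, 0.9, 0.0, 0.0],
              [0.9, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.8],
              [0.0, 0.0, 0.8, 0.0]])

with np.errstate(divide='ignore'):
    W = np.where(P > 0, -np.log2(P), 0.0)     # bit-cost per edge; 0 = no edge

d = shortest_path(csr_matrix(W), method='D')  # Dijkstra on -log2 P weights
print(d[0, 3], -np.log2(0.9 * 0.5 * 0.8))     # both equal the minimal bit-cost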

Ultrametricity:

Finally, if we assume that k-NN and sphere packing are locally equivalent, then for the nearest-neighbor parameter \(k \sim \log n\) [3]:

\[\begin{equation} P(x_i \rightarrow x_{i+1}) = \begin{cases} 1, & x_{i+1} \in B_1(x_i)\\ 0, & x_{i+1} \notin B_1(x_i) \end{cases} \tag{18} \end{equation}\]

we may expect the following Ultrametric property on the boundary of any ball \(B_n(x)\) satisfying \(|\partial B_n(x)| \geq 2\):

\[\begin{equation} \forall y,z \in \partial B_n(x), d(x,y) \sim d(x,z) \iff d(x,y) \lesssim \max \{d(x,z),d(y,z)\} \tag{19} \end{equation}\]

which means that the strength of the signal decays smoothly.
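As a toy check of the bound in equation (19), we may use the hop count \(l\) and the boundary function from the earlier sketches as a stand-in for \(d\); this is exact up to a constant factor whenever every edge carries the same probability \(p\), since then \(d(\alpha,\beta) = -l(\alpha,\beta)\log_2 p\):

x, n = 0, 3
shell = sorted(boundary(l, x, n))
for y in shell:
    for z in shell:
        # On the shell, l(x,y) = l(x,z) = n, so the max-bound holds exactly.
        assert l[x, y] <= max(l[x, z], l[y, z])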

Discussion:

The theoretical limits we have identified for UMAP provide us with a framework for further experimental analysis.

References:

  1. L. McInnes, J. Healy, and J. Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.

  2. Rocke (2023, May 12). Kepler Lounge: The Law of Conservation of Information within UMAP. Retrieved from keplerlounge.com

  3. G. C. Linderman, G. Mishne, Y. Kluger, and S. Steinerberger. Randomized near neighbor graphs, giant components, and applications in data science. 2017.

  4. Friston, K. The free-energy principle: a unified brain theory?. Nat Rev Neurosci 11, 127–138 (2010). https://doi.org/10.1038/nrn2787

  5. L. K. Loh and M. Bartulovic. Efficient Coding Hypothesis and an Introduction to Information Theory. 2014.

Citation

For attribution, please cite this work as

Rocke (2023, May 27). Kepler Lounge: Efficient Coding and Signal Attenuation within UMAP. Retrieved from keplerlounge.com

BibTeX citation

@misc{rocke2023efficient,
  author = {Rocke, Aidan},
  title = {Kepler Lounge: Efficient Coding and Signal Attenuation within UMAP},
  url = {keplerlounge.com},
  year = {2023}
}