The Law of Conservation of Information within UMAP

In the following analysis we demonstrate that information symmetrization within UMAP is designed to satisfy the Law of Conservation of Information, a universal principle that underlies Occam’s razor.

Aidan Rocke


Let’s suppose we have a Random Geometric Graph that is invariant to a group of geometric transformations. Without loss of generality, let’s suppose the Geometric Invariant that concerns us is Conformal Invariance. If each edge has two possible orientations and Boolean values:

\[\begin{equation} P(i \rightarrow j), P(j \rightarrow i) \in \{0,1\} \end{equation}\]

what is the simplest logical operation that may be applied to each edge so we may partition this graph into a collection of undirected components without losing geometric information?

As we shall demonstrate in this analysis, the UMAP procedure of edge symmetrization is a method for information symmetrization in the sense that it is formally equivalent to the logical or operation:

\[\begin{equation} P_{ij} = P(i \rightarrow j) \text{ or } P(j \rightarrow i) \end{equation}\]

which allows a global encoding of vertices to emerge via local codes of length \(\sim \log_2 k\), and is designed to satisfy Von Neumann’s Law of Conservation of Information. Hence, it satisfies the desired criteria.

k-NN as local sphere packing:

If we were to pack a maximum of \(N\) balls with volume \(V_s\) into the volume \(V_d \gg V_s\) we may define the a priori probability:

\[\begin{equation} P(\text{ball}) := \frac{V_s}{V_d} \sim \frac{1}{N} \tag{1} \end{equation}\]

In particular, if the balls are indexed in \([1,N]\) the average code length for a ball sampled uniformly from this dense set is given by:

\[\begin{equation} -\log_2 P(\text{ball}) \sim \log_2 N \tag{2} \end{equation}\]

It follows that if we consider k-NN to perform local sphere packing on a finite metric space, then in the worst case each data point may be encoded with \(\sim \log_2 k\) bits. As this code may be specified with a fraction \(\frac{\log_2 k}{k}\) of the \(k\) neighbors, it is sufficient to encode a vertex \(v_i\) with \(\log_2 k\) Bernoulli variables \(P_{ij}\), each of which tells us whether the 1-simplex \(v_i \rightarrow v_j\) exists or not:

\[\begin{equation} \sum_{j \in \mathcal{B}} P_{ij} \approx \log_2 k \tag{3} \end{equation}\]

where \(\lvert \mathcal{B} \rvert \approx \log_2 k\).
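To make the construction concrete, here is a minimal NumPy sketch of the directed Boolean k-NN graph described above, using a hypothetical random point cloud; the row sums confirm each vertex has out-degree \(k\), and the local code length is \(\log_2 k\) bits:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # hypothetical toy point cloud
n, k = len(X), 8

# Pairwise Euclidean distances; exclude self-matches
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
np.fill_diagonal(D, np.inf)

# Directed Boolean k-NN graph: P[i, j] = 1 iff v_j is among v_i's k neighbours
nn = np.argsort(D, axis=1)[:, :k]
P = np.zeros((n, n), dtype=bool)
P[np.arange(n)[:, None], nn] = True

# Every vertex has exactly k outgoing 1-simplices,
# so the worst-case local code length is ~ log2 k bits per point
out_degree = P.sum(axis=1)
code_length = np.log2(k)
```

The directed matrix `P` built here is the input to the symmetrization step discussed in the next section.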

From an information-theoretic perspective, each \(P_{ij}\) explicitly answers one question and therefore accounts for one bit of information. Hence, we ought to think of the units of \(P_{ij}\) as bits of quantized information and not probabilities.

Moreover, it may be shown that this defines an \(\epsilon\)-regular partition, so this approach to defining a local encoding of feature vectors satisfies the conditions of the Szemerédi regularity lemma.

Graph partitioning via symmetrization:

What we have so far ignored is the question of how we go from the directed k-NN graph, in which each vertex has out-degree \(k \in \mathbb{N}\), to undirected graphs whose edges are represented by symmetric Bernoulli variables:

\[\begin{equation} P_{ij} = P_{ji} \tag{4} \end{equation}\]

How is this accomplished? To avoid adding non-existent edges or removing existing directed edges, i.e. to conserve information in the process of constructing a global code, it is necessary and sufficient to use the logical or operation:

\[\begin{equation} P_{ij} = P(i \rightarrow j) \text{ or } P(j \rightarrow i) \tag{5} \end{equation}\]

where measurements of \(P(i \rightarrow j)\) and \(P(j \rightarrow i)\) behave like Boolean variables.

It follows that the simplest arithmetic representation of (5) is given by the indicator function for the union of event spaces:

\[\begin{equation} 1(A \cup B) = 1_A + 1_B - 1_A \cdot 1_B \tag{6} \end{equation}\]

which is equivalent to:

\[\begin{equation} P_{ij} = P(i \rightarrow j) + P(j \rightarrow i) - P(i \rightarrow j) \cdot P(j \rightarrow i) \tag{7} \end{equation}\]
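A quick NumPy check, on a hypothetical random directed graph, that the arithmetic form (7) coincides with the logical or of (5) whenever the edge indicators are Boolean, and that the result satisfies the symmetry condition (4):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Hypothetical directed Boolean edge indicators P(i -> j)
P = rng.integers(0, 2, size=(n, n)).astype(float)

# Eq. (7): P_ij = P(i->j) + P(j->i) - P(i->j) * P(j->i)
sym = P + P.T - P * P.T

# For Boolean inputs this is exactly the logical or of eq. (5) ...
assert np.array_equal(sym.astype(bool), P.astype(bool) | P.T.astype(bool))
# ... and the output is symmetric, as required by eq. (4)
assert np.array_equal(sym, sym.T)
```

This is the same fuzzy-union expression UMAP applies when symmetrizing its k-NN graph.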

In fact, upon closer inspection we find that (7) is a necessary consequence of the natural correspondence between the Law of Conservation of Information and the entropy of functions of two random variables:

\[\begin{equation} H(f(X,Y)) = H(X) + H(Y) - MI(X,Y) \tag{8} \end{equation}\]

where (8) has an elementary proof via Venn diagrams.
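Taking \(f(X,Y)\) to be the joint pair \((X,Y)\), identity (8) reduces to the standard decomposition \(H(X,Y) = H(X) + H(Y) - MI(X,Y)\), which may be checked numerically on a small, hypothetical joint distribution:

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# A small strictly positive joint distribution p(x, y) (hypothetical)
rng = np.random.default_rng(2)
pxy = rng.random((3, 4))
pxy /= pxy.sum()
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# Mutual information computed directly from its definition
mi = (pxy * np.log2(pxy / np.outer(px, py))).sum()

# Eq. (8) with f(X, Y) = (X, Y): H(X, Y) = H(X) + H(Y) - MI(X, Y)
assert np.isclose(H(pxy.ravel()), H(px) + H(py) - mi)
```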


The Law of Conservation of Information is a Universal principle that is equivalent to the statement that the Von Neumann entropy is invariant to Unitary transformations. While its true meaning emerges within the context of the Universal Wave Function, it is applicable to systems of all shapes and sizes due to the compositional structure of the Observable Universe.

One of the lesser-known implications of the Law of Conservation of Information is that the Second Law of Thermodynamics and the Arrow of Time may not be fundamental. They may instead be consequences of the information-theoretic language we use to develop the theory and quantify its implications. This particular conjecture has been carefully developed by Ricard Solé and his collaborators, who found that a theory of Open-Ended Evolution formulated using Kolmogorov Complexity satisfies the Law of Conservation of Information [2]. On the other hand, an identical theory formulated using Shannon Entropy implies universal information loss.

For a brief overview of the Law of Conservation of Information, the author recommends an elementary derivation using matrix algebra [3].


  1. Peter Grünwald and Paul Vitányi. Shannon Information and Kolmogorov Complexity. arXiv. 2004.
  2. Bernat Corominas-Murtra, Luís F. Seoane and Ricard Solé. Zipf’s Law, unbounded complexity and open-ended evolution. The Royal Society. 2018.
  3. Rocke (2022, Jan. 3). Kepler Lounge: The Law of Conservation of Information. Retrieved from


For attribution, please cite this work as

Rocke (2023, May 12). Kepler Lounge: The Law of Conservation of Information within UMAP. Retrieved from

BibTeX citation

@misc{rocke2023,
  author = {Rocke, Aidan},
  title = {Kepler Lounge: The Law of Conservation of Information within UMAP},
  url = {},
  year = {2023}
}