# Almost all simple graphs are small world networks

## Introduction:

Two days ago, while thinking about brain networks, it occurred to me that almost all simple graphs are small world networks in the sense that if \(G_N\) is a simple graph with \(N\) nodes sampled from the Erdös-Rényi random graph distribution with probability half then when \(N\) is large:

\begin{equation} \mathbb{E}[d(v_i,v_j)] \leq \log_2 N \end{equation}

My strategy for proving this was to show that when \(N\) is large, \(\forall v_i \in G_N\) there exists a chain of distinct nodes of length \(\log_2 N\) originating from \(v_i\) almost surely. This implies that:

\begin{equation} \forall v_i, v_j \in G_N, d(v_i,v_j) \leq \log_2 N \end{equation}

almost surely when \(N\) is large.

Now, by using the above method of proof I managed to show that almost all simple graphs are *very small* in the sense that:

\begin{equation} \mathbb{E}[d(v_i,v_j)] \leq \log_2\log_2 N \end{equation}

when \(N\) tends to infinity. We can actually do even better.

Using my proof that almost all simple graphs are connected, I can show that almost all simple graphs have diameter 2. However, I think there is more value in going through my original proof which in my opinion provides greater insight into the problem.

## Degrees of separation and the neighborhood of a node:

We may think of degrees of separation as a sequence of ‘hops’ between the neighborhoods of distinct nodes \(v_i\). Given a node \(v_i\) we may define \(\mathcal{N}(v_i)\) as follows:

\begin{equation} \mathcal{N}(v_i) = \{v_j \in G_N: \overline{v_i v_j} \in G_N \} \end{equation}

where \(G_N = (V,E)\) is a graph with \(N\) nodes.

Now, given the E-R model we can say that \(v_i \neq v_j\) implies:

\begin{equation} P(v_k \notin \mathcal{N}(v_i) \land v_k \notin \mathcal{N}(v_j)) = P(v_k \notin \mathcal{N}(v_i)) \cdot P(v_k \notin \mathcal{N}(v_j)) = \frac{1}{4} \end{equation}

and by induction:

\begin{equation} P(v_k \notin \mathcal{N}(v_{1}) \land … \land v_k \notin \mathcal{N}(v_{n})) = \frac{1}{2^n} \end{equation}

It follows that if there is a chain of distinct nodes \(\overline{v_1 ... v_n}\) we can say that:

\begin{equation} P(d(v_1,v_k) \leq n) = 1- \frac{1}{2^n} \end{equation}

## Almost all simple graphs are very small world networks:

### A chain of distinct nodes \(\{v_i\}_{i=1}^{\log_2\log_2 N}\) exists almost surely:

The probability that there exists a chain of nodes of length \(\log_2\log_2 N\):

\begin{equation} \overline{v_1 … v_{\log_2\log_2 N}} \end{equation}

such that \(v_i = v_j \iff i=j\) is given by:

\begin{equation} P(\overline{v_1 … v_{\log_2\log_2 N}} \in G_N) = \prod_{k=1}^{\log_2\log_2 N} \big(1-\frac{1}{2^{N-k}} \big) \geq \big(1- \frac{\log_2 N}{2^N}\big)^{\log_2\log_2 N} \end{equation}

and we note that:

\begin{equation} \lim\limits_{N \to \infty} \big(1- \frac{\log_2 N}{2^N}\big)^{\log_2\log_2 N} = 1 \end{equation}

this guarantees the existence of a chain of distinct nodes of length \(\log_2 N\) originating from any \(v_i \in G_N\) almost surely.

### Given that \(\overline{v_1 ... v_{\log_2\log_2 N}}\) exists almost surely we may deduce that \(\forall i \in [1,\log_2\log_2 N], d(v_i,v_k) \leq \log_2\log_2 N\) almost surely:

If \(\overline{v_1 ... v_{\log_2\log_2 N}}\) exists we have:

\begin{equation} \forall \{v_i\}_{i=1}^n, v_k \in G_N, P(d(v_1,v_k) \leq \log_2\log_2 N) = 1 - \frac{1}{2^{\log_2\log_2 N}} \end{equation}

and so we have:

\begin{equation} \lim\limits_{N \to \infty} \forall \{v_i\}_{i=1}^n, v_k \in G_N, P(d(v_1,v_k) \leq \log_2\log_2 N) = 1 \end{equation}

## Discussion:

I must say that initially I found the above result quite surprising and I think it partially explains why small world networks frequently occur in nature. Granted, in natural settings the graph is typically embedded in some kind of Euclidean space so in addition to the degrees of separation we must consider the Euclidean distance. But, I suspect that in real-world networks with small world effects the Euclidean distance plays a marginal role.

In particular, I believe that wherever small-world networks prevail the Euclidean distance is dominated by ergodic dynamics between nodes. There is probably some kind of stochastic communication process between the nodes.