To develop algorithms for problems in machine learning or statistical physics, it is useful to develop an understanding of high-dimensional Euclidean spaces. One interesting property is that almost all pairs of high-dimensional random vectors are nearly orthogonal: their cosine distance concentrates around zero.

I shall start with a simple example built from uniformly distributed random vectors, which gives insight into more complex cases.

An illustrative problem:

Consider two vectors $X$ and $Y$ drawn independently and uniformly at random from the hypercube $\{-1, +1\}^{2n}$. We note that for any such $X$:

\begin{equation} \lVert X \rVert = \sqrt{2n} \end{equation}

\begin{equation} X \cdot Y = \sum_{i=1}^{2n} x_i \cdot y_i \end{equation}

where each product $x_i \cdot y_i$ equals $+1$ or $-1$ with equal probability.

As a result, if we define the cosine distance:

\begin{equation} \text{COS}(X,Y) = \frac{X \cdot Y}{\lVert X \rVert \lVert Y \rVert} \end{equation}

we find that this expression simplifies to:

\begin{equation} S_n = \frac{\sum_{i=1}^{2n} x_i \cdot y_i}{2n} \approx \mathbb{E}[x_1 \cdot y_1] = 0 \end{equation}

and by applying the Weak Law of Large Numbers to $S_n$ we find that:

\begin{equation} \forall \epsilon > 0, \lim_{n \to \infty} P(|S_n - \mathbb{E}[x_1 \cdot y_1]| > \epsilon) = \lim_{n \to \infty} P(|S_n| > \epsilon) = 0 \end{equation}
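This concentration is easy to check numerically. The following Monte Carlo sketch (my own illustration; the names and sample sizes are arbitrary, not from the text) samples pairs of random $\pm 1$ vectors and shows the empirical spread of the cosine distance shrinking as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(x, y):
    # Cosine distance as defined above: x . y / (|x| |y|)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# For vectors with i.i.d. +/-1 entries, the empirical spread of the cosine
# distance shrinks roughly like 1/sqrt(dim) as the dimension grows.
spread = {}
for dim in (10, 100, 10_000):
    vals = [cosine(rng.choice([-1.0, 1.0], size=dim),
                   rng.choice([-1.0, 1.0], size=dim))
            for _ in range(1_000)]
    spread[dim] = float(np.std(vals))
    print(dim, spread[dim])
```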

The case of isotropic Gaussian vectors:

For the case of $n$-dimensional vectors whose entries are drawn i.i.d. from $\mathcal{N}(0, \sigma^2)$, we may proceed in a similar manner. It is particularly useful to start by analysing the denominator of the cosine formula:

\begin{equation} \lVert X \rVert^2 = \sum_{x_i > 0} x_i^2 + \sum_{x_i < 0} x_i^2 \approx Cn \end{equation}

where the constant $C$ is given by:

\begin{equation} C = \int_{0}^\infty x^2 \cdot \frac{f(x)}{P(x \geq 0)} dx = \int_{0}^\infty 2 x^2 \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{x^2}{2 \sigma^2}} dx = \sigma^2 \end{equation}
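Note that $C = \sigma^2$: by symmetry of the Gaussian, the conditional second moment $\mathbb{E}[x^2 \mid x \geq 0]$ equals the unconditional one. A quick Monte Carlo estimate (an illustrative aside of mine, with an arbitrary choice of $\sigma$) confirms this:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0  # illustrative value; any sigma > 0 works

# Empirical E[x^2 | x >= 0] for x ~ N(0, sigma^2): by symmetry it equals
# the unconditional second moment, so C = sigma^2.
x = rng.normal(0.0, sigma, size=1_000_000)
C_hat = float(np.mean(x[x >= 0] ** 2))
print(C_hat, sigma ** 2)
```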

As a result, if we compute the cosine distance of $X$ and $Y$ independently sampled from the same isotropic Gaussian, we find that:

\begin{equation} \text{COS}(X,Y) = \frac{X \cdot Y}{\lVert X \rVert \lVert Y \rVert} \approx \frac{X \cdot Y}{\lVert X \rVert^2} \approx \frac{S_n}{C} \end{equation}

where $S_n$ is given by:

\begin{equation} S_n = \frac{\sum_{i=1}^{n} x_i \cdot y_i}{n} \approx \mathbb{E}[x_1 \cdot y_1] \end{equation}

and since $\mathbb{E}[x_1 \cdot y_1] = 0$, by the Weak Law of Large Numbers we have:

\begin{equation} \forall \epsilon > 0, \lim_{n \to \infty} P(|S_n| > \epsilon) = 0 \end{equation}
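The same behaviour can be observed empirically for the Gaussian case. The sketch below (my illustration; $\sigma$ and the sample counts are arbitrary) shows that even the largest cosine distance seen over many independent Gaussian pairs shrinks toward zero as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.5  # illustrative value; the result does not depend on sigma

def cosine(x, y):
    # Cosine distance as defined above: x . y / (|x| |y|)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# Track the worst case over 1,000 independent pairs at each dimension n:
# it still shrinks toward 0 as n grows.
worst = {}
for n in (10, 100, 10_000):
    vals = [cosine(rng.normal(0.0, sigma, n), rng.normal(0.0, sigma, n))
            for _ in range(1_000)]
    worst[n] = float(np.max(np.abs(vals)))
    print(n, worst[n])
```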