# Probability in High Dimension Part I

## Motivation:

A couple weeks ago I was working on a problem that involved the expected value of a ratio of two random variables:

\begin{equation} \mathbb{E}\big[\frac{X_n}{Z_n}\big] \approx \frac{\mu_{X_n}}{\mu_{Z_n}} - \frac{\mathrm{Cov}(X_n,Z_n)}{\mu_{Z_n}^2} + \frac{\mathrm{Var(Z_n)}\mu_{X_n}}{\mu_{Z_n}^3} \end{equation}

where \(Z_n\) was a sum of \(n\) i.i.d. random variables with a symmetric distribution centred at zero.

Everything about this approximation worked fine in computer simulations where \(n\) was large but mathematically there appeared to be a problem since:

\begin{equation} \mathbb{E}\big[Z_n\big] = 0 \end{equation}

Given that (2) didn’t appear to be an issue in simulation, I went through the code several times to check whether there was an error but found none. After thinking about the problem for a bit longer it occurred to me to formalise the problem and analyse:

\begin{equation} P(\sum_{n=1}^N a_n = 0) \end{equation}

where \(a_n\) are i.i.d. random variables with a uniform distribution centred at zero so \(\mathbb{E}[a_i]=0\). My intuition suggested that under relatively weak assumptions:

\begin{equation} \lim_{N \to \infty} P(\sum_{n=1}^N a_n = 0) = 0 \end{equation}

We may think of this as a measure-theoretic phenomenon in high-dimensional spaces where \(N \in \mathbb{N}\) is our dimension and \(\vec{a} \in \mathbb{R}^N\) is a random vector.

## Analysis of a special case:

Given that (3) is a very general problem, I decided to start by analysing the special case of \(a_i \sim \mathcal{U}(\{-1,1\})\) where:

\begin{equation} \forall n \in \mathbb{N}, P(a_n=1)=P(a_n=-1)=\frac{1}{2} \end{equation}

\begin{equation} S_0 = \{ (a_n)_{n=1}^N \in \{-1,1\} : \sum_n a_n = 0\} \end{equation}

Knowing that \(S_0\) is non-empty only if we have parity of positive and negative terms, we may deduce that:

\begin{equation} S_0 \neq \emptyset \iff N \in 2\mathbb{N} \end{equation}

For the above reason, I focused my analysis on the following sequence:

\begin{equation} u_N = P(\sum_{n=1}^{2N} a_n = 0)= \frac{2N \choose N}{2^{2N}} = \frac{(2N)!}{2^{2N}(N!)^2} \end{equation}

## Proof that \(u_N\) is decreasing:

We can demonstrate that \(u_N\) is strictly decreasing by considering the ratio:

\begin{equation} \frac{u_{N+1}}{u_N}=\frac{\frac{(2N+2)!}{2^{2N+2}((N+1)!)^2}}{\frac{(2N)!}{2^{2N}(N!)^2}}=\frac{(2N+2)(2N+1)}{4(N+1)^2}=\frac{2N+1}{2N+2} < 1 \end{equation}

Now, with (9) we have what is necessary to show that:

\begin{equation} \lim_{n \to \infty} u_N = 0 \end{equation}

## Analysis of the limit \(\lim\limits_{N \to \infty} u_N\):

Using (9) we may derive a recursive definition of \(u_N\):

\begin{equation} u_{N+1}=\frac{2N+1}{2N+2} \cdot u_N \end{equation}

and given that \(u_0=1\) we have:

\begin{equation} u_{N}=\prod_{n=0}^{N-1} \frac{2n+1}{2n+2}= \frac{1}{3} \cdot \frac{3}{4} \cdot \frac{5}{6} \cdot … \end{equation}

At this point we can make the useful observation:

\begin{equation} \lim_{N \to \infty} u_N = 0 \implies \lim_{N \to \infty} - \ln u_N = \infty \end{equation}

## Proof that \(\lim\limits_{N \to \infty} u_N=0\):

By combining (12) and (13) we find that:

\begin{equation} -\ln u_N = -\ln \prod_{n=0}^{N-1} \frac{2n+1}{2n+2}= \sum_{n=0}^{N-1} \ln \frac{2n+2}{2n+1}= \sum_{n=0}^{N-1} \ln \big(1+\frac{1}{2n+1}\big) \end{equation}

We note that when \(n\in \mathbb{N}\) is large:

\begin{equation} \ln \big(1+\frac{1}{n}\big) \approx \frac{1}{n} \end{equation}

Now, from (15) it follows that:

\begin{equation} \sum_{n=1}^\infty \frac{1}{2n+1} = \infty \implies \sum_{n=0}^{\infty} \ln \big(1+\frac{1}{2n+1}\big) = \infty \end{equation}

Using (15) we may conclude that (10) is indeed true. In some sense, when \(n\) is large we can expect to observe the expected value with vanishing probability.

## Discussion:

A natural question that follows is whether the above method may be used to handle other cases. Let’s consider \(a_i \sim \mathcal{U}(\{-1,0,1\})\) where:

\begin{equation} \forall n \in \mathbb{N}, P(a_n=1)=P(a_n=0)=P(a_n=-1)=\frac{1}{3} \end{equation}

so we may define:

\begin{equation} u_N = P(\sum_{n=1}^{4N} a_n = 0)= \frac{(4N)!}{3^{4N}} \sum_{k=1}^N \frac{1}{(2k)!^2 (4N-4k)!} \end{equation}

I actually tried to analyse the combinatorics of this sequence but quickly realised that even if I managed to show that this sequence converged to zero, it wasn’t clear how this method would manage to handle the most general setting, the case of all integer dimensions \(N \in \mathbb{N}\), and it didn’t appear to be very effective in terms of the number of calculations per case.

In order to make progress, I decided to model this problem from a different perspective.