# Probability in High Dimension Part I

## Motivation:

A couple weeks ago I was working on a problem that involved the expected value of a ratio of two random variables:

\begin{equation} \mathbb{E}\big[\frac{X_n}{Z_n}\big] \approx \frac{\mu_{X_n}}{\mu_{Z_n}} - \frac{\mathrm{Cov}(X_n,Z_n)}{\mu_{Z_n}^2} + \frac{\mathrm{Var(Z_n)}\mu_{X_n}}{\mu_{Z_n}^3} \end{equation}

where was a sum of i.i.d. random variables with a symmetric distribution centred at zero.

Everything about this approximation worked fine in computer simulations where was large but mathematically there appeared to be a problem since:

\begin{equation} \mathbb{E}\big[Z_n\big] = 0 \end{equation}

Given that (2) didn’t appear to be an issue in simulation, I went through the code several times to check whether there was an error but found none. After thinking about the problem for a bit longer it occurred to me to formalise the problem and analyse:

\begin{equation} P(\sum_{n=1}^N a_n = 0) \end{equation}

where are i.i.d. random variables with a uniform distribution centred at zero so . My intuition suggested that under relatively weak assumptions:

\begin{equation} \lim_{N \to \infty} P(\sum_{n=1}^N a_n = 0) = 0 \end{equation}

We may think of this as a measure-theoretic phenomenon in high-dimensional spaces where is our dimension and is a random vector.

## Analysis of a special case:

Given that (3) is a very general problem, I decided to start by analysing the special case of where:

\begin{equation} \forall n \in \mathbb{N}, P(a_n=1)=P(a_n=-1)=\frac{1}{2} \end{equation}

\begin{equation} S_0 = \{ (a_n)_{n=1}^N \in \{-1,1\} : \sum_n a_n = 0\} \end{equation}

Knowing that is non-empty only if we have parity of positive and negative terms, we may deduce that:

\begin{equation} S_0 \neq \emptyset \iff N \in 2\mathbb{N} \end{equation}

For the above reason, I focused my analysis on the following sequence:

\begin{equation} u_N = P(\sum_{n=1}^{2N} a_n = 0)= \frac{2N \choose N}{2^{2N}} = \frac{(2N)!}{2^{2N}(N!)^2} \end{equation}

## Proof that is decreasing:

We can demonstrate that is strictly decreasing by considering the ratio:

\begin{equation} \frac{u_{N+1}}{u_N}=\frac{\frac{(2N+2)!}{2^{2N+2}((N+1)!)^2}}{\frac{(2N)!}{2^{2N}(N!)^2}}=\frac{(2N+2)(2N+1)}{4(N+1)^2}=\frac{2N+1}{2N+2} < 1 \end{equation}

Now, with (9) we have what is necessary to show that:

\begin{equation} \lim_{n \to \infty} u_N = 0 \end{equation}

## Analysis of the limit :

Using (9) we may derive a recursive definition of :

\begin{equation} u_{N+1}=\frac{2N+1}{2N+2} \cdot u_N \end{equation}

and given that we have:

\begin{equation} u_{N}=\prod_{n=0}^{N-1} \frac{2n+1}{2n+2}= \frac{1}{3} \cdot \frac{3}{4} \cdot \frac{5}{6} \cdot … \end{equation}

At this point we can make the useful observation:

\begin{equation} \lim_{N \to \infty} u_N = 0 \implies \lim_{N \to \infty} - \ln u_N = \infty \end{equation}

## Proof that :

By combining (12) and (13) we find that:

\begin{equation} -\ln u_N = -\ln \prod_{n=0}^{N-1} \frac{2n+1}{2n+2}= \sum_{n=0}^{N-1} \ln \frac{2n+2}{2n+1}= \sum_{n=0}^{N-1} \ln \big(1+\frac{1}{2n+1}\big) \end{equation}

We note that when is large:

\begin{equation} \ln \big(1+\frac{1}{n}\big) \approx \frac{1}{n} \end{equation}

Now, from (15) it follows that:

\begin{equation} \sum_{n=1}^\infty \frac{1}{2n+1} = \infty \implies \sum_{n=0}^{\infty} \ln \big(1+\frac{1}{2n+1}\big) = \infty \end{equation}

Using (15) we may conclude that (10) is indeed true. In some sense, when is large we can expect to observe the expected value with vanishing probability.

## Discussion:

A natural question that follows is whether the above method may be used to handle other cases. Let’s consider where:

\begin{equation} \forall n \in \mathbb{N}, P(a_n=1)=P(a_n=0)=P(a_n=-1)=\frac{1}{3} \end{equation}

so we may define:

\begin{equation} u_N = P(\sum_{n=1}^{4N} a_n = 0)= \frac{(4N)!}{3^{4N}} \sum_{k=1}^N \frac{1}{(2k)!^2 (4N-4k)!} \end{equation}

I actually tried to analyse the combinatorics of this sequence but quickly realised that even if I managed to show that this sequence converged to zero, it wasn’t clear how this method would manage to handle the most general setting, the case of all integer dimensions , and it didn’t appear to be very effective in terms of the number of calculations per case.

In order to make progress, I decided to model this problem from a different perspective.