Motivation:

A couple weeks ago I was working on a problem that involved the expected value of a ratio of two random variables:

\begin{equation} \mathbb{E}\big[\frac{X_n}{Z_n}\big] \approx \frac{\mu_{X_n}}{\mu_{Z_n}} - \frac{\mathrm{Cov}(X_n,Z_n)}{\mu_{Z_n}^2} + \frac{\mathrm{Var}(Z_n)\mu_{X_n}}{\mu_{Z_n}^3} \end{equation}

where $Z_n$ was defined as a sum of i.i.d. random variables with a symmetric distribution centred at zero.

Everything about this approximation worked fine in computer simulations where $n$ was large, but mathematically there appeared to be a problem since:

\begin{equation} \mathbb{E}\big[Z_n\big] = 0 \end{equation}

Given that (2) didn’t appear to be an issue in simulation, I went through the code several times to check whether there was an error but found none. After thinking about the problem for a bit longer it occurred to me to formalise the problem and analyse:

\begin{equation} P(\sum_{n=1}^N a_n = 0) \end{equation}

where $a_n$ are i.i.d. random variables with a uniform distribution centred at zero so $\mathbb{E}[a_n] = 0$. We may think of this as a measure-theoretic phenomenon in high-dimensional spaces where $N$ is our dimension and $\vec{a} = (a_1, \dots, a_N)$ is a random vector.

Now, while in a previous article I analysed this problem as an infinite series in a special case, for the more general case it occurred to me that modelling the problem as a random walk on $\mathbb{Z}$ might be an effective approach.

A random walk on $\mathbb{Z}$:

Let’s suppose $a_i \sim \mathcal{U}(\{-N, \dots, N\})$, the uniform distribution on the integers $\{-N, \dots, N\}$, where $N \in \mathbb{N}$. We may then define:

\begin{equation} S_n = \sum_{i=1}^n a_i \end{equation}

Due to the i.i.d. assumption we have:

\begin{equation} \mathbb{E}\big[S_n\big]= n \cdot \mathbb{E}\big[a_i\big]=0 \end{equation}

We may now define:

\begin{equation} u_n = P(S_n=0) \end{equation}

and ask whether $u_n$ is decreasing. In other words, what is the probability that we observe the expected value as $n$ becomes large?

Small and large deviations:

It’s useful to observe the following nested structure:

\begin{equation} \forall k \in [0,N], \{\lvert S_n \rvert \leq k\} \subset \{\lvert S_n \rvert \leq k+1 \} \end{equation}

Since $\{\lvert S_n \rvert \leq N\}$ and $\{\lvert S_n \rvert > N\}$ are complementary events, we have:

\begin{equation} P(\lvert S_n \rvert \leq N) + P(\lvert S_n \rvert > N) = 1 \end{equation}

So we are now ready to define the probability of a ‘small’ deviation:

\begin{equation} \alpha_n = P(\lvert S_n \rvert \leq N) \end{equation}

as well as the probability of ‘large’ deviations:

\begin{equation} \beta_n = P(\lvert S_n \rvert > N) \end{equation}

Additional motivation for analysing $\alpha_n$ and $\beta_n$ arises from:

\begin{equation} P(S_{n+1} = 0 | \lvert S_n \rvert > N) = 0 \end{equation}

\begin{equation} P(S_{n+1} = 0 | \lvert S_n \rvert \leq N) = \frac{1}{2N+1} \end{equation}

Furthermore, by the law of total probability we have:

\begin{equation} \begin{split} P(S_{n+1}=0) & = P(S_{n+1}=0|\lvert S_n \rvert \leq N) \cdot P(\lvert S_n \rvert \leq N) + P(S_{n+1}=0|\lvert S_n \rvert > N) \cdot P(\lvert S_n \rvert > N) \\ & = P(S_{n+1}=0|\lvert S_n \rvert \leq N) \cdot P(\lvert S_n \rvert \leq N) \\ & = \frac{P(\lvert S_n \rvert \leq N)}{2N+1} \end{split} \end{equation}
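
To see (13) in action, here is a minimal Monte Carlo sketch in Julia that estimates both sides of the recurrence; the function name and all parameters are illustrative choices, not part of the original analysis:

```julia
# Monte Carlo sketch of the recurrence (13): estimate P(S_{n+1} = 0) directly
# and via P(|S_n| <= N)/(2N+1). Names and parameters are illustrative.
function check_recurrence(N::Int, n::Int; trials::Int = 10^6)
    hits_zero = 0   # occurrences of S_{n+1} = 0
    hits_band = 0   # occurrences of |S_n| <= N
    for _ in 1:trials
        Sn = sum(rand(-N:N, n))            # S_n: n i.i.d. uniform steps on -N:N
        hits_band += abs(Sn) <= N
        hits_zero += Sn + rand(-N:N) == 0  # take one more step and test S_{n+1} = 0
    end
    return hits_zero / trials, hits_band / trials / (2N + 1)
end

# e.g. check_recurrence(2, 4) returns two estimates that agree up to Monte Carlo error
```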

A remark on symmetry:

It’s useful to note the following alternative definitions of $\beta_n$ and $\alpha_n$ that emerge due to symmetries intrinsic to the problem:

\begin{equation} \beta_n = P(\lvert S_n \rvert > N) = 2 \cdot P(S_n > N) = 2 \cdot P(S_n < -N) \end{equation}

\begin{equation} \alpha_n = P(\lvert S_n \rvert \leq N) = 1-2 \cdot P(S_n > N)=1-2 \cdot P(S_n < -N) \end{equation}

The case of $n = 1$ and $n = 2$:

Given that $S_0 = 0$ so $P(\lvert S_0 \rvert \leq N) = 1$:

\begin{equation} P(S_1=0)=\frac{P(\lvert S_0 \rvert \leq N)}{2N+1}= \frac{1}{2N+1} \end{equation}

As for the case of $n = 2$:

\begin{equation} P(\lvert S_1 \rvert \leq N) = 1 \implies P(S_2=0) = \frac{1}{2N+1} \end{equation}

The case of $n = 3$:

The case of $n = 3$ requires that we calculate $P(\lvert S_2 \rvert \leq N)$ since:

\begin{equation} P(S_3=0)=\frac{P(\lvert S_2 \rvert \leq N)}{2N+1} \end{equation}

Conditioning on $S_1$ and using the symmetry of the step distribution:

\begin{equation} \begin{split} P(S_{2} > N) & = \sum_{i=1}^N P(S_{2} > N | S_1 = i) \cdot P(S_1 = i) \\ & = \frac{1}{2N+1} \left(\frac{1}{2N+1} + \dots + \frac{N}{2N+1}\right) \\ & = \frac{N \cdot (N+1)}{2 \cdot (2N+1)^2} \end{split} \end{equation}

and using (15) and (19) we may derive $P(\lvert S_2 \rvert \leq N)$:

\begin{equation} \begin{split} P(\lvert S_{2} \rvert \leq N) & = 1 - 2 \cdot P(S_{2} > N) \\ & = 1- \frac{N \cdot (N+1)}{(2N+1)^2} \\ & = \frac{3N^2+3N+1}{(2N+1)^2} \sim \frac{3}{4} \end{split} \end{equation}

and so for $n = 3$ we have:

\begin{equation} \begin{split} P(S_{3} = 0) & = P(S_{3} = 0 | \lvert S_2 \rvert \leq N) \cdot P(\lvert S_2 \rvert \leq N) \\ & = \frac{3N^2+3N+1}{(2N+1)^3} \sim \frac{3}{8N} \end{split} \end{equation}
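
Since closed forms like these are easy to get wrong, a brute-force enumeration makes a useful sanity check. The sketch below, with illustrative names of my own, enumerates all step pairs and triples for a small $N$ and compares against (19) and (21):

```julia
# Exact enumeration over all step pairs/triples for a small N, compared with
# the closed forms in (19) and (21). All names here are illustrative.
function check_small_cases(N::Int)
    steps = -N:N
    # P(S_2 > N): proportion of pairs whose sum exceeds N
    p2 = count(a + b > N for a in steps, b in steps) / (2N + 1)^2
    # P(S_3 = 0): proportion of triples summing to zero
    p3 = count(a + b + c == 0 for a in steps, b in steps, c in steps) / (2N + 1)^3
    return (enumerated = (p2, p3),
            closed = (N * (N + 1) / (2 * (2N + 1)^2), (3N^2 + 3N + 1) / (2N + 1)^3))
end

# e.g. check_small_cases(3) shows matching enumerated and closed-form values
```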

Average drift, or why $P(\lvert S_n \rvert = k) > P(\lvert S_n \rvert = k+1)$:

It’s useful to note that we may decompose $n$ into:

\begin{equation} n = \hat{n} + n_z \end{equation}

where $\hat{n}$ represents the total number of positive and negative terms, ignoring the null contribution of the $n_z$ zero terms $a_i = 0$.

For the above reason, it’s convenient to decompose $S_n$ into:

\begin{equation} S_n = S_n^+ + S_n^{-} \end{equation}

where $S_n^+$ defines the sum of the positive terms and $S_n^{-}$ defines the sum of the negative terms.

By grouping the terms in the manner of (23) we may observe that when $N$ is large the average positive/negative step length is approximately:

\begin{equation} \Delta = \frac{N}{2} \end{equation}

so that if $\tau$ positive steps and $\hat{n} - \tau$ negative steps are taken:

\begin{equation} \mathbb{E}[S_n^+] = \tau \cdot \Delta \end{equation}

\begin{equation} \mathbb{E}[S_n^-] = (\hat{n}-\tau) \cdot (-\Delta) \end{equation}

\begin{equation} \mathbb{E}[S_n] = \mathbb{E}[S_n^+] + \mathbb{E}[S_n^-] = \Delta \cdot (2\tau-\hat{n}) \end{equation}

and we note that:

\begin{equation} \mathbb{E}[S_n] \geq 0 \implies \tau \geq \lfloor \frac{\hat{n}}{2} \rfloor \end{equation}
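
As a quick empirical check of (27), the sketch below (all names and parameters are my own) conditions on the step counts and compares the empirical mean of $S_n$ with $\Delta \cdot (2\tau - \hat{n})$; the match is only approximate because the exact mean positive step length is $\frac{N+1}{2}$ rather than $\frac{N}{2}$:

```julia
# Condition on tau positive and nhat - tau negative steps, then compare the
# empirical mean of S_n with Delta*(2*tau - nhat) from (27). Agreement is
# approximate for small N since the exact mean step length is (N+1)/2.
function drift_check(N::Int, nhat::Int, tau::Int; trials::Int = 10^5)
    delta = N / 2
    total = 0.0
    for _ in 1:trials
        pos = sum(rand(1:N, tau))            # tau positive steps on 1:N
        neg = sum(rand(-N:-1, nhat - tau))   # nhat - tau negative steps on -N:-1
        total += pos + neg
    end
    return total / trials, delta * (2tau - nhat)
end

# e.g. drift_check(20, 10, 7) returns roughly (42.0, 40.0)
```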

Furthermore, due to symmetry:

\begin{equation} P(\lvert S_n \rvert =k) > P(\lvert S_n \rvert =k+1) \iff P(S_n =k) > P(S_n =k+1) \end{equation}

so it suffices to demonstrate that $\forall k \geq 0, P(S_n = k) > P(S_n = k+1)$.

In order to proceed with our demonstration we choose $k = \lfloor \Delta \cdot (2\tau-\hat{n}) \rfloor$ and find that $P(S_n = k)$ has a monotone relationship with the binomial distribution:

\begin{equation} P(S_{n} = \lfloor \Delta \cdot (2\tau-\hat{n}) \rfloor) \propto {\hat{n} \choose \tau} \frac{1}{2^{\hat{n}}} \end{equation}

where $\tau \geq \lfloor \frac{\hat{n}}{2} \rfloor$ implies that:

\begin{equation} \forall k \geq 0, \frac{P(S_n=k)}{P(S_n=k+1)} \sim \frac{(\tau+1)!(\hat{n}-\tau-1)!}{\tau!(\hat{n}-\tau)!} = \frac{\tau+1}{\hat{n}-\tau} > 1 \end{equation}

which holds whenever $\tau \geq \frac{\hat{n}}{2}$.

Note: I wrote a Julia function that provides experimental evidence for equation (30); a sketch of a similar experiment is given below.
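
A minimal reconstruction of such an experiment (not the original function; names and structure are mine) computes the exact distribution of $S_n$ by repeated convolution and checks the decreasing behaviour implied by (30) and (31):

```julia
# Exact distribution of S_n via repeated convolution of the uniform step
# distribution on -N:N, then a check that P(S_n = k) decreases for k >= 0.
# This is a reconstruction; names and structure are illustrative.
function walk_distribution(N::Int, n::Int)
    step = fill(1 / (2N + 1), 2N + 1)   # uniform step on -N:N
    dist = [1.0]                        # S_0 = 0 with probability 1
    for _ in 1:n                        # convolve n times
        dist = [sum(dist[j] * step[i - j + 1]
                    for j in max(1, i - 2N):min(length(dist), i))
                for i in 1:(length(dist) + 2N)]
    end
    return dist                         # dist[n*N + 1 + k] = P(S_n = k)
end

function decreasing_from_zero(N::Int, n::Int)
    dist = walk_distribution(N, n)
    tail = dist[(n * N + 1):end]        # P(S_n = 0), P(S_n = 1), ..., P(S_n = nN)
    return all(tail[k] > tail[k + 1] for k in 1:(length(tail) - 1))
end

# e.g. decreasing_from_zero(3, 2) returns true; n = 1 is flat (a single uniform
# step), so the strict decrease only appears from n = 2 onwards
```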

Proof that $u_n$ is decreasing:

Given (13) we may derive the following ratio:

\begin{equation} \frac{u_{n+1}}{u_n} = \frac{P(\lvert S_n \rvert \leq N)}{(2N+1) \cdot P(S_n = 0)} \end{equation}

So in order to prove that $u_n$ is decreasing we must show that:

\begin{equation} P(\lvert S_n \rvert \leq N) < (2N+1) \cdot P(S_n=0) \end{equation}

and we note that this follows immediately from (31) since:

\begin{equation} P(\lvert S_n \rvert \leq N) = 2 \sum_{k=1}^N P(S_n=k) + P(S_n=0) < (2N+1) \cdot P(S_n=0) \end{equation}

Proof that $\lim_{n \to \infty} u_n = 0$:

Now, given (34) we may define:

\begin{equation} \forall N \in \mathbb{N}, q_n = \frac{P(\lvert S_n \rvert \leq N)}{(2N+1)P(S_n=0)} < 1 \end{equation}

Furthermore, using the concentration phenomenon (30) we may show that when $n$ is large:

\begin{equation} P(S_{2n}=0) \sim \frac{1}{2^{2n}} {2n \choose n} \sim \frac{1}{2^{2n}} \frac{\sqrt{4 \pi n} (\frac{2n}{e})^{2n}}{2 \pi n (\frac{n}{e})^{2n}} \sim \frac{1}{\sqrt{\pi n}} \end{equation}
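
This asymptotic is easy to check numerically; the scaled ratio below (an illustrative one-liner, not from the article) should approach 1 as $n$ grows:

```julia
# Check the Stirling asymptotic (36): binomial(2n, n)/4^n ~ 1/sqrt(pi*n),
# so the scaled ratio below should tend to 1 as n grows.
stirling_ratio(n) = Float64(binomial(big(2n), big(n)) / big(4)^n * sqrt(pi * n))

# e.g. stirling_ratio(10) ≈ 0.9876 and stirling_ratio(1000) ≈ 0.9999
```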

From this asymptotic we may deduce that $u_n$ tends to zero:

\begin{equation} \prod_{n=1}^\infty q_n = \prod_{n=1}^\infty \frac{P(S_{n+1}=0)}{P(S_n=0)} = \lim_{n \to \infty} \frac{P(S_{n+1}=0)}{P(S_1=0)} = 0 \end{equation}

Likewise, given that:

\begin{equation} \alpha_n = P(\lvert S_n \rvert \leq N) = (2N+1) \cdot P(S_{n+1}=0) \end{equation}

we may conclude that large deviations become typical as $n$ becomes large:

\begin{equation} \lim_{n \to \infty} \alpha_n = \lim_{n \to \infty} (2N+1) \cdot P(S_{n+1}=0) = 0 \end{equation}

\begin{equation} \lim_{n \to \infty} \beta_n = \lim_{n \to \infty} P(\lvert S_n \rvert > N) = \lim_{n \to \infty} 1 - \alpha_n = 1 \end{equation}
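
Both limits are easy to watch numerically by reusing the hypothetical `walk_distribution` sketch from above:

```julia
# Exact alpha_n = P(|S_n| <= N) and beta_n = 1 - alpha_n via the convolution
# sketch above: alpha_n drifts toward 0 and beta_n toward 1 as n grows.
function alpha_beta(N::Int, n::Int)
    dist = walk_distribution(N, n)
    center = n * N + 1                            # index of P(S_n = 0)
    alpha = sum(dist[(center - N):(center + N)])  # mass of the band |S_n| <= N
    return alpha, 1 - alpha
end

# e.g. alpha_beta(3, 1) ≈ (1.0, 0.0), and alpha_n then shrinks roughly like
# 1/sqrt(n) as n grows
```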

One interpretation of the last two limits is that although the central hyperplane section of the discrete hypercube always carries more mass than any parallel section, the ratio of this mass to the total mass of the hypercube shrinks as the dimension of the hypercube tends to infinity.

Discussion:

I find it quite surprising that random structures, in this case a random walk, are useful for analysing high-dimensional systems. But what about the case of uniform distributions on closed intervals of the form $[-N, N]$?

It’s useful to note that $[-N, N]^N$ defines the hypercube with volume $(2N)^N$ and I suspect that in the continuous setting, hypercube geometry and convex analysis might be particularly insightful.