Motivation:

A couple of weeks ago I was working on a problem that involved the following second-order approximation for the expected value of a ratio of two random variables:

\begin{equation} \mathbb{E}\big[\frac{X_n}{Z_n}\big] \approx \frac{\mu_{X_n}}{\mu_{Z_n}} - \frac{\mathrm{Cov}(X_n,Z_n)}{\mu_{Z_n}^2} + \frac{\mathrm{Var}(Z_n)\mu_{X_n}}{\mu_{Z_n}^3} \end{equation}

where \(Z_n\) was defined as a sum of \(n\) i.i.d. random variables with a symmetric distribution centred at zero.

Everything about this approximation worked fine in computer simulations where \(n\) was large, but mathematically there appeared to be a problem, since:

\begin{equation} \mathbb{E}\big[Z_n\big] = 0 \end{equation}

Since \(\mu_{Z_n} = 0\) makes the denominators in (1) vanish, yet (2) didn’t appear to be an issue in simulation, I went through the code several times to check whether there was an error but found none. After thinking about the problem for a bit longer, it occurred to me to formalise it and analyse:

\begin{equation} P(\sum_{i=1}^n a_i = 0) \end{equation}

where \(a_i\) are i.i.d. random variables with a uniform distribution centred at zero, so \(\mathbb{E}[a_i]=0\). We may think of this as a measure-theoretic phenomenon in high-dimensional spaces where \(n \in \mathbb{N}\) is our dimension and \(\vec{a} \in \mathbb{R}^n\) is a random vector.

Now, in a previous article I analysed this problem as an infinite series for the special case \(a_i \sim \mathcal{U}(\{-1,1\})\). For the more general case \(a_i \sim \mathcal{U}([-N,N])\), where \([-N,N] \subset \mathbb{Z}\), it occurred to me that modelling the problem as a random walk on \(\mathbb{Z}\) might be an effective approach.

A random walk on \(\mathbb{Z}\):

Let’s suppose \(a_i \sim \mathcal{U}([-N,N])\), where \([-N,N] \subset \mathbb{Z}\) denotes the \(2N+1\) integers \(\{-N,\dots,N\}\). We may then define:

\begin{equation} S_n = \sum_{i=1}^n a_i \end{equation}

By linearity of expectation and the identical distribution of the \(a_i\) we have:

\begin{equation} \mathbb{E}\big[S_n\big]= n \cdot \mathbb{E}\big[a_i\big]=0 \end{equation}

We may now define:

\begin{equation} u_n = P(S_n=0) \end{equation}

and ask whether \(u_n\) is decreasing. In other words, what is the probability that we observe the expected value as \(n\) becomes large?
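A quick Monte Carlo experiment gives a feel for the answer. The sketch below is illustrative only; the function name and the choices of \(N\), \(n\) and the number of trials are arbitrary:

```julia
# Monte Carlo estimate of u_n = P(S_n = 0) for a_i ~ U({-N, ..., N}).
# Illustrative sketch only; N, n and the number of trials are arbitrary choices.
function estimate_u(N::Int, n::Int; trials::Int = 10^6)
    hits = 0
    for _ in 1:trials
        S = sum(rand(-N:N) for _ in 1:n)   # one realisation of S_n
        hits += (S == 0)
    end
    return hits / trials
end

# The estimates suggest u_n is non-increasing in n:
for n in (1, 2, 5, 10, 50)
    println("n = $n:  u_n ≈ ", estimate_u(5, n))
end
```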

Small and large deviations:

It’s useful to observe the following nested structure:

\begin{equation} \forall k \in [0,N], \{\lvert S_n \rvert \leq k\} \subset \{\lvert S_n \rvert \leq k+1 \} \end{equation}

In particular, since \(\{\lvert S_n \rvert > N\}\) is the complement of \(\{\lvert S_n \rvert \leq N\}\), we have:

\begin{equation} P(\lvert S_n \rvert \leq N) + P(\lvert S_n \rvert > N) = 1 \end{equation}

So we are now ready to define the probability of a ‘small’ deviation:

\begin{equation} \alpha_n = P(\lvert S_n \rvert \leq N) \end{equation}

as well as the probability of ‘large’ deviations:

\begin{equation} \beta_n = P(\lvert S_n \rvert > N) \end{equation}

Additional motivation for analysing \(\alpha_n\) and \(\beta_n\) arises from:

\begin{equation} P(S_{n+1} = 0 | \lvert S_n \rvert > N) = 0 \end{equation}

\begin{equation} P(S_{n+1} = 0 | \lvert S_n \rvert \leq N) = \frac{1}{2N+1} \end{equation}

Furthermore, by the law of total probability we have:

\begin{equation} \begin{split} P(S_{n+1}=0) & = P(S_{n+1}=0|\lvert S_n \rvert \leq N) \cdot P(\lvert S_n \rvert \leq N) + P(S_{n+1}=0|\lvert S_n \rvert > N) \cdot P(\lvert S_n \rvert > N) \\
& = P(S_{n+1}=0|\lvert S_n \rvert \leq N) \cdot P(\lvert S_n \rvert \leq N) \\
& = \frac{P(\lvert S_n \rvert \leq N)}{2N+1} \end{split} \end{equation}
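Equation (13) is easy to check numerically. The following sketch (an illustrative check; the values of \(N\) and \(n\) are arbitrary) computes the exact distribution of \(S_n\) by repeated convolution and compares both sides:

```julia
# Exact pmf of S_n for a_i ~ U({-N, ..., N}), computed by repeated convolution.
# Illustrative sketch to check (13): P(S_{n+1} = 0) = P(|S_n| <= N) / (2N + 1).
function pmf(N::Int, n::Int)
    step = Dict(k => 1/(2N + 1) for k in -N:N)
    dist = Dict(0 => 1.0)                      # S_0 = 0 with probability one
    for _ in 1:n
        next = Dict{Int,Float64}()
        for (s, p) in dist, (k, q) in step
            next[s + k] = get(next, s + k, 0.0) + p * q
        end
        dist = next
    end
    return dist
end

N, n = 3, 6
dist_n  = pmf(N, n)
alpha_n = sum(get(dist_n, k, 0.0) for k in -N:N)   # P(|S_n| <= N)
u_next  = get(pmf(N, n + 1), 0, 0.0)               # P(S_{n+1} = 0)
println(u_next ≈ alpha_n / (2N + 1))               # should print true
```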

A remark on symmetry:

It’s useful to note the following alternative definitions of \(\alpha_n\) and \(\beta_n\) that emerge due to symmetries intrinsic to the problem:

\begin{equation} \beta_n = P(\lvert S_n \rvert > N) = 2 \cdot P(S_n > N) = 2 \cdot P(S_n < -N) \end{equation}

\begin{equation} \alpha_n = P(\lvert S_n \rvert \leq N) = 1-2 \cdot P(S_n > N)=1-2 \cdot P(S_n < -N) \end{equation}

The case of \(n=1\) and \(n=2\):

Given that \(S_0=0\):

\begin{equation} P(S_1=0)=\frac{P(\lvert S_0 \rvert \leq N)}{2N+1}= \frac{1}{2N+1} \end{equation}

As for the case of \(n=2\), since \(\lvert S_1 \rvert = \lvert a_1 \rvert \leq N\) with probability one:

\begin{equation} P(\lvert S_1 \rvert \leq N) =1 \implies P(S_2=0) = \frac{1}{2N+1} \end{equation}

The case of \(n=3\):

The case of \(n=3\) requires that we calculate \(P(S_2 > N)\). Given \(S_1 = i\) with \(1 \leq i \leq N\), the event \(S_2 > N\) requires \(a_2 \in \{N-i+1, \dots, N\}\), which has probability \(\frac{i}{2N+1}\), so:

\begin{equation} \begin{split} P(S_{2} > N) & = \sum_{i=1}^N P(S_{2} > N| S_1 = i) \cdot P( S_1 = i) \\
& = \frac{1}{2N+1} \sum_{i=1}^N \frac{i}{2N+1} \\
& = \frac{N \cdot (N+1)}{2 \cdot (2N+1)^2} \end{split} \end{equation}

and using the symmetry relation (15) together with (18) we may derive \(P(\lvert S_{2} \rvert \leq N)\):

\begin{equation} \begin{split} P(\lvert S_{2} \rvert \leq N) & = 1 - 2 \cdot P(S_{2} > N) \\
& = 1- \frac{N \cdot (N+1)}{(2N+1)^2} \\
& = \frac{3N^2+3N+1}{(2N+1)^2} \sim \frac{3}{4} \end{split} \end{equation}

and so for \(n=3\) we have:

\begin{equation} \begin{split} P(S_{3} = 0) & = P(S_{3} = 0 | \lvert S_2 \rvert \leq N) \cdot P(\lvert S_2 \rvert \leq N) \\
& = \frac{3N^2+3N+1}{(2N+1)^3} \sim \frac{3}{8N} \end{split} \end{equation}
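For small \(N\), the formula for \(P(S_3 = 0)\) can be confirmed by brute-force enumeration of all \((2N+1)^3\) equally likely outcomes. A minimal sketch (the function name and the range of \(N\) are arbitrary):

```julia
# Brute-force check of P(S_3 = 0) = (3N^2 + 3N + 1)/(2N + 1)^3
# by enumerating all (2N+1)^3 equally likely outcomes; small N only.
function p3_zero(N::Int)
    hits = count(a + b + c == 0 for a in -N:N, b in -N:N, c in -N:N)
    return hits // (2N + 1)^3
end

for N in 1:5
    println(N, ": ", p3_zero(N) == (3N^2 + 3N + 1) // (2N + 1)^3)   # prints true
end
```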

Average drift or why \(P(S_n=k) > P(S_n = k+1)\):

It’s useful to note that we may decompose \(n\) into:

\begin{equation} n = \hat{n} + n_z \end{equation}

where \(\hat{n}\) counts the nonzero (positive or negative) terms and \(n_z\) counts the zero terms, which contribute nothing to the sum.

For the above reason, it’s convenient to decompose \(S_n\) into:

\begin{equation} S_n = S_n^+ + S_n^{-} \end{equation}

where \(S_n^+\) defines the sum of the positive terms and \(S_n^{-}\) defines the sum of the negative terms.

By grouping the terms in the manner of (22) we may observe that, when \(\hat{n}\) is large, each nonzero step contributes on average a magnitude of:

\begin{equation} \Delta = \frac{N+1}{2} \approx \frac{N}{2} \end{equation}

so that if \(\tau\) positive steps and \(\hat{n}-\tau\) negative steps are taken:

\begin{equation} \mathbb{E}[S_n^+] = \tau \cdot \Delta \end{equation}

\begin{equation} \mathbb{E}[S_n^-] = (\hat{n}-\tau) \cdot (-\Delta) \end{equation}

\begin{equation} \mathbb{E}[S_n] = \mathbb{E}[S_n^+] + \mathbb{E}[S_n^-] = \Delta \cdot (2\tau-\hat{n}) \end{equation}

and we note that:

\begin{equation} \mathbb{E}[S_n] \geq 0 \implies \tau \geq \lfloor \frac{\hat{n}}{2} \rfloor \end{equation}
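For example, with \(\hat{n} = 10\) nonzero steps of which \(\tau = 7\) are positive, the heuristic gives \(\mathbb{E}[S_n] = \Delta \cdot (2 \cdot 7 - 10) = 4\Delta > 0\), consistent with \(\tau = 7 \geq \lfloor \frac{\hat{n}}{2} \rfloor = 5\); taking \(\tau = 5\) instead gives \(\mathbb{E}[S_n] = 0\).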

Furthermore, due to symmetry:

\begin{equation} P(\lvert S_n \rvert =k) > P(\lvert S_n \rvert =k+1) \iff P(S_n =k) > P(S_n =k+1) \end{equation}

so it suffices to demonstrate \(P(S_n =k) > P(S_n =k+1)\).

In order to proceed with our demonstration we choose \(\tau \in [\lfloor \frac{\hat{n}}{2} \rfloor + 1,\hat{n}-1]\) and find that \(P\) is, to leading order, proportional to the binomial distribution:

\begin{equation} P(S_{n} = \lfloor \Delta \cdot (2\tau-\hat{n}) \rfloor) \propto {\hat{n} \choose \tau} \frac{1}{2^{\hat{n}}} \end{equation}

where \(\tau \geq \lfloor \frac{\hat{n}}{2} \rfloor\) implies that:

\begin{equation} \forall k \geq 0, \frac{P(S_n=k)}{P(S_n=k+1)} \sim \frac{(\tau+1)!(\hat{n}-\tau-1)!}{\tau!(\hat{n}-\tau)!} = \frac{\tau+1}{\hat{n}-\tau} > 1 \end{equation}

which holds for all \(n_z \leq n\).

Note: I wrote a Julia function that provides experimental evidence for equation (30).
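The sketch below is not that original function but an illustrative reconstruction: it estimates the frequencies of \(S_n = k\) for \(k = 0, \dots, N\) by simulation, and these tend to decrease in \(k\) as (30) predicts. The parameter choices are arbitrary:

```julia
# Experimental evidence for (30): empirical frequencies of S_n = k for k = 0..N.
# Illustrative reconstruction, not the original function; parameters are arbitrary.
function freqs_near_zero(N::Int, n::Int; trials::Int = 10^6)
    counts = Dict{Int,Int}()
    for _ in 1:trials
        S = sum(rand(-N:N) for _ in 1:n)
        counts[S] = get(counts, S, 0) + 1
    end
    return [get(counts, k, 0) / trials for k in 0:N]
end

# The printed frequencies tend to decrease from k = 0 to k = N:
println(freqs_near_zero(5, 6))
```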

Proof that \(u_n\) is decreasing:

Given (13) we may derive the following ratio:

\begin{equation} \frac{u_{n+1}}{u_n} = \frac{P(\lvert S_n \rvert \leq N)}{(2N+1) \cdot P(S_n = 0)} \end{equation}

So in order to prove that \(u_n\) is decreasing we must show that:

\begin{equation} P(\lvert S_n \rvert \leq N) < (2N+1) \cdot P(S_n=0) \end{equation}

and we note that this follows immediately from (30) since:

\begin{equation} P(\lvert S_n \rvert \leq N) = 2 \sum_{k=1}^N P(S_n=k) + P(S_n=0) < (2N+1) \cdot P(S_n=0) \end{equation}

Proof that \(\lim\limits_{n \to \infty} u_n = \lim\limits_{n \to \infty} \alpha_n = 0\):

Now, given (33) we may define:

\begin{equation} \forall N \in \mathbb{N}, q_n = \frac{P(\lvert S_n \rvert \leq N)}{(2N+1)P(S_n=0)} < 1 \end{equation}

Furthermore, using the binomial approximation (29) together with Stirling’s formula, we may show that when \(n\) is large:

\begin{equation} P(S_{2n}=0) \sim \frac{1}{2^{2n}} {2n \choose n} \sim \frac{1}{2^{2n}} \frac{\sqrt{4 \pi n} (\frac{2n}{e})^{2n}}{2 \pi n (\frac{n}{e})^{2n}} \sim \frac{1}{\sqrt{\pi n}} \end{equation}
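As a quick numerical sanity check of this asymptotic (again just an illustrative sketch, with arbitrary values of \(n\)), the exact central binomial probability can be compared against \(\frac{1}{\sqrt{\pi n}}\):

```julia
# Numerical check of (35): 2^{-2n} * binomial(2n, n) approaches 1/sqrt(pi*n).
for n in (10, 100, 1000)
    exact = binomial(big(2n), big(n)) / big(2)^(2n)   # exact central probability
    println("n = $n:  exact = ", Float64(exact), "  1/sqrt(pi*n) = ", 1/sqrt(pi*n))
end
```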

From (35) we may deduce that \(u_n \to 0\), via the telescoping product:

\begin{equation} \prod_{n=1}^\infty q_n = \prod_{n=1}^\infty \frac{P(S_{n+1}=0)}{P(S_n=0)} = \lim_{n \to \infty} \frac{P(S_{n+1}=0)}{P(S_1=0)} = 0 \end{equation}

Likewise, given that:

\begin{equation} \alpha_n = P(\lvert S_n \rvert \leq N) = (2N+1) \cdot P(S_{n+1}=0) \end{equation}

we may conclude that, as \(n\) becomes large, small deviations become vanishingly rare while large deviations dominate:

\begin{equation} \lim_{n \to \infty} \alpha_n = \lim_{n \to \infty} (2N+1) \cdot P(S_{n+1}=0) = 0 \end{equation}

\begin{equation} \lim_{n \to \infty} \beta_n = \lim_{n \to \infty} P(\lvert S_n \rvert > N) = \lim_{n \to \infty} 1 - \alpha_n = 1 \end{equation}

A geometric interpretation of the last two limits is that the largest hyperplane section of the discrete hypercube occupies a vanishing fraction of the hypercube’s total volume as the dimension tends to infinity.

Discussion:

I find it quite surprising that random structures, in this case a random walk, are useful for analysing high-dimensional systems. But what about the case of uniform distributions on closed intervals of the form \([-N,N] \subset \mathbb{R}\)?

It’s useful to note that \([-N,N]^n \subset \mathbb{R}^n\) defines the hypercube with volume \((2N)^n\), and I suspect that in the continuous setting hypercube geometry and convex analysis might be particularly insightful.