Introduction:

Cramér’s conjecture states that the upper-bound on prime gaps is well-behaved:

\begin{equation} \exists M \in \mathbb{N}, p_{n+1}-p_n \leq M \cdot (\ln p_n)^2 \end{equation}

or equivalently,

\begin{equation} p_{n+1}-p_n = \mathcal{O}((\ln p_n)^2) \end{equation}

A probabilistic proof of Cramér’s conjecture is developed within a frequentist perspective. Other than prior knowledge of Bertrand’s postulate, Mertens’ second theorem and the Prime Number Theorem this proof is entirely self-contained.

The key idea is to show that Cramér’s conjecture holds almost surely.

An upper-bound on prime gaps:

Given that \(\int \ln x dx = x \cdot \ln x + x\) we may define the integrals:

\begin{equation} I_{N+1} = \int_{1}^{P_{N+1}} \ln x dx = P_{N+1} \cdot \ln P_{N+1} - P_{N+1} + 1 \end{equation}

\begin{equation} I_{N} = \int_{1}^{P_{N}} \ln x dx = P_{N} \cdot \ln P_{N} - P_{N} + 1 \end{equation}

where \(P_N\) and \(P_{N+1}\) are consecutive prime numbers.

Now, if we analyse the difference \(0 \leq \Delta I_N = I_{N+1} - I_N\) we find:

\begin{equation} 0 \leq \Delta I_N = P_{N+1} \cdot \ln P_{N+1} - P_{N+1} - (P_{N} \cdot \ln P_{N} - P_{N}) \end{equation}

and from (3) we may deduce:

\begin{equation} 0 \leq P_{N+1} - P_N \leq P_{N+1} \cdot \ln P_{N+1} - P_N \cdot \ln P_N \end{equation}

Meanwhile, the Prime Number Theorem tells us that:

\begin{equation} P_N \sim \pi(P_N) \cdot \ln P_N = N \cdot \ln P_N \end{equation}

from which we may deduce:

\begin{equation} P_N \cdot \ln P_N \sim N \cdot (\ln P_N)^2 \end{equation}

and as Bertrand’s postulate implies \(\ln P_N < \ln P_{N+1} < \ln P_N + 1\) we also have:

\begin{equation} P_{N+1} \cdot \ln P_{N+1} \sim (N+1) \ln (P_{N})^2 \end{equation}

Finally, if we combine (4), (6) and (7) we may conjecture a sensible upper-bound on prime gaps:

\begin{equation} 0 \leq P_{N+1} - P_N \leq (\ln P_N)^2 + \delta \end{equation}

and the remainder of this analysis will show that \(\delta\) is essentially negligible.

A derivation of Cramér’s heuristic, or the typical probability of observing a prime number:

As is done in probabilistic proofs of the Hardy-Ramanujan theorem, we may define the indicator variable \(X_p\) where \(X_p = 1\) if \(x\) is divisible by the prime number \(p\) and \(X_p = 0\) otherwise. From a frequentist perspective:

\begin{equation} \mathbb{E}[X_p] = 1 \cdot \frac{1}{p} + 0 \cdot (1- \frac{1}{p}) = \frac{1}{p} \end{equation}

as numbers divisible by \(p\) occur with frequency \(\frac{1}{p}\).

Given the Nth prime number \(P_N\), we may use the frequentist perspective to define the typical frequency \(P\) with which a number \(x \in [1,P_N]\) is not divisible by any prime in \([1,P_N]\):

\begin{equation} P = \prod_{p \leq P_N} \big(1-\frac{1}{p}\big) \end{equation}

and due to Mertens’ second theorem we have:

\begin{equation} -\ln P \approx \sum_{p \leq P_N} = \ln \ln P_N + \mathcal{O}(1) \end{equation}

which implies that:

\begin{equation} P \approx \frac{1}{\ln P_N} \end{equation}

so a prime number \(p \in [1,P_N] \cap \mathbb{P}\) typically occurs with frequency \(\frac{1}{\ln P_N}\) and the expected number of primes in \([1,P_N]\) scales as follows:

\begin{equation} N \sim \frac{P_N}{\ln P_N} \end{equation}

as there are exactly \(N\) primes in \([1,P_N]\).

A probabilistic proof of Cramér’s theorem:

Bertrand’s postulate implies that \(\ln \frac{p_{n+1}}{p_n} \leq \ln 2\) and therefore the density of the primes in \([P_N, 2 \cdot P_N]\) may be approximated by \(\frac{1}{\ln P_N}\) without affecting the asymptotic analysis that follows.

In particular, let’s consider the events that \(\exists \alpha, \beta, \gamma \in \mathbb{R}_{+}^3\) such that:

\begin{equation} A_N : \exists p_{n+1} \in [P_N + 1, 2 \cdot P_N], p_{n+1}-p_n \geq \alpha \cdot (\ln p_n)^2 + \beta \cdot \ln p_n + \gamma \end{equation}

\begin{equation} B_N : \mathbb{P} \cap [P_N, 2 \cdot P_N] \neq \emptyset \end{equation}

Due to Bertrand’s postulate, we know that \(P(B_N) = 1\) and since we also have \(A_N \implies B_N\) we have:

\begin{equation} A_N \subset B_N \end{equation}

and therefore:

\begin{equation} P(A_N|B_N) = \frac{P(A_N \cap B_N)}{P(B_N)} = P(A_N \cap B_N) = P(A_N) \end{equation}

It follows that we may define the upper-bound:

\begin{equation} P(A_N |B_N) \leq P_N \cdot P(A_N) \end{equation}

\begin{equation} P(A_N) = (1- \frac{1}{\ln P_N})^{\alpha \cdot (\ln P_N)^2} \cdot (1- \frac{1}{\ln P_N})^{\beta \cdot \ln P_N} \cdot (1- \frac{1}{\ln P_N})^{\gamma} \end{equation}

where \(P(A_N) \approx \frac{e^{-\beta}}{(P_N)^{\alpha}} \leq \big(\frac{1}{P_N}\big)^{\alpha}\).

Now, in order to consider any \(\alpha \in (1,\infty)\) we’ll use the fact that:

\begin{equation} \int_{1}^{\infty} e^{-x \ln c} dx = \frac{1}{c \ln c} \end{equation}

so we have:

\begin{equation} P(A_N | B_N) \leq P_N \cdot (\frac{1}{P_N \cdot \ln P_N}) = \frac{1}{\ln P_N} \end{equation}

so we may deduce that:

\begin{equation} \lim_{N \to \infty} P(A_N | B_N) \leq \lim_{N \to \infty} \frac{1}{\ln P_N} = 0 \end{equation}

so \(A_N\) is a measure zero event.

References:

  1. Cramér, H. “On the Order of Magnitude of the Difference Between Consecutive Prime Numbers.” Acta Arith. 2, 23-46, 1936.

  2. Hardy, G. H.; Ramanujan, S. (1917), “The normal number of prime factors of a number n”, Quarterly Journal of Mathematics

  3. Turán, Pál (1934), “On a theorem of Hardy and Ramanujan”, Journal of the London Mathematical Society

  4. Yufei Zhao. The Probabilistic Method in Combinatorics. 2019.

  5. F. Mertens. J. reine angew. Math. 78 (1874)