# What is the probability that the sun will rise tomorrow?

The sunrise problem is a question first considered by Laplace: what is the probability that the sun will rise tomorrow, given a history of sunrises? Though this may seem like a silly problem, it can serve to illustrate fundamental differences between Frequentists and Bayesians.
While Frequentists use probability only to model processes broadly described as ‘sampling’, Bayesians use probability to model both sampling and their ‘degree of belief’.

First, let’s consider the Frequentist approach. Now, the Frequentist guy has to
cheat in some way, as this problem isn’t well-defined in the Frequentist
framework: ‘tomorrow’ is a sample of size one (for which the sample standard
deviation is undefined). Some Frequentists try to define this probability by assuming that
there are many worlds with a sun potentially rising on each. But this is a really
silly bastardization of Laplace’s principle of insufficient reason. In all honesty,
the best this guy can do is to calculate the probability that the sun will rise on
any day, and not the probability that the sun rises on a particular day. Here we go:

1) Let’s assume that this phenomenon can be modeled as $n$ i.i.d. Bernoulli trials, so that $X$, the sum of $n$
Bernoulli random variables, follows a binomial distribution and represents the number of days that the sun
rises out of $n$ observations.

2) By the Law of Large Numbers, $\hat{p}=X/n$ converges to $p$, where $p$ is the probability that the sun rises on any day and $\hat{p}$ is our estimate of this probability. In other words, the observed count $X=n \hat{p}$ tracks the expected
number of sunrises, $E[X]=np$.

3) By the Central Limit Theorem, for large $n$ the number of sunrises $X$ is approximately normally distributed with mean $np$ and variance $np(1-p)$.

4) Furthermore, for large $n$ we may construct confidence intervals
with coverage approximately $1-\alpha$:

$\displaystyle\hat{p}\pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

where $z_{\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution. So, after a large number of sunrises, the Frequentist can give a reasonable answer for the probability that the sun would rise on any day, provided that his assumptions hold true.
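To make the interval concrete, here is a minimal Python sketch of the normal-approximation formula above. The function name `wald_interval` and the record of 9,990 sunrises in 10,000 days are made up for illustration. The last call shows a quirk of this interval: with an unbroken record of sunrises, $\hat{p}=1$ and the interval collapses to a single point.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(sunrises, n, alpha=0.05):
    """Normal-approximation interval for p from `sunrises` successes in n days."""
    p_hat = sunrises / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # the 100(1 - alpha/2)th percentile
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1] since p is a probability.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical record: 9,990 sunrises observed in 10,000 days.
low, high = wald_interval(9_990, 10_000)
print(low, high)

# An unbroken record gives p_hat = 1, so the estimated variance is 0
# and the interval collapses to the single point 1.
degenerate = wald_interval(10_000, 10_000)
print(degenerate)
```

So with a perfect history of sunrises, this particular recipe reports no uncertainty at all, which is one reason the normal approximation is a poor fit near $p=1$.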

Now, compare this with the Bayesian solution which allows consideration of such questions:

1) $X$ is defined exactly as the Frequentist had it defined, but in addition we can
define $Y$, the indicator of the event that the sun rises tomorrow, where $Y$ equals 1 or 0.

2) Let $\theta$ be the probability of a sunrise on any given day.

3) We assume that before observing $X$ we had no prior information concerning
$\theta$. Hence, by the principle of insufficient reason we may assume that
our prior is uniformly distributed on $[0,1]$.

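4) Now for the calculation of $P(Y=1|X=n)$, where $X=n$ means that the sun rose on all $n$ observed days. With a uniform prior, the posterior is proportional to the likelihood $P(X=n|\theta)=\theta^{n}$, so
$P(Y=1| X=n)=\int^1_0 P(Y=1|\theta)P(\theta|X=n)\,d\theta= \int^1_0 \theta \frac{ P(X=n|\theta)}{\int^1_0 P(X=n|\theta)\,d\theta} \,d\theta = \frac{\int^1_0 \theta^{n+1}\,d\theta}{\int^1_0 \theta^{n}\,d\theta} = \frac{n+1}{n+2}$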
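As a sanity check on the closed form $\frac{n+1}{n+2}$, here is a small Python sketch that compares it with a numerical version of the integral. It assumes the uniform prior from step 3, so that after $n$ sunrises in $n$ days the posterior is proportional to $\theta^{n}$. The function names and the grid size are my own choices.

```python
from fractions import Fraction


def rule_of_succession(n):
    """Exact P(Y=1 | X=n) under a uniform prior: (n + 1) / (n + 2)."""
    return Fraction(n + 1, n + 2)


def posterior_predictive_numeric(n, grid=100_000):
    """Midpoint-rule estimate of the ratio of integrals of theta^(n+1) and theta^n on [0, 1]."""
    thetas = [(k + 0.5) / grid for k in range(grid)]
    numerator = sum(t ** (n + 1) for t in thetas)
    denominator = sum(t ** n for t in thetas)
    return numerator / denominator  # the common factor 1/grid cancels


for n in (0, 1, 10, 100):
    exact = rule_of_succession(n)
    approx = posterior_predictive_numeric(n)
    print(n, exact, float(exact), approx)
```

Note that with no observations at all ($n=0$) the answer is $\frac{1}{2}$, which is just the mean of the uniform prior.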

Note: the Bayesian approach isn’t better than the Frequentist solution. But building a statistical model without any domain knowledge is doomed to failure whatever your approach: Bayesian, Frequentist, or otherwise.

# Why do stones tossed into a pond form circular waves regardless of their shape?

Sometime in September I went for a walk around Inverleith Park in
Edinburgh. As I walked past the pond and observed ripples in the
water, I thought about something that I took for granted since I was a child.
For as long as I could remember, stones thrown into water would create waves
that would form concentric rings around the stone…but how exactly did this happen?

As I tossed a few stones into the water, I noticed that the ripples eventually (by the 5th
ripple) converged to circular rings regardless of the shape of the stone. I could
only observe the ripples on the surface but I conjectured that the ripples must
form spherical shells around the location of impact. Now I wondered why this
was so…but I didn’t know anything about fluid dynamics. So I decided to use
the internet to find the answer.

Sure enough, I found that somebody had already asked a similar question on the Physics StackExchange. Here’s my summary of the discussion:

The pond water may be approximated as a homogeneous fluid, and the water
waves are surface waves that travel outward through the fluid. If we assume that
the stone is released at a trajectory that is normal with respect to the surface
of the water, the 1st wave front should travel at nearly constant speed in all
directions that are normal with respect to the submerged surface of the
stone. And the magnitude of this velocity would be proportional to its
momentum upon impact.

If we measure the difference in radii with respect to the centroid of the stone ($\Delta$), we would obtain the following inequality: $m \leq \Delta \leq M$, where $m$ and $M$ are the smallest and largest such differences.

Now, as the initial wave front travels a larger distance, this distance becomes
much larger than the largest difference in radii, $M$.
This is why we eventually perceive a circle. As a matter of fact, if we
analyse the distance travelled by the first wave front as a function of
its number of cycles $n$, we may derive the following ratio:

$\frac{Cn+m}{Cn} \leq f(n) \leq \frac{Cn + M}{Cn}$, where $Cn$ is the distance travelled after $n$ cycles. Both bounds tend to $1$, and from this we may deduce that $\lim_{n \to +\infty}f(n) = 1$
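To see how quickly the two bounds on $f(n)$ close in on $1$, here is a tiny Python sketch. The values of `C`, `m` and `M` are made up purely for illustration and are not measurements from the pond.

```python
def ratio_bounds(n, C=1.0, m=0.1, M=0.5):
    """Lower and upper bounds on f(n) after n cycles (C, m, M are illustrative)."""
    travelled = C * n
    return (travelled + m) / travelled, (travelled + M) / travelled


for n in (1, 5, 50, 500):
    lower, upper = ratio_bounds(n)
    print(n, lower, upper)
```

The gap between the bounds is $\frac{M-m}{Cn}$, so it shrinks like $\frac{1}{n}$ whatever the shape of the stone.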

This may appear to be superficially similar to the Huygens-Fresnel principle,
but the surface of the pond inside the expanding wave is not perfectly calm
after the main wave has passed through as this principle would require.

# Strong Law of Large Numbers

Here I present a very useful version of the strong law of large numbers
where we assume that a random variable $X$ has finite
variance. Briefly, the Strong Law of Large Numbers states that the sample
average converges almost surely to the expectation
of the random variable $X$ as $n \longrightarrow \infty$. In some cases, where we don’t have a strict upper bound for $|X|$, finite variance may not be a good assumption; this is the case for many distributions encountered in Economics or Finance, such as heavy-tailed Pareto distributions. But I dare say that for most scientists this assumption holds for most of the distributions that they deal with.

Lemma: If $\sum_{i=1}^{\infty} E[|X_{i}|^{s}] < \infty$ for some $s > 0$, then $X_{n} \longrightarrow 0$ almost surely.

Proof:
By the monotone convergence theorem, $E[\sum_{i=1}^{\infty} |X_{i}|^{s}] = \sum_{i=1}^{\infty} E[|X_{i}|^{s}] < \infty$, which implies that $\sum_{i=1}^{\infty} |X_{i}|^{s}$ is finite with probability 1. The terms of a convergent series tend to zero, so $|X_{n}|^{s}\longrightarrow 0$ almost surely, which also implies that $X_{n} \longrightarrow 0$ almost surely.

Now we’re ready to proceed with the proof. But, first we must rigorously state
the version of the Strong Law of Large Numbers to be proven.

Theorem:  Let $X_{1}, X_{2}, ...$ be i.i.d. random variables and assume that $E[|X|^{2}] < \infty$. Let $S_{n} = \sum_{i=1}^{n} X_{i}$, then $\frac{S_{n}}{n}$ converges almost surely to $E[X]$.

Proof:
Assuming that $E[|X|^{2}] < \infty$ and writing $\mu = E[X]$, we have $E[(\frac{S_{n}}{n}-\mu)^{2}] = \frac{var(X)}{n}$.
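
For completeness, here is the short calculation behind this identity, written with $\mu = E[X]$ and using only that the $X_{i}$ are independent with a common variance:

$E[(\frac{S_{n}}{n}-\mu)^{2}] = var(\frac{S_{n}}{n}) = \frac{1}{n^{2}} var(\sum_{i=1}^{n} X_{i}) = \frac{1}{n^{2}} \sum_{i=1}^{n} var(X_{i}) = \frac{var(X)}{n}$

The first equality holds because $E[\frac{S_{n}}{n}] = \mu$, and the third because the variance of a sum of independent random variables is the sum of their variances.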

If we only consider values of $n$ that are perfect squares, $n = i^{2}$, we obtain
$\sum_{i=1}^{\infty} E[(\frac{S_{i^{2}}}{i^{2}}-\mu)^{2}] = \sum_{i=1}^{\infty}\frac{var(X)}{i^{2}} < \infty$
which, by the Lemma applied to $\frac{S_{i^{2}}}{i^{2}}-\mu$ with $s=2$, implies that $(\frac{S_{i^{2}}}{i^{2}}-E[X])^{2}$ converges to $0$ with probability $1$.

Let’s suppose the variables $X_{i}$ are non-negative. Consider some $n$ such that $i^{2} \leq n \leq (i+1)^{2}$. We then have $S_{i^{2}} \leq S_{n} \leq S_{(i+1)^{2}}$. It follows that

$\frac{S_{i^{2}}}{(i+1)^{2}} \leq \frac{S_{n}}{n} \leq \frac{S_{(i+1)^{2}}}{i^{2}}$
or $\frac{i^{2}}{(i+1)^{2}}\frac{S_{i^{2}}}{i^{2}} \leq \frac{S_{n}}{n} \leq \frac{(i+1)^{2}}{i^{2}}\frac{S_{(i+1)^{2}}}{(i+1)^{2}}$

As $n \longrightarrow \infty$ we also have $i \longrightarrow \infty$, so $\frac{i^{2}}{(i+1)^{2}}\longrightarrow 1$. Since $P(\frac{S_{i^{2}}}{i^{2}} \longrightarrow E[X]) = 1$, both bounds converge to $E[X]$ almost surely, and hence $P(\frac{S_{n}}{n}\longrightarrow E[X]) = 1$

Note: If $X \geq 0$ doesn’t always hold, you can apply the above method to the
positive and negative parts of $X$ where $X = X^{+} -X^{-}$ and show
that the Strong Law of Large Numbers holds for this variable as well due to the linearity of expectation.
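
The sandwich argument for non-negative variables can be checked numerically. The sketch below is only an illustration, not part of the proof: it draws Exponential(1) variables (my choice of a non-negative distribution with $E[X]=1$), then verifies that every $\frac{S_{n}}{n}$ sits between the two bounds built from the neighbouring perfect squares. The names `prefix`, `sandwich_holds` and `I_MAX` are hypothetical.

```python
import random
from math import isqrt

random.seed(0)  # fixed seed so the sketch is reproducible

MU = 1.0        # E[X] for the Exponential(1) draws below
I_MAX = 300     # we generate I_MAX**2 draws in total
N = I_MAX ** 2

# prefix[n] = S_n, the sum of the first n non-negative draws.
prefix = [0.0]
for _ in range(N):
    prefix.append(prefix[-1] + random.expovariate(1.0))


def sandwich_holds(n):
    """Check S_{i^2}/(i+1)^2 <= S_n/n <= S_{(i+1)^2}/i^2 for i = floor(sqrt(n))."""
    i = isqrt(n)
    lower = prefix[i * i] / (i + 1) ** 2
    upper = prefix[(i + 1) ** 2] / i ** 2
    return lower <= prefix[n] / n <= upper


# Stop at (I_MAX - 1)^2 so that S_{(i+1)^2} is always available.
all_sandwiched = all(sandwich_holds(n) for n in range(1, (I_MAX - 1) ** 2 + 1))
final_average = prefix[N] / N
print(all_sandwiched, final_average)
```

The sandwich holds for every $n$ because the partial sums of non-negative variables never decrease, which is exactly the step the proof relies on.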

# Law of Large Numbers

The Law of Large Numbers is as important as the Central Limit Theorem. The Weak Law of Large Numbers essentially states that the sample average converges in probability toward the expected value. There is also a Strong version which I won’t discuss for now. But both versions are of great importance in science, as they imply that larger samples give better estimates of population averages.

Mathematically, the Weak Law states that if we let $X_{1}, X_{2}, ...$ be a sequence of iid random variables each having finite mean $E[X_{i}]= \mu$, then for any $\epsilon > 0$

$P\{|\frac{1}{n}\sum_{i=1}^{n} X_{i} -\mu| \geq \epsilon\} \longrightarrow 0$ as $n \longrightarrow \infty$

In order to demonstrate this, we shall first go through two Russian inequalities.

Markov’s inequality:
If $X$  is a random variable that takes only non-negative values then for any  $a > 0$,

$P \{X \geq a\} \leq \frac{E[X]}{a}$

Proof: For $a > 0$, let $I = \begin{cases} 1 & \text{if } X \geq a \\ 0 & \text{if } X < a\end{cases}$

and note that since $X \geq 0$, $I \leq \frac{X}{a}$

Taking expectations of the previous inequality yields $E[I] \leq \frac{E[X]}{a}$

Since $E[I] = P \{ X \geq a \}$, we have $P \{X \geq a\} \leq \frac{E[X]}{a}$

Chebyshev’s inequality:
If $X$  is a random variable with finite mean $\mu$ and finite variance
$\sigma^{2}$ then for any value $k> 0$,

$P\{|X -\mu| \geq k\} \leq \frac{\sigma^{2}}{k^{2}}$

Proof: Since $(X-\mu)^2 \geq 0$, we can apply Markov’s inequality with $a = k^2$ to obtain

$P \{(X-\mu)^{2} \geq k^{2}\} \leq \frac{E[(X-\mu)^{2}]}{k^{2}}$

Since $(X-\mu)^{2} \geq k^{2}$ iff $|X-\mu| \geq k$, we have

$P \{|X-\mu| \geq k\} = P \{(X-\mu)^{2} \geq k^{2}\} \leq \frac{E[(X-\mu)^{2}]}{k^{2}} = \frac{\sigma^{2}}{k^{2}}$
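
Both inequalities are easy to check empirically. The following Python sketch uses Exponential(1) draws, an arbitrary choice of mine that is non-negative with mean 1 and variance 1, and compares observed tail frequencies with the Markov and Chebyshev bounds. The helper name `tail_frequency` and the thresholds are illustrative.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Exponential(1) draws: non-negative, mean 1, variance 1.
samples = [random.expovariate(1.0) for _ in range(100_000)]
mean, variance = 1.0, 1.0


def tail_frequency(values, predicate):
    """Fraction of draws for which `predicate` is true."""
    return sum(1 for v in values if predicate(v)) / len(values)


for a in (2.0, 3.0, 5.0):
    # Markov: P{X >= a} <= E[X] / a
    markov_lhs = tail_frequency(samples, lambda x: x >= a)
    markov_rhs = mean / a
    # Chebyshev: P{|X - mu| >= k} <= sigma^2 / k^2, here with k = a
    chebyshev_lhs = tail_frequency(samples, lambda x: abs(x - mean) >= a)
    chebyshev_rhs = variance / a ** 2
    print(a, markov_lhs, markov_rhs, chebyshev_lhs, chebyshev_rhs)
```

Both bounds are typically far from tight, which is the price paid for assuming nothing beyond a finite mean or a finite variance.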

With the above ingredients we are now ready to provide a proof of the Weak Law of Large Numbers:

Assuming that all random variables are iid with finite mean $\mu$ and, additionally, finite variance $\sigma^{2}$,

$E[\frac{1}{n}\sum_{i=1}^{n} X_{i}] = \mu$ and $Var[\frac{1}{n}\sum_{i=1}^{n} X_{i}] = \frac{\sigma^2}{n}$

and it follows from Chebyshev’s inequality (with $k = \epsilon$) that

$P\{|\frac{1}{n}\sum_{i=1}^{n} X_{i} -\mu| \geq \epsilon\} \leq \frac{\sigma^2}{n \epsilon^{2}}$

Now, if we take the limit as $n \longrightarrow \infty$, the right-hand side tends to $0$ and we obtain
the desired result.
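
To see the Weak Law at work, here is a small simulation sketch. It uses Uniform(0, 1) draws (my choice, with $\mu = \frac{1}{2}$ and $\sigma^{2} = \frac{1}{12}$) and compares how often the sample average misses $\mu$ by at least $\epsilon$ with the Chebyshev bound $\frac{\sigma^{2}}{n\epsilon^{2}}$. The values of `EPSILON` and `TRIALS` and the function name are hypothetical.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

SIGMA2 = 1 / 12   # variance of a Uniform(0, 1) draw
MU = 0.5          # its mean
EPSILON = 0.05
TRIALS = 2_000    # independent sample averages per value of n


def deviation_frequency(n):
    """Fraction of trials in which |sample mean - MU| >= EPSILON."""
    hits = 0
    for _ in range(TRIALS):
        sample_mean = sum(random.random() for _ in range(n)) / n
        if abs(sample_mean - MU) >= EPSILON:
            hits += 1
    return hits / TRIALS


results = {}
for n in (10, 100, 1_000):
    bound = min(1.0, SIGMA2 / (n * EPSILON ** 2))  # Chebyshev bound, capped at 1
    results[n] = (deviation_frequency(n), bound)
    print(n, results[n])
```

The observed frequencies should fall well below the bound as $n$ grows, since Chebyshev’s inequality only guarantees a rate of $\frac{1}{n}$ while the actual tails of the sample average shrink much faster.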