Zipf’s law, and power laws in general, allow scientists to model a wide range of integer-valued observables. Sometimes the range of values is infinite and sometimes the state space is finite. In this article, I shall go over the arguments of Matt Visser, who demonstrates that power laws may be obtained by maximising the Shannon entropy subject to a single constraint on the average of the logarithm of the observed quantity.

This method allows us to readily derive the Riemann Zeta function as a normalising constant.

Power laws in infinite state space:

Let’s define the set of observed quantities to be positive integers \(n \in \mathbb{N}\) without an upper-bound. The maximum entropy approach involves estimating the probabilities \(p_n\) by maximising the Shannon entropy:

\begin{equation} S = - \sum_{n} p_n \cdot \ln p_n \end{equation}

subject to a small number of constraints.

We may define the single constraint:

\begin{equation} \langle \ln n \rangle = \sum_{n=1}^{\infty} p_n \ln n = \chi \end{equation}

which fixes the average of \(\ln n\). It may be interpreted as a constraint on the expected description length of the observed integer, given that the Kolmogorov Complexity of a typical \(n\) scales with \(\ln n\).

If we are to maximise the Shannon entropy subject to this constraint, this is best done by introducing a Lagrange multiplier \(z\) for the constraint \(\langle \ln n \rangle = \chi\) as well as a second Lagrange multiplier \(\lambda\) for the normalisation constraint \(\sum_{n} p_n = 1\):

\begin{equation} \hat{S} = -z\big(\sum_{n=1}^{\infty} p_n \ln n - \chi \big) - \lambda \big(\sum_{n=1}^{\infty} p_n -1 \big) - \sum_{n=1}^{\infty} p_n \ln p_n \end{equation}

Without loss of generality, we may redefine the normalisation multiplier as \(\lambda = \ln Z - 1\), so that:

\begin{equation} \hat{S} = -z\big(\sum_{n=1}^{\infty} p_n \ln n - \chi \big) - (\ln Z - 1) \big(\sum_{n=1}^{\infty} p_n - 1\big) - \sum_{n=1}^{\infty} p_n \ln p_n \end{equation}

Varying with respect to \(p_n\) yields:

\begin{equation} \frac{\partial \hat{S}}{\partial p_n} = -z \cdot \ln n - \ln Z - \ln p_n = 0 \end{equation}

which simplifies to \(\ln (p_n Z) = \ln n^{-z}\), i.e. \(p_n = n^{-z}/Z\). Normalisation then requires:

\begin{equation} \sum_{n} p_n = \frac{1}{Z} \sum_{n} \frac{1}{n^z} = 1 \end{equation}

and therefore we have the explicit solution:

\begin{equation} Z = \zeta (z) \end{equation}

\begin{equation} p_n = \frac{n^{-z}}{\zeta (z)} \end{equation}

where \(z > 1\) in order to guarantee that the series defining \(\zeta (z)\) converges. It is worth noting that we therefore never obtain Zipf’s law (\(z=1\)) exactly, since the harmonic series diverges, although we may get as close to it as desired.
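The explicit solution can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `zeta` and `zeta_pmf`, the `cutoff` parameter and the Euler-Maclaurin tail estimate are choices made for this illustration and are not taken from Visser's paper.

```python
import math


def zeta(z, cutoff=1000):
    """Approximate zeta(z) for real z > 1.

    The first cutoff - 1 terms are summed directly; the remaining tail is
    estimated with the leading Euler-Maclaurin correction terms.
    """
    if z <= 1:
        raise ValueError("the series only converges for z > 1")
    head = sum(n ** -z for n in range(1, cutoff))
    tail = (cutoff ** (1 - z) / (z - 1)      # integral of x^(-z) from cutoff
            + 0.5 * cutoff ** -z             # half of the first tail term
            + z * cutoff ** (-z - 1) / 12)   # first derivative correction
    return head + tail


def zeta_pmf(n, z, norm=None):
    """p_n = n^(-z) / zeta(z) for a positive integer n."""
    if norm is None:
        norm = zeta(z)
    return n ** -z / norm


z = 2.0
Z = zeta(z)
print(Z, math.pi ** 2 / 6)  # the two values should agree closely
# Partial sums of p_n approach 1 from below as more terms are included.
print(sum(zeta_pmf(n, z, Z) for n in range(1, 10_001)))
```

For \(z = 2\) the normalising constant can be compared against the known value \(\zeta(2) = \pi^2/6\), which makes it a convenient test case.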

Given (7) and (8) we find that:

\begin{equation} \chi(z) = \langle \ln n \rangle = \frac{\sum_{n=1}^{\infty} n^{-z} \cdot \ln n}{\zeta (z)} = - \frac{d \ln \zeta (z)}{dz} \end{equation}

so at maximum entropy we have:

\begin{equation} S(z) = - \sum_{n=1}^{\infty} p_n \ln p_n = \ln \zeta (z) + z \chi (z) \end{equation}
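As a sanity check, the identity \(\chi(z) = - \frac{d \ln \zeta (z)}{dz}\) can be tested numerically by comparing the directly summed average of \(\ln n\) with a finite-difference derivative of \(\ln \zeta(z)\). This is only a sketch: the truncation point `CUTOFF`, the tail estimates and the step size `h` are assumptions of this example.

```python
import math

CUTOFF = 10_000  # assumed truncation point for the direct sums


def zeta(z):
    """Approximate zeta(z) for z > 1 (direct sum plus a tail estimate)."""
    head = sum(n ** -z for n in range(1, CUTOFF))
    tail = CUTOFF ** (1 - z) / (z - 1) + 0.5 * CUTOFF ** -z
    return head + tail


def log_weighted_sum(z):
    """Approximate sum over n >= 1 of n^(-z) * ln(n) for z > 1."""
    log_m = math.log(CUTOFF)
    head = sum(n ** -z * math.log(n) for n in range(1, CUTOFF))
    # Tail: integral of x^(-z) ln(x) from CUTOFF, plus half the first tail term.
    tail = (CUTOFF ** (1 - z) * (log_m / (z - 1) + 1 / (z - 1) ** 2)
            + 0.5 * CUTOFF ** -z * log_m)
    return head + tail


def chi(z):
    """<ln n> computed directly from the distribution."""
    return log_weighted_sum(z) / zeta(z)


def chi_from_derivative(z, h=1e-4):
    """-d ln(zeta)/dz estimated with a central finite difference."""
    return -(math.log(zeta(z + h)) - math.log(zeta(z - h))) / (2 * h)


def entropy(z):
    """S(z) = ln zeta(z) + z * chi(z)."""
    return math.log(zeta(z)) + z * chi(z)


z = 2.0
print(chi(z), chi_from_derivative(z))  # the two estimates should agree closely
print(entropy(z))
```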

Moreover, it is worth noting that:

\begin{equation} \exp \langle \ln n \rangle = \prod_{n=1}^{\infty} n^{p_n} \end{equation}

is the weighted geometric mean of the positive integers, with weights given by the probabilities \(p_n\). So we may also obtain pure power laws by maximising the entropy subject to the constraint that the geometric mean takes on a specified value.
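To make the geometric-mean formulation concrete, one can go in the reverse direction and ask which exponent \(z\) produces a prescribed geometric mean. Since \(\chi(z)\) decreases as \(z\) grows (its derivative is minus the variance of \(\ln n\)), a simple bisection suffices. The sketch below is illustrative only: the function names, the bracket \([1.01, 20]\) and the truncation point `CUTOFF` are assumptions of this example.

```python
import math

CUTOFF = 10_000  # assumed truncation point for the direct sums


def chi(z):
    """<ln n> under p_n = n^(-z) / zeta(z), for z > 1."""
    log_m = math.log(CUTOFF)
    norm = sum(n ** -z for n in range(1, CUTOFF))
    norm += CUTOFF ** (1 - z) / (z - 1) + 0.5 * CUTOFF ** -z
    weighted = sum(n ** -z * math.log(n) for n in range(1, CUTOFF))
    weighted += CUTOFF ** (1 - z) * (log_m / (z - 1) + 1 / (z - 1) ** 2)
    weighted += 0.5 * CUTOFF ** -z * log_m
    return weighted / norm


def exponent_for_geometric_mean(target, lo=1.01, hi=20.0, tol=1e-10):
    """Bisect for the exponent z with exp(chi(z)) equal to target."""
    goal = math.log(target)
    if not chi(hi) < goal < chi(lo):
        raise ValueError("target geometric mean is outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi(mid) > goal:
            lo = mid  # average of ln n too large: a steeper power law is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


z = exponent_for_geometric_mean(2.0)
print(z, math.exp(chi(z)))  # the second value should be close to 2
```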

Power laws in finite state space:

If one desires an exact Zipf’s law (i.e. \(z=1\) so \(p_n \propto \frac{1}{n}\)) then the simplest thing to do is to place an upper bound \(N\) on \(n\), so that \(n \in \{1, \ldots, N\}\).

The maximum entropy approach then amounts to considering:

\begin{equation} \hat{S} = -z \big(\sum_{n=1}^N p_n \ln n - \chi \big) - (\ln Z - 1) \big(\sum_{n=1}^N p_n - 1\big) - \sum_{n=1}^N p_n \ln p_n \end{equation}

Varying with respect to \(p_n\) yields the familiar equation:

\begin{equation} \frac{\partial \hat{S}}{\partial p_n} = -z \ln n - \ln Z - \ln p_n = 0 \end{equation}

which implies that:

\begin{equation} p_n = \frac{n^{-z}}{H_N (z)} \end{equation}

\begin{equation} Z = H_N (z) \end{equation}


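where \(H_N (z)\) is the generalised harmonic number, which recovers the Zeta function in the limit of large \(N\):

\begin{equation} H_N (z) = \sum_{n=1}^N \frac{1}{n^z} \implies \zeta (z) = \lim_{N \to \infty} H_N (z) \end{equation}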
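The finite-state distribution is straightforward to compute exactly, since only a finite sum is involved. A minimal sketch follows; the function names `generalized_harmonic` and `finite_power_law` are hypothetical and chosen for this illustration.

```python
def generalized_harmonic(N, z):
    """H_N(z) = sum of n^(-z) for n = 1..N; finite for every real z."""
    return sum(n ** -z for n in range(1, N + 1))


def finite_power_law(N, z):
    """Return [p_1, ..., p_N] with p_n = n^(-z) / H_N(z)."""
    norm = generalized_harmonic(N, z)
    return [n ** -z / norm for n in range(1, N + 1)]


# z = 1 gives probabilities proportional to 1/n on a finite support.
print(finite_power_law(3, 1))
# z = 0 gives equal probabilities for all N states.
print(finite_power_law(4, 0))
```

With \(N = 3\) and \(z = 1\) the normalising constant is \(H_3(1) = 11/6\), so the probabilities are \(6/11\), \(3/11\) and \(2/11\).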

As the sum is now always finite, there is no longer any constraint on the exponent \(z\). The case of \(z=1\) is Zipf’s law and \(z=0\) corresponds to the uniform distribution. Moreover, at maximum entropy we have:

\begin{equation} S(z,N) = - \sum_{n=1}^{N} p_n \ln p_n = \ln H_N (z) + z \chi (z,N) \end{equation}
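In the finite case both the entropy identity and the maximum-entropy property can be verified without any truncation error. The sketch below checks the identity for \(N = 3\), \(z = 1\), and then perturbs the distribution along a direction that leaves both \(\sum_n p_n\) and \(\langle \ln n \rangle\) unchanged; the entropy should drop in either direction. The helper names and the choice of a three-state example are assumptions made for illustration.

```python
import math


def finite_power_law(N, z):
    """Return ([p_1, ..., p_N], H_N(z)) with p_n = n^(-z) / H_N(z)."""
    weights = [n ** -z for n in range(1, N + 1)]
    norm = sum(weights)
    return [w / norm for w in weights], norm


def entropy(p):
    """Shannon entropy in nats; assumes every entry of p is positive."""
    return -sum(q * math.log(q) for q in p)


def mean_log(p):
    """<ln n> for a distribution on the states 1..len(p)."""
    return sum(q * math.log(n) for n, q in enumerate(p, start=1))


def perturb(p, eps):
    """Shift a 3-state distribution while keeping sum(p) and <ln n> fixed."""
    a, b, c = (math.log(n) for n in (1, 2, 3))
    # Cross product of (1, 1, 1) and (a, b, c): orthogonal to both constraints.
    direction = (c - b, a - c, b - a)
    return [q + eps * d for q, d in zip(p, direction)]


p, H = finite_power_law(3, 1.0)
chi = mean_log(p)
print(entropy(p), math.log(H) + 1.0 * chi)  # the two values should agree
for eps in (0.01, -0.01):
    q = perturb(p, eps)
    # Constraint change should be about zero; entropy change should be negative.
    print(eps, mean_log(q) - chi, entropy(q) - entropy(p))
```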


This observation that the Riemann Zeta function should make a natural appearance in the derivation of Zipf’s law makes me wonder whether it occurs elsewhere in mathematical statistics.


  1. Matt Visser. Zipf’s law, power laws, and maximum entropy. arXiv. 2013.

  2. George K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley. 1949.