Let each source symbol from \(S=\{s_i\}_{i=1}^n\) be encoded into a uniquely decodable code over an alphabet of size \(r\) with codewords of lengths \(\{l_i\}_{i=1}^n\).

Then we have:

\begin{equation} \sum_{i=1}^n r^{-l_i} \leq 1 \end{equation}


The following expression:

\begin{equation} (\sum_{i=1}^n r^{-l_i})^k \end{equation}

is equivalent to:

\begin{equation} \sum_{j=k \cdot l_1}^{k \cdot l_n} C_j \cdot r^{-j} \end{equation}

where \(C_j\) is the number of combinations of \(k\)-codewords that have a total length of \(j\).

We also know that \(C_j \leq r^j\) so the combination of (2) and (3) yields:

\begin{equation} \big(\sum_{i=1}^r r^{-l_i}\big)^k = \sum_{j=k \cdot l_1}^{k \cdot l_n} C_j \cdot r^{-j} \leq k \cdot (l_n - l_1) + 1 \end{equation}

which implies that we have:

\begin{equation} \sum_{i=1}^n s^{-l_i} \leq (k \cdot (l_n - l_1) + 1)^{\frac{1}{k}} \end{equation}

and as we let \(k \to \infty\) we obtain the desired inequality (1).


  1. Kraft, Leon G. (1949), A device for quantizing, grouping, and coding amplitude modulated pulses, Cambridge, MA: MS Thesis, Electrical Engineering Department, Massachusetts Institute of Technology.

  2. McMillan, Brockway (1956), “Two inequalities implied by unique decipherability”, IEEE Trans. Inf. Theory

  3. C.E. Shannon, “A Mathematical Theory of Communication”, Bell System Technical Journal, vol. 27, pp. 379–423, 623-656, July, October, 1948.

  4. David J. C. MacKay. Information Theory, Inference, and Learning Algorithms Cambridge: Cambridge University Press, 2003.