McMillan's Theorem

In the field of information theory, McMillan's Theorem (also known as the Kraft-McMillan Inequality) provides the essential criteria to ascertain whether a prefix code can be uniquely decoded. The theorem asserts:

A set of string lengths $ \{s_1, s_2, ..., s_n\} $ defined over an alphabet of r symbols constitutes a uniquely decodable prefix code if and only if it satisfies the following condition: $$ \sum_{i=1}^{n} r^{-s_i} \leq 1 $$ Here, $ r $ denotes the number of symbols in the code's alphabet (e.g., $ r = 2 $ for binary codes), and $ s_i $ represents the length of the encoded string for symbol $ i $.

If this cumulative sum is less than or equal to one, then it is possible to construct a prefix code that ensures unique decodability.

The theorem further implies that if such a code exists, then the combined lengths of the codes, when taken to the negative power of the number of symbols in the alphabet, must not exceed one.

Essentially, McMillan's theorem provides a method for assessing whether a set of strings {s₁, s₂, ..., s_n}, derived from an alphabet of n symbols, qualifies as an alphabet in its own right.

Application of McMillan's Theorem to Prefix Codes: A prefix code is characterized by each string in the set being such that no string is a prefix of another. For instance, the set $$ \{s_1, s_2, s_3\} = \{"0", "01", "11"\} $$ is not a prefix code because "0" serves as a prefix for "01". Conversely, the set $$ \{s_1, s_2, s_3\} = \{"00", "01", "11"\} $$ qualifies as a prefix code as no string acts as a prefix to another.

Example

Let's examine a concrete example to elucidate McMillan's theorem.

Consider a basic binary alphabet consisting of the symbols $ r = 2 $, namely 0 and 1.

Creating a set of strings with this alphabet:

$$ \{s_1, s_2, s_3\} = \{"00", "01", "11"\} $$

This set is a prefix code as no string is a prefix of any other.

Yet, to determine if it is also uniquely decodable, we apply McMillan's Theorem.

The lengths of $ s_1 = "00" $, $ s_2 = "01" $, and $ s_3 = "11" $ are all 2.

Upon applying the theorem:

$$ \sum_{i=1}^{3} 2^{-|s_i|} = 2^{-2} + 2^{-2} + 2^{-2} = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{3}{4} $$

Here, the sum is less than 1, fulfilling the theorem's condition.

$$ \sum_{i=1}^{3} 2^{-|s_i|} = \frac{3}{4} \le 1 $$

This confirms that our specific set of strings {s₁,s₂,s₃} qualifies as a "uniquely decodable code" based on the theorem's criteria.

Note: If all possible permutations of the symbols "0" and "1" in the binary alphabet were considered, we would obtain a set of four strings. $$ \{s_1, s_2, s_3, s_4\} = \{"00", "01", "11", "10"\} $$ In this scenario, McMillan's theorem would precisely yield a sum of 1. $$ \sum_{i=1}^{4} 2^{-|s_i|} = 2^{-2} + 2^{-2} + 2^{-2} + 2^{-2} = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = 1 $$ This establishes the upper limit on the amount of information that can be encoded with a series of strings from the alphabet.