Shannon-Fano Coding

Shannon-Fano coding is another compression technique used to reduce the number of bits needed to store a message, however unlike Huffman coding, it does not use a bottom-up approach and neither does it generate minimal codes.

The following section summarizes the simple encoding process.

Example sentence: This_is_an_example

Step 1: Preparing

Count the frequencies of existing letters

A E H I L M N P S T X _
2 2 1 2 1 1 1 1 2 1 1 3

Sort symbols based on their frequencies

H L M N P T X A E I S _
1 1 1 1 1 1 1 2 2 2 2 3

Step 2: Combining (Top-Down)

Recursively divide the set of symbols into two parts, each with approximately same number of frequencies. The process of splitting two symbols has been omitted in the following section.

Step 2.1

Total AEISHLMNPTX: 18 => ~9 each.

     AEISHLMNPTX_
    /            \
AEISH            LMNPTX_

Step 2.2

Total AEISH: 9 => ~4.5 each.
Total LMNPTX_: 9 => ~4.5 each.

      AEISHLMNPTX_
     /             \
  AEISH            LMNPTX_
 /     \          /      \
AE      ISH      LMNP     TX_

Step 2.3

Total ISH: 5 => 2.5 each.
Total LMNP: 4 => 2 each.
Total TX_: 5 => 2.5 each.

        AEISHLMNPTX_
       /             \
    AEISH            LMNPTX_
   /     \          /      \
  AE      ISH      LMNP     TX_
 /  \    /  \      /  \     /  \
A    E  I   SH    LM  NP   TX   _

Step 2.4

And we finally get the desired tree…

        AEISHLMNPTX_
       /             \
    AEISH            LMNPTX_
   /     \          /      \
  AE      ISH      LMNP     TX_
 /  \    /  \      /  \     /  \
A    E  I   SH    LM  NP   TX   _
            /\   /\   /\   /\
           S H  L M  N P  T X

Step 3: Assigning Codes

Now arbitrary declare that the left path of every parent node is indicating the number 0 and the right path respectivly 1.

Analysis

The above encoding process leads to the following result:

Symbol  Code  Frequency  Code     Total
                         Length   Length
-----------------------------------------
A       000     2         3        6
E       001     2         3        6
I       010     2         3        6
S       0110    2         4        8
H       0111    1         4        4
L       1000    1         4        4
M       1001    1         4        4
N       1010    1         4        4
P       1011    1         4        4
T       1100    1         4        4
X       1101    1         4        4
_       111     3         3        9

Total symbols: 18 (sum of frequencies)
Bits needed with Shannon-Fano coding: 63 Bit (sum of total length)
Bits needed without Shannon-Fano coding: 64 Bit (see below)

Calculating how many Bits we need when not using Shannon-Fano coding:

12 (2^4) unique symbols => 4 Bit/Symbol needed
Total Bits needed: 4 Bit * 18 = 64 Bit

Leave a Reply