Menu

Project Gimu

Software research and development

Huffman Coding

Huffman coding is a compression technique used to reduce the number of bits needed to store a message. Codes are basically generated by building a tree from bottom up (as opposed to Shannon-Fano coding), using the occurrence amount of input entities (letters in the following example).
This leads to the codes being prefix-free, meaning that each input yields a unique sequence of bits.

The following section summarizes the simple encoding process.

Example sentence: “`This_is_an_example“

Step 1: Preparing

Count the frequencies of existing letters

A E H I L M N P S T X _
2 2 1 2 1 1 1 1 2 1 1 3

Step 2: Combining (Bottom-Up)

Recursively combine the lowest two frequencies.

Step 2.1

In this example multiple letters with the same (and lowest) frequency (namely H, L, M, N, P, T and X) exist, therefore arbitrary combine those. The parent node displays the sum of frequencies.

   2         2          2          3
 /   \     /   \      /   \      /   \
H     L   M     N    P     T    X     A

X has been combined with A, as there is no letter with frequency 1 left.

The new frequency table:

HL MN PT XA E I S _
2  2  2  3  2 2 2 3

Step 2.2

We repeat the step on the above table and combine the letters with frequency 2…

HLE MNI PTS XA _
4   4   4   3  3 

     4          4           4
    / \        / \         / \
   2   E      2   I       2   S   
 /   \      /   \       /   \  
H     L    M     N     P     T

Step 2.3

And again for the above table (letters with frequency 3)…

HLE MNI PTS XA_
4   4   4   6

     4          4           4          6
    / \        / \         / \        / \
   2   E      2   I       2   S      3   _
 /   \      /   \       /   \      /   \
H     L    M     N     P     T    X     A

Step 2.4

And again (letters with frequency 4)…

HLMNPTE ISXA
8       10

       ___8___                ___10____  
      /       \              /         \
     4          4           4           6
    / \        / \         / \         / \
   2   E      2   I       2   S       3   _
 /   \      /   \       /   \       /   \
H     L    M     N     P     T     X     A

Step 2.5

To round it up…

HLMNPTEISXA
18 (root)

            _________18_________
           /                    \
       ___8___                ___10____  
      /       \              /         \
     4          4           4           6
    / \        / \         / \         / \
   2   E      2   I       2   S       3   _
 /   \      /   \       /   \       /   \
H     L    M     N     P     T     X     A

Step 3: Assigning Codes

Now arbitrary declare that the left path of every parent node is indicating the number 0 and the right path respectivly 1.

This leads to the following (prefix-free) codes:

H: 0000
L: 0001
M: 0100
N: 0101
P: 1000
T: 1001
X: 1100
A: 1101
E: 001
I: 011
S: 101
_: 111

Analysis

Symbol  Code  Frequency  Code     Total
                         Length   Length
-----------------------------------------
H       0000  1          4        4
L       0001  1          4        4
M       0100  1          4        4          
N       0101  1          4        4
P       1000  1          4        4
T       1001  1          4        4
X       1100  1          4        4
A       1101  2          4        8
E       001   2          3        6
I       011   2          3        6
S       101   2          3        6
_       111   3          3        9

Leave a Reply