Huffman Code

Huffman Code is a lossless data compression algorithm that assigns variable-length binary codes to input characters based on their frequency. It is widely used in data compression applications such as file compression and encoding.

1 Concept[편집 | 원본 편집]

Huffman coding works by assigning shorter codes to more frequent characters and longer codes to less frequent characters, minimizing the total number of bits used to represent the data.

2 Algorithm[편집 | 원본 편집]

The Huffman coding algorithm follows these steps:

Calculate Frequency: Count the occurrence of each character in the input data.
Build a Priority Queue: Create a min-heap where each node represents a character and its frequency.
Construct a Huffman Tree:
- Extract the two nodes with the lowest frequency.
- Merge them into a new node with their combined frequency.
- Repeat until only one node remains, which becomes the root of the tree.
Assign Binary Codes: Traverse the tree and assign binary values (0 for left, 1 for right) to generate unique Huffman codes for each character.

3 Example[편집 | 원본 편집]

Given the input string hello world, the frequency table and corresponding Huffman codes might be:

Character	Frequency	Huffman Code
h	1	1100
e	1	1101
l	3	10
o	2	00
w	1	1110
r	1	1111
d	1	01

The compressed representation replaces each character with its Huffman code.

4 Implementation[편집 | 원본 편집]

A simple implementation of Huffman coding in Python:

import heapq
from collections import Counter, namedtuple

class Node(namedtuple("Node", ["char", "freq", "left", "right"])):
    def __lt__(self, other):
        return self.freq < other.freq

def build_huffman_tree(text):
    freq = Counter(text)
    heap = [Node(char, freq, None, None) for char, freq in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = Node(None, left.freq + right.freq, left, right)
        heapq.heappush(heap, merged)

    return heap[0]

def generate_codes(node, prefix="", codebook={}):
    if node:
        if node.char:
            codebook[node.char] = prefix
        generate_codes(node.left, prefix + "0", codebook)
        generate_codes(node.right, prefix + "1", codebook)
    return codebook

text = "hello world"
tree = build_huffman_tree(text)
codes = generate_codes(tree)
print(codes)

5 Advantages[편집 | 원본 편집]

Lossless Compression
- Ensures no data loss in reconstruction.
Efficient Encoding
- Optimized for frequent characters, reducing overall storage size.
Used in Many Applications
- Commonly applied in ZIP, GZIP, JPEG, and other compression formats.

6 Limitations[편집 | 원본 편집]

Requires Frequency Table
- The table must be transmitted alongside the encoded data.
Not Optimal for Small Texts
- Overhead can be significant for short messages.
Sensitive to Input Data
- Performance depends on character frequency distribution.

7 Applications[편집 | 원본 편집]

File Compression
- Used in ZIP and GZIP formats.
Multimedia Encoding
- Applied in JPEG and MP3 compression.
Data Transmission
- Efficient encoding in communication protocols.

익명 사용자

검색

Huffman Code

이름공간

더 보기

문서 행위

Contents

1 Concept[편집 | 원본 편집]

2 Algorithm[편집 | 원본 편집]

3 Example[편집 | 원본 편집]

4 Implementation[편집 | 원본 편집]

5 Advantages[편집 | 원본 편집]

6 Limitations[편집 | 원본 편집]

7 Applications[편집 | 원본 편집]

8 See Also[편집 | 원본 편집]

둘러보기

둘러보기

광고

위키 도구

위키 도구

익명 사용자

검색

Huffman Code

1 Concept[편집 | 원본 편집]

2 Algorithm[편집 | 원본 편집]

3 Example[편집 | 원본 편집]

4 Implementation[편집 | 원본 편집]

5 Advantages[편집 | 원본 편집]

6 Limitations[편집 | 원본 편집]

7 Applications[편집 | 원본 편집]

8 See Also[편집 | 원본 편집]

둘러보기

위키 도구

문서 도구