Huffman Code 편집하기

'''Huffman Code''' is a lossless data compression algorithm that assigns variable-length binary codes to input characters based on their frequency. It is widely used in data compression applications such as file compression and encoding.
==Concept==
Huffman coding works by assigning shorter codes to more frequent characters and longer codes to less frequent characters, minimizing the total number of bits used to represent the data.
==Algorithm==
The Huffman coding algorithm follows these steps:
#'''Calculate Frequency:''' Count the occurrence of each character in the input data.
#'''Build a Priority Queue:''' Create a min-heap where each node represents a character and its frequency.
#'''Construct a Huffman Tree:'''
#*Extract the two nodes with the lowest frequency.
#*Merge them into a new node with their combined frequency.
#*Repeat until only one node remains, which becomes the root of the tree.
#'''Assign Binary Codes:''' Traverse the tree and assign binary values (0 for left, 1 for right) to generate unique Huffman codes for each character.
==Example==
Given the input string '''hello world''', the frequency table and corresponding Huffman codes might be:
{| class="wikitable"
!Character!!Frequency!!Huffman Code
|-
|h||1||1100
|-
|e||1||1101
|-
|l||3||10
|-
|o||2||00
|-
|w||1||1110
|-
|r||1||1111
|-
|d||1||01
|}The compressed representation replaces each character with its Huffman code.
==Implementation==
A simple implementation of Huffman coding in Python:<syntaxhighlight lang="python">
import heapq
from collections import Counter, namedtuple

class Node(namedtuple("Node", ["char", "freq", "left", "right"])):
    def __lt__(self, other):
        return self.freq < other.freq

def build_huffman_tree(text):
    freq = Counter(text)
    heap = [Node(char, freq, None, None) for char, freq in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = Node(None, left.freq + right.freq, left, right)
        heapq.heappush(heap, merged)

    return heap[0]

def generate_codes(node, prefix="", codebook={}):
    if node:
        if node.char:
            codebook[node.char] = prefix
        generate_codes(node.left, prefix + "0", codebook)
        generate_codes(node.right, prefix + "1", codebook)
    return codebook

text = "hello world"
tree = build_huffman_tree(text)
codes = generate_codes(tree)
print(codes)
</syntaxhighlight>
==Advantages==
*'''Lossless Compression'''
**Ensures no data loss in reconstruction.
*'''Efficient Encoding'''
**Optimized for frequent characters, reducing overall storage size.
*'''Used in Many Applications'''
**Commonly applied in ZIP, GZIP, JPEG, and other compression formats.
==Limitations==
*'''Requires Frequency Table'''
**The table must be transmitted alongside the encoded data.
*'''Not Optimal for Small Texts'''
**Overhead can be significant for short messages.
*'''Sensitive to Input Data'''
**Performance depends on character frequency distribution.
==Applications==
*'''File Compression'''
**Used in ZIP and GZIP formats.
*'''Multimedia Encoding'''
**Applied in JPEG and MP3 compression.
*'''Data Transmission'''
**Efficient encoding in communication protocols.
==See Also==
*[[Data Compression]]
*[[Prefix Code]]
*[[Run-Length Encoding]]
*[[Shannon-Fano Coding]]
*[[Entropy Encoding]]