Lempel–Ziv–Welch
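LZ77 is a dictionary-based data compression algorithm, proposed by two Israeli masters of the field, Jacob Ziv and Abraham Lempel, in their 1977 paper "A Universal Algorithm for Sequential Data Compression".
Statistical compression codes such as Huffman coding need prior knowledge, namely the symbol frequencies of the source, before they can compress anything. In most cases that prior knowledge is hard to obtain in advance.
A more general-purpose compression scheme is therefore especially important. LZ77 was created to fill that need, and its core idea is to exploit the repeated structure in the data itself.
LZ77: referring to previously processed data as dictionary (the data's own history serves as the dictionary).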
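To make the "previously processed data as dictionary" idea concrete, here is a minimal sketch of a sliding-window coder. It is not the exact scheme from the 1977 paper: the function names `lz77_encode` and `lz77_decode`, the triple format `(offset, length, next_char)`, the brute-force match search, and the default window size of 16 are all assumptions chosen for illustration.

```python
def lz77_encode(data, window=16):
    """Return a list of (offset, length, next_char) triples (illustrative format)."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # Search the sliding window of already-processed data for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            # Always leave one character over to serve as next_char.
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples):
    """Rebuild the text by copying from the output produced so far."""
    out = []
    for offset, length, ch in triples:
        for _ in range(length):
            out.append(out[-offset])  # one character at a time, so overlaps work
        out.append(ch)
    return "".join(out)


message = "TOBEORNOTTOBEORTOBEORNOT"
assert lz77_decode(lz77_encode(message)) == message
```

No dictionary is stored anywhere: each triple just points back into text the decoder has already reconstructed.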
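After proposing the sliding-window-based LZ77, Jacob Ziv and Abraham Lempel introduced the LZ78 algorithm in a paper published in 1978.
Unlike LZ77, LZ78 maintains the previously seen strings in a dynamic tree-structured dictionary.
LZ78: use an explicit dictionary (the dictionary is kept outside the data).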
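The explicit dictionary can be sketched in a few lines. This is an illustration only: the names `lz78_encode` and `lz78_decode` and the `(prefix_index, char)` pair format are assumptions, and a plain Python dict stands in for the tree-structured dictionary (each stored phrase is a previous phrase plus one character, which is what makes a tree a natural fit).

```python
def lz78_encode(data):
    """Return (prefix_index, char) pairs; index 0 means the empty phrase."""
    dictionary = {}  # phrase -> index, numbered from 1 (stand-in for the tree)
    out = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending while the phrase is already known
        else:
            out.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it with an empty character.
        out.append((dictionary[phrase], ""))
    return out


def lz78_decode(pairs):
    """Rebuild the same dictionary while decoding."""
    phrases = [""]  # index 0 is the empty phrase
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


pairs = lz78_encode("ABABABA")
print(pairs)  # [(0, 'A'), (0, 'B'), (1, 'B'), (3, 'A')]
assert lz78_decode(pairs) == "ABABABA"
```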
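The other compression algorithms in the LZ family are all variants of LZ77 and LZ78, with optimizations built on top of them. LZW, the subject here, is Terry Welch's 1984 refinement of LZ78.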
LZW Encoding:
Video: https://www.youtube.com/watch?v=nW7OARbr7OI
"TO BE OR NOT TO BE OR TO BE OR NOT"
Idea:
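We start from a known dictionary: as the codes in the table show, the letters A to Z are numbered 1 to 26 (B = 2, T = 20, and so on). The spaces in the message are only there for readability and are not encoded; # marks the end of the input.
Newly discovered patterns are then added to the dictionary on the fly, numbered from 27, as shown below: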
current | next | code | new entry | entry code |
T | O | 20 | TO | 27 |
O | B | 15 | OB | 28 |
B | E | 2 | BE | 29 |
E | O | 5 | EO | 30 |
O | R | 15 | OR | 31 |
R | N | 18 | RN | 32 |
N | O | 14 | NO | 33 |
O | T | 15 | OT | 34 |
T | T | 20 | TT | 35 |
TO | B | 27 | TOB | 36 |
BE | O | 29 | BEO | 37 |
OR | T | 31 | ORT | 38 |
TOB | E | 36 | TOBE | 39 |
EO | R | 30 | EOR | 40 |
RN | O | 32 | RNO | 41 |
OT | # | 34 | N/A | N/A |
The current and next columns are read from the input; the code column is what the encoder outputs.
There are 16 rows here, so the original 24 characters are encoded as 16 codes (24 bytes --> 16 bytes at one byte per code).
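The encoding walk-through above can be sketched in Python as follows. The function name `lzw_encode` and its `alphabet` parameter are hypothetical; the sketch assumes the initial dictionary A = 1 through Z = 26 implied by the codes in the table, drops the spaces before encoding, and treats the end of the string as the # marker without emitting a code for it.

```python
def lzw_encode(text, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Encode text as a list of LZW codes (assumed numbering: A=1 ... Z=26)."""
    dictionary = {ch: i for i, ch in enumerate(alphabet, start=1)}
    next_code = len(alphabet) + 1  # new patterns are numbered from 27
    codes = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            # Known pattern: keep growing the current match.
            current += ch
        else:
            # Emit the longest known match, then learn the new pattern.
            codes.append(dictionary[current])
            dictionary[current + ch] = next_code
            next_code += 1
            current = ch
    if current:
        codes.append(dictionary[current])  # flush the final match
    return codes


message = "TO BE OR NOT TO BE OR TO BE OR NOT".replace(" ", "")
codes = lzw_encode(message)
print(codes)
# [20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34]
print(len(message), "characters ->", len(codes), "codes")
```

Each pass through the `else` branch corresponds to one row of the table: `current` and `ch` are the first two columns, the appended code is the third, and the new dictionary entry is the fourth and fifth.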
LZW Decoding:
code | prev | output | new entry | entry code |
20 | | T | | |
15 | T | O | TO | 27 |
2 | O | B | OB | 28 |
5 | B | E | BE | 29 |
15 | E | O | EO | 30 |
18 | O | R | OR | 31 |
14 | R | N | RN | 32 |
15 | N | O | NO | 33 |
20 | O | T | OT | 34 |
27 | T | TO | TT | 35 |
29 | TO | BE | TOB | 36 |
31 | BE | OR | BEO | 37 |
36 | OR | TOB | ORT | 38 |
30 | TOB | EO | TOBE | 39 |
32 | EO | RN | EOR | 40 |
34 | RN | OT | RNO | 41 |
Here the roles are swapped: the code column is the input, and the output column is the decoded text.
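The dictionary entries correspond one-to-one with those in the encoding table.
Decoding is simply the process of rebuilding that table.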
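A minimal Python sketch of that rebuilding process is shown below. The function name `lzw_decode` and its `alphabet` parameter are hypothetical, and the sketch again assumes the initial dictionary A = 1 through Z = 26. The `elif` branch handles a case that does not occur in this example: a code that refers to the very entry the decoder is about to create.

```python
def lzw_decode(codes, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Decode a list of LZW codes back to text (assumed numbering: A=1 ... Z=26)."""
    dictionary = {i: ch for i, ch in enumerate(alphabet, start=1)}
    next_code = len(alphabet) + 1  # rebuilt entries are numbered from 27
    if not codes:
        return ""
    prev = dictionary[codes[0]]  # the first code is always a single letter
    output = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code names the entry being built right now,
            # so it must be prev plus prev's own first character.
            entry = prev + prev[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        # Same entry the encoder added: previous match + first char of this one.
        dictionary[next_code] = prev + entry[0]
        next_code += 1
        prev = entry
    return "".join(output)


codes = [20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34]
print(lzw_decode(codes))  # TOBEORNOTTOBEORTOBEORNOT
```

The decoder never receives the dictionary; it derives every entry from the codes alone, one step behind the encoder.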