[IR] Dictionary Coding

[Data Compression] LZ77: Algorithm Principles and Implementation

[Data Compression] LZ78: Algorithm Principles and Implementation

Lempel–Ziv–Welch


 

LZ77 is a dictionary-based data compression algorithm. It was proposed by two Israeli researchers, Jacob Ziv and Abraham Lempel, in their 1977 paper "A Universal Algorithm for Sequential Data Compression".

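Statistical compression codes, such as Huffman coding, need prior knowledge before they can compress anything, namely the symbol frequencies of the source. In most cases, however, that prior knowledge is hard to obtain in advance.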

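A more general-purpose compression scheme is therefore valuable, and LZ77 was designed to fill that need. Its core idea is to exploit the repeated structure in the data itself to compress it.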

LZ77: refers to previously processed data as the dictionary; the dictionary is implicit in the data itself.
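To make "previously processed data as the dictionary" concrete, here is a minimal sketch of a sliding-window encoder and decoder. It is an illustration under assumptions, not the implementation from the paper: the triple format (offset, length, next_char), the function names, and the `window` parameter are all chosen here for the example, and the match search is brute force.

```python
def lz77_encode(data, window=16):
    """Encode data as (offset, length, next_char) triples.

    offset/length point back into the already-encoded text;
    next_char is the literal that follows the match.
    `window` (hypothetical parameter) limits how far back we search.
    """
    out = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Always keep one character in reserve for next_char.
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples):
    """Rebuild the text by copying from the output produced so far."""
    buf = []
    for off, length, ch in triples:
        for _ in range(length):
            buf.append(buf[-off])  # one char at a time, so overlaps work
        buf.append(ch)
    return "".join(buf)


print(lz77_encode("aaaa"))  # [(0, 0, 'a'), (1, 2, 'a')]
text = "TOBEORNOTTOBEORTOBEORNOT"
print(lz77_decode(lz77_encode(text)) == text)  # True
```

The decoder needs no separate dictionary: every back-reference is resolved against text it has already produced.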

 

After proposing the sliding-window-based LZ77, Jacob Ziv and Abraham Lempel introduced the LZ78 algorithm in a paper published in 1978.

Unlike LZ77, LZ78 keeps the previously seen strings in a dynamic, tree-structured dictionary.

LZ78: uses an explicit dictionary; the dictionary is kept outside the data.
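A minimal sketch of the explicit-dictionary idea follows. It is a simplification under assumptions: a plain Python dict stands in for the tree-structured dictionary, the output format is (index, char) pairs with index 0 meaning the empty phrase, and the function names are made up for this example.

```python
def lz78_encode(data):
    """Encode data as (index, char) pairs.

    index refers to a phrase already in the dictionary (0 = empty phrase);
    char is the character that extends it into a new phrase.
    """
    dictionary = {}  # phrase -> index, numbered from 1
    out = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the known phrase
        else:
            out.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:  # input ended inside a known phrase
        out.append((dictionary[phrase], ""))
    return out


def lz78_decode(pairs):
    """Rebuild the same dictionary while decoding."""
    phrases = [""]  # index 0 is the empty phrase
    out = []
    for idx, ch in pairs:
        s = phrases[idx] + ch
        phrases.append(s)
        out.append(s)
    return "".join(out)


print(lz78_encode("ABABABA"))  # [(0, 'A'), (0, 'B'), (1, 'B'), (3, 'A')]
```

Replacing the dict with a trie would match the tree-structured description more closely; the emitted pairs would be the same.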

 

The algorithms in the LZ family are all variants of LZ77 or LZ78, each adding its own optimizations.

  • LZ77: LZSS, LZR, LZB, LZH;
  • LZ78: LZW, LZC, LZT, LZMW, LZJ, LZFG.

 


 

LZW Encoding:

  Video: https://www.youtube.com/watch?v=nW7OARbr7OI

"TO BE OR NOT TO BE OR TO BE OR NOT"

Idea:

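We start from a known dictionary: the letters A to Z are numbered 1 to 26 (so T = 20, O = 15, B = 2, and so on, matching the codes in the table below).

Newly discovered patterns are then added to the dictionary dynamically, numbered from 27 onward, as follows: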

current  next  code  dictionary entry  index
T        O     20    TO                27
O        B     15    OB                28
B        E     2     BE                29
E        O     5     EO                30
O        R     15    OR                31
R        N     18    RN                32
N        O     14    NO                33
O        T     15    OT                34
T        T     20    TT                35
TO       B     27    TOB               36
BE       O     29    BEO               37
OR       T     31    ORT               38
TOB      E     36    TOBE              39
EO       R     30    EOR               40
RN       O     32    RNO               41
OT       #     34    N/A               N/A

Input: current and next. Output: code.

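The table has 16 rows, so the original 24 characters are encoded as 16 codes: 24 bytes become 16 bytes if each code is stored in one byte.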
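The encoding table can be reproduced with a short sketch. The assumptions are labeled in the code: the initial dictionary maps A to Z onto 1 to 26, and `#` is given code 0 as an end marker (the source does not state its code). The function name is made up for this example, and spaces are stripped from the input before encoding.

```python
def lzw_encode(text):
    """Return the list of LZW codes for an uppercase A-Z string."""
    # Assumed initial dictionary: '#' = 0 (end marker), 'A'..'Z' = 1..26.
    dictionary = {"#": 0}
    dictionary.update({chr(ord("A") + i): i + 1 for i in range(26)})

    codes = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch  # longest known match keeps growing
        else:
            codes.append(dictionary[current])           # the "code" column
            dictionary[current + ch] = len(dictionary)  # next free index, from 27
            current = ch
    if current:
        codes.append(dictionary[current])  # flush the last match
    return codes


message = "TO BE OR NOT TO BE OR TO BE OR NOT".replace(" ", "")
print(lzw_encode(message))
# [20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34]
```

Each emitted code corresponds to one row of the table, and each dictionary insertion to that row's dictionary entry and index.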

 

LZW Decoding:

code  prev  output  dictionary entry  index
20          T
15    T     O       TO                27
2     O     B       OB                28
5     B     E       BE                29
15    E     O       EO                30
18    O     R       OR                31
14    R     N       RN                32
15    N     O       NO                33
20    O     T       OT                34
27    T     TO      TT                35
29    TO    BE      TOB               36
31    BE    OR      BEO               37
36    OR    TOB     ORT               38
30    TOB   EO      TOBE              39
32    EO    RN      EOR               40
34    RN    OT      RNO               41

The code column is the encoder's output; the output column recovers the encoder's input.

The rows correspond one-to-one with the rows of the encoding table.

Decoding is simply the process of rebuilding that table.
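The table-rebuilding process can be sketched as follows. As before, the initial dictionary (`#` = 0, A to Z = 1 to 26) and the function name are assumptions made for the example. The branch for a code that is not yet in the dictionary does not occur in this message, but a general LZW decoder needs it.

```python
def lzw_decode(codes):
    """Rebuild the text, and the dictionary, from a list of LZW codes."""
    if not codes:
        return ""
    # Assumed initial dictionary: 0 = '#' (end marker), 1..26 = 'A'..'Z'.
    dictionary = {0: "#"}
    dictionary.update({i + 1: chr(ord("A") + i) for i in range(26)})

    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code refers to the entry being built right now,
            # which must be prev plus the first character of prev.
            entry = prev + prev[0]
        # New entry = prev + first character of the current output.
        dictionary[len(dictionary)] = prev + entry[0]
        out.append(entry)
        prev = entry
    return "".join(out)


codes = [20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34]
print(lzw_decode(codes))  # TOBEORNOTTOBEORTOBEORNOT
```

The decoder never receives the dictionary; it derives each new entry one step behind the encoder, which is why the first row of the decoding table adds nothing.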
