[IR] Arithmetic Coding

Besides Huffman coding, arithmetic coding is the other common compression method among statistical methods.

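Huffman codes are discrete: every symbol gets a whole number of bits. This is an inherent, irreparable flaw that keeps Huffman coding from reaching the Shannon limit. Arithmetic coding offers a better solution.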
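A small numeric illustration of the gap. The two-symbol source with probabilities 0.9 and 0.1 is a hypothetical choice, and the function names are illustrative. Huffman coding must spend at least one whole bit per symbol, while the Shannon entropy of that source is only about 0.469 bits per symbol. A minimal Python sketch:

```python
import heapq
import math


def entropy(probs):
    """Shannon entropy in bits/symbol: the lower bound for any lossless code."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_lengths(probs):
    """Return the Huffman code length (in whole bits) of each symbol."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (probability, unique tie-breaker, symbol indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol inside them.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


probs = [0.9, 0.1]
avg = sum(p * n for p, n in zip(probs, huffman_lengths(probs)))
print(f"entropy = {entropy(probs):.3f} bits, Huffman = {avg:.3f} bits")
```

With dyadic probabilities such as 0.5/0.25/0.25 the two numbers coincide (1.5 bits). The loss appears only when probabilities are not powers of 1/2.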

Of course, the best things often come with all kinds of patents.

Since 2012, it seems that at least part of it can be used.

Encoding:

Each character is assigned a range, a sub-interval of [0, 1), whose size equals its proportion (probability).
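One way to build such ranges, sketched in Python under stated assumptions: probabilities are taken as character frequencies of the message, characters are laid out in sorted order (encoder and decoder must agree on this order), and exact `fractions.Fraction` values are used. The message "BILL GATES" is an assumed example (a common textbook one). It gives L the range [0.6, 0.8) and every other character a width of 0.1.

```python
from collections import Counter
from fractions import Fraction


def build_ranges(message):
    """Give each distinct character a sub-range [low, high) of [0, 1) whose
    width equals its relative frequency in `message`. Characters are laid
    out in sorted order, a fixed convention the decoder must share."""
    counts = Counter(message)
    total = len(message)
    ranges = {}
    low = Fraction(0)
    for ch in sorted(counts):
        high = low + Fraction(counts[ch], total)
        ranges[ch] = (low, high)
        low = high
    return ranges


for ch, (lo, hi) in build_ranges("BILL GATES").items():
    print(repr(ch), float(lo), float(hi))
```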

Algorithm:

Set low  to 0.0
Set high to 1.0

While there are still input symbols do
  get an input symbol
  range = high - low
  high = low + range * high_range(symbol)
  low  = low + range * low_range(symbol)
End of While
output low, or any number in [low, high)
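A direct Python translation of the encoding loop, as a sketch. `TABLE` is an assumed "BILL GATES" range table (the original table is not reproduced here). Exact fractions stand in for the unlimited precision the pseudocode implicitly assumes, and `code_range` is used instead of `range` to avoid shadowing the Python builtin. Practical coders instead use fixed-width integers with renormalization.

```python
from fractions import Fraction as F

# Assumed symbol table ("BILL GATES" example): symbol -> [low, high).
TABLE = {
    " ": (F(0, 10), F(1, 10)), "A": (F(1, 10), F(2, 10)),
    "B": (F(2, 10), F(3, 10)), "E": (F(3, 10), F(4, 10)),
    "G": (F(4, 10), F(5, 10)), "I": (F(5, 10), F(6, 10)),
    "L": (F(6, 10), F(8, 10)), "S": (F(8, 10), F(9, 10)),
    "T": (F(9, 10), F(10, 10)),
}


def encode(message, table):
    """Narrow [low, high) once per symbol, as in the pseudocode.
    Exact fractions avoid the rounding errors plain floats would add."""
    low, high = F(0), F(1)
    for symbol in message:
        code_range = high - low
        sym_low, sym_high = table[symbol]
        high = low + code_range * sym_high  # both updates use the old low
        low = low + code_range * sym_low
    return low, high


low, high = encode("BILL GATES", TABLE)
print(float(low), float(high))
```

Encoding "BILL GATES" narrows the interval to [0.2572167752, 0.2572167756). Its width, 4e-10, is the product of the symbol probabilities (0.1^8 * 0.2^2).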

  

Decoding:

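In the fourth row, the encoded number is 0.72167752 and it falls in the range Low = 0.6, High = 0.8. What will the next character be?

range = 0.8 - 0.6 = 0.2

encoded number = (0.72167752 - 0.6) / 0.2 = 0.6083876 --> L

Removing the current symbol (subtract its Low, divide by its range) gives the new encoded number 0.6083876, which falls in L's range, so the next character is L.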

Algorithm:

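get encoded number
Do
  find symbol whose range straddles the encoded number
  output the symbol
  range = symbol high value - symbol low value
  subtract symbol low value from encoded number
  divide encoded number by range
until no more symbols

(The decoder must be told when to stop, e.g. by sending the message length or by encoding a special end-of-message symbol.)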
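The decoding loop in Python, again a sketch over an assumed "BILL GATES" table (not given in the original) with exact fractions. Passing the message length `length` is an assumption of this example; it is one of the usual ways to tell the decoder when to stop.

```python
from fractions import Fraction as F

# Assumed symbol table ("BILL GATES" example): symbol -> [low, high).
TABLE = {
    " ": (F(0, 10), F(1, 10)), "A": (F(1, 10), F(2, 10)),
    "B": (F(2, 10), F(3, 10)), "E": (F(3, 10), F(4, 10)),
    "G": (F(4, 10), F(5, 10)), "I": (F(5, 10), F(6, 10)),
    "L": (F(6, 10), F(8, 10)), "S": (F(8, 10), F(9, 10)),
    "T": (F(9, 10), F(10, 10)),
}


def decode(number, table, length):
    """Recover `length` symbols from an encoded number in [0, 1)."""
    out = []
    for _ in range(length):
        # Find the symbol whose range straddles the encoded number.
        for symbol, (sym_low, sym_high) in table.items():
            if sym_low <= number < sym_high:
                break
        else:
            raise ValueError("encoded number outside [0, 1)")
        out.append(symbol)
        # Remove that symbol's effect: shift down by its low, rescale by its range.
        number = (number - sym_low) / (sym_high - sym_low)
    return "".join(out)


print(decode(F("0.2572167752"), TABLE, 10))
```

Starting from 0.72167752 instead, the first step reproduces the worked example: (0.72167752 - 0.6) / 0.2 = 0.6083876, which again maps to L.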

 

 

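Optimization trick:

In fact, 0.45 alone is enough to decode successfully. Any number inside the final [low, high) interval decodes to the same message, so the encoder can output the number with the fewest digits in that interval instead of low itself.

This shortens the output and so improves the compression ratio.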
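A sketch of the trick in Python: search for the number with the fewest decimal digits in [low, high). The function name is hypothetical, decimal digits are used only to match the text (real coders emit bits), and the interval [0.441, 0.459) in the example is made up for illustration.

```python
import math
from fractions import Fraction as F


def shortest_decimal(low, high):
    """Return the number in [low, high) with the fewest decimal digits.
    Any such number decodes to the same message, so it can replace low."""
    digits = 0
    while True:
        scale = 10 ** digits
        # Smallest value with `digits` decimal places that is >= low.
        candidate = F(math.ceil(low * scale), scale)
        if candidate < high:
            return candidate
        digits += 1


print(float(shortest_decimal(F("0.441"), F("0.459"))))
```

For an interval such as [0.2572167752, 0.2572167756) no shorter number exists and the function returns the lower end itself, so the saving depends on where the interval falls.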

 

Bzip2 and JPEG (as commonly implemented) use Huffman coding rather than AC, because AC was protected by patents.
PackJPG, which uses AC, shows about 25% size savings.

About the patents:

U.S. Patent 4,122,440 (IBM): filed 4 March 1977, granted 24 October 1978 (now expired)
U.S. Patent 4,286,256 (IBM): granted 25 August 1981 (now expired)
U.S. Patent 4,467,317 (IBM): granted 21 August 1984 (now expired)
U.S. Patent 4,652,856 (IBM): granted 4 February 1986 (now expired)
U.S. Patent 4,891,643 (IBM): filed 15 September 1986, granted 2 January 1990 (now expired)
U.S. Patent 4,905,297 (IBM): filed 18 November 1988, granted 27 February 1990 (now expired)
U.S. Patent 4,933,883 (IBM): filed 3 May 1988, granted 12 June 1990 (now expired)
U.S. Patent 4,935,882 (IBM): filed 20 July 1988, granted 19 June 1990 (now expired)
U.S. Patent 4,989,000: filed 19 June 1989, granted 29 January 1991 (now expired)
U.S. Patent 5,099,440 (IBM): filed 5 January 1990, granted 24 March 1992 (now expired)
U.S. Patent 5,272,478 (Ricoh): filed 17 August 1992, granted 21 December 1993 (now expired)
