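I won't go into the general theory of arithmetic coding in much detail here; this article is organized into the following three parts.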
Encoding: map a character string to a number in the interval [0,1).
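Briefly: at the start, the interval is divided into several segments, one per character. To encode a character e, we zoom into the segment of the current interval that represents e (it becomes the new interval) and divide it into sub-intervals, each of which again represents one character. This repeats for every character. When encoding finishes, the final interval is what we want, and any number inside it will do.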
The pseudocode is as follows:
Decoding: recover the character string from the encoded number.
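The idea: at each step, check which sub-interval the number falls into and output the character that sub-interval represents. Then adjust the interval and the number, and repeat until every encoded character has been output.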
First we need to know exactly what it is we are encoding; only then can we go on to prove anything about it.
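From the intuitive picture above, we know that encoding ends with a final interval, and a value taken from that interval serves as the final code. In practice, we output the bits that the interval's lower and upper bounds have in common. For example, if the final interval is [0.1010000, 0.1010011), the common bits are 0.10100; for convenience we store only 10100 and drop the "0." in front.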
Next comes the proof.
Let's look closely at how a single character is encoded. Here is the code first; its job is to encode one character:
1: static void bit_plus_follow(int); /* Routine that follows */
2: static code_value low, high; /* Ends of the current code region */
3: static long bits_to_follow; /* Number of opposite bits to output after */
4:
5:
6: void encode_symbol(int symbol,int cum_freq[])
7: {
8: long range; /* Size of the current code region */
9: range = (long)(high-low)+1;
10:
11: high = low + (range*cum_freq[symbol-1])/cum_freq[0]-1; /* Narrow the code region to that allotted to this */
12: low = low + (range*cum_freq[symbol])/cum_freq[0]; /* symbol. */
13:
14: for (;;)
15: { /* Loop to output bits. */
16: if (high<Half) {
17: bit_plus_follow(0); /* Output 0 if in low half. */
18: }
19: else if (low>=Half) { /* Output 1 if in high half.*/
20: bit_plus_follow(1);
21: low -= Half;
22: high -= Half; /* Subtract offset to top. */
23: }
24: else if (low>=First_qtr && high<Third_qtr) { /* Output an opposite bit later if in middle half. */
25: bits_to_follow += 1;
26: low -= First_qtr; /* Subtract offset to middle*/
27: high -= First_qtr;
28: }
29: else break; /* Otherwise exit loop. */
30: low = 2*low;
31: high = 2*high+1; /* Scale up code range. */
32: }
33: }
34:
35: static void bit_plus_follow(int bit)
36: {
37: output_bit(bit); /* Output the bit. */
38: while (bits_to_follow>0) {
39: output_bit(!bit); /* Output bits_to_follow */
40: bits_to_follow -= 1; /* opposite bits. Set */
41: } /* bits_to_follow to zero. */
42: }
Detailed explanation:
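Lines 6-12 are a simple computation that finds the sub-interval for the current character. In the pseudocode, encoding the character would be finished at this point, and we would move on to the next one. In practice, though, repeating this step makes the interval smaller and smaller, which means the number we have to store needs more and more digits. Can a computer even hold it? This is a serious problem.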
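The solution: note that once the leading digits of the interval's lower and upper bounds agree, they never change in later encoding steps, because each new interval lies inside the current one. For example, if the interval is [0.1101, 0.1111), then no matter what is encoded next, the interval stays of the form [0.11~, 0.11~), with the same leading digits. So we can output those digits right away and shift them out, which keeps the bounds within a fixed number of bits and avoids overflow! Lines 16-23 do this, and lines 30-31 then scale the interval back up.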
Observant readers will have noticed lines 24-28 as well. What are they for?
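Take an example where the interval is stuck straddling 0.5, say [0.01~, 0.10~). How do we handle this case? Clearly, if it persists, lines 16-23 can do nothing, because the leading bits of the two bounds never agree. We can still handle it, though.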
At this point the bounds look something like this; we ignore the identical leading part, assuming lines 16-23 have already handled it.
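Consider this example: suppose the interval is [0.011, 0.110), that is, [3/8, 6/8). If we enlarge the range [2/8, 6/8) of the original interval by a factor of two (mapping x to 2(x - 2/8)), the original sub-interval becomes [2/8, 1); see the figure below.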
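Notice that after enlarging, if the next character's sub-interval lies in the upper half, i.e. [4/8, 1) on the right side of the figure above, this corresponds to [4/8, 6/8) on the left side. Since [4/8, 6/8) = [0.100, 0.110), its code is 10, so we output 10.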
This example shows how to handle the general case.
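First, count in bits_to_follow how many times the interval is expanded from [2/8, 6/8) to [0, 1), repeating until the interval no longer lies inside [2/8, 6/8).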
Then encode the next character. If the interval lies in the upper half, output 10000, where the number of 0s equals bits_to_follow.
If it lies in the lower half, output 01111, where the number of 1s equals bits_to_follow. If it lies in [2/8, 6/8) again, keep expanding and increase bits_to_follow accordingly.
I recommend drawing this out yourself to see how neat this piece of code is!
Here is the complete code. Many small details are left for you to study; they are quite subtle.
#include <cstdio>
#include <cstdlib>
using namespace std;

#define Code_value_bits 16                      /* Number of bits in a code value */
typedef long code_value;                        /* Type of an arithmetic code value */

#define Top_value (((long)1<<Code_value_bits)-1)   /* Largest code value */

#define First_qtr (Top_value/4+1)               /* Point after first quarter */
#define Half      (2*First_qtr)                 /* Point after first half */
#define Third_qtr (3*First_qtr)                 /* Point after third quarter */

#define No_of_chars 256                         /* Number of character symbols */
#define EOF_symbol (No_of_chars+1)              /* Index of EOF symbol */

#define No_of_symbols (No_of_chars+1)           /* Total number of symbols */

/* TRANSLATION TABLES BETWEEN CHARACTERS AND SYMBOL INDEXES. */

int char_to_index[No_of_chars];                 /* To index from character */
unsigned char index_to_char[No_of_symbols+1];   /* To character from index */

/* CUMULATIVE FREQUENCY TABLE. */

#define Max_frequency 16383                     /* Maximum allowed frequency count */
                                                /* 2^14 - 1 */
int cum_freq[No_of_symbols+1];                  /* Cumulative symbol frequencies */

// Fixed frequency table, used here for simplicity
int freq[No_of_symbols+1] = {
    0,
    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,  124,    1,    1,    1,    1,    1,
    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,

/*        !     "     #     $     %     &     '     (     )     *     +     ,     -     .     / */
 1236,    1,   21,    9,    3,    1,   25,   15,    2,    2,    2,    1,   79,   19,   60,    1,

/* 0     1     2     3     4     5     6     7     8     9     :     ;     <     =     >     ? */
   15,   15,    8,    5,    4,    7,    5,    4,    4,    6,    3,    2,    1,    1,    1,    1,

/* @     A     B     C     D     E     F     G     H     I     J     K     L     M     N     O */
    1,   24,   15,   22,   12,   15,   10,    9,   16,   16,    8,    6,   12,   23,   13,   11,

/* P     Q     R     S     T     U     V     W     X     Y     Z     [     \     ]     ^     _ */
   14,    1,   14,   28,   29,    6,    3,   11,    1,    3,    1,    1,    1,    1,    1,    3,

/* `     a     b     c     d     e     f     g     h     i     j     k     l     m     n     o */
    1,  491,   85,  173,  232,  744,  127,  110,  293,  418,    6,   39,  250,  139,  429,  446,

/* p     q     r     s     t     u     v     w     x     y     z     {     |     }     ~       */
  111,    5,  388,  375,  531,  152,   57,   97,   12,  101,    5,    2,    1,    2,    3,    1,

    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
    1
};

// Stores the encoded bytes; it is the bridge between encoding and decoding.
// The size is 100 for now; enlarge it for longer input.
// unsigned char, so bytes >= 0x80 are read back without sign extension.
unsigned char code[100];
static int code_index = 0;
static int decode_index = 0;

// buffer is an 8-bit buffer that temporarily holds encoded bits
static int buffer;
// Number of bits still unused in buffer; starts at 8
static int bits_to_go;
// Bits read past the end of the encoded data (garbage)
static int garbage_bits;

// Start the character frequency model, i.e. compute each character's cumulative-frequency range
void start_model() {
    int i;
    for (i = 0; i < No_of_chars; i++) {
        // Lookup tables in both directions
        char_to_index[i] = i+1;
        index_to_char[i+1] = i;
    }

    // Cumulative frequencies: cum_freq[i-1] = freq[i]+...+freq[257], cum_freq[257] = 0
    cum_freq[No_of_symbols] = 0;
    for (i = No_of_symbols; i > 0; i--) {
        cum_freq[i-1] = cum_freq[i] + freq[i];
    }
    // This check bounds the frequency total; it is discussed later, so it is commented out here
    //if (cum_freq[0] > Max_frequency); /* Check counts within limit */
}


// Initialize the buffer so it can start receiving encoded bits
void start_outputing_bits()
{
    buffer = 0;              // The buffer starts out empty
    bits_to_go = 8;
}


void output_bit(int bit)
{
    // Each new bit enters at the top (bit 7) and older bits shift toward bit 0,
    // so the first bit written ends up in the lowest position. Remember this!
    buffer >>= 1;
    if (bit) buffer |= 0x80;
    bits_to_go -= 1;
    // When the buffer is full, store the byte
    if (bits_to_go == 0) {
        code[code_index] = buffer;
        code_index++;

        bits_to_go = 8;      // Reset to 8
    }
}


void done_outputing_bits()
{
    // At the end of encoding, if the buffer is not full, pad it with 0s
    code[code_index] = buffer >> bits_to_go;
    code_index++;
}



static void bit_plus_follow(int);       /* Routine that follows */
static code_value low, high;            /* Ends of the current code region */
static long bits_to_follow;             /* Number of opposite bits to output after */


void start_encoding()
{
    for (int i = 0; i < 100; i++) code[i] = 0;

    low = 0;                            /* Full code range. */
    high = Top_value;
    bits_to_follow = 0;                 /* No bits to follow */
}


void encode_symbol(int symbol, int cum_freq[])
{
    long range;                         /* Size of the current code region */
    range = (long)(high-low)+1;

    high = low + (range*cum_freq[symbol-1])/cum_freq[0]-1;  /* Narrow the code region to that allotted to this */
    low  = low + (range*cum_freq[symbol])/cum_freq[0];      /* symbol. */

    for (;;) {                                      /* Loop to output bits. */
        if (high < Half) {
            bit_plus_follow(0);                     /* Output 0 if in low half. */
        }
        else if (low >= Half) {                     /* Output 1 if in high half. */
            bit_plus_follow(1);
            low -= Half;
            high -= Half;                           /* Subtract offset to top. */
        }
        else if (low >= First_qtr && high < Third_qtr) {  /* Output an opposite bit later if in middle half. */
            bits_to_follow += 1;
            low -= First_qtr;                       /* Subtract offset to middle */
            high -= First_qtr;
        }
        else break;                                 /* Otherwise exit loop. */
        low = 2*low;
        high = 2*high+1;                            /* Scale up code range. */
    }
}

/* FINISH ENCODING THE STREAM. */

void done_encoding()
{
    bits_to_follow += 1;                            /* Output two bits that */
    if (low < First_qtr) bit_plus_follow(0);        /* select the quarter that */
    else bit_plus_follow(1);                        /* the current code range */
}                                                   /* contains. */


static void bit_plus_follow(int bit)
{
    output_bit(bit);                                /* Output the bit. */
    while (bits_to_follow > 0) {
        output_bit(!bit);                           /* Output bits_to_follow */
        bits_to_follow -= 1;                        /* opposite bits. Set */
    }                                               /* bits_to_follow to zero. */
}



void encode() {
    start_model();                                  /* Set up other modules. */
    start_outputing_bits();
    start_encoding();
    for (;;) {                                      /* Loop through characters. */
        int ch;
        int symbol;
        ch = getchar();                             /* Read the next character. */
        // For simplicity, a newline (rather than end of file) ends the input; this does
        // not affect the algorithm. EOF is checked too, so char_to_index is never indexed with -1.
        if (ch == '\n' || ch == EOF) break;
        symbol = char_to_index[ch];                 /* Translate to an index. */
        encode_symbol(symbol, cum_freq);            /* Encode that symbol. */
    }
    // Encode EOF_symbol as the terminator
    encode_symbol(EOF_symbol, cum_freq);
    done_encoding();                                /* Send the last few bits. */
    done_outputing_bits();
}


// Decoding

static code_value value;                            /* Currently-seen code value */

void start_inputing_bits()
{
    bits_to_go = 0;                                 /* Buffer starts out with */
    garbage_bits = 0;                               /* no bits in it. */
}


int input_bit()
{
    int t;

    if (bits_to_go == 0) {
        if (decode_index < code_index) {
            buffer = code[decode_index];            /* Next stored byte */
        } else {
            buffer = 0;                             /* Past the end: return arbitrary bits */
            garbage_bits += 1;                      /* after eof, but check */
            if (garbage_bits > Code_value_bits-2) { /* for too many such. */
                fprintf(stderr, "Bad input file\n");
                // exit(-1);
            }
        }
        decode_index++;
        bits_to_go = 8;
    }
    // Take bits from the lowest position upward, in the same order they were stored
    t = buffer & 1;                                 /* Return the next bit from */
    buffer >>= 1;                                   /* the bottom of the byte. */
    bits_to_go -= 1;
    return t;
}

void start_decoding()
{
    int i;
    value = 0;                                      /* Input bits to fill the */
    for (i = 1; i <= Code_value_bits; i++) {        /* code value. */
        value = 2*value + input_bit();
    }

    low = 0;                                        /* Full code range. */
    high = Top_value;
}


int decode_symbol(int cum_freq[])
{
    long range;                                     /* Size of current code region */
    int cum;                                        /* Cumulative frequency calculated */
    int symbol;                                     /* Symbol decoded */
    range = (long)(high-low)+1;
    cum = (((long)(value-low)+1)*cum_freq[0]-1)/range;   /* Find cum freq for value. */

    for (symbol = 1; cum_freq[symbol] > cum; symbol++) ; /* Then find symbol. */
    high = low + (range*cum_freq[symbol-1])/cum_freq[0]-1;  /* Narrow the code region to that allotted to this */
    low  = low + (range*cum_freq[symbol])/cum_freq[0];      /* symbol. */

    for (;;) {                                      /* Loop to get rid of bits. */
        if (high < Half) {
            /* nothing */                           /* Expand low half. */
        }
        else if (low >= Half) {                     /* Expand high half. */
            value -= Half;
            low -= Half;                            /* Subtract offset to top. */
            high -= Half;
        }
        else if (low >= First_qtr && high < Third_qtr) {
            value -= First_qtr;
            low -= First_qtr;                       /* Subtract offset to middle */
            high -= First_qtr;
        }
        else break;                                 /* Otherwise exit loop. */
        low = 2*low;
        high = 2*high+1;                            /* Scale up code range. */
        value = 2*value + input_bit();              /* Move in next input bit. */
    }
    return symbol;
}


void decode() {
    start_model();                                  /* Set up other modules. */
    start_inputing_bits();
    start_decoding();
    for (;;) {                                      /* Loop through characters. */
        int ch; int symbol;
        symbol = decode_symbol(cum_freq);           /* Decode next symbol. */
        if (symbol == EOF_symbol) break;            /* Exit loop if EOF symbol. */
        ch = index_to_char[symbol];                 /* Translate to a character. */
        putc(ch, stdout);                           /* Write that character. */
    }
}

int main()
{
    encode();
    decode();
    // system("pause");   // Windows-only: keeps the console window open
    return 0;
}