Huffman Codes

In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}, both compress the string into 14 bits. Another set of code can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2 ≤ N ≤ 63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤ 1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a non-empty string of no more than 63 '0's and '1's.

Output Specification:

For each test case, print in each line either "Yes" if the student's submission is correct, or "No" if not.

Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.

Sample Input:

7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11

Sample Output:

Yes
Yes
No
No

 

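Solution Approach

  To decide whether a submission is an optimal code, two things have to be checked for each set of codes: whether its WPL (weighted path length) is minimal, and whether it is a prefix code.

  The first approach below builds a Huffman tree, with a heap as the underlying structure.

  The tree node is defined as follows: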

struct Data {
    char letter;    // the character (meaningful only for leaves)
    int freq;       // its frequency, or the sum of the children's frequencies
};

struct TNode {
    Data data;
    TNode *left, *right;
};

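  First, we build the Huffman tree that corresponds to the given character frequencies and compute its WPL.

  Because the tree is built with a heap, we first insert the given frequencies into a min-heap.

  This requires defining the heap and its operations. The code is given directly, without further explanation.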

struct Heap {
    TNode *H;        // heap storage, 1-indexed; every element is a tree node (TNode)
    int size;
    int capacity;
};

void insertHeap(Heap *minHeap, TNode *treeNode);   // defined below, used by createMinHeap

// Read n (character, frequency) pairs from stdin and insert them into a new min-heap.
Heap *createMinHeap(int n) {
    Heap *minHeap = new Heap;
    minHeap->size = 0;
    minHeap->capacity = n;
    minHeap->H = new TNode[n + 1];
    minHeap->H[0].data.freq = -1;    // sentinel smaller than any frequency; it stops the percolate-up loop

    for (int i = 0; i < minHeap->capacity; i++) {
        TNode tmp;                   // insertHeap copies the node, so a local object is enough
        tmp.left = tmp.right = NULL;
        getchar();                   // skip the separator in front of the character
        scanf("%c %d", &tmp.data.letter, &tmp.data.freq);
        insertHeap(minHeap, &tmp);
    }

    return minHeap;
}

// Percolate up: shift larger parents down until the new node fits.
void insertHeap(Heap *minHeap, TNode *treeNode) {
    int pos = ++minHeap->size;
    for ( ; treeNode->data.freq < minHeap->H[pos / 2].data.freq; pos /= 2) {
        minHeap->H[pos] = minHeap->H[pos / 2];
    }
    minHeap->H[pos] = *treeNode;
}

// Remove the node with the smallest frequency and return a heap-allocated copy of it.
TNode *deleteMin(Heap *minHeap) {
    TNode *minTreeNode = new TNode;
    *minTreeNode = minHeap->H[1];
    TNode tmp = minHeap->H[minHeap->size--];   // last element, to be re-inserted from the root

    int parent = 1, child;
    for ( ; parent * 2 <= minHeap->size; parent = child) {   // percolate down
        child = parent * 2;
        if (child != minHeap->size && minHeap->H[child].data.freq > minHeap->H[child + 1].data.freq) child++;   // pick the smaller child
        if (tmp.data.freq < minHeap->H[child].data.freq) break;
        else minHeap->H[parent] = minHeap->H[child];
    }
    minHeap->H[parent] = tmp;

    return minTreeNode;
}

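  Once the min-heap is built, the next step is to construct the corresponding Huffman tree from it.

  Each round pops the two nodes with the smallest frequencies from the heap, attaches them as the left and right children of a new node, and pushes the new node back into the heap. After n-1 rounds (where n is the number of characters), only one element remains in the heap. That element is the root of the Huffman tree, and we simply return it.

  Following this idea, the construction code is: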

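TNode *createHuffmanTree(Heap *minHeap) {
    int n = minHeap->size - 1;              // a tree with size leaves needs size - 1 merges
    while (n--) {
        TNode *tmp = new TNode;
        tmp->data.letter = '\0';            // internal nodes carry no character
        tmp->left = deleteMin(minHeap);     // the two smallest nodes become the children
        tmp->right = deleteMin(minHeap);
        tmp->data.freq = tmp->left->data.freq + tmp->right->data.freq;

        insertHeap(minHeap, tmp);           // push the merged node back into the heap
    }

    return deleteMin(minHeap);              // the only node left is the root
}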

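  Next, we compute the WPL from this Huffman tree, using recursion.

  If the node is a leaf, return its depth multiplied by its frequency. Otherwise, recursively compute the WPL of the left and right subtrees and return their sum.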

int WPL(TNode *T, int depth) {
    if (T->left == NULL && T->right == NULL) return depth * T->data.freq;   // leaf: return depth * frequency
    else return WPL(T->left, depth + 1) + WPL(T->right, depth + 1);         // internal node: sum the WPL of both subtrees, which are one level deeper
}

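  After all that work, we finally have the WPL for the given frequencies.

  Next, for each submission we first check whether its WPL is minimal, that is, whether the WPL of the submitted codes equals the WPL computed from the given frequencies.

  The computation is simple. The WPL of a submission is:

  codeLen = length(code[1]) * f[1] + length(code[2]) * f[2] + ... + length(code[N]) * f[N]

  Then compare codeLen with the WPL obtained above. If they differ, the submission is not an optimal code and there is no need to check the prefix property. If they are equal, go on to check whether it is a prefix code.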
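  As a small illustration of the formula above, the sketch below computes codeLen for the example from the problem statement. The function name encodedLength and the two map parameters are hypothetical, chosen only for this example; the solution itself accumulates the sum while reading the input.

```cpp
#include <map>
#include <string>

// Total encoded length: sum of (code length * frequency) over all characters.
// `freq` and `code` are hypothetical containers used only for this sketch.
int encodedLength(const std::map<char, int> &freq,
                  const std::map<char, std::string> &code) {
    int total = 0;
    for (const auto &entry : code) {
        // entry.first is the character, entry.second its codeword
        total += static_cast<int>(entry.second.size()) * freq.at(entry.first);
    }
    return total;
}
```

  For the frequencies a=4, x=2, u=1, z=1, both {0, 10, 110, 111} and the invalid {0, 01, 011, 001} give 14, which is why the length test alone is not enough and the prefix test is still required.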

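  There is one more trap. In an optimal code no codeword is longer than n-1 bits, because a binary tree with n leaves in which every internal node has two children has depth at most n-1. So if any codeword is longer than n-1, the submission is not an optimal code either.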
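  To see that the n-1 bound can actually be reached, here is a minimal sketch that tracks the height of one Huffman tree while merging. The helper huffmanHeight is hypothetical and not part of the solution; ties are broken in favour of the shallower subtree, which is just one possible choice, and at least one frequency is assumed.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Height of one Huffman tree for the given frequencies.
// Hypothetical helper; assumes freqs is non-empty.
int huffmanHeight(const std::vector<int> &freqs) {
    typedef std::pair<int, int> Node;   // (weight, height of the subtree)
    std::priority_queue<Node, std::vector<Node>, std::greater<Node> > pq;
    for (int f : freqs) pq.push(Node(f, 0));

    while (pq.size() > 1) {
        Node a = pq.top(); pq.pop();    // the two lightest subtrees
        Node b = pq.top(); pq.pop();
        pq.push(Node(a.first + b.first, std::max(a.second, b.second) + 1));
    }
    return pq.top().second;
}
```

  With the frequencies 1, 2, 4, 8, 16 every merge extends the same chain, so the height is 4 = n-1, while four equal frequencies give a balanced tree of height 2.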

  The following function computes the encoded length and checks whether the submission is a prefix code:

bool check(TNode *huffmanTree, int n) {
    int wpl = WPL(huffmanTree, 0);  // WPL of the Huffman tree built from the given frequencies

    std::string code[n];            // code of each character (a variable-length array, which is a compiler extension)
    int codeLen = 0;
    bool ret = true;                // whether this submission can still be an optimal code

    for (int i = 0; i < n; i++) {
        char letter;
        getchar();
        scanf("%c", &letter);
        getchar();
        std::cin >> code[i];

        if (ret) {                  // once the submission is known to be wrong, skip the computation but keep reading the input
            if ((int)code[i].size() > n - 1) ret = false;               // a codeword longer than n-1 cannot belong to an optimal code
            codeLen += code[i].size() * findFreq(huffmanTree, letter);  // accumulate the encoded length
        }
    }

    if (ret && codeLen == wpl) {        // only if ret == true and the length equals the WPL do we check the prefix property
        TNode *T = new TNode;           // root of the tree built from this submission's codes
        T->data.freq = 0;
        T->left = T->right = NULL;

        for (int i = 0; i < n; i++) {   // n characters, so n codes to place
            TNode *pre = T;             // every code starts from the root

            for (std::string::iterator it = code[i].begin(); it != code[i].end(); it++) {   // walk through the bits of this code
                if (pre->data.freq == 1) {          // an earlier code ends on this path, so it is a prefix of code[i]
                    ret = false;
                    break;
                }
                if (*it == '0') {                   // bit 0
                    if (pre->left == NULL) {        // no left child yet
                        TNode *tmp = new TNode;     // create it
                        tmp->data.freq = 0;         // freq 0 means no character occupies the node yet
                        tmp->left = tmp->right = NULL;
                        pre->left = tmp;
                    }
                    pre = pre->left;                // move to the left child
                }
                else {                              // bit 1
                    if (pre->right == NULL) {       // no right child yet
                        TNode *tmp = new TNode;     // create it
                        tmp->data.freq = 0;         // freq 0 means no character occupies the node yet
                        tmp->left = tmp->right = NULL;
                        pre->right = tmp;
                    }
                    pre = pre->right;               // move to the right child
                }
            }

            // After the whole code is read, pre points at the node this character should occupy.
            // The node must be a leaf and must not be occupied by another character.
            if (ret && pre->left == NULL && pre->right == NULL && pre->data.freq == 0) {
                pre->data.freq = 1;                 // occupy the node by marking its frequency as 1
            }
            else {                                  // any condition fails
                ret = false;                        // the submission is not a prefix code
                break;                              // no need to look at the remaining characters
            }
        }
    }
    else {          // ret == false or the encoded length differs from the WPL: not an optimal code
        ret = false;
    }

    return ret;
}

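  Here the prefix property is checked by building a tree out of the submitted codes.

  The process is as follows:

  A pointer starts at the root of the tree.

  • If the bit is '0', check whether the current node has a left child. If not, create it first; then move the pointer to the left child.
  • If the bit is '1', check whether the current node has a right child. If not, create it first; then move the pointer to the right child.

  After the whole code of a character has been read, the character should be placed in the node the pointer ends on. The node must satisfy two conditions:

  • Both of its children are empty, i.e. it is a leaf.
  • It has not been marked, i.e. it does not already hold another character.

  In addition, the path from the root must not pass through a node that already holds a character; otherwise an earlier code is a prefix of the current one. For example, with 'a'=0 processed before 'x'=01, the end node of 01 is a fresh leaf, and only the occupied node on the path reveals the conflict.

  If any of these conditions fails, the submission is not a prefix code.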
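  The same walk can be tried in isolation. The following is a minimal sketch, assuming the codes are already in a std::vector<std::string>; TrieNode and isPrefixFree are hypothetical names that do not appear in the solution, and the nodes are stored in a vector instead of being allocated with new.

```cpp
#include <string>
#include <vector>

// Minimal binary trie node; `used` marks the end of a codeword.
struct TrieNode {
    int child[2];
    bool used;
    TrieNode() : used(false) { child[0] = child[1] = -1; }
};

// Returns true if no codeword is a prefix of (or equal to) another one.
// Assumes every codeword is non-empty and contains only '0' and '1'.
bool isPrefixFree(const std::vector<std::string> &codes) {
    std::vector<TrieNode> trie(1);              // node 0 is the root
    for (const std::string &code : codes) {
        int cur = 0;
        for (char bit : code) {
            if (trie[cur].used) return false;   // an earlier codeword is a prefix of this one
            int b = bit - '0';
            if (trie[cur].child[b] == -1) {     // create the missing child
                trie[cur].child[b] = static_cast<int>(trie.size());
                trie.push_back(TrieNode());
            }
            cur = trie[cur].child[b];
        }
        // The end node must be a fresh leaf: unused and with nothing below it.
        if (trie[cur].used || trie[cur].child[0] != -1 || trie[cur].child[1] != -1) return false;
        trie[cur].used = true;
    }
    return true;
}
```

  On the problem statement's examples, {0, 10, 110, 111} passes, while {0, 01, 011, 001} is rejected as soon as 01 walks through the node occupied by 0.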

  Finally, here is the complete AC (accepted) code for this approach. It is fairly long.

#include <cstdio>
#include <iostream>
#include <string>

struct Data {
    char letter;
    int freq;
};

struct TNode {
    Data data;
    TNode *left, *right;
};

struct Heap {
    TNode *H;
    int size;
    int capacity;
};

Heap *createMinHeap(int n);
void insertHeap(Heap *minHeap, TNode *treeNode);
TNode *deleteHeap(Heap *minHeap);
TNode *createHuffmanTree(Heap *minHeap);
bool check(TNode *huffmanTree, int n);
int WPL(TNode *T, int depth);
int findFreq(TNode *huffmanTree, char letter);

int main() {
    int n;
    scanf("%d", &n);
    
    Heap *minHeap = createMinHeap(n);
    TNode *huffmanTree = createHuffmanTree(minHeap);
    
    int m;
    scanf("%d", &m);
    for (int i = 0; i < m; i++) {
        bool ret = check(huffmanTree, n);
        printf("%s\n", ret ? "Yes" : "No");
    }
    
    return 0;
}

Heap *createMinHeap(int n) {
    Heap *minHeap = new Heap;
    minHeap->size = 0;
    minHeap->capacity = n;
    minHeap->H = new TNode[n + 1];
    minHeap->H[0].data.freq = -1;
    
    for (int i = 0; i < minHeap->capacity; i++) {
        TNode *tmp = new TNode;
        tmp->left = tmp->right = NULL;
        getchar();
        scanf("%c %d", &tmp->data.letter, &tmp->data.freq);
        insertHeap(minHeap, tmp);
    }
    
    return minHeap;
}

void insertHeap(Heap *minHeap, TNode *treeNode) {
    int pos = ++minHeap->size;
    for ( ; treeNode->data.freq < minHeap->H[pos / 2].data.freq; pos /= 2) {
        minHeap->H[pos] = minHeap->H[pos / 2];
    }
    minHeap->H[pos] = *treeNode;
}

TNode *deleteHeap(Heap *minHeap) {
    TNode *minTreeNode = new TNode;
    *minTreeNode = minHeap->H[1];
    TNode tmp = minHeap->H[minHeap->size--];
    
    int parent = 1, child;
    for ( ; parent * 2 <= minHeap->size; parent = child) {
        child = parent * 2;
        if (child != minHeap->size && minHeap->H[child].data.freq > minHeap->H[child + 1].data.freq) child++;
        if (tmp.data.freq < minHeap->H[child].data.freq) break;
        else minHeap->H[parent] = minHeap->H[child];
    }
    minHeap->H[parent] = tmp;
    
    return minTreeNode;
}

TNode *createHuffmanTree(Heap *minHeap) {
    int n = minHeap->size - 1;
    while (n--) {
        TNode *tmp = new TNode;
        tmp->left = deleteHeap(minHeap);
        tmp->right = deleteHeap(minHeap);
        tmp->data.freq = tmp->left->data.freq + tmp->right->data.freq;
        
        insertHeap(minHeap, tmp);
    }
    
    return deleteHeap(minHeap);
}

bool check(TNode *huffmanTree, int n) {
    int wpl = WPL(huffmanTree, 0);

    std::string code[n];
    int codeLen = 0;
    bool ret = true;
    
    for (int i = 0; i < n; i++) {
        char letter;
        getchar();
        scanf("%c", &letter);
        getchar();
        std::cin >> code[i];
        
        if (ret) {
            if (code[i].size() > n - 1) ret = false;
            codeLen += code[i].size() * findFreq(huffmanTree, letter);
        }
    }
    
    if (ret && codeLen == wpl) {
        TNode *T = new TNode;
        T->data.freq = 0;
        T->left = T->right = NULL;
            
        for (int i = 0; i < n; i++) {
            TNode *pre = T;
            
            for (std::string::iterator it = code[i].begin(); it != code[i].end(); it++) {
                if (pre->data.freq == 1) {  // an earlier code ends on this path, so it is a prefix of code[i]
                    ret = false;
                    break;
                }
                if (*it == '0') {
                    if (pre->left == NULL) {
                        TNode *tmp = new TNode;
                        tmp->data.freq = 0;
                        tmp->left = tmp->right = NULL;
                        pre->left = tmp;
                    }
                    pre = pre->left;
                }
                else {
                    if (pre->right == NULL) {
                        TNode *tmp = new TNode;
                        tmp->data.freq = 0;
                        tmp->left = tmp->right = NULL;
                        pre->right = tmp;
                    }
                    pre = pre->right;
                }
            }
            
            if (ret && pre->left == NULL && pre->right == NULL && pre->data.freq == 0) {
                pre->data.freq = 1;
            }
            else {
                ret = false;
                break;
            }
        }
    }
    else {
        ret = false;
    }
    
    return ret;
}

int WPL(TNode *T, int depth) {
    if (T->left == NULL && T->right == NULL) return depth * T->data.freq;
    else return WPL(T->left, depth + 1) + WPL(T->right, depth + 1);
}

int findFreq(TNode *huffmanTree, char letter) {
    int ret = 0;
    if (huffmanTree) {
        if (huffmanTree->left == NULL && huffmanTree->right == NULL && huffmanTree->data.letter == letter) ret = huffmanTree->data.freq;
        if (ret == 0) ret = findFreq(huffmanTree->left, letter);
        if (ret == 0) ret = findFreq(huffmanTree->right, letter);
    }
    
    return ret;
}

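  Next we improve the prefix check. Below is another way to test the prefix property, one that does not require building a tree from the codes.

  Suppose two codes violate the prefix property, such as "110" and "1101". Then one of them equals the first m characters of the other, where m is the shorter of the two lengths. Here "110" equals the first 3 characters of "1101", so "110" and "1101" do not form a prefix code.

  Every pair of codes in a submission has to be compared, which takes C(n, 2) = n * (n - 1) / 2 comparisons.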

  The improved check function is:

bool check(TNode *huffmanTree, int n) {
    int wpl = WPL(huffmanTree, 0);

    std::string code[n];
    int codeLen = 0;
    bool ret = true;

    for (int i = 0; i < n; i++) {
        char letter;
        getchar();
        scanf("%c", &letter);
        getchar();
        std::cin >> code[i];

        if (ret) {
            if ((int)code[i].size() > n - 1) ret = false;
            codeLen += code[i].size() * findFreq(huffmanTree, letter);
        }
    }

    if (ret && codeLen == wpl) {        // as before, check the prefix property only if ret == true and the length equals the WPL
        for (int i = 0; i < n; i++) {   // compare every code with all the codes after it
            for (int j = i + 1; j < n; j++) {
                // does one code equal the first m characters of the other, m being the shorter length?
                if (code[i].substr(0, code[j].size()) == code[j].substr(0, code[i].size())) {
                    ret = false;        // a single conflicting pair means this is not a prefix code
                    break;              // no need to look at the remaining codes
                }
            }
            if (ret == false) break;
        }
    }
    else {
        ret = false;
    }

    return ret;
}

  The comparison code[i].substr(0, code[j].size()) == code[j].substr(0, code[i].size()) always takes the whole of the shorter code and the leading part of the other code with the same length, so it directly tests the prefix property.
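  The comparison can be isolated in a small helper to see how it behaves. This is only a sketch; the name conflicts is hypothetical. It relies on std::string::substr clamping the requested length to the end of the string.

```cpp
#include <string>

// True if one of the two codewords is a prefix of the other (or they are equal).
bool conflicts(const std::string &a, const std::string &b) {
    // substr clamps the count, so both sides have min(a.size(), b.size()) characters.
    return a.substr(0, b.size()) == b.substr(0, a.size());
}
```

  For instance, conflicts("110", "1101") is true in either argument order, while conflicts("10", "11") is false. Identical codes also count as a conflict.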

  This approach is also accepted. The complete AC code follows; only check differs, everything else is unchanged.

#include <cstdio>
#include <iostream>
#include <string>

struct Data {
    char letter;
    int freq;
};

struct TNode {
    Data data;
    TNode *left, *right;
};

struct Heap {
    TNode *H;
    int size;
    int capacity;
};

Heap *createMinHeap(int n);
void insertHeap(Heap *minHeap, TNode *treeNode);
TNode *deleteHeap(Heap *minHeap);
TNode *createHuffmanTree(Heap *minHeap);
bool check(TNode *huffmanTree, int n);
int WPL(TNode *T, int depth);
int findFreq(TNode *huffmanTree, char letter);

int main() {
    int n;
    scanf("%d", &n);
    
    Heap *minHeap = createMinHeap(n);
    TNode *huffmanTree = createHuffmanTree(minHeap);
    
    int m;
    scanf("%d", &m);
    for (int i = 0; i < m; i++) {
        bool ret = check(huffmanTree, n);
        printf("%s\n", ret ? "Yes" : "No");
    }
    
    return 0;
}

Heap *createMinHeap(int n) {
    Heap *minHeap = new Heap;
    minHeap->size = 0;
    minHeap->capacity = n;
    minHeap->H = new TNode[n + 1];
    minHeap->H[0].data.freq = -1;
    
    for (int i = 0; i < minHeap->capacity; i++) {
        TNode *tmp = new TNode;
        tmp->left = tmp->right = NULL;
        getchar();
        scanf("%c %d", &tmp->data.letter, &tmp->data.freq);
        insertHeap(minHeap, tmp);
    }
    
    return minHeap;
}

void insertHeap(Heap *minHeap, TNode *treeNode) {
    int pos = ++minHeap->size;
    for ( ; treeNode->data.freq < minHeap->H[pos / 2].data.freq; pos /= 2) {
        minHeap->H[pos] = minHeap->H[pos / 2];
    }
    minHeap->H[pos] = *treeNode;
}

TNode *deleteHeap(Heap *minHeap) {
    TNode *minTreeNode = new TNode;
    *minTreeNode = minHeap->H[1];
    TNode tmp = minHeap->H[minHeap->size--];
    
    int parent = 1, child;
    for ( ; parent * 2 <= minHeap->size; parent = child) {
        child = parent * 2;
        if (child != minHeap->size && minHeap->H[child].data.freq > minHeap->H[child + 1].data.freq) child++;
        if (tmp.data.freq < minHeap->H[child].data.freq) break;
        else minHeap->H[parent] = minHeap->H[child];
    }
    minHeap->H[parent] = tmp;
    
    return minTreeNode;
}

TNode *createHuffmanTree(Heap *minHeap) {
    int n = minHeap->size - 1;
    while (n--) {
        TNode *tmp = new TNode;
        tmp->left = deleteHeap(minHeap);
        tmp->right = deleteHeap(minHeap);
        tmp->data.freq = tmp->left->data.freq + tmp->right->data.freq;
        
        insertHeap(minHeap, tmp);
    }
    
    return deleteHeap(minHeap);
}

bool check(TNode *huffmanTree, int n) {
    int wpl = WPL(huffmanTree, 0);

    std::string code[n];
    int codeLen = 0;
    bool ret = true;
    
    for (int i = 0; i < n; i++) {
        char letter;
        getchar();
        scanf("%c", &letter);
        getchar();
        std::cin >> code[i];
        
        if (ret) {
            if (code[i].size() > n - 1) ret = false;
            codeLen += code[i].size() * findFreq(huffmanTree, letter);
        }
    }

    if (ret && codeLen == wpl) {
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (code[i].substr(0, code[j].size()) == code[j].substr(0, code[i].size())) {
                    ret = false;
                    break;
                }
            }
            if (ret == false) break;
        }
    }
    else {
        ret = false;
    }
    
    return ret;
}

int WPL(TNode *T, int depth) {
    if (T->left == NULL && T->right == NULL) return depth * T->data.freq;
    else return WPL(T->left, depth + 1) + WPL(T->right, depth + 1);
}

int findFreq(TNode *huffmanTree, char letter) {
    int ret = 0;
    if (huffmanTree) {
        if (huffmanTree->left == NULL && huffmanTree->right == NULL && huffmanTree->data.letter == letter) ret = huffmanTree->data.freq;
        if (ret == 0) ret = findFreq(huffmanTree->left, letter);
        if (ret == 0) ret = findFreq(huffmanTree->right, letter);
    }
    
    return ret;
}

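  It can be improved further! Instead of a hand-written heap we use a priority queue, and we no longer build any Huffman tree or even define a tree node, yet we can still compute the WPL of the given frequencies. The code also becomes much shorter.

  The key point is how to compute the WPL without building the Huffman tree. The WPL does not have to be computed as the sum of depth times frequency. Alternatively, add up the frequencies stored in all degree-2 (internal) nodes of the Huffman tree; the result is also the WPL. The reason is that a leaf's frequency is counted once in each of its ancestors, that is, as many times as its depth, which amounts to multiplying depth by frequency.

  Take the sample test case from the problem: with the frequencies 1, 1, 1, 3, 3, 6, 6, the degree-2 nodes of the Huffman tree hold 2, 3, 6, 9, 12 and 21, which add up to 53, the WPL.

  What we push into the priority queue is the frequency of each character, not a tree node. The procedure is: keep a variable that accumulates the frequencies of the degree-2 nodes. Each round pops two frequencies from the priority queue, which are the two smallest it contains, and adds them together. The sum is exactly the frequency stored in a degree-2 node. Add the sum to the accumulator and push it back into the priority queue. Repeat until a single element is left, which is popped without being added; the accumulated value is the WPL we want.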
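  The accumulation described above can be tried on its own with a minimal sketch. The function minimalWPL and its vector parameter are hypothetical; the AC code reads the frequencies from the input and passes the priority queue itself.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Sum of the weights of all internal (degree-2) nodes of a Huffman tree,
// computed without building the tree. Hypothetical helper for illustration.
int minimalWPL(const std::vector<int> &freqs) {
    std::priority_queue<int, std::vector<int>, std::greater<int> > pq(freqs.begin(), freqs.end());
    int wpl = 0;
    while (pq.size() > 1) {
        int merged = pq.top(); pq.pop();   // smallest frequency
        merged += pq.top(); pq.pop();      // plus the second smallest
        wpl += merged;                     // weight of the new internal node
        pq.push(merged);
    }
    return wpl;
}
```

  For the sample frequencies 1, 1, 1, 3, 3, 6, 6 the merged values are 2, 3, 6, 9, 12 and 21, so the function returns 53. For the problem statement's example 4, 2, 1, 1 it returns 14.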

  The AC code is as follows:

#include <iostream>
#include <string>
#include <vector>
#include <queue>
#include <map>
#include <functional>
using namespace std;

void readLetterFreq(map<char, int> &letterFreq, priority_queue< int, vector<int>, greater<int> > &pq, int n);
void checkOptimalCode(map<char, int> &letterFreq, priority_queue< int, vector<int>, greater<int> > &pq, int n);
int getWPL(priority_queue< int, vector<int>, greater<int> > &pq);

int main() {
    map<char, int> letterFreq;        // maps each character to its frequency
    priority_queue< int, vector<int>, greater<int> > pq;

    int n;
    cin >> n;
    readLetterFreq(letterFreq, pq, n);
    checkOptimalCode(letterFreq, pq, n);

    return 0;
}

void readLetterFreq(map<char, int> &letterFreq, priority_queue< int, vector<int>, greater<int> > &pq, int n) {
    for (int i = 0; i < n; i++) {
        char letter;
        cin >> letter;              // read the character (operator>> skips leading whitespace)
        cin >> letterFreq[letter];  // read its frequency and store it in the map

        pq.push(letterFreq[letter]);// push the frequency into the priority queue
    }
}

void checkOptimalCode(map<char, int> &letterFreq, priority_queue< int, vector<int>, greater<int> > &pq, int n) {
    int wpl = getWPL(pq);           // compute the WPL without building a Huffman tree

    int m;
    cin >> m;
    for (int s = 0; s < m; s++) {   // one iteration per student submission
        vector<string> code(n);
        int codeLen = 0;
        bool ret = true;

        for (int i = 0; i < n; i++) {
            char letter;
            cin >> letter >> code[i];

            if (ret) {
                if ((int)code[i].size() > n - 1) ret = false;   // longer than n-1 bits: cannot be optimal
                codeLen += (int)code[i].size() * letterFreq[letter];
            }
        }

        if (ret && codeLen == wpl) {
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    if (code[i].substr(0, code[j].size()) == code[j].substr(0, code[i].size())) {
                        ret = false;
                        break;
                    }
                }
                if (ret == false) break;
            }
        }
        else {
            ret = false;
        }

        cout << (ret ? "Yes\n" : "No\n");
    }
}

int getWPL(priority_queue< int, vector<int>, greater<int> > &pq) {
    int wpl = 0;                // accumulated result
    while (!pq.empty()) {       // while the priority queue is not empty
        int tmp = pq.top();     // pop the smallest frequency
        pq.pop();

        if (pq.empty()) break;  // that was the last element (the root), so stop

        tmp += pq.top();        // otherwise pop a second element and add the two frequencies
        pq.pop();
        pq.push(tmp);           // push the sum back into the priority queue

        wpl += tmp;             // accumulate the sum, i.e. the frequency stored in a degree-2 node
    }

    return wpl;
}

 

References

  Zhejiang University, Data Structures: https://www.icourse163.org/course/ZJU-93001?tid=1461682474

  Usage of priority_queue: https://www.cnblogs.com/Deribs4/p/5657746.html

  pta5-9 Huffman Codes (30 points): https://www.cnblogs.com/Deribs4/p/4801656.html
