Data Structures: The Trie (Dictionary Tree)

Date: 2019-11-09

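Introduction

Consider the following problem: given \(n\) words and \(m\) queries, where each query is a single word, answer whether that word appears in the word list.

That seems easy enough: a map<string,bool> handles it in a few lines.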

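But what if \(n\) is on the order of \(10^5\)? Every map lookup then costs \(O(\log n)\) string comparisons, so with long words and many queries a map can become too slow, and it may even exceed the memory limit.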

This is where a powerful data structure comes in: the trie.

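Basic properties

A trie, also called a dictionary tree or prefix tree, is used to count, sort, and store large numbers of strings. Search engines often use it for word-frequency statistics over text.

Core idea: use the common prefixes of strings to cut query time and avoid as many needless string comparisons as possible.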

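Assume every word consists only of lowercase letters. The trie built from (abd, abcd, b, bcd, efg, hil) is shown in the figure. (The figure from the encyclopedia entry is missing the last letter of the final word; let's take it to be 'l'.)

[Figure: trie built from the words abd, abcd, b, bcd, efg, hil]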

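From this we can see that a trie has the following characteristics:

Edges represent letters.
Words with the same prefix share the nodes of that prefix.
Each node has at most 26 children (when words contain only lowercase letters).
The root node is empty.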
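
To make the prefix-sharing property concrete, here is a minimal sketch that builds the trie for the six example words and counts how many non-root nodes get created. The names SketchNode and countCreatedNodes are hypothetical and used only for this illustration; the sketch assumes lowercase input and, for brevity, never frees the nodes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type for this sketch: one child slot per lowercase letter.
struct SketchNode {
    SketchNode* next[26] = {};  // every slot starts out as nullptr
};

// Insert every word and return how many non-root nodes had to be created.
// Assumes the words contain only 'a'..'z', as in the text.
int countCreatedNodes(SketchNode* root, const std::vector<std::string>& words) {
    int created = 0;
    for (const std::string& w : words) {
        SketchNode* cur = root;
        for (char c : w) {
            assert(c >= 'a' && c <= 'z');
            int index = c - 'a';
            if (cur->next[index] == nullptr) {
                // No earlier word shares this prefix, so a new node is needed.
                cur->next[index] = new SketchNode();
                ++created;
            }
            cur = cur->next[index];
        }
    }
    return created;
}
```

For abd, abcd, b, bcd, efg, hil the words contain 17 letters in total, but only 14 nodes are created: abcd reuses the a and b nodes of abd, and bcd reuses the b node.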

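Basic operations

Data structure definition

pass records how many strings pass through a node, that is, how many words have the string spelled by the edges from the root to this node as a prefix.

end records how many strings end at a node, that is, how many words are exactly the string spelled by the edges from the root to this node.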

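typedef struct node{
    int pass; // number of words that pass through this node
    int end;  // number of words that end at this node
    struct node* next[26]; // one child pointer per lowercase letter
}*trieTree;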

Insertion

Insert a string \(s\) into the trie.

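void insert(trieTree T, const string &s) {
    for (size_t i = 0; i < s.length(); i++) {
        int index = s[i] - 'a'; // assumes lowercase letters only
        if (T->next[index] == NULL) {
            // create the child on first use; new node() zero-initializes it
            T->next[index] = new node();
        }
        T = T->next[index];
        T->pass++; // one more word passes through this node
    }
    T->end++; // one more word ends at this node
}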

Lookup

Count how many words in the text have the string \(s\) as a prefix.

To count how many times the string \(s\) itself occurs in the text, return T->end instead.

int find(trieTree T, const string &s) {
    for (size_t i = 0; i < s.length(); i++) {
        int index = s[i] - 'a'; // assumes lowercase letters only
        if (T->next[index] == NULL) {
            return 0; // the path breaks off: no word has s as a prefix
        }
        T = T->next[index];
    }
    return T->pass;
}
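
The difference between returning pass and returning end can be illustrated with a small self-contained sketch. CountNode, addWord, walk, countWithPrefix and countExact are hypothetical names chosen for this example, not part of the code above; the sketch assumes lowercase input and omits cleanup.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type mirroring the pass/end fields described in the text.
struct CountNode {
    int pass = 0;               // words passing through this node
    int end = 0;                // words ending at this node
    CountNode* next[26] = {};   // children, all nullptr initially
};

void addWord(CountNode* root, const std::string& s) {
    CountNode* cur = root;
    for (char c : s) {
        assert(c >= 'a' && c <= 'z');
        int index = c - 'a';
        if (cur->next[index] == nullptr) cur->next[index] = new CountNode();
        cur = cur->next[index];
        cur->pass++;
    }
    cur->end++;
}

// Follow s from the root; return nullptr if the path does not exist.
const CountNode* walk(const CountNode* root, const std::string& s) {
    const CountNode* cur = root;
    for (char c : s) {
        assert(c >= 'a' && c <= 'z');
        cur = cur->next[c - 'a'];
        if (cur == nullptr) return nullptr;
    }
    return cur;
}

// Number of stored words that start with s.
int countWithPrefix(const CountNode* root, const std::string& s) {
    const CountNode* n = walk(root, s);
    return n ? n->pass : 0;
}

// Number of times s itself was stored.
int countExact(const CountNode* root, const std::string& s) {
    const CountNode* n = walk(root, s);
    return n ? n->end : 0;
}
```

After adding abd, abcd, b, bcd and b again, countWithPrefix for "b" is 3 while countExact for "b" is 2, and "ab" is a prefix of two words but occurs zero times as a word. As in the code above, the root's pass is never incremented, so an empty query string yields 0.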

Full implementation

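#include <iostream>
#include <string>
using namespace std;

typedef struct node{
    int pass; // number of words that pass through this node
    int end;  // number of words that end at this node
    struct node* next[26]; // one child pointer per lowercase letter
}*trieTree;

// Insert s into the trie rooted at T (lowercase letters only).
void insert(trieTree T, const string &s) {
    for (size_t i = 0; i < s.length(); i++) {
        int index = s[i] - 'a';
        if (T->next[index] == NULL) {
            T->next[index] = new node(); // zero-initialized child
        }
        T = T->next[index];
        T->pass++;
    }
    T->end++;
}

// Return how many inserted words have s as a prefix.
int find(trieTree T, const string &s) {
    for (size_t i = 0; i < s.length(); i++) {
        int index = s[i] - 'a';
        if (T->next[index] == NULL) {
            return 0; // no word has s as a prefix
        }
        T = T->next[index];
    }
    return T->pass;
}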

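map implementation

Using node* next[26] wastes a lot of space, because it is unlikely that every node uses all 26 next pointers. A map stores only the children that actually exist.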

#include <iostream>
#include <map>
#include <string>
using namespace std;

typedef struct node{
    int pass; // number of words that pass through this node
    int end;  // number of words that end at this node
    map<char,struct node *>m; // only the children that actually exist
}* trieTree;

// Insert s into the trie rooted at T.
void insert(trieTree T, const string &s) {
    for (size_t i = 0; i < s.length(); i++) {
        if (T->m.find(s[i]) == T->m.end()) {
            T->m.insert(make_pair(s[i], new node()));
        }
        T = T->m[s[i]];
        T->pass++;
    }
    T->end++;
}

// Return how many inserted words have s as a prefix.
int find(trieTree T, const string &s) {
    for (size_t i = 0; i < s.length(); i++) {
        if (T->m.find(s[i]) == T->m.end()) {
            return 0; // no word has s as a prefix
        }
        T = T->m[s[i]];
    }
    return T->pass;
}

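Example problems

Prefix matching, string retrieval, and word-frequency counting are all roughly the same kind of problem; the implementations differ only slightly.

For prefix matching, pass alone is enough and end is not needed. For word-frequency counting, only end is needed. For plain string retrieval it is even simpler: end can be declared as a bool. Which fields to keep, and how to use them, depends on the problem.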
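
As a sketch of the string-retrieval variant, the node below keeps only a bool flag in place of end, which is enough to answer the membership question from the introduction. FlagNode, addWord and contains are hypothetical names for this illustration; lowercase input is assumed and cleanup is omitted.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type: a bool flag replaces the end counter.
struct FlagNode {
    bool isWord = false;       // true if some stored word ends here
    FlagNode* next[26] = {};   // children, all nullptr initially
};

void addWord(FlagNode* root, const std::string& s) {
    FlagNode* cur = root;
    for (char c : s) {
        assert(c >= 'a' && c <= 'z');
        int index = c - 'a';
        if (cur->next[index] == nullptr) cur->next[index] = new FlagNode();
        cur = cur->next[index];
    }
    cur->isWord = true;
}

// True only if s was stored as a whole word, not merely as a prefix.
bool contains(const FlagNode* root, const std::string& s) {
    const FlagNode* cur = root;
    for (char c : s) {
        assert(c >= 'a' && c <= 'z');
        cur = cur->next[c - 'a'];
        if (cur == nullptr) return false;
    }
    return cur->isWord;
}
```

With abd and b stored, contains reports true for "abd" and "b" but false for "ab", which exists only as a prefix, and false for "bc", which has no path at all.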

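Problem link: http://acm.hdu.edu.cn/showproblem.php?pid=1251

This problem has a small pitfall: with node* next[26], a G++ submission exceeds the memory limit while a C++ submission does not. Either way, the array does waste a lot of space, so the map implementation is recommended.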

#include <iostream>
#include <map>
#include <string>
using namespace std;

typedef struct node{
    int pass;
    map<char,struct node *>m;
}*trieTree;

// Insert s into the trie rooted at T.
void insert(trieTree T, const string &s) {
    for (size_t i = 0; i < s.length(); i++) {
        if (T->m.find(s[i]) == T->m.end()) {
            T->m.insert(make_pair(s[i], new node()));
        }
        T = T->m[s[i]];
        T->pass++;
    }
}

// Return how many inserted words have s as a prefix.
int find(trieTree T, const string &s) {
    for (size_t i = 0; i < s.length(); i++) {
        if (T->m.find(s[i]) == T->m.end()) {
            return 0; // no word has s as a prefix
        }
        T = T->m[s[i]];
    }
    return T->pass;
}

int main() {
    trieTree T = new node();
    string s;
    // The word list ends at the first empty line.
    while (getline(cin,s)) {
        if (s.empty()) break;
        insert(T, s);
    }

    // Every remaining line is a prefix query.
    while (getline(cin,s)) {
        cout << find(T, s) << endl;
    }
    return 0;
}

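Tries are also useful for sorting strings. A trie is a multiway tree, so a preorder traversal that visits the children of each node in alphabetical order and outputs each stored string yields the strings in lexicographic order.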
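
The preorder idea can be sketched as follows, using the array layout so that children are naturally visited from 'a' to 'z'. SortNode, addWord, collect and sortedWords are hypothetical names for this example; lowercase input is assumed and cleanup is omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type: end counts how many times a word was stored.
struct SortNode {
    int end = 0;
    SortNode* next[26] = {};   // children, all nullptr initially
};

void addWord(SortNode* root, const std::string& s) {
    SortNode* cur = root;
    for (char c : s) {
        assert(c >= 'a' && c <= 'z');
        int index = c - 'a';
        if (cur->next[index] == nullptr) cur->next[index] = new SortNode();
        cur = cur->next[index];
    }
    cur->end++;
}

// Preorder traversal: emit the current word first, then descend into the
// children in alphabetical order. prefix holds the letters on the path so far.
void collect(const SortNode* cur, std::string& prefix,
             std::vector<std::string>& out) {
    for (int k = 0; k < cur->end; ++k) out.push_back(prefix);
    for (int i = 0; i < 26; ++i) {
        if (cur->next[i] != nullptr) {
            prefix.push_back(static_cast<char>('a' + i));
            collect(cur->next[i], prefix, out);
            prefix.pop_back();
        }
    }
}

std::vector<std::string> sortedWords(const SortNode* root) {
    std::vector<std::string> out;
    std::string prefix;
    collect(root, prefix, out);
    return out;
}
```

Inserting hil, bcd, abd, b, efg, abcd in that order and calling sortedWords gives abcd, abd, b, bcd, efg, hil. A word is emitted before its extensions, so a shorter word always precedes longer words that share its prefix, and duplicates appear as many times as they were inserted.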
