Dictionary Tree (Trie)

Date: 2019-11-13

Tags: dictionary, trie


Trie:

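Gist: a trie trades memory for time. Strings that share a prefix also share the nodes for that prefix, which saves some storage and avoids repeated comparisons. The trie is a comparatively simple data structure and easy to understand, but that simplicity has a price: its memory consumption is very high.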

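Main applications: counting and sorting large numbers of strings (though not only strings), which is why search engine systems often use it for word-frequency statistics over text.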
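As an illustration of counting and sorting with a trie, here is a minimal sketch. It is not taken from the original post: the names `FreqNode`, `freq_add`, `freq_get` and `freq_print` are made up for the example, and it assumes that words contain only the lowercase letters 'a' to 'z' and that a small fixed pool of nodes is enough.

```c
#include <assert.h>
#include <stdio.h>

#define ALPHABET 26      /* assumption: words use only 'a'..'z' */
#define MAX_NODES 10000  /* assumption: pool size picked for the sketch */

typedef struct {
    int count;            /* how many times the word ending here was added */
    int next[ALPHABET];   /* child indices into the pool; 0 means "no child" */
} FreqNode;

static FreqNode freq_pool[MAX_NODES]; /* freq_pool[0] is the root, all zero */
static int freq_used = 1;

/* Add one word and bump its counter. */
void freq_add(const char *word)
{
    int cur = 0;
    for (; *word; ++word) {
        int c = *word - 'a';
        if (freq_pool[cur].next[c] == 0) {
            assert(freq_used < MAX_NODES);  /* the fixed pool must not overflow */
            freq_pool[cur].next[c] = freq_used++;
        }
        cur = freq_pool[cur].next[c];
    }
    freq_pool[cur].count++;
}

/* Return how often word was added (0 if never). */
int freq_get(const char *word)
{
    int cur = 0;
    for (; *word; ++word) {
        cur = freq_pool[cur].next[*word - 'a'];
        if (cur == 0)
            return 0;
    }
    return freq_pool[cur].count;
}

/* Depth-first walk. Visiting children in 'a'..'z' order prints the
   stored words in lexicographic order, each with its count. */
void freq_print(int cur, char *buf, int depth)
{
    if (freq_pool[cur].count > 0) {
        buf[depth] = '\0';
        printf("%s %d\n", buf, freq_pool[cur].count);
    }
    for (int c = 0; c < ALPHABET; ++c) {
        if (freq_pool[cur].next[c] != 0) {
            buf[depth] = (char)('a' + c);
            freq_print(freq_pool[cur].next[c], buf, depth + 1);
        }
    }
}
```

To list everything, call `freq_print(0, buf, 0)` with a `char buf[]` at least one byte longer than the longest word. Child indices are used instead of pointers so that a zero-initialized pool already means "no children".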

Example:

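Suppose you are given 100000 words, each at most 10 letters long. For each word, we want to know whether it has appeared before and, if so, at which position it first appeared.
With the most naive method, we would search all the preceding words for every word, which takes O(n^2) comparisons. That is clearly unacceptable for n = 100000. So let's think about it differently. Suppose the word being queried is abcd. Among the earlier words, those beginning with b, c, d, f and so on obviously need not be considered; we only have to look for abcd among the words beginning with a. Likewise, among the words beginning with a, we only have to consider those whose second letter is b, and so on. A tree model gradually takes shape.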
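The idea above can be sketched roughly as follows. This is only an illustration, not the original author's code: `PosNode` and `first_seen` are hypothetical names, words are assumed to be lowercase 'a' to 'z', and the pool is kept small for the demo (the full problem could need up to 100000 * 10 + 1 nodes).

```c
#include <assert.h>

#define LETTERS 26      /* assumption: lowercase 'a'..'z' only */
#define POOL_SIZE 1000  /* assumption: demo size; the full problem may need 100000*10+1 */

typedef struct {
    int first;            /* 1-based position of the first occurrence, 0 = not seen yet */
    int child[LETTERS];   /* indices into the pool, 0 = no child */
} PosNode;

static PosNode pos_pool[POOL_SIZE]; /* pos_pool[0] is the root, all zero */
static int pos_used = 1;

/* Walk (and extend) the path for word. If the word was seen before, return
   the position of its first occurrence; otherwise record position and return 0. */
int first_seen(const char *word, int position)
{
    int cur = 0;
    for (; *word; ++word) {
        int c = *word - 'a';
        if (pos_pool[cur].child[c] == 0) {
            assert(pos_used < POOL_SIZE);  /* the fixed pool must not overflow */
            pos_pool[cur].child[c] = pos_used++;
        }
        cur = pos_pool[cur].child[c];
    }
    if (pos_pool[cur].first != 0)
        return pos_pool[cur].first;
    pos_pool[cur].first = position;
    return 0;
}
```

Each call touches at most one node per letter, so handling a word costs time proportional to its length (at most 10 steps here) no matter how many words came before it.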

Example problem:

Phone List

Time Limit: 3000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 5081 Accepted Submission(s): 1714


Problem Description

Given a list of phone numbers, determine if it is consistent in the sense that no number is the prefix of another. Let’s say the phone catalogue listed these numbers:
1. Emergency 911
2. Alice 97 625 999
3. Bob 91 12 54 26
In this case, it’s not possible to call Bob, because the central would direct your call to the emergency line as soon as you had dialled the first three digits of Bob’s phone number. So this list would not be consistent.

Input

The first line of input gives a single integer, 1 <= t <= 40, the number of test cases. Each test case starts with n, the number of phone numbers, on a separate line, 1 <= n <= 10000. Then follows n lines with one unique phone number on each line. A phone number is a sequence of at most ten digits.

Output

For each test case, output "YES" if the list is consistent, or "NO" otherwise.

Sample Input

2
3
911
97625999
91125426
5
113
12340
123440
12345
98346

Code:

#include<stdio.h>
#include<string.h>
typedef struct node
{
    int num;    // 1 if some phone number ends at this node, 0 otherwise
    struct node *next[10];
}node;
node memory[1000000];
int k;
int insert(char *s,node *T)
{
    int i,len,id,j;
    node *p,*q;
    p=T;
    len=strlen(s);
    for(i=0;i<len;++i)
    {
        id=s[i]-'0';
        if(p->num==1)   // an earlier number is a prefix of s (the shorter one was inserted first)
            return 1;
        if(p->next[id]==NULL)
        {
            q=&memory[k++];
            q->num=0;
            for(j=0;j<10;++j)
                q->next[j]=NULL;
            p->next[id]=q;
        }
        p=p->next[id];
    }
    for(i=0;i<10;++i)      // if p has any child, s is a prefix of an earlier number (the longer one was inserted first)
        if(p->next[i]!=NULL)
            return 1;
    p->num=1;
    return 0;
}
int main()
{
    int m,n,flag,i;
    node *T;
    char s[15];
    scanf("%d",&m);
    while(m--)
    {
        k=0;          // restart allocation at index 0 for every test case so the pool is reused and the memory limit is not exceeded
        T=&memory[k++];
        T->num=0;
        for(i=0;i<10;++i)
            T->next[i]=NULL;
        flag=0;
        scanf("%d",&n);
        while(n--)
        {
            scanf("%s",s);
            if(flag)
                continue;
            if(insert(s,T))
                flag=1;
        }
        if(flag)
            printf("NO\n");
        else
            printf("YES\n");
    }
    return 0;
}

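Trie:

Three basic properties:

1. The root node holds no character; every node other than the root holds exactly one character.

2. Concatenating the characters on the path from the root to a node gives the string that node represents.

3. All children of a given node hold distinct characters.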

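Advantages: shared prefixes are stored only once, which saves storage space, and needless string comparisons are kept to a minimum. A lookup takes time proportional to the length of the key, regardless of how many strings are stored, so queries can be faster than with a hash table.

Disadvantage: if there are many strings that have essentially no common prefix, the corresponding trie consumes a great deal of memory.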
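To make the memory trade-off concrete, the following sketch counts how many nodes a digit trie needs for a given set of keys. It is an illustration under stated assumptions, not part of the original solution: `CountNode` and `count_nodes` are hypothetical names, keys are assumed to consist of the digits '0' to '9', and the pool capacity `CAP` is an arbitrary small value.

```c
#include <assert.h>
#include <string.h>

#define DIGITS 10
#define CAP 256   /* assumption: small pool, enough for the demo keys */

typedef struct {
    int child[DIGITS];  /* indices into the pool, 0 = no child */
} CountNode;

static CountNode cnt_pool[CAP];
static int cnt_used;

/* Build a fresh trie from n digit strings and return the number of
   nodes it needed, root included. */
int count_nodes(const char *keys[], int n)
{
    memset(cnt_pool, 0, sizeof cnt_pool);  /* forget any previous trie */
    cnt_used = 1;                          /* index 0 is the root */
    for (int i = 0; i < n; ++i) {
        int cur = 0;
        for (const char *s = keys[i]; *s; ++s) {
            int d = *s - '0';
            if (cnt_pool[cur].child[d] == 0) {
                assert(cnt_used < CAP);    /* the fixed pool must not overflow */
                cnt_pool[cur].child[d] = cnt_used++;
            }
            cur = cnt_pool[cur].child[d];
        }
    }
    return cnt_used;
}
```

For the three keys 12340, 12341 and 12342, which share the prefix 1234, this returns 8 nodes; for 12340, 56781 and 90122, which share nothing, it returns 16. Every node carries a full array of ten child slots whether or not they are used, which is where most of the memory goes.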