Data Structures: The Trie (Dictionary Tree)

Date: 2021-02-03


What Is a Trie

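A trie, also called a "dictionary tree" or "prefix tree", is, as the name suggests, a tree structure. Unlike a binary search tree or a red-black tree, a trie is a multiway tree: each node can have m children. It is a data structure specialized for string matching, used to quickly look up a string within a set of strings.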

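For example, suppose a dictionary has $n$ entries. With an ordinary binary search tree (ignoring the degenerate case), looking up a given entry takes $O(\log n)$ time; with one million entries (roughly $2^{20}$), $\log_2 n$ is about 20.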

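With a trie, the cost of looking up an entry does not depend on how many entries the dictionary holds. The time complexity is $O(w)$, where $w$ is the length of the word being looked up, and the vast majority of words are shorter than 10 characters. String lookup with a trie, especially when only a prefix is queried, can therefore be more efficient than with an ordinary tree structure.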

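How does a trie make lookup time independent of the number of entries? In essence, a trie exploits the common prefixes shared between strings and merges repeated prefixes together. For example, suppose we build a trie from these six strings: how, hi, her, hello, so, see. The result looks like this:
[Figure: the trie built from these six strings]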

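The root node holds no information. Every other node represents one character of a string, and a path from the root to a red node spells out one string (note that not every red node is a leaf).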
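To make the prefix sharing concrete, here is a minimal sketch that inserts the six example strings and counts how many nodes get created. The class name `PrefixSharingDemo` and the `nodeCount` field are hypothetical, introduced only for this illustration; the `isWord` flag plays the role of the red nodes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class for illustration; not part of the original article.
class PrefixSharingDemo {

    private static class Node {
        boolean isWord;  // true for a node that ends a word (a "red" node)
        final Map<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();  // the root carries no character
    private int nodeCount = 0;             // nodes created, excluding the root

    void add(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            Node child = current.next.get(c);
            if (child == null) {
                // This character is not shared with any earlier word: new node
                child = new Node();
                current.next.put(c, child);
                nodeCount++;
            }
            current = child;
        }
        current.isWord = true;
    }

    int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        String[] words = {"how", "hi", "her", "hello", "so", "see"};
        PrefixSharingDemo trie = new PrefixSharingDemo();
        int totalChars = 0;
        for (String w : words) {
            trie.add(w);
            totalChars += w.length();
        }
        System.out.println("characters in input: " + totalChars);  // 18
        System.out.println("nodes in trie: " + trie.nodeCount());  // 13
    }
}
```

The six words contain 18 characters in total, but only 13 nodes are needed because "h", "he", and "s" are each stored once.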

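To make it easier to see how the trie is constructed, the construction can be broken down step by step. Each step amounts to inserting one string into the trie; once every string has been inserted, the trie is complete:

[Figure: step-by-step construction of the trie]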

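To look up a string in the trie, for example "her", we split it into its individual characters h, e, r and match them one at a time starting from the root. In the figure, the green path is the path matched in the trie:
[Figure: the matching path for "her", highlighted in green]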

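As mentioned earlier, a trie is a multiway tree. How does the "multiway" part show up in practice? Typically, if the trie only needs to handle lowercase letters, as in the example above, each node can use an array of length 26 to hold its children, as follows: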

class Node {
    char data;
    Node[] children = new Node[26];  // one slot per letter 'a'..'z'
}
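As a rough sketch of how the 26-slot array is used in practice, a character is mapped to its slot with `c - 'a'`. The class `LowercaseTrie` below is hypothetical and assumes every word consists only of the letters 'a' to 'z'. It also adds an `isWord` flag, which the node above does not have, so that word endings can be told apart from mere prefixes.

```java
// Hypothetical class for illustration; assumes words contain only 'a'..'z'.
class LowercaseTrie {

    private static class Node {
        boolean isWord;                        // assumption: marks the end of a word
        final Node[] children = new Node[26];  // slot i holds the child for (char) ('a' + i)
    }

    private final Node root = new Node();

    void add(String word) {
        Node current = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (index < 0 || index >= 26) {
                throw new IllegalArgumentException("only 'a'..'z' is supported: " + word);
            }
            if (current.children[index] == null) {
                current.children[index] = new Node();
            }
            current = current.children[index];
        }
        current.isWord = true;
    }

    boolean contains(String word) {
        Node current = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (index < 0 || index >= 26 || current.children[index] == null) {
                return false;
            }
            current = current.children[index];
        }
        return current.isWord;
    }

    public static void main(String[] args) {
        LowercaseTrie trie = new LowercaseTrie();
        trie.add("her");
        System.out.println(trie.contains("her"));  // true
        System.out.println(trie.contains("he"));   // false: only a prefix
    }
}
```

The array gives constant-time child access at the cost of 26 references per node, most of which stay null in a sparse trie.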

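If the requirement goes beyond lowercase letters and we want a general-purpose trie, we need a child container that can grow dynamically, so that each node holds however many pointers to next nodes it needs. For example, we can use a Map, as follows: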

class Node {
    boolean isWord;  // marks whether this node is the end of a word
    Map<Character, Node> next;
}

Basic Trie Code

The introduction above covered the basic concepts of a trie. Next, let's implement its basic functionality to get a concrete feel for it in code:

package tree;

import java.util.Map;
import java.util.TreeMap;

/**
 * Trie
 *
 * @author 01
 * @date 2021-01-28
 **/
public class TrieTree {

    private final Node root;

    private int size;

    /**
     * Structure of each node in the trie
     */
    private static class Node {
        /**
         * Marks whether this node is the end of a word
         */
        private boolean isWord;

        /**
         * A Map provides dynamic storage for any number of child nodes
         */
        private final Map<Character, Node> next;

        public Node(boolean isWord) {
            this.isWord = isWord;
            next = new TreeMap<>();
        }

        public Node() {
            this(false);
        }
    }

    public TrieTree() {
        root = new Node();
        size = 0;
    }

    /**
     * Returns the number of words stored in the trie
     */
    public int getSize() {
        return size;
    }

    /**
     * Adds a new word to the trie
     */
    public void add(String word) {
        Node current = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (current.next.get(c) == null) {
                // No matching child node, so create a new one
                current.next.put(c, new Node());
            }
            current = current.next.get(c);
        }

        if (!current.isWord) {
            // This is a new word: mark this node as the end of a word
            current.isWord = true;
            size++;
        }
    }
}

Querying a Trie

Querying a trie mainly means checking whether a given word exists in it. The logic is largely the same as in the add method:

/**
 * Checks whether the word is in the trie
 */
public boolean contains(String word) {
    Node current = root;
    for (int i = 0; i < word.length(); i++) {
        char c = word.charAt(i);
        if (current.next.get(c) == null) {
            return false;
        }
        current = current.next.get(c);
    }

    // The word exists in the trie only if the node for its last letter
    // is marked as the end of a word
    return current.isWord;
}

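Prefix Queries on a Trie

Compared with checking whether a word exists in the trie, prefix queries are more widely applicable and are the main kind of query performed on a trie. With prefix queries we can implement features such as the keyword suggestions a search engine offers. The code is almost identical to the word lookup: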

/**
 * Checks whether any word in the trie starts with prefix
 */
public boolean hasPrefix(String prefix) {
    Node current = root;
    for (int i = 0; i < prefix.length(); i++) {
        char c = prefix.charAt(i);
        if (current.next.get(c) == null) {
            return false;
        }
        current = current.next.get(c);
    }

    return true;
}
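The keyword-suggestion idea can be sketched by extending the prefix walk: once the node for the prefix is found, a depth-first traversal collects every word beneath it. The class `SuggestionTrie` and its `suggest` method are hypothetical names for this illustration, not part of the TrieTree class above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class for illustration: lists all stored words with a given prefix.
class SuggestionTrie {

    private static class Node {
        boolean isWord;
        final Map<Character, Node> next = new TreeMap<>();  // sorted keys give sorted output
    }

    private final Node root = new Node();

    void add(String word) {
        Node current = root;
        for (int i = 0; i < word.length(); i++) {
            current = current.next.computeIfAbsent(word.charAt(i), k -> new Node());
        }
        current.isWord = true;
    }

    List<String> suggest(String prefix) {
        List<String> result = new ArrayList<>();
        Node current = root;
        // Same walk as a prefix query: follow the prefix character by character
        for (int i = 0; i < prefix.length(); i++) {
            current = current.next.get(prefix.charAt(i));
            if (current == null) {
                return result;  // no word has this prefix
            }
        }
        collect(current, new StringBuilder(prefix), result);
        return result;
    }

    // Depth-first traversal that records the path whenever a word ends
    private void collect(Node node, StringBuilder path, List<String> result) {
        if (node.isWord) {
            result.add(path.toString());
        }
        for (Map.Entry<Character, Node> entry : node.next.entrySet()) {
            char c = entry.getKey();
            path.append(c);
            collect(entry.getValue(), path, result);
            path.deleteCharAt(path.length() - 1);  // backtrack
        }
    }

    public static void main(String[] args) {
        SuggestionTrie trie = new SuggestionTrie();
        for (String w : new String[]{"how", "hi", "her", "hello", "so", "see"}) {
            trie.add(w);
        }
        System.out.println(trie.suggest("he"));  // [hello, her]
        System.out.println(trie.suggest("x"));   // []
    }
}
```

Unlike the prefix check itself, collecting suggestions costs time proportional to the size of the subtree under the prefix, so a real suggestion feature would likely cap the number of results.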

Tries and Simple Pattern Matching

Next, let's use a trie to solve a simple pattern-matching problem on LeetCode, problem number 211:

https://leetcode-cn.com/problems/design-add-and-search-words-data-structure/description/

See the link above for the full problem statement; it is not repeated here. The implementation is as follows:

package tree.trie;

import java.util.Map;
import java.util.TreeMap;

/**
 * Leetcode 211. Add and Search Word - Data structure design
 * https://leetcode.com/problems/add-and-search-word-data-structure-design/description/
 *
 * @author 01
 */
public class WordDictionary {

    private static class Node {

        private boolean isWord;
        private final Map<Character, Node> next;

        public Node(boolean isWord) {
            this.isWord = isWord;
            next = new TreeMap<>();
        }

        public Node() {
            this(false);
        }
    }

    private final Node root;

    /**
     * Initialize your data structure here.
     */
    public WordDictionary() {
        root = new Node();
    }

    /**
     * Adds a word into the data structure.
     */
    public void addWord(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (cur.next.get(c) == null) {
                cur.next.put(c, new Node());
            }
            cur = cur.next.get(c);
        }
        cur.isWord = true;
    }

    /**
     * Returns if the word is in the data structure.
     * A word could contain the dot character '.' to represent any one letter.
     */
    public boolean search(String word) {
        return match(root, word, 0);
    }

    private boolean match(Node node, String word, int index) {
        // Reached the end of the pattern: report whether this node ends a word
        if (index == word.length()) {
            return node.isWord;
        }

        char c = word.charAt(index);
        if (c != '.') {
            if (node.next.get(c) == null) {
                return false;
            }

            // Recurse to match the next letter
            return match(node.next.get(c), word, index + 1);
        } else {
            // Wildcard: try to match through every child node
            for (char nextChar : node.next.keySet()) {
                if (match(node.next.get(nextChar), word, index + 1)) {
                    return true;
                }
            }

            return false;
        }
    }
}
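To see how the wildcard behaves, here is a condensed sketch of the same idea with a small usage demo. The class name `WildcardDemo` is hypothetical, and the code is a compact restatement for illustration (it iterates over `values()` and uses `computeIfAbsent`), not a replacement for the class above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, condensed variant used only to demonstrate the '.' wildcard.
class WildcardDemo {

    private static class Node {
        boolean isWord;
        final Map<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();

    void addWord(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            cur = cur.next.computeIfAbsent(word.charAt(i), k -> new Node());
        }
        cur.isWord = true;
    }

    boolean search(String word) {
        return match(root, word, 0);
    }

    private boolean match(Node node, String word, int index) {
        if (index == word.length()) {
            return node.isWord;
        }
        char c = word.charAt(index);
        if (c != '.') {
            Node child = node.next.get(c);
            return child != null && match(child, word, index + 1);
        }
        // '.' matches exactly one letter, so every child is a candidate
        for (Node child : node.next.values()) {
            if (match(child, word, index + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        WildcardDemo d = new WildcardDemo();
        d.addWord("bad");
        d.addWord("dad");
        d.addWord("mad");
        System.out.println(d.search("pad"));  // false
        System.out.println(d.search("bad"));  // true
        System.out.println(d.search(".ad"));  // true
        System.out.println(d.search("b.."));  // true
        System.out.println(d.search("b."));   // false: "ba" is only a prefix
    }
}
```

Each '.' fans out to all children of the current node, so a pattern made mostly of wildcards can visit a large part of the trie.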

Tries and String Mapping

Finally, let's solve LeetCode problem 677, linked below:

https://leetcode-cn.com/problems/map-sum-pairs/

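For this problem we use the trie as a map: each word is a key with a corresponding value, and that value lives only on the node for the word's last letter, as shown in the figure below:
[Figure: a trie with a value stored on the node for each word's last letter]

With this picture in mind, the code becomes simple to write, which is why drawing diagrams is a good habit when implementing algorithms and data structures. The implementation is as follows: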

package tree.trie;

import java.util.Map;
import java.util.TreeMap;

/**
 * Key-value mapping (Map Sum Pairs)
 * https://leetcode-cn.com/problems/map-sum-pairs/
 *
 * @author 01
 */
public class MapSum {

    private static class Node {

        private int value;
        private final Map<Character, Node> next;

        public Node(int value) {
            this.value = value;
            next = new TreeMap<>();
        }

        public Node() {
            this(0);
        }
    }

    private final Node root;

    /**
     * Initialize your data structure here.
     */
    public MapSum() {
        root = new Node();
    }

    public void insert(String key, int val) {
        Node cur = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (cur.next.get(c) == null) {
                cur.next.put(c, new Node());
            }
            cur = cur.next.get(c);
        }
        cur.value = val;
    }

    public int sum(String prefix) {
        Node cur = root;
        // Find the node for the last letter of the prefix
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            if (cur.next.get(c) == null) {
                return 0;
            }
            cur = cur.next.get(c);
        }

        // Sum the values of this node and every node beneath it
        return sum(cur);
    }

    private int sum(Node node) {
        int res = node.value;
        // Visit every child node
        for (char c : node.next.keySet()) {
            // Recursively add up the values in each child's subtree
            res += sum(node.next.get(c));
        }

        return res;
    }
}
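The recursive sum above visits the whole subtree on every query. One possible alternative, sketched below under stated assumptions, keeps a running total on every node so that a prefix sum only needs to walk the prefix. The class `PrefixSumMap`, its `sum` field, and the extra `values` map are hypothetical and not part of the solution above; the map is there so that re-inserting a key can be applied as a difference from its old value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative: each node stores the total of all keys passing through it.
class PrefixSumMap {

    private static class Node {
        int sum;  // total value of every key whose path goes through this node
        final Map<Character, Node> next = new HashMap<>();
    }

    private final Node root = new Node();
    private final Map<String, Integer> values = new HashMap<>();  // current value per key

    void insert(String key, int val) {
        // If the key already exists, only the difference must be propagated
        int delta = val - values.getOrDefault(key, 0);
        values.put(key, val);

        Node cur = root;
        cur.sum += delta;
        for (int i = 0; i < key.length(); i++) {
            cur = cur.next.computeIfAbsent(key.charAt(i), k -> new Node());
            cur.sum += delta;
        }
    }

    int sum(String prefix) {
        Node cur = root;
        for (int i = 0; i < prefix.length(); i++) {
            cur = cur.next.get(prefix.charAt(i));
            if (cur == null) {
                return 0;  // no key starts with this prefix
            }
        }
        return cur.sum;
    }

    public static void main(String[] args) {
        PrefixSumMap map = new PrefixSumMap();
        map.insert("apple", 3);
        System.out.println(map.sum("ap"));  // 3
        map.insert("app", 2);
        System.out.println(map.sum("ap"));  // 5
        map.insert("apple", 5);             // overwrites the old value 3
        System.out.println(map.sum("ap"));  // 7
    }
}
```

The trade-off is extra memory for the `values` map and slightly more work per insert, in exchange for a `sum` call whose cost depends only on the prefix length.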