Trie: Introduction and Implementation

Date: 2019-12-09


While optimizing some old code, I needed a trie (dictionary tree), so I wrote one in Java. Here it is.

First, some common use cases: word matching and counting (sensitive-word detection, word lookup), input suggestions, and so on.

Here is the code.
The Node class:

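import java.util.ArrayList;
import java.util.List;

public class Node{
    private List<Node> nodeList = new ArrayList<>(); // child nodes
    private char word; // the single character stored in this node
    private int isEnd = 0; // end-of-word flag: 1 if a word ends at this node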

    public Node(char w){
        this.word = w;
    }

    public Node(){ }

    public List<Node> getNodeList() {
        return nodeList;
    }

    public void setNodeList(List<Node> nodeList) {
        this.nodeList = nodeList;
    }

    public char getWord() {
        return word;
    }

    public void setWord(char word) {
        this.word = word;
    }

    public int getIsEnd() {
        return isEnd;
    }

    public void setIsEnd(int isEnd) {
        this.isEnd = isEnd;
    }
}

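The key parts of a Node are the two fields word (a char) and isEnd. Each node stores a single Java char; you could instead store the UTF-8 encoded form to avoid some encoding problems (a Java char is one UTF-16 code unit, so characters outside the Basic Multilingual Plane span two of them).
Because this is a multi-way tree, words such as "sad" and "saddy" share the same path, so an end flag is needed to mark where a word finishes.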
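
To make the role of the end flag concrete, here is a minimal sketch that is separate from the Node class above. The names `EndFlagDemo`, `N`, `insert` and `lookup` are made up for this illustration. It inserts only "saddy" and shows that the path for "sad" exists in the tree even though "sad" was never added, which is exactly what the flag is there to distinguish.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original code.
class EndFlagDemo {
    static class N {
        char c;
        boolean end;                          // true if an inserted word ends here
        List<N> children = new ArrayList<>();
        N(char c) { this.c = c; }
    }

    // Linear scan for the child holding character c, or null if there is none.
    static N child(N parent, char c) {
        for (N n : parent.children) {
            if (n.c == c) return n;
        }
        return null;
    }

    static void insert(N root, String word) {
        N cur = root;
        for (char c : word.toCharArray()) {
            N next = child(cur, c);
            if (next == null) {
                next = new N(c);
                cur.children.add(next);
            }
            cur = next;
        }
        cur.end = true;                       // mark the last node of the word
    }

    // True if the path exists; with requireEnd the last node must also carry the flag.
    static boolean lookup(N root, String word, boolean requireEnd) {
        N cur = root;
        for (char c : word.toCharArray()) {
            cur = child(cur, c);
            if (cur == null) return false;
        }
        return !requireEnd || cur.end;
    }

    public static void main(String[] args) {
        N root = new N('\0');
        insert(root, "saddy");
        System.out.println(lookup(root, "sad", false)); // true: the path s-a-d exists
        System.out.println(lookup(root, "sad", true));  // false: "sad" was never inserted
        insert(root, "sad");
        System.out.println(lookup(root, "sad", true));  // true
    }
}
```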

Adding nodes:

public void addNode(List<Node> nodeList, char[] word){
    // walk the word one character at a time
    for (int i = 0; i < word.length; i++) {
        Node match = null;
        // look for an existing child that holds this character
        for (Node child : nodeList) {
            if (child.getWord() == word[i]) {
                match = child;
                break;
            }
        }
        // no such child: create it and attach it to the current level
        if (match == null) {
            match = new Node(word[i]);
            nodeList.add(match);
        }
        // the last character marks the end of a word,
        // whether the node is new or already existed (e.g. "sad" added after "saddy")
        if (i == word.length - 1) {
            match.setIsEnd(1);
        }
        // descend to the children of the matched node
        nodeList = match.getNodeList();
    }
}

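One thing to note in this section: I used a List to hold the children, so finding a child is a linear scan. This can be optimized with a Map, since a hash table lookup is O(1) on average.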
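
As a rough sketch of that optimization, the variant below keeps the children in a `HashMap` keyed by character. The class and method names (`MapTrie`, `MapNode`, `add`, `contains`) are illustrative assumptions, not part of the code above, and the end flag is a boolean here rather than an int.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Map-based variant; the names are illustrative.
class MapTrie {
    static class MapNode {
        Map<Character, MapNode> children = new HashMap<>();
        boolean end; // true if an inserted word ends at this node
    }

    private final MapNode root = new MapNode();

    void add(String word) {
        MapNode cur = root;
        for (char c : word.toCharArray()) {
            // computeIfAbsent creates the child only when it is missing
            cur = cur.children.computeIfAbsent(c, k -> new MapNode());
        }
        cur.end = true;
    }

    boolean contains(String word) {
        MapNode cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.children.get(c); // one hash lookup instead of a list scan
            if (cur == null) return false;
        }
        return cur.end;
    }
}
```

With this layout each character costs one hash lookup, so adding or finding a word takes time proportional to its length, independent of how many children a node has.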

Searching for a word:

public boolean searchNode(List<Node> nodeList, char[] word){
    for (int i = 0; i < word.length; i++) {
        Node match = null;
        for (Node child : nodeList) {
            if (child.getWord() == word[i]) {
                match = child;
                break;
            }
        }
        // the character is missing at this level, so the word is not in the trie
        if (match == null) {
            return false;
        }
        // last character: the word exists only if the end flag is set here
        if (i == word.length - 1) {
            return match.getIsEnd() == 1;
        }
        nodeList = match.getNodeList();
    }
    // only reached for an empty input
    return false;
}

Searching a text:

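public boolean searchText(List<Node> nodeList, char[] word){
    // remember the root level so that every start position begins there
    List<Node> head = nodeList;
    // try to match a dictionary word starting at each position of the text
    for (int start = 0; start < word.length; start++) {
        nodeList = head;
        for (int i = start; i < word.length; i++) {
            Node match = null;
            for (Node child : nodeList) {
                if (child.getWord() == word[i]) {
                    match = child;
                    break;
                }
            }
            // mismatch: give up on this start position and try the next one
            if (match == null) {
                break;
            }
            // when searching a text there is no need to reach the end of the input;
            // return as soon as a dictionary word ends here
            if (match.getIsEnd() == 1) {
                return true;
            }
            nodeList = match.getNodeList();
        }
    }
    return false;
}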

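For sensitive-word filtering and similar features, the text should be segmented into words first. Without segmentation you get false matches, for example 瑪麗露A, where a match can straddle a word boundary. With segmentation, the search advances one word at a time.
Here I search sequentially, one character at a time. There are text-search algorithms that combine with a trie (Aho-Corasick is one) and could make this more efficient.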
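
The difference between scanning character by character and segmenting first can be sketched as follows. This is only an illustration under strong assumptions: splitting on whitespace stands in for a real word segmenter (Chinese text would need a proper one), and `SegmentedFilter`, `T`, `scan` and `scanTokens` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: whitespace splitting stands in for a real word segmenter.
class SegmentedFilter {
    static class T {
        Map<Character, T> next = new HashMap<>();
        boolean end;
    }

    static void add(T root, String word) {
        T cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new T());
        }
        cur.end = true;
    }

    // Character-by-character scan: true if any dictionary word occurs as a substring.
    static boolean scan(T root, String text) {
        for (int start = 0; start < text.length(); start++) {
            T cur = root;
            for (int i = start; i < text.length(); i++) {
                cur = cur.next.get(text.charAt(i));
                if (cur == null) break;
                if (cur.end) return true;
            }
        }
        return false;
    }

    // Segment first, then require a whole token to be a dictionary word.
    static boolean scanTokens(T root, String text) {
        for (String token : text.trim().split("\\s+")) {
            T cur = root;
            for (int i = 0; i < token.length() && cur != null; i++) {
                cur = cur.next.get(token.charAt(i));
            }
            if (cur != null && cur != root && cur.end) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        T root = new T();
        add(root, "hell");
        System.out.println(scan(root, "say hello"));        // true: "hell" is a substring of "hello"
        System.out.println(scanTokens(root, "say hello"));  // false: no token equals "hell"
        System.out.println(scanTokens(root, "go to hell")); // true
    }
}
```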

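The implementation here is roughly O(m*n), taking n as the length of the input and m as the number of children scanned at each node. As mentioned above, it can be brought down to O(n) with Map lookups, but even as it stands it is much faster than what I had before. For input suggestions, for example, it is far faster than querying the database on every keystroke. If the trie is not updated often (place names, say), it can be serialized to JSON and stored in Redis. That way other languages can use it too, and it is also somewhat faster than querying the database in one go and building the structure afterwards.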
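
For the input-suggestion case, a minimal sketch might look like the following. It is not the code from this post: `SuggestTrie`, `S` and `suggest` are hypothetical names, and a `TreeMap` is used only so that suggestions come back in a stable, sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical autocomplete sketch; the names are illustrative.
class SuggestTrie {
    static class S {
        Map<Character, S> next = new TreeMap<>(); // TreeMap keeps children sorted by character
        boolean end;
    }

    private final S root = new S();

    void add(String word) {
        S cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new S());
        }
        cur.end = true;
    }

    // Returns every stored word that starts with the given prefix.
    List<String> suggest(String prefix) {
        List<String> out = new ArrayList<>();
        S cur = root;
        for (char c : prefix.toCharArray()) {
            cur = cur.next.get(c);
            if (cur == null) return out; // no stored word has this prefix
        }
        collect(cur, new StringBuilder(prefix), out);
        return out;
    }

    // Depth-first walk below the prefix node, rebuilding each word along the way.
    private void collect(S node, StringBuilder path, List<String> out) {
        if (node.end) out.add(path.toString());
        for (Map.Entry<Character, S> e : node.next.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.deleteCharAt(path.length() - 1);
        }
    }

    public static void main(String[] args) {
        SuggestTrie t = new SuggestTrie();
        t.add("sat");
        t.add("saddy");
        t.add("sad");
        t.add("dog");
        System.out.println(t.suggest("sa")); // [sad, saddy, sat]
    }
}
```

The lookup walks only the nodes under the typed prefix, so no database round trip is needed per keystroke.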

If I got anything wrong, or if you have suggestions for further optimization, I would be glad to discuss.