Prefix Tree - A Fun Tree Data Structure

Date: 2019-12-05

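The previous post mentioned prefix trees while describing how Gin implements routing. This time we look a little more closely at how a prefix tree is implemented.
Using a programming problem as the running example, this post walks through a plain prefix tree and then an optimized form of it, the compressed prefix tree.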

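The MapSum Problem

LeetCode has a programming problem that reads as follows:

Implement two methods of a MapSum class, insert and sum.
For insert, you are given a (string, integer) key-value pair. The string is the key and the integer is the value. If the key already exists, the original pair is replaced by the new one.
For sum, you are given a string representing a prefix, and you must return the sum of the values of all keys that start with that prefix.
Example 1: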

Input: insert("apple", 3), Output: Null
Input: sum("ap"), Output: 3
Input: insert("app", 2), Output: Null
Input: sum("ap"), Output: 5

Prefix Tree

Based on the problem statement, we define the MapSum data structure as:

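type MapSum struct {
   char        byte             // character held by this node (unused at the root)
   children    map[byte]*MapSum // child nodes, keyed by their character
   val         int              // value of the key that ends at this node, 0 if none
}

/** Initialize your data structure here. */
func Constructor() MapSum {
   // The root holds no character; its map must exist before the first insert.
   return MapSum{children: make(map[byte]*MapSum)}
}

func (this *MapSum) Insert(key string, val int)  {
   // Body developed in the Insert section.
}

func (this *MapSum) Sum(prefix string) int {
   // Body developed in the Sum section.
   return 0
}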

Suppose the input is:

m := Constructor()
m.Insert("inter", 1)
m.Insert("inner", 2)
m.Insert("in", 2)
m.Insert("if", 4)
m.Insert("game", 8)

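Then the resulting prefix tree should look like this (stored values in parentheses):

(root)
 +- i
 |   +- n (2)
 |   |   +- t - e - r (1)
 |   |   +- n - e - r (2)
 |   +- f (4)
 +- g - a - m - e (8)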

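Properties of a prefix tree:

The root node holds no character; every node other than the root holds exactly one character.
Concatenating the characters on the path from the root to a node gives the string that node represents.
The children of any given node all hold different characters.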

The Insert Function

Signature of Insert:

func (this *MapSum) Insert(key string, val int)

We treat this as the parent node. When the key being inserted has length 1, the node for key must be a direct child of this.

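// This fragment and the next one form the body of a labeled loop: walk: for { ... }
if len(key) == 1 {
   c := key[0]
   // c already exists among the children:
   // overwrite its value
   if m, ok := this.children[c]; ok {
      m.val = val
      return
   }

   // no matching child:
   // create a new one
   this.children[c] = &MapSum{
      char: c,
      val: val,
      children: make(map[byte]*MapSum),
   }

   return
}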

When the key is longer than 1, look for the subtree that belongs to key[0]; if there is none, insert a new child node. Then set this = this.children[key[0]] and keep iterating:

c := key[0]
if m, ok := this.children[c]; ok {
   key = key[1:]
   this = m
   continue walk
}

// node not found: create an intermediate node that carries no value
this.children[c] = &MapSum{
   char: c,
   val: 0,
   children: make(map[byte]*MapSum),
}

this = this.children[c]
key = key[1:]
continue walk

The Sum Function

Signature of Sum:

func (this *MapSum) Sum(prefix string) int

The basic idea of Sum: first find the node that corresponds to prefix, then add up the val of every node in the subtree rooted at that node.

// first walk down to the node that matches the prefix,
// then sum its subtree
for prefix != "" {
   child, ok := this.children[prefix[0]]
   if !ok {
      // no key starts with prefix
      return 0
   }
   this = child
   prefix = prefix[1:]
}
return this.sumNode()

sumNode adds up the val of every node in a subtree, traversing the tree recursively:

s := this.val
for _, child := range this.children{
   s += child.sumNode()
}
return s
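The fragments above can be assembled into one small program. This is only a sketch: it condenses the labeled walk loop into an ordinary loop over the bytes of the key, and it assumes that inserting an empty key does nothing (the problem statement does not cover that case).

```go
package main

import "fmt"

// MapSum is one node of a plain prefix tree; the root's char is unused.
type MapSum struct {
	char     byte
	children map[byte]*MapSum
	val      int
}

// Constructor returns an empty root node.
func Constructor() MapSum {
	return MapSum{children: make(map[byte]*MapSum)}
}

// Insert walks one byte at a time, creating missing nodes on the way,
// and stores val on the node reached by the last byte of key.
func (this *MapSum) Insert(key string, val int) {
	node := this
	for i := 0; i < len(key); i++ {
		c := key[i]
		child, ok := node.children[c]
		if !ok {
			child = &MapSum{char: c, children: make(map[byte]*MapSum)}
			node.children[c] = child
		}
		node = child
	}
	// Assumption: an empty key is ignored instead of being stored on the root.
	if node != this {
		node.val = val
	}
}

// Sum finds the node for prefix and adds up the values in its subtree.
func (this *MapSum) Sum(prefix string) int {
	node := this
	for i := 0; i < len(prefix); i++ {
		child, ok := node.children[prefix[i]]
		if !ok {
			return 0 // no key starts with prefix
		}
		node = child
	}
	return node.sumNode()
}

// sumNode recursively adds up val over the subtree rooted at this node.
func (this *MapSum) sumNode() int {
	s := this.val
	for _, child := range this.children {
		s += child.sumNode()
	}
	return s
}

func main() {
	m := Constructor()
	m.Insert("apple", 3)
	fmt.Println(m.Sum("ap")) // 3
	m.Insert("app", 2)
	fmt.Println(m.Sum("ap")) // 5
}
```

The two printed values match Example 1 from the problem statement.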

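That is the standard way to build a prefix tree. When the strings share few nodes, every character still needs a node of its own, which wastes some space. A compressed prefix tree handles the same problems with fewer nodes.

Compressed Prefix Tree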

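For the example above, the compressed prefix tree looks like this (stored values in parentheses):

(root)
 +- i
 |   +- n (2)
 |   |   +- ter (1)
 |   |   +- ner (2)
 |   +- f (4)
 +- game (8)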

For this example the tree clearly has far fewer nodes (6 below the root instead of 13). The MapSum struct also changes slightly:

type MapSum struct {
   // the former char byte becomes key string
   key       string
   children   map[byte]*MapSum
   val       int
}

Insert

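The compressed prefix tree differs from the plain one in that nodes sometimes have to split. For example, when the tree already contains "inner" and "inter" and we then add "if", the existing "in" node must split into two nodes, "i" -> "n": "i" becomes the parent, "n" inherits the old children "ner" and "ter", and "f" is added as a second child of "i".

During Insert, we need n, the length of the longest common prefix of the key being inserted and the node's string this.key: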

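// min is a builtin since Go 1.21; on older versions, define a small helper
minLen := min(len(key), len(this.key))
// find n, the length of the longest common prefix
n := 0
for n < minLen && key[n] == this.key[n] {
   n++
}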
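To make the prefix computation concrete, here is a minimal standalone sketch. The helper name commonPrefixLen is hypothetical (the fragment above inlines this logic), and it avoids the min builtin so that it also compiles on Go versions before 1.21.

```go
package main

import "fmt"

// commonPrefixLen (hypothetical helper name) returns the length of the
// longest common prefix of a and b, compared byte by byte.
func commonPrefixLen(a, b string) int {
	minLen := len(a)
	if len(b) < minLen {
		minLen = len(b)
	}
	n := 0
	for n < minLen && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	fmt.Println(commonPrefixLen("if", "in"))       // 1
	fmt.Println(commonPrefixLen("inner", "inter")) // 2
	fmt.Println(commonPrefixLen("in", "inter"))    // 2
	fmt.Println(commonPrefixLen("game", "in"))     // 0
}
```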

Then compare n with len(this.key): if n is shorter, this node must split; otherwise no split is needed.

Splitting logic for this:

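// the longest common prefix is shorter than this.key (n < len(this.key)),
// so this node must split
if n < len(this.key) {
   // the new child takes over the tail of the key, the value, and the old children
   child := &MapSum{
      val: this.val,
      key: this.key[n:],
      children: this.children,
   }

   // the current node keeps only the shared prefix
   this.key = this.key[:n]
   this.val = 0
   this.children = make(map[byte]*MapSum)
   this.children[child.key[0]] = child
}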
Next, compare n with len(key). If n == len(key), key corresponds to this node, so update val directly:

if n == len(key) {
   this.val = val
   return
}

When n < len(key), keep iterating into the matching subtree if there is one; otherwise insert a child node directly:

key = key[n:]
c := key[0]

// if the first byte of the remaining key matches a child,
// keep walking down the tree
if a, ok := this.children[c]; ok {
   this = a
   continue walk
}

// otherwise, create a new node that holds the whole remaining key
this.children[c] = &MapSum{
   key: key,
   val: val,
   children: make(map[byte]*MapSum),
}
return

That is how the compressed prefix tree works.
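Putting the Insert fragments together gives the sketch below. The Sum method is not shown in the text above, so its handling of a prefix that ends in the middle of a node's key is an assumption about how one might write it: such a prefix still matches the whole subtree under that node.

```go
package main

import (
	"fmt"
	"strings"
)

type MapSum struct {
	key      string
	children map[byte]*MapSum
	val      int
}

// Constructor returns the root, whose key is the empty string.
func Constructor() MapSum {
	return MapSum{children: make(map[byte]*MapSum)}
}

func (this *MapSum) Insert(key string, val int) {
walk:
	for {
		// n: length of the longest common prefix of key and this.key
		n := 0
		for n < len(key) && n < len(this.key) && key[n] == this.key[n] {
			n++
		}
		if n < len(this.key) {
			// split: the tail of this.key moves into a new child
			child := &MapSum{val: this.val, key: this.key[n:], children: this.children}
			this.key = this.key[:n]
			this.val = 0
			this.children = map[byte]*MapSum{child.key[0]: child}
		}
		if n == len(key) {
			this.val = val // key ends exactly at this node
			return
		}
		key = key[n:]
		if next, ok := this.children[key[0]]; ok {
			this = next
			continue walk
		}
		this.children[key[0]] = &MapSum{key: key, val: val, children: make(map[byte]*MapSum)}
		return
	}
}

// Sum is an assumed counterpart to Insert for the compressed tree.
func (this *MapSum) Sum(prefix string) int {
	node := this
	for {
		if len(prefix) <= len(node.key) {
			// the prefix ends inside (or exactly at) this node's key
			if strings.HasPrefix(node.key, prefix) {
				return node.sumNode()
			}
			return 0
		}
		if !strings.HasPrefix(prefix, node.key) {
			return 0
		}
		prefix = prefix[len(node.key):]
		next, ok := node.children[prefix[0]]
		if !ok {
			return 0
		}
		node = next
	}
}

func (this *MapSum) sumNode() int {
	s := this.val
	for _, child := range this.children {
		s += child.sumNode()
	}
	return s
}

func main() {
	m := Constructor()
	m.Insert("inter", 1)
	m.Insert("inner", 2)
	m.Insert("in", 2)
	m.Insert("if", 4)
	m.Insert("game", 8)
	fmt.Println(m.Sum("in"), m.Sum("i"), m.Sum("ga"), m.Sum("x")) // 5 9 8 0
}
```

Note that Sum("ga") stops partway through the "game" node and still counts it, which is the case a plain prefix tree never has to think about.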

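Optimization

The children field of MapSum above is a map, but a map generally takes up a fair amount of memory. Instead, the child nodes can be kept in a node slice, children, plus a slice of their first bytes, indices, where indices and children correspond one to one.

The struct then looks like this: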

type MapSum struct {
   key        string
   indices    []byte
   children   []*MapSum
   val        int
}

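To find a subtree, compare key[n:][0] (the first byte of the remaining key, that is, key[n]) with the bytes in indices. If a match is found, continue into the child at the same index; if not, insert a new subtree.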
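A minimal sketch of that lookup follows. The helper names findChild and addChild are hypothetical, and the linear scan over indices assumes each node has only a handful of children.

```go
package main

import (
	"bytes"
	"fmt"
)

type MapSum struct {
	key      string
	indices  []byte
	children []*MapSum
	val      int
}

// findChild (hypothetical helper) returns the child whose key starts
// with c, or nil if there is none.
func (this *MapSum) findChild(c byte) *MapSum {
	if i := bytes.IndexByte(this.indices, c); i >= 0 {
		return this.children[i]
	}
	return nil
}

// addChild (hypothetical helper) appends child and its first byte at
// the same position, so indices[i] == children[i].key[0] always holds.
func (this *MapSum) addChild(child *MapSum) {
	this.indices = append(this.indices, child.key[0])
	this.children = append(this.children, child)
}

func main() {
	root := &MapSum{}
	root.addChild(&MapSum{key: "i"})
	root.addChild(&MapSum{key: "game", val: 8})

	fmt.Println(string(root.indices))       // ig
	fmt.Println(root.findChild('g').val)    // 8
	fmt.Println(root.findChild('x') == nil) // true
}
```

When a node splits, the same two slices have to be moved to the new child together, just as the children map was moved in the map-based version.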

That's all.

Y_xx