Route Lookup with a Radix Tree

Date: 2019-11-16


What Is a Radix Tree

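In computer science, a radix tree (also called a Patricia trie/tree, crit-bit tree, or compressed prefix tree) is a more space-efficient trie (prefix tree). Any node that is the only child of its parent is merged with that parent.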

The Go web frameworks echo and gin both use a radix tree as their route lookup algorithm. This article walks through gin's implementation.

In gin's router, each HTTP method (GET, PUT, POST, ...) has its own radix tree:

func (engine *Engine) addRoute(method, path string, handlers HandlersChain) {
    // ...
    
    // Get the tree for this method; create it if it does not exist yet
    root := engine.trees.get(method)
    if root == nil {
        // Create the radix tree; it starts with only a root node
        root = new(node)
        engine.trees = append(engine.trees, methodTree{method: method, root: root})
    }
    root.addRoute(path, handlers)
}
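
The per-method lookup used by addRoute can be pictured as a linear scan over a small slice. The following is a minimal sketch under that assumption: the trimmed node type and the rootFor helper are hypothetical and exist only for illustration, they are not gin's API.

```go
package main

import "fmt"

// node is a stand-in for gin's radix tree node; only the path is kept here.
type node struct {
	path string
}

// methodTree pairs an HTTP method with the root of its tree.
type methodTree struct {
	method string
	root   *node
}

type methodTrees []methodTree

// get returns the root registered for method, or nil if there is none.
// A linear scan is cheap because there are only a handful of HTTP methods.
func (trees methodTrees) get(method string) *node {
	for _, tree := range trees {
		if tree.method == method {
			return tree.root
		}
	}
	return nil
}

// rootFor (hypothetical helper) returns the root for method and creates
// an empty tree on first use.
func rootFor(trees *methodTrees, method string) *node {
	if root := trees.get(method); root != nil {
		return root
	}
	root := new(node)
	*trees = append(*trees, methodTree{method: method, root: root})
	return root
}

func main() {
	var trees methodTrees
	get1 := rootFor(&trees, "GET")
	get2 := rootFor(&trees, "GET")
	post := rootFor(&trees, "POST")
	fmt.Println(get1 == get2, get1 == post, len(trees)) // true false 2
}
```

Keeping one tree per method means a GET route and a POST route with the same path never interfere with each other during insertion or lookup.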

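A radix tree can be thought of as a compact prefix tree: nodes that share a common prefix also share a parent. Here is what the routing tree for the GET method can look like: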

Priority   Path             Handle
9          /                *<1>
3          ├s               nil
2          |├earch/         *<2>
1          |└upport/        *<3>
2          ├blog/           *<4>
1          |    └:post      nil
1          |         └/     *<5>
2          ├about-us/       *<6>
1          |        └team/  *<7>
1          └contact/        *<8>

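*<num> is a pointer to the handler function. Walking from the root down to a leaf yields a complete route, so the tree in the example implements the following routes: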

GET("/", func1)
GET("/search/", func2)
GET("/support/", func3)
GET("/blog/", func4)
GET("/blog/:post/", func5)
GET("/about-us/", func6)
GET("/about-us/team/", func7)
GET("/contact/", func8)

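:post is a placeholder (a parameter) for the actual post name. This shows one advantage of a radix tree over a hash map: the tree structure allows dynamic parts (parameters) in a path, because we match against route patterns rather than comparing hash values.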
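
To make the contrast with a hash map concrete, here is a small sketch. The matchPattern helper is hypothetical and works segment by segment, which is simpler than gin's byte-wise radix tree; it only illustrates why an exact-key lookup cannot serve a parameterized route.

```go
package main

import (
	"fmt"
	"strings"
)

// matchPattern (hypothetical helper) reports whether path fits pattern.
// A pattern segment starting with ':' matches any single non-empty segment,
// and the captured value is returned under the parameter name.
func matchPattern(pattern, path string) (map[string]string, bool) {
	ps := strings.Split(pattern, "/")
	xs := strings.Split(path, "/")
	if len(ps) != len(xs) {
		return nil, false
	}
	params := map[string]string{}
	for i, seg := range ps {
		if strings.HasPrefix(seg, ":") {
			if xs[i] == "" {
				return nil, false
			}
			params[seg[1:]] = xs[i]
			continue
		}
		if seg != xs[i] {
			return nil, false
		}
	}
	return params, true
}

func main() {
	// Exact-match table: the key must equal the request path byte for byte.
	table := map[string]string{"/blog/:post/": "func5"}
	_, found := table["/blog/hello-world/"]
	fmt.Println(found) // false

	// Pattern matching accepts the dynamic segment and captures its value.
	params, ok := matchPattern("/blog/:post/", "/blog/hello-world/")
	fmt.Println(ok, params["post"]) // true hello-world
}
```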

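To scale better, the child nodes on each level are ordered by priority, where priority is the number of handlers registered in a node's sub-nodes (children, grandchildren, and so on). This has two benefits: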

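Nodes that are part of the most routing paths are evaluated first, so as many routes as possible can be reached quickly.
It acts as a kind of cost compensation. The longest path takes the most time to resolve, but if its nodes are evaluated first (every child lookup hits on the first try), it does not necessarily take longer than a shorter path.

The following diagram shows the order in which nodes are evaluated (each - can be read as one node): top to bottom, left to right.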

├------------
├---------
├-----
├----
├--
├--
└-
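
The ordering is maintained incrementally at insertion time. The sketch below is modeled on the incrementChildPrio call that appears in addRoute, using a trimmed-down node type that is an assumption made for this example: the child's priority is bumped and the child is moved toward the front while it outranks its left neighbor, with indices reordered in step.

```go
package main

import "fmt"

// node is reduced to the fields that matter for ordering.
type node struct {
	path     string
	indices  string
	children []*node
	priority uint32
}

// incrementChildPrio bumps the priority of the child at pos, moves it toward
// the front while its priority is strictly higher than its left neighbor's,
// and returns the child's new position.
func (n *node) incrementChildPrio(pos int) int {
	cs := n.children
	cs[pos].priority++
	prio := cs[pos].priority

	newPos := pos
	for ; newPos > 0 && cs[newPos-1].priority < prio; newPos-- {
		cs[newPos-1], cs[newPos] = cs[newPos], cs[newPos-1]
	}
	// Rebuild indices so that indices[i] still belongs to children[i].
	if newPos != pos {
		n.indices = n.indices[:newPos] +
			n.indices[pos:pos+1] +
			n.indices[newPos:pos] + n.indices[pos+1:]
	}
	return newPos
}

func main() {
	n := &node{
		path:    "/",
		indices: "sbc",
		children: []*node{
			{path: "s", priority: 2},
			{path: "blog/", priority: 2},
			{path: "contact/", priority: 1},
		},
	}
	n.incrementChildPrio(2)        // priority becomes 2: a tie, so no move
	pos := n.incrementChildPrio(2) // priority becomes 3: moves to the front
	fmt.Println(pos, n.indices, n.children[0].path) // 0 csb contact/
}
```

Because only one child changes per insertion, a single pass of neighbor swaps keeps the slice sorted without a full re-sort.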

Node Data Structure

The node struct is defined as follows:

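type node struct {
    // Path fragment held by this node, e.g. s, earch, and upport above
    path      string
    // Whether this node's child is a wildcard (param or catchAll) node,
    // e.g. true for the node directly above :post
    wildChild bool
    // Node type: static, root, param, or catchAll
    // static: plain node, e.g. the s and earch nodes above
    // root: root of the tree
    // catchAll: node matched with *
    // param: parameter node
    nType     nodeType
    // Maximum number of parameters on a path through this node
    maxParams uint8
    // Parallel to children: the first byte of each child's path
    // e.g. for search and support, the s node has indices "eu",
    // meaning two branches whose first letters are e and u
    indices   string
    // Child nodes
    children  []*node
    // Handler functions
    handlers  HandlersChain
    // Priority: number of handlers registered in sub-nodes
    priority  uint32
}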
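
The role of indices is easiest to see in isolation. This is a minimal sketch with a reduced node type; childFor is a hypothetical helper name, but the loop has the same shape as the child selection loops in addRoute and getValue.

```go
package main

import "fmt"

// node keeps only the fields needed to pick a branch.
type node struct {
	path     string
	indices  string
	children []*node
}

// childFor returns the child whose path starts with byte c, or nil.
// indices[i] is the first byte of children[i].path, so one byte comparison
// per child is enough to choose the branch.
func (n *node) childFor(c byte) *node {
	for i := 0; i < len(n.indices); i++ {
		if n.indices[i] == c {
			return n.children[i]
		}
	}
	return nil
}

func main() {
	s := &node{
		path:     "s",
		indices:  "eu",
		children: []*node{{path: "earch/"}, {path: "upport/"}},
	}
	fmt.Println(s.childFor('u').path)   // upport/
	fmt.Println(s.childFor('x') == nil) // true
}
```

Storing the first bytes in one string avoids dereferencing every child pointer just to read its first character.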

Adding a Route

func (n *node) addRoute(path string, handlers HandlersChain) {
    fullPath := path
    n.priority++
    numParams := countParams(path)
    // non-empty tree
    if len(n.path) > 0 || len(n.children) > 0 {
    walk:
        for {
            // Update maxParams of the current node
            if numParams > n.maxParams {
                n.maxParams = numParams
            }
            // Find the longest common prefix.
            // This also implies that the common prefix contains no ':' or '*'
            // since the existing key can't contain those chars.
            i := 0
            max := min(len(path), len(n.path))
            for i < max && path[i] == n.path[i] {
                i++
            }
            // Split edge
            // Split, e.g. the existing path is search and support arrives: s is the part they share,
            // so s is pulled out as the parent node, with earch and upport added as its children
            if i < len(n.path) {
                child := node{
                    path:      n.path[i:],  // the unmatched remainder becomes the child
                    wildChild: n.wildChild,
                    indices:   n.indices,
                    children:  n.children,
                    handlers:  n.handlers,
                    priority:  n.priority - 1,  // demoted to a child node, so priority drops by 1
                }
                // Update maxParams (max of all children)
                for i := range child.children {
                    if child.children[i].maxParams > child.maxParams {
                        child.maxParams = child.children[i].maxParams
                    }
                }
                
                // The node that was just split off becomes the only child of the current node
                n.children = []*node{&child}
                // []byte for proper unicode char conversion, see #65
                n.indices = string([]byte{n.path[i]})
                n.path = path[:i]
                n.handlers = nil
                n.wildChild = false
            }
            // Make new node a child of this node
            // Attach the remainder of the new path below this (possibly new) parent
            if i < len(path) {
                path = path[i:]
                // If this node has a wildcard child (: or *)
                if n.wildChild {
                    n = n.children[0]
                    n.priority++
                    // Update maxParams of the child node
                    if numParams > n.maxParams {
                        n.maxParams = numParams
                    }
                    numParams--
                    // Check if the wildcard matches
                    // e.g. /blog/:pp vs /blog/:ppp: a longer wildcard name has to be detected
                    if len(path) >= len(n.path) && n.path == path[:len(n.path)] {
                        // check for longer wildcard, e.g. :name and :names
                        if len(n.path) >= len(path) || path[len(n.path)] == '/' {
                            continue walk
                        }
                    }
                    panic("path segment '" + path +
                        "' conflicts with existing wildcard '" + n.path +
                        "' in path '" + fullPath + "'")
                }
                // First byte, compared against indices
                c := path[0]
                // slash after param
                if n.nType == param && c == '/' && len(n.children) == 1 {
                    n = n.children[0]
                    n.priority++
                    continue walk
                }
                // Check if a child with the next path byte exists
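                // Check whether a child shares a prefix with the remaining path; comparing
                // the first byte of each child's path (i.e. indices) is enough
                // e.g. the children of s are earch and upport, so indices is "eu"
                // If the new route is super, it shares the u with upport, so the walk
                // continues into the upport node and splits it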
                for i := 0; i < len(n.indices); i++ {
                    if c == n.indices[i] {
                        i = n.incrementChildPrio(i)
                        n = n.children[i]
                        continue walk
                    }
                }
                // Otherwise insert it
                if c != ':' && c != '*' {
                    // []byte for proper unicode char conversion, see #65
                    // Record the first byte in indices
                    n.indices += string([]byte{c})
                    child := &node{
                        maxParams: numParams,
                    }
                    // Append the new child
                    n.children = append(n.children, child)
                    n.incrementChildPrio(len(n.indices) - 1)
                    n = child
                }
                n.insertChild(numParams, path, fullPath, handlers)
                return
            } else if i == len(path) { // Make node a (in-path) leaf
                // Same path: panic if handlers are already registered, otherwise assign them
                if n.handlers != nil {
                    panic("handlers are already registered for path '" + fullPath + "'")
                }
                n.handlers = handlers
            }
            return
        }
    } else { // Empty tree: insert the node and mark it as root
        n.insertChild(numParams, path, fullPath, handlers)
        n.nType = root
    }
}

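The main job of this function is to find where the new node belongs. If the new path shares a prefix with an existing node, that node is split first and the new node is then inserted. The insertion itself is done by insertChild, covered next.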
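
The split step can be isolated into a small sketch using the search/support example. It assumes static paths only and a reduced node type; longestCommonPrefix and splitAt are hypothetical helper names, and handler is a plain string standing in for HandlersChain.

```go
package main

import "fmt"

type node struct {
	path     string
	indices  string
	children []*node
	handler  string // stand-in for gin's HandlersChain
}

// longestCommonPrefix returns the length of the shared prefix of a and b.
func longestCommonPrefix(a, b string) int {
	i := 0
	for i < len(a) && i < len(b) && a[i] == b[i] {
		i++
	}
	return i
}

// splitAt turns n into a parent holding n.path[:i] and pushes the rest of n
// (path suffix, children, handler) down into a single new child.
func (n *node) splitAt(i int) {
	child := &node{
		path:     n.path[i:],
		indices:  n.indices,
		children: n.children,
		handler:  n.handler,
	}
	n.children = []*node{child}
	n.indices = string([]byte{n.path[i]})
	n.path = n.path[:i]
	n.handler = ""
}

func main() {
	n := &node{path: "search", handler: "func2"}
	newPath := "support"

	i := longestCommonPrefix(newPath, n.path)
	if i < len(n.path) {
		n.splitAt(i)
	}
	// Attach the unmatched remainder of the new path as a second child.
	rest := newPath[i:]
	n.indices += string([]byte{rest[0]})
	n.children = append(n.children, &node{path: rest, handler: "func3"})

	fmt.Println(n.path, n.indices, n.children[0].path, n.children[1].path)
	// s eu earch upport
}
```

Note that the parent loses its handler during the split: after the split it represents only the shared prefix s, for which no route was registered.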

Inserting a Child Node

// @1: number of parameters
// @2: path
// @3: full path
// @4: handler functions
func (n *node) insertChild(numParams uint8, path string, fullPath string, handlers HandlersChain) {
    var offset int // already handled bytes of the path
    // find prefix until first wildcard (beginning with ':' or '*')
    for i, max := 0, len(path); numParams > 0; i++ {
        c := path[i]
        if c != ':' && c != '*' {
            continue
        }
        // find wildcard end (either '/' or path end)
        end := i + 1
        for end < max && path[end] != '/' {
            switch path[end] {
            // the wildcard name must not contain ':' and '*'
            case ':', '*':
                panic("only one wildcard per path segment is allowed, has: '" +
                    path[i:] + "' in path '" + fullPath + "'")
            default:
                end++
            }
        }
        // check if this node has existing children which would be
        // unreachable if we insert the wildcard here
        if len(n.children) > 0 {
            panic("wildcard route '" + path[i:end] +
                "' conflicts with existing children in path '" + fullPath + "'")
        }
        // check if the wildcard has a name
        if end-i < 2 {
            panic("wildcards must be named with a non-empty name in path '" + fullPath + "'")
        }
        if c == ':' { // param
            // split path at the beginning of the wildcard
            if i > 0 {
                n.path = path[offset:i]
                offset = i
            }
            child := &node{
                nType:     param,
                maxParams: numParams,
            }
            n.children = []*node{child}
            n.wildChild = true
            n = child
            n.priority++
            numParams--
            // if the path doesn't end with the wildcard, then there
            // will be another non-wildcard subpath starting with '/'
            if end < max {
                n.path = path[offset:end]
                offset = end
                
                child := &node{
                    maxParams: numParams,
                    priority:  1,
                }
                n.children = []*node{child}
                // The next loop iteration continues from this new child node
                n = child
            }
        } else { // catchAll
            if end != max || numParams > 1 {
                panic("catch-all routes are only allowed at the end of the path in path '" + fullPath + "'")
            }
            if len(n.path) > 0 && n.path[len(n.path)-1] == '/' {
                panic("catch-all conflicts with existing handle for the path segment root in path '" + fullPath + "'")
            }
            // currently fixed width 1 for '/'
            i--
            if path[i] != '/' {
                panic("no / before catch-all in path '" + fullPath + "'")
            }
            n.path = path[offset:i]
            // first node: catchAll node with empty path
            child := &node{
                wildChild: true,
                nType:     catchAll,
                maxParams: 1,
            }
            n.children = []*node{child}
            n.indices = string(path[i])
            n = child
            n.priority++
            // second node: node holding the variable
            child = &node{
                path:      path[i:],
                nType:     catchAll,
                maxParams: 1,
                handlers:  handlers,
                priority:  1,
            }
            n.children = []*node{child}
            return
        }
    }
    // insert remaining path part and handle to the leaf
    n.path = path[offset:]
    n.handlers = handlers
}

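insertChild splits the path at its wildcards: the static prefix, each wildcard segment, and whatever follows it are stored as separate nodes, which together form the tree. Note the difference between : and * in parameter matching: : matches a single path segment, while * matches everything that remains of the path.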
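
Both points can be shown with two small helpers. This is a sketch only: findWildcard and capture are hypothetical names that do not appear in the code above, and they skip the validation and panics that insertChild performs.

```go
package main

import (
	"fmt"
	"strings"
)

// findWildcard returns the first wildcard segment in path and its start
// index, or ("", -1) if the path is fully static.
func findWildcard(path string) (string, int) {
	start := strings.IndexAny(path, ":*")
	if start < 0 {
		return "", -1
	}
	end := strings.IndexByte(path[start:], '/')
	if end < 0 {
		return path[start:], start
	}
	return path[start : start+end], start
}

// capture returns the value a wildcard would take from rest, the part of the
// request path that begins where the wildcard begins.
// ':' stops at the next '/', '*' consumes everything that is left.
func capture(kind byte, rest string) string {
	if kind == ':' {
		if end := strings.IndexByte(rest, '/'); end >= 0 {
			return rest[:end]
		}
	}
	return rest
}

func main() {
	// Where a registered path would be cut: static prefix, then the wildcard.
	w, i := findWildcard("/blog/:post/")
	fmt.Println(w, i) // :post 6

	rest := "2019/11/radix-tree"
	fmt.Println(capture(':', rest)) // 2019
	fmt.Println(capture('*', rest)) // 2019/11/radix-tree
}
```

Since * swallows the rest of the path, nothing can follow it, which is why insertChild only allows a catch-all at the end of a route.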

Path Lookup

Lookup compares the request path against each child's path and follows the longest match.

// Returns the handle registered with the given path (key). The values of
// wildcards are saved to a map.
// If no handle can be found, a TSR (trailing slash redirect) recommendation is
// made if a handle exists with an extra (without the) trailing slash for the
// given path.
func (n *node) getValue(path string, po Params, unescape bool) (handlers HandlersChain, p Params, tsr bool) {
    p = po
walk: // Outer loop for walking the tree
    for {
        // The end of path has not been reached yet
        if len(path) > len(n.path) {
            // The leading part must match this node's path
            if path[:len(n.path)] == n.path {
                path = path[len(n.path):]
                // If this node does not have a wildcard (param or catchAll)
                // child, we can just look up the next child node and continue
                // to walk down the tree
                if !n.wildChild {
                    c := path[0]
                    for i := 0; i < len(n.indices); i++ {
                        if c == n.indices[i] {
                            n = n.children[i]
                            continue walk
                        }
                    }
                    // Nothing found.
                    // We can recommend to redirect to the same URL without a
                    // trailing slash if a leaf exists for that path.
                    tsr = (path == "/" && n.handlers != nil)
                    return
                }
                // handle wildcard child
                n = n.children[0]
                switch n.nType {
                case param:
                    // find param end (either '/' or path end)
                    end := 0
                    for end < len(path) && path[end] != '/' {
                        end++
                    }
                    // save param value
                    if cap(p) < int(n.maxParams) {
                        p = make(Params, 0, n.maxParams)
                    }
                    i := len(p)
                    p = p[:i+1] // expand slice within preallocated capacity
                    p[i].Key = n.path[1:]
                    val := path[:end]
                    if unescape {
                        var err error
                        if p[i].Value, err = url.QueryUnescape(val); err != nil {
                            p[i].Value = val // fallback, in case of error
                        }
                    } else {
                        p[i].Value = val
                    }
                    // we need to go deeper!
                    if end < len(path) {
                        if len(n.children) > 0 {
                            path = path[end:]
                            n = n.children[0]
                            continue walk
                        }
                        // ... but we can't
                        tsr = (len(path) == end+1)
                        return
                    }
                    if handlers = n.handlers; handlers != nil {
                        return
                    }
                    if len(n.children) == 1 {
                        // No handle found. Check if a handle for this path + a
                        // trailing slash exists for TSR recommendation
                        n = n.children[0]
                        tsr = (n.path == "/" && n.handlers != nil)
                    }
                    return
                case catchAll:
                    // save param value
                    if cap(p) < int(n.maxParams) {
                        p = make(Params, 0, n.maxParams)
                    }
                    i := len(p)
                    p = p[:i+1] // expand slice within preallocated capacity
                    p[i].Key = n.path[2:]
                    if unescape {
                        var err error
                        if p[i].Value, err = url.QueryUnescape(path); err != nil {
                            p[i].Value = path // fallback, in case of error
                        }
                    } else {
                        p[i].Value = path
                    }
                    handlers = n.handlers
                    return
                default:
                    panic("invalid node type")
                }
            }
        } else if path == n.path {
            // We should have reached the node containing the handle.
            // Check if this node has a handle registered.
            if handlers = n.handlers; handlers != nil {
                return
            }
            if path == "/" && n.wildChild && n.nType != root {
                tsr = true
                return
            }
            // No handle found. Check if a handle for this path + a
            // trailing slash exists for trailing slash recommendation
            for i := 0; i < len(n.indices); i++ {
                if n.indices[i] == '/' {
                    n = n.children[i]
                    tsr = (len(n.path) == 1 && n.handlers != nil) ||
                        (n.nType == catchAll && n.children[0].handlers != nil)
                    return
                }
            }
            return
        }
        // Nothing found. We can recommend to redirect to the same URL with an
        // extra trailing slash if a leaf exists for that path
        tsr = (path == "/") ||
            (len(n.path) == len(path)+1 && n.path[len(path)] == '/' &&
                path == n.path[:len(n.path)-1] && n.handlers != nil)
        return
    }
}
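
Stripped of catch-all handling, unescaping, and trailing-slash redirects, the walk reduces to a short loop. The sketch below is a simplification, not gin's getValue: it runs over a hand-built tree, the lookup, next, and buildTree names are hypothetical, handlers are plain strings, and the param node is given an indices entry for its child so that one selection routine covers every case.

```go
package main

import (
	"fmt"
	"strings"
)

type node struct {
	path      string
	indices   string
	wildChild bool
	children  []*node
	handler   string // stand-in for gin's HandlersChain
}

// next picks the child to descend into for the next path byte c.
func (n *node) next(c byte) *node {
	if n.wildChild {
		return n.children[0]
	}
	for i := 0; i < len(n.indices); i++ {
		if n.indices[i] == c {
			return n.children[i]
		}
	}
	return nil
}

// lookup walks the tree and returns the handler name plus captured params.
// An empty handler means no route matched.
func (n *node) lookup(path string) (string, map[string]string) {
	params := map[string]string{}
	for {
		if strings.HasPrefix(n.path, ":") {
			// Param node: consume up to the next '/' or the end of path.
			end := strings.IndexByte(path, '/')
			if end < 0 {
				end = len(path)
			}
			params[n.path[1:]] = path[:end]
			path = path[end:]
		} else {
			// Static node: its whole path must be a prefix of what is left.
			if !strings.HasPrefix(path, n.path) {
				return "", nil
			}
			path = path[len(n.path):]
		}
		if path == "" {
			return n.handler, params
		}
		child := n.next(path[0])
		if child == nil {
			return "", nil
		}
		n = child
	}
}

// buildTree assembles part of the example tree by hand.
func buildTree() *node {
	slash := &node{path: "/", handler: "func5"}
	post := &node{path: ":post", indices: "/", children: []*node{slash}}
	blog := &node{path: "blog/", handler: "func4", wildChild: true, children: []*node{post}}
	contact := &node{path: "contact/", handler: "func8"}
	return &node{path: "/", handler: "func1", indices: "bc", children: []*node{blog, contact}}
}

func main() {
	root := buildTree()
	h, p := root.lookup("/blog/hello/")
	fmt.Println(h, p["post"]) // func5 hello
	h, _ = root.lookup("/contact/")
	fmt.Println(h) // func8
}
```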