An Introduction to the Nested Set Model

Date: 2019-11-07


Original link: http://www.pilishen.com/posts...; you are welcome to join our PHP & Laravel study group: 109256050
This article is translated from the Wikipedia article "Nested set model".

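The nested set model is a technique for representing nested sets in a relational database. "Nested sets" here usually means trees or hierarchies. The term appears to have been introduced by Joe Celko; other authors describe the same technique under different names.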

Motivation

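The technique answers a problem: standard relational algebra and relational calculus, and the SQL operations based on them, cannot directly express all the desirable operations on hierarchies. A hierarchy can be expressed in terms of a parent-child relation (Celko calls this the adjacency list model), but if the hierarchy can have arbitrary depth, that model cannot express operations such as comparing the levels of two elements, or determining whether one element lies somewhere in the subhierarchy of another. When the hierarchy has a fixed or bounded depth, these operations are possible but expensive, because they require one relational join per level. This is often known as the bill of materials problem.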

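Hierarchies can be expressed easily by switching to a graph database. Alternatively, several solutions exist within the relational model and are available in some relational database systems: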

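Support for a dedicated hierarchy data type, such as SQL's hierarchical query facility.
Extending the relational language with hierarchy operations, as in nested relational algebra.
Extending the relational language with transitive closure, such as SQL's CONNECT statement; this allows a parent-child relation to be used, but execution remains expensive.
Expressing hierarchical queries in a language that supports iteration and wraps the relational operations, such as PL/SQL, T-SQL, or a general-purpose programming language.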

When these solutions are unavailable or not feasible, another approach must be taken.

Technique

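The nested set model numbers the nodes according to a tree traversal. The traversal visits each node twice and assigns numbers in visiting order, one at each visit. This leaves two numbers per node, which are stored as two attributes of the node. Querying becomes inexpensive: hierarchy membership can be tested by comparing these numbers. Updating, however, requires renumbering nodes and is therefore expensive. Refinements that use rational numbers instead of integers can make updates faster, although they are considerably more complicated.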
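The numbering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `number_nested_set` and `is_descendant` and the four-node tree are hypothetical and do not come from the original article.

```python
from typing import Dict, List, Tuple

Bounds = Dict[str, Tuple[int, int]]


def number_nested_set(children: Dict[str, List[str]], root: str) -> Bounds:
    """Assign (left, right) numbers with a depth-first traversal."""
    bounds: Bounds = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        counter += 1
        left = counter  # first visit: on the way down
        for child in children.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (left, counter)  # second visit: on the way back up

    visit(root)
    return bounds


def is_descendant(bounds: Bounds, node: str, ancestor: str) -> bool:
    """A node lies below an ancestor when its interval is strictly inside."""
    n_left, n_right = bounds[node]
    a_left, a_right = bounds[ancestor]
    return a_left < n_left and n_right < a_right


# Hypothetical tree: A has children B and C; B has child D.
tree = {"A": ["B", "C"], "B": ["D"]}
bounds = number_nested_set(tree, "A")
```

With four nodes the root ends up with the interval (1, 8), and the membership test needs only two integer comparisons, which is why reads are cheap in this model.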

Example

In a clothing store catalog, clothing may be categorized according to a hierarchy:

[Figure: NestedSetModel.svg](//en.wikipedia.org/wiki/File:NestedSetModel.svg)

[Figure: Clothing-hierarchy-traversal-2.svg](//en.wikipedia.org/wiki/File:Clothing-hierarchy-traversal-2.svg)

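The Clothing category, at the top of the hierarchy, contains all subcategories. It is therefore given left and right values of 1 and 22; the latter value, 22, is twice the total number of nodes represented. The next level contains the two subcategories Men's and Women's, each of which contains further levels that must be accounted for. Every node is assigned its left and right values according to the sublevels it contains, as shown above.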
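To make the example concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Several things are assumptions: the text names only Clothing (1, 22), Men's and Women's, so the remaining categories and their numbers are taken from the Wikipedia example this article translates; the table name `tree` and the helper functions are hypothetical; and the columns are called `lft` and `rgt` because LEFT and RIGHT are reserved words in many SQL dialects.

```python
import sqlite3

# Assumed data: only Clothing (1, 22), Men's and Women's appear in the text.
ROWS = [
    ("Clothing", 1, 22),
    ("Men's", 2, 9),
    ("Suits", 3, 8),
    ("Slacks", 4, 5),
    ("Jackets", 6, 7),
    ("Women's", 10, 21),
    ("Dresses", 11, 16),
    ("Evening Gowns", 12, 13),
    ("Sun Dresses", 14, 15),
    ("Skirts", 17, 18),
    ("Blouses", 19, 20),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (node TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", ROWS)


def descendants(name):
    """All nodes whose interval lies strictly inside the given node's."""
    sql = """
        SELECT child.node
        FROM tree AS child, tree AS parent
        WHERE parent.node = ?
          AND child.lft > parent.lft AND child.rgt < parent.rgt
        ORDER BY child.lft
    """
    return [row[0] for row in conn.execute(sql, (name,))]


def ancestors(name):
    """All nodes whose interval strictly encloses the given node's."""
    sql = """
        SELECT parent.node
        FROM tree AS child, tree AS parent
        WHERE child.node = ?
          AND parent.lft < child.lft AND parent.rgt > child.rgt
        ORDER BY parent.lft
    """
    return [row[0] for row in conn.execute(sql, (name,))]


def descendant_count(name):
    """Each descendant consumes two numbers inside the interval."""
    lft, rgt = conn.execute(
        "SELECT lft, rgt FROM tree WHERE node = ?", (name,)
    ).fetchone()
    return (rgt - lft - 1) // 2
```

Note that `descendant_count` reads a single row and never touches the rest of the table, which is the kind of query this model is designed for.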

Performance

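Queries using nested sets can be expected to be faster than queries that use a stored procedure to traverse an adjacency list, and so are the faster option for databases that lack native recursive query constructs, such as MySQL before version 8.0. However, recursive SQL queries can be expected to perform comparably for "find immediate descendants" queries and much faster for other depth-search queries, and so are the faster option for databases that provide them, such as PostgreSQL,[[5]](//en.wikipedia.org/wiki/Nested_set_model#cite_note-5) Oracle,[[6]](//en.wikipedia.org/wiki/Nested_set_model#cite_note-6) and Microsoft SQL Server.[[7]](//en.wikipedia.org/wiki/Nested_set_model#cite_note-7)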
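For comparison, the recursive alternative mentioned above might look like the following sketch. It uses SQLite's `WITH RECURSIVE` through Python's `sqlite3` module purely for convenience; the `category` table, its `parent` column, and the sample rows are hypothetical and are not part of the original article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Adjacency list: each row stores only a pointer to its parent.
conn.execute("CREATE TABLE category (name TEXT PRIMARY KEY, parent TEXT)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [
        ("Clothing", None),
        ("Men's", "Clothing"),
        ("Women's", "Clothing"),
        ("Suits", "Men's"),
        ("Slacks", "Suits"),
        ("Jackets", "Suits"),
    ],
)

RECURSIVE_DESCENDANTS = """
    WITH RECURSIVE subtree(name, depth) AS (
        SELECT name, 0 FROM category WHERE name = ?
        UNION ALL
        SELECT c.name, s.depth + 1
        FROM category AS c JOIN subtree AS s ON c.parent = s.name
    )
    SELECT name, depth FROM subtree WHERE depth > 0 ORDER BY depth, name
"""

# Every descendant of Men's, with its distance from Men's.
rows = conn.execute(RECURSIVE_DESCENDANTS, ("Men's",)).fetchall()
```

The adjacency list needs no renumbering on insert, so where recursive queries are available the trade-off shifts toward this simpler schema.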

Drawbacks

The use case for a dynamic endless database tree hierarchy is rare. The Nested Set model is appropriate where the tree element and one or two attributes are the only data, but is a poor choice when more complex relational data exists for the elements in the tree. Given an arbitrary starting depth for a category of 'Vehicles' and a child of 'Cars' with a child of 'Mercedes', a foreign key table relationship must be established unless the tree table is naively non-normalized. Attributes of a newly created tree item may not share all attributes with a parent, child or even a sibling. If a foreign key table is established for a table of 'Plants' attributes, no integrity is given to the child attribute data of 'Trees' and its child 'Oak'. Therefore, in each case of an item inserted into the tree, a foreign key table of the item's attributes must be created for all but the most trivial of use cases.
If the tree isn't expected to change often, a properly normalized hierarchy of attribute tables can be created in the initial design of a system, leading to simpler, more portable SQL statements; specifically ones that don't require an arbitrary number of runtime, programmatically created or deleted tables for changes to the tree. For more complex systems, hierarchy can be developed through relational models rather than an implicit numeric tree structure. Depth of an item is simply another attribute rather than the basis for an entire DB architecture. As stated in SQL Antipatterns:[[8]](//en.wikipedia.org/wiki/Nested_set_model#cite_note-8)

Nested Sets is a clever solution – maybe too clever. It also fails to support referential integrity. It’s best used when you need to query a tree more frequently than you need to modify the tree.[[9]](//en.wikipedia.org/wiki/Nested_set_model#cite_note-9)

The model doesn't allow for multiple parent categories. For example, an 'Oak' could be a child of 'Tree-Type', but also 'Wood-Type'. An additional tagging or taxonomy has to be established to accommodate this, again leading to a design more complex than a straightforward fixed model.
Nested sets are very slow for inserts because each insert requires updating the left and right domain values of all records positioned after the insertion point. This can cause a lot of database stress as many rows are rewritten and indexes rebuilt. However, if it is possible to store a forest of small trees in a table instead of a single big tree, the overhead may be significantly reduced, since only one small tree must be updated.
The nested interval model does not suffer from this problem, but is more complex to implement, and is not as well known. It still suffers from the relational foreign-key table problem. The nested interval model stores the position of the nodes as rational numbers expressed as quotients (n/d). [[1]](//www.sigmod.org/publications/sigmod-record/0506/p47-article-tropashko.pdf)
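The insert cost described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `tree` table with columns `lft` and `rgt` (renamed because LEFT and RIGHT are reserved words in many SQL dialects) and a hypothetical helper `insert_last_child`; it is not the only way to implement inserts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (node TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.executemany(
    "INSERT INTO tree VALUES (?, ?, ?)",
    [("Clothing", 1, 6), ("Men's", 2, 3), ("Women's", 4, 5)],
)


def insert_last_child(parent, name):
    """Insert `name` as the last child of `parent`.

    Returns how many existing rows had to be rewritten.
    """
    (parent_rgt,) = conn.execute(
        "SELECT rgt FROM tree WHERE node = ?", (parent,)
    ).fetchone()
    (touched,) = conn.execute(
        "SELECT COUNT(*) FROM tree WHERE rgt >= ? OR lft > ?",
        (parent_rgt, parent_rgt),
    ).fetchone()
    # Open a gap of two numbers at the parent's old right value.
    conn.execute("UPDATE tree SET rgt = rgt + 2 WHERE rgt >= ?", (parent_rgt,))
    conn.execute("UPDATE tree SET lft = lft + 2 WHERE lft > ?", (parent_rgt,))
    conn.execute(
        "INSERT INTO tree VALUES (?, ?, ?)", (name, parent_rgt, parent_rgt + 1)
    )
    return touched


touched = insert_last_child("Men's", "Suits")
final_rows = conn.execute("SELECT node, lft, rgt FROM tree ORDER BY lft").fetchall()
```

In this three-row table a single insert rewrites every existing row; in a large tree the number of rewritten rows grows with everything positioned after the insertion point.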

Variations

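The nested set model as described above has performance limitations for certain tree traversal operations. For example, finding the immediate children of a given parent node requires pruning the subtree to a specific level, as shown below: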

SELECT Child.Node, Child.Left, Child.Right
FROM Tree AS Parent, Tree AS Child
WHERE
    Child.Left BETWEEN Parent.Left AND Parent.Right
    AND Child.Node <> Parent.Node  -- Exclude the parent itself
    AND NOT EXISTS (    -- No middle node between Parent and Child
        SELECT *
        FROM Tree AS Mid
        WHERE Mid.Left BETWEEN Parent.Left AND Parent.Right
            AND Child.Left BETWEEN Mid.Left AND Mid.Right
            AND Mid.Node NOT IN (Parent.Node, Child.Node)
    )
    AND Parent.Left = 1  -- Given Parent Node Left Index

Or, equivalently:

SELECT DISTINCT Child.Node, Child.Left, Child.Right
FROM Tree as Child, Tree as Parent 
WHERE Parent.Left < Child.Left AND Parent.Right > Child.Right  -- associate Child Nodes with ancestors
GROUP BY Child.Node, Child.Left, Child.Right
HAVING max(Parent.Left) = 1  -- Subset for those with the given Parent Node as the nearest ancestor

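The query becomes more complicated when searching for children more than one level deep. To overcome this limitation and simplify tree traversal, an additional column is added to the model to record the depth of each node within the tree.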

In this model, the immediate children of a given parent node can be found with the following SQL statement:

SELECT Child.Node, Child.Left, Child.Right
FROM Tree as Child, Tree as Parent
WHERE
    Child.Depth = Parent.Depth + 1
    AND Child.Left > Parent.Left
    AND Child.Right < Parent.Right
    AND Parent.Left = 1  -- Given Parent Node Left Index
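One way the depth column could be populated and then queried is sketched below with Python's `sqlite3` module. The assumptions are stated plainly: the table name `tree`, the columns `lft` and `rgt` (renamed because LEFT and RIGHT are reserved words in many SQL dialects), the helper `immediate_children`, and the five sample rows are all hypothetical. Depth is derived here as the number of strict ancestors of each node.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree (node TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER, depth INTEGER)"
)
conn.executemany(
    "INSERT INTO tree (node, lft, rgt) VALUES (?, ?, ?)",
    [
        ("Clothing", 1, 10),
        ("Men's", 2, 7),
        ("Suits", 3, 6),
        ("Slacks", 4, 5),
        ("Women's", 8, 9),
    ],
)

# Depth of a node = number of intervals that strictly enclose it.
conn.execute(
    """
    UPDATE tree SET depth = (
        SELECT COUNT(*) FROM tree AS anc
        WHERE anc.lft < tree.lft AND anc.rgt > tree.rgt
    )
    """
)


def immediate_children(parent_lft):
    """Same shape as the depth-based query above, parameterized by left index."""
    sql = """
        SELECT child.node
        FROM tree AS child, tree AS parent
        WHERE child.depth = parent.depth + 1
          AND child.lft > parent.lft
          AND child.rgt < parent.rgt
          AND parent.lft = ?
        ORDER BY child.lft
    """
    return [row[0] for row in conn.execute(sql, (parent_lft,))]


depths = dict(conn.execute("SELECT node, depth FROM tree"))
```

The extra column trades a little more bookkeeping on every insert or move for a query with no subquery and no grouping.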
