Several Schemes for Storing Tree Structures in a Database

Date: 2019-11-12



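While developing jSqlBox recently, I was studying operations on tree structures and came across what appears to be a new scheme for storing trees in a database. I searched online and found nothing identical (possibly because I did not search long enough), so I describe it here.
There are four common schemes for storing a tree in a database, and each has problems:
1) Adjacency List: each row records its parent node. The advantage is simplicity; the disadvantage is that reading a subtree requires traversal, issuing many SQL statements and putting heavy load on the database.
2) Path Enumerations: a string records the whole path. The advantage is easy querying; the disadvantage is that inserting a new record means manually updating every path below that node, which is error-prone.
3) Closure Table: a dedicated table maintains the paths. The disadvantages are high space usage and unintuitive operations.
4) Nested Sets: each row records a left value and a right value. The disadvantage is that it is complex and hard to operate.
All of these share one drawback: they are not intuitive. The tree structure cannot be seen directly in the table, which hinders development and debugging.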
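I tentatively call the method introduced here the "simple and crude multi-column storage method". It is somewhat similar to Path Enumerations, but instead of a path string it uses many database columns, each holding a placeholder (1 or null). In the figure (https://github.com/drinkjava2/Multiple-Columns-Tree/blob/master/treemapping.jpg), the tree on the left is mapped to the database table shown on the right: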
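
Since the figure is not reproduced here, the mapping can be sketched with a small script. The row layout below is an assumption reconstructed from the line numbers used in the queries that follow (D at line 7, K at line 10, L at line 11, I at line 12) and from the sample tree in the V3.0 section; `make_tb` and `render` are hypothetical helper names, and SQLite stands in for whatever database is actually used.

```python
import sqlite3

# Rows as (line, id, depth). Assumed layout, reconstructed from the
# article's queries rather than copied from the figure.
ROWS = [(1, "A", 1), (2, "B", 2), (3, "E", 3), (4, "F", 3), (5, "C", 2), (6, "G", 3),
        (7, "D", 2), (8, "H", 3), (9, "J", 4), (10, "K", 4), (11, "L", 4), (12, "I", 3)]
MAX_DEPTH = 4


def make_tb():
    """Create an in-memory table with one placeholder column per depth level."""
    con = sqlite3.connect(":memory:")
    cols = ", ".join(f"c{i} integer" for i in range(1, MAX_DEPTH + 1))
    con.execute(f"create table tb (line integer, id text, {cols})")
    for line, node, depth in ROWS:
        # Only the column matching the node's depth gets the placeholder 1;
        # every other cN column stays NULL.
        con.execute(f"insert into tb (line, id, c{depth}) values (?, ?, 1)", (line, node))
    return con


def render(con):
    """Return a text grid where the node id sits in the column that holds 1."""
    cols = ", ".join(f"c{i}" for i in range(1, MAX_DEPTH + 1))
    out = []
    for line, node, *flags in con.execute(f"select line, id, {cols} from tb order by line"):
        out.append(f"{line:>2} " + " ".join(node if f == 1 else "." for f in flags))
    return "\n".join(out)


if __name__ == "__main__":
    print(render(make_tb()))
```

Printing the grid shows each node id indented by its depth, which is the "what you see is what you get" property the method relies on.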

The SQL operations are as follows:

1. Get (or delete) all descendants of a given node, where the node's line number is "X" and its column is "cY":
select * (or delete) from tb where
  line>=X and line<(select min(line) from tb where line>X and (cY=1 or c(Y-1)=1 or c(Y-2)=1 ... or c1=1))
For example, get node D and all of its descendants:
select * from tb where line>=7 and line< (select min(line) from tb where line>7 and (c2=1 or c1=1))
Delete node D and all of its descendants:
delete from tb where line>=7 and line< (select min(line) from tb where line>7 and (c2=1 or c1=1))

Get only the direct children of node D:
select * from tb where line>=7 and c3=1 and line< (select min(line) from tb where line>7 and (c2=1 or c1=1))
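
The subtree query can be exercised against SQLite as a quick check. This is only a sketch: the row layout is an assumed reconstruction of the figure, and `subtree` and `boundary` are hypothetical helper names. One detail worth noting is that `min(line)` returns NULL when no later row has any of the boundary columns set, in which case the comparison matches nothing; the sketch therefore wraps the subquery in `coalesce`, which is my addition and not part of the article's SQL (a terminating row at the end of the table would serve the same purpose).

```python
import sqlite3

# Assumed sample layout: (line, id, depth).
ROWS = [(1, "A", 1), (2, "B", 2), (3, "E", 3), (4, "F", 3), (5, "C", 2), (6, "G", 3),
        (7, "D", 2), (8, "H", 3), (9, "J", 4), (10, "K", 4), (11, "L", 4), (12, "I", 3)]


def make_tb():
    con = sqlite3.connect(":memory:")
    con.execute("create table tb (line integer, id text, "
                "c1 integer, c2 integer, c3 integer, c4 integer)")
    for line, node, depth in ROWS:
        con.execute(f"insert into tb (line, id, c{depth}) values (?, ?, 1)", (line, node))
    return con


def boundary(y):
    """Build 'cY=1 or ... or c1=1': matches any node at depth Y or shallower."""
    return " or ".join(f"c{i}=1" for i in range(y, 0, -1))


def subtree(con, x, y, children_only=False):
    """Ids of the node at line x / depth y plus its descendants, in tree order.

    With children_only=True only direct children (depth y+1) are returned;
    that column must exist in the table.
    """
    extra = f"and c{y + 1}=1" if children_only else ""
    sql = f"""
        select id from tb
        where line >= ? {extra}
          and line < coalesce(
                (select min(line) from tb where line > ? and ({boundary(y)})),
                (select max(line) + 1 from tb))  -- fallback: subtree runs to the end
        order by line"""
    return [row[0] for row in con.execute(sql, (x, x))]


if __name__ == "__main__":
    con = make_tb()
    print(subtree(con, 7, 2))                      # D and its descendants
    print(subtree(con, 7, 2, children_only=True))  # direct children of D
```

Replacing `select id` with `delete` in the same statement gives the corresponding subtree delete.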

2. Find the root node of a given node, where the node's line number is "X" and its column is "cY":
select * from tb where line=(select max(line) from tb where line<=X and c1=1)
For example, find the root node of node I:
select * from tb where line=(select max(line) from tb where line<=12 and c1=1)

3. Find the direct parent of a given node, where the node's line number is "X" and its column is "cY":
select * from tb where line=(select max(line) from tb where line<X and c(Y-1)=1)
For example, find the direct parent of node L:
select * from tb where line=(select max(line) from tb where line<11 and c3=1)
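
The parent lookup works because the parent is always the nearest preceding row whose placeholder sits one column to the left. A minimal sketch under the same assumed sample layout (the helper name `parent` is hypothetical):

```python
import sqlite3

# Assumed sample layout: (line, id, depth).
ROWS = [(1, "A", 1), (2, "B", 2), (3, "E", 3), (4, "F", 3), (5, "C", 2), (6, "G", 3),
        (7, "D", 2), (8, "H", 3), (9, "J", 4), (10, "K", 4), (11, "L", 4), (12, "I", 3)]


def make_tb():
    con = sqlite3.connect(":memory:")
    con.execute("create table tb (line integer, id text, "
                "c1 integer, c2 integer, c3 integer, c4 integer)")
    for line, node, depth in ROWS:
        con.execute(f"insert into tb (line, id, c{depth}) values (?, ?, 1)", (line, node))
    return con


def parent(con, x, y):
    """Id of the direct parent of the node at line x / depth y, or None for a root."""
    if y <= 1:
        return None  # a root has no column c0 to look in
    sql = (f"select id from tb where line = "
           f"(select max(line) from tb where line < ? and c{y - 1}=1)")
    row = con.execute(sql, (x,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    con = make_tb()
    print(parent(con, 11, 4))  # parent of L
    print(parent(con, 12, 3))  # parent of I
```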

4. Find all ancestors of a given node, where the node's line number is "X" and its column is "cY":
select * from tb where line=(select max(line) from tb where line<X and c(Y-1)=1)
union select * from tb where line=(select max(line) from tb where line<X and c(Y-2)=1)
...
union select * from tb where line=(select max(line) from tb where line<X and c1=1)
For example, find all ancestors of node I:
select * from tb where line=(select max(line) from tb where line<12 and c2=1)
union select * from tb where line=(select max(line) from tb where line<12 and c1=1)
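
Because the number of `union` branches depends on the node's depth, the statement is normally generated by the application. The sketch below builds one branch per ancestor level and runs it on SQLite; the sample layout is assumed and `ancestors` is a hypothetical helper name.

```python
import sqlite3

# Assumed sample layout: (line, id, depth).
ROWS = [(1, "A", 1), (2, "B", 2), (3, "E", 3), (4, "F", 3), (5, "C", 2), (6, "G", 3),
        (7, "D", 2), (8, "H", 3), (9, "J", 4), (10, "K", 4), (11, "L", 4), (12, "I", 3)]


def make_tb():
    con = sqlite3.connect(":memory:")
    con.execute("create table tb (line integer, id text, "
                "c1 integer, c2 integer, c3 integer, c4 integer)")
    for line, node, depth in ROWS:
        con.execute(f"insert into tb (line, id, c{depth}) values (?, ?, 1)", (line, node))
    return con


def ancestors(con, x, y):
    """Ids of all ancestors of the node at line x / depth y, root first."""
    branches = [
        f"select line, id from tb where line = "
        f"(select max(line) from tb where line < ? and c{i}=1)"
        for i in range(y - 1, 0, -1)  # one branch per level above the node
    ]
    if not branches:
        return []  # a root node has no ancestors
    sql = " union ".join(branches) + " order by line"
    return [row[1] for row in con.execute(sql, [x] * len(branches))]


if __name__ == "__main__":
    con = make_tb()
    print(ancestors(con, 12, 3))  # ancestors of I
    print(ancestors(con, 11, 4))  # ancestors of L
```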

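5. Insert a new node:
This depends on the requirement. For example, to insert a new node T between J and K:
update tb set line=line+1 where line>=10;
insert into tb (line,id,c4) values (10,'T',1);
This is the biggest difference from the Path Enumerations scheme. Insertion is very convenient: one SQL statement adds 1 to every following line number, and there is no path string to maintain, so it is hard to get wrong.
In addition, if the table is very large, update tb set line=line+1 can end up touching most of the table and hurt performance. To avoid that, consider adding a GroupID column: all nodes under the same root share one GroupID, and every operation is carried out within a group. The insert then becomes, for example:
update tb set line=line+1 where groupid=2 and line>=8;
insert into tb (groupid,line,id,c4) values (2,8,'T',1);
Because operations within one groupid do not affect other groups, complex insert/delete/update work can even be done in memory, after which the whole group is deleted and reinserted as a new group in one step.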
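
The group-scoped insert can be sketched as a two-statement transaction. The table below adds a `groupid` column to the assumed sample layout and a small second group (ids X and Y, invented for illustration) to show that other groups are left untouched; `insert_node` is a hypothetical helper name.

```python
import sqlite3

# Assumed sample layout for group 1: (line, id, depth).
ROWS = [(1, "A", 1), (2, "B", 2), (3, "E", 3), (4, "F", 3), (5, "C", 2), (6, "G", 3),
        (7, "D", 2), (8, "H", 3), (9, "J", 4), (10, "K", 4), (11, "L", 4), (12, "I", 3)]


def make_tb():
    con = sqlite3.connect(":memory:")
    con.execute("create table tb (groupid integer, line integer, id text, "
                "c1 integer, c2 integer, c3 integer, c4 integer)")
    for line, node, depth in ROWS:
        con.execute(f"insert into tb (groupid, line, id, c{depth}) values (1, ?, ?, 1)",
                    (line, node))
    # A second, unrelated tree (illustrative data only).
    con.execute("insert into tb (groupid, line, id, c1) values (2, 1, 'X', 1)")
    con.execute("insert into tb (groupid, line, id, c2) values (2, 2, 'Y', 1)")
    con.commit()
    return con


def insert_node(con, groupid, line, node, depth):
    """Insert `node` at `line` with the given depth, shifting later rows of the group."""
    with con:  # both statements commit together, or roll back together on error
        con.execute("update tb set line = line + 1 where groupid = ? and line >= ?",
                    (groupid, line))
        con.execute(f"insert into tb (groupid, line, id, c{int(depth)}) values (?, ?, ?, 1)",
                    (groupid, line, node))


def ids_in_order(con, groupid):
    rows = con.execute("select id from tb where groupid = ? order by line", (groupid,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    con = make_tb()
    insert_node(con, 1, 10, "T", 4)  # T goes between J and K
    print(ids_in_order(con, 1))
    print(ids_in_order(con, 2))
```

Running the shift and the insert in one transaction keeps line numbers consistent if the second statement fails.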

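Summary:
The advantages of the method described above are:
1) It is intuitive and easy to debug. Among tree storage schemes it is the only WYSIWYG one, in which the shape of the tree can be seen directly; the null values make the structure clear at a glance.
2) Querying, deleting and inserting with SQL are all convenient, and the LIKE syntax is not needed.
3) Only one table is needed.
4) It is compatible with all databases.
5) Each placeholder sits exactly where the displayed content should appear, which makes it easy to write grid-style display controls.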

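The disadvantages are:
1) The tree depth is not unlimited. Databases cap the number of columns per table, commonly at around 1000, so the tree cannot be deeper than that. Since a large number of columns also affects performance, it is advisable to set a much smaller depth limit in practice, for example 100.
2) The SQL statements are long. Conditions such as c9=1 or c8=1 or c7=1 ... or c1=1 appear often, with one term for every level of depth.
3) Moving a whole subtree is cumbersome, because the entire subtree has to be shifted sideways or up and down. This scheme is not recommended when nodes must be moved frequently. It suits applications that mostly add and delete nodes and rarely move them, such as forum posts and comments.
4) With a very large number of columns, space usage is somewhat high.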

The following is an addendum: building on the above, a simpler scheme for trees of unlimited depth.

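It then occurred to me that the method above is still too clumsy. If a single column stores the depth level instead of using many columns, the scheme is no longer bounded by the database's column limit and becomes a tree of unlimited depth. It no longer has the WYSIWYG effect, but it far surpasses the "simple and crude multi-column storage method" in both performance and simplicity. I tentatively name it the "Zhu's depth tree V2.0 method" (note: if someone has already invented this method, just drop the "Zhu's"). In the figure (https://github.com/drinkjava2/Multiple-Columns-Tree/blob/master/treemappingv2.png), the tree on the left is mapped to the database table shown on the right. Note that the last row of each table must be an END marker with level set to 0: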
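
As a rough illustration of the V2.0 table, the sketch below stores the same sample tree with a single `level` column and the END marker, then rebuilds an indented view from `line` and `level` alone. The row values are an assumption reconstructed from the queries in this section, and `make_tb2` and `render` are hypothetical helper names.

```python
import sqlite3

# Assumed V2.0 rows: (line, id, level). The final END row has level 0.
ROWS2 = [(1, "A", 1), (2, "B", 2), (3, "E", 3), (4, "F", 3), (5, "C", 2), (6, "G", 3),
         (7, "D", 2), (8, "H", 3), (9, "J", 4), (10, "K", 4), (11, "L", 4), (12, "I", 3),
         (13, "END", 0)]


def make_tb2():
    con = sqlite3.connect(":memory:")
    con.execute("create table tb2 (groupid integer, line integer, id text, level integer)")
    con.executemany("insert into tb2 (groupid, line, id, level) values (1, ?, ?, ?)", ROWS2)
    con.commit()
    return con


def render(con, groupid=1):
    """Indent each id by its level; the END marker (level 0) is skipped."""
    rows = con.execute(
        "select id, level from tb2 where groupid = ? and level > 0 order by line",
        (groupid,))
    return "\n".join("  " * (level - 1) + node for node, level in rows)


if __name__ == "__main__":
    print(render(make_tb2()))
```

The tree shape is no longer visible in the raw table, but it is still recoverable from two integer columns, which is why the depth is unbounded.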

1. Get all descendants of a given node, where the node's line number is X, its level is Y and its groupID is Z:
select * from tb2 where groupID=Z and
  line>=X and line<(select min(line) from tb2 where line>X and level<=Y and groupID=Z)
For example, get node D and all of its descendants:
select * from tb2 where groupID=1 and
  line>=7 and line< (select min(line) from tb2 where groupid=1 and line>7 and level<=2)
Deleting works the same way: simply replace select * with delete in the SQL.

Get only the direct children of node D (add level=Y+1 to the query condition):
select * from tb2 where groupID=1 and
  line>=7 and level=3 and line< (select min(line) from tb2 where groupid=1 and line>7 and level<=2)
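
The V2.0 subtree query can be checked on SQLite as follows. The END row with level 0 guarantees that the inner `min(line)` always finds a boundary, so no fallback is needed here. The sample rows are assumed and `subtree` is a hypothetical helper name.

```python
import sqlite3

# Assumed V2.0 rows: (line, id, level), ending with the END marker.
ROWS2 = [(1, "A", 1), (2, "B", 2), (3, "E", 3), (4, "F", 3), (5, "C", 2), (6, "G", 3),
         (7, "D", 2), (8, "H", 3), (9, "J", 4), (10, "K", 4), (11, "L", 4), (12, "I", 3),
         (13, "END", 0)]


def make_tb2():
    con = sqlite3.connect(":memory:")
    con.execute("create table tb2 (groupid integer, line integer, id text, level integer)")
    con.executemany("insert into tb2 (groupid, line, id, level) values (1, ?, ?, ?)", ROWS2)
    con.commit()
    return con


def subtree(con, z, x, y, children_only=False):
    """Ids of the node (group z, line x, level y) and its descendants, in tree order."""
    extra, params = "", [z, x]
    if children_only:
        extra = "and level = ?"   # direct children sit exactly one level deeper
        params.append(y + 1)
    params += [z, x, y]
    sql = f"""
        select id from tb2
        where groupid = ? and line >= ? {extra}
          and line < (select min(line) from tb2
                      where groupid = ? and line > ? and level <= ?)
        order by line"""
    return [row[0] for row in con.execute(sql, params)]


if __name__ == "__main__":
    con = make_tb2()
    print(subtree(con, 1, 7, 2))                      # D and its descendants
    print(subtree(con, 1, 7, 2, children_only=True))  # direct children of D
```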

2. Find the root node of any node, where the node's groupid is Z:
select * from tb2 where groupID=Z and line=1 (or level=1)

3. Find the direct parent of a given node, where the node's line number is X, its level is Y and its groupID is Z:
select * from tb2 where groupID=Z and
  line=(select max(line) from tb2 where groupID=Z and line<X and level=(Y-1))
For example, find the direct parent of node L:
select * from tb2 where groupID=1
  and line=(select max(line) from tb2 where groupID=1 and line<11 and level=3)
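
A short sketch of the root and parent lookups in the V2.0 layout, again on SQLite with the assumed sample rows (`root` and `parent` are hypothetical helper names):

```python
import sqlite3

# Assumed V2.0 rows: (line, id, level), ending with the END marker.
ROWS2 = [(1, "A", 1), (2, "B", 2), (3, "E", 3), (4, "F", 3), (5, "C", 2), (6, "G", 3),
         (7, "D", 2), (8, "H", 3), (9, "J", 4), (10, "K", 4), (11, "L", 4), (12, "I", 3),
         (13, "END", 0)]


def make_tb2():
    con = sqlite3.connect(":memory:")
    con.execute("create table tb2 (groupid integer, line integer, id text, level integer)")
    con.executemany("insert into tb2 (groupid, line, id, level) values (1, ?, ?, ?)", ROWS2)
    con.commit()
    return con


def root(con, z):
    """Id of the root of group z: the only row with level 1."""
    row = con.execute("select id from tb2 where groupid = ? and level = 1", (z,)).fetchone()
    return row[0] if row else None


def parent(con, z, x, y):
    """Id of the direct parent of the node (group z, line x, level y), or None."""
    if y <= 1:
        return None  # the root has no parent
    sql = """
        select id from tb2
        where groupid = ?
          and line = (select max(line) from tb2
                      where groupid = ? and line < ? and level = ?)"""
    row = con.execute(sql, (z, z, x, y - 1)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    con = make_tb2()
    print(root(con, 1))
    print(parent(con, 1, 11, 4))  # parent of L
```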

4. Find all ancestors of a given node, where the node's line number is X, its level is Y and its groupID is Z:
select * from tb2 where groupID=Z and
  line=(select max(line) from tb2 where groupID=Z and line<X and level=(Y-1))
union select * from tb2 where groupID=Z and
  line=(select max(line) from tb2 where groupID=Z and line<X and level=(Y-2))
...
union select * from tb2 where groupID=Z and
  line=(select max(line) from tb2 where groupID=Z and line<X and level=1)
For example, find all ancestors of node I:
select * from tb2 where groupID=1 and
  line=(select max(line) from tb2 where groupID=1 and line<12 and level=2)
union select * from tb2 where groupID=1 and
  line=(select max(line) from tb2 where groupID=1 and line<12 and level=1)
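
As an alternative to generating one `union` branch per level, the same result can apparently be obtained with a single correlated subquery: a row is an ancestor if it is the last row of its own level before line X, and its level is shallower than Y. This rewrite is my own sketch, not the article's SQL, and `ancestors` is a hypothetical helper name; the sample rows are assumed.

```python
import sqlite3

# Assumed V2.0 rows: (line, id, level), ending with the END marker.
ROWS2 = [(1, "A", 1), (2, "B", 2), (3, "E", 3), (4, "F", 3), (5, "C", 2), (6, "G", 3),
         (7, "D", 2), (8, "H", 3), (9, "J", 4), (10, "K", 4), (11, "L", 4), (12, "I", 3),
         (13, "END", 0)]


def make_tb2():
    con = sqlite3.connect(":memory:")
    con.execute("create table tb2 (groupid integer, line integer, id text, level integer)")
    con.executemany("insert into tb2 (groupid, line, id, level) values (1, ?, ?, ?)", ROWS2)
    con.commit()
    return con


def ancestors(con, z, x, y):
    """Ids of all ancestors of the node (group z, line x, level y), root first."""
    sql = """
        select t.id from tb2 t
        where t.groupid = ? and t.line < ? and t.level < ? and t.level > 0
          -- keep only the nearest preceding row of each shallower level
          and t.line = (select max(line) from tb2
                        where groupid = t.groupid and line < ? and level = t.level)
        order by t.line"""
    return [row[0] for row in con.execute(sql, (z, x, y, x))]


if __name__ == "__main__":
    con = make_tb2()
    print(ancestors(con, 1, 12, 3))  # ancestors of I
    print(ancestors(con, 1, 11, 4))  # ancestors of L
```

The statement text no longer depends on the depth of the node, so it can be prepared once and reused.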

5. Insert a new node. For example, to insert a new node T between J and K:
update tb2 set line=line+1 where groupID=1 and line>=10;
insert into tb2 (groupid,line,id,level) values (1,10,'T',4);
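
The insert position does not have to be supplied by hand. As a variation on the positional insert above, the sketch below appends a node as the last child of a given parent by reusing the subtree-boundary query from operation 1 to compute where the parent's subtree ends. `add_last_child` is a hypothetical helper, not something the article defines, and the sample rows are assumed.

```python
import sqlite3

# Assumed V2.0 rows: (line, id, level), ending with the END marker.
ROWS2 = [(1, "A", 1), (2, "B", 2), (3, "E", 3), (4, "F", 3), (5, "C", 2), (6, "G", 3),
         (7, "D", 2), (8, "H", 3), (9, "J", 4), (10, "K", 4), (11, "L", 4), (12, "I", 3),
         (13, "END", 0)]


def make_tb2():
    con = sqlite3.connect(":memory:")
    con.execute("create table tb2 (groupid integer, line integer, id text, level integer)")
    con.executemany("insert into tb2 (groupid, line, id, level) values (1, ?, ?, ?)", ROWS2)
    con.commit()
    return con


def add_last_child(con, z, parent_line, parent_level, node):
    """Insert `node` as the last child of the given parent; return its line number."""
    with con:  # one transaction for lookup, shift and insert
        # First row after the parent that is not inside the parent's subtree.
        (pos,) = con.execute(
            "select min(line) from tb2 where groupid = ? and line > ? and level <= ?",
            (z, parent_line, parent_level)).fetchone()
        con.execute("update tb2 set line = line + 1 where groupid = ? and line >= ?",
                    (z, pos))
        con.execute("insert into tb2 (groupid, line, id, level) values (?, ?, ?, ?)",
                    (z, pos, node, parent_level + 1))
    return pos


if __name__ == "__main__":
    con = make_tb2()
    print(add_last_child(con, 1, 8, 3, "T"))  # T becomes the last child of H
    print([r[0] for r in con.execute("select id from tb2 where groupid=1 order by line")])
```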

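Summary: the advantages of this method are:
1) The tree depth is unlimited.
2) Although it lacks the WYSIWYG effect of the first scheme, it is still intuitive and easy to debug.
3) It makes full use of SQL. Querying, deleting and inserting are convenient, the SQL is much simpler than in the first scheme, and no LIKE pattern matching is needed.
4) Only one table is needed.
5) It is compatible with all databases.
6) It uses little space.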

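The disadvantages are:
1) Moving a whole subtree is somewhat cumbersome, so the method suits cases where nodes are mostly added and deleted but rarely moved, such as forum posts and comments. When complex node moves really are required, one option is to manipulate and reorder the whole tree in memory, then delete the old group and bulk-insert the new group in one step.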

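Addendum, January 22:
Moving nodes is cumbersome only relative to querying, deleting and inserting; it is not prohibitively hard. For example, in MySQL, moving the whole B subtree under node H, positioned between J and K, looks like this:
update tb2 set tempno=line*1000000 where groupid=1;
set @nextNodeLine=(select min(line) from tb2 where groupid=1 and line>2 and level<=2);
update tb2 set tempno=9*1000000+line, level=level+2 where groupID=1 and line>=2 and line< @nextNodeLine;
set @mycnt=0;
update tb2 set line=(@mycnt := @mycnt + 1) where groupid=1 order by tempno;
The example above requires adding an integer column named tempno to the table. This is a lazy algorithm: it is simple and clear, but it renumbers the whole tree, so it is not efficient. Where nodes must be moved frequently, the Adjacency List scheme may be more suitable.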
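
The MySQL statements rely on user variables (`@mycnt`), which many databases do not have. A portable sketch of the same lazy algorithm is shown below on SQLite, with the final renumbering done from the application side; `move_subtree` and its parameter names are hypothetical, and the sample rows are assumed.

```python
import sqlite3

# Assumed V2.0 rows: (line, id, level), ending with the END marker.
ROWS2 = [(1, "A", 1), (2, "B", 2), (3, "E", 3), (4, "F", 3), (5, "C", 2), (6, "G", 3),
         (7, "D", 2), (8, "H", 3), (9, "J", 4), (10, "K", 4), (11, "L", 4), (12, "I", 3),
         (13, "END", 0)]


def make_tb2():
    con = sqlite3.connect(":memory:")
    con.execute("create table tb2 (groupid integer, line integer, id text, "
                "level integer, tempno integer)")
    con.executemany("insert into tb2 (groupid, line, id, level) values (1, ?, ?, ?)", ROWS2)
    con.commit()
    return con


def move_subtree(con, z, src_line, src_level, after_line, new_level):
    """Move the subtree rooted at (src_line, src_level) so that it follows the row
    at after_line, with its root ending up at new_level."""
    with con:
        # Spread the sort keys out so moved rows can be slotted in between.
        con.execute("update tb2 set tempno = line * 1000000 where groupid = ?", (z,))
        (end,) = con.execute(
            "select min(line) from tb2 where groupid = ? and line > ? and level <= ?",
            (z, src_line, src_level)).fetchone()
        # Re-key the subtree just after the target row and shift its levels.
        con.execute(
            "update tb2 set tempno = ? * 1000000 + line, level = level + ? "
            "where groupid = ? and line >= ? and line < ?",
            (after_line, new_level - src_level, z, src_line, end))
        # Renumber the whole group by sort key (replaces the @mycnt counter).
        rows = con.execute("select rowid from tb2 where groupid = ? order by tempno",
                           (z,)).fetchall()
        con.executemany("update tb2 set line = ? where rowid = ?",
                        [(n, rowid) for n, (rowid,) in enumerate(rows, start=1)])


if __name__ == "__main__":
    con = make_tb2()
    move_subtree(con, 1, 2, 2, 9, 4)  # B subtree goes under H, between J and K
    print(con.execute("select id, level from tb2 where groupid=1 order by line").fetchall())
```

Like the original, this assumes the group has fewer than 1,000,000 rows and that the target row is not inside the subtree being moved.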

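If nodes must be moved frequently and you still want to keep the efficient queries of scheme 2, another option is to add a parent-node column pid plus two helper columns, tempno and temporder, used for sorting (tentatively the "depth tree V3.0 method"). This effectively merges the V2.0 method with the Adjacency List scheme. The advantage is that moving a node only requires changing its pid, with no complex algorithm; any number of nodes can be moved, added or deleted in one go, and the algorithm below is then called once to rebuild the ordering. The following example shows a complete conversion from the Adjacency List layout to the V2.0 layout, which amounts to rebuilding a query index for the tree: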

create table tb3 (
id varchar(10),
comments varchar(55),
pid varchar(10),
line integer,
level integer,
tempno bigint,
temporder integer
);

insert into tb3 (id,comments,Pid) values('A','found a bug',null);
insert into tb3 (id,comments,Pid) values('B','is a worm','A');
insert into tb3 (id,comments,Pid) values('C','no','A');
insert into tb3 (id,comments,Pid) values('D','is a bug','A');
insert into tb3 (id,comments,Pid) values('E','oh, a bug','B');
insert into tb3 (id,comments,Pid) values('F','solve it','B');
insert into tb3 (id,comments,Pid) values('G','careful it bites','C');
insert into tb3 (id,comments,Pid) values('H','it does not bite','D');
insert into tb3 (id,comments,Pid) values('I','found the reason','D');
insert into tb3 (id,comments,Pid) values('J','solved','H');
insert into tb3 (id,comments,Pid) values('K','uploaded','H');
insert into tb3 (id,comments,Pid) values('L','well done!','H');

set @mycnt=0;
update tb3 set  line=0,level=0, tempno=0, temporder=(@mycnt := @mycnt + 1) order by id;
update tb3 set level=1, line=1 where pid is null;

update tb3 set tempno=line*10000000 where line>0; 
update tb3 a, tb3 b set a.level=2, a.tempno=b.tempno+a.temporder where a.level=0 and 
a.pid=b.id and b.level=1;
set @mycnt=0;
update tb3 set line=(@mycnt := @mycnt + 1) where level>0 order by tempno;

update tb3 set tempno=line*10000000 where line>0; 
update tb3 a, tb3 b set a.level=3, a.tempno=b.tempno+a.temporder where a.level=0 and 
a.pid=b.id and b.level=2;
set @mycnt=0;
update tb3 set line=(@mycnt := @mycnt + 1) where level>0 order by tempno;

update tb3 set tempno=line*10000000 where line>0; 
update tb3 a, tb3 b set a.level=4, a.tempno=b.tempno+a.temporder where a.level=0 and 
a.pid=b.id and b.level=3;
set @mycnt=0;
update tb3 set line=(@mycnt := @mycnt + 1) where level>0 order by tempno;

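The algorithm above uses SQL to turn what could otherwise require a great many recursive SQL queries into a bounded number of SQL operations (equal to the maximum depth of the tree). To keep the focus on the algorithm, the example assumes a single root node and leaves out groupid and the END marker; a real implementation needs to fill in those details. The order by id can also be changed to sort by another column. For lack of time I do not give the reverse algorithm from the V2.0 layout back to the Adjacency List layout (that is, pid is empty and is filled in from the V2.0 table). That algorithm matters little anyway, because each row of a V3.0 table normally keeps its pid at all times.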
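
The three hand-unrolled passes above generalise to a loop that runs once per tree level and stops when no unplaced rows remain. The sketch below shows that loop on SQLite under the same simplifications as the example (single root, no groupid, no END marker). It uses a correlated subquery in place of MySQL's multi-table `update`, and renumbers from the application instead of using `@mycnt`; `rebuild` is a hypothetical helper name and the `comments` column is omitted for brevity.

```python
import sqlite3

# (id, pid) pairs taken from the tb3 sample data above.
NODES = [("A", None), ("B", "A"), ("C", "A"), ("D", "A"), ("E", "B"), ("F", "B"),
         ("G", "C"), ("H", "D"), ("I", "D"), ("J", "H"), ("K", "H"), ("L", "H")]


def make_tb3():
    con = sqlite3.connect(":memory:")
    con.execute("create table tb3 (id text, pid text, line integer, level integer, "
                "tempno integer, temporder integer)")
    con.executemany("insert into tb3 (id, pid) values (?, ?)", NODES)
    con.commit()
    return con


def renumber(con):
    """Assign line = 1, 2, 3, ... to placed rows, ordered by tempno."""
    rows = con.execute("select rowid from tb3 where level > 0 order by tempno").fetchall()
    con.executemany("update tb3 set line = ? where rowid = ?",
                    [(n, rowid) for n, (rowid,) in enumerate(rows, start=1)])


def rebuild(con):
    """Recompute line and level from pid; returns the number of levels found."""
    with con:
        ids = [r[0] for r in con.execute("select id from tb3 order by id")]
        con.executemany(
            "update tb3 set line = 0, level = 0, tempno = 0, temporder = ? where id = ?",
            [(n, node) for n, node in enumerate(ids, start=1)])
        con.execute("update tb3 set level = 1, line = 1 where pid is null")
        level = 1
        while True:
            con.execute("update tb3 set tempno = line * 10000000 where line > 0")
            # Place every unplaced child of the current level right after its parent.
            cur = con.execute(
                """update tb3 set level = ?,
                       tempno = (select p.tempno from tb3 p where p.id = tb3.pid) + temporder
                   where level = 0 and pid in (select id from tb3 where level = ?)""",
                (level + 1, level))
            if cur.rowcount == 0:
                return level  # nothing left to place
            renumber(con)
            level += 1


if __name__ == "__main__":
    con = make_tb3()
    print(rebuild(con))
    print(con.execute("select line, id, level from tb3 order by line").fetchall())
```

Rows whose pid points at a missing node would stay at level 0 and line 0, so a real implementation would probably want to detect and report them.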
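To sum up:
Adjacency List: moving, adding and deleting nodes is convenient; querying is not.
Depth tree V2.0: querying is convenient, and adding and deleting nodes is convenient but has efficiency issues; moving nodes is inconvenient.
Depth tree V3.0: moving, adding and deleting nodes is convenient, and so is querying. The drawback is that the line and level values must be rebuilt after each move, add or delete before they can be used for queries. It combines the two schemes above and can switch between them (edit mode and query mode) at any time, depending on the workload. In effect, the V3.0 method gives the Adjacency List scheme a query index.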