Reposted from: https://github.com/digoal/blog/blob/master/201708/20170818_02.md
Greenplum supports both row and column orientation, and both heap tables and append-only (AO) tables. How do they differ, and how should you choose?
1. Row store: data is organized row by row; each row is one tuple, stored contiguously. To read a given column, every column before it must first be deformed, so reading the first column and reading the last column do not actually cost the same.

Deform is described in detail in 《PostgreSQL 向量化執行插件(瓦片式實現) 10x提速OLAP》.
Row-store summary:

- A full-table scan reads more data blocks.
- Lower compression ratio.
- The cost of reading a column depends on its position: the later the column, the higher the cost.
- Poorly suited to vectorized execution and JIT architectures (in short, not suited to batch-style computation).
- When the table must be rewritten, the whole table is rewritten, e.g. when adding a column with a default value.
2. Column store: data is organized column by column, each column stored in one file (or a batch of files). Reading any single column costs the same, but reading multiple columns means accessing multiple files: the more columns you read, the higher the overhead.
Column-store summary:

- High compression ratio.
- Only supported with AO storage (covered below).
- Reading any column costs the same.
- Very well suited to vectorized execution and JIT architectures; scanning and aggregating large volumes of data is more efficient.
- Reading many columns, e.g. detail queries, is more expensive because more files must be accessed.
- A table rewrite does not need to touch the whole table: adding a column with a default value only writes the file(s) for that new column.
- If your workload leans OLTP — frequent detail queries on the table (returning many columns), more updates and deletes — consider row store.
- If your workload leans OLAP — frequent aggregation over the data — choose column store.
- If you need a high compression ratio, choose column store.
- For mixed workloads, use a partitioned table, e.g. partitioned by time: keep recent partitions (heavy detail queries) in row store, and historical partitions (heavy analytics) in column store.
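The mixed approach above can be sketched as a range-partitioned table whose historical partition is column-store AO and whose recent partition is row-store heap. Table and partition names below are hypothetical; the syntax follows Greenplum's per-partition storage options:

```sql
create table orders (
  id     int8,
  ts     date,
  detail text
)
with (appendonly=false)            -- table default: row-store heap
distributed by (id)
partition by range (ts)
(
  -- historical data: analytics-heavy, so column-store AO
  partition hist start (date '2017-01-01') end (date '2017-07-01')
    with (appendonly=true, orientation=column),
  -- recent data: detail-query-heavy, so row-store heap
  partition cur  start (date '2017-07-01') end (date '2018-01-01')
    with (appendonly=false)
);
```

Each partition is a separate relation, so the storage choice is made per partition while queries still see one table.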
1. Heap tables are essentially PG heap storage. Every change to a heap table generates REDO, so point-in-time recovery is possible. However, heap tables cannot support incremental logical backup, because any block of the table may change, which makes it impractical to track a backup position in heap storage.

When a transaction commits, durability is guaranteed via clog and REDO. REDO is also used to build MIRROR nodes for data redundancy.
2. AO tables are, as the name says, append-only storage. Deletes and updates are recorded in a separate BITMAP file that marks deleted rows; a bit plus offset alignment determines whether a given row of the AO table has been deleted.

When a transaction commits, FSYNC must be called and the offset of the last data block written is recorded. (Even if that block holds only one record, the next transaction appends a brand-new block.) The data block is also shipped to the MIRROR for redundancy.

AO tables are therefore a poor fit for small transactions: every commit triggers an FSYNC, and a block with free space is never reused once the transaction ends. (You can test this yourself: single-row commits to an AO table cause severe IO amplification.)

Even so, AO tables are an excellent fit for OLAP: batch loads, high compression ratios, and incremental logical backup — each backup only needs to record the offset it reached, plus a full copy of the (very small) BITMAP of deletion marks.
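A minimal way to observe the IO amplification described above (a sketch; `ao_small` is a hypothetical table, and with autocommit on each INSERT below is its own transaction):

```sql
create table ao_small (id int8) with (appendonly=true) distributed by (id);

insert into ao_small values (1);  -- each single-row commit fsyncs, and the
insert into ao_small values (2);  -- partially filled block is not reused
insert into ao_small values (3);  -- by the next transaction

-- compare the on-disk size against a heap table receiving the same rows
select pg_size_pretty(pg_relation_size('ao_small'));
```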
- If writes are dominated by small transactions, choose heap tables.
- If you need point-in-time recovery, choose heap tables.
- If you need column store, choose AO tables.
- If data is loaded in batches, choose AO tables.
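Note that the test tables below are created with `compresstype=none`. When compression ratio is the goal, an AO column-store table can enable block compression; a sketch (table name hypothetical, zlib level 5 chosen as an example):

```sql
create table t_compressed (id int8, info text)
with (appendonly=true, orientation=column,
      compresstype=zlib, compresslevel=5)
distributed by (id);

-- after loading data, check the achieved compression ratio
select get_ao_compression_ratio('t_compressed');
```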
1. Create a function that builds a 400-column table (row-store heap, row-store AO, or column-store AO).
```sql
create or replace function f(name, int, text) returns void as $$
declare
  res text := '';
begin
  for i in 1..$2 loop
    res := res||'c'||i||' int8,';
  end loop;
  res := rtrim(res, ',');
  if $3 = 'ao_col' then
    res := 'create table '||$1||'('||res||') with (appendonly=true, blocksize=8192, compresstype=none, orientation=column)';
  elsif $3 = 'ao_row' then
    res := 'create table '||$1||'('||res||') with (appendonly=true, blocksize=8192, orientation=row)';
  elsif $3 = 'heap_row' then
    res := 'create table '||$1||'('||res||') with (appendonly=false)';
  else
    raise notice 'use ao_col, ao_row, heap_row as $3';
    return;
  end if;
  execute res;
end;
$$ language plpgsql;
```
2. Create the tables:
```sql
postgres=# select f('tbl_ao_col', 400, 'ao_col');
postgres=# select f('tbl_ao_row', 400, 'ao_row');
postgres=# select f('tbl_heap_row', 400, 'heap_row');
```
3. Create a function that fills in data, where the first column and the last three columns carry the test data and every other column is filled with 1.
```sql
create or replace function f_ins1(name, int, int8) returns void as $$
declare
  res text := '';
begin
  for i in 1..($2-4) loop
    res := res||'1,';
  end loop;
  res := 'id,'||res;
  res := rtrim(res, ',');
  -- note the leading comma before "id": without it the last filler column
  -- and the first test column would be concatenated into "1id"
  res := 'insert into '||$1||' select '||res||',id,random()*10000,random()*100000 from generate_series(1,'||$3||') t(id)';
  execute res;
end;
$$ language plpgsql;
```
4. Fill in the data:
```sql
postgres=# select f_ins1('tbl_ao_col',400,1000000);
```
5. Create another function that fills in data, where the first four columns carry the test data and every other column is filled with 1.
```sql
create or replace function f_ins2(name, int, int8) returns void as $$
declare
  res text := '';
begin
  for i in 1..($2-4) loop
    res := res||'1,';
  end loop;
  res := 'id,id,random()*10000,random()*100000,'||res;
  res := rtrim(res, ',');
  res := 'insert into '||$1||' select '||res||' from generate_series(1,'||$3||') t(id)';
  execute res;
end;
$$ language plpgsql;
```
6. Fill in the data:
```sql
postgres=# select f_ins1('tbl_ao_col',400,1000000);
 f_ins1
--------

(1 row)

postgres=# insert into tbl_ao_row select * from tbl_ao_col;
INSERT 0 1000000
postgres=# insert into tbl_heap_row select * from tbl_ao_col;
INSERT 0 1000000
```
7. Analyze the tables:
```sql
postgres=# analyze tbl_ao_col ;
ANALYZE
postgres=# analyze tbl_ao_row;
ANALYZE
postgres=# analyze tbl_heap_row;
ANALYZE
```
8. Table sizes:
```sql
postgres=# select pg_size_pretty(pg_relation_size('tbl_ao_col'));
 pg_size_pretty
----------------
 3060 MB
(1 row)

postgres=# select pg_size_pretty(pg_relation_size('tbl_ao_row'));
 pg_size_pretty
----------------
 3117 MB
(1 row)

postgres=# select pg_size_pretty(pg_relation_size('tbl_heap_row'));
 pg_size_pretty
----------------
 3473 MB
(1 row)
```
9. Row-store heap table, aggregating over leading columns:
```sql
postgres=# explain analyze select c2,count(*),sum(c3),avg(c3),min(c3),max(c3) from tbl_heap_row group by c2;
                                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 48:1  (slice2; segments: 48)  (cost=136132.40..136132.42 rows=1 width=96)
   Rows out:  1 rows at destination with 135 ms to end, start offset by 1.922 ms.
   ->  HashAggregate  (cost=136132.40..136132.42 rows=1 width=96)
         Group By: tbl_heap_row.c2
         Rows out:  1 rows (seg42) with 0.002 ms to first row, 36 ms to end, start offset by 48 ms.
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=136132.35..136132.37 rows=1 width=96)
               Hash Key: tbl_heap_row.c2
               Rows out:  48 rows at destination (seg42) with 53 ms to end, start offset by 48 ms.
               ->  HashAggregate  (cost=136132.35..136132.35 rows=1 width=96)
                     Group By: tbl_heap_row.c2
                     Rows out:  Avg 1.0 rows x 48 workers.  Max 1 rows (seg0) with 0.008 ms to first row, 1.993 ms to end, start offset by 48 ms.
                     ->  Seq Scan on tbl_heap_row  (cost=0.00..121134.54 rows=20831 width=16)
                           Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 40 ms to first row, 73 ms to end, start offset by 50 ms.
 Slice statistics:
   (slice0)    Executor memory: 345K bytes.
   (slice1)    Executor memory: 751K bytes avg x 48 workers, 751K bytes max (seg0).
   (slice2)    Executor memory: 359K bytes avg x 48 workers, 374K bytes max (seg42).
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
 Total runtime: 138.524 ms
(22 rows)
```
10. Row-store heap table, aggregating over trailing columns:
```sql
postgres=# explain analyze select c398,count(*),sum(c399),avg(c399),min(c399),max(c399) from tbl_heap_row group by c398;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 48:1  (slice2; segments: 48)  (cost=136576.82..136799.05 rows=9877 width=96)
   Rows out:  10001 rows at destination with 212 ms to end, start offset by 1.917 ms.
   ->  HashAggregate  (cost=136576.82..136799.05 rows=206 width=96)
         Group By: tbl_heap_row.c398
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.001 ms to first row, 70 ms to end, start offset by 14 ms.
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=136132.35..136329.89 rows=206 width=96)
               Hash Key: tbl_heap_row.c398
               Rows out:  Avg 8762.2 rows x 48 workers at destination.  Max 9422 rows (seg46) with 93 ms to end, start offset by 48 ms.
               ->  HashAggregate  (cost=136132.35..136132.35 rows=206 width=96)
                     Group By: tbl_heap_row.c398
                     Rows out:  Avg 8762.2 rows x 48 workers.  Max 8835 rows (seg2) with 0.003 ms to first row, 12 ms to end, start offset by 49 ms.
                     ->  Seq Scan on tbl_heap_row  (cost=0.00..121134.54 rows=20831 width=16)
                           Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 40 ms to first row, 133 ms to end, start offset by 51 ms.
 Slice statistics:
   (slice0)    Executor memory: 377K bytes.
   (slice1)    Executor memory: 1156K bytes avg x 48 workers, 1156K bytes max (seg0).
   (slice2)    Executor memory: 414K bytes avg x 48 workers, 414K bytes max (seg1).
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
 Total runtime: 214.024 ms
(22 rows)
```
11. Row-store AO table, aggregating over leading columns:
```sql
postgres=# explain analyze select c2,count(*),sum(c3),avg(c3),min(c3),max(c3) from tbl_ao_row group by c2;
                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 48:1  (slice2; segments: 48)  (cost=124755.04..124755.07 rows=1 width=96)
   Rows out:  1 rows at destination with 149 ms to end, start offset by 1.890 ms.
   ->  HashAggregate  (cost=124755.04..124755.07 rows=1 width=96)
         Group By: tbl_ao_row.c2
         Rows out:  1 rows (seg42) with 0.004 ms to first row, 55 ms to end, start offset by 64 ms.
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=124755.00..124755.02 rows=1 width=96)
               Hash Key: tbl_ao_row.c2
               Rows out:  48 rows at destination (seg42) with 32 ms to end, start offset by 64 ms.
               ->  HashAggregate  (cost=124755.00..124755.00 rows=1 width=96)
                     Group By: tbl_ao_row.c2
                     Rows out:  Avg 1.0 rows x 48 workers.  Max 1 rows (seg0) with 0.001 ms to first row, 46 ms to end, start offset by 59 ms.
                     ->  Append-only Scan on tbl_ao_row  (cost=0.00..109755.00 rows=20834 width=16)
                           Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 24 ms to end, start offset by 59 ms.
 Slice statistics:
   (slice0)    Executor memory: 345K bytes.
   (slice1)    Executor memory: 770K bytes avg x 48 workers, 770K bytes max (seg0).
   (slice2)    Executor memory: 359K bytes avg x 48 workers, 374K bytes max (seg42).
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
 Total runtime: 152.386 ms
(22 rows)
```
12. Row-store AO table, aggregating over trailing columns:
```sql
postgres=# explain analyze select c398,count(*),sum(c399),avg(c399),min(c399),max(c399) from tbl_ao_row group by c398;
                                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 48:1  (slice2; segments: 48)  (cost=125186.01..125401.52 rows=9578 width=96)
   Rows out:  10001 rows at destination with 183 ms to end, start offset by 1.846 ms.
   ->  HashAggregate  (cost=125186.01..125401.52 rows=200 width=96)
         Group By: tbl_ao_row.c398
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.003 ms to first row, 97 ms to end, start offset by 22 ms.
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=124755.00..124946.56 rows=200 width=96)
               Hash Key: tbl_ao_row.c398
               Rows out:  Avg 8762.2 rows x 48 workers at destination.  Max 9422 rows (seg46) with 32 ms to end, start offset by 68 ms.
               ->  HashAggregate  (cost=124755.00..124755.00 rows=200 width=96)
                     Group By: tbl_ao_row.c398
                     Rows out:  Avg 8762.2 rows x 48 workers.  Max 8835 rows (seg2) with 0.013 ms to first row, 48 ms to end, start offset by 22 ms.
                     ->  Append-only Scan on tbl_ao_row  (cost=0.00..109755.00 rows=20834 width=16)
                           Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 22 ms to end, start offset by 71 ms.
 Slice statistics:
   (slice0)    Executor memory: 377K bytes.
   (slice1)    Executor memory: 1144K bytes avg x 48 workers, 1144K bytes max (seg0).
   (slice2)    Executor memory: 414K bytes avg x 48 workers, 414K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
 Total runtime: 184.723 ms
(22 rows)
```
13. Column-store AO table, aggregating over leading columns:
```sql
postgres=# explain analyze select c2,count(*),sum(c3),avg(c3),min(c3),max(c3) from tbl_ao_col group by c2;
                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 48:1  (slice2; segments: 48)  (cost=122928.04..122928.07 rows=1 width=96)
   Rows out:  1 rows at destination with 104 ms to end, start offset by 1.878 ms.
   ->  HashAggregate  (cost=122928.04..122928.07 rows=1 width=96)
         Group By: tbl_ao_col.c2
         Rows out:  1 rows (seg42) with 0.003 ms to first row, 18 ms to end, start offset by 55 ms.
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=122928.00..122928.02 rows=1 width=96)
               Hash Key: tbl_ao_col.c2
               Rows out:  48 rows at destination (seg42) with 30 ms to end, start offset by 55 ms.
               ->  HashAggregate  (cost=122928.00..122928.00 rows=1 width=96)
                     Group By: tbl_ao_col.c2
                     Rows out:  Avg 1.0 rows x 48 workers.  Max 1 rows (seg0) with 0.007 ms to first row, 3.991 ms to end, start offset by 54 ms.
                     ->  Append-only Columnar Scan on tbl_ao_col  (cost=0.00..107928.00 rows=20834 width=16)
                           Rows out:  0 rows (seg0) with 40 ms to end, start offset by 56 ms.
 Slice statistics:
   (slice0)    Executor memory: 345K bytes.
   (slice1)    Executor memory: 903K bytes avg x 48 workers, 903K bytes max (seg0).
   (slice2)    Executor memory: 359K bytes avg x 48 workers, 374K bytes max (seg42).
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
 Total runtime: 106.859 ms
(22 rows)
```
14. Column-store AO table, aggregating over trailing columns:
```sql
postgres=# explain analyze select c398,count(*),sum(c399),avg(c399),min(c399),max(c399) from tbl_ao_col group by c398;
                                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 48:1  (slice2; segments: 48)  (cost=123364.18..123582.28 rows=9693 width=96)
   Rows out:  10001 rows at destination with 120 ms to end, start offset by 1.921 ms.
   ->  HashAggregate  (cost=123364.18..123582.28 rows=202 width=96)
         Group By: tbl_ao_col.c398
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.001 ms to first row, 54 ms to end, start offset by 35 ms.
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=122928.00..123121.86 rows=202 width=96)
               Hash Key: tbl_ao_col.c398
               Rows out:  Avg 8762.2 rows x 48 workers at destination.  Max 9422 rows (seg46) with 31 ms to end, start offset by 63 ms.
               ->  HashAggregate  (cost=122928.00..122928.00 rows=202 width=96)
                     Group By: tbl_ao_col.c398
                     Rows out:  Avg 8762.2 rows x 48 workers.  Max 8835 rows (seg2) with 0.004 ms to first row, 8.004 ms to end, start offset by 82 ms.
                     ->  Append-only Columnar Scan on tbl_ao_col  (cost=0.00..107928.00 rows=20834 width=16)
                           Rows out:  0 rows (seg0) with 28 ms to end, start offset by 64 ms.
 Slice statistics:
   (slice0)    Executor memory: 377K bytes.
   (slice1)    Executor memory: 1272K bytes avg x 48 workers, 1272K bytes max (seg0).
   (slice2)    Executor memory: 414K bytes avg x 48 workers, 414K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
 Total runtime: 122.173 ms
(22 rows)
```
For grouped aggregates on non-distribution-key columns, Greenplum uses multi-stage aggregation:

- Stage 1: aggregate locally on each SEGMENT. (This scans all the data, and this is where the storage formats diverge: with row store, the deform overhead clearly hurts aggregates over trailing columns.) Based on the proportion of distinct values in the grouping column, Greenplum decides whether to redistribute the raw data directly or to pre-aggregate locally first, shrinking the amount of data to redistribute.
- Stage 2: redistribute the partial results by the grouping column (by this point the result set is small).
- Stage 3: aggregate again locally on each SEGMENT, over the redistributed data.
- Stage 4: return the results to the master; if necessary, the master applies the aggregate's final func (over what is by now a tiny number of rows and little computation).
pg_class.relstorage indicates the storage type of an object:
```sql
postgres=# select distinct relstorage from pg_class ;
 relstorage
------------
 a    -- row-store AO table
 h    -- heap table, index
 x    -- external table
 v    -- view
 c    -- column-store AO table
(5 rows)
```
List the AO tables in the current database:
```sql
postgres=# select t2.nspname, t1.relname from pg_class t1, pg_namespace t2 where t1.relnamespace=t2.oid and relstorage in ('c', 'a');
 nspname  |      relname
----------+-------------------
 postgres | tbl_tag
 postgres | tbl_pos_1_prt_p1
 postgres | tbl_pos_1_prt_p2
 postgres | tbl_pos_1_prt_p3
 postgres | tbl_pos_1_prt_p4
 postgres | tbl_pos_1_prt_p5
 postgres | tbl_pos_1_prt_p6
 postgres | tbl_pos_1_prt_p7
 postgres | tbl_pos_1_prt_p8
 postgres | tbl_pos_1_prt_p9
 postgres | tbl_pos_1_prt_p10
 postgres | tbl_pos
 postgres | xx_czrk_qm_col
 postgres | ao1
(14 rows)
```
List the heap tables in the current database:
```sql
select t2.nspname, t1.relname from pg_class t1, pg_namespace t2 where t1.relnamespace=t2.oid and relstorage in ('h') and relkind='r';
```
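Building on the catalog queries above, a quick way to see how many objects use each storage type (a sketch; the counts depend entirely on your database):

```sql
select relstorage, count(*)
from pg_class
group by relstorage
order by 2 desc;
```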