Introduction to Deepgreen DB (repost)

 
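Deepgreen DB, in full Vitesse Deepgreen DB, is a scalable massively parallel processing (MPP) data warehouse solution derived from the open-source data warehouse project Greenplum DB (usually abbreviated GP or GPDB). Anyone already familiar with GP can therefore switch to Deepgreen with almost no friction.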
 
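It has nearly all of GP's features. While keeping GP's strengths, Deepgreen optimizes the original query processing engine. The new-generation engine adds:
  • Better join and aggregation algorithms
  • A new spill-handling subsystem
  • JIT-based query optimization, vectorized scans, and data-path optimization
The sections below briefly introduce Deepgreen's main features, mostly in comparison with Greenplum: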
 
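1. 100% GPDB
Deepgreen is almost 100% compatible with Greenplum. "Almost" because Deepgreen drops a few Greenplum features of marginal value, such as MapReduce support, so what remains is the essential core. From SQL syntax and stored procedure syntax, to the data storage format, to components such as gpstart and gpfdist, Deepgreen keeps the impact of migrating from Greenplum to a minimum. In particular:
  • Existing data does not need to be reloaded, except data compressed with quicklz, which must be converted
  • DML and DDL statements are unchanged
  • UDF (user-defined function) syntax is unchanged
  • Stored procedure syntax is unchanged
  • Connection and authentication protocols such as JDBC/ODBC are unchanged
  • Operational scripts (for example, backup scripts) run unchanged
So where does Deepgreen differ from Greenplum? In one word: speed. Most OLAP workloads are CPU-bound, so the CPU-optimized Deepgreen can run 3 to 5 times faster than the original Greenplum in performance tests.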
 
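2. Faster Decimal types
Deepgreen provides two additional Decimal types, Decimal64 and Decimal128, which are more efficient than Greenplum's existing decimal type (Numeric). Because they are more precise than float/double, they are a better fit for banking and other workloads that demand accurate figures.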
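To see why an exact decimal type matters for money-like values, here is a minimal Python sketch. It uses Python's standard decimal module as a stand-in and does not reproduce how Decimal64/Decimal128 are implemented in Deepgreen. The helper name sum_series is made up for illustration.

```python
from decimal import Decimal

# Decimal(0.123) exposes the exact value of the binary double nearest to 0.123,
# which is not 0.123 itself. Decimal("0.123") is exactly 0.123.
stored_float = Decimal(0.123)
exact_decimal = Decimal("0.123")

def sum_series(n):
    """Sum (i + 0.123) for i = 1..n using exact decimal arithmetic."""
    step = Decimal("0.123")
    total = Decimal(0)
    for i in range(1, n + 1):
        total += i + step
    return total

n = 1000
# Closed form: n(n+1)/2 + 0.123 * n
expected = Decimal(n * (n + 1) // 2) + Decimal("0.123") * n

print(stored_float == exact_decimal)  # False
print(sum_series(n) == expected)      # True
```

The decimal sum matches the closed form digit for digit, while the double already carries a representation error before any arithmetic happens.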
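Installation:
After the database has been initialized, load the two data types into each database that needs them with the following commands: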
dgadmin@flash:~$ source deepgreendb/greenplum_path.sh
dgadmin@flash:~$ cd $GPHOME/share/postgresql/contrib/
dgadmin@flash:~/deepgreendb/share/postgresql/contrib$ psql postgres -f pg_decimal.sql
A quick test:
Query: select avg(x), sum(2*x) from table
Data volume: 1 million rows
dgadmin@flash:~$ psql -d postgres
psql (8.2.15)
Type "help" for help.

postgres=# drop table if exists tt;
NOTICE:  table "tt" does not exist, skipping
DROP TABLE
postgres=# create table tt(
postgres(# ii bigint,
postgres(#  f64 double precision,
postgres(# d64 decimal64,
postgres(# d128 decimal128,
postgres(# n numeric(15, 3))
postgres-# distributed randomly;
CREATE TABLE
postgres=# insert into tt
postgres-# select i,
postgres-#     i + 0.123,
postgres-#     (i + 0.123)::decimal64,
postgres-#     (i + 0.123)::decimal128,
postgres-#     i + 0.123
postgres-# from generate_series(1, 1000000) i;
INSERT 0 1000000
postgres=# \timing on
Timing is on.
postgres=# select count(*) from tt;
  count
---------
 1000000
(1 row)

Time: 161.500 ms
postgres=# set vitesse.enable=1;
SET
Time: 1.695 ms
postgres=# select avg(f64),sum(2*f64) from tt;
       avg        |       sum
------------------+------------------
 500000.622996815 | 1000001245993.63
(1 row)

Time: 45.368 ms
postgres=# select avg(d64),sum(2*d64) from tt;
    avg     |        sum
------------+-------------------
 500000.623 | 1000001246000.000
(1 row)

Time: 135.693 ms
postgres=# select avg(d128),sum(2*d128) from tt;
    avg     |        sum
------------+-------------------
 500000.623 | 1000001246000.000
(1 row)

Time: 148.286 ms
postgres=# set vitesse.enable=1;
SET
Time: 11.691 ms
postgres=# select avg(n),sum(2*n) from tt;
         avg         |        sum
---------------------+-------------------
 500000.623000000000 | 1000001246000.000
(1 row)

Time: 154.189 ms
postgres=# set vitesse.enable=0;
SET
Time: 1.426 ms
postgres=# select avg(n),sum(2*n) from tt;
         avg         |        sum
---------------------+-------------------
 500000.623000000000 | 1000001246000.000
(1 row)

Time: 296.291 ms
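Results:
45ms - 64-bit float
136ms - decimal64
148ms - decimal128
154ms - deepgreen numeric
296ms - greenplum numeric
In the test above, decimal64 (136ms) is faster than deepgreen numeric (154ms) and about twice as fast as greenplum numeric (296ms). The "greenplum numeric" figure is the same query run with vitesse.enable=0. According to the author, the gap exceeds 5x in production environments.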
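As a quick check of the ratios quoted above, the following Python sketch recomputes the speedups from the timings in the psql session. The numbers are copied from the transcript. The helper name speedup is only for illustration.

```python
# Timings in milliseconds, copied from the psql session above.
timings_ms = {
    "float64": 45.368,
    "decimal64": 135.693,
    "decimal128": 148.286,
    "deepgreen numeric": 154.189,
    "greenplum numeric": 296.291,  # numeric with vitesse.enable=0
}

baseline = timings_ms["greenplum numeric"]

def speedup(name):
    """How many times faster a type is than the greenplum numeric baseline."""
    return baseline / timings_ms[name]

for name, ms in timings_ms.items():
    print(f"{name:18s} {ms:8.3f} ms  {speedup(name):.2f}x")
```

On these figures decimal64 comes out at roughly 2.2x and deepgreen numeric at roughly 1.9x the baseline speed.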
 
3. JSON support
Deepgreen supports the JSON type, but not completely. The unsupported functions are: json_each, json_each_text, json_extract_path, json_extract_path_text, json_object_keys, json_populate_record, json_populate_recordset, json_array_elements, and json_agg.
Installation:
Run the following command to add the JSON extension:
dgadmin@flash:~$ psql postgres -f $GPHOME/share/postgresql/contrib/json.sql
A quick test:
dgadmin@flash:~$ psql postgres
psql (8.2.15)
Type "help" for help.

postgres=# select '[1,2,3]'::json->2;
 ?column?
----------
 3
(1 row)

postgres=# create temp table mytab(i int, j json) distributed by (i);
CREATE TABLE
postgres=# insert into mytab values (1, null), (2, '[2,3,4]'), (3, '[3000,4000,5000]');
INSERT 0 3
postgres=#
postgres=# insert into mytab values (1, null), (2, '[2,3,4]'), (3, '[3000,4000,5000]');
INSERT 0 3
postgres=# select i, j->2 from mytab;
 i | ?column?
---+----------
 2 | 4
 2 | 4
 1 |
 3 | 5000
 1 |
 3 | 5000
(6 rows)
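In the session above, the -> operator with an integer picks an array element by zero-based index, which is why '[1,2,3]'::json->2 returns 3, and rows whose JSON value is NULL yield NULL. A rough Python analogue using the standard json module follows. The function json_element is a hypothetical helper, not part of Deepgreen, and returning None for an out-of-range index is an assumption of this sketch.

```python
import json

def json_element(doc, index):
    """Mimic json -> int: zero-based array lookup, with None standing in for SQL NULL."""
    if doc is None:
        return None
    value = json.loads(doc)
    # Assumption: anything that is not an in-range array lookup maps to None.
    if not isinstance(value, list) or not 0 <= index < len(value):
        return None
    return value[index]

rows = [(1, None), (2, "[2,3,4]"), (3, "[3000,4000,5000]")]
for i, j in rows:
    print(i, json_element(j, 2))
```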
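4. Efficient compression algorithms
Deepgreen retains Greenplum's zlib compression for storage. In addition, it offers two compression formats better suited to database workloads: zstd and lz4.
If you need a better compression ratio for column-oriented or append-only heap tables, choose zstd. Compared with zlib, zstd gives a better compression ratio and uses the CPU more efficiently.
If your workload is read-heavy, choose lz4, which decompresses remarkably fast. Its compression ratio is not as good as that of zlib or zstd, but the trade is worthwhile for high read loads.
For details on the two algorithms, see their home pages:
  • zstd home page http://facebook.github.io/zstd/
  • lz4 home page http://lz4.github.io/lz4/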
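Measuring a compression ratio can be tried on a small scale outside the database. The sketch below uses only zlib from the Python standard library, at two levels, because zstd and lz4 bindings are not assumed to be installed. It shows how a ratio is measured, not how Deepgreen stores data. The payload imitates the 'user N' strings used in the test that follows, and the helper name ratio is made up for illustration.

```python
import zlib

# Sample payload shaped like the test data: "user 1", "user 2", ...
raw = "\n".join(f"user {i}" for i in range(1, 100_001)).encode("ascii")

def ratio(level):
    """Return uncompressed size divided by compressed size at a zlib level."""
    compressed = zlib.compress(raw, level)
    # Round-trip check: decompression must restore the original bytes.
    assert zlib.decompress(compressed) == raw
    return len(raw) / len(compressed)

for level in (1, 9):
    print(f"zlib level {level}: ratio {ratio(level):.2f}")
```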
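A quick test:
This is a simple test of four settings only: no compression, zlib, zstd, and lz4. My machine is not powerful, so treat all results as a rough reference: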
postgres=# create temp table ttnone (
postgres(#     i int,
postgres(#     t text,
postgres(#     default column encoding (compresstype=none))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
postgres=# \timing on
Timing is on.
postgres=# create temp table ttzlib(
postgres(#     i int,
postgres(#     t text,
postgres(#     default column encoding (compresstype=zlib, compresslevel=1))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
Time: 762.596 ms
postgres=# create temp table ttzstd (
postgres(#     i int,
postgres(#     t text,
postgres(#     default column encoding (compresstype=zstd, compresslevel=1))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
Time: 827.033 ms
postgres=# create temp table ttlz4 (
postgres(#     i int,
postgres(#     t text,
postgres(#     default column encoding (compresstype=lz4))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
Time: 845.728 ms
postgres=# insert into ttnone select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 104641.369 ms
postgres=# insert into ttzlib select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 99557.505 ms
postgres=# insert into ttzstd select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 98800.567 ms
postgres=# insert into ttlz4 select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 96886.107 ms
postgres=# select pg_size_pretty(pg_relation_size('ttnone'));
 pg_size_pretty
----------------
 1708 MB
(1 row)

Time: 83.411 ms
postgres=# select pg_size_pretty(pg_relation_size('ttzlib'));
 pg_size_pretty
----------------
 374 MB
(1 row)

Time: 4.641 ms
postgres=# select pg_size_pretty(pg_relation_size('ttzstd'));
 pg_size_pretty
----------------
 325 MB
(1 row)

Time: 5.015 ms
postgres=# select pg_size_pretty(pg_relation_size('ttlz4'));
 pg_size_pretty
----------------
 785 MB
(1 row)

Time: 4.483 ms
postgres=# select sum(length(t)) from ttnone;
    sum
------------
 1288888898
(1 row)

Time: 4414.965 ms
postgres=# select sum(length(t)) from ttzlib;
    sum
------------
 1288888898
(1 row)

Time: 4500.671 ms
postgres=# select sum(length(t)) from ttzstd;
    sum
------------
 1288888898
(1 row)

Time: 3849.648 ms
postgres=# select sum(length(t)) from ttlz4;
    sum
------------
 1288888898
(1 row)

Time: 3160.477 ms
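The session above does not summarize its own numbers, so here is a small Python sketch that derives compression ratios from the measured table sizes and lists them next to the scan times. All figures are copied from the transcript. The helper name compression_ratio is only for illustration.

```python
# Table sizes (MB) and sum(length(t)) scan times (ms) from the session above.
sizes_mb = {"none": 1708, "zlib": 374, "zstd": 325, "lz4": 785}
scan_ms = {"none": 4414.965, "zlib": 4500.671, "zstd": 3849.648, "lz4": 3160.477}

def compression_ratio(name):
    """Uncompressed table size divided by the size under the given codec."""
    return sizes_mb["none"] / sizes_mb[name]

for name in sizes_mb:
    print(f"{name:5s} {sizes_mb[name]:5d} MB  "
          f"ratio {compression_ratio(name):.2f}  scan {scan_ms[name]:.0f} ms")
```

On this single run, zstd gives the smallest table and lz4 the fastest scan, which matches the guidance given earlier in this section.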
5. Data sampling
Starting with Deepgreen 16.16, true data sampling is built into SQL. You can sample either by a number of rows or by a percentage:
  • SELECT {select-clauses} LIMIT SAMPLE {n} ROWS;
  • SELECT {select-clauses} LIMIT SAMPLE {n} PERCENT;
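The difference between the two forms can be illustrated with a Python sketch. How Deepgreen implements sampling internally is not documented here, so this is only an assumption: the percent form is modeled as independent per-row (Bernoulli) sampling, whose result size varies from run to run, and the rows form as reservoir sampling, which returns an exact count. The function names sample_percent and sample_rows are hypothetical.

```python
import random

def sample_percent(rows, percent, rng):
    """Keep each row independently with probability percent/100 (Bernoulli sampling)."""
    p = percent / 100.0
    return [row for row in rows if rng.random() < p]

def sample_rows(rows, n, rng):
    """Return exactly min(n, number of rows) rows via reservoir sampling (Algorithm R)."""
    reservoir = []
    for seen, row in enumerate(rows, start=1):
        if len(reservoir) < n:
            reservoir.append(row)      # fill the reservoir first
        else:
            j = rng.randrange(seen)    # uniform in [0, seen)
            if j < n:
                reservoir[j] = row     # replace with probability n/seen
    return reservoir

rng = random.Random(42)
table = range(1, 1_000_001)
print(len(sample_percent(table, 0.001, rng)))  # varies around 10
print(len(sample_rows(table, 10, rng)))        # always 10
```

Under this model a percent sample does not return a fixed number of rows, while a rows sample always returns the requested count.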
A quick test:
postgres=# select count(*) from ttlz4;
   count
-----------
 100000000
(1 row)

Time: 903.661 ms
postgres=# select * from ttlz4 limit sample 0.00001 percent;
    i     |       t
----------+---------------
  3442917 | user 3442917
  9182620 | user 9182620
  9665879 | user 9665879
 13791056 | user 13791056
 15669131 | user 15669131
 16234351 | user 16234351
 19592531 | user 19592531
 39097955 | user 39097955
 48822058 | user 48822058
 83021724 | user 83021724
  1342299 | user 1342299
 20309120 | user 20309120
 34448511 | user 34448511
 38060122 | user 38060122
 69084858 | user 69084858
 73307236 | user 73307236
 95421406 | user 95421406
(17 rows)

Time: 4208.847 ms
postgres=# select * from ttlz4 limit sample 10 rows;
    i     |       t
----------+---------------
 78259144 | user 78259144
 85551752 | user 85551752
 90848887 | user 90848887
 53923527 | user 53923527
 46524603 | user 46524603
 31635115 | user 31635115
 19030885 | user 19030885
 97877732 | user 97877732
 33238448 | user 33238448
 20916240 | user 20916240
(10 rows)

Time: 3578.031 ms

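6. TPC-H performance
For a performance comparison of Deepgreen and Greenplum, please see my other two posts:
 
Xdrive, the high-performance component that ships with Deepgreen, will be covered in a later post.
 
End~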