A Simple GreenPlum Performance Test and Analysis (Continued)

Copyright notice: this is an original article by 黃輝 (Huang Hui). Please credit the source when reposting.
Original article link: https://www.qcloud.com/community/article/259

Source: 騰雲閣 https://www.qcloud.com/community

 

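An earlier TPC-H-style comparison of GreenPlum and MySQL found that, with equivalent resources, GreenPlum performed far better than MySQL. Part of the reason is that GreenPlum itself uses more efficient algorithms; for example, it uses hash join for multi-table joins. How do two systems compare when both use equally efficient algorithms? GreenPlum evolved from PostgreSQL and fully adopts PostgreSQL's optimization algorithms, so this time we compare GreenPlum with PostgreSQL and look at how GreenPlum (distributed PostgreSQL) and standalone PostgreSQL perform with equivalent resources.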

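I. Purpose

  1. Compare the performance of GreenPlum, with its distributed architecture, against PostgreSQL on TPC-H-style tests under equivalent resources.
  2. Analyze and summarize the causes of the performance differences between the two databases.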

II. Test environment and configuration

Test environment: Tencent Cloud
Test subjects: GreenPlum and PostgreSQL, configured as follows:

Table 1 GreenPlum cluster servers

Metric            Master Host                                Segment Host                               Segment Host
OS                CentOS 6.7 64-bit                          CentOS 6.7 64-bit                          CentOS 6.7 64-bit
CPU               Intel(R) Xeon(R) CPU E5-26xx v3, 2 cores   Intel(R) Xeon(R) CPU E5-26xx v3, 2 cores   Intel(R) Xeon(R) CPU E5-26xx v3, 2 cores
Memory            8GB                                        8GB                                        8GB
Public bandwidth  100Mbps                                    100Mbps                                    100Mbps
IP                123.207.228.40                             123.207.228.21                             123.207.85.105
Segments          0                                          2                                          2
Version           greenplum-db-4.3.8.1-build-1-RHEL5-x86_64  greenplum-db-4.3.8.1-build-1-RHEL5-x86_64  greenplum-db-4.3.8.1-build-1-RHEL5-x86_64

Table 2 PostgreSQL server

Metric            Value
OS                CentOS 6.7 64-bit
CPU               Intel(R) Xeon(R) CPU E5-26xx v3, 8 cores
Memory            24GB
Public bandwidth  100Mbps
IP                119.29.229.209
Version           PostgreSQL 9.5.4

III. Test results and analysis

1. Total test data volume of 1GB
The results are as follows:

Table 3 Row counts of each test table at 1GB total

Table name  Row count
customer 150000
lineitem 6001215
nation 25
orders 1500000
part 200000
partsupp 800000
region 5
supplier 10000

Table 4 Execution times of the 22 SQL queries at 1GB total

Query  GreenPlum time (seconds)  PostgreSQL time (seconds)
Q1 4.01 12.93
Q2 0.50 0.62
Q3 1.35 1.29
Q4 0.11 0.52
Q5 0.19 0.72
Q6 0.01 0.79
Q7 6.06 1.84
Q8 1.46 0.59
Q9 4.00 7.04
Q10 0.14 2.19
Q11 0.30 0.18
Q12 0.08 2.15
Q13 1.04 4.05
Q14 0.04 0.42
Q15 0.07 1.66
Q16 0.51 0.80
Q17 3.21 23.07
Q18 14.23 5.86
Q19 0.95 0.17
Q20 0.16 3.10
Q21 7.23 2.22
Q22 0.96 0.28

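Analysis: Table 4 shows that PostgreSQL ran 8 of the 22 queries faster than GreenPlum (Q3, Q7, Q8, Q11, Q18, Q19, Q21 and Q22), which is more than a third of them. We therefore scale the test data up 10x for the next round.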
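The count can be reproduced directly from the timings in Table 4. The following is a minimal Python sketch; the names `times_1gb` and `faster_on_postgres` are illustrative and are not part of the original test scripts.

```python
# (GreenPlum seconds, PostgreSQL seconds) per query, copied from Table 4.
times_1gb = {
    "Q1": (4.01, 12.93), "Q2": (0.50, 0.62), "Q3": (1.35, 1.29),
    "Q4": (0.11, 0.52), "Q5": (0.19, 0.72), "Q6": (0.01, 0.79),
    "Q7": (6.06, 1.84), "Q8": (1.46, 0.59), "Q9": (4.00, 7.04),
    "Q10": (0.14, 2.19), "Q11": (0.30, 0.18), "Q12": (0.08, 2.15),
    "Q13": (1.04, 4.05), "Q14": (0.04, 0.42), "Q15": (0.07, 1.66),
    "Q16": (0.51, 0.80), "Q17": (3.21, 23.07), "Q18": (14.23, 5.86),
    "Q19": (0.95, 0.17), "Q20": (0.16, 3.10), "Q21": (7.23, 2.22),
    "Q22": (0.96, 0.28),
}


def faster_on_postgres(times):
    """Return the queries whose PostgreSQL time is below the GreenPlum time."""
    return [q for q, (gp, pg) in times.items() if pg < gp]


pg_wins = faster_on_postgres(times_1gb)
print(len(pg_wins), "of", len(times_1gb), "queries:", pg_wins)
```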

2. Total test data volume of 10GB
The results are as follows:

Table 5 Row counts of each test table at 10GB total

Table name  Row count
customer 1500000
lineitem 59986052
nation 25
orders 15000000
part 2000000
partsupp 8000000
region 5
supplier 100000

Table 6 Execution times of the 22 SQL queries at 10GB total

Query  GreenPlum time (seconds)  PostgreSQL time (seconds)
Q1 36.98 130.61
Q2 3.10 17.08
Q3 14.39 117.83
Q4 0.11 6.81
Q5 0.20 114.46
Q6 0.01 11.08
Q7 80.12 42.96
Q8 6.61 45.13
Q9 49.72 118.36
Q10 0.16 40.51
Q11 2.28 3.06
Q12 0.08 21.47
Q13 19.29 68.83
Q14 0.05 36.28
Q15 0.09 23.16
Q16 6.30 12.77
Q17 134.22 127.79
Q18 168.03 199.48
Q19 6.25 1.96
Q20 0.54 52.10
Q21 84.68 190.59
Q22 17.93 2.98

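Analysis: With the data scaled up to 10GB, PostgreSQL's execution times increase sharply and its performance degrades considerably, yet four queries (Q7, Q17, Q19 and Q22) still run faster than on GreenPlum. We pick one of them to examine why the two systems differ.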
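The queries on which PostgreSQL is still faster can be listed from Table 6 in the same way. This is a small Python sketch with illustrative names (`times_10gb`, `q7_ratio`); it only restates the table arithmetic.

```python
# (GreenPlum seconds, PostgreSQL seconds) per query, copied from Table 6.
times_10gb = {
    "Q1": (36.98, 130.61), "Q2": (3.10, 17.08), "Q3": (14.39, 117.83),
    "Q4": (0.11, 6.81), "Q5": (0.20, 114.46), "Q6": (0.01, 11.08),
    "Q7": (80.12, 42.96), "Q8": (6.61, 45.13), "Q9": (49.72, 118.36),
    "Q10": (0.16, 40.51), "Q11": (2.28, 3.06), "Q12": (0.08, 21.47),
    "Q13": (19.29, 68.83), "Q14": (0.05, 36.28), "Q15": (0.09, 23.16),
    "Q16": (6.30, 12.77), "Q17": (134.22, 127.79), "Q18": (168.03, 199.48),
    "Q19": (6.25, 1.96), "Q20": (0.54, 52.10), "Q21": (84.68, 190.59),
    "Q22": (17.93, 2.98),
}

# Queries where the PostgreSQL time is lower than the GreenPlum time.
pg_wins = [q for q, (gp, pg) in times_10gb.items() if pg < gp]

# How many times longer Q7 takes on GreenPlum than on PostgreSQL.
gp_q7, pg_q7 = times_10gb["Q7"]
q7_ratio = gp_q7 / pg_q7

print(pg_wins)
print(f"Q7 GreenPlum / PostgreSQL: {q7_ratio:.2f}")
```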
We take Q7 as the example: GreenPlum's execution time is roughly twice PostgreSQL's. Q7 is as follows:

Figure 1 The SQL statement for Q7

Running explain on Q7 in PostgreSQL gives the following result:

Figure 2 Result of explain Q7 on PostgreSQL at 10GB

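Analysing the plan shows that the most time-consuming parts of the whole process are those marked with red boxes in the figure above. The corresponding operations are:
1). A conditional query on the l_shipdate column of the lineitem table. Because the column is indexed, the efficient Bitmap index scan is used (a Bitmap index scan has two steps: 1. build the bitmap; 2. scan the table. For details see http://kb.cnblogs.com/page/515258/ ).
2). The hash join between the lineitem and orders tables.
To analyse further, we add the analyze option to obtain detailed execution times. Because the output is long, only the important parts are shown: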

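Figure 3 Partial result of explain analyze Q7 on PostgreSQL at 10GB

From this information we can read off the execution times of the two operations. However, because PostgreSQL runs the plan steps concurrently and their times overlap, we need to compute a dwell time for each step (the period during which the system is executing only that step); shortening the dwell time directly speeds up execution. The dwell time of a step is the difference between the end time of the preceding step and the end time of that step. The dwell times of the two parts are:

1). Bitmap Heap Scan: 20197 - 2233 = 17964ms
2). Hash join: 42889 - 26200 = 16689ms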

PostgreSQL's total time for Q7 is 42963ms, which confirms that the time is concentrated in these two operations.
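The dwell-time rule and the share of the total it accounts for can be checked with a few lines of Python. This is only a sketch of the arithmetic described above; the function name `dwell_ms` is an illustrative choice, and the timestamps are the ones quoted from the explain analyze output.

```python
def dwell_ms(prev_step_end_ms, step_end_ms):
    """Dwell time of a step: its end time minus the end time of the preceding step."""
    return step_end_ms - prev_step_end_ms


total_ms = 42963  # total Q7 time on PostgreSQL at 10GB

bitmap_heap_scan = dwell_ms(2233, 20197)
hash_join = dwell_ms(26200, 42889)

# Fraction of the total run time covered by the two steps together.
share = (bitmap_heap_scan + hash_join) / total_ms

print(bitmap_heap_scan, hash_join)
print(f"share of total: {share:.1%}")
```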
Next, we run explain Q7 on GreenPlum, with the following result:


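Figure 4 Result of explain Q7 on GreenPlum at 10GB

Unlike PostgreSQL, GreenPlum also spends time on data redistribution. Again, we use the analyze option to get detailed execution times:

Figure 5 Partial result of explain analyze Q7 on GreenPlum at 10GB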

From the execution plan, we pick the three most time-consuming operations and compute their dwell times on one segment (the slowest one):
1). Scan lineitem: 6216ms
2). Redistribute: 36273ms
3). Hash join: 29885ms

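GreenPlum's total time for Q7 is 80121ms. Data redistribution takes close to half of it (about 45%), and the hash join also takes a large share (about 37%), mainly because the segments run short of memory, which causes disk I/O.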
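The share of each step in the total can be computed as below. This is a minimal Python sketch over the figures quoted above; the variable names are illustrative.

```python
total_ms = 80121  # total Q7 time on GreenPlum at 10GB

# Dwell times on the slowest segment, in milliseconds.
dwell = {"Scan lineitem": 6216, "Redistribute": 36273, "Hash join": 29885}

# Fraction of the total run time spent in each step.
shares = {step: ms / total_ms for step, ms in dwell.items()}

for step, fraction in shares.items():
    print(f"{step}: {fraction:.1%}")
```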

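Summary: Comparing the Q7 execution plans of PostgreSQL and GreenPlum, GreenPlum is slower mainly because of the large amount of time spent on data redistribution and because the hash join exceeds memory and causes disk I/O. Although GreenPlum's segments scan the lineitem table in parallel and save time there, that step is a small share of the total and has little effect on it.

Given this, can the share of time spent on data redistribution be reduced? We try increasing the test data further: 10GB is still too little for a real OLAP scenario, so we scale the data up 5x and check whether the time breakdown changes.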

3. Total test data volume of 50GB
Table 7 Row counts of each test table at 50GB total

Table name  Row count
customer 7500000
lineitem 300005811
nation 25
orders 75000000
part 10000000
partsupp 40000000
region 5
supplier 500000

Table 8 Execution times of the 22 SQL queries at 50GB total

Query  GreenPlum time (seconds)  PostgreSQL time (seconds)
Q1 212.27 802.24
Q2 16.53 164.20
Q3 156.31 2142.18
Q4 0.13 2934.76
Q5 0.23 2322.92
Q6 0.01 6439.26
Q7 535.66 11906.74
Q8 76.76 9171.83
Q9 313.91 >26060.36
Q10 0.41 1905.13
Q11 7.71 17.65
Q12 0.19 >3948.07
Q13 108.05 354.59
Q14 0.05 8054.72
Q15 0.07 >2036.03
Q16 34.74 221.49
Q17 862.90 >9010.56
Q18 913.97 3174.24
Q19 129.14 8666.38
Q20 2.28 9389.21
Q21 1064.67 >26868.31
Q22 90.90 1066.44
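To see how large the gap is per query, the ratio of the two columns of Table 8 can be computed as in the Python sketch below. The names `times_50gb` and `speedup` are illustrative. For the PostgreSQL entries marked with ">" in the table, the value used here is only the reported lower bound, so the corresponding ratios are lower bounds as well.

```python
# (GreenPlum seconds, PostgreSQL seconds) per query, copied from Table 8.
# Q9, Q12, Q15, Q17 and Q21 are reported as ">" values for PostgreSQL,
# so their PostgreSQL figures are lower bounds.
times_50gb = {
    "Q1": (212.27, 802.24), "Q2": (16.53, 164.20), "Q3": (156.31, 2142.18),
    "Q4": (0.13, 2934.76), "Q5": (0.23, 2322.92), "Q6": (0.01, 6439.26),
    "Q7": (535.66, 11906.74), "Q8": (76.76, 9171.83), "Q9": (313.91, 26060.36),
    "Q10": (0.41, 1905.13), "Q11": (7.71, 17.65), "Q12": (0.19, 3948.07),
    "Q13": (108.05, 354.59), "Q14": (0.05, 8054.72), "Q15": (0.07, 2036.03),
    "Q16": (34.74, 221.49), "Q17": (862.90, 9010.56), "Q18": (913.97, 3174.24),
    "Q19": (129.14, 8666.38), "Q20": (2.28, 9389.21), "Q21": (1064.67, 26868.31),
    "Q22": (90.90, 1066.44),
}

# PostgreSQL time divided by GreenPlum time; above 1 means GreenPlum is faster.
speedup = {q: pg / gp for q, (gp, pg) in times_50gb.items()}

# The query on which GreenPlum's advantage is smallest.
smallest_gain = min(speedup, key=speedup.get)

print(all(ratio > 1 for ratio in speedup.values()))
print(smallest_gain, f"{speedup[smallest_gain]:.2f}")
```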

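Analysis: The results clearly show that GreenPlum runs all 22 queries far more efficiently than PostgreSQL. We again take Q7 as the example and look at the direct cause of the different behaviour at the two data volumes.

Analysing the execution plans shows that the differences are still concentrated in the parts discussed in step 2, so we do not repeat the full query plans and look directly at the most time-consuming parts: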


Figure 6 Partial result of explain analyze Q7 on PostgreSQL at 50GB

Figure 7 Partial result of explain analyze Q7 on GreenPlum at 50GB

PostgreSQL's main dwell times are:
1). Bitmap Heap Scan: 9290197ms
2). Hash join: 713138ms

The total execution time is 10219009ms, so the time is mainly spent in the Bitmap Heap Scan.
GreenPlum's main dwell times are:
1). Scan lineitem: 130397ms
2). Redistribute: 140685ms
3). Hash join: 211456ms

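The total execution time is 537134ms. Compared with the 10GB test in step 2, the share of time spent on data redistribution has dropped markedly (from about 45% to about 26%), and most of the time is now spent in the hash join.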
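The shift in the time breakdown between the two data volumes can be tabulated with the following Python sketch. It uses only the GreenPlum dwell times and totals quoted in steps 2 and 3; the helper names `share` and `growth` are illustrative.

```python
# GreenPlum Q7 dwell times and totals in milliseconds, from steps 2 and 3.
q7_greenplum = {
    "10GB": {"Scan lineitem": 6216, "Redistribute": 36273,
             "Hash join": 29885, "total": 80121},
    "50GB": {"Scan lineitem": 130397, "Redistribute": 140685,
             "Hash join": 211456, "total": 537134},
}
steps = ("Scan lineitem", "Redistribute", "Hash join")


def share(run, step):
    """Fraction of the run's total time spent in the given step."""
    return q7_greenplum[run][step] / q7_greenplum[run]["total"]


def growth(step):
    """How many times longer the step takes at 50GB than at 10GB."""
    return q7_greenplum["50GB"][step] / q7_greenplum["10GB"][step]


for step in steps:
    print(f"{step}: {share('10GB', step):.1%} -> {share('50GB', step):.1%}, "
          f"x{growth(step):.1f}")
```

Redistribution grows more slowly than the other two steps when the data is scaled up 5x, which is why its share of the total falls.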

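When executing the same where condition, GreenPlum and PostgreSQL scan the table differently. The reason is that the lineitem table in GreenPlum is column-oriented, so scanning the table directly is simpler and faster.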

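Comparing PostgreSQL's two test runs, the performance of the Bitmap Heap Scan degrades markedly: the first run scanned 18188314 rows in 17 seconds, while the second scanned 90522811 rows in 9190 seconds.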
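Expressed as scan throughput, the degradation looks as follows. This Python sketch uses the row counts and durations quoted in the sentence above; the variable names are illustrative.

```python
# (rows scanned, seconds) for the Bitmap Heap Scan in the two PostgreSQL runs.
scans = {"10GB": (18188314, 17), "50GB": (90522811, 9190)}

# Rows scanned per second in each run.
rows_per_sec = {run: rows / secs for run, (rows, secs) in scans.items()}

# Growth of the input versus growth of the scan time between the two runs.
row_growth = scans["50GB"][0] / scans["10GB"][0]
time_growth = scans["50GB"][1] / scans["10GB"][1]

for run, rate in rows_per_sec.items():
    print(f"{run}: {rate:,.0f} rows/s")
print(f"rows x{row_growth:.2f}, time x{time_growth:.0f}")
```

The number of rows grows by roughly 5x while the scan time grows by a factor of several hundred, so the slowdown is far from linear in the data size.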

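Summary: Increasing the data volume reduces the share of total execution time taken by data redistribution; most of the time is then spent on internal computation. Because table scans involve disk I/O, GreenPlum splits the scan across multiple segments that work at the same time, which reduces the amount each node has to scan. This amounts to parallel I/O and improves overall performance considerably.

IV. Conclusion

Testing and analysing three data volumes (1GB, 10GB and 50GB) shows that, on TPC-H-style tests, the larger the data volume, the greater GreenPlum's advantage over standalone PostgreSQL. Because GreenPlum uses a distributed architecture, it has to broadcast or redistribute data between nodes so that they can compute in parallel, and this has a cost. When the data volume is small, there is little computation, so broadcast or redistribution takes a large share of the total time and lowers overall efficiency; GreenPlum can then be slower than standalone PostgreSQL. When the data volume is large, the overall computation is heavy and broadcast or redistribution is no longer the key factor. The distributed GreenPlum then executes complex queries more efficiently, for two reasons. First, multiple nodes compute at the same time (hash join, sort and so on), which speeds up computation and makes full use of the system's CPU resources. Second, table scans are split across nodes, which reduces the amount of I/O each node performs and achieves parallel I/O. This makes GreenPlum better suited to OLAP scenarios.

V. Other notes

  1. The stock TPC-H test kit does not directly support GreenPlum or PostgreSQL, so the test scripts have to be modified. The resulting table creation statements are listed in Appendix 1 and the test queries in Appendix 2.

  2. GreenPlum data can be loaded with the gpfdist tool that ships with GreenPlum, by setting up several gpfdist file servers to load in parallel. However, the number of file servers must not exceed the number of segments, a point the official documentation does not mention.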

Appendix 1: Table creation statements

GreenPlum:
BEGIN;
        CREATE TABLE PART (
                P_PARTKEY               SERIAL8,
                P_NAME                  VARCHAR(55),
                P_MFGR                  CHAR(25),
                P_BRAND                 CHAR(10),
                P_TYPE                  VARCHAR(25),
                P_SIZE                  INTEGER,
                P_CONTAINER             CHAR(10),
                P_RETAILPRICE   DECIMAL,
                P_COMMENT               VARCHAR(23)
        ) with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (p_partkey);

        COPY part FROM '/tmp/dss-data/part.csv' WITH csv DELIMITER '|';

COMMIT;

BEGIN;

        CREATE TABLE REGION (
                R_REGIONKEY     SERIAL8,
                R_NAME          CHAR(25),
                R_COMMENT       VARCHAR(152)
        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (r_regionkey);

        COPY region FROM '/tmp/dss-data/region.csv' WITH csv DELIMITER '|';

COMMIT;

BEGIN;

        CREATE TABLE NATION (
                N_NATIONKEY             SERIAL8,
                N_NAME                  CHAR(25),
                N_REGIONKEY             BIGINT NOT NULL,  -- references R_REGIONKEY
                N_COMMENT               VARCHAR(152)
        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (n_nationkey);

        COPY nation FROM '/tmp/dss-data/nation.csv' WITH csv DELIMITER '|';

COMMIT;

BEGIN;

        CREATE TABLE SUPPLIER (
                S_SUPPKEY               SERIAL8,
                S_NAME                  CHAR(25),
                S_ADDRESS               VARCHAR(40),
                S_NATIONKEY             BIGINT NOT NULL, -- references N_NATIONKEY
                S_PHONE                 CHAR(15),
                S_ACCTBAL               DECIMAL,
                S_COMMENT               VARCHAR(101)
        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (s_suppkey);

        COPY supplier FROM '/tmp/dss-data/supplier.csv' WITH csv DELIMITER '|';

COMMIT;

BEGIN;

        CREATE TABLE CUSTOMER (
                C_CUSTKEY               SERIAL8,
                C_NAME                  VARCHAR(25),
                C_ADDRESS               VARCHAR(40),
                C_NATIONKEY             BIGINT NOT NULL, -- references N_NATIONKEY
                C_PHONE                 CHAR(15),
                C_ACCTBAL               DECIMAL,
                C_MKTSEGMENT    CHAR(10),
                C_COMMENT               VARCHAR(117)
        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (c_custkey);

        COPY customer FROM '/tmp/dss-data/customer.csv' WITH csv DELIMITER '|';

COMMIT;

BEGIN;

        CREATE TABLE PARTSUPP (
                PS_PARTKEY              BIGINT NOT NULL, -- references P_PARTKEY
                PS_SUPPKEY              BIGINT NOT NULL, -- references S_SUPPKEY
                PS_AVAILQTY             INTEGER,
                PS_SUPPLYCOST   DECIMAL,
                PS_COMMENT              VARCHAR(199)
        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (ps_partkey,ps_suppkey);

        COPY partsupp FROM '/tmp/dss-data/partsupp.csv' WITH csv DELIMITER '|';

COMMIT;

BEGIN;

        CREATE TABLE ORDERS (
                O_ORDERKEY              SERIAL8,
                O_CUSTKEY               BIGINT NOT NULL, -- references C_CUSTKEY
                O_ORDERSTATUS   CHAR(1),
                O_TOTALPRICE    DECIMAL,
                O_ORDERDATE             DATE,
                O_ORDERPRIORITY CHAR(15),
                O_CLERK                 CHAR(15),
                O_SHIPPRIORITY  INTEGER,
                O_COMMENT               VARCHAR(79)
        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (o_orderkey);

        COPY orders FROM '/tmp/dss-data/orders.csv' WITH csv DELIMITER '|';

COMMIT;

BEGIN;

        CREATE TABLE LINEITEM (
                L_ORDERKEY              BIGINT NOT NULL, -- references O_ORDERKEY
                L_PARTKEY               BIGINT NOT NULL, -- references P_PARTKEY (compound fk to PARTSUPP)
                L_SUPPKEY               BIGINT NOT NULL, -- references S_SUPPKEY (compound fk to PARTSUPP)
                L_LINENUMBER    INTEGER,
                L_QUANTITY              DECIMAL,
                L_EXTENDEDPRICE DECIMAL,
                L_DISCOUNT              DECIMAL,
                L_TAX                   DECIMAL,
                L_RETURNFLAG    CHAR(1),
                L_LINESTATUS    CHAR(1),
                L_SHIPDATE              DATE,
                L_COMMITDATE    DATE,
                L_RECEIPTDATE   DATE,
                L_SHIPINSTRUCT  CHAR(25),
                L_SHIPMODE              CHAR(10),
                L_COMMENT               VARCHAR(44)
        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (l_orderkey, l_linenumber);

        COPY lineitem FROM '/tmp/dss-data/lineitem.csv' WITH csv DELIMITER '|';
COMMIT;

PostgreSQL:
BEGIN;

        CREATE TABLE PART (

                P_PARTKEY               SERIAL,
                P_NAME                  VARCHAR(55),
                P_MFGR                  CHAR(25),
                P_BRAND                 CHAR(10),
                P_TYPE                  VARCHAR(25),
                P_SIZE                  INTEGER,
                P_CONTAINER             CHAR(10),
                P_RETAILPRICE   DECIMAL,
                P_COMMENT               VARCHAR(23)
        );

        COPY part FROM '/tmp/dss-data-copy/part.csv' WITH (FORMAT csv, DELIMITER '|');

COMMIT;

BEGIN;

        CREATE TABLE REGION (
                R_REGIONKEY     SERIAL,
                R_NAME          CHAR(25),
                R_COMMENT       VARCHAR(152)
        );

        COPY region FROM '/tmp/dss-data-copy/region.csv' WITH (FORMAT csv, DELIMITER '|');

COMMIT;

BEGIN;

        CREATE TABLE NATION (
                N_NATIONKEY             SERIAL,
                N_NAME                  CHAR(25),
                N_REGIONKEY             BIGINT NOT NULL,  -- references R_REGIONKEY
                N_COMMENT               VARCHAR(152)
        );

        COPY nation FROM '/tmp/dss-data-copy/nation.csv' WITH (FORMAT csv, DELIMITER '|');

COMMIT;

BEGIN;

        CREATE TABLE SUPPLIER (
                S_SUPPKEY               SERIAL,
                S_NAME                  CHAR(25),
                S_ADDRESS               VARCHAR(40),
                S_NATIONKEY             BIGINT NOT NULL, -- references N_NATIONKEY
                S_PHONE                 CHAR(15),
                S_ACCTBAL               DECIMAL,
                S_COMMENT               VARCHAR(101)
        );

        COPY supplier FROM '/tmp/dss-data-copy/supplier.csv' WITH (FORMAT csv, DELIMITER '|');

COMMIT;

BEGIN;

        CREATE TABLE CUSTOMER (
                C_CUSTKEY               SERIAL,
                C_NAME                  VARCHAR(25),
                C_ADDRESS               VARCHAR(40),
                C_NATIONKEY             BIGINT NOT NULL, -- references N_NATIONKEY
                C_PHONE                 CHAR(15),
                C_ACCTBAL               DECIMAL,
                C_MKTSEGMENT    CHAR(10),
                C_COMMENT               VARCHAR(117)
        );

        COPY customer FROM '/tmp/dss-data-copy/customer.csv' WITH (FORMAT csv, DELIMITER '|');

COMMIT;

BEGIN;

        CREATE TABLE PARTSUPP (
                PS_PARTKEY              BIGINT NOT NULL, -- references P_PARTKEY
                PS_SUPPKEY              BIGINT NOT NULL, -- references S_SUPPKEY
                PS_AVAILQTY             INTEGER,
                PS_SUPPLYCOST   DECIMAL,
                PS_COMMENT              VARCHAR(199)
        );

        COPY partsupp FROM '/tmp/dss-data-copy/partsupp.csv' WITH (FORMAT csv, DELIMITER '|');

COMMIT;

BEGIN;

        CREATE TABLE ORDERS (
                O_ORDERKEY              SERIAL,
                O_CUSTKEY               BIGINT NOT NULL, -- references C_CUSTKEY
                O_ORDERSTATUS   CHAR(1),
                O_TOTALPRICE    DECIMAL,
                O_ORDERDATE             DATE,
                O_ORDERPRIORITY CHAR(15),
                O_CLERK                 CHAR(15),
                O_SHIPPRIORITY  INTEGER,
                O_COMMENT               VARCHAR(79)
        );

        COPY orders FROM '/tmp/dss-data-copy/orders.csv' WITH (FORMAT csv, DELIMITER '|');

COMMIT;

BEGIN;

        CREATE TABLE LINEITEM (
                L_ORDERKEY              BIGINT NOT NULL, -- references O_ORDERKEY
                L_PARTKEY               BIGINT NOT NULL, -- references P_PARTKEY (compound fk to PARTSUPP)
                L_SUPPKEY               BIGINT NOT NULL, -- references S_SUPPKEY (compound fk to PARTSUPP)
                L_LINENUMBER    INTEGER,
                L_QUANTITY              DECIMAL,
                L_EXTENDEDPRICE DECIMAL,
                L_DISCOUNT              DECIMAL,
                L_TAX                   DECIMAL,
                L_RETURNFLAG    CHAR(1),
                L_LINESTATUS    CHAR(1),
                L_SHIPDATE              DATE,
                L_COMMITDATE    DATE,
                L_RECEIPTDATE   DATE,
                L_SHIPINSTRUCT  CHAR(25),
                L_SHIPMODE              CHAR(10),
                L_COMMENT               VARCHAR(44)
        );

        COPY lineitem FROM '/tmp/dss-data-copy/lineitem.csv' WITH (FORMAT csv, DELIMITER '|');

COMMIT;

Appendix 2: Query statements

Q1:
-- using 1471398061 as a seed to the RNG
select
        l_returnflag,
        l_linestatus,
        sum(l_quantity) as sum_qty,
        sum(l_extendedprice) as sum_base_price,
        sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
        avg(l_quantity) as avg_qty,
        avg(l_extendedprice) as avg_price,
        avg(l_discount) as avg_disc,
        count(*) as count_order
from
        lineitem
where
        l_shipdate <= date '1998-12-01' - interval '85' day
group by
        l_returnflag,
        l_linestatus
order by
        l_returnflag,
        l_linestatus
LIMIT 1;

Q2:
-- using 1471398061 as a seed to the RNG


select
        s_acctbal,
        s_name,
        n_name,
        p_partkey,
        p_mfgr,
        s_address,
        s_phone,
        s_comment
from
        part,
        supplier,
        partsupp,
        nation,
        region
where
        p_partkey = ps_partkey
        and s_suppkey = ps_suppkey
        and p_size = 48
        and p_type like '%STEEL'
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'AFRICA'
        and ps_supplycost = (
                select
                        min(ps_supplycost)
                from
                        partsupp,
                        supplier,
                        nation,
                        region
                where
                        p_partkey = ps_partkey
                        and s_suppkey = ps_suppkey
                        and s_nationkey = n_nationkey
                        and n_regionkey = r_regionkey
                        and r_name = 'AFRICA'
        )
order by
        s_acctbal desc,
        n_name,
        s_name,
        p_partkey
LIMIT 100;

Q3:
-- using 1471398061 as a seed to the RNG


select
        l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
from
        customer,
        orders,
        lineitem
where
        c_mktsegment = 'HOUSEHOLD'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < date '1995-03-03'
        and l_shipdate > date '1995-03-03'
group by
        l_orderkey,
        o_orderdate,
        o_shippriority
order by
        revenue desc,
        o_orderdate
LIMIT 10;

Q4:
-- using 1471398061 as a seed to the RNG

select
        o_orderpriority,
        count(*) as order_count
from
        orders
where
        o_orderdate >= date '1993-06-01'
        and o_orderdate < date '1993-06-01' + interval '3' month
        and exists (
                select
                        *
                from
                        lineitem
                where
                        l_orderkey = o_orderkey
                        and l_commitdate < l_receiptdate
        )
group by
        o_orderpriority
order by
        o_orderpriority
LIMIT 1;

Q5:
-- using 1471398061 as a seed to the RNG


select
        n_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue
from
        customer,
        orders,
        lineitem,
        supplier,
        nation,
        region
where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and l_suppkey = s_suppkey
        and c_nationkey = s_nationkey
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'AMERICA'
        and o_orderdate >= date '1993-01-01'
        and o_orderdate < date '1993-01-01' + interval '1' year
group by
        n_name
order by
        revenue desc
LIMIT 1;

Q6:
-- using 1471398061 as a seed to the RNG


select
        sum(l_extendedprice * l_discount) as revenue
from
        lineitem
where
        l_shipdate >= date '1993-01-01'
        and l_shipdate < date '1993-01-01' + interval '1' year
        and l_discount between 0.02 - 0.01 and 0.02 + 0.01
        and l_quantity < 24
LIMIT 1;

Q7:
-- using 1471398061 as a seed to the RNG


select
        supp_nation,
        cust_nation,
        l_year,
        sum(volume) as revenue
from
        (
                select
                        n1.n_name as supp_nation,
                        n2.n_name as cust_nation,
                        extract(year from l_shipdate) as l_year,
                        l_extendedprice * (1 - l_discount) as volume
                from
                        supplier,
                        lineitem,
                        orders,
                        customer,
                        nation n1,
                        nation n2
                where
                        s_suppkey = l_suppkey
                        and o_orderkey = l_orderkey
                        and c_custkey = o_custkey
                        and s_nationkey = n1.n_nationkey
                        and c_nationkey = n2.n_nationkey
                        and (
                                (n1.n_name = 'BRAZIL' and n2.n_name = 'INDONESIA')
                                or (n1.n_name = 'INDONESIA' and n2.n_name = 'BRAZIL')
                        )
                        and l_shipdate between date '1995-01-01' and date '1996-12-31'
        ) as shipping
group by
        supp_nation,
        cust_nation,
        l_year
order by
        supp_nation,
        cust_nation,
        l_year
LIMIT 1;

Q8:
-- using 1471398061 as a seed to the RNG


select
        o_year,
        sum(case
                when nation = 'INDONESIA' then volume
                else 0
        end) / sum(volume) as mkt_share
from
        (
                select
                        extract(year from o_orderdate) as o_year,
                        l_extendedprice * (1 - l_discount) as volume,
                        n2.n_name as nation
                from
                        part,
                        supplier,
                        lineitem,
                        orders,
                        customer,
                        nation n1,
                        nation n2,
                        region
                where
                        p_partkey = l_partkey
                        and s_suppkey = l_suppkey
                        and l_orderkey = o_orderkey
                        and o_custkey = c_custkey
                        and c_nationkey = n1.n_nationkey
                        and n1.n_regionkey = r_regionkey
                        and r_name = 'ASIA'
                        and s_nationkey = n2.n_nationkey
                        and o_orderdate between date '1995-01-01' and date '1996-12-31'
                        and p_type = 'ECONOMY BURNISHED BRASS'
        ) as all_nations
group by
        o_year
order by
        o_year
LIMIT 1;

Q9:
-- using 1471398061 as a seed to the RNG


select
        nation,
        o_year,
        sum(amount) as sum_profit
from
        (
                select
                        n_name as nation,
                        extract(year from o_orderdate) as o_year,
                        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
                from
                        part,
                        supplier,
                        lineitem,
                        partsupp,
                        orders,
                        nation
                where
                        s_suppkey = l_suppkey
                        and ps_suppkey = l_suppkey
                        and ps_partkey = l_partkey
                        and p_partkey = l_partkey
                        and o_orderkey = l_orderkey
                        and s_nationkey = n_nationkey
                        and p_name like '%powder%'
        ) as profit
group by
        nation,
        o_year
order by
        nation,
        o_year desc
LIMIT 1;

Q10:
-- using 1471398061 as a seed to the RNG
select
        c_custkey,
        c_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        c_acctbal,
        n_name,
        c_address,
        c_phone,
        c_comment
from
        customer,
        orders,
        lineitem,
        nation
where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate >= date '1993-06-01'
        and o_orderdate < date '1993-06-01' + interval '3' month
        and l_returnflag = 'R'
        and c_nationkey = n_nationkey
group by
        c_custkey,
        c_name,
        c_acctbal,
        c_phone,
        n_name,
        c_address,
        c_comment
order by
        revenue desc
LIMIT 20;

Q11:
-- using 1471398061 as a seed to the RNG


select
        ps_partkey,
        sum(ps_supplycost * ps_availqty) as value
from
        partsupp,
        supplier,
        nation
where
        ps_suppkey = s_suppkey
        and s_nationkey = n_nationkey
        and n_name = 'PERU'
group by
        ps_partkey having
                sum(ps_supplycost * ps_availqty) > (
                        select
                                sum(ps_supplycost * ps_availqty) * 0.0001000000
                        from
                                partsupp,
                                supplier,
                                nation
                        where
                                ps_suppkey = s_suppkey
                                and s_nationkey = n_nationkey
                                and n_name = 'PERU'
                )
order by
        value desc
LIMIT 1;

Q12:
-- using 1471398061 as a seed to the RNG
select
        l_shipmode,
        sum(case
                when o_orderpriority = '1-URGENT'
                        or o_orderpriority = '2-HIGH'
                        then 1
                else 0
        end) as high_line_count,
        sum(case
                when o_orderpriority <> '1-URGENT'
                        and o_orderpriority <> '2-HIGH'
                        then 1
                else 0
        end) as low_line_count
from
        orders,
        lineitem
where
        o_orderkey = l_orderkey
        and l_shipmode in ('REG AIR', 'RAIL')
        and l_commitdate < l_receiptdate
        and l_shipdate < l_commitdate
        and l_receiptdate >= date '1993-01-01'
        and l_receiptdate < date '1993-01-01' + interval '1' year
group by
        l_shipmode
order by
        l_shipmode
LIMIT 1;

Q13:
-- using 1471398061 as a seed to the RNG
select
        c_count,
        count(*) as custdist
from
        (
                select
                        c_custkey,
                        count(o_orderkey)
                from
                        customer left outer join orders on
                                c_custkey = o_custkey
                                and o_comment not like '%pending%packages%'
                group by
                        c_custkey
        ) as c_orders (c_custkey, c_count)
group by
        c_count
order by
        custdist desc,
        c_count desc
LIMIT 1;

Q14:
-- using 1471398061 as a seed to the RNG


select
        100.00 * sum(case
                when p_type like 'PROMO%'
                        then l_extendedprice * (1 - l_discount)
                else 0
        end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
        lineitem,
        part
where
        l_partkey = p_partkey
        and l_shipdate >= date '1993-09-01'
        and l_shipdate < date '1993-09-01' + interval '1' month
LIMIT 1;

Q15:
-- using 1471398061 as a seed to the RNG

create view revenue0 (supplier_no, total_revenue) as
        select
                l_suppkey,
                sum(l_extendedprice * (1 - l_discount))
        from
                lineitem
        where
                l_shipdate >= date '1994-11-01'
                and l_shipdate < date '1994-11-01' + interval '3' month
        group by
                l_suppkey;


select
        s_suppkey,
        s_name,
        s_address,
        s_phone,
        total_revenue
from
        supplier,
        revenue0
where
        s_suppkey = supplier_no
        and total_revenue = (
                select
                        max(total_revenue)
                from
                        revenue0
        )
order by
        s_suppkey
LIMIT 1;

drop view revenue0;

Q16:
-- using 1471398061 as a seed to the RNG


select
        p_brand,
        p_type,
        p_size,
        count(distinct ps_suppkey) as supplier_cnt
from
        partsupp,
        part
where
        p_partkey = ps_partkey
        and p_brand <> 'Brand#22'
        and p_type not like 'STANDARD PLATED%'
        and p_size in (34, 17, 18, 16, 15, 49, 1, 48)
        and ps_suppkey not in (
                select
                        s_suppkey
                from
                        supplier
                where
                        s_comment like '%Customer%Complaints%'
        )
group by
        p_brand,
        p_type,
        p_size
order by
        supplier_cnt desc,
        p_brand,
        p_type,
        p_size
LIMIT 1;

Q17:
-- using 1471398061 as a seed to the RNG
select
        sum(l_extendedprice) / 7.0 as avg_yearly
from
        lineitem,
        part,
        (SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity FROM lineitem GROUP BY l_partkey) part_agg
where
        p_partkey = l_partkey
        and agg_partkey = l_partkey
        and p_brand = 'Brand#21'
        and p_container = 'JUMBO JAR'
        and l_quantity < avg_quantity
LIMIT 1;

Q18:
-- using 1471398061 as a seed to the RNG
select
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice,
        sum(l_quantity)
from
        customer,
        orders,
        lineitem
where
        o_orderkey in (
                select
                        l_orderkey
                from
                        lineitem
                group by
                        l_orderkey having
                                sum(l_quantity) > 312
        )
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey
group by
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice
order by
        o_totalprice desc,
        o_orderdate
LIMIT 100;

Q19:
-- using 1471398061 as a seed to the RNG
select
        sum(l_extendedprice* (1 - l_discount)) as revenue
from
        lineitem,
        part
where
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#42'
                and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
                and l_quantity >= 7 and l_quantity <= 7 + 10
                and p_size between 1 and 5
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#22'
                and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
                and l_quantity >= 20 and l_quantity <= 20 + 10
                and p_size between 1 and 10
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#25'
                and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
                and l_quantity >= 21 and l_quantity <= 21 + 10
                and p_size between 1 and 15
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
LIMIT 1;

Q20:
-- using 1471398061 as a seed to the RNG
select
        s_name,
        s_address
from
        supplier,
        nation
where
        s_suppkey in (
                select
                        ps_suppkey
                from
                        partsupp,
                        (
                                select
                                        l_partkey agg_partkey,
                                        l_suppkey agg_suppkey,
                                        0.5 * sum(l_quantity) AS agg_quantity
                                from
                                        lineitem
                                where
                                        l_shipdate >= date '1994-01-01'
                                        and l_shipdate < date '1994-01-01' + interval '1' year
                                group by
                                        l_partkey,
                                        l_suppkey
                        ) agg_lineitem
                where
                        agg_partkey = ps_partkey
                        and agg_suppkey = ps_suppkey
                        and ps_partkey in (
                                select
                                        p_partkey
                                from
                                        part
                                where
                                        p_name like 'forest%'
                        )
                        and ps_availqty > agg_quantity
        )
        and s_nationkey = n_nationkey
        and n_name = 'FRANCE'
order by
        s_name
LIMIT 1;

Q21:
-- using 1471398061 as a seed to the RNG
select
        s_name,
        count(*) as numwait
from
        supplier,
        lineitem l1,
        orders,
        nation
where
        s_suppkey = l1.l_suppkey
        and o_orderkey = l1.l_orderkey
        and o_orderstatus = 'F'
        and l1.l_receiptdate > l1.l_commitdate
        and exists (
                select
                        *
                from
                        lineitem l2
                where
                        l2.l_orderkey = l1.l_orderkey
                        and l2.l_suppkey <> l1.l_suppkey
        )
        and not exists (
                select
                        *
                from
                        lineitem l3
                where
                        l3.l_orderkey = l1.l_orderkey
                        and l3.l_suppkey <> l1.l_suppkey
                        and l3.l_receiptdate > l3.l_commitdate
        )
        and s_nationkey = n_nationkey
        and n_name = 'GERMANY'
group by
        s_name
order by
        numwait desc,
        s_name
LIMIT 100;

Q22:
-- using 1471398061 as a seed to the RNG
select
        cntrycode,
        count(*) as numcust,
        sum(c_acctbal) as totacctbal
from
        (
                select
                        substring(c_phone from 1 for 2) as cntrycode,
                        c_acctbal
                from
                        customer
                where
                        substring(c_phone from 1 for 2) in
                                ('16', '10', '34', '26', '33', '18', '11')
                        and c_acctbal > (
                                select
                                        avg(c_acctbal)
                                from
                                        customer
                                where
                                        c_acctbal > 0.00
                                        and substring(c_phone from 1 for 2) in
                                                ('16', '10', '34', '26', '33', '18', '11')
                        )
                        and not exists (
                                select
                                        *
                                from
                                        orders
                                where
                                        o_custkey = c_custkey
                        )
        ) as custsale
group by
        cntrycode
order by
        cntrycode
LIMIT 1;