Copyright notice: this is an original article by Huang Hui; please credit the source when reposting.
Original article: https://www.qcloud.com/community/article/259
Source: 騰雲閣 (Tencent Cloud community) https://www.qcloud.com/community
We previously ran TPC-H style comparison tests between GreenPlum and MySQL and found that, with the same resources, GreenPlum performed far better than MySQL. Part of the reason is that GreenPlum itself uses more efficient algorithms, for example hash joins for multi-table joins. If both sides used equally efficient algorithms, how would they compare? Since GreenPlum evolved from PostgreSQL and inherits PostgreSQL's optimizer algorithms wholesale, this time we compare GreenPlum with PostgreSQL: with the same total resources, how does GreenPlum (distributed PostgreSQL) perform against single-node PostgreSQL?
Test environment: Tencent Cloud
Systems under test: GreenPlum and PostgreSQL. Their configurations are summarized below:
Table 1. GreenPlum cluster servers
Item | Master Host | Segment Host | Segment Host |
---|---|---|---|
OS | CentOS 6.7 64-bit | CentOS 6.7 64-bit | CentOS 6.7 64-bit |
CPU | Intel(R) Xeon(R) CPU E5-26xx v3, 2 cores | Intel(R) Xeon(R) CPU E5-26xx v3, 2 cores | Intel(R) Xeon(R) CPU E5-26xx v3, 2 cores |
Memory | 8GB | 8GB | 8GB |
Public bandwidth | 100Mbps | 100Mbps | 100Mbps |
IP | 123.207.228.40 | 123.207.228.21 | 123.207.85.105 |
Segment count | 0 | 2 | 2 |
Version | greenplum-db-4.3.8.1-build-1-RHEL5-x86_64 | greenplum-db-4.3.8.1-build-1-RHEL5-x86_64 | greenplum-db-4.3.8.1-build-1-RHEL5-x86_64 |
Table 2. PostgreSQL server
Item | Value |
---|---|
OS | CentOS 6.7 64-bit |
CPU | Intel(R) Xeon(R) CPU E5-26xx v3, 8 cores |
Memory | 24GB |
Public bandwidth | 100Mbps |
IP | 119.29.229.209 |
Version | PostgreSQL 9.5.4 |
1. Total test data size: 1GB
The results are summarized below:
Table 3. Row counts of the test tables at 1GB total
Table name | Row count |
---|---|
customer | 150000 |
lineitem | 6001215 |
nation | 25 |
orders | 1500000 |
part | 200000 |
partsupp | 800000 |
region | 5 |
supplier | 10000 |
Table 4. Execution times of the 22 test SQL queries at 1GB total
SQL | GreenPlum execution time (seconds) | PostgreSQL execution time (seconds) |
---|---|---|
Q1 | 4.01 | 12.93 |
Q2 | 0.50 | 0.62 |
Q3 | 1.35 | 1.29 |
Q4 | 0.11 | 0.52 |
Q5 | 0.19 | 0.72 |
Q6 | 0.01 | 0.79 |
Q7 | 6.06 | 1.84 |
Q8 | 1.46 | 0.59 |
Q9 | 4.00 | 7.04 |
Q10 | 0.14 | 2.19 |
Q11 | 0.30 | 0.18 |
Q12 | 0.08 | 2.15 |
Q13 | 1.04 | 4.05 |
Q14 | 0.04 | 0.42 |
Q15 | 0.07 | 1.66 |
Q16 | 0.51 | 0.80 |
Q17 | 3.21 | 23.07 |
Q18 | 14.23 | 5.86 |
Q19 | 0.95 | 0.17 |
Q20 | 0.16 | 3.10 |
Q21 | 7.23 | 2.22 |
Q22 | 0.96 | 0.28 |
Analysis: Table 4 shows that PostgreSQL executed 8 of the 22 queries faster than GreenPlum, close to half of them, so we go straight to the next round with 10 times the test data.
2. Total test data size: 10GB
The results are summarized below:
Table 5. Row counts of the test tables at 10GB total
Table name | Row count |
---|---|
customer | 1500000 |
lineitem | 59986052 |
nation | 25 |
orders | 15000000 |
part | 2000000 |
partsupp | 8000000 |
region | 5 |
supplier | 100000 |
Table 6. Execution times of the 22 test SQL queries at 10GB total
SQL | GreenPlum execution time (seconds) | PostgreSQL execution time (seconds) |
---|---|---|
Q1 | 36.98 | 130.61 |
Q2 | 3.10 | 17.08 |
Q3 | 14.39 | 117.83 |
Q4 | 0.11 | 6.81 |
Q5 | 0.20 | 114.46 |
Q6 | 0.01 | 11.08 |
Q7 | 80.12 | 42.96 |
Q8 | 6.61 | 45.13 |
Q9 | 49.72 | 118.36 |
Q10 | 0.16 | 40.51 |
Q11 | 2.28 | 3.06 |
Q12 | 0.08 | 21.47 |
Q13 | 19.29 | 68.83 |
Q14 | 0.05 | 36.28 |
Q15 | 0.09 | 23.16 |
Q16 | 6.30 | 12.77 |
Q17 | 134.22 | 127.79 |
Q18 | 168.03 | 199.48 |
Q19 | 6.25 | 1.96 |
Q20 | 0.54 | 52.10 |
Q21 | 84.68 | 190.59 |
Q22 | 17.93 | 2.98 |
Analysis: after scaling the data up to 10GB it is obvious that PostgreSQL's execution times grow substantially and its performance degrades badly, yet 3 of the test queries are still faster on PostgreSQL than on GreenPlum. We pick one of them to examine where the performance difference comes from.
Here we take Q7 as the example; GreenPlum's execution time is roughly twice PostgreSQL's. Q7 is shown below:
Figure 1. The SQL statement of Q7
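The figure itself is an image that is not reproduced here; for reference, Q7 as issued in this test (copied from Appendix II) is:

```sql
-- TPC-H Q7 as used in this test (see Appendix II)
select supp_nation, cust_nation, l_year, sum(volume) as revenue
from (
    select n1.n_name as supp_nation,
           n2.n_name as cust_nation,
           extract(year from l_shipdate) as l_year,
           l_extendedprice * (1 - l_discount) as volume
    from supplier, lineitem, orders, customer, nation n1, nation n2
    where s_suppkey = l_suppkey
      and o_orderkey = l_orderkey
      and c_custkey = o_custkey
      and s_nationkey = n1.n_nationkey
      and c_nationkey = n2.n_nationkey
      and ( (n1.n_name = 'BRAZIL' and n2.n_name = 'INDONESIA')
         or (n1.n_name = 'INDONESIA' and n2.n_name = 'BRAZIL') )
      and l_shipdate between date '1995-01-01' and date '1996-12-31'
) as shipping
group by supp_nation, cust_nation, l_year
order by supp_nation, cust_nation, l_year
LIMIT 1;
```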
Running explain on Q7 in PostgreSQL produces the following:
Figure 2. Result of explain Q7 on PostgreSQL with 10GB of data
Analyzing the plan, the most expensive parts are the ones marked with red boxes in the figure; the corresponding operations are:
1). A conditional scan on the l_shipdate column of lineitem. Because the column is indexed, the efficient bitmap index scan is used (a bitmap index scan works in two steps: 1. build the bitmap; 2. scan the heap; see http://kb.cnblogs.com/page/515258/ for details). A sketch of the index definition this presumably relies on follows this list.
2). A hash join between lineitem and orders.
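The PostgreSQL DDL in Appendix I contains no index definitions, so the index behind this bitmap scan must have been created separately; a minimal sketch of what it presumably looks like (the index name is illustrative):

```sql
-- A plain B-tree index on the filter column. For a selective range predicate
-- on l_shipdate, PostgreSQL can build a bitmap from this index and then fetch
-- the matching heap pages (Bitmap Index Scan followed by Bitmap Heap Scan).
CREATE INDEX idx_lineitem_shipdate ON lineitem (l_shipdate);
ANALYZE lineitem;   -- refresh planner statistics so the index is considered
```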
To analyze further, we add the analyze option to obtain detailed execution times. Since the output is long, only the most relevant parts are excerpted below:
Figure 3. Partial result of explain analyze Q7 on PostgreSQL with 10GB of data
From this output we can read off the execution time of those two operations. But because PostgreSQL overlaps the execution of the plan steps, we compute a "stall time" for each step, i.e. the interval during which the system is effectively working only on that step; shortening a step's stall time directly shortens the total execution time. The stall time of a step is the difference between the end time of the previous step and the end time of this step. The stall times of the two parts are:
1). Bitmap Heap Scan: 20197 - 2233 = 17964 ms
2). Hash join: 42889 - 26200 = 16689 ms
PostgreSQL's total execution time for Q7 is 42963 ms, which confirms that the time is spent mainly in these two steps.
Next, we run explain Q7 on GreenPlum; the result is as follows:
Figure 4. Result of explain Q7 on GreenPlum with 10GB of data
Unlike PostgreSQL, GreenPlum spends additional time on data redistribution. Again, we use the analyze option to get detailed execution times:
Figure 5. Partial result of explain analyze Q7 on GreenPlum with 10GB of data
From the plan we pick the three most expensive steps and compute their stall times on a single segment (the slowest one):
1). Scan lineitem: 6216 ms
2). Redistribute: 36273 ms
3). Hash join: 29885 ms
GreenPlum's total execution time for Q7 is 80121 ms, so data redistribution accounts for roughly half of it, and the hash join also takes a large share, mainly because the segments run short of memory and the join spills to disk.
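As a side note (not something tuned in this test): per-statement operator memory on each GreenPlum segment is governed by settings such as statement_mem, and whether a larger value within the 8GB segment hosts would have kept this hash join in memory was not verified here. A minimal sketch:

```sql
-- Illustrative only: raise per-statement working memory on the segments
-- (takes effect with gp_resqueue_memory_policy set to auto or eager_free).
SET statement_mem = '1000MB';
-- Re-running the query in the same session would show whether the hash join
-- still spills its batches to disk.
```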
Summary: comparing the Q7 plans of PostgreSQL and GreenPlum, GreenPlum is slower here mainly because of the large cost of data redistribution and because the hash join exceeds memory and causes disk IO. The segments do scan lineitem in parallel, which saves time, but that is a small share of the total and has little effect on it.
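For background on where the redistribute motion comes from: a GreenPlum segment can only hash-join rows locally when both inputs are already distributed on the join key. In Appendix I, lineitem is DISTRIBUTED BY (l_orderkey, l_linenumber), orders by (o_orderkey) and supplier by (s_suppkey), so at least one side of each of Q7's joins has to be rehashed and shipped between segments. Purely as a hypothetical illustration (not something done in this test), a copy of lineitem distributed on the supplier join key would let that particular join run co-located:

```sql
-- Hypothetical, for illustration only: redistribute lineitem on the Q7 join
-- key so the lineitem/supplier hash join needs no Redistribute Motion.
CREATE TABLE lineitem_by_suppkey (LIKE lineitem)
WITH (APPENDONLY=true, ORIENTATION=COLUMN)
DISTRIBUTED BY (l_suppkey);

INSERT INTO lineitem_by_suppkey SELECT * FROM lineitem;
```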
Given that, can the share of time spent on data redistribution be reduced? 10GB of test data is still small compared with a realistic OLAP workload, so we enlarge the data by another factor of 5 and see whether the time breakdown changes.
3. Total test data size: 50GB
Table 7. Row counts of the test tables at 50GB total
Table name | Row count |
---|---|
customer | 7500000 |
lineitem | 300005811 |
nation | 25 |
orders | 75000000 |
part | 10000000 |
partsupp | 40000000 |
region | 5 |
supplier | 500000 |
Table 8. Execution times of the 22 test SQL queries at 50GB total
SQL | GreenPlum execution time (seconds) | PostgreSQL execution time (seconds) |
---|---|---|
Q1 | 212.27 | 802.24 |
Q2 | 16.53 | 164.20 |
Q3 | 156.31 | 2142.18 |
Q4 | 0.13 | 2934.76 |
Q5 | 0.23 | 2322.92 |
Q6 | 0.01 | 6439.26 |
Q7 | 535.66 | 11906.74 |
Q8 | 76.76 | 9171.83 |
Q9 | 313.91 | >26060.36 |
Q10 | 0.41 | 1905.13 |
Q11 | 7.71 | 17.65 |
Q12 | 0.19 | >3948.07 |
Q13 | 108.05 | 354.59 |
Q14 | 0.05 | 8054.72 |
Q15 | 0.07 | >2036.03 |
Q16 | 34.74 | 221.49 |
Q17 | 862.90 | >9010.56 |
Q18 | 913.97 | 3174.24 |
Q19 | 129.14 | 8666.38 |
Q20 | 2.28 | 9389.21 |
Q21 | 1064.67 | >26868.31 |
Q22 | 90.90 | 1066.44 |
Analysis: the results show clearly that GreenPlum now executes all 22 queries far more efficiently than PostgreSQL. We again take Q7 to see what directly causes the different behaviour at the two data sizes.
Examining the execution plans shows that the differences are still concentrated in the parts identified in step 2, so we do not repeat the full plans and only look at the most expensive parts:
Figure 6. Partial result of explain analyze Q7 on PostgreSQL with 50GB of data
Figure 7. Partial result of explain analyze Q7 on GreenPlum with 50GB of data
PostgreSQL's main stall times are:
1).Bitmap Heap Scan: 9290197ms
2).Hash join: 713138ms
The total execution time is 10219009 ms, so most of the time clearly goes into the Bitmap Heap Scan.
GreenPlum's main stall times are:
1).Scan lineitem: 130397ms
2).Redistribute: 140685ms
3).Hash join: 211456ms
The total execution time is 537134 ms. Compared with the 10GB run in step 2, the share of time spent on data redistribution has dropped markedly, and most of the time now goes into the hash join.
GreenPlum and PostgreSQL scan the table differently for the same where condition: in GreenPlum the lineitem table is stored column-oriented, so a straight table scan is simpler and faster there.
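For reference, this is how lineitem is declared on GreenPlum in Appendix I: an append-only, column-oriented table, which is what makes the plain sequential scan of the few referenced columns cheap:

```sql
CREATE TABLE LINEITEM (
    L_ORDERKEY      BIGINT NOT NULL,   -- references O_ORDERKEY
    L_PARTKEY       BIGINT NOT NULL,   -- references P_PARTKEY
    L_SUPPKEY       BIGINT NOT NULL,   -- references S_SUPPKEY
    L_LINENUMBER    INTEGER,
    L_QUANTITY      DECIMAL,
    L_EXTENDEDPRICE DECIMAL,
    L_DISCOUNT      DECIMAL,
    L_TAX           DECIMAL,
    L_RETURNFLAG    CHAR(1),
    L_LINESTATUS    CHAR(1),
    L_SHIPDATE      DATE,
    L_COMMITDATE    DATE,
    L_RECEIPTDATE   DATE,
    L_SHIPINSTRUCT  CHAR(25),
    L_SHIPMODE      CHAR(10),
    L_COMMENT       VARCHAR(44)
)
WITH (APPENDONLY=true, BLOCKSIZE=2097152, ORIENTATION=COLUMN,
      CHECKSUM=true, OIDS=false)
DISTRIBUTED BY (l_orderkey, l_linenumber);
```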
Comparing PostgreSQL's two runs, the Bitmap Heap Scan degrades noticeably: the first run scanned 18188314 rows in 17 seconds, while the second scanned 90522811 rows in 9190 seconds.
Summary: increasing the data size reduces the share of total execution time taken by data redistribution; most of the time goes into the actual computation. Because table scans involve disk IO, GreenPlum splits the scan across multiple segments, reducing the amount each node has to scan; this amounts to parallel IO and gives a large overall speedup.
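(For completeness, the segments that such a scan is split across can be listed from the master via the gp_segment_configuration catalog; a minimal sketch:)

```sql
-- Each primary segment (content >= 0) receives one slice of the lineitem scan.
SELECT dbid, content, role, hostname, port
FROM gp_segment_configuration
ORDER BY content, role;
```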
Comparing and analysing the tests at the different data sizes (1GB, 10GB, 50GB) shows that for TPC-H style workloads, the larger the data, the more GreenPlum outperforms single-node PostgreSQL. Because GreenPlum has a distributed architecture, it must broadcast or redistribute data between nodes so that they can compute in parallel, and that costs some performance. When the data volume is small, there is little computation and the broadcast/redistribution time is a large fraction of the total, hurting overall efficiency, so GreenPlum can come out behind single-node PostgreSQL. When the data volume is large, the computation dominates and broadcast/redistribution is no longer the deciding factor; the distributed GreenPlum then executes complex queries more efficiently, for two reasons: first, multiple nodes compute at the same time (hash join, sort, and so on), speeding up the computation and making full use of the CPUs; second, table scans are spread over multiple nodes, reducing the IO done by each node, effectively parallel IO. This makes it better suited to OLAP scenarios.
Because the stock TPC-H test kit does not directly support GreenPlum and PostgreSQL, the test scripts had to be modified; the resulting table-creation statements are listed in Appendix I and the test SQL in Appendix II, both reproduced below.
Data can be loaded into GreenPlum with its bundled gpfdist tool, running several gpfdist file servers to load in parallel; however, the number of file servers must not exceed the number of segments, a point the official documentation does not mention.
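A rough sketch of that parallel-load setup; the host name, port, directory and external-table name below are illustrative, not the values used in this test:

```sql
-- 1. On the host(s) holding the generated flat files, start a gpfdist file
--    server (shell command, shown here as a comment):
--      gpfdist -d /tmp/dss-data -p 8081 -l /tmp/gpfdist.log &

-- 2. On the GreenPlum master, expose the file through a readable external
--    table; every segment then pulls its share of the rows in parallel.
CREATE READABLE EXTERNAL TABLE ext_lineitem (LIKE lineitem)
LOCATION ('gpfdist://filehost1:8081/lineitem.csv')
FORMAT 'CSV' (DELIMITER '|');

-- 3. Load the target table from the external table.
INSERT INTO lineitem SELECT * FROM ext_lineitem;
```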
GreenPlum: BEGIN; CREATE TABLE PART ( P_PARTKEY SERIAL8, P_NAME VARCHAR(55), P_MFGR CHAR(25), P_BRAND CHAR(10), P_TYPE VARCHAR(25), P_SIZE INTEGER, P_CONTAINER CHAR(10), P_RETAILPRICE DECIMAL, P_COMMENT VARCHAR(23) ) with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (p_partkey); COPY part FROM '/tmp/dss-data/part.csv' WITH csv DELIMITER '|'; COMMIT; BEGIN; CREATE TABLE REGION ( R_REGIONKEY SERIAL8, R_NAME CHAR(25), R_COMMENT VARCHAR(152) ) with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (r_regionkey); COPY region FROM '/tmp/dss-data/region.csv' WITH csv DELIMITER '|'; COMMIT; BEGIN; CREATE TABLE NATION ( N_NATIONKEY SERIAL8, N_NAME CHAR(25), N_REGIONKEY BIGINT NOT NULL, -- references R_REGIONKEY N_COMMENT VARCHAR(152) ) with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (n_nationkey); COPY nation FROM '/tmp/dss-data/nation.csv' WITH csv DELIMITER '|'; COMMIT; BEGIN; CREATE TABLE SUPPLIER ( S_SUPPKEY SERIAL8, S_NAME CHAR(25), S_ADDRESS VARCHAR(40), S_NATIONKEY BIGINT NOT NULL, -- references N_NATIONKEY S_PHONE CHAR(15), S_ACCTBAL DECIMAL, S_COMMENT VARCHAR(101) ) with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (s_suppkey); COPY supplier FROM '/tmp/dss-data/supplier.csv' WITH csv DELIMITER '|'; COMMIT; BEGIN; CREATE TABLE CUSTOMER ( C_CUSTKEY SERIAL8, C_NAME VARCHAR(25), C_ADDRESS VARCHAR(40), C_NATIONKEY BIGINT NOT NULL, -- references N_NATIONKEY C_PHONE CHAR(15), C_ACCTBAL DECIMAL, C_MKTSEGMENT CHAR(10), C_COMMENT VARCHAR(117) ) with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (c_custkey); COPY customer FROM '/tmp/dss-data/customer.csv' WITH csv DELIMITER '|'; COMMIT; BEGIN; CREATE TABLE PARTSUPP ( PS_PARTKEY BIGINT NOT NULL, -- references P_PARTKEY PS_SUPPKEY BIGINT NOT NULL, -- references S_SUPPKEY PS_AVAILQTY INTEGER, PS_SUPPLYCOST DECIMAL, PS_COMMENT VARCHAR(199) ) with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (ps_partkey,ps_suppkey); COPY partsupp FROM '/tmp/dss-data/partsupp.csv' WITH csv DELIMITER '|'; COMMIT; BEGIN; CREATE TABLE ORDERS ( O_ORDERKEY SERIAL8, O_CUSTKEY BIGINT NOT NULL, -- references C_CUSTKEY O_ORDERSTATUS CHAR(1), O_TOTALPRICE DECIMAL, O_ORDERDATE DATE, O_ORDERPRIORITY CHAR(15), O_CLERK CHAR(15), O_SHIPPRIORITY INTEGER, O_COMMENT VARCHAR(79) ) with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (o_orderkey); COPY orders FROM '/tmp/dss-data/orders.csv' WITH csv DELIMITER '|'; COMMIT; BEGIN; CREATE TABLE LINEITEM ( L_ORDERKEY BIGINT NOT NULL, -- references O_ORDERKEY L_PARTKEY BIGINT NOT NULL, -- references P_PARTKEY (compound fk to PARTSUPP) L_SUPPKEY BIGINT NOT NULL, -- references S_SUPPKEY (compound fk to PARTSUPP) L_LINENUMBER INTEGER, L_QUANTITY DECIMAL, L_EXTENDEDPRICE DECIMAL, L_DISCOUNT DECIMAL, L_TAX DECIMAL, L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_SHIPDATE DATE, L_COMMITDATE DATE, L_RECEIPTDATE DATE, L_SHIPINSTRUCT CHAR(25), L_SHIPMODE CHAR(10), L_COMMENT VARCHAR(44) ) with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (l_orderkey, l_linenumber); COPY lineitem FROM '/tmp/dss-data/lineitem.csv' WITH csv DELIMITER '|'; COMMIT; PostgreSQL: BEGIN; CREATE TABLE PART ( P_PARTKEY SERIAL, P_NAME VARCHAR(55), P_MFGR CHAR(25), P_BRAND CHAR(10), P_TYPE VARCHAR(25), 
P_SIZE INTEGER, P_CONTAINER CHAR(10), P_RETAILPRICE DECIMAL, P_COMMENT VARCHAR(23) ); COPY part FROM '/tmp/dss-data-copy/part.csv' WITH csv DELIMITER '|'; COMMIT; BEGIN; CREATE TABLE REGION ( R_REGIONKEY SERIAL, R_NAME CHAR(25), R_COMMENT VARCHAR(152) ); COPY region FROM '/tmp/dss-data-copy/region.csv' WITH (FORMAT csv, DELIMITER '|'); COMMIT; BEGIN; CREATE TABLE NATION ( N_NATIONKEY SERIAL, N_NAME CHAR(25), N_REGIONKEY BIGINT NOT NULL, -- references R_REGIONKEY N_COMMENT VARCHAR(152) ); COPY nation FROM '/tmp/dss-data-copy/nation.csv' WITH (FORMAT csv, DELIMITER '|'); COMMIT; BEGIN; CREATE TABLE SUPPLIER ( S_SUPPKEY SERIAL, S_NAME CHAR(25), S_ADDRESS VARCHAR(40), S_NATIONKEY BIGINT NOT NULL, -- references N_NATIONKEY S_PHONE CHAR(15), S_ACCTBAL DECIMAL, S_COMMENT VARCHAR(101) ); COPY supplier FROM '/tmp/dss-data-copy/supplier.csv' WITH (FORMAT csv, DELIMITER '|'); COMMIT; BEGIN; CREATE TABLE CUSTOMER ( C_CUSTKEY SERIAL, C_NAME VARCHAR(25), C_ADDRESS VARCHAR(40), C_NATIONKEY BIGINT NOT NULL, -- references N_NATIONKEY C_PHONE CHAR(15), C_ACCTBAL DECIMAL, C_MKTSEGMENT CHAR(10), C_COMMENT VARCHAR(117) ); COPY customer FROM '/tmp/dss-data-copy/customer.csv' WITH (FORMAT csv, DELIMITER '|'); COMMIT; BEGIN; CREATE TABLE PARTSUPP ( PS_PARTKEY BIGINT NOT NULL, -- references P_PARTKEY PS_SUPPKEY BIGINT NOT NULL, -- references S_SUPPKEY PS_AVAILQTY INTEGER, PS_SUPPLYCOST DECIMAL, PS_COMMENT VARCHAR(199) ); COPY partsupp FROM '/tmp/dss-data-copy/partsupp.csv' WITH (FORMAT csv, DELIMITER '|'); COMMIT; BEGIN; CREATE TABLE ORDERS ( O_ORDERKEY SERIAL, O_CUSTKEY BIGINT NOT NULL, -- references C_CUSTKEY O_ORDERSTATUS CHAR(1), O_TOTALPRICE DECIMAL, O_ORDERDATE DATE, O_ORDERPRIORITY CHAR(15), O_CLERK CHAR(15), O_SHIPPRIORITY INTEGER, O_COMMENT VARCHAR(79) ); COPY orders FROM '/tmp/dss-data-copy/orders.csv' WITH (FORMAT csv, DELIMITER '|'); COMMIT; BEGIN; CREATE TABLE LINEITEM ( L_ORDERKEY BIGINT NOT NULL, -- references O_ORDERKEY L_PARTKEY BIGINT NOT NULL, -- references P_PARTKEY (compound fk to PARTSUPP) L_SUPPKEY BIGINT NOT NULL, -- references S_SUPPKEY (compound fk to PARTSUPP) L_LINENUMBER INTEGER, L_QUANTITY DECIMAL, L_EXTENDEDPRICE DECIMAL, L_DISCOUNT DECIMAL, L_TAX DECIMAL, L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_SHIPDATE DATE, L_COMMITDATE DATE, L_RECEIPTDATE DATE, L_SHIPINSTRUCT CHAR(25), L_SHIPMODE CHAR(10), L_COMMENT VARCHAR(44) ); COPY lineitem FROM '/tmp/dss-data-copy/lineitem.csv' WITH (FORMAT csv, DELIMITER '|'); COMMIT;
Q1: -- using 1471398061 as a seed to the RNG select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '85' day group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus LIMIT 1; Q2: -- using 1471398061 as a seed to the RNG select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 48 and p_type like '%STEEL' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'AFRICA' and ps_supplycost = ( select min(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'AFRICA' ) order by s_acctbal desc, n_name, s_name, p_partkey LIMIT 100; Q3: -- using 1471398061 as a seed to the RNG select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from customer, orders, lineitem where c_mktsegment = 'HOUSEHOLD' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '1995-03-03' and l_shipdate > date '1995-03-03' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate LIMIT 10; Q4: -- using 1471398061 as a seed to the RNG select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '1993-06-01' and o_orderdate < date '1993-06-01' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority LIMIT 1; Q5: -- using 1471398061 as a seed to the RNG select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'AMERICA' and o_orderdate >= date '1993-01-01' and o_orderdate < date '1993-01-01' + interval '1' year group by n_name order by revenue desc LIMIT 1; Q6: -- using 1471398061 as a seed to the RNG select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date '1993-01-01' and l_shipdate < date '1993-01-01' + interval '1' year and l_discount between 0.02 - 0.01 and 0.02 + 0.01 and l_quantity < 24 LIMIT 1; Q7: -- using 1471398061 as a seed to the RNG select supp_nation, cust_nation, l_year, sum(volume) as revenue from ( select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = 'BRAZIL' and n2.n_name = 'INDONESIA') or (n1.n_name = 'INDONESIA' and n2.n_name = 'BRAZIL') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year LIMIT 1; Q8: -- using 1471398061 as a seed 
to the RNG select o_year, sum(case when nation = 'INDONESIA' then volume else 0 end) / sum(volume) as mkt_share from ( select extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) as volume, n2.n_name as nation from part, supplier, lineitem, orders, customer, nation n1, nation n2, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'ECONOMY BURNISHED BRASS' ) as all_nations group by o_year order by o_year LIMIT 1; Q9: -- using 1471398061 as a seed to the RNG select nation, o_year, sum(amount) as sum_profit from ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%powder%' ) as profit group by nation, o_year order by nation, o_year desc LIMIT 1;-- using 1471398061 as a seed to the RNG Q10 select c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '1993-06-01' and o_orderdate < date '1993-06-01' + interval '3' month and l_returnflag = 'R' and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue desc LIMIT 20; Q11 -- using 1471398061 as a seed to the RNG select ps_partkey, sum(ps_supplycost * ps_availqty) as value from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'PERU' group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.0001000000 from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'PERU' ) order by value desc LIMIT 1;-- using 1471398061 as a seed to the RNG Q12 select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from orders, lineitem where o_orderkey = l_orderkey and l_shipmode in ('REG AIR', 'RAIL') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '1993-01-01' and l_receiptdate < date '1993-01-01' + interval '1' year group by l_shipmode order by l_shipmode LIMIT 1;-- using 1471398061 as a seed to the RNG Q13 select c_count, count(*) as custdist from ( select c_custkey, count(o_orderkey) from customer left outer join orders on c_custkey = o_custkey and o_comment not like '%pending%packages%' group by c_custkey ) as c_orders (c_custkey, c_count) group by c_count order by custdist desc, c_count desc Q14 LIMIT 1;-- using 1471398061 as a seed to the RNG select 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue from lineitem, part where l_partkey = p_partkey and l_shipdate >= date '1993-09-01' and l_shipdate < date '1993-09-01' + interval '1' month LIMIT 1; 
Q15 -- using 1471398061 as a seed to the RNG create view revenue0 (supplier_no, total_revenue) as select l_suppkey, sum(l_extendedprice * (1 - l_discount)) from lineitem where l_shipdate >= date '1994-11-01' and l_shipdate < date '1994-11-01' + interval '3' month group by l_suppkey; select s_suppkey, s_name, s_address, s_phone, total_revenue from supplier, revenue0 where s_suppkey = supplier_no and total_revenue = ( select max(total_revenue) from revenue0 ) order by s_suppkey LIMIT 1; Q16 drop view revenue0;-- using 1471398061 as a seed to the RNG select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#22' and p_type not like 'STANDARD PLATED%' and p_size in (34, 17, 18, 16, 15, 49, 1, 48) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_size LIMIT 1; Q17: -- using 1471398061 as a seed to the RNG select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part, (SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity FROM lineitem GROUP BY l_partkey) part_agg where p_partkey = l_partkey and agg_partkey = l_partkey and p_brand = 'Brand#21' and p_container = 'JUMBO JAR' and l_quantity < avg_quantity LIMIT 1; Q18 -- using 1471398061 as a seed to the RNG select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 312 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate LIMIT 100;-- using 1471398061 as a seed to the RNG Q19 select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#42' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 7 and l_quantity <= 7 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#22' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 20 and l_quantity <= 20 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#25' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 21 and l_quantity <= 21 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) LIMIT 1; Q20 -- using 1471398061 as a seed to the RNG select s_name, s_address from supplier, nation where s_suppkey in ( select ps_suppkey from partsupp, ( select l_partkey agg_partkey, l_suppkey agg_suppkey, 0.5 * sum(l_quantity) AS agg_quantity from lineitem where l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year group by l_partkey, l_suppkey ) agg_lineitem where agg_partkey = ps_partkey and agg_suppkey = ps_suppkey and ps_partkey in ( select p_partkey from part where p_name like 'forest%' ) and ps_availqty > agg_quantity ) and s_nationkey = n_nationkey and n_name = 'FRANCE' order by s_name LIMIT 1; Q21 -- using 1471398061 as a seed to the RNG select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = 
l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'GERMANY' group by s_name order by numwait desc, s_name LIMIT 100; Q22 -- using 1471398061 as a seed to the RNG select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from ( select substring(c_phone from 1 for 2) as cntrycode, c_acctbal from customer where substring(c_phone from 1 for 2) in ('16', '10', '34', '26', '33', '18', '11') and c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00 and substring(c_phone from 1 for 2) in ('16', '10', '34', '26', '33', '18', '11') ) and not exists ( select * from orders where o_custkey = c_custkey ) ) as custsale group by cntrycode order by cntrycode LIMIT 1;