Database Experiment - 1. TPC-H Data Generation and Import

1. TPC-H Data Generation and Import

Environment: PostgreSQL 12
Parameters: ScaleFactor = 1 (about 1 GB), QuerySeed = 20190909

The finished data and queries are available here:
Link: https://pan.baidu.com/s/1-2VcQcrSZhz1yFd1Cq4m9Q Extraction code: q8sj

1.1. Generating the data

Reference: the tutorial "Importing TPC-H Data into PostgreSQL" (TPC-H數據導入postgresql教程).

Edit dbgen/makefile.suite and set the following variables:

CC = gcc
DATABASE = SQLSERVER
MACHINE = LINUX
WORKLOAD = TPCH

$ # make -f makefile.suite clean

$ make -f makefile.suite

$ ./dbgen -s 1 -f # ScaleFactor = 1 (about 1 GB); -f overwrites existing files
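The four makefile settings can also be applied without opening an editor. The following is a minimal sketch, assuming GNU sed and assuming the template contains one assignment line per variable; configure_makefile is a hypothetical helper name, and the scratch file only imitates the empty assignments in the shipped template.

```shell
# Sketch: set the four makefile variables with sed instead of a text editor.
# configure_makefile is a hypothetical helper, not part of the TPC-H kit.
configure_makefile() {
    sed -i \
        -e 's/^CC[[:space:]]*=.*/CC = gcc/' \
        -e 's/^DATABASE[[:space:]]*=.*/DATABASE = SQLSERVER/' \
        -e 's/^MACHINE[[:space:]]*=.*/MACHINE = LINUX/' \
        -e 's/^WORKLOAD[[:space:]]*=.*/WORKLOAD = TPCH/' \
        "$1"
}

# Demo on a scratch file that imitates the unset assignments (assumed layout).
demo=$(mktemp)
printf 'CC      =\nDATABASE=\nMACHINE =\nWORKLOAD =\n' > "$demo"
configure_makefile "$demo"
cat "$demo"
```

On the real kit the call would be configure_makefile dbgen/makefile.suite. Each pattern is anchored at the start of the line, so variables such as CFLAGS are left untouched.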

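The dbgen directory now contains the following files (dss.ddl and dss.ri ship with the kit; the .tbl files are generated):

dss.ddl # table definitions
dss.ri  # primary key and foreign key definitions

# Data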
customer.tbl
lineitem.tbl
nation.tbl
orders.tbl
partsupp.tbl
part.tbl
region.tbl
supplier.tbl
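Before importing, it can be useful to confirm that dbgen produced complete files. The sketch below compares the line count of a .tbl file with the row count the TPC-H specification gives for scale factor 1. The helper names expected_rows and check_tbl are hypothetical, and the demo runs on a small scratch file rather than on real dbgen output.

```shell
# Row counts at scale factor 1, as listed in the TPC-H specification.
expected_rows() {
    case "$1" in
        customer) echo 150000 ;;
        lineitem) echo 6001215 ;;
        nation)   echo 25 ;;
        orders)   echo 1500000 ;;
        part)     echo 200000 ;;
        partsupp) echo 800000 ;;
        region)   echo 5 ;;
        supplier) echo 10000 ;;
        *)        return 1 ;;
    esac
}

# check_tbl DIR TABLE: compare the line count of DIR/TABLE.tbl with the expected count.
check_tbl() {
    want=$(expected_rows "$2") || { echo "$2: unknown table"; return 1; }
    [ -f "$1/$2.tbl" ] || { echo "$2: file not found"; return 1; }
    have=$(wc -l < "$1/$2.tbl" | tr -d ' ')
    if [ "$have" -eq "$want" ]; then
        echo "$2: ok ($have rows)"
    else
        echo "$2: expected $want rows, found $have"
        return 1
    fi
}

# Demo on a scratch directory holding a five-line stand-in for region.tbl.
demo=$(mktemp -d)
printf '0|AFRICA|x\n1|AMERICA|x\n2|ASIA|x\n3|EUROPE|x\n4|MIDDLE EAST|x\n' > "$demo/region.tbl"
check_tbl "$demo" region
```

From the dbgen directory, calling check_tbl . lineitem (and likewise for the other seven tables) checks the real output.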

Data preprocessing

sed -i 's/|$//' *.tbl # strip the trailing delimiter from every line
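The reason for this step is that dbgen ends every row with a '|', which PostgreSQL's COPY would read as one extra, empty column. A small sketch on a made-up two-row file (the rows only imitate the shape of region.tbl) shows the field count before and after:

```shell
# Scratch file with two rows in the dbgen style: three values plus a trailing '|'.
demo=$(mktemp)
printf '0|AFRICA|some comment|\n1|AMERICA|another comment|\n' > "$demo"

# Count '|'-separated fields per row before stripping (the last field is empty).
awk -F'|' '{ print "before:", NF }' "$demo"

# Same substitution as in the text, applied to the scratch file (GNU sed assumed).
sed -i 's/|$//' "$demo"

# The field count now matches the three columns of the table.
awk -F'|' '{ print "after:", NF }' "$demo"
cat "$demo"
```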

1.2. Importing the data

First import the table definitions by running dss.ddl once in psql:

\i /home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/dss.ddl

Then import the table data:

chmod 777 *.tbl # let the PostgreSQL server process read the files
copy nation from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/nation.tbl' with DELIMITER as '|';
copy part from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/part.tbl' with DELIMITER as '|';
copy region from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/region.tbl' with DELIMITER as '|';
copy partsupp from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/partsupp.tbl' with DELIMITER as '|';
copy customer from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/customer.tbl' with DELIMITER as '|';
copy supplier from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/supplier.tbl' with DELIMITER as '|';
copy lineitem from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/lineitem.tbl' with DELIMITER as '|';
copy orders from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/orders.tbl' with DELIMITER as '|';

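An alternative import method (the one above did not work for me; this one did). A server-side COPY ... FROM 'file' is read by the server process and requires superuser rights or membership in pg_read_server_files, whereas COPY ... FROM stdin streams the data from the client: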

cat nation.tbl | psql -U tpch -d tpch -c "copy nation from stdin with DELIMITER as '|';"
cat part.tbl | psql -U tpch -d tpch -c "copy part from stdin with DELIMITER as '|';"
cat region.tbl | psql -U tpch -d tpch -c "copy region from stdin with DELIMITER as '|';"
cat partsupp.tbl | psql -U tpch -d tpch -c "copy partsupp from stdin with DELIMITER as '|';"
cat customer.tbl | psql -U tpch -d tpch -c "copy customer from stdin with DELIMITER as '|';"
cat supplier.tbl | psql -U tpch -d tpch -c "copy supplier from stdin with DELIMITER as '|';"
cat lineitem.tbl | psql -U tpch -d tpch -c "copy lineitem from stdin with DELIMITER as '|';"
cat orders.tbl | psql -U tpch -d tpch -c "copy orders from stdin with DELIMITER as '|';"
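The eight commands differ only in the table name, so they can be generated by a loop. The sketch below prints the commands instead of running them, which makes it safe to inspect first; print_load_commands is a hypothetical helper, and the user and database name tpch are taken from the commands above. Parent tables are listed before their children, which only matters if foreign keys already exist at load time.

```shell
# print_load_commands: emit one psql invocation per table, parents before children.
# Hypothetical helper; it prints the commands rather than executing them.
print_load_commands() {
    for t in region nation part supplier partsupp customer orders lineitem; do
        printf '%s\n' "psql -U tpch -d tpch -c \"copy $t from stdin with delimiter as '|'\" < $t.tbl"
    done
}

print_load_commands
```

To actually run the import from the dbgen directory, the output can be piped to a shell: print_load_commands | sh.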

1.3. Adding foreign keys

Modify dss.ri as follows, then execute it.

-- For table REGION
ALTER TABLE REGION
ADD PRIMARY KEY (R_REGIONKEY);

-- For table NATION
ALTER TABLE NATION
ADD PRIMARY KEY (N_NATIONKEY);

ALTER TABLE NATION
ADD FOREIGN KEY (N_REGIONKEY) references REGION;

COMMIT WORK;

-- For table PART
ALTER TABLE PART
ADD PRIMARY KEY (P_PARTKEY);

COMMIT WORK;

-- For table SUPPLIER
ALTER TABLE SUPPLIER
ADD PRIMARY KEY (S_SUPPKEY);

ALTER TABLE SUPPLIER
ADD FOREIGN KEY (S_NATIONKEY) references NATION;

COMMIT WORK;

-- For table PARTSUPP
ALTER TABLE PARTSUPP
ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY);

COMMIT WORK;

-- For table CUSTOMER
ALTER TABLE CUSTOMER
ADD PRIMARY KEY (C_CUSTKEY);

ALTER TABLE CUSTOMER
ADD FOREIGN KEY (C_NATIONKEY) references NATION;

COMMIT WORK;

-- For table LINEITEM
ALTER TABLE LINEITEM
ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER);

COMMIT WORK;

-- For table ORDERS
ALTER TABLE ORDERS
ADD PRIMARY KEY (O_ORDERKEY);

COMMIT WORK;

-- For table PARTSUPP
ALTER TABLE PARTSUPP
ADD FOREIGN KEY (PS_SUPPKEY) references SUPPLIER;

COMMIT WORK;

ALTER TABLE PARTSUPP
ADD FOREIGN KEY (PS_PARTKEY) references PART;

COMMIT WORK;

-- For table ORDERS
ALTER TABLE ORDERS
ADD FOREIGN KEY (O_CUSTKEY) references CUSTOMER;

COMMIT WORK;

-- For table LINEITEM
ALTER TABLE LINEITEM
ADD FOREIGN KEY (L_ORDERKEY)  references ORDERS;

COMMIT WORK;

ALTER TABLE LINEITEM
ADD FOREIGN KEY (L_PARTKEY,L_SUPPKEY) references PARTSUPP;

COMMIT WORK;
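A foreign key only accepts child values that exist in the parent table. As an illustration of what the constraints above enforce, the following sketch checks one such relationship directly on the flat files with awk. The helper orphan_keys is hypothetical, and the demo rows are made up; the row with region key 9 is deliberately invalid.

```shell
# orphan_keys CHILD CHILD_COL PARENT PARENT_COL
# Print every value in CHILD's column that has no match in PARENT's column.
orphan_keys() {
    awk -F'|' -v ccol="$2" -v pcol="$4" '
        NR == FNR { seen[$pcol] = 1; next }   # first file (parent): remember its keys
        !($ccol in seen) { print $ccol }      # second file (child): report unmatched keys
    ' "$3" "$1"
}

# Demo: N_REGIONKEY is column 3 of nation.tbl, R_REGIONKEY is column 1 of region.tbl.
demo=$(mktemp -d)
printf '0|AFRICA|c\n1|AMERICA|c\n' > "$demo/region.tbl"
printf '0|ALGERIA|0|c\n1|ARGENTINA|1|c\n2|NOWHERE|9|c\n' > "$demo/nation.tbl"
orphan_keys "$demo/nation.tbl" 3 "$demo/region.tbl" 1
```

On real dbgen output the function should print nothing, since the generated data is referentially consistent.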

2. Checking the result

In the psql client, \d+ returns:

tpch=> \d+
                      List of relations
 Schema |   Name   | Type  | Owner |    Size    | Description 
--------+----------+-------+-------+------------+-------------
 public | customer | table | tpch  | 28 MB      | 
 public | lineitem | table | tpch  | 879 MB     | 
 public | nation   | table | tpch  | 8192 bytes | 
 public | orders   | table | tpch  | 204 MB     | 
 public | part     | table | tpch  | 32 MB      | 
 public | partsupp | table | tpch  | 136 MB     | 
 public | region   | table | tpch  | 8192 bytes | 
 public | supplier | table | tpch  | 1800 kB    | 
(8 rows)

3. Generating the queries

Copy dists.dss and queries/*.sql into the same directory, then run the following:

mkdir -p query # output directory for the generated queries
for i in {1..22}
do
    ./qgen -r 20190909 $i > query/q$i.sql # 20190909 is the random seed
done
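Because the loop redirects output unconditionally, a failed qgen run still leaves an empty file behind. A minimal sketch for catching that, assuming the q1.sql to q22.sql naming used above (missing_queries is a hypothetical helper, and the demo files are placeholders rather than real qgen output):

```shell
# missing_queries DIR: print the name of every q1.sql .. q22.sql that is absent or empty.
missing_queries() {
    n=1
    while [ "$n" -le 22 ]; do
        [ -s "$1/q$n.sql" ] || echo "q$n.sql"
        n=$((n + 1))
    done
}

# Demo: 21 placeholder files plus one empty file standing in for a failed run.
demo=$(mktemp -d)
i=1
while [ "$i" -le 21 ]; do
    echo 'select 1;' > "$demo/q$i.sql"
    i=$((i + 1))
done
: > "$demo/q22.sql"
missing_queries "$demo"
```

After the real generation step, missing_queries query should print nothing.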