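ClickHouse has long been well known in the data analytics field; if you have not come across it yet, it is worth a look.
A recent project required ClickHouse as its analytical database, so I used a test environment to run a performance test against a single table of about 600 million rows. The results are recorded here for anyone evaluating technology for very large-scale data analysis.
The test data and test method come from the official ClickHouse Star Schema Benchmark.
After generating the test data according to the official instructions, let us first look at the data volume and disk usage.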
Table | Columns | Rows | Raw size | Compressed size | Compressed / raw (%) |
---|---|---|---|---|---|
supplier | 6 | 200,000 | 11.07 MiB | 7.53 MiB | 68 |
customer | 7 | 3,000,000 | 168.83 MiB | 114.72 MiB | 68 |
part | 8 | 1,400,000 | 34.29 MiB | 24.08 MiB | 70 |
lineorder | 16 | 600,037,902 | 24.03 GiB | 16.67 GiB | 69 |
lineorder_flat | 37 | 688,552,212 | 111.38 GiB | 61.05 GiB | 55 |
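As the table shows, ClickHouse compresses the data substantially: the compressed size is between 55% and 70% of the raw size, and close to 70% for most tables. Smaller data on disk reduces disk usage and improves I/O, which in turn helps overall query performance.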
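The last column of the table can be reproduced from the two size columns. The following is a minimal sketch, assuming the column is the compressed size as a percentage of the raw size; the names `sizes_mib` and `compressed_percent` are illustrative and not part of the original test.

```python
# Raw and compressed sizes copied from the table above, in MiB (1 GiB = 1024 MiB).
sizes_mib = {
    "supplier": (11.07, 7.53),
    "customer": (168.83, 114.72),
    "part": (34.29, 24.08),
    "lineorder": (24.03 * 1024, 16.67 * 1024),
    "lineorder_flat": (111.38 * 1024, 61.05 * 1024),
}


def compressed_percent(raw, compressed):
    """Compressed size as a percentage of the raw size, rounded to an integer."""
    return round(compressed / raw * 100)


ratios = {
    name: compressed_percent(raw, compressed)
    for name, (raw, compressed) in sizes_mib.items()
}

for name, pct in ratios.items():
    print(f"{name}: {pct}% of raw size, {100 - pct}% saved")
```

Under that assumption the computed values agree with the table, so a figure of 68 means the data shrank by about 32%.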
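supplier, customer, part, and lineorder form a simple "supplier-customer-order-region" star schema. lineorder_flat is a wide table built by merging the data along the relationships of this star schema. All analysis runs directly against this wide table, which avoids unnecessary joins and matches how we build analytical tables in our actual work.
Every analytical SQL statement in the performance tests below runs against this wide table, with no joins.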
Q1.1

    SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
    FROM lineorder_flat
    WHERE (toYear(LO_ORDERDATE) = 1993)
      AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3))
      AND (LO_QUANTITY < 25)

    ┌────────revenue─┐
    │ 44652567249651 │
    └────────────────┘

    1 rows in set. Elapsed: 0.242 sec. Processed 91.01 million rows, 728.06 MB (375.91 million rows/s., 3.01 GB/s.)

Rows scanned: 91,010,000 (about 91 million)
Elapsed (seconds): 0.242
Columns queried: 2
Rows returned: 1
Q1.2

    SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
    FROM lineorder_flat
    WHERE (toYYYYMM(LO_ORDERDATE) = 199401)
      AND ((LO_DISCOUNT >= 4) AND (LO_DISCOUNT <= 6))
      AND ((LO_QUANTITY >= 26) AND (LO_QUANTITY <= 35))

    ┌───────revenue─┐
    │ 9624332170119 │
    └───────────────┘

    1 rows in set. Elapsed: 0.040 sec. Processed 7.75 million rows, 61.96 MB (191.44 million rows/s., 1.53 GB/s.)

Rows scanned: 7,750,000 (7.75 million)
Elapsed (seconds): 0.040
Columns queried: 2
Rows returned: 1
Q2.1

    SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND
    FROM lineorder_flat
    WHERE (P_CATEGORY = 'MFGR#12') AND (S_REGION = 'AMERICA')
    GROUP BY year, P_BRAND
    ORDER BY year ASC, P_BRAND ASC

    ┌─sum(LO_REVENUE)─┬─year─┬─P_BRAND───┐
    │     64420005618 │ 1992 │ MFGR#121  │
    │     63389346096 │ 1992 │ MFGR#1210 │
    │     ........... │ .... │ ......... │
    │     39679892915 │ 1998 │ MFGR#128  │
    │     35300513083 │ 1998 │ MFGR#129  │
    └─────────────────┴──────┴───────────┘

    280 rows in set. Elapsed: 8.558 sec. Processed 600.04 million rows, 6.20 GB (70.11 million rows/s., 725.04 MB/s.)

Rows scanned: 600,040,000 (about 600 million)
Elapsed (seconds): 8.558
Columns queried: 3
Rows returned: 280
Q2.2

    SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND
    FROM lineorder_flat
    WHERE ((P_BRAND >= 'MFGR#2221') AND (P_BRAND <= 'MFGR#2228'))
      AND (S_REGION = 'ASIA')
    GROUP BY year, P_BRAND
    ORDER BY year ASC, P_BRAND ASC

    ┌─sum(LO_REVENUE)─┬─year─┬─P_BRAND───┐
    │     66450349438 │ 1992 │ MFGR#2221 │
    │     65423264312 │ 1992 │ MFGR#2222 │
    │     ........... │ .... │ ......... │
    │     39907545239 │ 1998 │ MFGR#2227 │
    │     40654201840 │ 1998 │ MFGR#2228 │
    └─────────────────┴──────┴───────────┘

    56 rows in set. Elapsed: 1.242 sec. Processed 600.04 million rows, 5.60 GB (482.97 million rows/s., 4.51 GB/s.)

Rows scanned: 600,040,000 (about 600 million)
Elapsed (seconds): 1.242
Columns queried: 3
Rows returned: 56
Q3.1

    SELECT C_NATION, S_NATION, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue
    FROM lineorder_flat
    WHERE (C_REGION = 'ASIA') AND (S_REGION = 'ASIA')
      AND (year >= 1992) AND (year <= 1997)
    GROUP BY C_NATION, S_NATION, year
    ORDER BY year ASC, revenue DESC

    ┌─C_NATION──┬─S_NATION──┬─year─┬──────revenue─┐
    │ INDIA     │ INDIA     │ 1992 │ 537778456208 │
    │ INDONESIA │ INDIA     │ 1992 │ 536684093041 │
    │ .....     │ .......   │ .... │ ............ │
    │ CHINA     │ CHINA     │ 1997 │ 525562838002 │
    │ JAPAN     │ VIETNAM   │ 1997 │ 525495763677 │
    └───────────┴───────────┴──────┴──────────────┘

    150 rows in set. Elapsed: 3.533 sec. Processed 546.67 million rows, 5.48 GB (154.72 million rows/s., 1.55 GB/s.)

Rows scanned: 546,670,000 (about 547 million)
Elapsed (seconds): 3.533
Columns queried: 4
Rows returned: 150
Q3.2

    SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue
    FROM lineorder_flat
    WHERE (C_NATION = 'UNITED STATES') AND (S_NATION = 'UNITED STATES')
      AND (year >= 1992) AND (year <= 1997)
    GROUP BY C_CITY, S_CITY, year
    ORDER BY year ASC, revenue DESC

    ┌─C_CITY─────┬─S_CITY─────┬─year─┬────revenue─┐
    │ UNITED ST6 │ UNITED ST6 │ 1992 │ 5694246807 │
    │ UNITED ST0 │ UNITED ST0 │ 1992 │ 5676049026 │
    │ .......... │ .......... │ .... │ .......... │
    │ UNITED ST9 │ UNITED ST9 │ 1997 │ 4836163349 │
    │ UNITED ST9 │ UNITED ST5 │ 1997 │ 4769919410 │
    └────────────┴────────────┴──────┴────────────┘

    600 rows in set. Elapsed: 1.000 sec. Processed 546.67 million rows, 5.56 GB (546.59 million rows/s., 5.56 GB/s.)

Rows scanned: 546,670,000 (about 547 million)
Elapsed (seconds): 1.000
Columns queried: 4
Rows returned: 600
Q4.1

    SELECT toYear(LO_ORDERDATE) AS year, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
    FROM lineorder_flat
    WHERE (C_REGION = 'AMERICA') AND (S_REGION = 'AMERICA')
      AND ((P_MFGR = 'MFGR#1') OR (P_MFGR = 'MFGR#2'))
    GROUP BY year, C_NATION
    ORDER BY year ASC, C_NATION ASC

    ┌─year─┬─C_NATION──────┬────────profit─┐
    │ 1992 │ ARGENTINA     │ 1041983042066 │
    │ 1992 │ BRAZIL        │ 1031193572794 │
    │ .... │ ......        │  ............ │
    │ 1998 │ PERU          │  603980044827 │
    │ 1998 │ UNITED STATES │  605069471323 │
    └──────┴───────────────┴───────────────┘

    35 rows in set. Elapsed: 5.066 sec. Processed 600.04 million rows, 8.41 GB (118.43 million rows/s., 1.66 GB/s.)

Rows scanned: 600,040,000 (about 600 million)
Elapsed (seconds): 5.066
Columns queried: 4
Rows returned: 35
Q4.2

    SELECT toYear(LO_ORDERDATE) AS year, S_NATION, P_CATEGORY, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
    FROM lineorder_flat
    WHERE (C_REGION = 'AMERICA') AND (S_REGION = 'AMERICA')
      AND ((year = 1997) OR (year = 1998))
      AND ((P_MFGR = 'MFGR#1') OR (P_MFGR = 'MFGR#2'))
    GROUP BY year, S_NATION, P_CATEGORY
    ORDER BY year ASC, S_NATION ASC, P_CATEGORY ASC

    ┌─year─┬─S_NATION──────┬─P_CATEGORY─┬───────profit─┐
    │ 1997 │ ARGENTINA     │ MFGR#11    │ 102369950215 │
    │ 1997 │ ARGENTINA     │ MFGR#12    │ 103052774082 │
    │ .... │ .........     │ .......    │ ............ │
    │ 1998 │ UNITED STATES │ MFGR#24    │  60779388345 │
    │ 1998 │ UNITED STATES │ MFGR#25    │  60042710566 │
    └──────┴───────────────┴────────────┴──────────────┘

    100 rows in set. Elapsed: 0.826 sec. Processed 144.42 million rows, 2.17 GB (174.78 million rows/s., 2.63 GB/s.)

Rows scanned: 144,420,000 (about 144 million)
Elapsed (seconds): 0.826
Columns queried: 4
Rows returned: 100
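Query | Brief description | Rows scanned | Rows returned | Columns queried | Elapsed (seconds) |
---|---|---|---|---|---|
Q1.1 | Product, sum, 4 conditions, first run | 91,010,000 | 1 | 2 | 0.242 |
Q1.2 | Q1.1 with 1 more condition | 7,750,000 | 1 | 2 | 0.040 |
Q2.1 | Sum, function, 2-column GROUP BY, 2-column ORDER BY, first run | 600,040,000 | 280 | 3 | 8.558 |
Q2.2 | Q2.1 with 1 more condition | 600,040,000 | 56 | 3 | 1.242 |
Q3.1 | Sum, function, 3-column GROUP BY, 2-column ORDER BY, first run | 546,670,000 | 150 | 4 | 3.533 |
Q3.2 | Q3.1 with different conditions | 546,670,000 | 600 | 4 | 1.000 |
Q4.1 | Subtraction, sum, function, 2-column GROUP BY, 2-column ORDER BY, first run | 600,040,000 | 35 | 4 | 5.066 |
Q4.2 | Q4.1 with 2 more conditions | 144,420,000 | 100 | 4 | 0.826 |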
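In the current hardware and software environment, scanning a little over 600 million rows, the slowest first run of a typical analytical statement returns in about 8 seconds. When the same analytical logic is run again with different conditions, it is noticeably faster and can drop to about 1 second. For a simple column query with no arithmetic or aggregation, a first-run full scan of the 600 million rows generally completes within 2 seconds.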
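The timings can be cross-checked with a little arithmetic. The sketch below derives the scan throughput of each query and the speedup between a first run and its follow-up variant, using the row counts and elapsed times reported in the query outputs; the names `runs`, `throughput_millions`, and `speedup` are illustrative only.

```python
# query -> (rows scanned, elapsed seconds), as reported in the query outputs.
runs = {
    "Q1.1": (91_010_000, 0.242),
    "Q1.2": (7_750_000, 0.040),
    "Q2.1": (600_040_000, 8.558),
    "Q2.2": (600_040_000, 1.242),
    "Q3.1": (546_670_000, 3.533),
    "Q3.2": (546_670_000, 1.000),
    "Q4.1": (600_040_000, 5.066),
    "Q4.2": (144_420_000, 0.826),
}


def throughput_millions(query):
    """Scan throughput in millions of rows per second."""
    rows, seconds = runs[query]
    return rows / seconds / 1e6


def speedup(first, variant):
    """How many times faster the variant ran than the first run."""
    return runs[first][1] / runs[variant][1]


for query in runs:
    print(f"{query}: {throughput_millions(query):.2f} million rows/s")

# Only pairs that scan the same number of rows are compared here.
for first, variant in [("Q2.1", "Q2.2"), ("Q3.1", "Q3.2")]:
    print(f"{first} -> {variant}: {speedup(first, variant):.1f}x faster")
```

The derived throughput differs slightly from the rows/s figures printed by ClickHouse because the row counts used here are already rounded to two decimal places of a million.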