ClickHouse Basics

Date: 2019-11-21

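ClickHouse is a columnar database management system (columnar DBMS) for online analytical processing (OLAP).
Traditional databases serve well while the data set is small, the indexes fit in memory, and the cache hit rate stays high. Unfortunately, that ideal state ends as the business grows, and queries get slower and slower. You can add memory or buy faster disks (vertical scaling), but that only postpones the underlying problem. If what you need is fast query results, ClickHouse may be the answer.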

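Typical use cases:
1. The vast majority of requests are reads
2. Data is updated in large batches (more than 1,000 rows) rather than row by row, or is not updated at all
3. Data is only appended to the database and never needs to be modified
4. Reads extract a large number of rows but use only a small subset of the columns
5. Tables are "wide", meaning they contain many columns
6. Queries are relatively infrequent (usually hundreds of queries per second per server, or fewer)
7. For simple queries, a latency of around 50 ms is acceptable
8. Column values are small numbers and short strings (for example, 60 bytes per URL)
9. High throughput is needed when processing a single query (up to billions of rows per second per server)
10. Transactions are not needed
11. Data consistency requirements are low
12. Each query reads one large table; all other tables involved are small
13. The query result is significantly smaller than the source data, that is, the data is filtered or aggregated, and the result fits in the memory of a single server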

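Correspondingly, ClickHouse has its own limitations (in the versions current at the time of writing):

1. No true delete/update support and no transactions (hopefully supported in later versions)
2. No secondary indexes
3. Limited SQL support; the join implementation is unconventional
4. No window functions
5. Metadata management requires manual maintenance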

Common SQL syntax

-- list databases
show databases;

-- list the tables in the current database
show tables;

-- create a database
create database test;

-- drop a table
drop table if exists test.t1;

-- create a first table
create /*temporary*/ table /*if not exists*/ test.m1 (
 id UInt16
,name String
) ENGINE = Memory
;
-- insert test data
insert into test.m1 (id, name) values (1, 'abc'), (2, 'bbbb');

-- query
select * from test.m1;

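Default values

In ClickHouse every column always has a default value. If none is specified explicitly, the default depends on the column type:

Numeric types: 0
String: empty string
Array: empty array
Date: 0000-00-00
DateTime: 0000-00-00 00:00:00
Note: NULL is not supported unless the column is declared with a Nullable type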

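Data types

1. Integers: UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64
Range: types with the U prefix hold 0 to 2^N-1; types without it hold -2^(N-1) to 2^(N-1)-1
2. Enums: Enum8, Enum16
For example Enum('hello'=1,'test'=-1). Enums map onto signed integers, so negative values are allowed
3. Strings: FixedString(N), String
N is the fixed length in bytes, not in characters. In UTF-8 a Chinese character typically takes 3 bytes, and in GBK 2 bytes. String can replace types such as VARCHAR, BLOB, and CLOB
4. Date and time: Date
5. Arrays: Array(T)
T can be any basic type, including Array itself, although multidimensional arrays are officially discouraged
6. Tuples: Tuple
7. Nested structures: Nested(name1 Type1,name2 Type2,...)
A structure somewhat like a map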
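
The integer ranges and the byte-based length of FixedString(N) can be checked with a few lines of Python. This is only an illustrative sketch: the helper name int_range is made up for this example and is not part of ClickHouse.

```python
from typing import Tuple


def int_range(bits: int, signed: bool) -> Tuple[int, int]:
    """Return (min, max) for an N-bit integer type (hypothetical helper)."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


print("UInt8 :", int_range(8, signed=False))
print("Int16 :", int_range(16, signed=True))

# FixedString(N) counts bytes, so compare character count with encoded length.
for text in ("abc", "中文"):
    utf8_len = len(text.encode("utf-8"))
    gbk_len = len(text.encode("gbk"))
    print(text, "chars:", len(text), "utf-8 bytes:", utf8_len, "gbk bytes:", gbk_len)
```

A two-character Chinese string therefore needs FixedString(6) when stored as UTF-8.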

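Materialized columns

Declaring a column with a MATERIALIZED expression makes it a materialized column: its value cannot be supplied by an insert statement and is always computed. Materialized columns also do not appear in the result of select *: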

drop table if exists test.m2;
create table test.m2 (
 a MATERIALIZED (b+1)
,b UInt16
) ENGINE = Memory;
insert into test.m2 (b) values (1);
select * from test.m2;
select a, b from test.m2;

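Expression columns

An ALIAS expression column resembles a materialized column in that its value cannot be supplied by an insert statement. The difference is that a materialized column actually stores its data (so nothing has to be recomputed at query time), whereas an expression column stores nothing and its expression is evaluated on every query.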

create table test.m3 (a ALIAS (b+1), b UInt16) ENGINE = Memory;
insert into test.m3(b) values (1);
select * from test.m3;
select a, b from test.m3;

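Engines

Engines are the core of ClickHouse's design

TinyLog

The simplest engine. Each column is stored in its own compressed file, and indexes are not supported
There is no concurrency control: reading while a write is in progress fails, and concurrent writes corrupt the data.

Use cases:
a. Data that is written once
b. and is only read afterwards.
c. Not suited to large volumes of data; the official recommendation is at most 1 million rows per table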

drop table if exists test.tinylog;
create table test.tinylog (a UInt16, b UInt16) ENGINE = TinyLog;
insert into test.tinylog(a,b) values (7,13);

The data directory /var/lib/clickhouse/data/test/tinylog now looks like this:

├── a.bin
├── b.bin
└── sizes.json

a.bin and b.bin hold the compressed data of the corresponding columns, and sizes.json records the size of each *.bin file

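Log

This engine is essentially the same as TinyLog
Its improvement is an extra __marks.mrk file that records the offset of every data block
This lets a read be split into precise ranges, which makes concurrent reads possible
It still does not support concurrent writes: a write blocks all other reads and writes
Log does not support indexes, and because __marks.mrk is redundant data, a failure during a write leaves the table unusable

Use cases:
Much the same as TinyLog: data that is written once and only read afterwards. It also works for temporary data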

drop table if exists test.log;
create table test.log (a UInt16, b UInt16) ENGINE = Log;
insert into test.log(a,b) values (7,13);

The data directory /var/lib/clickhouse/data/test/log now looks like this:

├── __marks.mrk
├── a.bin
├── b.bin
└── sizes.json

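Memory

An in-memory engine: data is kept in RAM uncompressed, in its raw form, and disappears when the server restarts
Reads run in parallel, and reads and writes block each other only briefly
Indexes are not supported, yet simple queries are extremely fast

Use cases:
a. Testing
b. Workloads that need very high performance on a modest amount of data (up to roughly 100 million rows)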

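Merge

A utility engine that stores no data itself; it only links together a set of tables in a given database.
Reads can then run concurrently and use the indexes of the underlying tables, but the engine does not support writes
When declaring the engine you specify the database and the tables to link: the database name can be an expression, and the table names are given as a regular expression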

create table test.tinylog1 (id UInt16, name String) ENGINE=TinyLog;
create table test.tinylog2 (id UInt16, name String) ENGINE=TinyLog;
create table test.tinylog3 (id UInt16, name String) ENGINE=TinyLog;

insert into test.tinylog1(id, name) values (1, 'tinylog1');
insert into test.tinylog2(id, name) values (2, 'tinylog2');
insert into test.tinylog3(id, name) values (3, 'tinylog3');

use test;
create table test.merge (id UInt16, name String) ENGINE=Merge(currentDatabase(), '^tinylog[0-9]+');
select _table,* from test.merge order by id desc;

┌─_table───┬─id─┬─name─────┐
│ tinylog3 │ 3 │ tinylog3 │
│ tinylog2 │ 2 │ tinylog2 │
│ tinylog1 │ 1 │ tinylog1 │
└──────────┴────┴──────────┘

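Note: _table is an extra virtual column that exists because the Merge engine is used

a. It names the table each row came from, and it does not appear in the table definition
b. select * does not include it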
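
To see which tables the pattern '^tinylog[0-9]+' picks up, the match can be approximated in Python. This is a rough sketch: it assumes the pattern is applied as an unanchored search against each table name, and it uses Python's re module rather than the regex library ClickHouse itself uses. The list of table names is taken from the examples in this article.

```python
import re

# Pattern passed to the Merge engine in the example above.
pattern = re.compile(r"^tinylog[0-9]+")

# Table names created in the test database throughout this article.
tables = ["m1", "tinylog", "log", "tinylog1", "tinylog2", "tinylog3"]

# Keep only the names the pattern matches.
matched = [name for name in tables if pattern.search(name)]
print(matched)
```

The plain tinylog table is skipped because the pattern requires at least one digit after the prefix.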

Distributed

Like Merge, Distributed uses a logical table to access a number of physical tables. The engine declaration looks like this:

Distributed(remote_group, database, table [, sharding_key])

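where:

remote_group is a name defined under the remote_servers parameter in /etc/clickhouse-server/config.xml
database is the database name on each server
table is the table name
sharding_key is a routing expression: either a column name or a function call such as rand(). Together with the weight values in remote_servers, it decides which shard a write goes to

remote_servers in the configuration file: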

<remote_servers>
   <log>
       <shard>
           <weight>1</weight>
           <internal_replication>false</internal_replication>
           <replica>
               <host>172.17.0.3</host>
               <port>9000</port>
           </replica>
       </shard>
       <shard>
           <weight>2</weight>
           <internal_replication>false</internal_replication>
           <replica>
               <host>172.17.0.4</host>
               <port>9000</port>
           </replica>
       </shard>
   </log>
</remote_servers>

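log is the name of a shard group, that is, the remote_group value above
shard is a fixed tag
weight is the weight that sharding_key works with, as mentioned above.
In short, with the configuration above:
The first shard is "chosen" with probability 1 / (1 + 2) and the second with probability 2 / (1 + 2), which is easy to follow. In practice, sharding_key is evaluated against "hit intervals" of the actual number, taken modulo the total weight: the first shard owns the interval [0, 1) and the second owns [1, 1+2), repeating with period 3. If sharding_key is id, a row with id=0 or id=3 always goes to the first shard. If sharding_key is rand(), the system presumably applies the same mapping to the random value, so shard selection becomes probabilistic.
internal_replication defines how writes behave when a shard has several replicas.
If it is false, data is written to every replica, but consistency between them is not guaranteed, so over time the replicas are likely to diverge. If it is true, data is written only to the first writable replica (the "physical table" takes care of the rest).
replica defines each redundant copy; its options include host, port, user, and password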
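
The interval rule for weights can be sketched in Python as follows. This is a model of the description above, not ClickHouse code: the function name pick_shard is invented for the example, and it assumes an integer sharding key.

```python
from typing import List


def pick_shard(key: int, weights: List[int]) -> int:
    """Return the 0-based shard index for an integer sharding key (hypothetical helper)."""
    remainder = key % sum(weights)   # position inside one period of the total weight
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight              # this shard owns [upper - weight, upper)
        if remainder < upper:
            return index
    raise AssertionError("unreachable when all weights are positive")


weights = [1, 2]  # the two <weight> values from the remote_servers example
for key in range(6):
    print("id =", key, "-> shard", pick_shard(key, weights) + 1)
```

With weights 1 and 2, ids 0 and 3 land on the first shard and every other id on the second, which matches the 1/3 and 2/3 split.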

Here is a practical example. First create the physical table on both machines and insert some test data:

create table test.tinylog_d1(id UInt16, name String) ENGINE=TinyLog;
insert into test.tinylog_d1(id, name) values (1, 'Distributed record 1');
insert into test.tinylog_d1(id, name) values (2, 'Distributed record 2');

Create the logical table on one of them:

create table test.tinylog_d (id UInt16, name String) ENGINE=Distributed(log, test, tinylog_d1, id);

-- insert into the logical table and watch how the data is distributed
insert into test.tinylog_d(id, name) values (0, 'main');
insert into test.tinylog_d(id, name) values (1, 'main');
insert into test.tinylog_d(id, name) values (2, 'main');

select name,sum(id),count(id) from test.tinylog_d group by name;

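Note: writes to the logical table are asynchronous and are first buffered on the local file system. Unavailability of the physical tables is not strictly handled, so failed writes can lose data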

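Null

The null engine: anything written to it is discarded, and reads always return an empty result.

Note, however, that although the data itself is not stored, structural and data format constraints still apply just as for a regular table, and you can create views on this engine

Buffer

1. The Buffer engine is like a layer on top of Memory storage (it has no directory on disk either)
2. It acts as a buffer: written data is held in the buffer first and, once a threshold is reached, flushed automatically to another, specified table
3. Like Memory, it has many limitations, such as no indexes
4. Buffer sits in front of another table, and reads from it are also applied to the table behind it. Because of the limitations above, though, we usually read straight from the source table; with sensible configuration the small delay caused by buffering hardly matters
5. A Buffer can also have no table behind it, in which case the data is discarded once the threshold is reached

Some characteristics:

If a single write is too large (too many rows or bytes) and exceeds a max condition, it is written directly to the source table.
When dropping or altering the source table, it is best to drop and recreate the Buffer table.
On a graceful restart the buffered data is flushed to the source table first; on a forced restart the data in the Buffer table is lost.
Even with Buffer, many small writes are far slower than one large write (on the order of thousands of rows versus millions)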

-- create the source table
create table test.mergetree (sdt  Date, id UInt16, name String, point UInt16) ENGINE=MergeTree(sdt, (id, name), 10);
-- create the Buffer table
-- Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)
create table test.mergetree_buffer as test.mergetree ENGINE=Buffer(test, mergetree, 16, 3, 20, 2, 10, 1, 10000);

insert into test.mergetree (sdt, id, name, point) values ('2017-07-10', 1, 'a', 20);
insert into test.mergetree_buffer (sdt, id, name, point) values ('2017-07-10', 1, 'b', 10);
select * from test.mergetree;
select '------';
select * from test.mergetree_buffer;

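database: the database
table: the source table; besides a string constant, an expression can be used here
num_layers: similar to the notion of "partitions"; the min / max thresholds that follow are evaluated independently for each layer. The officially recommended value is 16.
min / max: these settings define the thresholds, namely time (seconds), rows, and size (bytes).

Flush rule: "all min conditions are met, or at least one max condition is met".

With the table created above, all min conditions means: 3 seconds have passed, 2 rows, and 1 byte. A max condition means: 20 seconds, or 10 rows, or 10,000 bytes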
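
The flush rule can be expressed as a small predicate. The sketch below is a model of the rule as described, not ClickHouse source: the names BufferLimits and should_flush are invented, and treating the thresholds as inclusive (>=) is an assumption, so the examples avoid exact boundary values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferLimits:
    """Thresholds of one Buffer layer (hypothetical container)."""
    min_time: int
    max_time: int
    min_rows: int
    max_rows: int
    min_bytes: int
    max_bytes: int


def should_flush(limits: BufferLimits, seconds: float, rows: int, size_bytes: int) -> bool:
    """All min conditions met, or at least one max condition met."""
    all_min = (seconds >= limits.min_time
               and rows >= limits.min_rows
               and size_bytes >= limits.min_bytes)
    any_max = (seconds >= limits.max_time
               or rows >= limits.max_rows
               or size_bytes >= limits.max_bytes)
    return all_min or any_max


# Same numbers as Buffer(test, mergetree, 16, 3, 20, 2, 10, 1, 10000).
limits = BufferLimits(min_time=3, max_time=20, min_rows=2, max_rows=10,
                      min_bytes=1, max_bytes=10000)

print(should_flush(limits, seconds=1, rows=5, size_bytes=200))   # too early, no max hit
print(should_flush(limits, seconds=4, rows=3, size_bytes=50))    # every min condition met
print(should_flush(limits, seconds=1, rows=11, size_bytes=50))   # max_rows exceeded
```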

Set

The Set engine is a little special: it can only be used on the right-hand side of the IN operator, and you cannot select from it

create table test.set(id UInt16, name String) ENGINE=Set;
insert into test.set(id, name) values (1, 'hello');
-- select 1 where (1, 'hello') in test.set; -- the literal defaults to UInt8, so an explicit type conversion is needed
select 1 where (toUInt16(1), 'hello') in test.set;

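Note: a Set table runs entirely in memory, but its data is also saved to disk and loaded into memory at startup. An unexpected interruption or a forced restart can therefore lose data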

Join

TODO

MergeTree

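This engine is the centerpiece of ClickHouse. It supports a two-level index made of a date and a set of primary key columns, and data can be inserted in real time. The index granularity is configurable, and sampling is supported directly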

MergeTree(EventDate, (CounterID, EventDate), 8192)
MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID)), 8192)

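EventDate: the name of a date column
intHash32(UserID): the sampling expression
(CounterID, EventDate): the primary key tuple (expressions are allowed alongside column names); it can also be a single expression
8192: the granularity of the primary key index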

drop table if exists test.mergetree1;
create table test.mergetree1 (sdt  Date, id UInt16, name String, cnt UInt16) ENGINE=MergeTree(sdt, (id, name), 10);

-- the date format apparently has to be yyyy-mm-dd
insert into test.mergetree1(sdt, id, name, cnt) values ('2018-06-01', 1, 'aaa', 10);
insert into test.mergetree1(sdt, id, name, cnt) values ('2018-06-02', 4, 'bbb', 10);
insert into test.mergetree1(sdt, id, name, cnt) values ('2018-06-03', 5, 'ccc', 11);

The directory /var/lib/clickhouse/data/test/mergetree1 now looks like this:

├── 20180601_20180601_1_1_0
│   ├── checksums.txt
│   ├── columns.txt
│   ├── id.bin
│   ├── id.mrk
│   ├── name.bin
│   ├── name.mrk
│   ├── cnt.bin
│   ├── cnt.mrk
│   ├── cnt.idx
│   ├── primary.idx
│   ├── sdt.bin
│   └── sdt.mrk -- stores block offsets
├── 20180602_20180602_2_2_0
│   └── ...
├── 20180603_20180603_3_3_0
│   └── ...
├── format_version.txt
└── detached

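ReplacingMergeTree

1. Adds handling of duplicate data on top of MergeTree, which suits real-time data scenarios
2. Compared with MergeTree, ReplacingMergeTree takes an extra "version column" as its last parameter. Together with the date column it determines which row is the "newest", and older rows are discarded. (This happens during merges, not at insert time, so until a merge runs the duplicates remain stored and show up in queries as usual)
3. The primary key columns identify duplicate rows

-- the version column may be any UInt type, Date, or DateTime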
create table test.replacingmergetree (sdt  Date, id UInt16, name String, cnt UInt16) ENGINE=ReplacingMergeTree(sdt, (name), 10, cnt);

insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-10', 1, 'a', 20);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-10', 1, 'a', 30);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-11', 1, 'a', 20);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-11', 1, 'a', 30);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-11', 1, 'a', 10);

select * from test.replacingmergetree;

-- if the rows have not been merged yet, trigger a merge manually
optimize table test.replacingmergetree;

┌────────sdt─┬─id─┬─name─┬─cnt─┐
│ 2018-06-11 │ 1 │ a │ 30 │
└────────────┴────┴──────┴─────┘
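
Why only one row survives can be traced with a small Python model of the merge. It is a simplified sketch under stated assumptions: the legacy engine syntax partitions by month, the primary key here is (name), the row with the highest version (cnt) wins, and ties go to the row inserted last. The function name replacing_merge is invented for the example.

```python
# The five rows inserted above, in insertion order: (sdt, id, name, cnt).
rows = [
    ("2018-06-10", 1, "a", 20),
    ("2018-06-10", 1, "a", 30),
    ("2018-06-11", 1, "a", 20),
    ("2018-06-11", 1, "a", 30),
    ("2018-06-11", 1, "a", 10),
]


def replacing_merge(rows):
    """Keep one row per (month partition, primary key): highest cnt, last inserted on ties."""
    latest = {}
    for row in rows:
        sdt, _id, name, cnt = row
        key = (sdt[:7], name)          # "2018-06" partition plus the primary key
        kept = latest.get(key)
        if kept is None or cnt >= kept[3]:
            latest[key] = row          # >= lets a later row win a version tie
    return list(latest.values())


print(replacing_merge(rows))
```

All five rows share the same month and name, so they collapse into the last row that carries the highest cnt.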

SummingMergeTree

1. SummingMergeTree sums the data during merges
2. The columns to sum can be specified; for columns that are not summed, the first value encountered is kept

create table test.summingmergetree (sdt Date, name String, a UInt16, b UInt16) ENGINE=SummingMergeTree(sdt, (sdt, name), 8192, (a));

insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-10', 'a', 1, 20);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-10', 'b', 2, 11);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-11', 'b', 3, 18);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-11', 'b', 3, 82);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-11', 'a', 3, 11);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-12', 'c', 1, 35);

-- trigger a merge manually
optimize table test.summingmergetree;

select * from test.summingmergetree;

┌────────sdt─┬─name─┬─a─┬──b─┐
│ 2018-06-10 │ a │ 1 │ 20 │
│ 2018-06-10 │ b │ 2 │ 11 │
│ 2018-06-11 │ a │ 3 │ 11 │
│ 2018-06-11 │ b │ 6 │ 18 │
│ 2018-06-12 │ c │ 1 │ 35 │
└────────────┴──────┴───┴────┘
Note: a summed column cannot be part of the primary key, and if all summed columns of a row are 0, the row is deleted
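
The merged result can be reproduced with a short Python model. This is a sketch of the behaviour described in this section, not the engine's implementation: rows are grouped by the primary key (sdt, name), column a is summed, and column b keeps the first value seen. The function name summing_merge is invented for the example.

```python
# The six rows inserted above, in insertion order: (sdt, name, a, b).
rows = [
    ("2018-06-10", "a", 1, 20),
    ("2018-06-10", "b", 2, 11),
    ("2018-06-11", "b", 3, 18),
    ("2018-06-11", "b", 3, 82),
    ("2018-06-11", "a", 3, 11),
    ("2018-06-12", "c", 1, 35),
]


def summing_merge(rows):
    """Sum column a per (sdt, name); keep the first b encountered."""
    merged = {}
    for sdt, name, a, b in rows:
        key = (sdt, name)
        if key in merged:
            merged[key][0] += a        # summed column
        else:
            merged[key] = [a, b]       # first occurrence fixes b
    return [(sdt, name, a, b) for (sdt, name), (a, b) in sorted(merged.items())]


for row in summing_merge(rows):
    print(row)
```

The two ('2018-06-11', 'b') rows merge into a=6 with b=18, the value from the first of the two.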

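AggregatingMergeTree

AggregatingMergeTree builds on MergeTree and is designed for incremental computation of aggregate function results: during merges it pre-aggregates the data by primary key
Besides ordinary aggregate functions such as sum and uniq, tables of this engine work with two further families: sumState, uniqState and sumMerge, uniqMerge

1. Pre-computing aggregates
This is a trade of space for time, paid for with a reduced number of dimensions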

Suppose the raw data has three dimensions and one measure to be counted

dim1	dim2	dim3	measure1
aaaa	a	1	1
aaaa	b	2	1
bbbb	b	3	1
cccc	b	2	1
cccc	c	1	1
dddd	c	2	1
dddd	a	1	1

Dropping one dimension and aggregating measure1 once with count gives

dim2	dim3	count(measure1)
a	1	2
b	2	2
b	3	1
c	1	1
c	2	1

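2. Incremental computation of aggregates

For tables with the AggregatingMergeTree engine you cannot add data with an ordinary INSERT. Instead you can:
a. use INSERT SELECT to insert data
b. more commonly, create a materialized view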

drop table if exists test.aggregatingmergetree;
create table test.aggregatingmergetree(
sdt Date
, dim1 String
, dim2 String
, dim3 String
, measure1 UInt64
) ENGINE=MergeTree(sdt, (sdt, dim1, dim2, dim3), 8192);

-- create a materialized view that uses AggregatingMergeTree
drop table if exists test.aggregatingmergetree_view;
create materialized view test.aggregatingmergetree_view
ENGINE = AggregatingMergeTree(sdt,(dim2, dim3), 8192)
as
select sdt,dim2, dim3, uniqState(dim1) as uv
from test.aggregatingmergetree
group by sdt,dim2, dim3;

insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'aaaa', 'a', '10', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'aaaa', 'a', '10', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'aaaa', 'b', '20', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'bbbb', 'b', '30', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'cccc', 'b', '20', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'cccc', 'c', '10', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'dddd', 'c', '20', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'dddd', 'a', '10', 1);

-- count(measure1) aggregated by dim2 and dim3
select dim2, dim3, count(measure1) from test.aggregatingmergetree group by dim2, dim3;

-- UV aggregated by dim2
select dim2, uniq(dim1) from test.aggregatingmergetree group by dim2;

-- trigger a merge manually
OPTIMIZE TABLE test.aggregatingmergetree_view;
select * from test.aggregatingmergetree_view;

-- UV per dim2
select dim2, uniqMerge(uv) from test.aggregatingmergetree_view group by dim2 order by dim2;
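
The split between uniqState and uniqMerge can be pictured with plain Python sets. This is a conceptual sketch only: a set stands in for the intermediate aggregate state, whereas the real uniq family uses approximate data structures. The rows are the eight inserted above, reduced to (dim1, dim2, dim3).

```python
from collections import defaultdict

# The eight rows inserted above: (dim1, dim2, dim3).
rows = [
    ("aaaa", "a", "10"), ("aaaa", "a", "10"), ("aaaa", "b", "20"),
    ("bbbb", "b", "30"), ("cccc", "b", "20"), ("cccc", "c", "10"),
    ("dddd", "c", "20"), ("dddd", "a", "10"),
]

# Stand-in for uniqState(dim1) grouped by (dim2, dim3): keep the state, not a number.
states = defaultdict(set)
for dim1, dim2, dim3 in rows:
    states[(dim2, dim3)].add(dim1)

# Stand-in for uniqMerge(uv) grouped by dim2: combine states, then count.
merged = defaultdict(set)
for (dim2, _dim3), state in states.items():
    merged[dim2] |= state

uv_by_dim2 = {dim2: len(state) for dim2, state in sorted(merged.items())}
print(uv_by_dim2)
```

Storing the state instead of a finished count is what allows the view to be re-aggregated later along fewer dimensions without double counting.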

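CollapsingMergeTree

Designed for a workaround way of storing data in OLAP scenarios: given that data can be neither modified nor deleted, the effect of old data is cancelled out arithmetically by "subtracting" it. This solves "final state" questions such as: how many users are online right now?

This "add instead of delete" incremental storage makes aggregation convenient, at the cost of doubled storage, and for use cases that only care about the latest state the intermediate rows are useless

A CollapsingMergeTree table is created much like a MergeTree table, except for one extra, final parameter that names the Sign column (which must be of type Int8)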

create table test.collapsingmergetree(sign Int8, sdt Date, name String, cnt UInt16) ENGINE=CollapsingMergeTree(sdt, (sdt, name), 8192, sign);
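
The "add instead of delete" idea can be sketched in Python. This is a simplified model, not the engine's actual merge algorithm: the rows are hypothetical sample data in the column order of the table above, and the function name collapse is invented for the example.

```python
# Hypothetical rows in the column order (sign, sdt, name, cnt).
rows = [
    (1, "2018-06-10", "online", 5),    # state: 5 users online
    (-1, "2018-06-10", "online", 5),   # cancels the previous state
    (1, "2018-06-10", "online", 8),    # new state: 8 users online
]


def collapse(rows):
    """Per (sdt, name) key, let each -1 row cancel the +1 row written before it."""
    kept = {}
    for sign, sdt, name, cnt in rows:
        stack = kept.setdefault((sdt, name), [])
        if sign == -1 and stack and stack[-1][0] == 1:
            stack.pop()                          # the pair annihilates
        else:
            stack.append((sign, sdt, name, cnt))
    return [row for stack in kept.values() for row in stack]


# Before a merge runs, the current value can still be computed arithmetically.
current = sum(sign * cnt for sign, _sdt, _name, cnt in rows)

print(collapse(rows))
print(current)
```

Both paths give the same answer: the collapsed table holds only the latest state, and summing cnt * sign over the uncollapsed rows yields the same value.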

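Author: darebeat. Link: https://www.jianshu.com/p/a5bf490247ea. Source: Jianshu. Copyright belongs to the author; for reproduction in any form, please contact the author for authorization and cite the source.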
