Getting Started with ClickHouse

Date: 2020-11-30


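Introduction to ClickHouse

ClickHouse is a column-oriented database aimed at OLAP workloads. OLAP workloads have these characteristics:

Data is written infrequently, and when it is written it is written in batches, unlike OLTP, which writes row by row
The vast majority of requests are reads
Query concurrency is low, so ClickHouse is not a fit for high-concurrency business scenarios; ClickHouse itself recommends expecting at most about 100 queries per second
Transactions are not required

Advantages of ClickHouse

To keep data compact, the values of a column are stored with a fixed length, so no length has to be stored alongside each value

It uses a vectorized execution engine (vector engine). For an explanation of what a vector engine is, see:
https://www.infoq.cn/article/columnar-databases-and-vectorization/?itm_source=infoq_en&itm_medium=link_on_en_item&itm_campaign=item_in_other_langs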

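Disadvantages of ClickHouse

Transactions are not fully supported
Data cannot be modified or deleted at high throughput
Because the index is sparse, ClickHouse is not well suited to fetching a single record by key

Performance optimization

To improve insert performance, insert in batches of at least 1000 rows. Running inserts concurrently also speeds up ingestion noticeably.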
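The batching advice above can be sketched in Python. This is only an illustration: the row shape and the name `batched` are assumptions, and the actual INSERT call is left to whichever client library is in use.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # hypothetical row shape, for illustration only


def batched(rows: Iterable[Row], batch_size: int = 1000) -> Iterator[List[Row]]:
    """Yield lists of up to batch_size rows so each INSERT carries many rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


rows = ((i, f"user-{i}") for i in range(2500))
for batch in batched(rows):
    # A real client would send one INSERT per batch here.
    print(len(batch))
```

Each yielded list would become one INSERT statement, so 2500 rows turn into three inserts instead of 2500.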

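Access interfaces

Like Elasticsearch, ClickHouse exposes two ports, one for TCP and one for HTTP. The default TCP port is 9000 and the default HTTP port is 8123. We usually do not talk to these ports directly but go through a client, such as:

Command-line client: connects to ClickHouse over the TCP port, runs basic CRUD operations, and can also import data into ClickHouse
HTTP interface: as with Elasticsearch, statements written in ClickHouse's own syntax are submitted REST-style over HTTP
JDBC driver
ODBC driver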
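As a rough sketch of the HTTP interface, the Python snippet below builds a request that carries a query in the POST body to the default HTTP port. The helper name `build_query_request` and the host `localhost` are assumptions; the request is only constructed here, not sent.

```python
import urllib.request


def build_query_request(sql: str, host: str = "localhost", port: int = 8123) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the SQL text as its body."""
    url = f"http://{host}:{port}/"
    return urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")


request = build_query_request("SELECT 1")
print(request.get_method(), request.full_url)
# To execute it against a running server:
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode("utf-8"))
```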

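Input and output formats

ClickHouse can read many formats as input (INSERT) and can emit a chosen format as output (SELECT).

For example, when inserting data, declare the source format as JSONEachRow

INSERT INTO UserActivity FORMAT JSONEachRow {"PageViews":5, "UserID":"4324182021466249494", "Duration":146,"Sign":-1} {"UserID":"4324182021466249494","PageViews":6,"Duration":185,"Sign":1}

When reading data, request JSONEachRow as the output format

SELECT * FROM UserActivity FORMAT JSONEachRow

Note that these formats are what ClickHouse parses or generates, not its final storage format; internally it still stores data in its own columnar layout. Many formats are supported; see the documentation:
https://clickhouse.yandex/docs/en/interfaces/formats/#native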
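To make the JSONEachRow shape concrete, here is a small Python sketch that serializes rows as one JSON object per line and parses them back. The helper names are assumptions; the rows are the ones from the INSERT example above.

```python
import json
from typing import Dict, Iterable, List


def to_json_each_row(rows: Iterable[Dict[str, object]]) -> str:
    """Serialize rows as one compact JSON object per line."""
    return "\n".join(json.dumps(row, separators=(",", ":")) for row in rows)


def from_json_each_row(text: str) -> List[Dict[str, object]]:
    """Parse newline-delimited JSON objects back into dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


rows = [
    {"UserID": "4324182021466249494", "PageViews": 5, "Duration": 146, "Sign": -1},
    {"UserID": "4324182021466249494", "PageViews": 6, "Duration": 185, "Sign": 1},
]
payload = to_json_each_row(rows)
print("INSERT INTO UserActivity FORMAT JSONEachRow")
print(payload)
```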

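Database engines

ClickHouse lets you create a database whose actual storage is MySQL, so that the tables of that MySQL database can be read and written through ClickHouse. This resembles an external table in Hive, except that here an entire database is attached.

Suppose MySQL holds the following data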

mysql> USE test;
Database changed

mysql> CREATE TABLE `mysql_table` (
    ->   `int_id` INT NOT NULL AUTO_INCREMENT,
    ->   `float` FLOAT NOT NULL,
    ->   PRIMARY KEY (`int_id`));
Query OK, 0 rows affected (0,09 sec)

mysql> insert into mysql_table (`int_id`, `float`) VALUES (1,2);
Query OK, 1 row affected (0,00 sec)

mysql> select * from mysql_table;
+--------+-------+
| int_id | float |
+--------+-------+
|      1 |     2 |
+--------+-------+
1 row in set (0,00 sec)

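Create a database in ClickHouse that connects to the MySQL instance above

CREATE DATABASE mysql_db ENGINE = MySQL('localhost:3306', 'test', 'my_user', 'user_password')

You can then run queries against the MySQL database from within ClickHouse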

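Table engines: the MergeTree family

A table engine determines how a table is stored when it is created. The table engine controls:

How data is written and read, and where it is stored
Which queries are supported
Concurrent data access
Data replication

The MergeTree engine

The engine and its settings are specified when the table is created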

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = MergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]

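This engine stores data in partitions.
On insert, rows that belong to different partitions go into different data parts. ClickHouse merges data parts in the background, and parts from different partitions are never merged together.
A data part consists of many granules, the smallest indivisible unit of data.

A sample configuration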

ENGINE MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity=8192

granule

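A granule is the smallest indivisible set of adjacent rows after the data has been sorted by primary key. The primary-key value of the first row of each granule serves as that granule's mark. Suppose the primary key is (CounterID, Date): the first row of the first granule then has the key (a, 1), and several rows inside one granule may share the same primary-key value.

For convenience, ClickHouse also assigns each granule a mark number, because a number is easier to work with than the actual key value.

This index structure closely resembles a skip list. It is also called a sparse index, because it indexes ranges of sorted data instead of every individual row.

Query example: for CounterID in ('a', 'h'), the server uses the marks of the example table and reads only the mark ranges [0, 3) and [6, 8).

The index_granularity setting, given at table creation, controls how many rows are stored between two marks, which is the size of a granule (the rows between two marks form one granule).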
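The lookup can be sketched in Python as follows. The mark values are illustrative assumptions (the original figure is not reproduced here), only the CounterID part of the key is modeled, and `mark_ranges` is a hypothetical helper, not ClickHouse's actual implementation.

```python
from typing import List, Tuple

# Hypothetical marks: the CounterID of the first row of each granule.
MARKS: List[str] = ["a", "a", "a", "b", "e", "e", "g", "h", "i", "i", "l"]


def mark_ranges(marks: List[str], wanted: str) -> List[Tuple[int, int]]:
    """Return half-open ranges [start, end) of granules that may contain wanted."""
    ranges: List[Tuple[int, int]] = []
    for i, first_key in enumerate(marks):
        next_key = marks[i + 1] if i + 1 < len(marks) else None
        # Granule i holds keys between its own mark and the next mark.
        may_contain = first_key <= wanted and (next_key is None or wanted <= next_key)
        if not may_contain:
            continue
        if ranges and ranges[-1][1] == i:
            ranges[-1] = (ranges[-1][0], i + 1)  # extend the previous range
        else:
            ranges.append((i, i + 1))
    return ranges


for counter_id in ("a", "h"):
    print(counter_id, mark_ranges(MARKS, counter_id))
```

With these marks the sketch selects granules [0, 3) for 'a' and [6, 8) for 'h'. Granule 6 starts with 'g' but is still read, because a sparse index cannot rule out that its tail already contains 'h' rows.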

TTL

An expiration time (TTL) can be set on whole tables and on individual columns

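MergeTree summary

MergeTree is effectively the base class of the MergeTree family of table engines. It defines the storage characteristics shared by the whole family:

Data parts are merged in the background
A sparse index, a skip-list-like structure, is kept over the data
Data can be partitioned

On top of this foundation, a series of specialized MergeTree variants target different use cases:

ReplacingMergeTree: automatically removes rows that have the same primary key (more precisely, the same sorting key)
SummingMergeTree: sums the numeric columns of rows with the same primary key and stores the result as a single row; this is a form of pre-aggregation that reduces storage and speeds up queries
AggregatingMergeTree: aggregates rows with the same primary key according to given rules, reducing storage and speeding up aggregate queries; also a form of pre-aggregation
CollapsingMergeTree: collapses pairs of rows whose columns are mostly identical and which differ in a sign column, for example with the values 1 and -1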
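The effect of two of these engines can be imitated in plain Python. This is a simplified model under stated assumptions: real merges happen in the background at unpredictable times, and the function names and sample rows here are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def replacing_merge(rows: Sequence[Row], key: Sequence[str]) -> List[Row]:
    """Keep only the last-inserted row for each key, ordered by key."""
    latest: Dict[Tuple, Row] = {}
    for row in rows:
        latest[tuple(row[k] for k in key)] = row
    return [latest[k] for k in sorted(latest)]


def summing_merge(rows: Sequence[Row], key: Sequence[str], sum_cols: Sequence[str]) -> List[Row]:
    """Collapse rows with the same key into one row whose sum_cols are totals."""
    totals: Dict[Tuple, Row] = {}
    for row in rows:
        k = tuple(row[c] for c in key)
        if k not in totals:
            totals[k] = dict(row)
        else:
            for col in sum_cols:
                totals[k][col] += row[col]
    return [totals[k] for k in sorted(totals)]


rows = [
    {"page": "/home", "views": 3},
    {"page": "/docs", "views": 1},
    {"page": "/home", "views": 4},
]
print(replacing_merge(rows, key=["page"]))
print(summing_merge(rows, key=["page"], sum_cols=["views"]))
```

In this model the replacing merge keeps the latest /home row (4 views), while the summing merge stores a single /home row with 7 views.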

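The ReplicatedMergeTree engines

Tables created in ClickHouse are not replicated by default. To improve availability, replication has to be enabled; ClickHouse implements data replicas by integrating with ZooKeeper.

Each of the engines described above has a corresponding Replicated engine: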

ReplicatedMergeTree
ReplicatedSummingMergeTree
ReplicatedReplacingMergeTree
ReplicatedAggregatingMergeTree
ReplicatedCollapsingMergeTree
ReplicatedVersionedCollapsingMergeTree
ReplicatedGraphiteMergeTree

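Table engines: the Log engine family

This family targets logging scenarios that continually produce many small tables, each holding only a small amount of data. These engines have the following characteristics:

Data is stored on disk
New data is appended
Writes take a lock and reads have to wait for it, so query performance is limited

Table engines: external data sources

When creating a table, ClickHouse also supports a number of engines backed by external data sources. They appear to work like Hive external tables: only a table-shaped link is created and the data itself stays in the source system. (This remains to be confirmed.)

The external data source table engines include: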

Kafka
MySQL
JDBC
ODBC
HDFS

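SQL syntax

The SAMPLE clause

When creating a table, you can declare sampling on the hash of a column (hashing keeps the sample uniform and pseudo-random). Queries can then process a sample of the data instead of the whole table and extrapolate the result from it. For example, if the table holds 100 people and you want their total score, you can sample 10 of them, sum their scores and multiply the sum by 10. SAMPLE suits workloads that do not need exact results and are very sensitive to query latency.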
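The arithmetic of the 100-person example can be sketched in Python. This only mimics the idea of hash-based sampling: `zlib.crc32` stands in for ClickHouse's hash function, and the ids and scores are made up.

```python
import zlib
from typing import Dict


def in_sample(user_id: str, one_in: int) -> bool:
    """Deterministically keep about 1/one_in of the ids, based on a hash of the id."""
    return zlib.crc32(user_id.encode("utf-8")) % one_in == 0


def estimate_total(scores: Dict[str, float], one_in: int) -> float:
    """Sum the scores of the sampled ids and scale the sum back up by one_in."""
    sampled = sum(score for user_id, score in scores.items() if in_sample(user_id, one_in))
    return sampled * one_in


scores = {f"user-{i}": 60.0 + (i % 40) for i in range(100)}
print("exact:", sum(scores.values()))
print("estimate from ~1/10 sample:", estimate_total(scores, one_in=10))
```

Because membership depends only on the hash of the id, repeated queries see the same sample, and the estimate gets closer to the exact value as the table grows.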

Installation notes

Some tips

Turn off the swap file in production

Disable the swap file for production environments.

Tables that record how the cluster is running

system.metrics, system.events, and system.asynchronous_metrics tables.

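Host environment configuration

CPU frequency scaling

Linux scales the CPU frequency up or down according to load, and this scaling affects ClickHouse performance. The following setting selects the performance governor, which keeps the CPU at its highest frequency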

echo 'performance' | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

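For the governors available on Linux, see the CPU frequency scaling page listed in the references.

Allowing memory overcommit

Linux lets applications request more memory than the machine physically has. The kernel grants the requests on the assumption that not all of the memory is used at once, and with swap enabled it can move rarely used data to disk to make room for data in active use, so to the application it looks as if a very large amount of memory is available. Allowing such oversized requests is called overcommitting memory.

The overcommit behaviour is controlled by a setting with three possible values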

0: The Linux kernel is free to overcommit memory (this is the default), a heuristic algorithm is applied to figure out if enough memory is available.
1: The Linux kernel will always overcommit memory, and never check if enough memory is available. This increases the risk of out-of-memory situations, but also improves memory-intensive workloads.
2: The Linux kernel will not overcommit memory, and only allocate as much memory as defined in overcommit_ratio.

ClickHouse wants as much memory as it can get, so overcommit must stay enabled. Set it as follows

echo 0 | sudo tee /proc/sys/vm/overcommit_memory

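Disabling transparent huge pages

Huge pages are memory pages much larger than the default page size; they speed up memory access by reducing address-translation overhead. To make their use transparent to applications, Linux runs the khugepaged kernel thread, which assembles huge pages automatically; this mechanism is called transparent hugepages. The automatic behaviour can inflate memory usage in a way that looks like a memory leak; see the reference links for the details.

ClickHouse therefore expects this option to be turned off: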

echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
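A small Python sketch for checking the three host settings from this section. The function names and the sample values are assumptions; on a real host the strings would be read from the files named in the comment.

```python
import re
from typing import Dict, List


def active_option(text: str) -> str:
    """Return the bracketed choice from text such as 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else text.strip()


def check_host(values: Dict[str, str]) -> List[str]:
    """Compare raw file contents with the values recommended in this section."""
    problems: List[str] = []
    if values["scaling_governor"].strip() != "performance":
        problems.append("CPU governor is not 'performance'")
    if values["overcommit_memory"].strip() != "0":
        problems.append("vm.overcommit_memory is not 0")
    if active_option(values["transparent_hugepage"]) != "never":
        problems.append("transparent hugepages are not disabled")
    return problems


# Sample contents; on a real host these would be read from
# /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor,
# /proc/sys/vm/overcommit_memory and
# /sys/kernel/mm/transparent_hugepage/enabled.
sample = {
    "scaling_governor": "powersave\n",
    "overcommit_memory": "0\n",
    "transparent_hugepage": "[always] madvise never\n",
}
print(check_host(sample))
```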

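Use as much network bandwidth as possible

If you use IPv6, increase the size of the route cache

Do not install ZooKeeper and ClickHouse on the same hosts

ClickHouse takes as many resources as it can to maintain performance, so if it shares a host with ZooKeeper it will hurt ZooKeeper, lowering its throughput and raising its latency

Enable ZooKeeper log cleanup

By default ZooKeeper does not delete expired snapshot and log files, which over time turns into a time bomb. Change the ZooKeeper configuration to enable the autopurge feature. Yandex's configuration is as follows:

ZooKeeper configuration, zoo.cfg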

# http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=30000
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=10

maxClientCnxns=2000

maxSessionTimeout=60000000
# the directory where the snapshot is stored.
dataDir=/opt/zookeeper/{{ cluster['name'] }}/data
# Place the dataLogDir to a separate physical disc for better performance
dataLogDir=/opt/zookeeper/{{ cluster['name'] }}/logs

autopurge.snapRetainCount=10
autopurge.purgeInterval=1


# To avoid seeks ZooKeeper allocates space in the transaction log file in
# blocks of preAllocSize kilobytes. The default block size is 64M. One reason
# for changing the size of the blocks is to reduce the block size if snapshots
# are taken more often. (Also, see snapCount).
preAllocSize=131072

# Clients can submit requests faster than ZooKeeper can process them,
# especially if there are a lot of clients. To prevent ZooKeeper from running
# out of memory due to queued requests, ZooKeeper will throttle clients so that
# there is no more than globalOutstandingLimit outstanding requests in the
# system. The default limit is 1,000.
#
# ZooKeeper logs transactions to a
# transaction log. After snapCount transactions are written to a log file a
# snapshot is started and a new transaction log file is started. The default
# snapCount is 10,000.
snapCount=3000000

# If this option is defined, requests will be logged to a trace file named
# traceFile.year.month.day.
#traceFile=

# Leader accepts client connections. Default value is "yes". The leader machine
# coordinates updates. For higher update throughput at the slight expense of
# read throughput the leader can be configured to not accept clients and focus
# on coordination.
leaderServes=yes

standaloneEnabled=false
dynamicConfigFile=/etc/zookeeper-{{ cluster['name'] }}/conf/zoo.cfg.dynamic
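As a quick sanity check, the autopurge settings in a zoo.cfg can be read with a few lines of Python. This is a minimal sketch: `parse_properties` and `autopurge_enabled` are hypothetical helpers, and the Jinja-style placeholders of the template above are not handled.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def autopurge_enabled(settings: Dict[str, str]) -> bool:
    """Autopurge runs only when purgeInterval is a positive number of hours."""
    return int(settings.get("autopurge.purgeInterval", "0")) > 0


cfg = """
# excerpt of the configuration above
tickTime=2000
autopurge.snapRetainCount=10
autopurge.purgeInterval=1
"""
settings = parse_properties(cfg)
print(autopurge_enabled(settings), settings["autopurge.snapRetainCount"])
```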

The corresponding JVM parameters

NAME=zookeeper-{{ cluster['name'] }}
ZOOCFGDIR=/etc/$NAME/conf

# TODO this is really ugly
# How to find out, which jars are needed?
# seems, that log4j requires the log4j.properties file to be in the classpath
CLASSPATH="$ZOOCFGDIR:/usr/build/classes:/usr/build/lib/*.jar:/usr/share/zookeeper/zookeeper-3.5.1-metrika.jar:/usr/share/zookeeper/slf4j-log4j12-1.7.5.jar:/usr/share/zookeeper/slf4j-api-1.7.5.jar:/usr/share/zookeeper/servlet-api-2.5-20081211.jar:/usr/share/zookeeper/netty-3.7.0.Final.jar:/usr/share/zookeeper/log4j-1.2.16.jar:/usr/share/zookeeper/jline-2.11.jar:/usr/share/zookeeper/jetty-util-6.1.26.jar:/usr/share/zookeeper/jetty-6.1.26.jar:/usr/share/zookeeper/javacc.jar:/usr/share/zookeeper/jackson-mapper-asl-1.9.11.jar:/usr/share/zookeeper/jackson-core-asl-1.9.11.jar:/usr/share/zookeeper/commons-cli-1.2.jar:/usr/src/java/lib/*.jar:/usr/etc/zookeeper"

ZOOCFG="$ZOOCFGDIR/zoo.cfg"
ZOO_LOG_DIR=/var/log/$NAME
USER=zookeeper
GROUP=zookeeper
PIDDIR=/var/run/$NAME
PIDFILE=$PIDDIR/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
JAVA=/usr/bin/java
ZOOMAIN="org.apache.zookeeper.server.quorum.QuorumPeerMain"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
JMXLOCALONLY=false
JAVA_OPTS="-Xms{{ cluster.get('xms','128M') }} \
    -Xmx{{ cluster.get('xmx','1G') }} \
    -Xloggc:/var/log/$NAME/zookeeper-gc.log \
    -XX:+UseGCLogFileRotation \
    -XX:NumberOfGCLogFiles=16 \
    -XX:GCLogFileSize=16M \
    -verbose:gc \
    -XX:+PrintGCTimeStamps \
    -XX:+PrintGCDateStamps \
    -XX:+PrintGCDetails \
    -XX:+PrintTenuringDistribution \
    -XX:+PrintGCApplicationStoppedTime \
    -XX:+PrintGCApplicationConcurrentTime \
    -XX:+PrintSafepointStatistics \
    -XX:+UseParNewGC \
    -XX:+UseConcMarkSweepGC \
    -XX:+CMSParallelRemarkEnabled"

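Data backup

Besides storing data in ClickHouse, keep a copy in HDFS so that the data can be recovered if ClickHouse loses it.

Configuration files

The default ClickHouse configuration file is /etc/clickhouse-server/config.xml, in which all server settings can be specified.

You can also split the configuration into separate files, for example one for users and one for quotas. These additional files go in

/etc/clickhouse-server/config.d

ClickHouse finally merges all of these into one complete configuration, file-preprocessed.xml

The separate files can override or delete matching elements of the main configuration by marking an element with the replace or remove attribute. The snippet below is a query_masking_rules fragment of the kind that can be placed in such a file (here <replace> is the rule's replacement string, not the override attribute):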

<query_masking_rules>
    <rule>
        <name>hide SSN</name>
        <regexp>\b\d{3}-\d{2}-\d{4}\b</regexp>
        <replace>000-00-0000</replace>
    </rule>
</query_masking_rules>
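What this masking rule does to a query text can be tried out in Python. The sketch uses Python's `re` module as a stand-in for ClickHouse's regular-expression engine, which may differ for other patterns; the table name in the sample query is hypothetical.

```python
import re

# Values copied from the rule above.
RULE_REGEXP = r"\b\d{3}-\d{2}-\d{4}\b"
RULE_REPLACE = "000-00-0000"


def mask(text: str) -> str:
    """Replace every match of the rule's regexp with its replacement string."""
    return re.sub(RULE_REGEXP, RULE_REPLACE, text)


print(mask("SELECT * FROM people WHERE ssn = '123-45-6789'"))
```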

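ClickHouse can also use ZooKeeper as a configuration source, in which case the final generated configuration incorporates the values stored in ZooKeeper.

By default:
users, access rights, settings profiles and quotas are all configured in users.xml

Some best practices

Configuration best practices:
1. Do not write through a Distributed table, to avoid data inconsistency
2. Set background_pool_size to speed up merges, because merge threads come from this thread pool
3. Set max_memory_usage and max_memory_usage_for_all_queries to cap the physical memory ClickHouse uses; if it uses too much, the operating system kills the ClickHouse process
4. Set max_bytes_before_external_sort and max_bytes_before_external_group_by so that sorts and GROUP BY aggregations needing more memory than the limits above spill to disk instead of failing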

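Some pitfalls and how to handle them:
1. Too many parts(304). Merges are processing significantly slower than inserts. This happens when inserts are too frequent and outpace the background merges. The fix is to increase background_pool_size and lower the insert rate. The official advice is "no more than one insert request per second"; in practice it means that each second of writes should touch no more than one part. If the written data spans several partitions the problem is still likely to appear, so the partitioning must be chosen carefully
2. DB::NetException: Connection reset by peer, while reading from socket xxx. Most likely max_memory_usage and max_memory_usage_for_all_queries were not configured, memory usage ran over, and the operating system killed the ClickHouse server
3. Memory limit (for query) exceeded:would use 9.37 GiB (attempt to allocate chunk of 301989888 bytes), maximum: 9.31 GiB. This appears because a memory cap is set on the ClickHouse server: requests over the limit are killed by ClickHouse, while the server itself stays up. In this case set max_bytes_before_external_sort and max_bytes_before_external_group_by so that the disk can be used
4. Replicas and shards in ClickHouse depend on ZooKeeper, so ZooKeeper is a major performance bottleneck. It needs to be well understood and well configured, possibly even with several ZooKeeper clusters backing one ClickHouse cluster
5. SSDs are recommended for both ZooKeeper and ClickHouse to improve performance
Source article: https://mp.weixin.qq.com/s/egzFxUOAGen_yrKclZGVag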
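For the first pitfall, one way to lower the insert rate is to buffer rows on the client and flush them at most about once per second. The Python sketch below shows the idea under stated assumptions: `InsertBuffer` is a hypothetical class, the sink stands in for a real INSERT call, and a fake clock is used so the example runs instantly.

```python
import time
from typing import Callable, List, Sequence


class InsertBuffer:
    """Collect rows and hand them to sink at most once per min_interval seconds."""

    def __init__(self, sink: Callable[[List[Sequence]], None], min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._sink = sink
        self._min_interval = min_interval
        self._clock = clock
        self._rows: List[Sequence] = []
        self._last_flush = clock()

    def add(self, row: Sequence) -> None:
        """Buffer one row and flush if enough time has passed since the last flush."""
        self._rows.append(row)
        if self._clock() - self._last_flush >= self._min_interval:
            self.flush()

    def flush(self) -> None:
        """Send all buffered rows as a single batch."""
        if self._rows:
            self._sink(self._rows)
            self._rows = []
        self._last_flush = self._clock()


# Demonstration with a fake clock so the example runs instantly.
now = [0.0]
batches: List[List[Sequence]] = []
buffer = InsertBuffer(batches.append, min_interval=1.0, clock=lambda: now[0])
for i in range(5):
    buffer.add((i,))
    now[0] += 0.4  # pretend 0.4 s pass between rows
buffer.flush()
print([len(b) for b in batches])
```

Five rows arriving 0.4 s apart end up in two batches instead of five separate inserts. A production version would also need a timer-driven flush so that rows are not held indefinitely when traffic stops.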

References

https://clickhouse.yandex/docs/en/operations/tips/

http://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/

https://blog.nelhage.com/post/transparent-hugepages/

https://wiki.archlinux.org/index.php/CPU_frequency_scaling
