Background:
As data volumes grow, OLAP keeps coming up in discussion. Druid and Kylin can solve the OLAP problem, but both have to be deployed alongside the full Hadoop stack, which is extremely heavyweight; frankly, I can't manage that, so I had to find technology I could. Hence ClickHouse. I first started following ClickHouse back in 2017 and wrote some introductory pieces about it. By 2019 its feature set had gradually matured, so I picked it up again, wrote a sync program to replicate MySQL data into ClickHouse in real time, and finally put it to use in production.
For what ClickHouse is, please consult the official site: https://clickhouse.yandex/
Official ClickHouse benchmarks: https://clickhouse.yandex/benchmark.html
ClickHouse can handle massive data volumes: for a single table beyond ten billion rows you can use a cluster (replication + sharding); for smaller volumes, say one to two billion rows in a single table, a single node is enough to serve queries. Replication requires ZooKeeper; for more on clusters, consult the official documentation.
Single-node deployment (also covered in my earlier articles):
When ClickHouse was first open-sourced in 2016, Ubuntu support was very friendly: a single apt command installed it. Support for CentOS and similar systems was poor; you had to compile it yourself, with no guarantee of success. As the user base grew, CentOS support also became very friendly, with rpm packages that install directly. Altinity has even built a yum repository: add the repo and a yum install completes the job. This is mentioned in the official documentation; see https://clickhouse.yandex/docs/en/getting_started/ and https://github.com/Altinity/clickhouse-rpm-install . In production we run CentOS 7.0. We chose 7.0 because the data sync program is written in Python, and its core dependency, python-mysql-replication, needs a Python 2.7 environment. Also, ClickHouse did not speak the MySQL protocol (the latest ClickHouse versions now do), so to let developers integrate without many code changes we introduced ProxySQL for MySQL protocol compatibility; the ProxySQL build that supports ClickHouse also needs Python 2.7, so we simply went with CentOS 7.0 across the board.
Test environment:
Number of servers: 1
Operating system: CentOS 7.1
Installed services: ClickHouse, MySQL
MySQL is installed here to test syncing data from MySQL into ClickHouse.
ClickHouse installation:
Add the yum repository
curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash
Install with yum
yum install -y clickhouse-server clickhouse-client
Start the service
/etc/init.d/clickhouse-server start
The default data directory is: /var/lib/clickhouse/
Log in and list the databases (the default user is default, with an empty password):
[root@ck-server-01 sync]# clickhouse-client -h 127.0.0.1
ClickHouse client version 19.9.2.4.
Connecting to 127.0.0.1:9000 as user default.
Connected to ClickHouse server version 19.9.2 revision 54421.

ck-server-01 :) show databases;

SHOW DATABASES

┌─name────┐
│ default │
│ system  │
└─────────┘

2 rows in set. Elapsed: 0.003 sec.

ck-server-01 :)
The default database is empty, just like the test database in MySQL; what the system database is for is clear from its name. With that, ClickHouse is deployed. Simple, isn't it?
One more thing: the official documentation makes a few recommendations:
1. Disable transparent huge pages
2. Adjust memory overcommit
3. Disable CPU power-saving mode
echo 'performance' | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 0 > /proc/sys/vm/overcommit_memory
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
Deploy MySQL yourself; it is not covered here. If you want to sync data from MySQL, the binlog format must be ROW, and binlog_row_image must be FULL.
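In my.cnf terms, the two requirements above look like the fragment below (the server-id and log file name are illustrative placeholders; adjust them to your environment):

```ini
[mysqld]
server-id        = 1           # any unique server id (placeholder)
log_bin          = mysql-bin   # binary logging must be enabled
binlog_format    = ROW         # row-based replication events are required
binlog_row_image = FULL        # log complete before/after row images
```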
Install the packages the sync program depends on. The program can run on the ClickHouse server or on a separate machine. It is started with PyPy, so the packages must be installed into the PyPy environment.
yum -y install pypy-libs pypy pypy-devel
wget https://bootstrap.pypa.io/get-pip.py
pypy get-pip.py
/usr/lib64/pypy-5.0.1/bin/pip install MySQL-python
/usr/lib64/pypy-5.0.1/bin/pip install mysql-replication
/usr/lib64/pypy-5.0.1/bin/pip install clickhouse-driver
/usr/lib64/pypy-5.0.1/bin/pip install redis
The redis module is installed because the synced binlog position can be stored in Redis; the program also supports recording it in a file.
ProxySQL installation (mainly so that ClickHouse speaks the MySQL protocol): download ProxySQL from https://github.com/sysown/proxysql/releases and choose the package with "clickhouse" in its name; otherwise ClickHouse will not be supported.
ProxySQL installation and configuration
rpm -ivh proxysql-2.0.3-1-clickhouse-centos7.x86_64.rpm
Start it (it must be started exactly like this, otherwise ClickHouse support is off):
proxysql --clickhouse-server
Log in to ProxySQL and set up an account:
mysql -uadmin -padmin -h127.0.0.1 -P6032
INSERT INTO clickhouse_users VALUES ('clicku','clickp',1,100);
LOAD CLICKHOUSE USERS TO RUNTIME;
SAVE CLICKHOUSE USERS TO DISK;
Connect to ClickHouse through ProxySQL:
[root@ck-server-01 sync]# mysql -u clicku -pclickp -h 127.0.0.1 -P6090
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 28356
Server version: 5.5.30 (ProxySQL ClickHouse Module)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+---------+
| name    |
+---------+
| default |
| system  |
+---------+
Syncing MySQL data to ClickHouse
MySQL has a database yayun containing a table tb1; we will sync this table to ClickHouse.
mysql> use yayun;
Database changed
mysql> show create table tb1\G
*************************** 1. row ***************************
       Table: tb1
Create Table: CREATE TABLE `tb1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `pay_money` decimal(20,2) NOT NULL DEFAULT '0.00',
  `pay_day` date NOT NULL,
  `pay_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
1. Create the database and table in ClickHouse.
ck-server-01 :) create database yayun;

CREATE DATABASE yayun

Ok.

0 rows in set. Elapsed: 0.021 sec.

ck-server-01 :)
2. Create the table. ClickHouse's table DDL and column types are completely different from MySQL's; with few columns you can write it by hand, but with many columns that gets painful, so you can use ClickHouse's built-in command for importing data from MySQL to create the table. Before creating it, grant privileges, because the sync program also pulls data by impersonating a replica.
GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'ch_repl'@'127.0.0.1' identified by '123';
3. Log in to ClickHouse and create the table
ck-server-01 :) use yayun;

USE yayun

Ok.

0 rows in set. Elapsed: 0.001 sec.

ck-server-01 :) CREATE TABLE tb1
:-] ENGINE = MergeTree
:-] PARTITION BY toYYYYMM(pay_time)
:-] ORDER BY (pay_time) AS
:-] SELECT *
:-] FROM mysql('127.0.0.1:3306', 'yayun', 'tb1', 'ch_repl', '123') ;

CREATE TABLE tb1
ENGINE = MergeTree
PARTITION BY toYYYYMM(pay_time)
ORDER BY pay_time AS
SELECT *
FROM mysql('127.0.0.1:3306', 'yayun', 'tb1', 'ch_repl', '123')

Ok.

0 rows in set. Elapsed: 0.031 sec.
This uses the MergeTree engine, the most powerful engine in ClickHouse: it handles massive data volumes and supports indexes, partitions, and updates/deletes. PARTITION BY toYYYYMM(pay_time) means partition by pay_time at monthly granularity. ORDER BY (pay_time) means rows are stored sorted by pay_time, which also serves as the index. Note that if the MySQL table already contains data, this CREATE TABLE pulls that data into ClickHouse as well; a common trick is to add LIMIT 1 and then adjust the table structure afterwards. If the statement above raised no errors, let's look at the table structure inside ClickHouse:
ck-server-01 :) show create table tb1;

SHOW CREATE TABLE tb1

┌─statement───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE yayun.tb1 (`id` UInt32, `pay_money` String, `pay_day` Date, `pay_time` DateTime) ENGINE = MergeTree PARTITION BY toYYYYMM(pay_time) ORDER BY pay_time SETTINGS index_granularity = 8192 │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

1 rows in set. Elapsed: 0.002 sec.
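Notice in the output above that MySQL's decimal(20,2) column came across as String, and the unsigned int as UInt32. When you have to write the DDL by hand for many columns, a small mapping helper can save typing. The sketch below is illustrative only and covers just the types in this table; it is not an exhaustive or authoritative MySQL-to-ClickHouse mapping:

```python
# Illustrative MySQL -> ClickHouse type mapping for the columns used here.
# Not exhaustive: real schemas need case-by-case decisions.
def mysql_to_clickhouse_type(mysql_type):
    t = mysql_type.lower()
    if t.startswith('int') and 'unsigned' in t:
        return 'UInt32'
    if t.startswith('int'):
        return 'Int32'
    if t.startswith('decimal'):
        return 'String'   # matches the SHOW CREATE above on this version
    if t == 'date':
        return 'Date'
    if t == 'datetime':
        return 'DateTime'
    raise ValueError('unmapped type: %s' % mysql_type)

columns = [('id', 'int(10) unsigned'),
           ('pay_money', 'decimal(20,2)'),
           ('pay_day', 'date'),
           ('pay_time', 'datetime')]
ddl = ', '.join('`%s` %s' % (name, mysql_to_clickhouse_type(t))
                for name, t in columns)
print(ddl)  # `id` UInt32, `pay_money` String, `pay_day` Date, `pay_time` DateTime
```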
Here index_granularity = 8192 is the index granularity; unless the data reaches tens of billions of rows, there is usually no need to change it. With the table structure created, configure the sync program's configuration file, metainfo.conf:
[root@ck-server-01 sync]# cat metainfo.conf
# sync data from here
[master_server]
host='127.0.0.1'
port=3306
user='ch_repl'
passwd='123'
server_id=101

# redis connection info, used to store the binlog position
[redis_server]
host='127.0.0.1'
port=6379
passwd='12345'
log_pos_prefix='log_pos_'

# record the log position in a file
[log_position]
file='./repl_pos.log'

# clickhouse server info; synced data is written here
[clickhouse_server]
host=127.0.0.1
port=9000
passwd=''
user='default'

# column case: 1 upper, 0 lower
column_lower_upper=0

# databases to sync
[only_schemas]
schemas='yayun'

# tables to sync
[only_tables]
tables='tb1'

# skip DML statements (update, delete) for specific tables
[skip_dmls_sing]
skip_delete_tb_name = ''
skip_update_tb_name = ''

# skip DML statements (update, delete) for all tables
[skip_dmls_all]
#skip_type = 'delete'
#skip_type = 'delete,update'
skip_type = ''

[bulk_insert_nums]
# commit once every this many records
insert_nums=10
# sync every this many seconds; negative disables, unit: seconds
interval=60

# recipients for sync-failure alerts
[failure_alarm]
mail_host= 'xxx'
mail_port= 25
mail_user= 'xxx'
mail_pass= 'xxx'
mail_send_from = 'xxx'
alarm_mail = 'xxx'

# log path
[repl_log]
log_dir="/tmp/relication_mysql_clickhouse.log"
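The file is ordinary INI, so loading it can be sketched as below (a hypothetical helper, not the program's actual loader; the sketch uses Python 3's configparser, while the original program targets Python 2.7). The only wrinkle is stripping the quotes the file puts around values:

```python
import configparser

SAMPLE = """
[master_server]
host='127.0.0.1'
port=3306
user='ch_repl'

[only_schemas]
schemas='yayun'
"""

def load_conf(text):
    """Parse metainfo.conf-style INI text, stripping surrounding quotes."""
    cp = configparser.ConfigParser()
    cp.read_string(text)
    return {s: {k: v.strip("'\"") for k, v in cp.items(s)}
            for s in cp.sections()}

conf = load_conf(SAMPLE)
print(conf['master_server']['host'])   # 127.0.0.1
print(conf['only_schemas']['schemas'])  # yayun
```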
Set the starting binlog position:
Just as when building a MySQL replica, configure where the sync starts from. Check MySQL's current position:
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000069 |  4024404 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
Write the position to the file or to Redis; I simply chose to record it in the file.
[root@ck-server-01 sync]# cat repl_pos.log
[log_position]
filename = mysql-bin.000069
position = 4024404
[root@ck-server-01 sync]#
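repl_pos.log is itself a tiny INI file, so persisting and restoring the position can be sketched like this (illustrative helpers written for Python 3, not the program's actual code):

```python
import configparser

def save_pos(path, filename, position):
    """Persist the binlog position in the same INI layout as repl_pos.log."""
    cp = configparser.ConfigParser()
    cp['log_position'] = {'filename': filename, 'position': str(position)}
    with open(path, 'w') as f:
        cp.write(f)

def load_pos(path):
    """Read the binlog position back; returns (filename, position)."""
    cp = configparser.ConfigParser()
    cp.read(path)
    sec = cp['log_position']
    return sec['filename'], int(sec['position'])

save_pos('/tmp/repl_pos.log', 'mysql-bin.000069', 4024404)
print(load_pos('/tmp/repl_pos.log'))   # ('mysql-bin.000069', 4024404)
```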
Start the sync program:
[root@ck-server-01 sync]# pypy replication_mysql_clickhouse.py --help
usage: Data Replication to clikhouse [-h] [-c CONF] [-d] [-l]

mysql data is copied to clikhouse

optional arguments:
  -h, --help            show this help message and exit
  -c CONF, --conf CONF  Data synchronization information file
  -d, --debug           Display SQL information
  -l, --logtoredis      log position to redis ,default file

By dengyayun @2019
[root@ck-server-01 sync]#
The default is to record the position in the file, so there is no need to specify how the binlog position is stored:
[root@ck-server-01 sync]# pypy replication_mysql_clickhouse.py --conf metainfo.conf --debug
11:59:54 INFO 開始同步數據時間 2019-07-17 11:59:54
11:59:54 INFO 從服務器 127.0.0.1:3306 同步數據
11:59:54 INFO 讀取binlog: mysql-bin.000069:4024404
11:59:54 INFO 同步到clickhouse server 127.0.0.1:9000
11:59:54 INFO 同步到clickhouse的數據庫: ['yayun']
11:59:54 INFO 同步到clickhouse的表: ['tb1']
Insert 10 rows into MySQL:
mysql> insert into tb1 (pay_money,pay_day,pay_time)values('66.22','2019-06-29','2019-06-29 14:00:00'),('66.22','2019-06-29','2019-06-29 14:00:00'),('66.22','2019-06-29','2019-06-29 14:00:00'),('66.22','2019-06-29','2019-06-29 14:00:00'),('66.22','2019-06-29','2019-06-29 14:00:00'),('66.22','2019-06-29','2019-06-29 14:00:00'),('66.22','2019-06-29','2019-06-29 14:00:00'),('66.22','2019-06-29','2019-06-29 14:00:00'),('66.22','2019-06-29','2019-06-29 14:00:00'),('66.22','2019-06-29','2019-06-29 14:00:00') ;
Query OK, 10 rows affected (0.01 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> select * from tb1;
+----+-----------+------------+---------------------+
| id | pay_money | pay_day    | pay_time            |
+----+-----------+------------+---------------------+
|  1 | 66.22     | 2019-06-29 | 2019-06-29 14:00:00 |
|  3 | 66.22     | 2019-06-29 | 2019-06-29 14:00:00 |
|  5 | 66.22     | 2019-06-29 | 2019-06-29 14:00:00 |
|  7 | 66.22     | 2019-06-29 | 2019-06-29 14:00:00 |
|  9 | 66.22     | 2019-06-29 | 2019-06-29 14:00:00 |
| 11 | 66.22     | 2019-06-29 | 2019-06-29 14:00:00 |
| 13 | 66.22     | 2019-06-29 | 2019-06-29 14:00:00 |
| 15 | 66.22     | 2019-06-29 | 2019-06-29 14:00:00 |
| 17 | 66.22     | 2019-06-29 | 2019-06-29 14:00:00 |
| 19 | 66.22     | 2019-06-29 | 2019-06-29 14:00:00 |
Sync program log output:
[root@ck-server-01 sync]# pypy replication_mysql_clickhouse.py --conf metainfo.conf --debug
12:12:09 INFO 開始同步數據時間 2019-07-17 12:12:09
12:12:09 INFO 從服務器 127.0.0.1:3306 同步數據
12:12:09 INFO 讀取binlog: mysql-bin.000069:4024404
12:12:09 INFO 同步到clickhouse server 127.0.0.1:9000
12:12:09 INFO 同步到clickhouse的數據庫: ['yayun']
12:12:09 INFO 同步到clickhouse的表: ['tb1']
12:12:09 INFO INSERT 數據插入SQL: INSERT INTO yayun.tb1 VALUES, [{u'id': 1, u'pay_money': '66.22', u'pay_day': datetime.date(2019, 6, 29), u'pay_time': datetime.datetime(2019, 6, 29, 14, 0)}, {u'id': 3, u'pay_money': '66.22', u'pay_day': datetime.date(2019, 6, 29), u'pay_time': datetime.datetime(2019, 6, 29, 14, 0)}, {u'id': 5, u'pay_money': '66.22', u'pay_day': datetime.date(2019, 6, 29), u'pay_time': datetime.datetime(2019, 6, 29, 14, 0)}, {u'id': 7, u'pay_money': '66.22', u'pay_day': datetime.date(2019, 6, 29), u'pay_time': datetime.datetime(2019, 6, 29, 14, 0)}, {u'id': 9, u'pay_money': '66.22', u'pay_day': datetime.date(2019, 6, 29), u'pay_time': datetime.datetime(2019, 6, 29, 14, 0)}, {u'id': 11, u'pay_money': '66.22', u'pay_day': datetime.date(2019, 6, 29), u'pay_time': datetime.datetime(2019, 6, 29, 14, 0)}, {u'id': 13, u'pay_money': '66.22', u'pay_day': datetime.date(2019, 6, 29), u'pay_time': datetime.datetime(2019, 6, 29, 14, 0)}, {u'id': 15, u'pay_money': '66.22', u'pay_day': datetime.date(2019, 6, 29), u'pay_time': datetime.datetime(2019, 6, 29, 14, 0)}, {u'id': 17, u'pay_money': '66.22', u'pay_day': datetime.date(2019, 6, 29), u'pay_time': datetime.datetime(2019, 6, 29, 14, 0)}, {u'id': 19, u'pay_money': '66.22', u'pay_day': datetime.date(2019, 6, 29), u'pay_time': datetime.datetime(2019, 6, 29, 14, 0)}]
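The debug line above shows that rows arrive as dicts and are flushed in batches (the insert_nums setting in metainfo.conf). The batching logic can be sketched with a pure function like this (hypothetical helper names; the real program's internals differ):

```python
def make_batches(rows, batch_size):
    """Group row dicts into batches of at most batch_size for bulk INSERT."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the trailing partial batch
        yield batch

# With clickhouse-driver, each batch would then be flushed roughly as:
#   client.execute('INSERT INTO yayun.tb1 VALUES', batch)
batches = list(make_batches([{'id': i} for i in range(25)], 10))
print([len(b) for b in batches])   # [10, 10, 5]
```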
Query the data in ClickHouse:
ck-server-01 :) select * from tb1;

SELECT *
FROM tb1

┌─id─┬─pay_money─┬────pay_day─┬────────────pay_time─┐
│  1 │ 66.22     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│  3 │ 66.22     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│  5 │ 66.22     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│  7 │ 66.22     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│  9 │ 66.22     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│ 11 │ 66.22     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│ 13 │ 66.22     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│ 15 │ 66.22     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│ 17 │ 66.22     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│ 19 │ 66.22     │ 2019-06-29 │ 2019-06-29 14:00:00 │
└────┴───────────┴────────────┴─────────────────────┘

10 rows in set. Elapsed: 0.005 sec.
Update the data in MySQL:
mysql> update tb1 set pay_money='88.88';
Query OK, 10 rows affected (0.00 sec)
Rows matched: 10  Changed: 10  Warnings: 0

mysql> select * from tb1;
+----+-----------+------------+---------------------+
| id | pay_money | pay_day    | pay_time            |
+----+-----------+------------+---------------------+
|  1 | 88.88     | 2019-06-29 | 2019-06-29 14:00:00 |
|  3 | 88.88     | 2019-06-29 | 2019-06-29 14:00:00 |
|  5 | 88.88     | 2019-06-29 | 2019-06-29 14:00:00 |
|  7 | 88.88     | 2019-06-29 | 2019-06-29 14:00:00 |
|  9 | 88.88     | 2019-06-29 | 2019-06-29 14:00:00 |
| 11 | 88.88     | 2019-06-29 | 2019-06-29 14:00:00 |
| 13 | 88.88     | 2019-06-29 | 2019-06-29 14:00:00 |
| 15 | 88.88     | 2019-06-29 | 2019-06-29 14:00:00 |
| 17 | 88.88     | 2019-06-29 | 2019-06-29 14:00:00 |
| 19 | 88.88     | 2019-06-29 | 2019-06-29 14:00:00 |
+----+-----------+------------+---------------------+
10 rows in set (0.00 sec)
Query ClickHouse again:
ck-server-01 :) select * from tb1;

SELECT *
FROM tb1

┌─id─┬─pay_money─┬────pay_day─┬────────────pay_time─┐
│  1 │ 88.88     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│  3 │ 88.88     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│  5 │ 88.88     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│  7 │ 88.88     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│  9 │ 88.88     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│ 11 │ 88.88     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│ 13 │ 88.88     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│ 15 │ 88.88     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│ 17 │ 88.88     │ 2019-06-29 │ 2019-06-29 14:00:00 │
│ 19 │ 88.88     │ 2019-06-29 │ 2019-06-29 14:00:00 │
└────┴───────────┴────────────┴─────────────────────┘

10 rows in set. Elapsed: 0.009 sec.
As you can see, all the data has been synced.
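Applying a MySQL UPDATE to ClickHouse is the interesting part, because MergeTree does not update rows in place. One common approach (a sketch of the general technique, not necessarily exactly what this program does) is to issue an ALTER TABLE ... UPDATE mutation keyed on the primary key, built from the binlog event's before/after row images:

```python
def build_update_sql(db, table, before, after, key='id'):
    """Build an ALTER TABLE ... UPDATE mutation from before/after row
    images. Assumes the table has a unique key column (here 'id')."""
    changed = {c: v for c, v in after.items()
               if c != key and before.get(c) != v}
    if not changed:
        return None  # nothing actually changed in this row
    sets = ', '.join("%s = '%s'" % (c, v) for c, v in sorted(changed.items()))
    return ("ALTER TABLE %s.%s UPDATE %s WHERE %s = %s"
            % (db, table, sets, key, before[key]))

print(build_update_sql('yayun', 'tb1',
                       {'id': 1, 'pay_money': '66.22'},
                       {'id': 1, 'pay_money': '88.88'}))
```

Note that ClickHouse mutations are asynchronous and relatively heavyweight, which is one reason tables synced this way need a unique auto-increment key, as the summary below mentions.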
Source code:
https://github.com/yymysql/mysql-clickhouse-replication
Summary:
Our online reporting workloads now all run on ClickHouse, with data synced by the self-developed sync program. Data consistency has not been a problem so far. The synced tables do need an auto-increment primary key, though; otherwise some cases are hard to handle. Replication lag is also small, and both the lag and data consistency are monitored.
整體來講使用clickhoue處理olap仍是很是不錯的選擇,小夥伴們能夠嘗試。
References
https://clickhouse-driver.readthedocs.io/en/latest/
https://python-mysql-replication.readthedocs.io/en/latest/examples.html
https://clickhouse.yandex/docs/en/
https://github.com/sysown/proxysql/wiki/ClickHouse-Support