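Author of this issue: Deng Yayun
Senior DBA at 37 Interactive Entertainment, responsible for managing and maintaining the company's MySQL, Redis, Hadoop, and ClickHouse clusters.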
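Previous articles gave a brief introduction to ClickHouse (issue 01) and ran a simple performance test (issue 02). This issue covers setting up a cluster and replicating data. Data replication requires ZooKeeper.
1) Three machines. I am using three virtual machines, each with ClickHouse installed.
2) Bind hosts entries. This is optional, since you can write IP addresses directly in the configuration file. (Add the following hosts entries on all three machines.)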
192.168.0.10 db_server_yayun_01
192.168.0.20 db_server_yayun_02
192.168.0.30 db_server_yayun_03
3) Create the configuration file. It does not exist by default. /etc/clickhouse-server/config.xml contains the following hint:
If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file. By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element. Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
The contents of /etc/metrika.xml are as follows:
<yandex>
    <clickhouse_remote_servers>
        <perftest_3shards_1replicas>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>db_server_yayun_01</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>db_server_yayun_02</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>db_server_yayun_03</host>
                    <port>9000</port>
                </replica>
            </shard>
        </perftest_3shards_1replicas>
    </clickhouse_remote_servers>

    <zookeeper-servers>
        <node index="1">
            <host>192.168.0.30</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>

    <macros>
        <replica>192.168.0.10</replica>
    </macros>

    <networks>
        <ip>::/0</ip>
    </networks>

    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
The configuration file is the same on all three machines. The only difference is this block:
<macros>
    <replica>192.168.0.10</replica>
</macros>
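Write each server's own IP here. It does not actually have to be an IP; any value works as long as it is unique across the three machines. This setting is used by replication. The ZooKeeper configuration is as follows: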
<zookeeper-servers>
    <node index="1">
        <host>192.168.0.30</host>
        <port>2181</port>
    </node>
</zookeeper-servers>
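My ZooKeeper runs on the .30 machine as a single instance. In production it should definitely run on dedicated machines and be configured as a cluster. After updating the configuration files, restart all three servers.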
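Since the only per-server difference is the <macros> block, the three fragments can be produced from one template. The following is a minimal Python sketch of that idea, not part of the original setup: the helper names `render_macros` and `macros_per_host` are hypothetical, and it assumes each server's IP is used as its replica name, as in the configuration above.

```python
from typing import Dict

# Host-to-IP mapping taken from the hosts entries in this article.
HOSTS = {
    "db_server_yayun_01": "192.168.0.10",
    "db_server_yayun_02": "192.168.0.20",
    "db_server_yayun_03": "192.168.0.30",
}


def render_macros(replica_name: str) -> str:
    """Return the <macros> fragment for one server (hypothetical helper)."""
    return (
        "<macros>\n"
        f"    <replica>{replica_name}</replica>\n"
        "</macros>"
    )


def macros_per_host(hosts: Dict[str, str]) -> Dict[str, str]:
    """Map each hostname to its <macros> fragment, using the IP as replica name."""
    names = list(hosts.values())
    # The replica value only has to be unique across servers, so check that here.
    if len(set(names)) != len(names):
        raise ValueError("replica names must be unique across servers")
    return {host: render_macros(ip) for host, ip in hosts.items()}


if __name__ == "__main__":
    for host, fragment in macros_per_host(HOSTS).items():
        print(f"# {host}")
        print(fragment)
```

The uniqueness check mirrors the one requirement stated above: the value itself is arbitrary, but two servers must not share it.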
The steps given in the official documentation are:
ClickHouse deployment to cluster
ClickHouse cluster is a homogenous cluster. Steps to set up:
1. Install ClickHouse server on all machines of the cluster
2. Set up cluster configs in configuration file
3. Create local tables on each instance
4. Create a Distributed table
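The first two steps are done. Next, create the local table and then the Distributed table. (Create them on all three machines, because DDL is not synchronized across nodes, which is a pain. Alternatively, use the ON CLUSTER syntax and run the statement on a single node.)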
CREATE TABLE ontime_local (FlightDate Date, Year UInt16)
ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);

CREATE TABLE ontime_all AS ontime_local
ENGINE = Distributed(perftest_3shards_1replicas, default, ontime_local, rand());
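The ON CLUSTER alternative can be sketched as follows. This is a minimal Python example that only builds the two statements as strings; the function name `on_cluster_statements` is hypothetical, and it assumes distributed DDL is enabled on the servers (it relies on ZooKeeper), which this article does not verify.

```python
CLUSTER = "perftest_3shards_1replicas"

# Same table definitions as in the article, with an ON CLUSTER clause added.
LOCAL_DDL = (
    "CREATE TABLE ontime_local ON CLUSTER {cluster} "
    "(FlightDate Date, Year UInt16) "
    "ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192)"
)
DISTRIBUTED_DDL = (
    "CREATE TABLE ontime_all ON CLUSTER {cluster} AS ontime_local "
    "ENGINE = Distributed({cluster}, default, ontime_local, rand())"
)


def on_cluster_statements(cluster: str) -> list:
    """Return the local and Distributed CREATE statements for one cluster name."""
    return [
        LOCAL_DDL.format(cluster=cluster),
        DISTRIBUTED_DDL.format(cluster=cluster),
    ]


if __name__ == "__main__":
    # Print the statements so they can be pasted into clickhouse-client on one node.
    for statement in on_cluster_statements(CLUSTER):
        print(statement + ";")
```

Run on a single node, each statement would then be executed on every host listed in the cluster definition, so the tables do not have to be created by hand three times.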
Insert data (on any one of the machines):
:) insert into ontime_all (FlightDate,Year)values('2001-10-12',2001);

INSERT INTO ontime_all (FlightDate, Year) VALUES

Ok.

1 rows in set. Elapsed: 0.013 sec.

:) insert into ontime_all (FlightDate,Year)values('2002-10-12',2002);

INSERT INTO ontime_all (FlightDate, Year) VALUES

Ok.

1 rows in set. Elapsed: 0.004 sec.

:) insert into ontime_all (FlightDate,Year)values('2003-10-12',2003);

INSERT INTO ontime_all (FlightDate, Year) VALUES

Ok.
I inserted three rows here. Now query them (any machine will do):
:) select * from ontime_all;

SELECT *
FROM ontime_all

┌─FlightDate─┬─Year─┐
│ 2001-10-12 │ 2001 │
└────────────┴──────┘
┌─FlightDate─┬─Year─┐
│ 2002-10-12 │ 2002 │
└────────────┴──────┘
┌─FlightDate─┬─Year─┐
│ 2003-10-12 │ 2003 │
└────────────┴──────┘
→ Progress: 3.00 rows, 12.00 B (48.27 rows/s., 193.08 B/s.)
3 rows in set. Elapsed: 0.063 sec.

:)
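To illustrate how rows written through ontime_all end up on different machines, here is a small Python simulation of shard selection. It is a sketch under stated assumptions, not ClickHouse code: the Distributed engine takes the remainder of the sharding expression divided by the total shard weight and maps it onto per-shard intervals, every shard has the default weight of 1 because the configuration above sets none, and `random.getrandbits(32)` merely stands in for ClickHouse's `rand()`. The function names are hypothetical.

```python
import random
from collections import Counter
from typing import List


def pick_shard(sharding_value: int, weights: List[int]) -> int:
    """Return the index of the shard that receives a row.

    The remainder of sharding_value / total_weight is matched against
    consecutive half-open intervals, one per shard, each as wide as
    that shard's weight.
    """
    remainder = sharding_value % sum(weights)
    upper_bound = 0
    for index, weight in enumerate(weights):
        upper_bound += weight
        if remainder < upper_bound:
            return index
    raise AssertionError("unreachable: remainder is always below the total weight")


def simulate(rows: int, weights: List[int], seed: int = 0) -> Counter:
    """Count how many of `rows` random sharding values land on each shard."""
    rng = random.Random(seed)
    # getrandbits(32) is a stand-in for rand(), which yields an unsigned 32-bit value.
    return Counter(pick_shard(rng.getrandbits(32), weights) for _ in range(rows))


if __name__ == "__main__":
    # Three shards, equal weight, as in perftest_3shards_1replicas.
    print(simulate(3000, [1, 1, 1]))
```

With equal weights the rows spread roughly evenly over the three shards, which is why a SELECT on the Distributed table has to contact every shard.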
When you run the query on one machine, a packet capture on the other machines shows that they receive requests.
tcpdump -i any -s 0 -l -w - dst port 9000
So what happens if one of the machines is shut down?
:) select * from ontime_all;

SELECT *
FROM ontime_all

┌─FlightDate─┬─Year─┐
│ 2001-10-12 │ 2001 │
└────────────┴──────┘
┌─FlightDate─┬─Year─┐
│ 2002-10-12 │ 2002 │
└────────────┴──────┘
┌─FlightDate─┬─Year─┐
│ 2003-10-12 │ 2003 │
└────────────┴──────┘
↓ Progress: 6.00 rows, 24.00 B (292.80 rows/s., 1.17 KB/s.)
Received exception from server:
Code: 279. DB::Exception: Received from localhost:9000, ::1. DB::NetException. DB::NetException: All connection tries failed. Log:

Code: 210, e.displayText() = DB::NetException: Connection refused: (db_server_yayun_02:9000, 192.168.0.20), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused: (db_server_yayun_02:9000, 192.168.0.20), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused: (db_server_yayun_02:9000, 192.168.0.20), e.what() = DB::NetException
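One way to reason about this failure is a small availability model: with default settings, a query on a Distributed table needs at least one reachable replica in every shard. The Python sketch below encodes only that rule. The function name `query_can_succeed` is hypothetical, and the second layout (two replicas per shard, with made-up host names) is an assumption for comparison, not a configuration taken from this article.

```python
from typing import List, Set

# Layout from this article: three shards, one replica each.
THREE_SHARDS_ONE_REPLICA = [
    ["db_server_yayun_01"],
    ["db_server_yayun_02"],
    ["db_server_yayun_03"],
]

# Hypothetical layout for comparison: two shards, two replicas each.
TWO_SHARDS_TWO_REPLICAS = [
    ["host_a1", "host_a2"],
    ["host_b1", "host_b2"],
]


def query_can_succeed(cluster: List[List[str]], down_hosts: Set[str]) -> bool:
    """Return True if every shard still has at least one live replica."""
    return all(
        any(host not in down_hosts for host in shard)
        for shard in cluster
    )


if __name__ == "__main__":
    # One node down in the article's layout: the shard it hosted is unreachable.
    print(query_can_succeed(THREE_SHARDS_ONE_REPLICA, {"db_server_yayun_02"}))
    # One node down with two replicas per shard: the other replica can answer.
    print(query_can_succeed(TWO_SHARDS_TWO_REPLICAS, {"host_a1"}))
```

In this model, losing any single host breaks the three-shard layout, while the replicated layout only fails when both replicas of the same shard are down.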
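As you can see, an error is thrown. So it is not highly available? Correct, it is not. The configuration above defines three shards with no replicas, so once a node goes down, queries fail. I later found another configuration in the documentation: 2 nodes with 2 replicas. In my tests, high availability worked with that setup, and queries were still distributed and executed in parallel. Interested readers can test it themselves. https://clickhouse.yandex/ref...

Next, test data replication. ZooKeeper is already configured, so create the table directly (on all three machines):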
CREATE TABLE ontime_replica (FlightDate Date, Year UInt16)
ENGINE = ReplicatedMergeTree(
    '/clickhouse_perftest/tables/ontime_replica',
    '{replica}',
    FlightDate,
    (Year, FlightDate),
    8192);
insert into ontime_replica (FlightDate,Year)values('2018-10-12',2018);
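The '{replica}' argument in the CREATE TABLE statement is filled in from each server's <macros> section, so the same statement yields a different replica name on every machine while the ZooKeeper path stays identical. The Python sketch below illustrates that substitution. It is an approximation for illustration only: `expand_macros` and `replica_registrations` are hypothetical helpers, not ClickHouse functions, and the replica values are the server IPs assumed earlier in this article.

```python
from typing import Dict, List, Tuple

ZK_PATH = "/clickhouse_perftest/tables/ontime_replica"
REPLICA_ARG = "{replica}"


def expand_macros(text: str, macros: Dict[str, str]) -> str:
    """Replace each {name} placeholder with its value from the macros mapping."""
    for name, value in macros.items():
        text = text.replace("{" + name + "}", value)
    return text


def replica_registrations(replica_values: List[str]) -> List[Tuple[str, str]]:
    """Return the (zookeeper_path, replica_name) pair each server would use."""
    return [
        (ZK_PATH, expand_macros(REPLICA_ARG, {"replica": value}))
        for value in replica_values
    ]


if __name__ == "__main__":
    servers = ["192.168.0.10", "192.168.0.20", "192.168.0.30"]
    for path, name in replica_registrations(servers):
        print(path, name)
```

Tables that share a ZooKeeper path and have distinct replica names are treated as replicas of one another, which is why the replica value must not repeat across servers.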
https://clickhouse.yandex/ref...
https://clickhouse.yandex/tut...
Is there anything else you would like to know about ClickHouse? Leave a comment and let the editor know!