ClickHouse: Cluster Setup and Data Replication

Date: 2019-11-11

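The previous article gave a brief introduction to ClickHouse and ran some simple performance tests. This time I'll cover setting up a cluster and replicating data; replication requires ZooKeeper.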

Environment:

1. Three machines (in my case, three virtual machines), each with ClickHouse installed.

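2. Bind hostnames in /etc/hosts. This isn't strictly necessary; you can write IP addresses directly in the config file instead. All three machines get the same entries: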

192.168.0.10 db_server_yayun_01
192.168.0.20 db_server_yayun_02
192.168.0.30 db_server_yayun_03

3. Create the substitution file. It does not exist by default; /etc/clickhouse-server/config.xml explains it as follows:
If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
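The substitution rule is easy to misread, so here is a toy model of it in Python, using only the standard library's xml.etree.ElementTree. It is a sketch, not ClickHouse's actual config loader: the two config strings are trimmed, hypothetical stand-ins, and `apply_incl` is a helper name I made up. It copies the children of `/yandex/<name>` into every element whose `incl` attribute is `<name>`, and leaves an element empty when its substitution is missing but marked `optional="true"`.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-ins for config.xml and /etc/metrika.xml.
MAIN_CONFIG = """
<yandex>
    <remote_servers incl="clickhouse_remote_servers" />
    <zookeeper incl="zookeeper-servers" optional="true" />
    <macros incl="macros" optional="true" />
</yandex>
"""

SUBSTITUTIONS = """
<yandex>
    <clickhouse_remote_servers>
        <perftest_3shards_1replicas />
    </clickhouse_remote_servers>
    <macros>
        <replica>192.168.0.10</replica>
    </macros>
</yandex>
"""


def apply_incl(config_xml: str, substitutions_xml: str) -> ET.Element:
    """Fill every element that has an 'incl' attribute with the children of
    the matching /yandex/<name> element from the substitutions file."""
    root = ET.fromstring(config_xml)
    subs = ET.fromstring(substitutions_xml)
    # Snapshot the element list first so newly appended children are not revisited.
    for elem in list(root.iter()):
        name = elem.get("incl")
        if name is None:
            continue
        source = next((child for child in subs if child.tag == name), None)
        if source is None:
            if elem.get("optional") == "true":
                continue  # optional and absent: keep the element empty
            # This sketch treats a missing required substitution as an error.
            raise KeyError(f"substitution {name!r} not found")
        elem.extend(list(source))
        del elem.attrib["incl"]
    return root


if __name__ == "__main__":
    merged = apply_incl(MAIN_CONFIG, SUBSTITUTIONS)
    print(merged.find("macros/replica").text)  # prints 192.168.0.10
```

In this model `<zookeeper>` simply stays empty because the stand-in substitution file has no `zookeeper-servers` element; the real /etc/metrika.xml below does define one.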

Contents of /etc/metrika.xml:

<yandex>
    <clickhouse_remote_servers>
        <perftest_3shards_1replicas>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>db_server_yayun_01</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>db_server_yayun_02</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>db_server_yayun_03</host>
                    <port>9000</port>
                </replica>
            </shard>
        </perftest_3shards_1replicas>
    </clickhouse_remote_servers>

    <zookeeper-servers>
        <node index="1">
            <host>192.168.0.30</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>

    <macros>
        <replica>192.168.0.10</replica>
    </macros>

    <networks>
        <ip>::/0</ip>
    </networks>

    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
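`internal_replication` is a per-shard setting, and as far as I can tell from the Distributed engine docs it is read from the `<shard>` element itself, so a copy nested inside `<replica>` is easy to miss. The Python sketch below is my own linting idea, not a ClickHouse tool: `describe_cluster` is a hypothetical helper, and the embedded config is a shortened two-shard variant in which the second shard nests the flag inside `<replica>` on purpose.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical cluster section; shard 2 deliberately nests
# internal_replication inside <replica> to show what the check reports.
METRIKA = """
<yandex>
  <clickhouse_remote_servers>
    <perftest_3shards_1replicas>
      <shard>
        <internal_replication>true</internal_replication>
        <replica><host>db_server_yayun_01</host><port>9000</port></replica>
      </shard>
      <shard>
        <replica>
          <internal_replication>true</internal_replication>
          <host>db_server_yayun_02</host><port>9000</port>
        </replica>
      </shard>
    </perftest_3shards_1replicas>
  </clickhouse_remote_servers>
</yandex>
"""


def describe_cluster(xml_text: str, cluster: str) -> list:
    """Return one dict per <shard>: its (host, port) replicas, its
    internal_replication flag, and whether a flag was nested in a <replica>."""
    root = ET.fromstring(xml_text)
    cluster_elem = root.find(f"clickhouse_remote_servers/{cluster}")
    if cluster_elem is None:
        raise KeyError(f"cluster {cluster!r} not defined")
    shards = []
    for shard in cluster_elem.findall("shard"):
        replicas = shard.findall("replica")
        shards.append({
            # Only a direct child of <shard> is counted; the default is false.
            "internal_replication": shard.findtext("internal_replication", "false").strip() == "true",
            "replicas": [(r.findtext("host"), int(r.findtext("port"))) for r in replicas],
            "misplaced_flag": any(r.find("internal_replication") is not None for r in replicas),
        })
    return shards


if __name__ == "__main__":
    for number, info in enumerate(describe_cluster(METRIKA, "perftest_3shards_1replicas"), start=1):
        print(number, info)
```

With one replica per shard the flag makes no practical difference, but it starts to matter as soon as a shard gets a second replica.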

The config file is the same on all three machines; the only difference is:

<macros>
    <replica>192.168.0.10</replica>
</macros>

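Set this to the server's own IP. It doesn't actually have to be an IP; any value works as long as it is different on each of the three machines. This is the setting replication needs (it supplies the {replica} value used when creating replicated tables). The ZooKeeper configuration is: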

<zookeeper-servers>
  <node index="1">
    <host>192.168.0.30</host>
    <port>2181</port>
  </node>
</zookeeper-servers>

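My ZooKeeper runs on the .30 machine as a single instance. In production it should definitely run on separate machines and be set up as a cluster (ensemble). After editing the config file, restart all three servers.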
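Because only `<macros>` differs between machines, the per-host fragments can be generated rather than hand-edited. A minimal Python sketch, assuming a hypothetical `HOSTS` inventory that mirrors the hosts file above: it builds each host's `<macros>` element and the shared `<zookeeper-servers>` element, and refuses duplicate replica names, since the replica value must be unique per machine. The function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory mirroring /etc/hosts; the value is the replica name.
HOSTS = {
    "db_server_yayun_01": "192.168.0.10",
    "db_server_yayun_02": "192.168.0.20",
    "db_server_yayun_03": "192.168.0.30",
}
ZK_NODES = [("192.168.0.30", 2181)]


def macros_fragment(replica_name: str) -> str:
    """Build the per-host <macros> element."""
    macros = ET.Element("macros")
    ET.SubElement(macros, "replica").text = replica_name
    return ET.tostring(macros, encoding="unicode")


def zookeeper_fragment(nodes) -> str:
    """Build the <zookeeper-servers> element, identical on every host."""
    servers = ET.Element("zookeeper-servers")
    for index, (host, port) in enumerate(nodes, start=1):
        node = ET.SubElement(servers, "node", index=str(index))
        ET.SubElement(node, "host").text = host
        ET.SubElement(node, "port").text = str(port)
    return ET.tostring(servers, encoding="unicode")


def per_host_macros(hosts: dict) -> dict:
    """Map hostname to its <macros> fragment, rejecting duplicate replica names."""
    names = list(hosts.values())
    duplicates = {n for n in names if names.count(n) > 1}
    if duplicates:
        raise ValueError(f"replica names must be unique per host: {sorted(duplicates)}")
    return {host: macros_fragment(name) for host, name in hosts.items()}


if __name__ == "__main__":
    print(zookeeper_fragment(ZK_NODES))
    for host, fragment in per_host_macros(HOSTS).items():
        print(host, fragment)
```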
The official documentation lists these steps:

ClickHouse deployment to cluster

ClickHouse cluster is a homogenous cluster. Steps to set up:

1. Install ClickHouse server on all machines of the cluster
2. Set up cluster configs in configuration file
3. Create local tables on each instance
4. Create a Distributed table

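The first two steps are done. Next, create the local table and then the Distributed table. (Do this on all three machines: DDL is not synchronized between nodes, which is a pain.)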

CREATE TABLE ontime_local (FlightDate Date, Year UInt16) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);
CREATE TABLE ontime_all AS ontime_local ENGINE = Distributed(perftest_3shards_1replicas, default, ontime_local, rand());
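The `rand()` argument is the sharding key. Based on my reading of the Distributed engine documentation, the key is divided by the total weight of the shards and the remainder picks the shard: each shard owns a half-open interval of remainders, and a shard's weight defaults to 1. A small Python simulation of that rule follows; `pick_shard` and `simulate_rand_sharding` are illustrative names, and `random.getrandbits(32)` merely stands in for ClickHouse's UInt32 `rand()`.

```python
import random
from collections import Counter
from typing import Sequence


def pick_shard(sharding_key: int, weights: Sequence[int]) -> int:
    """Return the index of the shard a row is sent to: take the key modulo
    the total weight, then find the shard whose
    [prev_weights, prev_weights + weight) interval contains the remainder."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    remainder = sharding_key % total
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        if remainder < upper:
            return index
    raise AssertionError("unreachable: remainder is always below the total weight")


def simulate_rand_sharding(rows: int, weights: Sequence[int], seed: int = 0) -> Counter:
    """Count rows per shard when each key is a random unsigned 32-bit integer."""
    rng = random.Random(seed)
    return Counter(pick_shard(rng.getrandbits(32), weights) for _ in range(rows))


if __name__ == "__main__":
    # Three equal-weight shards, as in perftest_3shards_1replicas.
    print(simulate_rand_sharding(3000, [1, 1, 1], seed=42))
```

With three weight-1 shards a key k simply lands on shard k % 3, so random keys spread the rows roughly evenly across the three machines.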

Insert data (any machine will do):

:) insert into ontime_all (FlightDate,Year)values('2001-10-12',2001);

INSERT INTO ontime_all (FlightDate, Year) VALUES

Ok.

1 rows in set. Elapsed: 0.013 sec. 

:) insert into ontime_all (FlightDate,Year)values('2002-10-12',2002);

INSERT INTO ontime_all (FlightDate, Year) VALUES

Ok.

1 rows in set. Elapsed: 0.004 sec. 

:) insert into ontime_all (FlightDate,Year)values('2003-10-12',2003);

INSERT INTO ontime_all (FlightDate, Year) VALUES

Ok.

I inserted three rows. Now query them (any machine works):

:) select * from  ontime_all;

SELECT *
FROM ontime_all 

┌─FlightDate─┬─Year─┐
│ 2001-10-12 │ 2001 │
└────────────┴──────┘
┌─FlightDate─┬─Year─┐
│ 2002-10-12 │ 2002 │
└────────────┴──────┘
┌─FlightDate─┬─Year─┐
│ 2003-10-12 │ 2003 │
└────────────┴──────┘
→ Progress: 3.00 rows, 12.00 B (48.27 rows/s., 193.08 B/s.) 
3 rows in set. Elapsed: 0.063 sec. 

:)

While querying on one machine, a packet capture on the other machines shows the incoming requests:

tcpdump -i any -s 0 -l -w - dst port 9000

What happens if one machine is shut down?

:) select * from ontime_all;

SELECT *
FROM ontime_all 

┌─FlightDate─┬─Year─┐
│ 2001-10-12 │ 2001 │
└────────────┴──────┘
┌─FlightDate─┬─Year─┐
│ 2002-10-12 │ 2002 │
└────────────┴──────┘
┌─FlightDate─┬─Year─┐
│ 2003-10-12 │ 2003 │
└────────────┴──────┘
↓ Progress: 6.00 rows, 24.00 B (292.80 rows/s., 1.17 KB/s.) Received exception from server:
Code: 279. DB::Exception: Received from localhost:9000, ::1. DB::NetException. DB::NetException: All connection tries failed. Log: 

Code: 210, e.displayText() = DB::NetException: Connection refused: (db_server_yayun_02:9000, 192.168.0.20), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused: (db_server_yayun_02:9000, 192.168.0.20), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused: (db_server_yayun_02:9000, 192.168.0.20), e.what() = DB::NetException

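As you can see, it throws an error. So this setup isn't highly available? In hindsight that is expected: with 3 shards and 1 replica each, every shard's data lives on exactly one machine, so while db_server_yayun_02 is down, its portion of the data cannot be read. Later I found another configuration in the documentation: 2 nodes with 2 replicas. In my tests it was highly available with no problems, and queries were still distributed and run in parallel. If you're interested, test it yourself.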
https://clickhouse.yandex/reference_en.html#Distributed
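To see why the 3-shard, 1-replica layout fails, here is a toy scatter-gather model in Python. It is not how ClickHouse is implemented; it only encodes the rule that a Distributed query needs at least one reachable replica per shard, with replicas of a shard assumed to hold identical data. The row placement and the second layout (hostnames `node_a` to `node_d`, two shards with two replicas each) are hypothetical.

```python
from typing import Dict, List, Sequence, Set


class ShardUnavailable(Exception):
    """Raised when no replica of a shard can be reached."""


def distributed_select(shards: Sequence[Sequence[str]],
                       data: Dict[str, List[int]],
                       down: Set[str]) -> List[int]:
    """Read each shard from its first reachable replica and merge the results."""
    result: List[int] = []
    for shard_no, replicas in enumerate(shards, start=1):
        host = next((h for h in replicas if h not in down), None)
        if host is None:
            raise ShardUnavailable(f"shard {shard_no}: all replicas down {list(replicas)}")
        result.extend(data[host])
    return result


if __name__ == "__main__":
    # Hypothetical placement of the three test rows, one per shard.
    data = {"db_server_yayun_01": [2001], "db_server_yayun_02": [2002], "db_server_yayun_03": [2003]}
    three_by_one = [["db_server_yayun_01"], ["db_server_yayun_02"], ["db_server_yayun_03"]]
    try:
        distributed_select(three_by_one, data, down={"db_server_yayun_02"})
    except ShardUnavailable as exc:
        print("query failed:", exc)

    # Hypothetical layout: two shards, each with two replicas holding the same rows.
    two_by_two = [["node_a", "node_b"], ["node_c", "node_d"]]
    data2 = {"node_a": [1], "node_b": [1], "node_c": [2], "node_d": [2]}
    print(distributed_select(two_by_two, data2, down={"node_a"}))
```

The second layout survives one host failure only because every shard still has another replica to read from.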

Next, test data replication. ZooKeeper is already configured, so just create the table (on all three machines):

CREATE TABLE ontime_replica (FlightDate Date,Year UInt16) ENGINE = ReplicatedMergeTree('/clickhouse_perftest/tables/ontime_replica','{replica}',FlightDate,(Year, FlightDate),8192);
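The engine arguments show why the three tables replicate each other: the ZooKeeper path has no per-host part, so all three machines register as replicas of the same table, while `{replica}` is filled in from each host's `<macros>`, which is why that value must be unique. A Python sketch of the expansion follows; `expand_macros` and `replica_znode` are hypothetical helpers, and the `<path>/replicas/<name>` layout is my understanding of how ReplicatedMergeTree registers replicas in ZooKeeper.

```python
import re
from typing import Dict

ZK_TABLE_PATH = "/clickhouse_perftest/tables/ontime_replica"
REPLICA_ARG = "{replica}"


def expand_macros(text: str, macros: Dict[str, str]) -> str:
    """Replace each {name} with the host's macro value; unknown macros are errors."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in macros:
            raise KeyError(f"macro {{{name}}} is not defined on this host")
        return macros[name]
    return re.sub(r"\{(\w+)\}", substitute, text)


def replica_znode(zk_path: str, replica_arg: str, macros: Dict[str, str]) -> str:
    """ZooKeeper node under which this host's replica would be registered."""
    return f"{expand_macros(zk_path, macros)}/replicas/{expand_macros(replica_arg, macros)}"


if __name__ == "__main__":
    for ip in ["192.168.0.10", "192.168.0.20", "192.168.0.30"]:
        # Same table path on every host, different replica node per host.
        print(replica_znode(ZK_TABLE_PATH, REPLICA_ARG, {"replica": ip}))
```

If two hosts shared a replica name they would map to the same node, which is why the macros must differ per machine.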

Insert a row to test:

insert into ontime_replica (FlightDate,Year)values('2018-10-12',2018);

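The row can be queried from any machine. Honestly, I still haven't fully figured out clustering and replication: querying the Distributed table also returned all the data on every machine, which looked like replication too, so I'm a bit confused. If any experts are reading, I'd welcome a discussion.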

References:

https://clickhouse.yandex/reference_en.html#Distributed

https://clickhouse.yandex/tutorial.html
