I have recently been working on a network flow collection and analysis project: following the NetFlow and sFlow protocols to collect, store, and later analyze traffic from firewalls and core switches. Both the flow concurrency and the data volume are large, and storage is the bottleneck. Our first attempts, storing into Prometheus and later into InfluxDB, both failed. Prometheus is still not a good fit for data at this scale; once the volume grows that large, you have to start considering federation.
After surveying the community project vflow, we decided to test a ClickHouse-based storage scheme.
On the collection side we did not adopt vflow itself. Instead, we wrote a dedicated flow input plugin for Telegraf, which publishes into a Kafka cluster. Consumers then read from Kafka and write the data into ClickHouse for later analysis.
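The consumer side of this pipeline hinges on one design point: ClickHouse performs far better with large batched INSERTs than with row-at-a-time writes, so the consumer should buffer records and flush in batches. The sketch below illustrates that batching shape with a hypothetical, simplified `FlowRecord` struct and a callback standing in for the real ClickHouse insert; the names and fields are illustrative assumptions, not the actual plugin's types.

```go
package main

import "fmt"

// FlowRecord is a hypothetical, pared-down flow sample such as a
// telegraf flow input plugin might publish to Kafka.
type FlowRecord struct {
	SrcAddr, DstAddr string
	SrcPort, DstPort uint16
	Bytes            uint64
}

// Batcher buffers records and flushes them in groups, matching
// ClickHouse's preference for large batched INSERTs.
type Batcher struct {
	buf   []FlowRecord
	size  int
	flush func([]FlowRecord)
}

func NewBatcher(size int, flush func([]FlowRecord)) *Batcher {
	return &Batcher{size: size, flush: flush}
}

// Add buffers one record and flushes when the batch is full.
func (b *Batcher) Add(r FlowRecord) {
	b.buf = append(b.buf, r)
	if len(b.buf) >= b.size {
		b.Flush()
	}
}

// Flush writes out any buffered records (e.g. on shutdown or a timer).
func (b *Batcher) Flush() {
	if len(b.buf) == 0 {
		return
	}
	b.flush(b.buf)
	b.buf = b.buf[:0]
}

func main() {
	batches := 0
	b := NewBatcher(100, func(rs []FlowRecord) {
		// In the real consumer this callback would perform a single
		// batched INSERT into ClickHouse inside one transaction.
		batches++
		fmt.Printf("flushed %d records\n", len(rs))
	})
	for i := 0; i < 250; i++ {
		b.Add(FlowRecord{SrcAddr: "10.0.0.1", DstAddr: "10.0.0.2", Bytes: 1500})
	}
	b.Flush() // flush the trailing partial batch: 2 full batches + 1 partial
	fmt.Println("total batches:", batches)
}
```

In the real consumer, a time-based flush alongside the size-based one keeps latency bounded when traffic is quiet.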
ClickHouse is an excellent choice of columnar analytics database, with strong performance. The official site publishes benchmark comparisons against the mainstream databases if you want more detailed test reports.
Download address
There are five packages in total.
安裝server過程當中,出現如下錯誤:linux
```
rpm -ivh clickhouse-server-1.1.54236-4.el7.x86_64.rpm
error: Failed dependencies:
	libicudata.so.50()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
	libicui18n.so.50()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
	libicuuc.so.50()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
	libltdl.so.7()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
	libodbc.so.2()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
```
First, resolve the first three missing dependencies by running:
yum install libicu-devel
Then download libtool-ltdl-2.4.2-22.el7_3.x86_64.rpm and install it:
rpm -ivh libtool-ltdl-2.4.2-22.el7_3.x86_64.rpm
Then download unixODBC-2.3.1-11.el7.x86_64.rpm and install it:
rpm -ivh unixODBC-2.3.1-11.el7.x86_64.rpm
With the dependencies in place, install the five packages in order:

1.
```
rpm -ivh clickhouse-server-common-1.1.54236-4.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-server-common-1.1.5423################################# [100%]
```
2.
```
rpm -ivh clickhouse-server-1.1.54236-4.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-server-1.1.54236-4.el7################################# [100%]
```
3.
```
rpm -ivh clickhouse-debuginfo-1.1.54236-4.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-debuginfo-1.1.54236-4.################################# [100%]
```
4.
```
rpm -ivh clickhouse-client-1.1.54236-4.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-client-1.1.54236-4.el7################################# [100%]
```
5.
```
rpm -ivh clickhouse-compressor-1.1.54236-4.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-compressor-1.1.54236-4################################# [100%]
```
```
clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
2018.03.19 17:17:25.113898 [ 1 ] <Warning> Application: Logging to console
2018.03.19 17:17:25.117332 [ 1 ] <Information> : Starting daemon with revision 54236
2018.03.19 17:17:25.117444 [ 1 ] <Information> Application: starting up
2018.03.19 17:17:25.118273 [ 1 ] <Debug> Application: rlimit on number of file descriptors is 1024000
2018.03.19 17:17:25.118299 [ 1 ] <Debug> Application: Initializing DateLUT.
2018.03.19 17:17:25.118307 [ 1 ] <Trace> Application: Initialized DateLUT with time zone `Asia/Shanghai'.
2018.03.19 17:17:25.120309 [ 1 ] <Debug> Application: Configuration parameter 'interserver_http_host' doesn't exist or exists and empty. Will use 'xxxx' as replica host.
2018.03.19 17:17:25.120471 [ 1 ] <Debug> ConfigReloader: Loading config `/etc/clickhouse-server/users.xml'
2018.03.19 17:17:25.125606 [ 1 ] <Warning> ConfigProcessor: Include not found: networks
2018.03.19 17:17:25.125636 [ 1 ] <Warning> ConfigProcessor: Include not found: networks
2018.03.19 17:17:25.126753 [ 1 ] <Information> Application: Loading metadata.
2018.03.19 17:17:25.127259 [ 1 ] <Information> DatabaseOrdinary (default): Total 0 tables.
2018.03.19 17:17:25.127348 [ 1 ] <Information> DatabaseOrdinary (system): Total 0 tables.
2018.03.19 17:17:25.127894 [ 1 ] <Debug> Application: Loaded metadata.
2018.03.19 17:17:25.128699 [ 1 ] <Information> Application: Listening http://[::1]:8123
2018.03.19 17:17:25.128749 [ 1 ] <Information> Application: Listening tcp: [::1]:9000
2018.03.19 17:17:25.128783 [ 1 ] <Information> Application: Listening interserver: [::1]:9009
2018.03.19 17:17:25.128816 [ 1 ] <Information> Application: Listening http://10.xx.xx.136:8123
2018.03.19 17:17:25.128845 [ 1 ] <Information> Application: Listening tcp: 10.xx.xx.136:9000
2018.03.19 17:17:25.128872 [ 1 ] <Information> Application: Listening interserver: 10.xx.xx.136:9009
2018.03.19 17:17:25.129116 [ 1 ] <Information> Application: Ready for connections.
2018.03.19 17:17:27.120687 [ 2 ] <Debug> ConfigReloader: Loading config `/etc/clickhouse-server/config.xml'
2018.03.19 17:17:27.127614 [ 2 ] <Warning> ConfigProcessor: Include not found: clickhouse_remote_servers
2018.03.19 17:17:27.127701 [ 2 ] <Warning> ConfigProcessor: Include not found: clickhouse_compression
```
```
clickhouse-client --host=10.xx.xx.136 --port=9000
ClickHouse client version 1.1.54236.
Connecting to 10.xx.xx.136:9000.
Connected to ClickHouse server version 1.1.54236.

:) show tables;

SHOW TABLES

Ok.

0 rows in set. Elapsed: 0.011 sec.
```
```
:) select now()

SELECT now()

┌───────────────now()─┐
│ 2018-03-19 17:22:55 │
└─────────────────────┘

1 rows in set. Elapsed: 0.002 sec.
```
To run the server as a service, create `/etc/systemd/system/clickhouse.service`:

```
[Unit]
Description=clickhouse
After=syslog.target
After=network.target

[Service]
LimitAS=infinity
LimitRSS=infinity
LimitCORE=infinity
LimitNOFILE=65536
User=root
Type=simple
Restart=on-failure
KillMode=control-group
ExecStart=/usr/bin/clickhouse-server --config-file=/etc/clickhouse-server/config.xml
RestartSec=10s

[Install]
WantedBy=multi-user.target
```
Hardware configuration:
The Kafka consumer needs to write its data into the ClickHouse database. Since our stack is mainly Go, we need a Go driver for ClickHouse. This section focuses on one open-source driver.
One notable DSN option is `connection_open_strategy` - random/in_order (default random), which controls how the driver picks among the configured hosts. An example DSN with fallback hosts:
tcp://host1:9000?username=user&password=qwerty&database=clicks&read_timeout=10&write_timeout=20&alt_hosts=host2:9000,host3:9000
go get -u github.com/kshvakov/clickhouse
```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	"github.com/kshvakov/clickhouse"
)

func main() {
	connect, err := sql.Open("clickhouse", "tcp://127.0.0.1:9000?debug=true")
	if err != nil {
		log.Fatal(err)
	}
	if err := connect.Ping(); err != nil {
		if exception, ok := err.(*clickhouse.Exception); ok {
			fmt.Printf("[%d] %s \n%s\n", exception.Code, exception.Message, exception.StackTrace)
		} else {
			fmt.Println(err)
		}
		return
	}

	_, err = connect.Exec(`
		CREATE TABLE IF NOT EXISTS example (
			country_code FixedString(2),
			os_id        UInt8,
			browser_id   UInt8,
			categories   Array(Int16),
			action_day   Date,
			action_time  DateTime
		) engine=Memory
	`)
	if err != nil {
		log.Fatal(err)
	}

	var (
		tx, _   = connect.Begin()
		stmt, _ = tx.Prepare("INSERT INTO example (country_code, os_id, browser_id, categories, action_day, action_time) VALUES (?, ?, ?, ?, ?, ?)")
	)
	for i := 0; i < 100; i++ {
		if _, err := stmt.Exec(
			"RU",
			10+i,
			100+i,
			clickhouse.Array([]int16{1, 2, 3}),
			time.Now(),
			time.Now(),
		); err != nil {
			log.Fatal(err)
		}
	}
	if err := tx.Commit(); err != nil {
		log.Fatal(err)
	}

	rows, err := connect.Query("SELECT country_code, os_id, browser_id, categories, action_day, action_time FROM example")
	if err != nil {
		log.Fatal(err)
	}
	for rows.Next() {
		var (
			country               string
			os, browser           uint8
			categories            []int16
			actionDay, actionTime time.Time
		)
		if err := rows.Scan(&country, &os, &browser, &categories, &actionDay, &actionTime); err != nil {
			log.Fatal(err)
		}
		log.Printf("country: %s, os: %d, browser: %d, categories: %v, action_day: %s, action_time: %s",
			country, os, browser, categories, actionDay, actionTime)
	}

	if _, err := connect.Exec("DROP TABLE example"); err != nil {
		log.Fatal(err)
	}
}
```
In follow-up posts I will go deeper into the Go client library for ClickHouse, along with lessons learned using ClickHouse in the flow project.