Flow Collection and Analysis: An Introduction to ClickHouse and Its Installation

Introduction

I have recently been working on a project for collecting and analyzing network flow data. It follows the NetFlow and sFlow protocols to collect and store traffic from firewalls and core switches for later analysis. Both the flow concurrency and the data volume are large, and storage is the bottleneck. Our first attempts, storing the data in Prometheus and later testing InfluxDB, both failed. Prometheus is evidently still not suited to data at this scale; once the volume grows this large, its federation mode has to be considered.
After evaluating vflow, a related community project, we decided to test a ClickHouse-based storage scheme.
On the collection side we did not adopt vflow; instead we wrote a dedicated flow input plugin for Telegraf, which publishes to a Kafka cluster. A consumer then reads the data from Kafka and stores it in ClickHouse for subsequent analysis.
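As a rough illustration of the data moving through this pipeline (the field names here are hypothetical, not the actual plugin's schema), a JSON-encoded flow record consumed from Kafka might be flattened into the positional values of an INSERT like so:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FlowRecord is a hypothetical flow sample as the Telegraf input
// plugin might publish it to Kafka (JSON-encoded).
type FlowRecord struct {
	SrcAddr string `json:"src_addr"`
	DstAddr string `json:"dst_addr"`
	SrcPort uint16 `json:"src_port"`
	DstPort uint16 `json:"dst_port"`
	Bytes   uint64 `json:"bytes"`
}

// toRow decodes a Kafka message and flattens it into the positional
// arguments a consumer would bind to an INSERT statement.
func toRow(msg []byte) ([]interface{}, error) {
	var r FlowRecord
	if err := json.Unmarshal(msg, &r); err != nil {
		return nil, err
	}
	return []interface{}{r.SrcAddr, r.DstAddr, r.SrcPort, r.DstPort, r.Bytes}, nil
}

func main() {
	row, err := toRow([]byte(`{"src_addr":"10.0.0.1","dst_addr":"10.0.0.2","src_port":443,"dst_port":51234,"bytes":1500}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(row)
}
```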
ClickHouse is a very good choice of columnar analytical database, with strong performance. The official site publishes performance comparisons against many mainstream databases, where you can find detailed benchmark reports.

Installation

Download the required RPM packages

Download location
There are five packages in total.

Resolving dependencies

While installing the server package, the following errors appeared:

rpm -ivh clickhouse-server-1.1.54236-4.el7.x86_64.rpm
error: Failed dependencies:
    libicudata.so.50()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
    libicui18n.so.50()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
    libicuuc.so.50()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
    libltdl.so.7()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
    libodbc.so.2()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64

The first three missing dependencies can be resolved with:

yum install libicu-devel

Then download libtool-ltdl-2.4.2-22.el7_3.x86_64.rpm:

rpm -ivh libtool-ltdl-2.4.2-22.el7_3.x86_64.rpm

Then download unixODBC-2.3.1-11.el7.x86_64.rpm:

rpm -ivh unixODBC-2.3.1-11.el7.x86_64.rpm

Installing the packages

1

rpm -ivh clickhouse-server-common-1.1.54236-4.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-server-common-1.1.5423################################# [100%]

2

rpm -ivh clickhouse-server-1.1.54236-4.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-server-1.1.54236-4.el7################################# [100%]

3

rpm -ivh clickhouse-debuginfo-1.1.54236-4.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-debuginfo-1.1.54236-4.################################# [100%]

4

rpm -ivh clickhouse-client-1.1.54236-4.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-client-1.1.54236-4.el7################################# [100%]

5

rpm -ivh clickhouse-compressor-1.1.54236-4.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-compressor-1.1.54236-4################################# [100%]

Startup

Start the service

clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
2018.03.19 17:17:25.113898 [ 1 ] <Warning> Application: Logging to console
2018.03.19 17:17:25.117332 [ 1 ] <Information> : Starting daemon with revision 54236
2018.03.19 17:17:25.117444 [ 1 ] <Information> Application: starting up
2018.03.19 17:17:25.118273 [ 1 ] <Debug> Application: rlimit on number of file descriptors is 1024000
2018.03.19 17:17:25.118299 [ 1 ] <Debug> Application: Initializing DateLUT.
2018.03.19 17:17:25.118307 [ 1 ] <Trace> Application: Initialized DateLUT with time zone `Asia/Shanghai'.
2018.03.19 17:17:25.120309 [ 1 ] <Debug> Application: Configuration parameter 'interserver_http_host' doesn't exist or exists and empty. Will use 'xxxx' as replica host.
2018.03.19 17:17:25.120471 [ 1 ] <Debug> ConfigReloader: Loading config `/etc/clickhouse-server/users.xml'
2018.03.19 17:17:25.125606 [ 1 ] <Warning> ConfigProcessor: Include not found: networks
2018.03.19 17:17:25.125636 [ 1 ] <Warning> ConfigProcessor: Include not found: networks
2018.03.19 17:17:25.126753 [ 1 ] <Information> Application: Loading metadata.
2018.03.19 17:17:25.127259 [ 1 ] <Information> DatabaseOrdinary (default): Total 0 tables.
2018.03.19 17:17:25.127348 [ 1 ] <Information> DatabaseOrdinary (system): Total 0 tables.
2018.03.19 17:17:25.127894 [ 1 ] <Debug> Application: Loaded metadata.
2018.03.19 17:17:25.128699 [ 1 ] <Information> Application: Listening http://[::1]:8123
2018.03.19 17:17:25.128749 [ 1 ] <Information> Application: Listening tcp: [::1]:9000
2018.03.19 17:17:25.128783 [ 1 ] <Information> Application: Listening interserver: [::1]:9009
2018.03.19 17:17:25.128816 [ 1 ] <Information> Application: Listening http://10.xx.xx.136:8123
2018.03.19 17:17:25.128845 [ 1 ] <Information> Application: Listening tcp: 10.xx.xx.136:9000
2018.03.19 17:17:25.128872 [ 1 ] <Information> Application: Listening interserver: 10.xx.xx.136:9009
2018.03.19 17:17:25.129116 [ 1 ] <Information> Application: Ready for connections.
2018.03.19 17:17:27.120687 [ 2 ] <Debug> ConfigReloader: Loading config `/etc/clickhouse-server/config.xml'
2018.03.19 17:17:27.127614 [ 2 ] <Warning> ConfigProcessor: Include not found: clickhouse_remote_servers
2018.03.19 17:17:27.127701 [ 2 ] <Warning> ConfigProcessor: Include not found: clickhouse_compression

Connecting with the client

clickhouse-client --host=10.xx.xx.136  --port=9000
ClickHouse client version 1.1.54236.
Connecting to 10.xx.xx.136:9000.
Connected to ClickHouse server version 1.1.54236.

:)
:) show tables;

SHOW TABLES

Ok.

0 rows in set. Elapsed: 0.011 sec.

:)

簡單操做測試

:) select now()

SELECT now()

┌───────────────now()─┐
│ 2018-03-19 17:22:55 │
└─────────────────────┘

1 rows in set. Elapsed: 0.002 sec.

Running as a systemd service

Create the unit file /etc/systemd/system/clickhouse.service:
[Unit]
Description=clickhouse
After=syslog.target
After=network.target

[Service]
LimitAS=infinity
LimitRSS=infinity
LimitCORE=infinity
LimitNOFILE=65536
User=root
Type=simple
Restart=on-failure
KillMode=control-group
ExecStart=/usr/bin/clickhouse-server --config-file=/etc/clickhouse-server/config.xml
RestartSec=10s

[Install]
WantedBy=multi-user.target

Performance testing

Hardware configuration:

  • CPU Intel Core Processor (Haswell, no TSX) cores = 8, 2.6GHz, x86_64
  • Memory 16G
  • Drive SSD in software RAID

(benchmark result charts)

A Go driver for the ClickHouse columnar database

The Kafka consumer needs to write its data into ClickHouse. Since our stack is mainly Go, we need a Go driver for ClickHouse. This section introduces an open-source driver.

Key features

  • Uses the native ClickHouse TCP client-server protocol
  • Compatible with the database/sql package
  • Implements round-robin load balancing
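The round-robin idea behind the load balancing is simple; a minimal standalone sketch (illustrative only, not the driver's actual implementation) looks like this:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin cycles through a fixed host list; each call to next
// returns the subsequent host, wrapping around at the end.
type roundRobin struct {
	hosts []string
	n     uint64
}

func (r *roundRobin) next() string {
	// atomic counter keeps the rotation safe under concurrent use
	i := atomic.AddUint64(&r.n, 1)
	return r.hosts[(i-1)%uint64(len(r.hosts))]
}

func main() {
	rr := &roundRobin{hosts: []string{"host1:9000", "host2:9000", "host3:9000"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.next()) // host1, host2, host3, then back to host1
	}
}
```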

DSN

  • username/password - auth credentials
  • database - select the current default database
  • read_timeout/write_timeout - timeouts in seconds
  • no_delay - disable/enable the Nagle algorithm for the TCP socket
    (default is 'true' - disabled)
  • alt_hosts - comma-separated list of additional hosts for
    load balancing
  • connection_open_strategy - random/in_order (default is random)

    • random - choose a random server from the set
    • in_order - the first live server is chosen, in the specified order
  • block_size - maximum number of rows per block (default is 1000000);
    if a batch contains more rows, the data is split into several blocks
    before being sent to the server
  • debug - enable debug output (boolean value)
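The block_size behaviour described above amounts to splitting a large batch into fixed-size chunks; a small sketch of that splitting logic (illustrative only, not the driver's code):

```go
package main

import "fmt"

// chunk splits rows into blocks of at most blockSize rows, mirroring
// how a batch larger than block_size is sent as several blocks.
func chunk(rows [][]interface{}, blockSize int) [][][]interface{} {
	var blocks [][][]interface{}
	for len(rows) > blockSize {
		blocks = append(blocks, rows[:blockSize])
		rows = rows[blockSize:]
	}
	if len(rows) > 0 {
		blocks = append(blocks, rows)
	}
	return blocks
}

func main() {
	rows := make([][]interface{}, 10)
	fmt.Println(len(chunk(rows, 4))) // 10 rows at block size 4 -> 3 blocks (4 + 4 + 2)
}
```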

SSL/TLS parameters

  • secure - establish a secure connection (default is false)
  • skip_verify - skip certificate verification (default is true)

Example

tcp://host1:9000?username=user&password=qwerty&database=clicks&read_timeout=10&write_timeout=20&alt_hosts=host2:9000,host3:9000

Supported data types

  • UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64
  • Float32, Float64
  • String
  • FixedString(N)
  • Date
  • DateTime
  • Enum
  • UUID
  • Nullable(T)
  • Array(T) (one-dimensional)

Install

go get -u github.com/kshvakov/clickhouse

Example

package main

import (
    "database/sql"
    "fmt"
    "log"
    "time"

    "github.com/kshvakov/clickhouse"
)

func main() {
    connect, err := sql.Open("clickhouse", "tcp://127.0.0.1:9000?debug=true")
    if err != nil {
        log.Fatal(err)
    }
    if err := connect.Ping(); err != nil {
        if exception, ok := err.(*clickhouse.Exception); ok {
            fmt.Printf("[%d] %s \n%s\n", exception.Code, exception.Message, exception.StackTrace)
        } else {
            fmt.Println(err)
        }
        return
    }

    _, err = connect.Exec(`
        CREATE TABLE IF NOT EXISTS example (
            country_code FixedString(2),
            os_id        UInt8,
            browser_id   UInt8,
            categories   Array(Int16),
            action_day   Date,
            action_time  DateTime
        ) engine=Memory
    `)

    if err != nil {
        log.Fatal(err)
    }
    var (
        tx, _   = connect.Begin()
        stmt, _ = tx.Prepare("INSERT INTO example (country_code, os_id, browser_id, categories, action_day, action_time) VALUES (?, ?, ?, ?, ?, ?)")
    )

    for i := 0; i < 100; i++ {
        if _, err := stmt.Exec(
            "RU",
            10+i,
            100+i,
            clickhouse.Array([]int16{1, 2, 3}),
            time.Now(),
            time.Now(),
        ); err != nil {
            log.Fatal(err)
        }
    }

    if err := tx.Commit(); err != nil {
        log.Fatal(err)
    }

    rows, err := connect.Query("SELECT country_code, os_id, browser_id, categories, action_day, action_time FROM example")
    if err != nil {
        log.Fatal(err)
    }

    for rows.Next() {
        var (
            country               string
            os, browser           uint8
            categories            []int16
            actionDay, actionTime time.Time
        )
        if err := rows.Scan(&country, &os, &browser, &categories, &actionDay, &actionTime); err != nil {
            log.Fatal(err)
        }
        log.Printf("country: %s, os: %d, browser: %d, categories: %v, action_day: %s, action_time: %s", country, os, browser, categories, actionDay, actionTime)
    }

    if _, err := connect.Exec("DROP TABLE example"); err != nil {
        log.Fatal(err)
    }
}

Summary

Follow-up posts will cover the Go client library for ClickHouse in more detail, along with practical lessons from using ClickHouse in the flow project.
