Introduction to Ceilometer + Aodh + Gnocchi

Date: 2019-11-18


1、 Ceilometer

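1. Overview

OpenStack Ceilometer is mainly used to monitor virtual machines, services (glance, image, network, etc.) and events. For virtual machines, the monitored items mainly cover CPU, disk, network and instance.

When planning a metering and monitoring system, three main problems need to be considered. The first is data collection: metering and monitoring gather many different kinds of data. The second is storage: how to store such a large volume of collected data and how to query it efficiently. The third is alarming: once the data has been collected it should be put to use, for example by raising alarms so that administrators are notified promptly. The three Telemetry projects (Ceilometer, Gnocchi, Aodh) address these three problems respectively.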

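2. Basic concepts

Ceilometer has the following main concepts:

meter is a monitored item defined by Ceilometer, such as memory usage, network I/O, disk I/O and so on
sample is the value of a meter at a single collection time point
statistics is an aggregate of a meter's values over a given period (for example, the average)
resource is the object being monitored, which can be a virtual machine, a physical machine or a cloud disk
alarm is Ceilometer's alerting mechanism; you can alarm on a threshold or on a combination of conditions, and set the action triggered when the alarm fires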
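
To make the relationship between meter, sample, resource and statistics concrete, here is a minimal sketch. The `Sample` class and the `statistics` function are hypothetical names made up for illustration; they are not Ceilometer's actual data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record: one value of one meter at one time."""
    meter: str         # name of the monitored item, e.g. "cpu_util"
    resource_id: str   # the resource the sample was collected from
    timestamp: int     # collection time, in seconds
    volume: float      # value of the meter at that time


def statistics(samples, period):
    """Group samples into fixed-length periods and aggregate each period."""
    buckets = defaultdict(list)
    for s in samples:
        period_start = s.timestamp - s.timestamp % period
        buckets[period_start].append(s.volume)
    return {
        start: {"avg": sum(v) / len(v), "min": min(v), "max": max(v)}
        for start, v in sorted(buckets.items())
    }


samples = [
    Sample("cpu_util", "vm-1", 0, 10.0),
    Sample("cpu_util", "vm-1", 30, 20.0),
    Sample("cpu_util", "vm-1", 60, 40.0),
]
stats = statistics(samples, period=60)
```

Here the first two samples fall into the period starting at 0 and the third into the period starting at 60, so each period gets its own average, minimum and maximum.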

3. Internal architecture

3.1 Ceilometer overall architecture

3.2 Processes

- A compute agent (ceilometer-agent-compute)
- A central agent (ceilometer-agent-central)
- A notification agent (ceilometer-agent-notification)
- A collector (ceilometer-collector)
- An alarm evaluator (ceilometer-alarm-evaluator)
- An alarm notifier (ceilometer-alarm-notifier)
- An API server (ceilometer-api)

3.3 Gathering the data

3.4 Notification Agents: Listening for data

3.5 Polling Agents: Asking for data

3.6 Processing the data

3.7 Transforming the data

3.8 Publishing the data

3.9 Storing the data

3.10 Accessing the data

2、 Gnocchi

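1. Overview

Gnocchi was proposed to solve Ceilometer's performance problems. Ceilometer's early data model was not particularly well designed, so performance was poor on each of the databases it was used with.

Because Ceilometer was proposed very early and went through many changes, its data model became very flexible, but at some cost in performance. Gnocchi was created specifically to solve this problem. Gnocchi abstracts the problem into two concepts: Resource and Metric. Ceilometer data comes in two kinds: resources, and time-series data, i.e. metrics. Gnocchi's main job is to store this data. The resource side acts as an index from Resource to Metric: as the time-series data grows over time, the index records which resources own which metrics.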

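2. Basic concepts

2.1 Metric

A single indicator of a resource, for example a host's CPU or disk.

2.2 Measures

The actual data points measured for a metric.

2.3 Resource

The object being monitored, which can be a virtual machine, a physical machine or a cloud disk.

2.4 Resource Type

Used to manage resources. A resource's type must be defined in advance; otherwise the Ceilometer collector cannot register the resource successfully. gnocchiclient does not provide a CLI command for creating types. For the types currently provided officially, see: http://docs.openstack.org/developer/gnocchi/resource_types.html

2.5 Archive Policy

The storage rules for a metric's measured data.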

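- Granularity: the time granularity of processing, i.e. the interval over which a metric's measures are aggregated into one point.

- Points: the number of data points kept.

- timespan = points x granularity

- back_window: metricd only processes measures whose timestamps are in the future or within the most recently processed aggregation period. To process measures older than the current processing period, back_window must be set.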

For example, if an archive policy is defined with coarsest aggregation of 1 hour, and the last point processed has a timestamp of 14:34, it’s possible to process measures back to 14:00 with a back_window of 0. If the back_window is set to 2, it will be possible to send measures with timestamp back to 12:00 (14:00 minus 2 times 1 hour).

- aggregation_methods: run gnocchi capabilities list to see the aggregation methods currently supported. default_aggregation_methods can also be set in the configuration file.
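
The timespan formula and the back_window example above can be checked with a few lines of Python. This is only a sketch of the arithmetic described in the text; `timespan` and `oldest_accepted` are hypothetical helper names, not Gnocchi functions, and the date used is arbitrary.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def timespan(points, granularity):
    """timespan = points x granularity."""
    return points * granularity


def oldest_accepted(last_processed, coarsest_granularity, back_window=0):
    """Oldest timestamp that can still be processed.

    Rounds the last processed timestamp down to the start of its coarsest
    aggregation period, then steps back `back_window` more periods.
    """
    offset = (last_processed - EPOCH) % coarsest_granularity
    period_start = last_processed - offset
    return period_start - back_window * coarsest_granularity


hour = timedelta(hours=1)
last = datetime(2019, 11, 18, 14, 34)   # last point processed at 14:34

no_window = oldest_accepted(last, hour, back_window=0)    # back to 14:00
two_windows = oldest_accepted(last, hour, back_window=2)  # back to 12:00
one_hour = timespan(points=60, granularity=timedelta(minutes=1))
```

With a coarsest granularity of 1 hour this reproduces the example: 14:00 with a back_window of 0, and 12:00 with a back_window of 2.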

2.6 Archive Policy Rule

By creating an archive policy rule, an archive policy is assigned to metrics by pattern matching on the metric name.
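
As a rough illustration of pattern-based assignment, the sketch below matches metric names against wildcard patterns with Python's `fnmatch`. The rule list, the policy names and the `policy_for` helper are all made up for this example, and picking the longest matching pattern is an assumption of the sketch, not something stated above.

```python
from fnmatch import fnmatch

# Hypothetical rules: each maps a metric-name pattern to an archive policy.
RULES = [
    {"name": "default", "metric_pattern": "*", "archive_policy_name": "low"},
    {"name": "disk", "metric_pattern": "disk.*", "archive_policy_name": "medium"},
]


def policy_for(metric_name, rules):
    """Return the archive policy name of the most specific matching rule."""
    matches = [r for r in rules if fnmatch(metric_name, r["metric_pattern"])]
    if not matches:
        return None
    # Assumption: the longest pattern is treated as the most specific one.
    best = max(matches, key=lambda r: len(r["metric_pattern"]))
    return best["archive_policy_name"]


disk_policy = policy_for("disk.read.bytes", RULES)
cpu_policy = policy_for("cpu_util", RULES)
```

In this sketch a disk metric picks up the "medium" policy through the `disk.*` rule, while any other metric falls back to the catch-all `*` rule.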

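3. Overall architecture

Gnocchi's backend services are divided into three parts:

- gnocchi-api

An HTTP REST API service started as a WSGI application. (It handles operations on metric / resource / resource type / archive policy / archive policy rule, plus saving measures.)

- gnocchi-statsd

(Receives data over UDP; TCP reception is to be supported later. Nothing currently sends data over UDP, so this service is presumably idle.)

StatsD is used to collect data; once collected, the data is sent on to other servers for processing. Here it is stored under the tmp directory.

- gnocchi-metricd

The core MetricProcessor service, which handles data aggregation, cleanup and similar actions.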

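4. Data storage

4.1 Brief introduction

Data is split into two parts (index and storage):

- index driver: stores the resource index (resource).

- storage driver: stores the values of the time series (metric).

Multiple storage backends are supported:

- File
- Swift
- Ceph (preferred)
- InfluxDB (experimental)

The first three are supported through Carbonara, a library written by the Gnocchi project itself (the author is clearly a foodie).

InfluxDB is itself a time-series database, but the integration is still experimental and has many bugs.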

4.2 Data split

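Before Gnocchi 1.3, a metric's data was stored in one corresponding file (one file per aggregation method); in effect, data was partitioned along the aggregation-method dimension. Since version 2.0, the data is additionally split into chunks according to a configured number of points, which improves the concurrency of CRUD operations.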

4.3 Data compression

the suite of timestamps timestamps = [41230, 41235, 41240, 41250, 41255] is encoded into timestamps = [41230, 1, 1, 2, 1], interval = 5
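
The timestamp encoding above is a delta encoding: keep the first timestamp, then store each gap as a multiple of the interval. The following is a minimal sketch of that idea only; `encode` and `decode` are illustrative names, not Carbonara's functions.

```python
def encode(timestamps, interval):
    """Keep the first timestamp, then each gap as a multiple of `interval`."""
    encoded = [timestamps[0]]
    for previous, current in zip(timestamps, timestamps[1:]):
        gap = current - previous
        if gap % interval:
            raise ValueError("gap %d is not a multiple of %d" % (gap, interval))
        encoded.append(gap // interval)
    return encoded


def decode(encoded, interval):
    """Rebuild the original timestamps from the encoded form."""
    timestamps = [encoded[0]]
    for steps in encoded[1:]:
        timestamps.append(timestamps[-1] + steps * interval)
    return timestamps


original = [41230, 41235, 41240, 41250, 41255]
encoded = encode(original, interval=5)
restored = decode(encoded, interval=5)
```

The small integers after the first element repeat often, which is what makes the encoded series compress well.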

To actually compress the values, I tried two different algorithms: LZ4 and XOR.
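
For readers unfamiliar with the XOR approach, the general idea is to XOR the bit pattern of each value with that of the previous one, so that repeated or similar values turn into zeros or mostly-zero words that pack tightly. The sketch below shows only that XOR step on 64-bit floats; it is a simplified illustration with made-up function names, not the code that was benchmarked.

```python
import struct


def float_to_bits(value):
    """Reinterpret a float as its 64-bit IEEE 754 bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def bits_to_float(bits):
    """Reinterpret a 64-bit pattern as a float."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def xor_encode(values):
    """Keep the first value's bits, then each value XORed with the previous."""
    bits = [float_to_bits(v) for v in values]
    return [bits[0]] + [a ^ b for a, b in zip(bits, bits[1:])]


def xor_decode(encoded):
    """Undo xor_encode by XORing each word back onto the running value."""
    bits = [encoded[0]]
    for word in encoded[1:]:
        bits.append(bits[-1] ^ word)
    return [bits_to_float(b) for b in bits]


values = [12.0, 12.0, 12.5]
encoded_values = xor_encode(values)
decoded_values = xor_decode(encoded_values)
```

An unchanged value encodes to 0, and a real encoder would go on to store only the meaningful bits of each XORed word.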

4.4 Gnocchi aggregation mechanism

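4.5 Summary

1. Stored data is classified and handled by category.

2. Data is compressed for storage.

3. Only the processed, aggregated data is stored; the raw data is deleted.

4. Distributed storage is used, which makes storage easy to scale.

5. Performance improvement

5.1 Test environment

Hardware: 2 machines (2×Intel Xeon E5-2609 v3 (12 cores in total), 32 GB of RAM)

One runs Gnocchi, the other acts as the client.

Software: RHEL 7, OpenStack components disabled

PostgreSQL indexer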

file storage driver

The OpenStack Keystone authentication middleware was not enabled in this setup

5.2 Metric CRUD operations

5.3 Sending and getting measures

5.4 Comparison with Ceilometer

Most Gnocchi operations are O(log R) where R is the number of metrics or resources, whereas most Ceilometer operations are O(log S) where S is the number of samples (measures). Since R is millions of times smaller than S, Gnocchi gets to be much faster.

3、 Aodh

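1. Aodh handles alarms and events separately, so that alarms are detected and responded to more promptly.

- An API server (aodh-api). Provides the interface for storing and accessing alarm data.

- An alarm evaluator (aodh-evaluator). Evaluates, based on the collected statistics, whether an alarm needs to be triggered.

- A notification listener (aodh-listener). Listens for events and triggers event-related alarms.

- An alarm notifier (aodh-notifier). Sends out alarms using the configured notification method.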
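
What the evaluator does for a threshold alarm can be sketched roughly as follows: compare the most recent statistics against a threshold and derive an alarm state. This is a simplified illustration written for this article, not aodh-evaluator's actual code; the `evaluate` function and its inputs are assumptions.

```python
import operator

COMPARATORS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
}


def evaluate(statistics, comparison_operator, threshold, evaluation_periods):
    """Return the alarm state implied by the latest statistics.

    `statistics` is a list of per-period aggregated values, oldest first.
    The alarm fires only if every one of the last `evaluation_periods`
    values crosses the threshold.
    """
    if len(statistics) < evaluation_periods:
        return "insufficient data"
    compare = COMPARATORS[comparison_operator]
    recent = statistics[-evaluation_periods:]
    if all(compare(value, threshold) for value in recent):
        return "alarm"
    return "ok"


firing = evaluate([50, 85, 90], "gt", 80, evaluation_periods=2)
quiet = evaluate([50, 85, 70], "gt", 80, evaluation_periods=2)
unknown = evaluate([90], "gt", 80, evaluation_periods=2)
```

Requiring several consecutive periods to cross the threshold keeps a single spike from triggering a notification.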

Ceilometer-alarm

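Summary

Ceilometer-alarm had the following three problems, and Aodh was created to solve them.

1) A long-standing awkward question was the relationship between alarm and Ceilometer. Although the alarm code lived in the Ceilometer source tree, the two were not tightly coupled: alarm is a consumer of the Ceilometer API, so separating them was entirely feasible.

2) Alarm is a consumer of the Ceilometer API, and each alarm is checked every 60 s. When there are many alarms, this puts considerable pressure on the API, so some proposed letting alarm access the database directly.

3) Some deployments use Ceilometer as a billing service, but alarm and billing share the same database, which introduces a latent security risk. The two also have different data-retention requirements: alarm may only need data from the recent past, whereas billing needs data kept for a long time, which makes db-ttl hard to configure.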

4、 Multi-process pdb debugging

1. Add the following class to the code:

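import sys
import pdb

class ForkedPdb(pdb.Pdb):
    """A Pdb subclass that may be used
    from a forked multiprocessing child

    """
    def interaction(self, *args, **kwargs):
        # Remember the current stdin so it can be restored afterwards
        _stdin = sys.stdin
        try:
            # open() replaces the Python 2-only file() builtin
            sys.stdin = open('/dev/stdin')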