(Repost) InfluxDB Database User Manual

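InfluxDB is an open-source time-series database written in Go. It is particularly well suited to processing and analyzing time-series data such as resource monitoring metrics. Its built-in special-purpose functions, such as standard deviation, random sampling, and rate of change, make statistics and real-time analysis very convenient. Our container resource monitoring system uses InfluxDB to store the monitoring data collected by cadvisor. This article gives a detailed introduction to InfluxDB's basic concepts and some of its distinctive features. Most of the content is translated and compiled from the official documentation; corrections are welcome if you spot mistakes or omissions.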

1 Installation and configuration

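This section covers running influxdb in a docker container; for installation on a physical machine, refer to the official documentation. Pull the image and run it; the latest version at the time of writing is 1.3.5. When starting the container, set the mounted data directory and the exposed port. InfluxDB's query language, InfluxQL, is broadly similar to SQL, and InfluxDB ships a CLI named influx that resembles mysql-client. InfluxDB itself supports distributed deployment with multi-replica storage, but everything in this article assumes a single node with a single replica.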

# docker pull influxdb
# docker run -idt --name influxdb -p 8086:8086 -v /Users/ssj/influxdb:/var/lib/influxdb influxdb
f216e9be15bff545befecb30d1d275552026216a939cc20c042b17419e3bde31
# docker exec -it influxdb /bin/bash 
root@f216e9be15bf:/# influx
Connected to http://localhost:8086 version 1.3.5
InfluxDB shell version: 1.3.5
> create database cadvisor  ## create the database cadvisor
> show databases           
name: databases
name
----
_internal
cadvisor
> CREATE USER testuser WITH PASSWORD 'testpwd' ## create a user and set its password
> GRANT ALL PRIVILEGES ON cadvisor TO testuser ## grant the user privileges on the database
> CREATE RETENTION POLICY "cadvisor_retention" ON "cadvisor" DURATION 30d REPLICATION 1 DEFAULT ## create the default retention policy: keep data for 30 days, replication factor 1

2 Key concepts

influxdb has a number of key concepts: database, timestamp, field key, field value, field set, tag key, tag value, tag set, measurement, retention policy, series, and point. The sample data below is used to explain them:

name: census
-------------
time                     butterflies     honeybees     location   scientist
2015-08-18T00:00:00Z      12                23           1         langstroth
2015-08-18T00:00:00Z      1                 30           1         perpetua
2015-08-18T00:06:00Z      11                28           1         langstroth
2015-08-18T00:06:00Z      3                 28           1         perpetua
2015-08-18T05:54:00Z      2                 11           2         langstroth
2015-08-18T06:00:00Z      1                 10           2         langstroth
2015-08-18T06:06:00Z      8                 23           2         perpetua
2015-08-18T06:12:00Z      7                 22           2         perpetua

timestamp

Since influxdb is a time-series database, all of its data has a column named time that stores a UTC timestamp.

field key, field value, field set

The butterflies and honeybees columns are fields. An influxdb field consists of a field key and a field value. Here butterflies and honeybees are the field keys; field keys are strings and store metadata.

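The values 12 through 7 in the butterflies column are the field values of butterflies; likewise, the values 23 through 22 in the honeybees column are the field values of honeybees. A field value can be a string, float, integer, or boolean. Field values are always associated with a timestamp.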

The collection of field key and field value pairs is called a field set, as follows:

butterflies = 12 honeybees = 23
butterflies = 1 honeybees = 30
butterflies = 11 honeybees = 28
butterflies = 3 honeybees = 28
butterflies = 2 honeybees = 11
butterflies = 1 honeybees = 10
butterflies = 8 honeybees = 23
butterflies = 7 honeybees = 22

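In influxdb, fields are required. Note that fields are not indexed. A query that filters on a field value has to scan every field value that matches the other conditions of the query, so it performs worse than a query on tags. By analogy, fields are like unindexed columns in SQL.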

tag key, tag value, tag set

The location and scientist columns are tags. A tag consists of a tag key and a tag value. The tag key location has two tag values, 1 and 2; scientist has two tag values, langstroth and perpetua. Tag key and tag value pairs make up a tag set. The tag sets in the sample are:

location = 1, scientist = langstroth
location = 2, scientist = langstroth
location = 1, scientist = perpetua
location = 2, scientist = perpetua

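Tags are optional, but using them is strongly recommended because tags are indexed; tags are like indexed columns in SQL. Tag values can only be strings. If your typical queries filter on butterflies and honeybees, you can store those two columns as tags and the other two as fields; the choice between tag and field depends on your query needs.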

measurement

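A measurement is the container for fields, tags, and the time column. Its name describes the field data stored in it, much like a table name in mysql. In the example above the measurement is census. A measurement corresponds to a table in SQL, and in some places in this article I use "table" to refer to a measurement.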

retention policy

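A retention policy is a data retention strategy. The retention policy of the sample data is the default autogen, which keeps data forever and has a replication factor of 1. You can also specify a retention period, such as 30 days.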

series

A series is the collection of data that shares the same retention policy, measurement, and tag set. The sample data has 4 series, as follows:

Arbitrary series number   Retention policy   Measurement   Tag set
series 1                  autogen            census        location = 1,scientist = langstroth
series 2                  autogen            census        location = 2,scientist = langstroth
series 3                  autogen            census        location = 1,scientist = perpetua
series 4                  autogen            census        location = 2,scientist = perpetua
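The series count can be checked by hand. The following is a minimal Python sketch, not anything InfluxDB itself runs: it collects the distinct (retention policy, measurement, tag set) triples from the census sample rows. The names ROWS and series_keys are made up for this illustration.

```python
# Rows of the census sample: (time, butterflies, honeybees, location, scientist).
ROWS = [
    ("2015-08-18T00:00:00Z", 12, 23, "1", "langstroth"),
    ("2015-08-18T00:00:00Z", 1, 30, "1", "perpetua"),
    ("2015-08-18T00:06:00Z", 11, 28, "1", "langstroth"),
    ("2015-08-18T00:06:00Z", 3, 28, "1", "perpetua"),
    ("2015-08-18T05:54:00Z", 2, 11, "2", "langstroth"),
    ("2015-08-18T06:00:00Z", 1, 10, "2", "langstroth"),
    ("2015-08-18T06:06:00Z", 8, 23, "2", "perpetua"),
    ("2015-08-18T06:12:00Z", 7, 22, "2", "perpetua"),
]


def series_keys(rows, measurement="census", retention_policy="autogen"):
    """Return the distinct (retention policy, measurement, tag set) triples."""
    keys = set()
    for _time, _butterflies, _honeybees, location, scientist in rows:
        # Only tags take part in the series key; field values do not.
        tag_set = (("location", location), ("scientist", scientist))
        keys.add((retention_policy, measurement, tag_set))
    return sorted(keys)


for key in series_keys(ROWS):
    print(key)
```

Field values never enter the key, which is why eight rows collapse into four series.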

point

A point is the field set in a given series at a given timestamp; points are like rows in SQL. For example, the following is a point:

name: census
-----------------
time                  butterflies    honeybees   location    scientist
2015-08-18T00:00:00Z       1            30           1        perpetua

database

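Everything described above is stored in a database; the database in the example is my_database. A database can have multiple measurements, retention policies, continuous queries, and users. influxdb is a schemaless database, so it is easy to add new measurements, tags, and fields. Yet it is operated much like a traditional database: you query and modify data with an SQL-like language.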

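influxdb is not a full CRUD database; it is more of a CR-ud database. It prioritizes the performance of creating and reading data over updating and deleting it, and it prevents some update and delete behavior to make creating and reading data more efficient.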

3 Notable functions

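influxdb functions fall into aggregations, selectors, transformations, predictors, and so on. Besides the basic functions that ordinary databases provide, it offers a number of distinctive functions that simplify statistical computation. The most commonly used of them are introduced one by one below.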

  • Aggregations: INTEGRAL(), SPREAD(), STDDEV(), MEAN(), MEDIAN(), etc., often combined with FILL() in a GROUP BY time() clause.
  • Selectors: SAMPLE(), PERCENTILE(), FIRST(), LAST(), TOP(), BOTTOM(), etc.
  • Transformations: DERIVATIVE(), DIFFERENCE(), etc.
  • Predictors: HOLT_WINTERS()

First import the test data from the official site (note: the version tested here is 1.3.1; the latest version is 1.3.5):

$ curl https://s3.amazonaws.com/noaa.water-database/NOAA_data.txt -o NOAA_data.txt
$ influx -import -path=NOAA_data.txt -precision=s -database=NOAA_water_database
$ influx -precision rfc3339 -database NOAA_water_database
Connected to http://localhost:8086 version 1.3.1
InfluxDB shell 1.3.1
> show measurements
name: measurements
name
----
average_temperature
distincts
h2o_feet
h2o_pH
h2o_quality
h2o_temperature

> show series from h2o_feet;
key
---
h2o_feet,location=coyote_creek
h2o_feet,location=santa_monica

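The examples below all use the official sample database, restricted to a subset of the data so that the results are easy to inspect. The measurement is h2o_feet, the tag key is location, and there are two field keys, level description and water_level.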

> SELECT * FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z'
name: h2o_feet
time                 level description    location     water_level
----                 -----------------    --------     -----------
2015-08-18T00:00:00Z between 6 and 9 feet coyote_creek 8.12
2015-08-18T00:00:00Z below 3 feet         santa_monica 2.064
2015-08-18T00:06:00Z between 6 and 9 feet coyote_creek 8.005
2015-08-18T00:06:00Z below 3 feet         santa_monica 2.116
2015-08-18T00:12:00Z between 6 and 9 feet coyote_creek 7.887
2015-08-18T00:12:00Z below 3 feet         santa_monica 2.028
2015-08-18T00:18:00Z between 6 and 9 feet coyote_creek 7.762
2015-08-18T00:18:00Z below 3 feet         santa_monica 2.126
2015-08-18T00:24:00Z between 6 and 9 feet coyote_creek 7.635
2015-08-18T00:24:00Z below 3 feet         santa_monica 2.041
2015-08-18T00:30:00Z between 6 and 9 feet coyote_creek 7.5
2015-08-18T00:30:00Z below 3 feet         santa_monica 2.051

GROUP BY, FILL()

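In the statements below, GROUP BY time(12m),* groups by 12-minute interval and by tag (location). GROUP BY time(12m) alone groups by 12-minute interval only; GROUP BY accepts only time and tags as arguments. fill(200) fills any interval that has no data with 200, and mean(field_key) returns the average of the data in each interval (note: this is computed per series; SUM for sums and MEDIAN for medians work the same way). LIMIT 7 caps the number of points (records) returned at 7, while SLIMIT 1 caps the number of series returned at 1.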

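Note the time intervals. The first interval is the 12m bucket that contains the start of the query range, here beginning at 2015-08-17T23:48:00Z. The first row is the average water_level for location=coyote_creek over 2015-08-17T23:48:00Z <= t < 2015-08-18T00:00:00Z; there is no data in that interval, so it is filled with 200. The second row is the average water_level for location=coyote_creek over 2015-08-18T00:00:00Z <= t < 2015-08-18T00:12:00Z, which is (8.12 + 8.005) / 2 = 8.0625, and so on.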

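GROUP BY time(10m) groups by 10 minutes, and the first interval is the 10m bucket that contains the start of the query range, which begins at 2015-08-17T23:40:00Z. With SLIMIT 1 the first series is returned by default; to compute the other series, append SOFFSET 1 to the query.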

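What if the interval is smaller than the sampling interval of the data, for example GROUP BY time(10s)? In that case a point is produced every 10s: intervals with no value are empty or filled by FILL, and intervals that do contain data keep their values.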

## GROUP BY time(12m)
> SELECT mean("water_level") FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m),* fill(200) LIMIT 7 SLIMIT 1
name: h2o_feet
tags: location=coyote_creek
time                 mean
----                 ----
2015-08-17T23:48:00Z 200
2015-08-18T00:00:00Z 8.0625
2015-08-18T00:12:00Z 7.8245
2015-08-18T00:24:00Z 7.5675

## GROUP BY time(10m), with SOFFSET set to 1
> SELECT mean("water_level") FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(10m),* fill(200) LIMIT 7 SLIMIT 1 SOFFSET 1
name: h2o_feet
tags: location=santa_monica
time                 mean
----                 ----
2015-08-17T23:40:00Z 200
2015-08-17T23:50:00Z 200
2015-08-18T00:00:00Z 2.09
2015-08-18T00:10:00Z 2.077
2015-08-18T00:20:00Z 2.041
2015-08-18T00:30:00Z 2.051
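To make the bucketing concrete, here is a minimal Python sketch that reproduces the GROUP BY time(12m) fill(200) result for the coyote_creek series. It assumes buckets are aligned to multiples of the interval counted from the Unix epoch, which matches the outputs shown above. The helper names (to_epoch, to_rfc3339, group_mean) are illustrative, not InfluxDB APIs.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"


def to_epoch(ts):
    """Parse an RFC3339 UTC timestamp into epoch seconds."""
    return int(datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc).timestamp())


def to_rfc3339(seconds):
    """Format epoch seconds as an RFC3339 UTC timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(FMT)


def group_mean(points, start, end, interval_s, fill):
    """Mean per bucket for start <= t <= end; buckets align to epoch multiples."""
    start_s, end_s = to_epoch(start), to_epoch(end)
    first = start_s - start_s % interval_s  # bucket that contains the range start
    buckets = {b: [] for b in range(first, end_s + 1, interval_s)}
    for ts, value in points:
        t = to_epoch(ts)
        if start_s <= t <= end_s:
            buckets[t - t % interval_s].append(value)
    # Empty buckets receive the fill value, like fill(200) in the query.
    return [(to_rfc3339(b), sum(v) / len(v) if v else fill)
            for b, v in sorted(buckets.items())]


COYOTE_CREEK = [
    ("2015-08-18T00:00:00Z", 8.12),
    ("2015-08-18T00:06:00Z", 8.005),
    ("2015-08-18T00:12:00Z", 7.887),
    ("2015-08-18T00:18:00Z", 7.762),
    ("2015-08-18T00:24:00Z", 7.635),
    ("2015-08-18T00:30:00Z", 7.5),
]

rows = group_mean(COYOTE_CREEK, "2015-08-17T23:48:00Z",
                  "2015-08-18T00:30:00Z", 12 * 60, 200)
for time, mean in rows:
    print(time, round(mean, 4))
```

Passing 10 * 60 as the interval moves the first bucket to 2015-08-17T23:40:00Z, the same shift seen in the GROUP BY time(10m) output.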

INTEGRAL(field_key, unit)

Returns the area under the curve formed by the numeric field values, summed over the query range. The test data is as follows:

> SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z'

name: h2o_feet
time                   water_level
----                   -----------
2015-08-18T00:00:00Z   2.064
2015-08-18T00:06:00Z   2.116
2015-08-18T00:12:00Z   2.028
2015-08-18T00:18:00Z   2.126
2015-08-18T00:24:00Z   2.041
2015-08-18T00:30:00Z   2.051

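Use INTEGRAL to compute the area. This is the area of the irregular shape enclosed by the line connecting the points and the time axis. The unit defaults to 1 second, so the statement below returns 3732.66 = 2.028*1800 + the areas of the trapezoids and triangles left over above that rectangle. With a unit of 1 minute the result is 3732.66/60 = 62.211; with a unit of 2 minutes it is 3732.66/120 = 31.1055, and so on.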

# unit is the default of 1 second
> SELECT INTEGRAL("water_level") FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z'
name: h2o_feet
time                 integral
----                 --------
1970-01-01T00:00:00Z 3732.66

# unit is 1 minute
> SELECT INTEGRAL("water_level", 1m) FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z'
name: h2o_feet
time                 integral
----                 --------
1970-01-01T00:00:00Z 62.211
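The figure 3732.66 can be reproduced with the trapezoidal rule. The sketch below is a plain Python illustration of that arithmetic, with timestamps written as seconds after 2015-08-18T00:00:00Z; the function name integral is hypothetical and this is not InfluxDB's own code.

```python
def integral(points, unit_s=1):
    """Trapezoidal area under (seconds, value) points, expressed per unit_s seconds."""
    area = 0.0
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        # Area of one trapezoid: average height times width in seconds.
        area += (v0 + v1) / 2 * (t1 - t0)
    return area / unit_s


# santa_monica water_level, sampled every 6 minutes (360 seconds).
SANTA_MONICA = [
    (0, 2.064), (360, 2.116), (720, 2.028),
    (1080, 2.126), (1440, 2.041), (1800, 2.051),
]

print(round(integral(SANTA_MONICA), 4))        # unit of 1 second
print(round(integral(SANTA_MONICA, 60), 4))    # unit of 1 minute
print(round(integral(SANTA_MONICA, 120), 4))   # unit of 2 minutes
```

Changing the unit only rescales the same area, which is why the 1m and 2m results are the 1s result divided by 60 and 120.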

SPREAD(field_key)

Returns the difference between the maximum and minimum values of a numeric field.

> SELECT SPREAD("water_level") FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m),* fill(18) LIMIT 3 SLIMIT 1 SOFFSET 1
name: h2o_feet
tags: location=santa_monica
time                 spread
----                 ------
2015-08-17T23:48:00Z 18
2015-08-18T00:00:00Z 0.052000000000000046
2015-08-18T00:12:00Z 0.09799999999999986

STDDEV(field_key)

Returns the standard deviation of a field. influxdb uses the Bessel-corrected standard deviation formula, as follows:

  • mean = (v1 + v2 + ... + vn) / n
  • stddev = sqrt(((v1 - mean)^2 + (v2 - mean)^2 + ... + (vn - mean)^2) / (n - 1))

> SELECT STDDEV("water_level") FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m),* fill(18) SLIMIT 1;
name: h2o_feet
tags: location=coyote_creek
time                 stddev
----                 ------
2015-08-17T23:48:00Z 18
2015-08-18T00:00:00Z 0.08131727983645186
2015-08-18T00:12:00Z 0.08838834764831845
2015-08-18T00:24:00Z 0.09545941546018377
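As a quick check of the formula, the following Python sketch applies the Bessel-corrected standard deviation to the two coyote_creek values in each 12-minute bucket and compares the result with statistics.stdev from the standard library, which uses the same n - 1 divisor. The function name stddev is only for illustration.

```python
import math
import statistics


def stddev(values):
    """Sample standard deviation with Bessel's correction (divide by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


# coyote_creek water_level values per 12-minute bucket.
BUCKETS = {
    "2015-08-18T00:00:00Z": [8.12, 8.005],
    "2015-08-18T00:12:00Z": [7.887, 7.762],
    "2015-08-18T00:24:00Z": [7.635, 7.5],
}

for bucket, values in BUCKETS.items():
    print(bucket, stddev(values), statistics.stdev(values))
```

With only two values per bucket the result reduces to the absolute difference divided by the square root of 2.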

PERCENTILE(field_key, N)

Returns the Nth percentile value of a field.

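Suppose there are 4 records. With N = 10, 10% * 4 = 0.4, which rounds to 0, so the query returns nothing. With N = 20, 20% * 4 = 0.8, which rounds to 1, so the smallest of the 4 values is selected. With N = 40, 40% * 4 = 1.6, which rounds to 2, so the second smallest of the 4 values is selected. It follows that N = 100 is the same as MAX(field_key), and N = 50 is the same as MEDIAN(field_key) when the number of field values is odd.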

> SELECT PERCENTILE("water_level",20) FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m)
name: h2o_feet
time                 percentile
----                 ----------
2015-08-17T23:48:00Z 
2015-08-18T00:00:00Z 2.064
2015-08-18T00:12:00Z 2.028
2015-08-18T00:24:00Z 2.041

> SELECT PERCENTILE("water_level",40) FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m)
name: h2o_feet
time                 percentile
----                 ----------
2015-08-17T23:48:00Z 
2015-08-18T00:00:00Z 2.116
2015-08-18T00:12:00Z 2.126
2015-08-18T00:24:00Z 2.051
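The rounding rule described above can be written down in a few lines. This Python sketch mirrors that description (rank = N% of the count, rounded half up) and is meant as an illustration only; it is not taken from the InfluxDB source, and the function name percentile is hypothetical. The sample values are the four water_level readings, from both locations, that fall in the 00:00 bucket.

```python
import math


def percentile(values, n):
    """Pick the value whose 1-based rank is round(n% * count); None if the rank is 0."""
    ordered = sorted(values)
    rank = math.floor(len(ordered) * n / 100 + 0.5)  # round half up
    if rank < 1 or rank > len(ordered):
        return None
    return ordered[rank - 1]


# Both locations fall in the same bucket because the query has no GROUP BY *.
BUCKET_0000 = [8.12, 2.064, 8.005, 2.116]

for n in (10, 20, 40, 100):
    print(n, percentile(BUCKET_0000, n))
```

For N = 20 and N = 40 the function returns 2.064 and 2.116, matching the first data rows of the two query outputs.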

SAMPLE(field_key, N)

Returns a random sample of N values of the field key. If the statement contains GROUP BY time(), N values are sampled at random from each group.

> SELECT SAMPLE("water_level",2) FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z';
name: h2o_feet
time                 sample
----                 ------
2015-08-18T00:00:00Z 2.064
2015-08-18T00:12:00Z 2.028

> SELECT SAMPLE("water_level",2) FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m);
name: h2o_feet
time                 sample
----                 ------
2015-08-18T00:06:00Z 2.116
2015-08-18T00:06:00Z 8.005
2015-08-18T00:12:00Z 7.887
2015-08-18T00:18:00Z 7.762
2015-08-18T00:24:00Z 7.635
2015-08-18T00:30:00Z 2.051

CUMULATIVE_SUM(field_key)

Returns the running total of the field values.

> SELECT CUMULATIVE_SUM("water_level") FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z';
name: h2o_feet
time                 cumulative_sum
----                 --------------
2015-08-18T00:00:00Z 8.12
2015-08-18T00:00:00Z 10.184
2015-08-18T00:06:00Z 18.189
2015-08-18T00:06:00Z 20.305
2015-08-18T00:12:00Z 28.192
2015-08-18T00:12:00Z 30.22
2015-08-18T00:18:00Z 37.982
2015-08-18T00:18:00Z 40.108
2015-08-18T00:24:00Z 47.742999999999995
2015-08-18T00:24:00Z 49.78399999999999
2015-08-18T00:30:00Z 57.28399999999999
2015-08-18T00:30:00Z 59.334999999999994
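A running total is easy to reproduce outside the database. The sketch below uses itertools.accumulate from the Python standard library on the first few water_level values, in the order in which they appear in the output above (coyote_creek first, then santa_monica, for each timestamp).

```python
from itertools import accumulate

# water_level values in output order for 00:00 and 00:06.
WATER_LEVEL = [8.12, 2.064, 8.005, 2.116]

# accumulate yields v1, v1+v2, v1+v2+v3, ...
running_total = list(accumulate(WATER_LEVEL))
for value, total in zip(WATER_LEVEL, running_total):
    print(value, round(total, 3))
```

The long decimal tails in the InfluxDB output, such as 47.742999999999995, are ordinary floating-point rounding artifacts of the same additions.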

DERIVATIVE(field_key, unit) and NON_NEGATIVE_DERIVATIVE(field_key, unit)

Returns the rate of change of the field values. The unit defaults to 1s, which gives the rate of change per second.

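For example, the first value below is computed as (2.116-2.064)/(6*60) = 0.00014..., and the others are computed the same way. Although the raw data is collected every 6m, the rate of change is computed per second by default. To compute it per 6m, set the unit to 6m.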

> SELECT DERIVATIVE("water_level") FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z'
name: h2o_feet
time                 derivative
----                 ----------
2015-08-18T00:06:00Z 0.00014444444444444457
2015-08-18T00:12:00Z -0.00024444444444444465
2015-08-18T00:18:00Z 0.0002722222222222218
2015-08-18T00:24:00Z -0.000236111111111111
2015-08-18T00:30:00Z 0.00002777777777777842

> SELECT DERIVATIVE("water_level", 6m) FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z'
name: h2o_feet
time                 derivative
----                 ----------
2015-08-18T00:06:00Z 0.052000000000000046
2015-08-18T00:12:00Z -0.08800000000000008
2015-08-18T00:18:00Z 0.09799999999999986
2015-08-18T00:24:00Z -0.08499999999999996
2015-08-18T00:30:00Z 0.010000000000000231
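The two outputs differ only in the unit. As a minimal Python sketch of the arithmetic (the function name derivative is illustrative, and timestamps are seconds after 2015-08-18T00:00:00Z), each result is the difference between consecutive values divided by the elapsed seconds and multiplied by the unit:

```python
def derivative(points, unit_s=1):
    """Rate of change between consecutive (seconds, value) points, per unit_s seconds."""
    return [
        (t1, (v1 - v0) / (t1 - t0) * unit_s)
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]


SANTA_MONICA = [
    (0, 2.064), (360, 2.116), (720, 2.028),
    (1080, 2.126), (1440, 2.041), (1800, 2.051),
]

per_second = derivative(SANTA_MONICA)            # default unit of 1s
per_six_minutes = derivative(SANTA_MONICA, 360)  # unit of 6m

for (t, a), (_, b) in zip(per_second, per_six_minutes):
    print(t, a, round(b, 3))
```

When the unit equals the sampling interval, as with 6m here, the result is simply the difference between neighbouring values.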

Combining DERIVATIVE with GROUP BY time and mean allows more complex queries, as shown below:

> SELECT DERIVATIVE(mean("water_level"), 6m) FROM "h2o_feet" WHERE time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z' group by time(12m), *
name: h2o_feet
tags: location=coyote_creek
time                 derivative
----                 ----------
2015-08-18T00:12:00Z -0.11900000000000022
2015-08-18T00:24:00Z -0.12849999999999984

name: h2o_feet
tags: location=santa_monica
time                 derivative
----                 ----------
2015-08-18T00:12:00Z -0.00649999999999995
2015-08-18T00:24:00Z -0.015499999999999847

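This computation first takes the mean per GROUP BY time interval and then computes the rate of change of those means. Because the data is grouped into 12-minute intervals and the unit of the rate of change is 6 minutes, the difference is divided by 2 (12/6) to get the rate of change. For example, the first value is (7.8245-8.0625)/2 = -0.1190. The means themselves are: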

> SELECT mean("water_level") FROM "h2o_feet" WHERE time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z' group by time(12m), *
name: h2o_feet
tags: location=coyote_creek
time                 mean
----                 ----
2015-08-18T00:00:00Z 8.0625
2015-08-18T00:12:00Z 7.8245
2015-08-18T00:24:00Z 7.5675

name: h2o_feet
tags: location=santa_monica
time                 mean
----                 ----
2015-08-18T00:00:00Z 2.09
2015-08-18T00:12:00Z 2.077
2015-08-18T00:24:00Z 2.0460000000000003

NON_NEGATIVE_DERIVATIVE differs from DERIVATIVE in that it returns only non-negative rates of change:

> SELECT DERIVATIVE(mean("water_level"), 6m) FROM "h2o_feet" WHERE location='santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z' group by time(6m), *
name: h2o_feet
tags: location=santa_monica
time                 derivative
----                 ----------
2015-08-18T00:06:00Z 0.052000000000000046
2015-08-18T00:12:00Z -0.08800000000000008
2015-08-18T00:18:00Z 0.09799999999999986
2015-08-18T00:24:00Z -0.08499999999999996
2015-08-18T00:30:00Z 0.010000000000000231

> SELECT NON_NEGATIVE_DERIVATIVE(mean("water_level"), 6m) FROM "h2o_feet" WHERE location='santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z' group by time(6m), *
name: h2o_feet
tags: location=santa_monica
time                 non_negative_derivative
----                 -----------------------
2015-08-18T00:06:00Z 0.052000000000000046
2015-08-18T00:18:00Z 0.09799999999999986
2015-08-18T00:30:00Z 0.010000000000000231
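The filtering can be sketched in Python as follows. This is an illustration of the behaviour shown in the two outputs, not InfluxDB code: rates are computed between consecutive points and the negative ones are dropped. The function name non_negative_derivative and the seconds-based timestamps are choices made for this example.

```python
def non_negative_derivative(points, unit_s=1):
    """Like a derivative over (seconds, value) points, keeping only rates >= 0."""
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        rate = (v1 - v0) / (t1 - t0) * unit_s
        if rate >= 0:  # negative rates of change are discarded
            rates.append((t1, rate))
    return rates


# santa_monica means per 6-minute bucket (one raw value per bucket).
SANTA_MONICA = [
    (0, 2.064), (360, 2.116), (720, 2.028),
    (1080, 2.126), (1440, 2.041), (1800, 2.051),
]

for seconds, rate in non_negative_derivative(SANTA_MONICA, 360):
    print(seconds // 60, "min", round(rate, 3))
```

The rows for 00:12 and 00:24 disappear because the water level fell during those intervals.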

4 Continuous queries

4.1 Basic syntax

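A continuous query (CONTINUOUS QUERY, abbreviated CQ) is an InfluxQL query that runs automatically and periodically on real-time data; its results can be stored in a specified measurement. The basic syntax is: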

CREATE CONTINUOUS QUERY <cq_name> ON <database_name>
BEGIN
  <cq_query>
END

Format of cq_query:
SELECT <function[s]> INTO <destination_measurement> FROM <measurement> [WHERE <stuff>] GROUP BY time(<interval>)[,<tag_key[s]>]

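A CQ operates on real-time data. It uses the local server's timestamp, the GROUP BY time() interval, and InfluxDB's preset time boundaries to decide when to execute and what time range the query covers. Note that the WHERE clause of a CQ statement contains no time range, because the CQ determines the time range automatically from GROUP BY time().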

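A CQ executes at the same interval as its GROUP BY time() interval, at the start of each of InfluxDB's preset time boundaries. With GROUP BY time(1h), a single run covers the range from now() minus the GROUP BY time() interval (1h) up to now(). In other words, if the current time is 17:00, this run covers 16:00 to 16:59.99999.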
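The window of a basic CQ run can be sketched in Python. The function below assumes, as the text describes, that boundaries are multiples of the interval counted from the Unix epoch and that a run covers the half-open range [start, end) ending at the most recent boundary. The name cq_window is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def cq_window(now, interval):
    """Return (start, end) of the range [start, end) queried by a run at `now`."""
    # Truncate `now` down to the latest preset boundary.
    end = now - (now - EPOCH) % interval
    return end - interval, end


now = datetime(2016, 8, 28, 17, 0, tzinfo=timezone.utc)
start, end = cq_window(now, timedelta(hours=1))
print(start.isoformat(), end.isoformat())
```

Because the end of the range is exclusive, the last instant covered is just before 17:00, which is what 16:59.99999 expresses.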

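Let's look at a few examples. The sample data is the measurement bus_data in the database transportation, which records the number of passengers and the number of complaints every 15 minutes. The data file bus_data.txt is as follows: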

# DDL
CREATE DATABASE transportation

# DML
# CONTEXT-DATABASE: transportation 

bus_data complaints=9,passengers=5 1472367600
bus_data complaints=9,passengers=8 1472368500
bus_data complaints=9,passengers=8 1472369400
bus_data complaints=9,passengers=7 1472370300
bus_data complaints=9,passengers=8 1472371200
bus_data complaints=7,passengers=15 1472372100
bus_data complaints=7,passengers=15 1472373000
bus_data complaints=7,passengers=17 1472373900
bus_data complaints=7,passengers=20 1472374800

Import the data with the following commands:

root@f216e9be15bf:/# influx -import -path=bus_data.txt -precision=s
root@f216e9be15bf:/# influx -precision=rfc3339 -database=transportation
Connected to http://localhost:8086 version 1.3.5
InfluxDB shell version: 1.3.5
> select * from bus_data
name: bus_data
time                 complaints passengers
----                 ---------- ----------
2016-08-28T07:00:00Z 9          5
2016-08-28T07:15:00Z 9          8
2016-08-28T07:30:00Z 9          8
2016-08-28T07:45:00Z 9          7
2016-08-28T08:00:00Z 9          8
2016-08-28T08:15:00Z 7          15
2016-08-28T08:30:00Z 7          15
2016-08-28T08:45:00Z 7          17
2016-08-28T09:00:00Z 7          20
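The data file stores timestamps as epoch seconds (hence -precision=s on import), while the CLI prints them as RFC3339 because of -precision=rfc3339. A small Python sketch of that conversion, with an illustrative helper name, shows how the two forms line up:

```python
from datetime import datetime, timezone


def to_rfc3339(epoch_seconds):
    """Format epoch seconds as an RFC3339 UTC timestamp."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# First and last timestamps of bus_data.txt; the rows are 900 seconds apart.
for epoch_seconds in (1472367600, 1472368500, 1472374800):
    print(epoch_seconds, to_rfc3339(epoch_seconds))
```

A step of 900 seconds between consecutive lines corresponds to the 15-minute sampling interval of bus_data.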

Example 1: Automatically downsample data into a new measurement

Automatically downsample a single field and store the results in a new measurement.

CREATE CONTINUOUS QUERY "cq_basic" ON "transportation"
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h)
END

This CQ automatically computes the hourly average number of passengers from bus_data and stores it in average_passengers. On the morning of 2016-08-28 it runs as follows:

At 8:00 cq_basic executes a query with the time range time >= '7:00' AND time < '8:00'.
cq_basic writes one point to average_passengers:
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   7
At 9:00 cq_basic executes a query with the time range time >= '8:00' AND time < '9:00'.
cq_basic writes one point to average_passengers:
name: average_passengers
------------------------
time                   mean
2016-08-28T08:00:00Z   13.75

# Results
> SELECT * FROM "average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   7
2016-08-28T08:00:00Z   13.75
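The two runs of cq_basic can be replayed in a few lines of Python. This is only a simulation of the arithmetic under simplifying assumptions: times are written as minutes after midnight (07:00 is 420), and the names PASSENGERS and downsample are invented for the example.

```python
# bus_data passengers keyed by minutes after midnight (07:00 -> 420).
PASSENGERS = {420: 5, 435: 8, 450: 8, 465: 7, 480: 8,
              495: 15, 510: 15, 525: 17, 540: 20}


def downsample(points, start, end, interval):
    """Mean per interval bucket over start <= t < end; empty buckets are skipped."""
    result = {}
    for bucket in range(start, end, interval):
        values = [v for t, v in points.items() if bucket <= t < bucket + interval]
        if values:
            result[bucket] = sum(values) / len(values)
    return result


average_passengers = {}
for run_at in (480, 540):  # the CQ fires at 08:00 and 09:00
    # Each run covers the hour that has just ended.
    average_passengers.update(downsample(PASSENGERS, run_at - 60, run_at, 60))

print(average_passengers)  # {420: 7.0, 480: 13.75}
```

The 09:00 reading (20 passengers) is not included yet because the 09:00 to 10:00 hour has not finished at the time of the second run.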

Example 2: Automatically downsample data into a new retention policy

CREATE CONTINUOUS QUERY "cq_basic_rp" ON "transportation"
BEGIN
  SELECT mean("passengers") INTO "transportation"."three_weeks"."average_passengers" FROM "bus_data" GROUP BY time(1h)
END

This is similar to Example 1, except that the retention policy is no longer autogen but three_weeks (created with CREATE RETENTION POLICY "three_weeks" ON "transportation" DURATION 3w REPLICATION 1).

> SELECT * FROM "transportation"."three_weeks"."average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   7
2016-08-28T08:00:00Z   13.75

Example 3: Automatically downsample data into a new database using backreferencing

CREATE CONTINUOUS QUERY "cq_basic_br" ON "transportation"
BEGIN
  SELECT mean(*) INTO "downsampled_transportation"."autogen".:MEASUREMENT FROM /.*/ GROUP BY time(30m),*
END

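This uses the backreferencing syntax to automatically downsample data and store it in a new database. The :MEASUREMENT syntax refers to each measurement matched in the FROM clause, and /.*/ queries every measurement in turn. The CQ means: every 30 minutes, automatically query all measurements in transportation (here there is only bus_data), average the numeric fields (passengers and complaints) over the 30 minutes, and store the results in the new database downsampled_transportation.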

The final results:

> SELECT * FROM "downsampled_transportation"."autogen"."bus_data"
name: bus_data
--------------
time                   mean_complaints   mean_passengers
2016-08-28T07:00:00Z   9                 6.5
2016-08-28T07:30:00Z   9                 7.5
2016-08-28T08:00:00Z   8                 11.5
2016-08-28T08:30:00Z   7                 16

Example 4: Automatically downsample data and configure the CQ time boundaries

CREATE CONTINUOUS QUERY "cq_basic_offset" ON "transportation"
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h,15m)
END

Unlike the previous examples, GROUP BY time(1h, 15m) specifies a time offset, so cq_basic_offset no longer executes on the hour but 15 minutes past it. It runs as follows:

At 8:15 cq_basic_offset executes a query with the time range time >= '7:15' AND time < '8:15'.
name: average_passengers
------------------------
time                   mean
2016-08-28T07:15:00Z   7.75
At 9:15 cq_basic_offset executes a query with the time range time >= '8:15' AND time < '9:15'.
name: average_passengers
------------------------
time                   mean
2016-08-28T08:15:00Z   16.75

The final results:

> SELECT * FROM "average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T07:15:00Z   7.75
2016-08-28T08:15:00Z   16.75
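The effect of the offset on bucket boundaries can be sketched in Python. The example assumes times in minutes after midnight and that an offset simply shifts every boundary forward by that amount; bucket_start and means_by_bucket are names made up for the illustration.

```python
PASSENGERS = {420: 5, 435: 8, 450: 8, 465: 7, 480: 8,
              495: 15, 510: 15, 525: 17, 540: 20}


def bucket_start(t, interval, offset=0):
    """Start of the bucket containing t when boundaries are shifted by offset."""
    return t - (t - offset) % interval


def means_by_bucket(points, interval, offset=0):
    """Group values by shifted bucket and average each group."""
    groups = {}
    for t, value in points.items():
        groups.setdefault(bucket_start(t, interval, offset), []).append(value)
    return {b: sum(v) / len(v) for b, v in sorted(groups.items())}


means = means_by_bucket(PASSENGERS, 60, 15)
print(means[435], means[495])  # buckets starting at 07:15 and 08:15
```

The 07:00 reading lands in the bucket that starts at 06:15, which the CQ in this walkthrough never queries, so it does not appear in average_passengers.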

4.2 Advanced syntax

The advanced syntax for InfluxDB continuous queries is:

CREATE CONTINUOUS QUERY <cq_name> ON <database_name>
RESAMPLE EVERY <interval> FOR <interval>
BEGIN
  <cq_query>
END

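Compared with the basic syntax, it adds the RESAMPLE keyword. With the advanced syntax, when the CQ executes and what time range it queries depend on the two intervals in RESAMPLE.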

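With the advanced syntax, the CQ executes at the EVERY interval, and the time range of each run is determined by the FOR interval. If the FOR interval is 2h and the current time is 17:00, the query covers 15:00 to 16:59.999999. Either of the two RESAMPLE keywords, EVERY and FOR, may be used on its own.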

The sample data is shown below; it has a few more records than before, which are needed for Examples 3 and 4:

name: bus_data
--------------
time                   passengers
2016-08-28T06:30:00Z   2
2016-08-28T06:45:00Z   4
2016-08-28T07:00:00Z   5
2016-08-28T07:15:00Z   8
2016-08-28T07:30:00Z   8
2016-08-28T07:45:00Z   7
2016-08-28T08:00:00Z   8
2016-08-28T08:15:00Z   15
2016-08-28T08:30:00Z   15
2016-08-28T08:45:00Z   17
2016-08-28T09:00:00Z   20

Example 1: Configure only the execution interval

CREATE CONTINUOUS QUERY "cq_advanced_every" ON "transportation"
RESAMPLE EVERY 30m
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h)
END

Here the CQ is configured to execute every 30 minutes. No FOR interval is specified, so the query range is still the one hour given by GROUP BY time(1h). It runs as follows:

At 8:00, cq_advanced_every executes a query with the time range time >= '7:00' AND time < '8:00'.
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   7
At 8:30, cq_advanced_every executes a query with the time range time >= '8:00' AND time < '9:00'.
name: average_passengers
------------------------
time                   mean
2016-08-28T08:00:00Z   12.6667
At 9:00, cq_advanced_every executes a query with the time range time >= '8:00' AND time < '9:00'.
name: average_passengers
------------------------
time                   mean
2016-08-28T08:00:00Z   13.75

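Note that the 8:00 to 9:00 interval is queried twice. The first run is at 8:30, when the mean is (8+15+15) / 3 = 12.6667; the second run is at 9:00, when the mean is (8+15+15+17) / 4 = 13.75, and the second result overwrites the first. For how InfluxDB handles duplicate points, see the official documentation.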

The final results:

> SELECT * FROM "average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   7
2016-08-28T08:00:00Z   13.75
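The overwrite behaviour can be replayed with a dictionary keyed by timestamp. The Python sketch below is a simulation under stated assumptions: times are minutes after midnight, the run windows are copied from the walkthrough above rather than derived, and a point is treated as visible to a run once its timestamp is not later than the run time. All names are illustrative.

```python
PASSENGERS = {420: 5, 435: 8, 450: 8, 465: 7, 480: 8,
              495: 15, 510: 15, 525: 17, 540: 20}

# (run time, window start, window end), copied from the walkthrough.
RUNS = [(480, 420, 480), (510, 480, 540), (540, 480, 540)]


def run_cq(points, now, start, end):
    """Mean of the points in [start, end) that have arrived by `now`."""
    values = [v for t, v in points.items() if start <= t < end and t <= now]
    return start, sum(values) / len(values)


average_passengers = {}
history = []
for now, start, end in RUNS:
    bucket, mean = run_cq(PASSENGERS, now, start, end)
    # Same measurement and timestamp: the newer write replaces the older one.
    average_passengers[bucket] = mean
    history.append((now, bucket, round(mean, 4)))

print(history)
print(average_passengers)
```

The history shows the intermediate 12.6667 written at 8:30, while the final dictionary holds only the 13.75 written at 9:00.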

Example 2: Configure only the query time range

CREATE CONTINUOUS QUERY "cq_advanced_for" ON "transportation"
RESAMPLE FOR 1h
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(30m)
END

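Only the time range is configured here, with no EVERY interval. The CQ therefore executes every 30 minutes, the same as GROUP BY time(30m), while each run queries a 1-hour range. Because the data is grouped by 30 minutes, every run writes two points. It runs as follows: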

At 8:00 cq_advanced_for executes a query with the time range time >= '7:00' AND time < '8:00'.
It writes two points.
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   6.5
2016-08-28T07:30:00Z   7.5
At 8:30 cq_advanced_for executes a query with the time range time >= '7:30' AND time < '8:30'.
It writes two points.
name: average_passengers
------------------------
time                   mean
2016-08-28T07:30:00Z   7.5
2016-08-28T08:00:00Z   11.5
At 9:00 cq_advanced_for executes a query with the time range time >= '8:00' AND time < '9:00'.
It writes two points.
name: average_passengers
------------------------
time                   mean
2016-08-28T08:00:00Z   11.5
2016-08-28T08:30:00Z   16

Note that cq_advanced_for writes two points on every run, and duplicate points are overwritten.

The final results:

> SELECT * FROM "average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   6.5
2016-08-28T07:30:00Z   7.5
2016-08-28T08:00:00Z   11.5
2016-08-28T08:30:00Z   16

Example 3: Configure both the execution interval and the query time range

CREATE CONTINUOUS QUERY "cq_advanced_every_for" ON "transportation"
RESAMPLE EVERY 1h FOR 90m
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(30m)
END

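Here the execution interval is 1 hour, the query range is 90 minutes, and the data is grouped by 30 minutes, so each run writes three points. It runs as follows: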

At 8:00 cq_advanced_every_for executes a query with the time range time >= '6:30' AND time < '8:00'.
It writes three points.
name: average_passengers
------------------------
time                   mean
2016-08-28T06:30:00Z   3
2016-08-28T07:00:00Z   6.5
2016-08-28T07:30:00Z   7.5
At 9:00 cq_advanced_every_for executes a query with the time range time >= '7:30' AND time < '9:00'.
It writes three points.
name: average_passengers
------------------------
time                   mean
2016-08-28T07:30:00Z   7.5
2016-08-28T08:00:00Z   11.5
2016-08-28T08:30:00Z   16

The final results:

> SELECT * FROM "average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T06:30:00Z   3
2016-08-28T07:00:00Z   6.5
2016-08-28T07:30:00Z   7.5
2016-08-28T08:00:00Z   11.5
2016-08-28T08:30:00Z   16

Example 4: Configure the query time range and FILL

CREATE CONTINUOUS QUERY "cq_advanced_for_fill" ON "transportation"
RESAMPLE FOR 2h
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h) fill(1000)
END

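This builds on the previous example that configures only the query time range and adds FILL to fill in empty intervals. It runs as follows: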

At 6:00, cq_advanced_for_fill executes a query with the time range time >= '4:00' AND time < '6:00'. There is no data, so nothing is filled.

At 7:00, cq_advanced_for_fill executes a query with the time range time >= '5:00' AND time < '7:00'. It writes two points, and the interval with no data is filled with 1000.
------------------------
time                   mean
2016-08-28T05:00:00Z   1000          <------ fill(1000)
2016-08-28T06:00:00Z   3             <------ average of 2 and 4

[…] At 11:00, cq_advanced_for_fill executes a query with the time range time >= '9:00' AND time < '11:00'. It writes two points, and the interval with no data is filled with 1000.
name: average_passengers
------------------------
2016-08-28T09:00:00Z   20            <------ average of 20
2016-08-28T10:00:00Z   1000          <------ fill(1000)     

At 12:00, cq_advanced_for_fill executes a query with the time range time >= '10:00' AND time < '12:00'. There is no data, so nothing is filled.

The final results:

> SELECT * FROM "average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T05:00:00Z   1000
2016-08-28T06:00:00Z   3
2016-08-28T07:00:00Z   7
2016-08-28T08:00:00Z   13.75
2016-08-28T09:00:00Z   20
2016-08-28T10:00:00Z   1000
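The fill behaviour in this walkthrough follows one rule: a run writes nothing when its whole window is empty, and otherwise fills the empty buckets inside the window. The Python sketch below simulates that rule for hourly runs from 6:00 to 12:00. Times are minutes after midnight, and the names PASSENGERS and run_fill_cq are invented for the example.

```python
# Extended bus_data passengers keyed by minutes after midnight (06:30 -> 390).
PASSENGERS = {390: 2, 405: 4, 420: 5, 435: 8, 450: 8, 465: 7,
              480: 8, 495: 15, 510: 15, 525: 17, 540: 20}


def run_fill_cq(points, run_at, for_range=120, interval=60, fill=1000):
    """One run: bucket means over [run_at - for_range, run_at), with fill."""
    means = {}
    for bucket in range(run_at - for_range, run_at, interval):
        values = [v for t, v in points.items() if bucket <= t < bucket + interval]
        means[bucket] = sum(values) / len(values) if values else None
    if all(m is None for m in means.values()):
        return {}  # no data anywhere in the window: nothing is written
    return {b: fill if m is None else m for b, m in means.items()}


average_passengers = {}
for run_at in range(360, 721, 60):  # 06:00, 07:00, ..., 12:00
    average_passengers.update(run_fill_cq(PASSENGERS, run_at))

for bucket, mean in sorted(average_passengers.items()):
    print(f"{bucket // 60:02d}:00", mean)
```

Only the 05:00 and 10:00 buckets receive the fill value, because each of them shares a window with a bucket that does contain data.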
