OpenTSDB-Writing Data

Writing Data

You may want to jump right in and start throwing data into your TSD, but to really take advantage of OpenTSDB's power and flexibility, you may want to pause and think about your naming schema. After you've done that, you can proceed to pushing data over the Telnet or HTTP APIs, or use an existing tool with OpenTSDB support such as 'tcollector'.

You may be tempted to jump right in and start throwing data into your TSD, but to really take advantage of OpenTSDB's power and flexibility, you should pause and think through your naming schema first.

Once you have done that, you can push data over the Telnet or HTTP APIs, or use an existing tool with OpenTSDB support such as tcollector.

 

Naming Schema

Many metrics administrators are used to supplying a single name for their time series. For example, systems administrators used to RRD-style systems may name their time series webserver01.sys.cpu.0.user. The name tells us that the time series is recording the amount of time in user space for cpu 0 on webserver01. This works great if you want to retrieve just the user time for that cpu core on that particular web server later on.

Most systems give each time series a single name. For example, system administrators used to RRD-style naming might call a series webserver01.sys.cpu.0.user, which tells us it records the user-space time of cpu 0 on webserver01.

That works well if all you ever want is the user time of that one cpu core on that one particular web server.

 

But what if the web server has 64 cores and you want to get the average time across all of them? Some systems allow you to specify a wild card such as webserver01.sys.cpu.*.user that would read all 64 files and aggregate the results. Alternatively, you could record a new time series called webserver01.sys.cpu.user.all that represents the same aggregate but you must now write '64 + 1' different time series. What if you had a thousand web servers and you wanted the average cpu time for all of your servers? You could craft a wild card query like *.sys.cpu.*.user and the system would open all 64,000 files, aggregate the results and return the data. Or you could set up a process to pre-aggregate the data and write it to webservers.sys.cpu.user.all.

But what if the web server has 64 cores and you want the average across all of them? Some systems let you use a wildcard such as webserver01.sys.cpu.*.user, reading all 64 files and aggregating the results.

Alternatively, you could record a new time series named webserver01.sys.cpu.user.all to represent the same aggregate, but you now have to write 64 + 1 different time series.

And with 1,000 web servers, getting the average cpu time across all of them means either a wildcard query like *.sys.cpu.*.user that opens 64,000 files and aggregates the results, or a process that pre-aggregates the data into a new series such as webservers.sys.cpu.user.all.

 

OpenTSDB handles things a bit differently by introducing the idea of 'tags'. Each time series still has a 'metric' name, but it's much more generic, something that can be shared by many unique time series. Instead, the uniqueness comes from a combination of tag key/value pairs that allows for flexible queries with very fast aggregations.

OpenTSDB handles this differently by introducing the idea of tags. Each time series still has a metric name, but that name is much more generic and can be shared by many different time series.

Uniqueness instead comes from a combination of tag key/value pairs, which allows flexible queries with very fast aggregation.

 

Note

Every time series in OpenTSDB must have at least one tag.

Every time series in OpenTSDB must have at least one tag.

 

Take the previous example where the metric was webserver01.sys.cpu.0.user. In OpenTSDB, this may become sys.cpu.user host=webserver01, cpu=0. Now if we want the data for an individual core, we can craft a query like sum:sys.cpu.user{host=webserver01,cpu=42}. If we want all of the cores, we simply drop the cpu tag and ask for sum:sys.cpu.user{host=webserver01}. This will give us the aggregated results for all 64 cores. If we want the results for all 1,000 servers, we simply request sum:sys.cpu.user. The underlying data schema will store all of the sys.cpu.user time series next to each other so that aggregating the individual values is very fast and efficient. OpenTSDB was designed to make these aggregate queries as fast as possible since most users start out at a high level, then drill down for detailed information.

Back to the metric from the earlier example, webserver01.sys.cpu.0.user. In OpenTSDB it becomes sys.cpu.user host=webserver01,cpu=0.

To get a single core, query sum:sys.cpu.user{host=webserver01,cpu=42}.

To get all cores, drop the cpu tag and query sum:sys.cpu.user{host=webserver01}, which returns the aggregate over all 64 cores.

To get all 1,000 servers, simply query sum:sys.cpu.user.

The underlying schema stores all sys.cpu.user time series next to each other, so aggregating the individual values is fast and efficient.

OpenTSDB was designed to make such aggregate queries as fast as possible, because most users start at a high level and then drill down for details.
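
To make the three levels of drill-down concrete, the queries described above, side by side:

sum:sys.cpu.user{host=webserver01,cpu=42}    (a single core on a single host)
sum:sys.cpu.user{host=webserver01}           (all 64 cores on one host, aggregated)
sum:sys.cpu.user                             (every core on every host, aggregated)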

 

Aggregations

While the tagging system is flexible, some problems can arise if you don't understand how the querying side of OpenTSDB works, hence the need for some forethought. Take the example query above: sum:sys.cpu.user{host=webserver01}. We recorded 64 unique time series for webserver01, one time series for each of the CPU cores. When we issued that query, all of the time series for metric sys.cpu.user with the tag host=webserver01 were retrieved, summed, and returned as one series of numbers. Let's say the resulting sum was 50 for timestamp 1356998400. Now we were migrating from another system to OpenTSDB and had a process that pre-aggregated all 64 cores so that we could quickly get the average value and simply wrote a new time series sys.cpu.user host=webserver01. If we run the same query, we'll get a value of 100 at 1356998400. What happened? OpenTSDB aggregated all 64 time series and the pre-aggregated time series to get to that 100. In storage, we would have something like this:

Although the tagging system is flexible, problems can arise if you don't understand how OpenTSDB's query side works, so some forethought is needed.

Take the query above as an example: sum:sys.cpu.user{host=webserver01}.

webserver01 has 64 distinct time series recorded for it, one per core. When the query runs, every sys.cpu.user series carrying the tag host=webserver01 is retrieved, aggregated, and returned as a single series of numbers.

Say the result is 50 at timestamp 1356998400. Now suppose that, while migrating from another system to OpenTSDB, a process pre-aggregated all 64 cores and wrote the result into a new series sys.cpu.user host=webserver01. Running the same query now returns 100 at 1356998400. What happened? OpenTSDB aggregated the 64 per-core series together with the pre-aggregated series, which yields the 100.

In storage, the data looks like this:

sys.cpu.user host=webserver01        1356998400  50
sys.cpu.user host=webserver01,cpu=0  1356998400  1
sys.cpu.user host=webserver01,cpu=1  1356998400  0
sys.cpu.user host=webserver01,cpu=2  1356998400  2
sys.cpu.user host=webserver01,cpu=3  1356998400  0
...
sys.cpu.user host=webserver01,cpu=63 1356998400  1

OpenTSDB will automatically aggregate all of the time series for the metric in a query if no tags are given. If one or more tags are defined, the aggregate will 'include all' time series that match on that tag, regardless of other tags. With the query sum:sys.cpu.user{host=webserver01}, we would include sys.cpu.user host=webserver01,cpu=0 as well as sys.cpu.user host=webserver01,cpu=0,manufacturer=Intel, sys.cpu.user host=webserver01,foo=bar and sys.cpu.user host=webserver01,cpu=0,datacenter=lax,department=ops. The moral of this example is: be careful with your naming schema.

If no tags are given in a query, OpenTSDB automatically aggregates every time series for the metric. If one or more tags are defined, the aggregate includes every series that matches those tags, regardless of any other tags the series carry.

For example, the query sum:sys.cpu.user{host=webserver01} would include:

sys.cpu.user host=webserver01,cpu=0

sys.cpu.user host=webserver01,cpu=0,manufacturer=Intel

sys.cpu.user host=webserver01,foo=bar

sys.cpu.user host=webserver01,cpu=0,datacenter=lax,department=ops

The moral of this example is: be careful with your naming schema.

 

Time Series Cardinality

A critical aspect of any naming schema is to consider the cardinality of your time series. Cardinality is defined as the number of unique items in a set. In OpenTSDB's case, this means the number of items associated with a metric, i.e. all of the possible tag name and value combinations, as well as the number of unique metric names, tag names and tag values. Cardinality is important for two reasons outlined below.

Any naming schema needs to take the cardinality of your time series into account. Cardinality is the number of unique items in a set.

In OpenTSDB's case, that means the number of items associated with a metric, i.e. all possible tag name and value combinations, as well as the number of unique metric names, tag names and tag values.

Cardinality matters for the two reasons below.

(1) Limited Unique IDs (UIDs)

There is a limited number of unique IDs to assign for each metric, tag name and tag value. By default there are just over 16 million possible IDs per type. If, for example, you ran a very popular web service and tried to track the IP address of clients as a tag, e.g. web.app.hits clientip=38.26.34.10, you may quickly run into the UID assignment limit as there are over 4 billion possible IP version 4 addresses. Additionally, this approach would lead to creating a very sparse time series as the user at address 38.26.34.10 may only use your app sporadically, or perhaps never again from that specific address.

 

Each metric, tag name and tag value is assigned a unique ID before its data can be stored, and by default there are just over 16 million possible IDs per type.

For example, if you run a popular web service and record the client IP address as a tag, e.g. web.app.hits clientip=38.26.34.10, you can quickly hit the UID assignment limit, since there are over 4 billion possible IPv4 addresses. Such a schema also produces very sparse time series, because the user at a given address may only use your app sporadically.
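
Where the 16 million figure comes from: UIDs are 3 bytes wide by default (see the UID discussion further down), so each type gets

3 bytes = 24 bits  →  2^24 = 16,777,216 possible unique IDs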

 

The UID limit is usually not an issue, however. A tag value is assigned a UID that is completely disassociated from its tag name. If you use numeric identifiers for tag values, the number is assigned a UID once and can be used with many tag names. For example, if we assign a UID to the number 2, we could store timeseries with the tag pairs cpu=2, interface=2, hdd=2 and fan=2 while consuming only 1 tag value UID (1) and 4 tag name UIDs (cpu, interface, hdd and fan).

The UID limit is usually not a problem, though. A tag value gets a UID that is completely independent of its tag name, so a numeric identifier is assigned a UID once and can be reused with many tag names.

For example, with cpu=2, interface=2, hdd=2 and fan=2 you consume only 1 tag value UID (for "2") and 4 tag name UIDs (cpu, interface, hdd and fan).
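
As an illustration, the UID tables for the example above might end up looking like the sketch below. The specific UID values are made up; actual assignments depend on the order in which the TSD first sees each string.

tag names (tagk):   cpu → 000001, interface → 000002, hdd → 000003, fan → 000004
tag values (tagv):  2 → 000001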

If you think that the UID limit may impact you, first think about the queries that you want to execute. If we look at the web.app.hits example above, you probably only care about the total number of hits to your service and rarely need to drill down to a specific IP address. In that case, you may want to store the IP address as an annotation. That way you could still benefit from low cardinality but if you need to, you could search the results for that particular IP using external scripts. (Note: Support for annotation queries is expected in a future version of OpenTSDB.)

If you think the UID limit may affect you, first consider the queries you want to run. In the web.app.hits example, you probably only care about the total number of hits to your service and rarely need to drill down to a specific IP.

In that case you could store the IP address as an annotation: you keep the low cardinality, yet can still search for a particular IP with external scripts when needed.

(Support for annotation queries is expected in a future version of OpenTSDB.)

If you desperately need more than 16 million values, you can increase the number of bytes that OpenTSDB uses to encode UIDs from 3 bytes up to a maximum of 8 bytes. This change would require modifying the value in source code, recompiling, deploying your customized code to all TSDs which will access this data, and maintaining this customization across all future patches and releases.

If you really need to store more than 16 million values, you can increase the number of bytes OpenTSDB uses to encode UIDs from 3 up to a maximum of 8. That requires modifying the source code, recompiling, redeploying the customized build to every TSD that accesses the data, and maintaining the customization across future releases.

 

(2) Query Speed

Cardinality also affects query speed a great deal, so consider the queries you will be performing frequently and optimize your naming schema for those. OpenTSDB creates a new row per time series per hour. If we have the time series sys.cpu.user host=webserver01,cpu=0 with data written every second for 1 day, that would result in 24 rows of data. However if we have 8 possible CPU cores for that host, now we have 192 rows of data. This looks good because we can easily get a sum or average of CPU usage across all cores by issuing a query like start=1d-ago&m=avg:sys.cpu.user{host=webserver01}.

Cardinality also has a big effect on query speed, so think about the queries you will run most often and optimize the naming schema for them. OpenTSDB creates a new row per time series per hour.

If the series sys.cpu.user host=webserver01,cpu=0 receives a value every second for one day, that is 24 rows of data.

With 8 CPU cores on the host it becomes 192 rows per day, which is still fine: a query such as start=1d-ago&m=avg:sys.cpu.user{host=webserver01} easily returns the sum or average across all cores.

 

However what if we have 20,000 hosts, each with 8 cores? Now we will have 3.8 million rows per day due to a high cardinality of host values. Queries for the average core usage on host webserver01 will be slower as it must pick out 192 rows out of 3.8 million.

But with 20,000 hosts of 8 cores each, the high cardinality of the host values means 3.8 million rows per day.

A query for the average core usage on webserver01 is now slower, because it has to pick the 192 relevant rows out of 3.8 million.
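
The row counts above are simple multiplication:

1 host       × 8 cores × 24 rows/day = 192 rows/day
20,000 hosts × 8 cores × 24 rows/day = 3,840,000 rows/day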

 

The benefits of this schema are that you have very deep granularity in your data, e.g., storing usage metrics on a per-core basis. You can also easily craft a query to get the average usage across all cores and all hosts: start=1d-ago&m=avg:sys.cpu.user. However queries against that particular metric will take longer as there are more rows to sift through.

The benefit of this schema is very deep granularity, e.g. per-core usage data, and you can still query the average across all cores and all hosts with start=1d-ago&m=avg:sys.cpu.user. The trade-off is that queries against that metric take longer, since there are more rows to sift through.

 

Here are some common means of dealing with cardinality:

Pre-Aggregate - In the example above with sys.cpu.user, you generally care about the average usage on the host, not the usage per core. While the data collector may send a separate value per core with the tagging schema above, the collector could also send one extra data point such as sys.cpu.user.avg host=webserver01. Now you have a completely separate timeseries that would only have 24 rows per day and with 20K hosts, only 480K rows to sift through. Queries will be much more responsive for the per-host average and you still have per-core data to drill down to separately.

Shift to Metric - What if you really only care about the metrics for a particular host and don't need to aggregate across hosts? In that case you can shift the hostname to the metric. Our previous example becomes sys.cpu.user.websvr01 cpu=0. Queries against this schema are very fast as there would only be 192 rows per day for the metric. However to aggregate across hosts you would have to execute multiple queries and aggregate outside of OpenTSDB. (Future work will include this capability).

Pre-Aggregate

In the sys.cpu.user example you usually care about the average usage of the host, not of each individual core. While the collector can keep sending one value per core using the tagging schema above, it can also send one extra data point such as sys.cpu.user.avg host=webserver01.

That gives you a completely separate time series with only 24 rows per day, or 480K rows for 20K hosts. Queries for the per-host average become much more responsive, and the per-core data is still there to drill into separately.

Shift to Metric

If you really only care about the metrics of one particular host and never need to aggregate across hosts, you can move the hostname into the metric name.

The earlier example becomes sys.cpu.user.websvr01 cpu=0.

Queries against this schema are very fast, since there are only 192 rows per day for the metric, but aggregating across hosts then requires multiple queries plus aggregation outside of OpenTSDB. (Future work will include this capability.)
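
To put the two approaches side by side, here is what a collector might emit in the telnet put format described later (the numeric values are placeholders):

put sys.cpu.user 1356998400 42.5 host=webserver01 cpu=0        (per-core schema)
put sys.cpu.user.avg 1356998400 41.2 host=webserver01          (extra pre-aggregated point)
put sys.cpu.user.websvr01 1356998400 42.5 cpu=0                 (shift-to-metric schema)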

 

Naming Conclusion

When you design your naming schema, keep these suggestions in mind:

  • Be consistent with your naming to reduce duplication. Always use the same case for metrics, tag names and values.
  • Use the same number and type of tags for each metric. E.g. don't store my.metric host=foo and my.metric datacenter=lga.
  • Think about the most common queries you'll be executing and optimize your schema for those queries
  • Think about how you may want to drill down when querying
  • Don't use too many tags, keep it to a fairly small number, usually up to 4 or 5 tags (By default, OpenTSDB supports a maximum of 8 tags).
Naming summary - keep these suggestions in mind:
1. Be consistent with your naming to reduce duplication; always use the same case for metrics, tag names and values.
2. Use the same number and type of tags for each metric, e.g. don't store my.metric host=foo and my.metric datacenter=lga.
3. Think about the queries you will run most often and optimize your schema for them.
4. Think about how you may want to drill down when querying.
5. Don't use too many tags; keep it to a fairly small number, usually 4 or 5 (by default, OpenTSDB supports a maximum of 8 tags).
 

Data Specification

Every time series data point requires the following data:

  • metric - A generic name for the time series such as sys.cpu.user, stock.quote or env.probe.temp.
  • timestamp - A Unix/POSIX Epoch timestamp in seconds or milliseconds defined as the number of seconds that have elapsed since January 1st, 1970 at 00:00:00 UTC time. Only positive timestamps are supported at this time.
  • value - A numeric value to store at the given timestamp for the time series. This may be an integer or a floating point value.
  • tag(s) - A key/value pair consisting of a tagk (the key) and a tagv (the value). Each data point must have at least one tag.

Every time series data point consists of the following:

1. metric: the name of the time series, e.g. sys.cpu.user, stock.quote or env.probe.temp

2. timestamp: a Unix/POSIX Epoch timestamp in seconds or milliseconds

3. value: the value of the metric at that timestamp, either an integer or a floating point number

4. tags: key/value pairs made up of a tagk and a tagv; every data point needs at least one tag
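
As a concrete example, the four pieces combine into a single data point like this, shown in the JSON form accepted by the HTTP /api/put endpoint covered later (values are illustrative):

{
    "metric": "sys.cpu.user",
    "timestamp": 1356998400,
    "value": 42.5,
    "tags": { "host": "webserver01", "cpu": "0" }
}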

 

Timestamps

Data can be written to OpenTSDB with second or millisecond resolution. Timestamps must be integers and be no longer than 13 digits (See first [NOTE] below). Millisecond timestamps must be of the format 1364410924250 where the final three digits represent the milliseconds. Applications that generate timestamps with more than 13 digits (i.e., greater than millisecond resolution) must round them to a maximum of 13 digits before submitting or an error will be generated.

Timestamps with second resolution are stored on 2 bytes while millisecond resolution are stored on 4. Thus if you do not need millisecond resolution or all of your data points are on 1 second boundaries, we recommend that you submit timestamps with 10 digits for second resolution so that you can save on storage space. It's also a good idea to avoid mixing second and millisecond timestamps for a given time series. Doing so will slow down queries as iteration across mixed timestamps takes longer than if you only record one type or the other. OpenTSDB will store whatever you give it.

Data can be written to OpenTSDB at second or millisecond resolution. Timestamps must be integers of at most 13 digits; millisecond timestamps look like 1364410924250, where the final three digits are the milliseconds. Timestamps with more digits than that must be rounded to 13 digits before submitting, or an error is returned.
Second-resolution timestamps are stored on 2 bytes, millisecond-resolution ones on 4. If you don't need millisecond resolution, or all your data points fall on 1-second boundaries, submit 10-digit timestamps to save storage space.
Avoid mixing second and millisecond timestamps within a single time series; mixed timestamps slow queries down, since iterating across them takes longer than reading just one kind. Either way, OpenTSDB stores whatever you give it.
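
For example, the same instant at the two supported resolutions:

1356998400       (second resolution, 10 digits)
1356998400500    (millisecond resolution, 13 digits, i.e. 500 ms past the second)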
 

Metrics and Tags

The following rules apply to metric and tag values:

  • Strings are case sensitive, i.e. "Sys.Cpu.User" will be stored separately from "sys.cpu.user"
  • Spaces are not allowed
  • Only the following characters are allowed: a to z, A to Z, 0 to 9, -, _, ., / or Unicode letters (as per the specification)

Metric and tags are not limited in length, though you should try to keep the values fairly short.

The rules for metrics and tags:

1. Strings are case sensitive, i.e. "Sys.Cpu.User" is stored separately from "sys.cpu.user"

2. Spaces are not allowed

3. Only the characters a-z, A-Z, 0-9, -, _, ., / and Unicode letters are allowed

Metric and tag names are not limited in length, but it is best to keep them fairly short.
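
A quick client-side check before sending data can catch bad names early. The sketch below is illustrative only, not OpenTSDB's own validation, and for simplicity it accepts just the ASCII characters listed above rather than the full Unicode-letter rule:

import re

# Characters allowed by the rules above (ASCII subset; Unicode letters omitted for brevity).
_ALLOWED = re.compile(r'^[A-Za-z0-9\-_./]+$')

def is_valid_name(name):
    """Return True if a metric name, tag key or tag value looks acceptable."""
    return bool(_ALLOWED.match(name))

assert is_valid_name("sys.cpu.user")
assert not is_valid_name("sys cpu user")   # spaces are not allowed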

 

Integer Values

If the value from a put command is parsed without a decimal point (.), it will be treated as a signed integer. Integers are stored, unsigned, with variable length encoding so that a data point may take as little as 1 byte of space or up to 8 bytes. This means a data point can have a minimum value of -9,223,372,036,854,775,808 and a maximum value of 9,223,372,036,854,775,807 (inclusive). Integers cannot have commas or any character other than digits and the dash (for negative values). For example, in order to store the maximum value, it must be provided in the form 9223372036854775807.

Floating Point Values

If the value from a put command is parsed with a decimal point (.) it will be treated as a floating point value. Currently all floating point values are stored on 4 bytes, single-precision, with support for 8 bytes planned for a future release. Floats are stored in IEEE 754 floating-point "single format" with positive and negative value support. Infinity and Not-a-Number values are not supported and will throw an error if supplied to a TSD. See Wikipedia and the Java Documentation for details.

Note

Because OpenTSDB only supports floating point values, it is not suitable for storing measurements that require exact values like currency. This is why, when storing a value like 15.2 the database may return 15.199999809265137.

Integer Values

If the value in a put command has no decimal point, it is treated as a signed integer and stored with variable-length encoding, taking at most 8 bytes.

 

Floating Point Values

If the value contains a decimal point, it is treated as a floating point number; floats are currently stored on 4 bytes, single precision.
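
The Note above, where 15.2 comes back as 15.199999809265137, is just single-precision rounding and is easy to reproduce (a Python sketch, not OpenTSDB code):

import struct

# Round-trip a value through IEEE 754 single precision, as 4-byte float storage does.
stored = struct.unpack('<f', struct.pack('<f', 15.2))[0]
print(stored)   # 15.199999809265137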

 

Ordering

Unlike other solutions, OpenTSDB allows for writing data for a given time series in any order you want. This enables significant flexibility in writing data to a TSD, allowing for populating current data from your systems, then importing historical data at a later time.

Unlike other systems, OpenTSDB lets you write data for a given time series in any order. That makes writing to a TSD much more flexible: you can populate current data from your systems first and import historical data later.

 

Duplicate Data Points

Writing data points in OpenTSDB is generally idempotent within an hour of the original write. This means you can write the value 42 at timestamp 1356998400 and then write 42 again for the same time and nothing bad will happen. However if you have compactions enabled to reduce storage consumption and write the same data point after the row of data has been compacted, an exception may be returned when you query over that row. If you attempt to write two different values with the same timestamp, a duplicate data point exception may be thrown during query time. This is due to a difference in encoding integers on 1, 2, 4 or 8 bytes and floating point numbers. If the first value was an integer and the second a floating point, the duplicate error will always be thrown. However if both values were floats or they were both integers that could be encoded on the same length, then the original value may be overwritten if a compaction has not occurred on the row.

In most situations, if a duplicate data point is written it is usually an indication that something went wrong with the data source such as a process restarting unexpectedly or a bug in a script. OpenTSDB will fail "safe" by throwing an exception when you query over a row with one or more duplicates so you can track down the issue.

With OpenTSDB 2.1 you can enable last-write-wins by setting the tsd.storage.fix_duplicates configuration value to true. With this flag enabled, at query time, the most recent value recorded will be returned instead of throwing an exception. A warning will also be written to the log file noting a duplicate was found. If compaction is also enabled, then the original compacted value will be overwritten with the latest value.

Writing data points to OpenTSDB is generally idempotent within an hour of the original write: writing the value 42 at timestamp 1356998400 and then writing 42 again for the same time does no harm. However, if compactions are enabled to save storage and the same point is written again after its row has been compacted, querying that row may return an exception. Writing two different values at the same timestamp can likewise cause a duplicate data point exception at query time.

That happens because the two values may be encoded differently: integers on 1, 2, 4 or 8 bytes versus floating point numbers. If the first value was an integer and the second a float, the duplicate error is always thrown; if both are floats, or both are integers of the same encoded length, the original value may simply be overwritten as long as the row has not been compacted yet.

Usually a duplicate data point means something went wrong at the data source, such as a process restarting unexpectedly or a bug in a script. OpenTSDB fails "safe" by throwing an exception when you query a row containing duplicates, which makes the problem easier to track down.

OpenTSDB 2.1 can enable last-write-wins through the tsd.storage.fix_duplicates configuration option: at query time the most recent value is returned instead of an exception, and a warning noting the duplicate is written to the log. If compaction is also enabled, the original compacted value is overwritten with the latest value.

 

Input Methods

There are currently three main methods to get data into OpenTSDB: Telnet API, HTTP API and batch import from a file. Alternatively you can use a tool that provides OpenTSDB support, or if you're extremely adventurous, use the Java library.

There are three main ways to get data into OpenTSDB:

1. The Telnet API

2. The HTTP API

3. Batch import from a file

You can also use a tool that provides OpenTSDB support, or use the Java library directly.

Telnet

The easiest way to get started with OpenTSDB is to open up a terminal or telnet client, connect to your TSD and issue a put command and hit 'enter'. If you are writing a program, simply open a socket, print the string command with a new line and send the packet. The telnet command format is:

put <metric> <timestamp> <value> <tagk1=tagv1[ tagk2=tagv2 ...tagkN=tagvN]>

For example:

put sys.cpu.user 1356998400 42.5 host=webserver01 cpu=0

Each put can only send a single data point. Don't forget the newline character, e.g. \n at the end of your command.

After connecting to the TSD over telnet, use the put command; each put sends a single data point and must end with a newline character (\n).
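
If you are writing a program rather than typing into a telnet client, the same command can be sent over a plain socket. A minimal Python sketch, assuming the TSD runs on its default port 4242 (the hostname is illustrative):

import socket

# Connect to the TSD and send one data point; every put ends with a newline.
with socket.create_connection(("tsd.example.com", 4242)) as sock:
    sock.sendall(b"put sys.cpu.user 1356998400 42.5 host=webserver01 cpu=0\n")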

Http API

As of version 2.0, data can be sent over HTTP in formats supported by 'Serializer' plugins. Multiple, un-related data points can be sent in a single HTTP POST request to save bandwidth. See the /api/put for details.

Data is sent over the HTTP API via /api/put.
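
A minimal sketch of posting a batch of points with Python's standard library; the TSD address is an assumption, and the /api/put documentation describes response handling in detail:

import json
import urllib.request

# Several unrelated data points can share a single POST to save bandwidth.
points = [
    {"metric": "sys.cpu.user", "timestamp": 1356998400, "value": 42.5,
     "tags": {"host": "webserver01", "cpu": "0"}},
    {"metric": "sys.cpu.user", "timestamp": 1356998400, "value": 39.1,
     "tags": {"host": "webserver02", "cpu": "0"}},
]
req = urllib.request.Request(
    "http://tsd.example.com:4242/api/put",
    data=json.dumps(points).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)   # by default a successful put returns an empty response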

Batch Import

If you are importing data from another system or you need to backfill historical data, you can use the import CLI utility. See import for details.

Use the import CLI utility to backfill historical data or import data from another system.

 

Write Performance

OpenTSDB can scale to writing millions of data points per 'second' on commodity servers with regular spinning hard drives. However users who fire up a VM with HBase in stand-alone mode and try to slam millions of data points at a brand new TSD are disappointed when they can only write data in the hundreds of points per second. Here's what you need to do to scale for brand new installs or testing and for expanding existing systems.

 

On commodity servers with regular spinning disks, OpenTSDB can scale to writing millions of data points per second.
However, users who fire up a VM with HBase in stand-alone mode and throw millions of points at a brand-new TSD are disappointed to see only a few hundred points per second. The sections below cover what is needed to scale, both for brand-new installs or testing and for expanding existing systems.

UID Assignment

The first sticking point folks run into is UID assignment. Every string for a metric, tag key and tag value must be assigned a UID before the data point can be stored. For example, the metric sys.cpu.user may be assigned a UID of 000001 the first time it is encountered by a TSD. This assignment takes a fair amount of time as it must fetch an available UID, write a UID to name mapping and a name to UID mapping, then use the UID to write the data point's row key. The UID will be stored in the TSD's cache so that the next time the same metric comes through, it can find the UID very quickly.

Every metric, tag key and tag value string must be assigned a UID before the data point can be stored; for example, the metric sys.cpu.user might get UID 000001 the first time a TSD sees it.

The assignment takes a fair amount of time: the TSD must fetch an available UID, write a UID-to-name mapping and a name-to-UID mapping, and then use the UID to build the data point's row key. The UID is cached in the TSD, so the next time the same metric comes through, the lookup is very fast.

Therefore, we recommend that you 'pre-assign' UIDs to as many metrics, tag keys and tag values as you can. If you have designed a naming schema as recommended above, you'll know most of the values to assign. You can use the CLI tools mkmetric or uid, or the HTTP API /api/uid, to perform pre-assignments. Any time you are about to send a bunch of new metrics or tags to a running OpenTSDB cluster, try to pre-assign or the TSDs will bog down a bit when they get the new data.

Pre-assign UIDs to as many metrics, tag keys and tag values as you can, using the CLI tools or the HTTP API.

Whenever you are about to send a batch of new metrics or tags to a running OpenTSDB cluster, try to pre-assign them first; otherwise the TSDs will bog down a bit while they assign IDs for the new data.
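
For example, metric UIDs can be pre-assigned from the command line with the mkmetric tool mentioned above; the metric names here are illustrative, and your version's UID documentation lists the exact options:

./tsdb mkmetric sys.cpu.user sys.cpu.sys sys.cpu.nice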

Note

If you restart a TSD, it will have to lookup the UID for every metric and tag so performance will be a little slow until the cache is filled.

When a TSD restarts it has to look up the UID for every metric and tag again, so performance will be a little slow until the cache fills up.

 

Pre-Split HBase Regions

For brand new installs you will see much better performance if you pre-split the regions in HBase regardless of if you're testing on a stand-alone server or running a full cluster. HBase regions handle a defined range of row keys and are essentially a single file. When you create the tsdb table and start writing data for the first time, all of those data points are being sent to this one file on one server. As a region fills up, HBase will automatically split it into different files and move it to other servers in the cluster, but when this happens, the TSDs cannot write to the region and must buffer the data points. Therefore, if you can pre-allocate a number of regions before you start writing, the TSDs can send data to multiple files or servers and you'll be taking advantage of the linear scalability immediately.

Whether you are testing on a stand-alone server or running a full cluster, pre-splitting the regions in HBase gives a brand-new install much better performance.

An HBase region handles a defined range of row keys and is essentially a single file. When you create the tsdb table and write data for the first time, all data points go to this one file on one server.

As a region fills up, HBase automatically splits it into different files and may move them to other servers in the cluster, but while that happens the TSDs cannot write to the region and must buffer the data points.

Therefore, if you pre-allocate a number of regions before you start writing, the TSDs can send data to multiple files or servers right away and you benefit from the linear scalability immediately.

The simplest way to pre-split your tsdb table regions is to estimate the number of unique metric names you'll be recording. If you have designed a naming schema, you should have a pretty good idea. Let's say that we will track 4,000 metrics in our system. That's not to say 4,000 time series, as we're not counting the tags yet, just the metric names such as "sys.cpu.user". Data points are written in row keys where the metric's UID comprises the first bytes, 3 bytes by default. The first metric will be assigned a UID of 000001 as a hex encoded value. The 4,000th metric will have a UID of 000FA0 in hex. You can use these as the start and end keys in the script from the HBase Book to split your table into any number of regions. 256 regions may be a good place to start depending on how many time series share each metric.

The simplest way to pre-split the tsdb table is to estimate the number of unique metric names you will record; if you have designed a naming schema, you should have a good idea.

Say the system will track 4,000 metrics. That is not 4,000 time series, since tags are not counted yet, just metric names such as sys.cpu.user.

Data points are written with row keys whose first bytes (3 by default) are the metric's UID.

The first metric is assigned UID 000001 in hex; the 4,000th metric gets UID 000FA0.

You can use these as the start and end keys in the script from the HBase Book to split the table into any number of regions; 256 regions can be a good starting point, depending on how many time series share each metric.
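
The 000FA0 end key is just the metric count written as a 3-byte hex UID:

4,000 (decimal) = 0xFA0  →  zero-padded to 000FA0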

But don't worry too much about splitting. As stated above, HBase will automatically split regions for you so over time, the data will be distributed fairly evenly.

 

But don't worry too much about splitting: as stated above, HBase will split regions automatically over time, and the data will end up distributed fairly evenly.

 

Distributed HBase

HBase will run in stand-alone mode where it will use the local file system for storing files. It will still use multiple regions and perform as well as the underlying disk or raid array will let it. You'll definitely want a RAID array under HBase so that if a drive fails, you can replace it without losing data. This kind of setup is fine for testing or very small installations and you should be able to get into the low thousands of data points per second.

HBase can run in stand-alone mode, storing its files on the local file system. It still uses multiple regions and performs as well as the underlying disk or RAID array allows.

You will definitely want a RAID array under HBase so that a failed drive can be replaced without losing data.

This kind of setup is fine for testing or very small installations and should get you into the low thousands of data points per second.

However if you want serious throughput and scalability you have to setup a Hadoop and HBase cluster with multiple servers. In a distributed setup HDFS manages region files, automatically distributing copies to different servers for fault tolerance. HBase assigns regions to different servers and OpenTSDB's client will send data points to the specific server where they will be stored. You're now spreading operations amongst multiple servers, increasing performance and storage. If you need even more throughput or storage, just add nodes or disks.

 

For serious throughput and scalability, however, you need a Hadoop and HBase cluster spread across multiple servers.

In a distributed setup, HDFS manages the region files and automatically replicates them to different servers for fault tolerance, while HBase assigns regions to different servers.

OpenTSDB's client sends each data point to the specific server where it will be stored, so operations are spread across multiple servers, increasing both performance and storage capacity.

If you need even more throughput or storage, just add nodes or disks.

There are a number of ways to set up a Hadoop/HBase cluster and a ton of various tuning tweaks to make, so Google around and ask user groups for advice. Some general recommendations include:

  • Dedicate a pair of high memory, low disk space servers for the Name Node. Set them up for high availability using something like Heartbeat and Pacemaker.
  • Setup Zookeeper on at least 3 servers for fault tolerance. They must have a lot of RAM and a fairly fast disk for log writing. On small clusters, these can run on the Name node servers.
  • JBOD for the HDFS data nodes
  • HBase region servers can be collocated with the HDFS data nodes
  • At least 1 gbps links between servers, 10 gbps preferable.
  • Keep the cluster in a single data center
 
There are many ways to install a Hadoop/HBase cluster and plenty of tuning advice; some general recommendations:
1. Dedicate a pair of high-memory, low-disk-space servers to the Name Node, set up for high availability with something like Heartbeat and Pacemaker.
2. Run Zookeeper on at least 3 servers for fault tolerance; they need plenty of RAM and fairly fast disks for log writes. On small clusters they can run on the Name Node servers.
3. Use JBOD for the HDFS data nodes.
4. HBase region servers can be collocated with the HDFS data nodes.
5. Use at least 1 Gbps links between servers, 10 Gbps preferably.
6. Keep the cluster in a single data center.
 

Multiple TSDs

A single TSD can handle thousands of writes per second. But if you have many sources it's best to scale by running multiple TSDs and using a load balancer (such as Varnish or DNS round robin) to distribute the writes. Many users colocate TSDs on their HBase region servers when the cluster is dedicated to OpenTSDB.

A single TSD can handle thousands of writes per second, but if you have many sources it is best to run multiple TSDs behind a load balancer (such as Varnish or DNS round robin) to distribute the writes.

Many users collocate the TSDs on their HBase region servers when the cluster is dedicated to OpenTSDB.

 

Persistent Connections

Enable keep alives in the TSDs and make sure that any applications you are using to send time series data keep their connections open instead of opening and closing for every write. See Configuration for details.

Enable keep-alives in the TSDs and keep client connections open instead of opening and closing a connection for every write; see the Configuration documentation for details.

 

Disable Meta Data and Real Time Publishing

OpenTSDB 2.0 introduced meta data for tracking the kinds of data in the system. When tracking is enabled, a counter is incremented for every data point written and new UIDs or time series will generate meta data. The data may be pushed to a search engine or passed through tree generation code. These processes require greater memory in the TSD and may affect throughput. Tracking is disabled by default so test it out before enabling the feature.

2.0 also introduced a real-time publishing plugin where incoming data points can be emitted to another destination immediately after they're queued for storage. This is disabled by default so test any plugins you are interested in before deploying in production.

OpenTSDB 2.0 introduced meta data for tracking the kinds of data in the system. When tracking is enabled, a counter is incremented for every data point written, and new UIDs or time series generate meta data, which may be pushed to a search engine or passed through the tree-generation code. These processes require more memory in the TSD and can affect throughput.

Tracking is disabled by default, so test it before enabling the feature.

2.0 also introduced a real-time publishing plugin that can emit incoming data points to another destination immediately after they are queued for storage.

This is also disabled by default, so test any plugin you are interested in before deploying it to production.

 

References

1. http://opentsdb.net/docs/build/html/user_guide/writing.html

2. http://en.wikipedia.org/wiki/IPv4_address_exhaustion

3. Pacemaker: http://asram.blog.51cto.com/1442164/351135
