Logstash Reference Guide (Accessing Event Data and Fields in the Configuration)

Date: 2019-11-09

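Accessing Event Data and Fields in the Configuration

The Logstash agent is a processing pipeline with three stages: inputs → filters → outputs. Inputs generate events, filters modify them, and outputs ship them elsewhere.

All events have properties. For example, an Apache access log contains a status code (200, 404), a request path ("/", "index.html"), an HTTP verb (GET, POST), a client IP address, and so on. Logstash calls these properties "fields".

Some configuration options in Logstash require fields to exist in order to function. Because inputs generate events, there are no fields to evaluate within the input block: they do not exist yet!

Because they depend on events and fields, the following configuration options only work within filter and output blocks.

Field references, sprintf format, and conditionals, described below, do not work in an input block.

Field References

It is often useful to be able to refer to a field by name. To do this, you can use the Logstash field reference syntax.

The syntax to access a field is [fieldname]. If you are referring to a top-level field, you can omit the [] and simply use fieldname. To refer to a nested field, specify the full path to that field: [top-level field][nested field].

For example, the following event has five top-level fields (agent, ip, request, response, ua) and three nested fields (status, bytes, os).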

{
  "agent": "Mozilla/5.0 (compatible; MSIE 9.0)",
  "ip": "192.168.24.44",
  "request": "/index.html",
  "response": {
    "status": 200,
    "bytes": 52353
  },
  "ua": {
    "os": "Windows 7"
  }
}

To reference the os field, specify [ua][os]. To reference a top-level field such as request, simply specify the field name.
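To make the path lookup concrete, here is a minimal Python sketch of how a reference such as [ua][os] could be resolved against the example event. The helper name get_field is hypothetical, and this models the behavior described above rather than Logstash's own implementation.

```python
import re

# Hypothetical helper: walks a nested event along a field reference.
def get_field(event, reference):
    # "[ua][os]" -> ["ua", "os"]; a bare name such as "request" is
    # treated as the top-level field with that name.
    path = re.findall(r"\[([^\[\]]+)\]", reference) or [reference]
    value = event
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None  # the referenced field does not exist
        value = value[key]
    return value

event = {
    "agent": "Mozilla/5.0 (compatible; MSIE 9.0)",
    "ip": "192.168.24.44",
    "request": "/index.html",
    "response": {"status": 200, "bytes": 52353},
    "ua": {"os": "Windows 7"},
}

print(get_field(event, "[ua][os]"))            # Windows 7
print(get_field(event, "request"))             # /index.html
print(get_field(event, "[response][status]"))  # 200
```

Both request and [request] resolve to the same top-level value in this sketch, which mirrors the rule that the brackets are optional for top-level fields.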

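sprintf Format

The field reference format is also used in what Logstash calls sprintf format. This format lets you refer to field values from within other strings. For example, the statsd output has an increment setting that lets you keep a count of Apache logs by status code: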

output {
  statsd {
    increment => "apache.%{[response][status]}"
  }
}

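Similarly, you can convert the timestamp in the @timestamp field into a string. Instead of specifying a field name inside the curly braces, use the +FORMAT syntax, where FORMAT is a time format.

For example, if you want to use the file output to write logs based on the event's date and hour and its type field: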

output {
  file {
    path => "/var/log/%{type}.%{+yyyy.MM.dd.HH}"
  }
}
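The two sprintf examples above can be modeled with a short Python sketch. It is an illustration only: the function names are hypothetical, the event values are assumed, and the +FORMAT handling covers just the four tokens used in the example (yyyy, MM, dd, HH) by mapping them to strftime codes.

```python
import re
from datetime import datetime, timezone

# Assumption: only the tokens that appear in the example are supported.
TOKEN_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}

def get_field(event, reference):
    # Hypothetical helper: resolve "[a][b]" or a bare top-level name.
    path = re.findall(r"\[([^\[\]]+)\]", reference) or [reference]
    value = event
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

def sprintf(event, template):
    def expand(match):
        inner = match.group(1)
        if inner.startswith("+"):
            # %{+FORMAT}: format the event's @timestamp.
            fmt = re.sub(r"yyyy|MM|dd|HH",
                         lambda m: TOKEN_TO_STRFTIME[m.group(0)], inner[1:])
            return event["@timestamp"].strftime(fmt)
        value = get_field(event, inner)
        # In this sketch an unresolved reference is left as written.
        return match.group(0) if value is None else str(value)
    return re.sub(r"%\{([^}]+)\}", expand, template)

# Assumed sample event for illustration.
event = {
    "@timestamp": datetime(2014, 3, 2, 14, 36, 43, tzinfo=timezone.utc),
    "type": "apache",
    "response": {"status": 200},
}

print(sprintf(event, "apache.%{[response][status]}"))         # apache.200
print(sprintf(event, "/var/log/%{type}.%{+yyyy.MM.dd.HH}"))   # /var/log/apache.2014.03.02.14
```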

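Conditionals

Sometimes you only want to filter or output an event under certain conditions. For that, you can use a conditional.

Conditionals in Logstash look and act the same way they do in programming languages. They support if, else if, and else statements and can be nested.

The conditional syntax is: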

if EXPRESSION {
  ...
} else if EXPRESSION {
  ...
} else {
  ...
}

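What is an expression? Comparison tests, boolean logic, and so on!

You can use the following comparison operators:

equality: ==, !=, <, >, <=, >=
regexp: =~, !~ (checks a pattern on the right against a string value on the left)
inclusion: in, not in

The supported boolean operators are:

and, or, nand, xor

The supported unary operators are:

!

Expressions can be long and complex. Expressions can contain other expressions, you can negate expressions with !, and you can group them with parentheses (...).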

For example, the following conditional uses the mutate filter to remove the field secret if the field action has a value of login:

filter {
  if [action] == "login" {
    mutate { remove_field => "secret" }
  }
}

You can specify multiple expressions in a single condition:

output {
  # Send production errors to pagerduty
  if [loglevel] == "ERROR" and [deployment] == "production" {
    pagerduty {
    ...
    }
  }
}

You can use the in operator to test whether a field contains a specific string, key, or (for lists) element:

filter {
  if [foo] in [foobar] {
    mutate { add_tag => "field in field" }
  }
  if [foo] in "foo" {
    mutate { add_tag => "field in string" }
  }
  if "hello" in [greeting] {
    mutate { add_tag => "string in field" }
  }
  if [foo] in ["hello", "world", "foo"] {
    mutate { add_tag => "field in list" }
  }
  if [missing] in [alsomissing] {
    mutate { add_tag => "shouldnotexist" }
  }
  if !("foo" in ["hello", "world"]) {
    mutate { add_tag => "shouldexist" }
  }
}
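To see which tags the filter above would add, the following Python sketch models the in operator as described: substring test for strings, element test for lists, key test for maps. The function logstash_in, the treatment of missing fields as false, and the sample event values are assumptions made for illustration, not Logstash's actual implementation.

```python
# Hypothetical model of the `in` operator described above.
def logstash_in(needle, haystack):
    # Assumption: a missing (None) operand makes the test false.
    if needle is None or haystack is None:
        return False
    if isinstance(haystack, str):
        return isinstance(needle, str) and needle in haystack  # substring
    return needle in haystack  # list element or dict key

def tags_for(event):
    # Mirrors the six conditionals in the filter block, in order.
    tags = []
    if logstash_in(event.get("foo"), event.get("foobar")):
        tags.append("field in field")
    if logstash_in(event.get("foo"), "foo"):
        tags.append("field in string")
    if logstash_in("hello", event.get("greeting")):
        tags.append("string in field")
    if logstash_in(event.get("foo"), ["hello", "world", "foo"]):
        tags.append("field in list")
    if logstash_in(event.get("missing"), event.get("alsomissing")):
        tags.append("shouldnotexist")
    if not logstash_in("foo", ["hello", "world"]):
        tags.append("shouldexist")
    return tags

# Assumed sample event.
event = {"foo": "foo", "foobar": "foobar", "greeting": "hello world"}
print(tags_for(event))
```

With this assumed event, every tag except shouldnotexist is added, because neither [missing] nor [alsomissing] exists.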

You use the not in conditional the same way. For example, you could use not in to route events to Elasticsearch only when grok succeeds:

output {
  if "_grokparsefailure" not in [tags] {
    elasticsearch { ... }
  }
}

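You can check whether a specific field exists, but there is currently no way to distinguish a field that does not exist from a field that is simply false. The expression if [foo] returns false when:

[foo] does not exist in the event,
[foo] exists in the event, but is false, or
[foo] exists in the event, but is null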
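The three cases can be illustrated with a small Python sketch. The function name is hypothetical, and only the three cases listed above are modeled; how other values (such as empty strings) are treated is outside the scope of this sketch.

```python
# Hypothetical model of `if [foo]` for the three cases listed above.
def field_is_set(event, name):
    value = event.get(name)  # None when the field is absent
    return value is not None and value is not False

print(field_is_set({}, "foo"))              # False: field does not exist
print(field_is_set({"foo": False}, "foo"))  # False: field is false
print(field_is_set({"foo": None}, "foo"))   # False: field is null
print(field_is_set({"foo": "bar"}, "foo"))  # True
```

All three failing cases are indistinguishable from inside the conditional, which is exactly the limitation described.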

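For more complex examples, see Logstash Configuration Examples.

The @metadata Field

In Logstash 1.5 and later, there is a special field called @metadata. The contents of @metadata are not part of any of your events at output time, which makes it well suited for conditionals, or for extending and building event fields with field references and sprintf formatting.

The following configuration file yields events from STDIN. Whatever you type becomes the message field of the event. The mutate filters in the filter block add a few fields, some of them nested in the @metadata field.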

input { stdin { } }

filter {
  mutate { add_field => { "show" => "This data will be in the output" } }
  mutate { add_field => { "[@metadata][test]" => "Hello" } }
  mutate { add_field => { "[@metadata][no_show]" => "This data will not be in the output" } }
}

output {
  if [@metadata][test] == "Hello" {
    stdout { codec => rubydebug }
  }
}

Let's see what comes out:

$ bin/logstash -f ../test.conf
Pipeline main started
asdf
{
    "@timestamp" => 2016-06-30T02:42:51.496Z,
      "@version" => "1",
          "host" => "example.com",
          "show" => "This data will be in the output",
       "message" => "asdf"
}

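The "asdf" that was typed became the contents of the message field, and the conditional successfully evaluated the contents of the test field nested within the @metadata field. But the output did not show a field called @metadata, or its contents.

The rubydebug codec lets you reveal the contents of the @metadata field if you add the config flag metadata => true: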

stdout { codec => rubydebug { metadata => true } }

Let's see what the output looks like with this change:

$ bin/logstash -f ../test.conf
Pipeline main started
asdf
{
    "@timestamp" => 2016-06-30T02:46:48.565Z,
     "@metadata" => {
           "test" => "Hello",
        "no_show" => "This data will not be in the output"
    },
      "@version" => "1",
          "host" => "example.com",
          "show" => "This data will be in the output",
       "message" => "asdf"
}

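Now you can see the @metadata field and its sub-fields.

Only the rubydebug codec allows you to show the contents of the @metadata field.

Use the @metadata field whenever you need a temporary field but do not want it to appear in the final output.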
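The idea of a field that is visible to conditionals but absent from the output can be sketched in Python as follows. The function serialize_for_output is a hypothetical stand-in for what an output does; it is not part of Logstash.

```python
# Hypothetical stand-in for an output: everything except @metadata is emitted.
def serialize_for_output(event):
    return {key: value for key, value in event.items() if key != "@metadata"}

# Assumed event, matching the fields added by the mutate filters above.
event = {
    "message": "asdf",
    "show": "This data will be in the output",
    "@metadata": {
        "test": "Hello",
        "no_show": "This data will not be in the output",
    },
}

# The conditional can still read the temporary field...
if event["@metadata"]["test"] == "Hello":
    emitted = serialize_for_output(event)
    # ...but the emitted event carries no @metadata key.
    print(emitted)
```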

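Perhaps one of the most common use cases for this new field is with the date filter, to hold a temporary timestamp.

This configuration file has been simplified, but it uses the timestamp format common to Apache and Nginx web servers. In the past, you had to delete the timestamp field yourself after using it to overwrite the @timestamp field. With the @metadata field, this is no longer necessary: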

input { stdin { } }

filter {
  grok { match => [ "message", "%{HTTPDATE:[@metadata][timestamp]}" ] }
  date { match => [ "[@metadata][timestamp]", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}

output {
  stdout { codec => rubydebug }
}

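Notice that the grok filter in this configuration puts the extracted date into the [@metadata][timestamp] field. Let's feed this configuration a sample date string and see what comes out: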

$ bin/logstash -f ../test.conf
Pipeline main started
02/Mar/2014:15:36:43 +0100
{
    "@timestamp" => 2014-03-02T14:36:43.000Z,
      "@version" => "1",
          "host" => "example.com",
       "message" => "02/Mar/2014:15:36:43 +0100"
}

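That's it! There are no extra fields in the output, and the configuration file is cleaner because you do not have to delete a "timestamp" field after conversion in the date filter.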
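The conversion from the sample string to the @timestamp value shown above can be checked with a short Python sketch. It uses Python's strptime codes as a rough equivalent of the dd/MMM/yyyy:HH:mm:ss Z pattern; the function name is hypothetical and this is not how the date filter is implemented.

```python
from datetime import datetime, timezone

# Hypothetical helper: parse an Apache/Nginx style timestamp and
# render it in UTC with millisecond precision.
def to_utc_timestamp(raw):
    parsed = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")
    utc = parsed.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"

print(to_utc_timestamp("02/Mar/2014:15:36:43 +0100"))  # 2014-03-02T14:36:43.000Z
```

The +0100 offset is why 15:36:43 in the message becomes 14:36:43 in @timestamp.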

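Another use case is the CouchDB Changes input plugin (see https://github.com/logstash-plugins/logstash-input-couchdb_changes). This plugin automatically captures CouchDB document field metadata into the @metadata field within the input plugin itself. When the events pass through to be indexed by Elasticsearch, the Elasticsearch output plugin lets you specify the action (delete, update, insert, etc.) and the document_id, like this: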

output {
  elasticsearch {
    action => "%{[@metadata][action]}"
    document_id => "%{[@metadata][_id]}"
    hosts => ["example.com"]
    index => "index_name"
    protocol => "http"
  }
}