Elasticsearch: Understanding the store attribute in a mapping

Date: 2021-08-13

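By default, field values are indexed to make them searchable, but they are not stored. This means the field can be queried, but its original value cannot be retrieved from the field itself. The key point is this: if a field's mapping sets store to true, Elasticsearch keeps a separate copy of that field's value, independent of _source. Storing a field costs extra disk space. It does not make searches or aggregations faster, since searches use the inverted index and aggregations, sorting, and most scripts read doc_values; what it provides is a way to fetch the field's value without loading and parsing the whole _source. The possible values for this option are true and false, and the default is false.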

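Usually this does not matter. The field value is already part of the _source field, which is stored by default. If you only want to retrieve the value of one field or a few fields rather than the whole _source, you can use source filtering.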

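In some situations, storing a field can make sense. For example, if you have a document with a title, a date, and a very large content field, you may want to retrieve only the title and the date without having to extract them from a large _source field.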

The description above is somewhat abstract, so let's walk through a concrete example.

First, create an index named my_index:

PUT my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "store": true 
      },
      "date": {
        "type": "date",
        "store": true 
      },
      "content": {
        "type": "text"
      }
    }
  }
}

In the mapping above, store is set to true for title and date, which means their values are kept in dedicated storage separate from _source. Now index a document into my_index:

PUT my_index/_doc/1
{
  "title": "Some short title",
  "date": "2015-01-01",
  "content": "A very long content field..."
}

Next, run a search:

GET my_index/_search

The result is:

"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "Some short title",
          "date" : "2015-01-01",
          "content" : "A very long content field..."
        }
      }
    ]
  }

In the result above, the document's title, date, and content fields all appear in _source.

We can use source filtering to extract only the fields we want:

GET my_index/_search
{
  "_source": ["title", "date"]
}

The result is:

"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "date" : "2015-01-01",
          "title" : "Some short title"
        }
      }
    ]
  }

The result shows that the date and title fields we asked for can be obtained from _source.

We can also fetch the values of these two fields as follows:

GET my_index/_search
{
  "stored_fields": [
    "title",
    "date"
  ]
}

The result is:

"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
          "date" : [
            "2015-01-01T00:00:00.000Z"
          ],
          "title" : [
            "Some short title"
          ]
        }
      }
    ]
  }

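Here the values of date and title come back under fields, each as an array. Stored fields are always returned as arrays, because Elasticsearch cannot tell whether the original value was a single value or several.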
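
Because stored fields come back as arrays, client code usually has to unwrap them. The following is a minimal Python sketch rather than part of the original walkthrough: the helper name `stored_field_values` is hypothetical, the sample `response` dict is abbreviated from the hits shown above, and no Elasticsearch client is involved.

```python
from typing import Any, Dict, List


def stored_field_values(response: Dict[str, Any], field: str) -> List[List[Any]]:
    """Return the stored values of `field` for every hit, one list per hit.

    Hits that lack the field (for example because it was not stored)
    contribute an empty list instead of raising KeyError.
    """
    hits = response.get("hits", {}).get("hits", [])
    return [hit.get("fields", {}).get(field, []) for hit in hits]


# Sample response body, abbreviated from the stored_fields search above.
response = {
    "hits": {
        "hits": [
            {
                "_index": "my_index",
                "_id": "1",
                "fields": {
                    "date": ["2015-01-01T00:00:00.000Z"],
                    "title": ["Some short title"],
                },
            }
        ]
    }
}

titles = stored_field_values(response, "title")
contents = stored_field_values(response, "content")
print(titles)    # [['Some short title']]
print(contents)  # [[]]  (content was not stored, so nothing comes back)
```

Returning one list per hit keeps multi-valued fields intact; a caller that knows a field is single-valued can take the first element.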

Many readers may wonder what store is actually good for if field values can already be read from _source.

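One answer is the case mentioned at the start: sometimes we do not want to keep every field in _source because a field is very large, or we do not want to keep _source at all, yet there are still some fields whose values we need to retrieve. In that situation we can use store.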
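
For the first case (keeping _source but leaving a large field out of it), the mapping-level `_source.excludes` setting is one option. The sketch below only builds the index-creation body as a Python dict and prints it as JSON; the function name `build_mapping` is hypothetical and the title/date/content layout is assumed from the earlier example. The walkthrough that follows takes the other route and disables _source entirely.

```python
import json
from typing import Any, Dict, Iterable


def build_mapping(stored: Iterable[str], excluded_from_source: Iterable[str]) -> Dict[str, Any]:
    """Build an index-creation body for the title/date/content example."""
    stored = set(stored)
    # Field types assumed from the my_index example in this article.
    types = {"title": "text", "date": "date", "content": "text"}
    properties = {
        name: {"type": field_type, "store": name in stored}
        for name, field_type in types.items()
    }
    return {
        "mappings": {
            # Fields listed here are still indexed, but left out of _source.
            "_source": {"excludes": sorted(excluded_from_source)},
            "properties": properties,
        }
    }


body = build_mapping(stored=["title", "date"], excluded_from_source=["content"])
print(json.dumps(body, indent=2))
```

With a mapping like this, content stays searchable but is dropped from the stored _source, while title and date can be fetched through either _source or stored_fields.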

Let's illustrate this with another example. First, create an index named my_index1:

PUT my_index1
{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "title": {
        "type": "text",
        "store": true
      },
      "date": {
        "type": "date",
        "store": true
      },
      "content": {
        "type": "text",
        "store": false
      }
    }
  }
}

Because the content field may be very large, we do not store it. In the mapping above we also set enabled to false for _source, so no _source is stored at all. Next, index a document into my_index1:

PUT my_index1/_doc/1
{
  "title": "Some short title",
  "date": "2015-01-01",
  "content": "A very long content field..."
}

As before, run a search:

GET my_index1/_search
{
  "query": {
    "match": {
      "content": "content"
    }
  }
}

The search returns:

"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "my_index1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821
      }
    ]
  }

This time there is no _source in the hit, because we disabled it. We can still retrieve the stored fields like this:

GET my_index1/_search
{
  "stored_fields": [
    "title",
    "date"
  ],
  "query": {
    "match": {
      "content": "content"
    }
  }
}

The result is:

"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "my_index1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "fields" : {
          "date" : [
            "2015-01-01T00:00:00.000Z"
          ],
          "title" : [
            "Some short title"
          ]
        }
      }
    ]
  }

The values of date and title appear in the result.

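Another case where storing a field makes sense is for fields that do not appear in _source, such as copy_to target fields. See my other article, "How to use copy_to in Elasticsearch to improve search efficiency".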
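
As a rough illustration of the copy_to case, the sketch below builds a mapping in which a hypothetical full_name field receives copies of first_name and last_name and is stored, plus a search body that requests it through stored_fields. The field names and query text are assumptions for illustration, not taken from the referenced article, and the code only constructs request bodies without contacting a cluster.

```python
import json

# Hypothetical mapping: full_name is filled only through copy_to, so it
# never appears in _source; "store": True keeps its values retrievable.
mapping = {
    "mappings": {
        "properties": {
            "first_name": {"type": "text", "copy_to": "full_name"},
            "last_name": {"type": "text", "copy_to": "full_name"},
            "full_name": {"type": "text", "store": True},
        }
    }
}

# Search body that matches on the combined field and asks for its stored values.
search = {
    "stored_fields": ["full_name"],
    "query": {"match": {"full_name": "John Smith"}},
}

# Print both bodies as JSON so they can be pasted into a console request.
print(json.dumps(mapping, indent=2))
print(json.dumps(search, indent=2))
```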

If you want to learn more about how Elasticsearch stores data, see the article "Elasticsearch: inverted index, doc_values and source".

References: