ElasticSearch Data Modeling

Date: 2021-03-02

WeChat official account: 碼農充電站pro
Homepage: https://codeshellme.github.io

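When building a data model with ES, you generally need to consider the following:

Field types
Whether the field needs to be searched and analyzed (tokenized)
Whether the field needs aggregation and sorting
Whether the field needs extra storage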

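1. Field types

For the different kinds of data, the main considerations are:

Text type: used for full-text fields; the value is split into terms by an analyzer.
- Aggregation and sorting are not supported by default; they require setting fielddata to true, which loads the terms into heap memory and can be expensive.
Keyword type: used for text that does not need to be analyzed, such as phone numbers, email addresses, or gender.
- Suited to exact matching; supports aggregation and sorting.
Multi-fields: by default (dynamic mapping), ES maps a string as text and adds a keyword sub-field.
- When handling human language, you can add sub-fields that use analyzers such as "english", "pinyin", and "standard" to meet different search needs.
Numeric types: choose the smallest type that fits. For example, if byte is enough, do not use long.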
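The points above can be sketched as a single mapping. This is a minimal illustration rather than a recommended schema: the field names (bio, phone, name, age) are hypothetical, and the body is only built and printed here, not sent to a cluster. The "pinyin" analyzer is left out because it is not built in and would need a plugin.

```python
import json

# Hypothetical mapping body illustrating the field types discussed above.
# All field names are made up for this sketch.
field_type_mapping = {
    "mappings": {
        "properties": {
            # Full-text field. fielddata is enabled only so the field can be
            # aggregated or sorted on; this is memory-hungry.
            "bio": {"type": "text", "fielddata": True},
            # Exact-match field: supports aggregation and sorting as-is.
            "phone": {"type": "keyword"},
            # Multi-field: the same value indexed in several ways.
            "name": {
                "type": "text",
                "analyzer": "standard",
                "fields": {
                    "english": {"type": "text", "analyzer": "english"},
                    "keyword": {"type": "keyword", "ignore_above": 256},
                },
            },
            # Smallest numeric type that fits the expected value range.
            "age": {"type": "byte"},
        }
    }
}

# The printed JSON could be used as the body of a create-index request.
print(json.dumps(field_type_mapping, indent=2))
```

Sub-fields are addressed with a dot, so a query would target name.english or name.keyword depending on the kind of match it needs.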

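2. Search requirements

For search requirements, the main considerations are:

If a field needs no searching, sorting, or aggregation, set enabled to false to skip unnecessary processing (and disk overhead) and improve performance. Note that enabled can only be set on object fields and on the top-level mapping.
If a field needs no searching but does need sorting and aggregation, set index to false.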
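A small sketch of the two settings, assuming a hypothetical object field session_data that is only ever read back from _source, and the cover_url field used later in this article. The body is built and printed only.

```python
import json

# Hypothetical mapping body; session_data is a made-up field name.
search_mapping = {
    "mappings": {
        "properties": {
            # Object field kept in _source but neither parsed nor indexed:
            # it cannot be searched, sorted, or aggregated.
            "session_data": {"type": "object", "enabled": False},
            # Not searchable, but doc_values (on by default for keyword)
            # still allow sorting and aggregation.
            "cover_url": {"type": "keyword", "index": False},
        }
    }
}

print(json.dumps(search_mapping, indent=2))
```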

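3. Aggregation and sorting

For aggregation and sorting, the main considerations are:

If a field needs no searching, sorting, or aggregation, set enabled to false.
If a field needs searching but no sorting or aggregation, set doc_values and fielddata to false.
For keyword fields that are updated and aggregated on frequently, setting eager_global_ordinals to true is recommended: global ordinals are then built at refresh time and cached, instead of on the first aggregation, which improves aggregation performance.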
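The following sketch shows how these settings might look in a mapping, together with a terms aggregation of the kind that benefits from global ordinals. The field names trace_id and category are hypothetical, and nothing is sent to a cluster.

```python
import json

# Hypothetical mapping body; trace_id and category are made-up field names.
agg_sort_mapping = {
    "mappings": {
        "properties": {
            # Searched but never sorted or aggregated: skip doc_values
            # to save disk space.
            "trace_id": {"type": "keyword", "doc_values": False},
            # Aggregated often: build global ordinals at refresh time
            # rather than on the first aggregation after a refresh.
            "category": {"type": "keyword", "eager_global_ordinals": True},
        }
    }
}

# A terms aggregation request body that relies on global ordinals.
category_counts = {
    "size": 0,
    "aggs": {"by_category": {"terms": {"field": "category"}}},
}

print(json.dumps(agg_sort_mapping, indent=2))
print(json.dumps(category_counts, indent=2))
```

The trade-off is that eager loading moves the cost from search time to refresh time, so it fits fields that really are aggregated on regularly.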

4. Extra storage

Setting store to true (the default is false) stores the field's original value separately; this is generally used when _source has enabled set to false.

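5. Example

Suppose we need to model some book information, with the following requirements:

Title: supports full-text search and exact matching
Description: supports full-text search
Author: supports exact matching
Publication date: date type
Book cover: does not need to be searchable

Sample data: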

{
  "title":"Mastering ElasticSearch 5.0",
  "description":"Master the searching, indexing, and aggregation features in ElasticSearch Improve users’ search experience with Elasticsearch’s functionalities and develop your own Elasticsearch plugins",
  "author":"Bharvi Dixit",
  "public_date":"2017",
  "cover_url":"https://images-na.ssl-images-amazon.com/images/I/51OeaMFxcML.jpg"
}

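If the mapping is not set manually, ES maps each string field to the following type by default (a string that matches a date pattern may be detected as a date instead):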

{
  "type" : "text",   # text type
  "fields" : {       # plus a keyword sub-field
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
}

5.1 Setting the mapping manually

Based on the requirements, set the mapping manually:

PUT books
{
	"mappings": {
		"properties": {
			"author": {
				"type": "keyword"
			},
			"cover_url": {
				"type": "keyword",
				"index": false      # does not need to be searchable
			},
			"description": {
				"type": "text"
			},
			"public_date": {
				"type": "date"
			},
			"title": {
				"type": "text",
				"fields": {
					"keyword": {
						"type": "keyword",
						"ignore_above": 100
					}
				}
			}
		}
	}
}
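To illustrate how this mapping serves the requirements, here is a sketch of three search bodies that could be sent to books/_search. The variable names are made up for this example, and the search terms are taken from the sample document above.

```python
import json

# Full-text search on the analyzed title field.
full_text_query = {"query": {"match": {"title": "elasticsearch"}}}

# Exact match on the whole title, using the keyword sub-field.
exact_title_query = {
    "query": {"term": {"title.keyword": "Mastering ElasticSearch 5.0"}}
}

# Exact match on author, which is mapped directly as keyword.
author_query = {"query": {"term": {"author": "Bharvi Dixit"}}}

for body in (full_text_query, exact_title_query, author_query):
    print(json.dumps(body))
```

A query on cover_url is not among them: with index set to false, that field is not meant to be searched.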

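5.2 An additional requirement

Suppose we now need to add a content field that stores the book's full text. This field will be very large, which makes _source very large and in turn causes heavy network overhead.

To optimize this, set enabled to false for _source, and then set store to true on each field (turning on extra storage).

As follows (if the books index from 5.1 already exists, delete it first, since PUT books creates a new index):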

PUT books
{
	"mappings": {
		"_source": {
			"enabled": false    # _source is disabled
		},
		"properties": {
			"author": {
				"type": "keyword",
				"store": true   # store is true
			},
			"cover_url": {
				"type": "keyword",
				"index": false,
				"store": true  # store is true
			},
			"description": {
				"type": "text",
				"store": true  # store is true
			},
			"content": {
				"type": "text",
				"store": true  # store is true
			},
			"public_date": {
				"type": "date",
				"store": true # store is true
			},
			"title": {
				"type": "text",
				"fields": {
					"keyword": {
						"type": "keyword",
						"ignore_above": 100
					}
				},
				"store": true # store is true
			}
		}
	}
}

Once _source is disabled, search results no longer contain the _source field; to get the content of specific fields, list them in stored_fields, as follows:

POST books/_search
{
  "stored_fields": ["title","author","public_date"],
  "query": {
    "match": {
      "content": "searching"
    }
  }
}
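Stored fields come back under a fields key in each hit, and every value is wrapped in an array. The helper below is a minimal sketch of unwrapping them on the client side; extract_stored_fields is a hypothetical name, and sample_response is a hand-written, trimmed response used only to exercise the function, not real cluster output.

```python
from typing import Any, Dict, List


def extract_stored_fields(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the per-hit `fields` section of a search response."""
    rows = []
    for hit in response["hits"]["hits"]:
        # A hit has no "fields" key when none of the requested fields is stored.
        fields = hit.get("fields", {})
        # Stored values are returned as arrays; unwrap single-element ones.
        rows.append(
            {
                name: values[0] if len(values) == 1 else values
                for name, values in fields.items()
            }
        )
    return rows


# Hand-written, trimmed sample response (an assumption for this sketch).
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "books",
                "_id": "1",
                "fields": {
                    "title": ["Mastering ElasticSearch 5.0"],
                    "author": ["Bharvi Dixit"],
                },
            }
        ]
    }
}

print(extract_stored_fields(sample_response))
```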

(End of this section.)

