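(1) Count the number of documents under each tag. Request syntax:

GET book_shop/it_book/_search
{
  "size": 0,  // do not return the matching documents (hits)
  "aggs": {
    "group_by_tags": {  // name of the aggregation result; choose your own (remove this comment before copying)
      "terms": { "field": "tags" }
    }
  }
}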
(2) The request fails:
Explanation: the mapping of the book_shop index was created automatically by ES, which mapped tags as a text field. Running an aggregation on tags therefore throws the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [tags] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [......]
  },
  "status": 400
}
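(3) Error analysis:
Error message:
Set fielddata=true on [xxxx] ......
Analysis: by default, Elasticsearch disables fielddata on text fields.
A text field is analyzed (tokenized) at index time, and what gets stored is an inverted index that maps each term to the documents containing it. An aggregation needs the opposite view: for each document, the values of the field.
So to aggregate on a text field, that per-document view has to be made available. Setting fielddata=true in the mapping tells ES to build it in memory by uninverting the inverted index (in effect, producing a forward index).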
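The idea of "uninverting" can be illustrated outside Elasticsearch. The following is a minimal sketch in plain Python, not ES internals: the sample documents and the function names (`build_inverted_index`, `uninvert`, `terms_aggregation`) are made up for illustration. It builds a term → documents index, turns it back into a document → terms structure, and counts documents per term the way a terms aggregation would.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: document id -> tokens produced by analysis.
analyzed_docs = {
    1: ["java", "jvm"],
    2: ["java", "spring"],
    3: ["jvm"],
}


def build_inverted_index(docs):
    """term -> sorted doc ids: the structure a text field offers for search."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def uninvert(inverted):
    """doc id -> sorted terms: the per-document view an aggregation needs."""
    forward = defaultdict(set)
    for term, doc_ids in inverted.items():
        for doc_id in doc_ids:
            forward[doc_id].add(term)
    return {doc_id: sorted(terms) for doc_id, terms in forward.items()}


def terms_aggregation(forward):
    """Count how many documents contain each term."""
    counts = Counter()
    for terms in forward.values():
        counts.update(terms)
    return dict(counts)


inverted = build_inverted_index(analyzed_docs)
forward = uninvert(inverted)
buckets = terms_aggregation(forward)
```

Note that the forward structure holds analyzed tokens, not the original strings, and it lives entirely in memory, which is why the error message warns about memory use.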
(4) Solution 1: enable fielddata on the text field:
Set fielddata to true on the text field you want to group by (here, tags):

PUT book_shop/_mapping/it_book
{
  "properties": {
    "tags": {
      "type": "text",
      "fielddata": true
    }
  }
}

See the official documentation for details:
https://www.elastic.co/guide/en/elasticsearch/reference/6.6/fielddata.html. A successful update returns:

{ "acknowledged": true }
Running the aggregation again now returns:

{
  "took": 153,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 4, "max_score": 0.0, "hits": [] },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 6,
      "buckets": [
        { "key": "java", "doc_count": 3 },
        { "key": "程", "doc_count": 2 },
        ......
      ]
    }
  }
}
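(5) Solution 2: use the built-in keyword sub-field:
Enabling fielddata can consume a large amount of memory.
Since Elasticsearch 5.x, a string field that is mapped dynamically gets a keyword sub-field alongside the text field. It holds the unanalyzed value and supports exact-match queries and aggregations:

GET book_shop/it_book/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": {
        "field": "tags.keyword"  // use the keyword sub-field of the text field
      }
    }
  }
}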
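Whether `tags.keyword` exists depends on the mapping, so it can help to check before aggregating. The sketch below is an illustration only: `dynamic_tags_mapping` reproduces the shape ES 5.x and later generate for a dynamically mapped string field, and `aggregatable_field` is a hypothetical helper, not part of any Elasticsearch client.

```python
# Shape of the mapping ES 5.x+ generates for a dynamically mapped string field.
dynamic_tags_mapping = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}


def aggregatable_field(name, field_mapping):
    """Return the field path usable in a terms aggregation, or None.

    Hypothetical helper: prefers a keyword field or sub-field, and falls back
    to the text field itself only when fielddata has been enabled on it.
    """
    if field_mapping.get("type") == "keyword":
        return name
    for sub_name, sub_mapping in field_mapping.get("fields", {}).items():
        if sub_mapping.get("type") == "keyword":
            return f"{name}.{sub_name}"
    if field_mapping.get("type") == "text" and field_mapping.get("fielddata"):
        return name
    return None


field = aggregatable_field("tags", dynamic_tags_mapping)
```

The `ignore_above: 256` setting in the generated mapping means values longer than 256 characters are not indexed in the keyword sub-field, so they would not show up in its buckets.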
(1) For books whose name contains "jvm", count the number of documents under each tag. Request syntax:

GET book_shop/it_book/_search
{
  "query": {
    "match": { "name": "jvm" }
  },
  "aggs": {
    "group_by_tags": {  // name of the aggregation result; choose your own. The built-in keyword sub-field is used below:
      "terms": { "field": "tags.keyword" }
    }
  }
}

(2) Response:

{
  "took": 7,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.64072424,
    "hits": [
      {
        "_index": "book_shop",
        "_type": "it_book",
        "_id": "2",
        "_score": 0.64072424,
        "_source": {
          "name": "深刻理解Java虛擬機:JVM高級特性與最佳實踐",
          "author": "周志明",
          "category": "編程語言",
          "desc": "Java圖書領域公認的經典著做",
          "price": 79.0,
          "date": "2013-10-01",
          "publisher": "機械工業出版社",
          "tags": [ "Java", "虛擬機", "最佳實踐" ]
        }
      }
    ]
  },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "Java", "doc_count": 1 },
        { "key": "最佳實踐", "doc_count": 1 },
        { "key": "虛擬機", "doc_count": 1 }
      ]
    }
  }
}
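Once fielddata is enabled on a text field, an aggregation on that field buckets by every analyzed token of the field, and in most cases the result is not what we want.
The keyword sub-field is not analyzed, so each bucket corresponds to a complete original value, which is usually more useful.
Recommendation: aggregate on the keyword sub-field of a text field.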
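The difference between the two bucketing behaviours can be reproduced with a small Python sketch. Everything here is an assumption made for illustration: the two sample documents are invented, and `naive_analyze` (lowercase, split on whitespace) is only a stand-in for a real ES analyzer.

```python
from collections import Counter

# Hypothetical documents; each has a multi-valued tags field.
docs = [
    {"tags": ["Java", "Best Practices"]},
    {"tags": ["Java", "Virtual Machine"]},
]


def naive_analyze(value):
    """Stand-in for an analyzer: lowercase and split on whitespace."""
    return value.lower().split()


def bucket_by_tokens(docs):
    """Roughly what a terms aggregation sees on a text field with fielddata."""
    counts = Counter()
    for doc in docs:
        tokens = set()
        for tag in doc["tags"]:
            tokens.update(naive_analyze(tag))
        counts.update(tokens)  # a document counts once per distinct token
    return dict(counts)


def bucket_by_keyword(docs):
    """Roughly what a terms aggregation sees on the keyword sub-field."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc["tags"]))  # whole, unanalyzed values
    return dict(counts)


token_buckets = bucket_by_tokens(docs)
keyword_buckets = bucket_by_keyword(docs)
```

With token bucketing, "Best Practices" falls apart into separate `best` and `practices` buckets, while keyword bucketing keeps it as one tag.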
(1) Group by tags first, then compute the average book price under each tag. Request syntax:

GET book_shop/it_book/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags.keyword" },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
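To make the semantics of a terms aggregation with a nested avg concrete, here is a rough equivalent in plain Python. It is a sketch under stated assumptions: only the first record (price 79.0 and its tags) is taken from the sample document shown in this article, the other two records are invented, and `avg_price_by_tag` is a hypothetical name.

```python
from collections import defaultdict

books = [
    {"price": 79.0, "tags": ["Java", "虛擬機", "最佳實踐"]},  # from the sample document
    {"price": 109.0, "tags": ["Java", "編程語言"]},  # invented for illustration
    {"price": 119.0, "tags": ["Java", "編程語言"]},  # invented for illustration
]


def avg_price_by_tag(books):
    """Bucket books by tag, then average the price inside each bucket."""
    prices = defaultdict(list)
    for book in books:
        for tag in set(book["tags"]):  # a book counts once per distinct tag
            prices[tag].append(book["price"])
    return {
        tag: {"doc_count": len(values), "avg_price": sum(values) / len(values)}
        for tag, values in prices.items()
    }


by_tag = avg_price_by_tag(books)
```

A book with several tags contributes its price to every one of its tag buckets, which is also how the multi-valued tags field behaves in the ES aggregation.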
(2) Response (excerpt):

"hits": { "total": 3, "max_score": 0.0, "hits": [] },
"aggregations": {
  "group_by_tags": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "Java",
        "doc_count": 3,
        "avg_price": { "value": 102.33333333333333 }
      },
      {
        "key": "編程語言",
        "doc_count": 2,
        "avg_price": { "value": 114.0 }
      },
      ......
    ]
  }
}
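(1) Compute the average book price under each tag, then sort the tags by average price in descending order. Request syntax:

GET book_shop/it_book/_search
{
  "size": 0,
  "aggs": {
    "all_tags": {
      "terms": {
        "field": "tags.keyword",
        "order": { "avg_price": "desc" }  // sort buckets by the avg_price sub-aggregation defined below
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}

(2) Response:
Similar to the result in section 2.1; the only difference is that the buckets are returned in descending order of average price.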
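Ordering buckets by a sub-aggregation amounts to sorting them on the metric value. A minimal Python sketch, assuming buckets shaped like the ES response: the first two entries reuse values from the response excerpt in this article, the third is invented, and the tie-break on key is a choice made for this sketch only.

```python
buckets = [
    {"key": "Java", "doc_count": 3, "avg_price": {"value": 102.33333333333333}},
    {"key": "編程語言", "doc_count": 2, "avg_price": {"value": 114.0}},
    {"key": "虛擬機", "doc_count": 1, "avg_price": {"value": 79.0}},  # invented
]


def order_by_avg_price_desc(buckets):
    """Sort buckets by avg_price descending; ties fall back to key ascending."""
    return sorted(buckets, key=lambda b: (-b["avg_price"]["value"], b["key"]))


ordered_keys = [b["key"] for b in order_by_avg_price_desc(buckets)]
```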
(1) Group by price range first, then by tags within each range, and compute the average price of each tag group. Request syntax:

GET book_shop/it_book/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 0, "to": 100 },
          { "from": 100, "to": 150 }
        ]
      },
      "aggs": {
        "group_by_tags": {
          "terms": { "field": "tags.keyword" },
          "aggs": {
            "avg_price": {
              "avg": { "field": "price" }
            }
          }
        }
      }
    }
  }
}
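The three nesting levels (range, then terms, then avg) can be mirrored in plain Python to show which documents end up where. This is a sketch with assumptions: only the 79.0 record comes from the sample document in this article, the other records are invented, and `range_then_tags` is a hypothetical name. It follows the ES range aggregation rule that `from` is inclusive and `to` is exclusive.

```python
from collections import defaultdict

books = [
    {"price": 79.0, "tags": ["Java", "虛擬機", "最佳實踐"]},  # from the sample document
    {"price": 109.0, "tags": ["Java", "編程語言"]},  # invented for illustration
    {"price": 119.0, "tags": ["Java", "編程語言"]},  # invented for illustration
]
ranges = [(0.0, 100.0), (100.0, 150.0)]


def range_then_tags(books, ranges):
    """Bucket by price range, then by tag, then average the price per tag."""
    result = {}
    for lo, hi in ranges:
        # "from" is inclusive, "to" is exclusive.
        in_range = [b for b in books if lo <= b["price"] < hi]
        prices_by_tag = defaultdict(list)
        for book in in_range:
            for tag in set(book["tags"]):
                prices_by_tag[tag].append(book["price"])
        result[f"{lo}-{hi}"] = {
            "doc_count": len(in_range),
            "group_by_tags": {
                tag: {"doc_count": len(p), "avg_price": sum(p) / len(p)}
                for tag, p in prices_by_tag.items()
            },
        }
    return result


result = range_then_tags(books, ranges)
```

Because `to` is exclusive, a book priced at exactly 100 would land in the 100-150 bucket, not in 0-100.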
(2) Response (excerpt):

"hits": { "total": 3, "max_score": 0.0, "hits": [] },
"aggregations": {
  "group_by_price": {
    "buckets": [
      {
        "key": "0.0-100.0",  // price range 0.0-100.0
        "from": 0.0,
        "to": 100.0,
        "doc_count": 1,  // 1 document falls into this range
        "group_by_tags": {  // terms aggregation on tags within this range
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "Java",
              "doc_count": 1,
              "avg_price": { "value": 79.0 }
            },
            ......
          ]
        }
      },
      {
        "key": "100.0-150.0",
        "from": 100.0,
        "to": 150.0,
        "doc_count": 2,
        "group_by_tags": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "Java",
              "doc_count": 2,
              "avg_price": { "value": 114.0 }
            },
            ......
          ]
        }
      }
    ]
  }
}
Copyright notice
Source: 博客園 (cnblogs), 馬瘦風's blog (https://www.cnblogs.com/shoufeng)
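Copyright of this article belongs to the blog author. Reposting is welcome, but a link to the original article must be shown in a prominent position on the page; otherwise the author reserves the right to pursue legal liability.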