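This article uses Elasticsearch 6.2.4 as the example version.
The earlier getting-started posts covered the basic operations of ES. Now we turn to its most powerful feature: full-text search.
First we need some sample data to import: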
wget https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/account/_bulk" --data-binary "@accounts.json"
This imports 1,000 documents into ES.
Note: every line of accounts.json must end with \n. If you get the error "The bulk request must be terminated by a newline [\n]", check that the last line of the file also ends with \n.
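As a rough illustration of the newline rule, the following Python sketch builds a bulk payload in the action-line/source-line layout used by the bulk API and checks the trailing newline before anything is sent. The helper names build_bulk_body and is_valid_bulk_payload and the two sample documents are made up for this example.

```python
import json


def build_bulk_body(docs):
    """Serialize (doc_id, source) pairs into newline-delimited bulk format."""
    lines = []
    for doc_id, source in docs:
        # Action line, followed by the document itself on the next line.
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        lines.append(json.dumps(source))
    # Every line, including the last one, has to end with "\n".
    return "\n".join(lines) + "\n"


def is_valid_bulk_payload(payload):
    """Return True if the payload is non-empty and ends with a newline."""
    return bool(payload) and payload.endswith("\n")


# Hypothetical sample documents, not taken from accounts.json.
sample = [(1, {"firstname": "Ann", "age": 30}), (2, {"firstname": "Bob", "age": 41})]
body = build_bulk_body(sample)
print(is_valid_bulk_payload(body))           # True
print(is_valid_bulk_payload(body.rstrip()))  # False: last line lacks "\n"
```

Running the same check on a file before calling _bulk catches the "must be terminated by a newline" error locally.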
The index is named bank. We can list the indices that now exist:
curl "localhost:9200/_cat/indices?format=json&pretty"
Result:
[ { "health" : "yellow", "status" : "open", "index" : "bank", "uuid" : "MDxR02uESgKSynX6k8B-og", "pri" : "5", "rep" : "1", "docs.count" : "1000", "docs.deleted" : "0", "store.size" : "474.6kb", "pri.store.size" : "474.6kb" } ]
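This subsection is optional; skip it if you are not interested.
It assumes you have already set up Elasticsearch + Kibana.
Open the Kibana web UI at http://127.0.0.1:5601, go to Management -> Kibana -> Index Patterns, and choose Create Index Pattern:
a. For Index pattern, enter bank;
b. Click Create.
Then open Discover and select bank to see the data we just imported.
We can now search the data in the visual interface.
Pretty cool, isn't it?
Next we will run searches through the API.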
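A URI search runs a search request purely through the URI, by supplying request parameters.
GET /bank/_search?q=Virginia&pretty
GET /bank/_search?q=firstname:Virginia
curl:
curl -XGET "localhost:9200/bank/_search?q=Virginia&pretty"
curl -XGET "localhost:9200/bank/_search?q=firstname:Virginia&pretty"
Explanation: the first request searches for the keyword "Virginia" across all fields; the second restricts the match to the firstname field. Sample result for the first request: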
{ "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": 4.631368, "hits": [ { "_index": "bank", "_type": "account", "_id": "298", "_score": 4.631368, "_source": { "account_number": 298, "balance": 34334, "firstname": "Bullock", "lastname": "Marsh", "age": 20, "gender": "M", "address": "589 Virginia Place", "employer": "Renovize", "email": "bullockmarsh@renovize.com", "city": "Coinjock", "state": "UT" } }, { "_index": "bank", "_type": "account", "_id": "25", "_score": 4.6146765, "_source": { "account_number": 25, "balance": 40540, "firstname": "Virginia", "lastname": "Ayala", "age": 39, "gender": "F", "address": "171 Putnam Avenue", "employer": "Filodyne", "email": "virginiaayala@filodyne.com", "city": "Nicholson", "state": "PA" } } ] } }
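Meaning of the returned fields:
took: time in milliseconds that Elasticsearch took to run the search.
timed_out: whether the search timed out.
_shards: how many shards were searched, with counts of successful and failed shards.
hits.total: total number of documents matching the search.
hits.max_score: the highest score among the matching documents.
hits.hits: the array of matching documents (the first 10 by default); each entry carries its _index, _type, _id, _score, and _source.
Parameters:
q: the query string.
sort: the sorting to apply, in the form fieldName, fieldName:asc, or fieldName:desc. fieldName can be an actual field in the document or the special name _score, which sorts by score. Several sort parameters may be given (order matters). See: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/search-uri-request.html
Example: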
GET /bank/_search?q=*&sort=account_number:asc&pretty
Explanation: all results are sorted by the account_number field in ascending order. Only the first 10 are returned by default.
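To make the q and sort parameters concrete, here is a small Python sketch that assembles such a URI. The function name uri_search_url and the default host are assumptions for illustration; only urllib.parse.urlencode from the standard library is used, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def uri_search_url(index, q, sort=(), host="localhost:9200"):
    """Build a URI search URL with a q parameter and zero or more sort parameters."""
    params = [("q", q)]
    # Repeating the sort parameter keeps the order in which the keys were given.
    params.extend(("sort", s) for s in sort)
    # Leave ":" and "*" unescaped so the URL reads like the hand-written examples.
    return "http://{}/{}/_search?{}".format(host, index, urlencode(params, safe=":*"))


print(uri_search_url("bank", "*", sort=["account_number:asc"]))
print(uri_search_url("bank", "firstname:Virginia"))
```

Because sort may be repeated, passing ["account_number:asc", "_score:desc"] yields two sort parameters in that order.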
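The following request-body queries are meant to express the same searches as the URI requests q=Virginia and q=*&sort=account_number:asc. One caveat: the _all field is deprecated in 6.x and cannot be enabled on indices created in 6.0 or later, so on such an index the multi_match against _all may return no hits; the q parameter of a URI search is executed as a query_string query.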
GET /bank/_search { "query": { "multi_match" : { "query" : "Virginia", "fields" : ["_all"] } } }
GET /bank/_search { "query": { "match_all": {} }, "sort": [ { "account_number": "asc" } ] }
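We usually send queries as JSON. Elasticsearch provides a JSON-style domain-specific language for running queries, known as the Query DSL.
Note: the queries above specify only the index, not the type, so ES searches across all types in the index. To restrict the search to a type, append it to the URI, for example: GET /bank/account/_search.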
GET /bank/_search { "query" : { "match" : { "address" : "Avenue" } } }
curl:
curl -XGET -H "Content-Type: application/json" "localhost:9200/bank/_search?pretty" -d '{"query":{"match":{"address":"Avenue"}}}'
The match query above returns the documents whose address field contains the word Avenue.
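For readers who prefer a script to curl, the sketch below prepares the same request with Python's standard urllib. It only constructs the request object; the helper name build_search_request and the default host are assumptions of this example.

```python
import json
import urllib.request


def build_search_request(index, query, host="localhost:9200"):
    """Create (but do not send) a request-body search, mirroring the curl call."""
    url = "http://{}/{}/_search?pretty".format(host, index)
    data = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET",  # same verb as curl -XGET
    )


req = build_search_request("bank", {"match": {"address": "Avenue"}})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Against a live node, the request could be sent with urllib.request.urlopen(req).
```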
GET /bank/_search { "query" : { "term" : { "address" : "Avenue" } } }
curl:
curl -XGET -H "Content-Type: application/json" "localhost:9200/bank/_search?pretty" -d '{"query":{"term":{"address":"Avenue"}}}'
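The term query above looks for documents whose address field contains the exact term Avenue. The search term is not analyzed, so it must equal an indexed term character for character; on an analyzed text field, whose indexed terms are lowercased, a capitalized value such as Avenue finds nothing.
Note: if a field needs both analyzed full-text search and exact matching, it is best to define the mapping correctly from the start. For example, add a .keyword sub-field to support exact matching: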
{ "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
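This effectively gives you two fields, address and address.keyword. Mappings are covered in a later chapter.
Pagination uses the from and size keywords, which set the offset and the page size respectively.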
GET /bank/_search { "query": { "match_all": {} }, "from": 0, "size": 2 }
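from defaults to 0 and size defaults to 10.
Note: from/size pagination in ES is not true pagination; it is known as shallow paging. from + size cannot exceed the index.max_result_window index setting, which defaults to 10,000. For more efficient ways to scroll deep into a result set, see the Scroll or Search After API.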
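The arithmetic behind from and size can be sketched in a few lines of Python. The helper page_to_from_size is hypothetical; it converts a 1-based page number into the from/size pair and refuses pages that would cross the default index.max_result_window of 10,000.

```python
def page_to_from_size(page, page_size, max_result_window=10000):
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    if offset + page_size > max_result_window:
        raise ValueError("from + size exceeds index.max_result_window; use Scroll or Search After")
    return {"from": offset, "size": page_size}


body = {"query": {"match_all": {}}}
body.update(page_to_from_size(page=3, page_size=10))
print(body)  # {'query': {'match_all': {}}, 'from': 20, 'size': 10}
```

With a page size of 10, page 1000 is the last page this check allows (from 9990 + size 10 = 10,000).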
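The keyword for sorting by field is sort. It supports ascending (asc) and descending (desc) order. By default, results are sorted by _score in descending order.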
GET /bank/_search { "query": { "match_all": {} }, "sort": [ { "account_number": "asc" } ], "from":0, "size":10 }
Sorting on multiple fields:
GET /bank/_search { "query": { "match_all": {} }, "sort": [ { "account_number": "asc" }, { "_score": "asc" } ], "from":0, "size":10 }
Results are sorted by account_number first and then by _score.
Sorting by a custom script is also supported. Here is an example:
GET bank/account/_search { "query": { "range": { "age": {"gt": 20} }}, "sort" : { "_script" : { "type" : "number", "script" : { "lang": "painless", "source": "doc['account_number'].value * params.factor", "params" : { "factor" : 1.1 } }, "order" : "asc" } } }
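The query above sorts by script: results are ordered ascending by the value of account_number * 1.1. lang names the scripting language, here painless. Painless also supports functions such as Math.log.
The example only demonstrates the syntax; it has no practical meaning.
By default, ES returns all fields of each document. This is called the source (the _source field in the search hits). If we do not want the whole source, we can request only a few of its fields.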
GET /bank/_search { "query": { "match_all": {} }, "_source": ["account_number", "balance"] }
The _source keyword filters which fields are returned.
Scripts can also return newly defined fields computed on the fly. Example:
GET bank/account/_search { "query" : { "match_all": {} }, "size":2, "script_fields" : { "age2" : { "script" : { "lang": "painless", "source": "doc['age'].value * 2" } }, "age3" : { "script" : { "lang": "painless", "source": "params['_source']['age'] * params.factor", "params" : { "factor" : 2.0 } } } } }
Result:
{ "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 1000, "max_score": 1, "hits": [ { "_index": "bank", "_type": "account", "_id": "25", "_score": 1, "fields": { "age3": [ 78 ], "age2": [ 78 ] } }, { "_index": "bank", "_type": "account", "_id": "44", "_score": 1, "fields": { "age3": [ 74 ], "age2": [ 74 ] } } ] } }
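Note: doc['my_field_name'].value is faster and more efficient than params['_source']['my_field_name'], and is the recommended form.
How do we find documents that satisfy condition A and condition B at the same time? Combine the conditions with the must keyword.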
GET /bank/_search { "query": { "bool": { "must": [ { "match": { "address": "mill" } }, { "match": { "address": "lane" } } ] } } }
GET /bank/_search { "query": { "bool": { "must": [ { "match": { "account_number":136 } }, { "match": { "address": "lane" } }, { "match": { "city": "Urie" } } ] } } }
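must can also be written with the key repeated:
GET /bank/_search { "query": { "bool": { "must": [ { "match": { "address": "mill" } } ], "must": [ { "match": { "address": "lane" } } ] } } }
This is like querying A first and then B, whereas the form above matches A and B together. The results were the same, though the execution efficiency may differ; if you know the reason, please let me know. Be aware that this form repeats a key inside one JSON object, which is not portable: parsers differ in how they treat duplicates, and Elasticsearch 6.0 and later are documented to reject request bodies with duplicate keys. Listing every clause in a single must array is the safer form.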
ES uses the should keyword to implement OR queries.
GET /bank/_search { "query": { "bool": { "should": [ { "match": { "account_number":136 } }, { "match": { "address": "lane" } }, { "match": { "city": "Urie" } } ] } } }
The must_not keyword matches documents that contain neither A nor B.
GET /bank/_search { "query": { "bool": { "must_not": [ { "match": { "address": "mill" } }, { "match": { "address": "lane" } } ] } } }
This means the address field must contain neither mill nor lane.
We can combine must, should, and must_not to build complex queries.
GET /bank/_search { "query": { "bool": { "must": [ { "match": { "age": 40 } } ], "must_not": [ { "match": { "state": "ID" } } ] } } }
Equivalent SQL:
select * from bank where age=40 and state!= "ID";
GET /bank/_search { "query":{ "bool":{ "must":[ {"match":{"age":39}}, {"bool":{"should":[ {"match":{"city":"Nicholson"}}, {"match":{"city":"Yardville"}} ]} } ] } } }
Equivalent SQL:
select * from bank where age=39 and (city="Nicholson" or city="Yardville");
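When such queries are generated by a program, small builder functions keep the nesting readable. The following Python sketch uses two hypothetical helpers, match and bool_query, and reproduces the body of the nested query "age = 39 AND (city = Nicholson OR city = Yardville)".

```python
import json


def match(field, value):
    """Return a single match clause."""
    return {"match": {field: value}}


def bool_query(must=(), should=(), must_not=()):
    """Assemble a bool query, leaving out clause types that were not supplied."""
    clauses = {}
    for name, items in (("must", must), ("should", should), ("must_not", must_not)):
        if items:
            clauses[name] = list(items)
    return {"bool": clauses}


# age = 39 AND (city = "Nicholson" OR city = "Yardville")
query = bool_query(
    must=[
        match("age", 39),
        bool_query(should=[match("city", "Nicholson"), match("city", "Yardville")]),
    ]
)
print(json.dumps({"query": query}, indent=2))
```

Nesting a bool inside must is what expresses the parenthesized OR group of the SQL form.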
GET /bank/_search { "query": { "bool": { "must": { "match_all": {} }, "filter": { "range": { "balance": { "gte": 20000, "lte": 30000 } } } } } }
For a range query on a single field, the must and filter wrappers can be omitted:
GET /bank/_search { "query":{ "range":{ "balance":{ "gte":20000, "lte":30000 } } } }
Equivalent SQL:
select * from bank where balance between 20000 and 30000;
Range query on multiple fields:
GET /bank/_search { "query": { "bool": { "must": { "match_all": {} }, "filter": { "bool":{ "must":[ {"range": {"balance": {"gte": 20000,"lte": 30000}}}, {"range": {"age": {"gte": 30}}} ] } } } } }
ES can highlight the matched keywords in the returned results by wrapping them in HTML tags.
GET bank/account/_search { "query" : { "match": { "address": "Avenue" } }, "from": 0, "size": 1, "highlight" : { "require_field_match": false, "fields": { "*" : { } } } }
Output:
{ "took": 10, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 214, "max_score": 1.5814995, "hits": [ { "_index": "bank", "_type": "account", "_id": "102", "_score": 1.5814995, "_source": { "account_number": 102, "balance": 29712, "firstname": "Dena", "lastname": "Olson", "age": 27, "gender": "F", "address": "759 Newkirk Avenue", "employer": "Hinway", "email": "denaolson@hinway.com", "city": "Choctaw", "state": "NJ" }, "highlight": { "address": [ "759 Newkirk <em>Avenue</em>" ] } } ] } }
The highlight section of the response holds the highlighted fragments, wrapped in <em> by default. To change the tags, set pre_tags and post_tags:
"fields": { "*" : { "pre_tags" : ["<strong>"], "post_tags" : ["</strong>"] } }
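* means every field is eligible for highlighting; to highlight only a specific field, replace * with that field's name.
require_field_match: by default (true), only fields that contain a query match are highlighted. Set require_field_match to false to highlight all fields. See: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/search-request-highlighting.html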
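The effect of pre_tags and post_tags can be imitated locally. The Python sketch below is a deliberate simplification and not how Elasticsearch highlights internally: the hypothetical function highlight wraps whole-word, case-insensitive occurrences of one term in the chosen tags.

```python
import re


def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap whole-word, case-insensitive occurrences of term in the given tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


print(highlight("759 Newkirk Avenue", "avenue"))
# 759 Newkirk <em>Avenue</em>
print(highlight("759 Newkirk Avenue", "avenue", "<strong>", "</strong>"))
# 759 Newkirk <strong>Avenue</strong>
```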
GET /bank/_search { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state.keyword" } } } }
Result:
{ "took": 29, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped" : 0, "failed": 0 }, "hits" : { "total" : 1000, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "group_by_state" : { "doc_count_error_upper_bound": 20, "sum_other_doc_count": 770, "buckets" : [ { "key" : "ID", "doc_count" : 27 }, { "key" : "TX", "doc_count" : 27 }, { "key" : "AL", "doc_count" : 25 }, { "key" : "MD", "doc_count" : 25 }, { "key" : "TN", "doc_count" : 23 }, { "key" : "MA", "doc_count" : 21 }, { "key" : "NC", "doc_count" : 21 }, { "key" : "ND", "doc_count" : 21 }, { "key" : "ME", "doc_count" : 20 }, { "key" : "MO", "doc_count" : 20 } ] } } }
The result shows that the state ID (Idaho) has 27 accounts and TX (Texas) has 27 accounts.
Equivalent SQL:
SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
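This query groups by the state field and returns the top 10 buckets, which is the default size of a terms aggregation.
Setting size to 0 means no document hits are returned, only the aggregation results. state.keyword refers to the exact-value (keyword) sub-field: a terms aggregation on the analyzed text field state would require fielddata, which is disabled by default because it is expensive.
We can run further aggregations on top of an aggregation, such as sums or averages.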
GET /bank/_search { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state.keyword" }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } }
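The query above builds on the previous aggregation and computes the average account balance per state (again only for the top 10 states by document count, in descending order).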
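What the terms aggregation with an avg sub-aggregation computes can be reproduced on a handful of in-memory documents. The Python sketch below is an emulation for illustration only: the function terms_with_avg and the sample documents are invented, and breaking ties by key in ascending order is an assumption of the sketch.

```python
from collections import defaultdict


def terms_with_avg(docs, field, metric_field, size=10):
    """Group docs by field, count each group, and average metric_field per group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(values), "average_balance": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    # Largest groups first; ties are broken by key (an assumption of this sketch).
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets[:size]


# Hypothetical documents, not taken from accounts.json.
docs = [
    {"state": "ID", "balance": 100},
    {"state": "TX", "balance": 50},
    {"state": "ID", "balance": 300},
]
for bucket in terms_with_avg(docs, "state", "balance"):
    print(bucket)
```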
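Aggregations can be nested arbitrarily to extract whatever statistics you need from the data.
Building on the previous aggregation, we now sort by average balance in descending order: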
GET /bank/_search { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state.keyword", "order": { "average_balance": "desc" } }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } }
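Here the buckets are ordered by the result of the second aggregation, descending. The previous example relied on the implicit default order, which is by _count (the document count of each bucket) descending:
GET /bank/_search { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state.keyword", "order": { "_count": "desc" } }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } }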
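This example groups by age bracket (20-29, 30-39, and 40-49), then by gender, and finally computes the average account balance for each gender within each age bracket: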
GET /bank/_search { "size": 0, "aggs": { "group_by_age": { "range": { "field": "age", "ranges": [ { "from": 20, "to": 30 }, { "from": 30, "to": 40 }, { "from": 40, "to": 50 } ] }, "aggs": { "group_by_gender": { "terms": { "field": "gender.keyword" }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } } } }
The result is more complex: the grouping is nested, so the result is nested too:
{ "took": 5, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 1000, "max_score": 0, "hits": [] }, "aggregations": { "group_by_age": { "buckets": [ { "key": "20.0-30.0", "from": 20, "to": 30, "doc_count": 451, "group_by_gender": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "M", "doc_count": 232, "average_balance": { "value": 27374.05172413793 } }, { "key": "F", "doc_count": 219, "average_balance": { "value": 25341.260273972603 } } ] } }, { "key": "30.0-40.0", "from": 30, "to": 40, "doc_count": 504, "group_by_gender": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "F", "doc_count": 253, "average_balance": { "value": 25670.869565217392 } }, { "key": "M", "doc_count": 251, "average_balance": { "value": 24288.239043824702 } } ] } }, { "key": "40.0-50.0", "from": 40, "to": 50, "doc_count": 45, "group_by_gender": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "M", "doc_count": 24, "average_balance": { "value": 26474.958333333332 } }, { "key": "F", "doc_count": 21, "average_balance": { "value": 27992.571428571428 } } ] } } ] } } }
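One detail worth spelling out is how the range boundaries behave: in a range aggregation each bucket includes its from value and excludes its to value, which is why from 20 / to 30 covers ages 20-29. A minimal Python emulation, with a hypothetical function range_buckets and invented ages:

```python
def range_buckets(values, ranges):
    """Count values per [from, to) range: from is inclusive, to is exclusive."""
    buckets = []
    for low, high in ranges:
        count = sum(1 for value in values if low <= value < high)
        buckets.append({
            "key": "{}-{}".format(float(low), float(high)),
            "from": low,
            "to": high,
            "doc_count": count,
        })
    return buckets


# Hypothetical ages; 30 lands in the second bucket and 50 in none.
ages = [20, 29, 30, 39, 40, 49, 50]
for bucket in range_buckets(ages, [(20, 30), (30, 40), (40, 50)]):
    print(bucket)
```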
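First, consider how the following three queries differ.
Given: ES holds 1 document whose address is 171 Putnam Avenue and 0 documents whose address is exactly Putnam. The index is bank, the type is account, and the document ID is 25.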
GET /bank/_search { "query": { "match" : { "address" : "Putnam" } } }
GET /bank/_search { "query": { "match" : { "address.keyword" : "Putnam" } } }
GET /bank/_search { "query": { "term" : { "address" : "Putnam" } } }
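Results:
1. The first query matches the document, because the match query analyzes the search text.
2. The second query matches nothing, because address.keyword is not analyzed and no document has exactly that value.
3. The third is uncertain; it depends on how the field was actually tokenized.
The following request shows how the address field of that document was tokenized: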
GET /bank/account/25/_termvectors?fields=address
Result:
{ "_index": "bank", "_type": "account", "_id": "25", "_version": 1, "found": true, "took": 0, "term_vectors": { "address": { "field_statistics": { "sum_doc_freq": 591, "doc_count": 197, "sum_ttf": 591 }, "terms": { "171": { "term_freq": 1, "tokens": [ { "position": 0, "start_offset": 0, "end_offset": 3 } ] }, "avenue": { "term_freq": 1, "tokens": [ { "position": 2, "start_offset": 11, "end_offset": 17 } ] }, "putnam": { "term_freq": 1, "tokens": [ { "position": 1, "start_offset": 4, "end_offset": 10 } ] } } } } }
We can see that the address field of this document was split into 3 terms:
171 avenue putnam
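Now we can answer the third query: it matches nothing! Change the value to lowercase putnam, however, and it matches again.
The reason:
The term query does not analyze its search text, and Putnam is not among the indexed terms (terms are case-sensitive), so nothing matches. The match query first analyzes the search text, which turns Putnam into putnam, and then looks that term up in the inverted index, so it matches.
The standard analyzer lowercases all uppercase letters by default.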
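The difference can be replayed with a toy inverted index in Python. This is a sketch under loud assumptions: analyze is only a crude stand-in for the standard analyzer (split on non-word characters, then lowercase), and all function names are invented for the example.

```python
import re
from collections import defaultdict


def analyze(text):
    """Crude stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(docs):
    """Map each analyzed term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def term_query(index, value):
    # No analysis: the value is looked up exactly as given.
    return index.get(value, set())


def match_query(index, text):
    # The search text is analyzed first; any matching term counts (OR semantics).
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits


index = build_inverted_index({25: "171 Putnam Avenue"})
print(sorted(index))                 # ['171', 'avenue', 'putnam']
print(term_query(index, "Putnam"))   # set()
print(term_query(index, "putnam"))   # {25}
print(match_query(index, "Putnam"))  # {25}
```

The indexed terms match the three terms reported by _termvectors, and only the unanalyzed lookup of Putnam comes back empty.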