1. Elasticsearch requests and results
Request structure
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
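- VERB: the HTTP method: GET, POST, PUT, HEAD, or DELETE.
- PROTOCOL: http or https (https works only when an https proxy sits in front of Elasticsearch).
- HOST: the hostname of any node in the Elasticsearch cluster, or localhost for a node on the local machine.
- PORT: the port of the Elasticsearch HTTP service; the default is 9200.
- PATH: the API endpoint (for example, _count returns the number of documents in the cluster). PATH can contain multiple components, such as _cluster/stats or _nodes/stats/jvm.
- QUERY_STRING: optional query-string parameters; for example, ?pretty makes the request return pretty-printed, more readable JSON.
- BODY: a JSON-encoded request body (if the request needs one).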
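As a rough illustration of how these components fit together, the sketch below assembles them into a request object with Python's standard library. The helper name build_request is made up for this example, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode, urlunsplit
from urllib.request import Request


def build_request(verb, protocol, host, port, path, query=None, body=None):
    """Assemble VERB, PROTOCOL, HOST, PORT, PATH, QUERY_STRING and BODY.

    build_request is a hypothetical helper for illustration; it only
    constructs the request and does not contact any server.
    """
    url = urlunsplit((protocol, f"{host}:{port}", "/" + path.lstrip("/"),
                      urlencode(query or {}), ""))
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(url, data=data, method=verb, headers=headers)


# Equivalent of: curl -XGET 'http://localhost:9200/_count?pretty=true'
req = build_request("GET", "http", "localhost", 9200, "_count",
                    query={"pretty": "true"})
print(req.get_method(), req.full_url)
```

Elasticsearch 6.0 and later reject request bodies sent without a Content-Type header, which is why the sketch sets one whenever a body is present; with curl that corresponds to adding -H 'Content-Type: application/json'.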
PUT: creating (indexing) a document
$ curl -XPUT 'http://localhost:9200/megacorp/employee/3?pretty' -d '
{
  "first_name" : "Douglas",
  "last_name" : "Fir",
  "age" : 35,
  "about" : "I like to build cabinets",
  "interests" : [ "forestry" ]
}
'
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "3",
  "_version" : 1,
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "created" : true
}

GET requests (search)
Retrieving a document
$ curl -XGET 'http://localhost:9200/megacorp/employee/1?pretty'
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [ "sports", "music" ]
  }
}

Simple search
We still use the megacorp index and the employee type, but in place of a document ID we end the path with the _search keyword. The hits array in the response contains all three of our documents. By default, a search returns the first 10 results.
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "2", "_score" : 1.0,
        "_source" : {
          "first_name" : "Jane", "last_name" : "Smith", "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      },
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "1", "_score" : 1.0,
        "_source" : {
          "first_name" : "John", "last_name" : "Smith", "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      },
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "3", "_score" : 1.0,
        "_source" : {
          "first_name" : "Douglas", "last_name" : "Fir", "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [ "forestry" ]
        }
      }
    ]
  }
}
Next, let's search for employees whose last name contains "Smith". We will use a lightweight search method on the command line. This method is often called query string search, because we pass the query the same way we pass URL parameters:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?q=last_name:Smith&pretty'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 2,
    "max_score" : 0.30685282,
    "hits" : [
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "2", "_score" : 0.30685282,
        "_source" : {
          "first_name" : "Jane", "last_name" : "Smith", "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      },
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "1", "_score" : 0.30685282,
        "_source" : {
          "first_name" : "John", "last_name" : "Smith", "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      }
    ]
  }
}

Querying with the DSL
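Query string search is convenient for ad hoc searches from the command line, but it has its limitations (see the Simple search section). Elasticsearch provides a rich, flexible query language called the Query DSL, which lets you build more complex and powerful queries.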
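To make the contrast concrete, here is a small Python sketch that builds the same "Smith" search in both styles. It only constructs the URL and the JSON body and sends nothing; the variable names are chosen for this example.

```python
import json
from urllib.parse import urlencode

BASE = "http://localhost:9200/megacorp/employee/_search"

# Query-string style: the whole query is squeezed into the q parameter.
query_string_url = BASE + "?" + urlencode({"q": "last_name:Smith"})

# Query DSL style: the URL stays plain and the query is a JSON document.
dsl_body = {"query": {"match": {"last_name": "Smith"}}}

print(query_string_url)
print(json.dumps(dsl_body))
```

Because the DSL form is ordinary nested data, it can grow extra clauses (filters, highlighting, aggregations) without any of the escaping that a URL parameter would need.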
The DSL (domain-specific language) takes the form of a JSON request body. We can express the earlier "Smith" query like this:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query" : {
    "match" : {
      "last_name" : "Smith"
    }
  }
}
'

A more complex search
Let's make the search a little more complex. We still want employees whose last name is "Smith", but only those older than 30. Our query adds a filter, which lets us run a structured search efficiently:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query" : {
    "filtered" : {
      "filter" : {
        "range" : {
          "age" : { "gt" : 30 } <1>
        }
      },
      "query" : {
        "match" : {
          "last_name" : "smith" <2>
        }
      }
    }
  }
}
'
- <1> This part of the query is a range filter; it finds all documents where age is greater than 30 (gt is short for "greater than").
- <2> This part of the query is the same match query we used before.
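The filtered query used here comes from older Elasticsearch releases; it was removed in 5.0, where a bool query with a filter clause serves the same purpose. The sketch below builds both request bodies as Python dictionaries so they can be compared side by side. The function names filtered_body and bool_body are made up for this example, and the bool form is offered as the usual replacement rather than something taken from the original walkthrough.

```python
import json


def filtered_body(last_name, min_age):
    """Request body in the older `filtered` form shown above."""
    return {"query": {"filtered": {
        "filter": {"range": {"age": {"gt": min_age}}},
        "query": {"match": {"last_name": last_name}},
    }}}


def bool_body(last_name, min_age):
    """Assumed equivalent for Elasticsearch 5.0+: bool with must + filter."""
    return {"query": {"bool": {
        "must": {"match": {"last_name": last_name}},
        "filter": {"range": {"age": {"gt": min_age}}},
    }}}


print(json.dumps(bool_body("smith", 30), indent=2))
```

Both bodies express the same search: the match on last_name is scored as usual, while the age range acts as a non-scoring filter. For the three sample employees that leaves the single hit in the response that follows.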
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "2", "_score" : 0.30685282,
        "_source" : {
          "first_name" : "Jane", "last_name" : "Smith", "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      }
    ]
  }
}

Full-text search
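So far the searches have been simple: searching for a specific name and filtering by age. Let's try a more advanced search, full-text search, which traditional databases struggle to do well.
We will search for all employees who enjoy "rock climbing":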
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query" : {
    "match" : {
      "about" : "rock climbing"
    }
  }
}
'
As you can see, we use the same match query as before, this time searching the about field for "rock climbing". We get two matching documents:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 2,
    "max_score" : 0.16273327,
    "hits" : [
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "1",
        "_score" : 0.16273327, <1>
        "_source" : {
          "first_name" : "John", "last_name" : "Smith", "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      },
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "2",
        "_score" : 0.016878016, <2>
        "_source" : {
          "first_name" : "Jane", "last_name" : "Smith", "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      }
    ]
  }
}
- <1><2> The relevance scores of the two results.
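By default, Elasticsearch sorts the result set by relevance score, that is, by how well each document matches the query. Clearly, the top result, John Smith, explicitly mentions "rock climbing" in his about field.
But why does Jane Smith appear in the results too? Because "rock" is mentioned in her about field. Since only "rock" is mentioned and "climbing" is not, her _score is lower than John's.

Phrase search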
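So far we can search a field for individual words, which is fine, but sometimes you want to match an exact sequence of words, a phrase. For example, we want to find employee records that contain both "rock" and "climbing" next to each other.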
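The gap between matching individual words and matching a phrase can be sketched in plain Python over the three about fields from this walkthrough. This is a deliberate simplification: the whitespace tokenizer and the two helper functions are assumptions made for the example, and they say nothing about how Elasticsearch analyzes or scores text internally.

```python
ABOUT = {
    "John Smith": "I love to go rock climbing",
    "Jane Smith": "I like to collect rock albums",
    "Douglas Fir": "I like to build cabinets",
}


def tokens(text):
    """Crude stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def matches_any_term(text, query):
    """Word-level match: at least one query term occurs in the text."""
    return bool(set(tokens(query)) & set(tokens(text)))


def matches_phrase(text, query):
    """Phrase match: the query terms occur adjacent and in order."""
    words, wanted = tokens(text), tokens(query)
    return any(words[i:i + len(wanted)] == wanted
               for i in range(len(words) - len(wanted) + 1))


any_term = [name for name, about in ABOUT.items()
            if matches_any_term(about, "rock climbing")]
phrase = [name for name, about in ABOUT.items()
          if matches_phrase(about, "rock climbing")]
print(any_term)  # ['John Smith', 'Jane Smith']
print(phrase)    # ['John Smith']
```

Jane Smith passes the word-level test because of "rock" alone, but drops out once the two words must appear side by side.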
To do this, we only need to change the match query to a match_phrase query:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query" : {
    "match_phrase" : {
      "about" : "rock climbing"
    }
  }
}
'
{
  "took" : 16,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.23013961,
    "hits" : [
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "1", "_score" : 0.23013961,
        "_source" : {
          "first_name" : "John", "last_name" : "Smith", "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      }
    ]
  }
}

Highlighting our searches
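Many applications like to highlight the matched keywords in each search result, so that users can see why the document matched the query. Highlighting fragments is easy in Elasticsearch.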
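To give a feel for what highlighted output looks like, the toy function below wraps each query word found in a text with a pair of tags. It is only an approximation written for this example (the name highlight and its pre/post parameters are assumptions); Elasticsearch's own highlighter works from the analyzed query and is considerably more capable.

```python
import re


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every whole-word occurrence of a query term in pre/post tags."""
    terms = [re.escape(term) for term in query.split()]
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


print(highlight("I love to go rock climbing", "rock climbing"))
# I love to go <em>rock</em> <em>climbing</em>
```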
Let's add a highlight parameter to the previous query:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query" : {
    "match_phrase" : {
      "about" : "rock climbing"
    }
  },
  "highlight" : {
    "fields" : {
      "about" : {}
    }
  }
}
'
When we run this query, we get the same hit as before, but the response contains a new section called highlight. It holds the text from the about field, with the matched words wrapped in <em></em>.
{
  "took" : 33,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.23013961,
    "hits" : [
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "1", "_score" : 0.23013961,
        "_source" : {
          "first_name" : "John", "last_name" : "Smith", "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        },
        "highlight" : {
          "about" : [ "I love to go <em>rock</em> <em>climbing</em>" ]
        }
      }
    ]
  }
}

Analytics with aggregations
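Finally, we have one more requirement: letting managers run analytics over the employee directory. Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but more powerful. For example, let's count how many employees share each interest:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "aggs" : {
    "all_interests" : {
      "terms" : { "field" : "interests" }
    }
  }
}
'
Query result:
{
  ...
  "aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "music", "doc_count" : 2 },
        { "key" : "forestry", "doc_count" : 1 },
        { "key" : "sports", "doc_count" : 1 }
      ]
    }
  }
}
These figures are not precomputed; they are calculated on the fly from the documents that match the query.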
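What a terms aggregation computes can be mimicked in a few lines of Python over the three sample employees. The sketch below is an in-memory analogy only (terms_buckets is a hypothetical helper, and the tie-breaking by key is an assumption made to get a stable order); it is not how Elasticsearch executes the aggregation.

```python
from collections import Counter

EMPLOYEES = [
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
    {"last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def terms_buckets(docs, field):
    """Count documents per distinct value of a multi-valued field."""
    counts = Counter(value for doc in docs for value in doc[field])
    # Sort by descending count, then by key, to get a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


print(terms_buckets(EMPLOYEES, "interests"))
```

Each bucket pairs a distinct interest with the number of employees who list it, which is the same shape as the buckets array in the response.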
If we want to know which interests are most common among everyone named "Smith", we only need to add the appropriate query:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query" : {
    "match" : {
      "last_name" : "smith"
    }
  },
  "aggs" : {
    "all_interests" : {
      "terms" : { "field" : "interests" }
    }
  }
}
'
The all_interests aggregation now covers only the documents that match the query:
...
"all_interests" : {
  "buckets" : [
    { "key" : "music", "doc_count" : 2 },
    { "key" : "sports", "doc_count" : 1 }
  ]
}
Aggregations also allow hierarchical rollups. For example, let's compute the average age of the employees who share each interest:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "aggs" : {
    "all_interests" : {
      "terms" : { "field" : "interests" },
      "aggs" : {
        "avg_age" : {
          "avg" : { "field" : "age" }
        }
      }
    }
  }
}
'
Although the aggregation result returned this time is somewhat more complex, it is still easy to understand:
...
"all_interests" : {
  "buckets" : [
    { "key" : "music", "doc_count" : 2, "avg_age" : { "value" : 28.5 } },
    { "key" : "forestry", "doc_count" : 1, "avg_age" : { "value" : 35 } },
    { "key" : "sports", "doc_count" : 1, "avg_age" : { "value" : 25 } }
  ]
}
This result is richer than the previous one. We still get the list of interests with their counts (the number of employees who share each interest), but each interest now also has an avg_age field showing the average age of the employees who share it.
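The nested average can be checked by hand with a short Python sketch over the same three employees. As before, this is an in-memory analogy with a made-up helper name (avg_age_by_interest), not a description of how Elasticsearch evaluates sub-aggregations.

```python
from collections import defaultdict

EMPLOYEES = [
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
    {"last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def avg_age_by_interest(docs):
    """Group ages by interest, then average each group."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in doc["interests"]:
            ages[interest].append(doc["age"])
    return {interest: sum(v) / len(v) for interest, v in ages.items()}


print(avg_age_by_interest(EMPLOYEES))
# {'sports': 25.0, 'music': 28.5, 'forestry': 35.0}

# Restricting the input first mirrors adding a query next to the aggregation.
smiths = [doc for doc in EMPLOYEES if doc["last_name"].lower() == "smith"]
print(avg_age_by_interest(smiths))
# {'sports': 25.0, 'music': 28.5}
```

The music bucket averages 25 and 32 to 28.5, matching the avg_age value in the response, and filtering the documents before grouping plays the role of the query clause in the earlier "Smith" example.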