Elasticsearch Study Notes (Beginner's Guide)

Date: 2019-11-30

1. Elasticsearch Requests and Responses

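Request structure
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
VERB: the HTTP method, one of GET, POST, PUT, HEAD, or DELETE.

PROTOCOL: http or https (https is available only when there is an HTTPS proxy in front of Elasticsearch).

HOST: the hostname of any node in the Elasticsearch cluster; for a node on your local machine, use localhost.

PORT: the port on which the Elasticsearch HTTP service listens; the default is 9200.

PATH: the API path (for example, _count returns the number of documents in the cluster). A PATH can contain several components, such as _cluster/stats or _nodes/stats/jvm.

QUERY_STRING: optional query-string parameters; for example, ?pretty makes Elasticsearch return pretty-printed, easier-to-read JSON.
BODY: a JSON request body (if the request needs one).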
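To make the pieces of the request template concrete, here is a small Python sketch that assembles them into a URL using only the standard library. The function name build_es_url is a hypothetical helper for illustration. It only builds the string and does not contact a cluster, and it writes the ?pretty flag in its explicit form, pretty=true.

```python
from typing import Optional
from urllib.parse import urlencode, urlunsplit


def build_es_url(protocol: str, host: str, port: int, path: str,
                 query: Optional[dict] = None) -> str:
    """Assemble <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING> (hypothetical helper)."""
    netloc = f"{host}:{port}"
    # urlencode percent-encodes the values, e.g. {"pretty": "true"} -> "pretty=true".
    query_string = urlencode(query or {})
    # urlunsplit only appends "?" when the query string is non-empty.
    return urlunsplit((protocol, netloc, "/" + path.lstrip("/"), query_string, ""))


print(build_es_url("http", "localhost", 9200, "_count", {"pretty": "true"}))
# http://localhost:9200/_count?pretty=true
print(build_es_url("http", "localhost", 9200, "_nodes/stats/jvm"))
# http://localhost:9200/_nodes/stats/jvm
```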

PUT: creating a document (indexing)
$ curl -XPUT 'http://localhost:9200/megacorp/employee/3?pretty' -d ' 
{
 "first_name" : "Douglas",
 "last_name" : "Fir",
 "age" : 35,
 "about": "I like to build cabinets",
 "interests": [ "forestry" ]
}
'
{
 "_index" : "megacorp",
 "_type" : "employee",
 "_id" : "3",
 "_version" : 1,
 "_shards" : {
 "total" : 2,
 "successful" : 1,
 "failed" : 0
 },
 "created" : true
}
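The same indexing request can be built from Python with the standard library's urllib.request.Request. This is a sketch, not part of the original notes. The helper name make_index_request is hypothetical, and the request is only constructed, not sent. The sketch also sets a Content-Type: application/json header. Elasticsearch 6.0 and later require that header for requests with a body, while the older release used in these notes accepts the plain curl -d form shown above. Newer versions also deprecate (7.x) and then remove (8.x) mapping types such as employee, so the URL path would differ there.

```python
import json
from urllib.request import Request


def make_index_request(base_url: str, index: str, doc_type: str,
                       doc_id: str, doc: dict) -> Request:
    """Build (but do not send) a PUT request that indexes `doc` under `doc_id`."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}?pretty"
    body = json.dumps(doc).encode("utf-8")
    # Elasticsearch 6.0+ rejects bodies that lack an explicit JSON content type.
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


req = make_index_request("http://localhost:9200", "megacorp", "employee", "3", {
    "first_name": "Douglas",
    "last_name": "Fir",
    "age": 35,
    "about": "I like to build cabinets",
    "interests": ["forestry"],
})
print(req.get_method(), req.full_url)
# PUT http://localhost:9200/megacorp/employee/3?pretty
# Against a running node you would send it with urllib.request.urlopen(req).
```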
GET requests (searching)

Retrieving a document
$ curl -XGET 'http://localhost:9200/megacorp/employee/1?pretty'
{
 "_index" : "megacorp",
 "_type" : "employee",
 "_id" : "1",
 "_version" : 1,
 "found" : true,
 "_source" : {
 "first_name" : "John",
 "last_name" : "Smith",
 "age" : 25,
 "about" : "I love to go rock climbing",
 "interests" : [ "sports", "music" ]
 }
}
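When scripting against this endpoint, the useful part of the response is usually _source, and the found flag tells you whether the ID exists. Below is a minimal Python sketch, assuming the response body has already been fetched as a string. The helper name source_if_found is hypothetical.

```python
import json
from typing import Optional


def source_if_found(response_text: str) -> Optional[dict]:
    """Return the stored document from a GET response, or None if the ID is missing."""
    resp = json.loads(response_text)
    # For a missing ID, "found" is false and there is no "_source" key.
    return resp["_source"] if resp.get("found") else None


hit = source_if_found('{"_index": "megacorp", "_type": "employee", "_id": "1", '
                      '"_version": 1, "found": true, "_source": {"first_name": "John", '
                      '"last_name": "Smith", "age": 25}}')
miss = source_if_found('{"_index": "megacorp", "_type": "employee", "_id": "99", '
                       '"found": false}')
print(hit["first_name"], miss)  # John None
```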
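Simple search

We again use the megacorp index and the employee type, but instead of a document ID we end the path with the keyword _search. The hits array in the response contains all three of our documents. By default, a search returns the top 10 results.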
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty'
{
 "took" : 2,
 "timed_out" : false,
 "_shards" : {
 "total" : 5,
 "successful" : 5,
 "failed" : 0
 },
 "hits" : {
 "total" : 3,
 "max_score" : 1.0,
 "hits" : [ {
 "_index" : "megacorp",
 "_type" : "employee",
 "_id" : "2",
 "_score" : 1.0,
 "_source" : {
 "first_name" : "Jane",
 "last_name" : "Smith",
 "age" : 32,
 "about" : "I like to collect rock albums",
 "interests" : [ "music" ]
 }
 }, {
 "_index" : "megacorp",
 "_type" : "employee",
 "_id" : "1",
 "_score" : 1.0,
 "_source" : {
 "first_name" : "John",
 "last_name" : "Smith",
 "age" : 25,
 "about" : "I love to go rock climbing",
 "interests" : [ "sports", "music" ]
 }
 }, {
 "_index" : "megacorp",
 "_type" : "employee",
 "_id" : "3",
 "_score" : 1.0,
 "_source" : {
 "first_name" : "Douglas",
 "last_name" : "Fir",
 "age" : 35,
 "about" : "I like to build cabinets",
 "interests" : [ "forestry" ]
 }
 } ]
 }
}
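To pull the matching documents out of a response like this programmatically, read hits.total and iterate over hits.hits. The sketch below is an illustration built on assumptions: the helper name summarize_hits is hypothetical, and the sample is a trimmed copy of the response above. It also tolerates Elasticsearch 7.x and later, where hits.total is an object with a value field instead of a plain integer.

```python
import json


def summarize_hits(response_text: str):
    """Return (total, [(id, full name), ...]) from a _search response."""
    hits = json.loads(response_text)["hits"]
    total = hits["total"]
    if isinstance(total, dict):  # 7.x+ shape: {"value": 3, "relation": "eq"}
        total = total["value"]
    names = [(h["_id"], h["_source"]["first_name"] + " " + h["_source"]["last_name"])
             for h in hits["hits"]]
    return total, names


sample = json.dumps({"hits": {"total": 3, "max_score": 1.0, "hits": [
    {"_id": "2", "_score": 1.0, "_source": {"first_name": "Jane", "last_name": "Smith"}},
    {"_id": "1", "_score": 1.0, "_source": {"first_name": "John", "last_name": "Smith"}},
    {"_id": "3", "_score": 1.0, "_source": {"first_name": "Douglas", "last_name": "Fir"}},
]}})
print(summarize_hits(sample))
# (3, [('2', 'Jane Smith'), ('1', 'John Smith'), ('3', 'Douglas Fir')])
```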
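Next, let's search for employees whose last name contains "Smith". We'll use a lightweight search method on the command line. It is often called a query-string search, because we pass the query as a URL parameter: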
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?q=last_name:Smith&pretty'
{
 "took" : 4,
 "timed_out" : false,
 "_shards" : {
 "total" : 5,
 "successful" : 5,
 "failed" : 0
 },
 "hits" : {
 "total" : 2,
 "max_score" : 0.30685282,
 "hits" : [ {
 "_index" : "megacorp",
 "_type" : "employee",
 "_id" : "2",
 "_score" : 0.30685282,
 "_source" : {
 "first_name" : "Jane",
 "last_name" : "Smith",
 "age" : 32,
 "about" : "I like to collect rock albums",
 "interests" : [ "music" ]
 }
 }, {
 "_index" : "megacorp",
 "_type" : "employee",
 "_id" : "1",
 "_score" : 0.30685282,
 "_source" : {
 "first_name" : "John",
 "last_name" : "Smith",
 "age" : 25,
 "about" : "I love to go rock climbing",
 "interests" : [ "sports", "music" ]
 }
 } ]
 }
}
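The whole URL in this command is wrapped in single quotes because of the &: unquoted, the shell would treat & as its background operator. Outside the shell, query-string values that contain spaces or reserved characters should be percent-encoded. Here is a small Python sketch using urllib.parse. The helper name query_string_search_url is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_string_search_url(base: str, q: str) -> str:
    """Build a lightweight ?q=... search URL with the query properly encoded."""
    return f"{base}/_search?" + urlencode({"q": q, "pretty": "true"})


url = query_string_search_url("http://localhost:9200/megacorp/employee", "last_name:Smith")
print(url)
# http://localhost:9200/megacorp/employee/_search?q=last_name%3ASmith&pretty=true
# Decoding the query string recovers the original query text.
print(parse_qs(urlsplit(url).query)["q"])  # ['last_name:Smith']
```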
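Querying with the Query DSL

Query-string search is handy for ad hoc searches from the command line, but it has its limitations (see the Simple search section). Elasticsearch provides a rich, flexible query language called the Query DSL, which lets you build much more complex and powerful queries.

The DSL (Domain Specific Language) is expressed as a JSON request body. The earlier "Smith" query can be written like this: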
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d ' 
{
 "query" : {
 "match" : {
 "last_name" : "Smith"
 }
 }
}
'
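A Query DSL request is just JSON, so it is convenient to build it as a data structure and serialize it rather than hand-writing the string. Below is a minimal Python sketch; match_query is a hypothetical helper. Some HTTP clients and proxies drop bodies on GET requests, so note that Elasticsearch's _search endpoint also accepts the same body via POST.

```python
import json


def match_query(field: str, text: str) -> dict:
    """Return a Query DSL body containing a single match query."""
    return {"query": {"match": {field: text}}}


body = json.dumps(match_query("last_name", "Smith"), indent=1)
print(body)
```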
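A more complex search

Let's make the search a little more complex. We still want to find employees whose last name is "Smith", but only those older than 30. We'll add a filter to the query, which lets us run a structured search efficiently: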
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
 "query" : {
 "filtered" : {
 "filter" : {
 "range" : {
 "age" : { "gt" : 30 }
 }
 },
 "query" : {
 "match" : {
 "last_name" : "smith"
 }
 }
 }
 }
}
'
The range part of the query is a range filter: it finds all employees whose age is greater than 30 (gt stands for "greater than").

The match part of the query is the same match query we used before.
{
 "took" : 2,
 "timed_out" : false,
 "_shards" : {
 "total" : 5,
 "successful" : 5,
 "failed" : 0
 },
 "hits" : {
 "total" : 1,
 "max_score" : 0.30685282,
 "hits" : [ {
 "_index" : "megacorp",
 "_type" : "employee",
 "_id" : "2",
 "_score" : 0.30685282,
 "_source" : {
 "first_name" : "Jane",
 "last_name" : "Smith",
 "age" : 32,
 "about" : "I like to collect rock albums",
 "interests" : [ "music" ]
 }
 } ]
 }
}
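The filtered query used above belongs to older Elasticsearch releases. It was deprecated in 2.0 and removed in 5.0, where the same search is written as a bool query with a filter clause. The sketch below builds that newer form in Python and, purely as a sanity check, applies the equivalent logic to the three sample employees locally. The helper name and the local check are illustrative assumptions. The local check approximates match with a case-insensitive string comparison and does not reproduce Elasticsearch's analysis or scoring.

```python
import json


def smith_older_than(age: int) -> dict:
    """bool-query equivalent of the filtered query: score on match, filter on range."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"last_name": "smith"}},   # contributes to _score
                "filter": {"range": {"age": {"gt": age}}},   # yes/no, not scored
            }
        }
    }


EMPLOYEES = {
    "1": {"last_name": "Smith", "age": 25},
    "2": {"last_name": "Smith", "age": 32},
    "3": {"last_name": "Fir", "age": 35},
}
# Local approximation of the same logic: last name "smith" AND age > 30.
local_hits = [doc_id for doc_id, e in EMPLOYEES.items()
              if e["last_name"].lower() == "smith" and e["age"] > 30]
print(json.dumps(smith_older_than(30)))
print(local_hits)  # ['2']
```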
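Full-text search

So far the searches have been simple: searching for a specific name, filtering by age. Let's try a more advanced kind of search: full-text search, something traditional databases find hard to do.

We'll search for all employees who like "rock climbing":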
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
 "query" : {
 "match" : {
 "about" : "rock climbing"
 }
 }
}
'
As you can see, we used the same match query as before to search the about field for "rock climbing", and we got two matching documents:
{
 "took" : 3,
 "timed_out" : false,
 "_shards" : {
 "total" : 5,
 "successful" : 5,
 "failed" : 0
 },
 "hits" : {
 "total" : 2,
 "max_score" : 0.16273327,
 "hits" : [ {
 "_index" : "megacorp",
 "_type" : "employee",
 "_id" : "1",
 "_score" : 0.16273327,<1>
 "_source" : {
 "first_name" : "John",
 "last_name" : "Smith",
 "age" : 25,
 "about" : "I love to go rock climbing",
 "interests" : [ "sports", "music" ]
 }
 }, {
 "_index" : "megacorp",
 "_type" : "employee",
 "_id" : "2",
 "_score" : 0.016878016,<2>
 "_source" : {
 "first_name" : "Jane",
 "last_name" : "Smith",
 "age" : 32,
 "about" : "I like to collect rock albums",
 "interests" : [ "music" ]
 }
 } ]
 }
}
<1><2> The relevance score of each result.

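By default, Elasticsearch sorts the result set by relevance score, which measures how well each document matches the query. Unsurprisingly, the top result, John Smith, states "rock climbing" explicitly in his about field.

But why does Jane Smith appear in the results at all? Because "rock" is mentioned in her about field. Since only "rock" is mentioned and "climbing" is not, her _score is lower than John's.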
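By default, a match query on an analyzed text field matches documents that contain any of the query's terms, and documents that contain more of them tend to score higher. The toy Python model below illustrates only that idea by counting shared lowercase words. It is not Elasticsearch's scoring, which (TF/IDF-based in older releases, BM25 by default since 5.0) also accounts for term frequency, term rarity, and field length. The function name toy_match_score is hypothetical.

```python
def toy_match_score(query: str, text: str) -> int:
    """Toy relevance: number of distinct query words that also occur in text."""
    query_terms = set(query.lower().split())
    doc_terms = set(text.lower().split())
    return len(query_terms & doc_terms)


ABOUT = {
    "1": "I love to go rock climbing",      # John
    "2": "I like to collect rock albums",   # Jane
    "3": "I like to build cabinets",        # Douglas
}
scores = {doc_id: toy_match_score("rock climbing", text) for doc_id, text in ABOUT.items()}
# Keep documents with at least one shared term, best match first.
ranking = sorted((d for d, s in scores.items() if s > 0), key=lambda d: -scores[d])
print(scores, ranking)  # {'1': 2, '2': 1, '3': 0} ['1', '2']
```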

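Phrase search

Matching individual words in a field is useful, but sometimes you want to match an exact sequence of words, i.e. a phrase. For example, we might want to find employee records that contain both "rock" and "climbing", with the two words next to each other.

To do this, we simply change the match query to a match_phrase query: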
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
 "query" : {
 "match_phrase" : {
 "about" : "rock climbing"
 }
 }
}
'
{
 "took" : 16,
 "timed_out" : false,
 "_shards" : {
 "total" : 5,
 "successful" : 5,
 "failed" : 0
 },
 "hits" : {
 "total" : 1,
 "max_score" : 0.23013961,
 "hits" : [ {
 "_index" : "megacorp",
 "_type" : "employee",
 "_id" : "1",
 "_score" : 0.23013961,
 "_source" : {
 "first_name" : "John",
 "last_name" : "Smith",
 "age" : 25,
 "about" : "I love to go rock climbing",
 "interests" : [ "sports", "music" ]
 }
 } ]
 }
}
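Conceptually, match_phrase requires all the terms to be present, in the same order and right next to each other. Elasticsearch checks this using term positions and offers an optional slop parameter to allow gaps. Below is a toy Python version over whitespace-split words, assuming no stemming or other analysis; it shows why only John's document survives. The function name toy_phrase_match is hypothetical.

```python
def toy_phrase_match(phrase: str, text: str) -> bool:
    """True if the phrase's words appear consecutively and in order in text."""
    p = phrase.lower().split()
    t = text.lower().split()
    # Slide a window of len(p) words across the text and compare.
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


print(toy_phrase_match("rock climbing", "I love to go rock climbing"))     # True
print(toy_phrase_match("rock climbing", "I like to collect rock albums"))  # False
print(toy_phrase_match("climbing rock", "I love to go rock climbing"))     # False
```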
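Highlighting our searches

Many applications like to highlight the matched keywords in each search result, so that users can see why a document matched the query. Retrieving highlighted snippets in Elasticsearch is easy.

Let's add a highlight parameter to the previous query: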
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
 "query" : {
 "match_phrase" : {
 "about" : "rock climbing"
 }
 },
 "highlight": {
 "fields" : {
 "about" : {}
 } 
 } 
} 
'
When we run this query, we get the same hit as before, but the response now contains a new section called highlight. It holds the text from the about field, with the matched words wrapped in <em></em> tags.
{
 "took" : 33,
 "timed_out" : false,
 "_shards" : {
 "total" : 5,
 "successful" : 5,
 "failed" : 0
 },
 "hits" : {
 "total" : 1,
 "max_score" : 0.23013961,
 "hits" : [ {
 "_index" : "megacorp",
 "_type" : "employee",
 "_id" : "1",
 "_score" : 0.23013961,
 "_source" : {
 "first_name" : "John",
 "last_name" : "Smith",
 "age" : 25,
 "about" : "I love to go rock climbing",
 "interests" : [ "sports", "music" ]
 },
 "highlight" : {
 "about" : [ "I love to go <em>rock</em> <em>climbing</em>" ]
 }
 } ]
 }
}
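By default, the highlighter wraps each matched term in <em></em> tags. The tags can be changed through the highlight section's pre_tags and post_tags settings. The toy Python function below mimics that output for whole-word matches only. It is an illustration, not how Elasticsearch's highlighters work internally, and the name toy_highlight is hypothetical.

```python
import re


def toy_highlight(text: str, terms, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap every whole word of text that matches one of terms (case-insensitively)."""
    wanted = {t.lower() for t in terms}

    def wrap(m):
        word = m.group(0)
        return f"{pre}{word}{post}" if word.lower() in wanted else word

    return re.sub(r"\w+", wrap, text)


print(toy_highlight("I love to go rock climbing", ["rock", "climbing"]))
# I love to go <em>rock</em> <em>climbing</em>
```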
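Aggregations

Analytics

Finally, we have one more requirement to meet: letting managers run analytics over the employee directory. Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics and statistics over your data. It is similar to GROUP BY in SQL, but more powerful.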
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
 "aggs": {
 "all_interests": {
 "terms": { "field": "interests" }
 }
 }
}
'
Result:
{...
 "aggregations" : {
 "all_interests" : {
 "doc_count_error_upper_bound" : 0,
 "sum_other_doc_count" : 0,
 "buckets" : [ {
 "key" : "music",
 "doc_count" : 2
 }, {
 "key" : "forestry",
 "doc_count" : 1
 }, {
 "key" : "sports",
 "doc_count" : 1
 } ]
 }
 }
}
These numbers are not precomputed; they are calculated on the fly from the documents that match the query.
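A terms aggregation is essentially a grouped count: one bucket per distinct value, with doc_count equal to the number of documents that contain that value. The Python sketch below computes the same buckets locally over the three sample employees. It is a toy model under stated assumptions. Buckets are sorted by doc_count descending, with ties broken alphabetically so that the output lines up with the sample above. The sketch also ignores the per-shard approximation behind doc_count_error_upper_bound. One practical caveat: on Elasticsearch 5.0 and later, dynamically mapped strings such as interests become text fields, which cannot be aggregated by default, so you would typically aggregate on interests.keyword instead.

```python
from collections import Counter

EMPLOYEES = [
    {"first_name": "John", "last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "last_name": "Smith", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def toy_terms_agg(docs, field):
    """Toy terms aggregation: count the documents containing each distinct value."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.get(field, [])))  # each document counts once per value
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


for bucket in toy_terms_agg(EMPLOYEES, "interests"):
    print(bucket)
# {'key': 'music', 'doc_count': 2}
# {'key': 'forestry', 'doc_count': 1}
# {'key': 'sports', 'doc_count': 1}
```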

If we want to know the most common interests among all employees with the last name "Smith", we just add the appropriate query:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
 "query": {
 "match": {
 "last_name": "smith"
 }
 },
 "aggs": {
 "all_interests": {
 "terms": {
 "field": "interests"
 }
 }
 }
}
'
The all_interests aggregation now covers only the documents that match the query:
...
 "all_interests": {
 "buckets": [
 {
 "key": "music",
 "doc_count": 2
 },
 {
 "key": "sports",
 "doc_count": 1
 }
 ]
 }
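In other words, the aggregation runs over the query's result set rather than over the whole index. The local Python sketch below mirrors that two-step order: it first keeps the documents whose last name is smith, then counts their interests. The exact string comparison stands in for the match query; that substitution and the tie-breaking order are assumptions.

```python
from collections import Counter

EMPLOYEES = [
    {"first_name": "John", "last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "last_name": "Smith", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35, "interests": ["forestry"]},
]

# Step 1: the "query" narrows the document set (case-insensitive stand-in for match).
matching = [e for e in EMPLOYEES if e["last_name"].lower() == "smith"]

# Step 2: the "aggs" section only sees the matching documents.
counts = Counter(i for e in matching for i in set(e["interests"]))
buckets = [{"key": k, "doc_count": n}
           for k, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
print(buckets)
# [{'key': 'music', 'doc_count': 2}, {'key': 'sports', 'doc_count': 1}]
```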
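Aggregations can also be nested to produce hierarchical summaries. For example, let's compute the average age of the employees who share each interest: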
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
 "aggs" : {
 "all_interests" : {
 "terms" : { "field" : "interests" },
 "aggs" : {
 "avg_age" : {
 "avg" : { "field" : "age" }
 }
 }
 }
 }
}
'
Although the aggregation result is more complex this time, it is still easy to understand:
...
 "all_interests": {
 "buckets": [
 {
 "key": "music",
 "doc_count": 2,
 "avg_age": {
 "value": 28.5
 }
 },
 {
 "key": "forestry",
 "doc_count": 1,
 "avg_age": {
 "value": 35
 }
 },
 {
 "key": "sports",
 "doc_count": 1,
 "avg_age": {
 "value": 25
 }
 }
 ]
 }
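This result is richer than the previous one. We still get the list of interests with their counts (the number of employees with each interest), but each interest now also has an avg_age field giving the average age of the employees with that interest.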
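A nested aggregation runs the inner aggregation separately inside each outer bucket. The toy Python model below reproduces the avg_age numbers above from the three sample employees by grouping ages by interest and then averaging within each group. As in a plain terms sketch, the bucket ordering here is an assumption chosen to match the sample output, and the function name toy_avg_age_by_interest is hypothetical.

```python
from collections import defaultdict

EMPLOYEES = [
    {"first_name": "John", "last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "last_name": "Smith", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def toy_avg_age_by_interest(docs):
    """Outer bucket per interest, inner average of the age field within each bucket."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    buckets = [{"key": k, "doc_count": len(v), "avg_age": {"value": sum(v) / len(v)}}
               for k, v in ages.items()]
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


for bucket in toy_avg_age_by_interest(EMPLOYEES):
    print(bucket)
# {'key': 'music', 'doc_count': 2, 'avg_age': {'value': 28.5}}
# {'key': 'forestry', 'doc_count': 1, 'avg_age': {'value': 35.0}}
# {'key': 'sports', 'doc_count': 1, 'avg_age': {'value': 25.0}}
```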
