Elasticsearch Study Notes (Getting Started)

1. Elasticsearch Requests and Responses

Request structure

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
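  • VERB: the HTTP method: GET, POST, PUT, HEAD, or DELETE
  • PROTOCOL: http or https (https is only available when there is an https proxy in front of Elasticsearch)
  • HOST: the hostname of any node in the Elasticsearch cluster, or localhost for a node on your local machine
  • PORT: the port the Elasticsearch HTTP service listens on, 9200 by default
  • PATH: the API endpoint (for example, _count returns the number of documents in the cluster); a path may contain multiple components, such as _cluster/stats or _nodes/stats/jvm
  • QUERY_STRING: optional query-string parameters (for example, ?pretty makes the response return more readable, pretty-printed JSON)
  • BODY: a JSON-encoded request body (if the request needs one)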
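
The URL part of the template above can be sketched in Python with the standard library. This is only an illustration: the helper name build_url is hypothetical, and it just assembles the string without contacting any node.

```python
from urllib.parse import urlencode, urlunsplit


def build_url(protocol, host, port, path, params=None):
    """Assemble <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>."""
    netloc = f"{host}:{port}"
    query = urlencode(params or {})  # empty string when there are no parameters
    return urlunsplit((protocol, netloc, "/" + path.lstrip("/"), query, ""))


# PATH with a single component plus a query string
count_url = build_url("http", "localhost", 9200, "_count", {"pretty": "true"})
# PATH with several components and no query string
jvm_url = build_url("http", "localhost", 9200, "_nodes/stats/jvm")

print(count_url)
print(jvm_url)
```

Passing pretty=true is assumed here to be equivalent to the bare ?pretty flag used in the curl examples.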

Creating a document with PUT (indexing)

$ curl -XPUT 'http://localhost:9200/megacorp/employee/3?pretty' -d ' 
{
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}
'
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "3",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
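
The same indexing request can be prepared from Python. The sketch below only builds the request object with urllib and does not send it; the function name build_index_request is hypothetical, and the Content-Type header is an assumption (newer Elasticsearch versions expect it on requests with a body, while the curl call above omits it).

```python
import json
from urllib.request import Request


def build_index_request(index, doc_type, doc_id, document):
    """Prepare a PUT /<index>/<type>/<id> request carrying a JSON body."""
    url = f"http://localhost:9200/{index}/{doc_type}/{doc_id}?pretty"
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


employee = {
    "first_name": "Douglas",
    "last_name": "Fir",
    "age": 35,
    "about": "I like to build cabinets",
    "interests": ["forestry"],
}

req = build_index_request("megacorp", "employee", 3, employee)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), which needs a running node.
```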

GET requests (search)

Retrieving a document

$ curl -XGET 'http://localhost:9200/megacorp/employee/1?pretty'
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [ "sports", "music" ]
  }
}

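Simple search

We use the megacorp index and the employee type again, but at the end of the path we use the _search keyword in place of a document ID. The hits array in the response contains all three of our documents. By default, a search returns the first 10 results.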

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_score" : 1.0,
      "_source" : {
        "first_name" : "Jane",
        "last_name" : "Smith",
        "age" : 32,
        "about" : "I like to collect rock albums",
        "interests" : [ "music" ]
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests" : [ "sports", "music" ]
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "3",
      "_score" : 1.0,
      "_source" : {
        "first_name" : "Douglas",
        "last_name" : "Fir",
        "age" : 35,
        "about" : "I like to build cabinets",
        "interests" : [ "forestry" ]
      }
    } ]
  }
}
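
A client usually only needs a few fields from this response. As a rough sketch, the snippet below parses a trimmed copy of the response above and pulls out the total and one row per hit. The function name summarize_hits is hypothetical; the check for a dictionary-valued total is there because newer Elasticsearch versions report hits.total as an object rather than a plain number.

```python
import json

# Trimmed copy of the search response above (only the fields used here).
response_text = """
{
  "took": 2,
  "timed_out": false,
  "hits": {
    "total": 3,
    "max_score": 1.0,
    "hits": [
      {"_id": "2", "_score": 1.0, "_source": {"first_name": "Jane", "last_name": "Smith"}},
      {"_id": "1", "_score": 1.0, "_source": {"first_name": "John", "last_name": "Smith"}},
      {"_id": "3", "_score": 1.0, "_source": {"first_name": "Douglas", "last_name": "Fir"}}
    ]
  }
}
"""


def summarize_hits(response):
    """Return (total, [(id, full name, score), ...]) from a search response."""
    hits = response["hits"]
    total = hits["total"]
    # Newer versions return {"value": n, "relation": ...} instead of a number.
    if isinstance(total, dict):
        total = total["value"]
    rows = [
        (h["_id"], f'{h["_source"]["first_name"]} {h["_source"]["last_name"]}', h["_score"])
        for h in hits["hits"]
    ]
    return total, rows


total, rows = summarize_hits(json.loads(response_text))
print(total)
for row in rows:
    print(row)
```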

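Next, let's search for employees whose last name contains "Smith". We will use a lightweight search method from the command line. This method is often called a query string search, because we pass the query as a URL parameter, just like any other: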

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?q=last_name:Smith&pretty'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_score" : 0.30685282,
      "_source" : {
        "first_name" : "Jane",
        "last_name" : "Smith",
        "age" : 32,
        "about" : "I like to collect rock albums",
        "interests" : [ "music" ]
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "1",
      "_score" : 0.30685282,
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests" : [ "sports", "music" ]
      }
    } ]
  }
}
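
When the query travels in the URL, special characters have to be URL-encoded. The sketch below builds the same query-string search URL with urlencode; the helper name query_string_search_url is hypothetical. Note that the colon comes out percent-encoded as %3A, which is the standard URL encoding of the same character.

```python
from urllib.parse import urlencode


def query_string_search_url(index, doc_type, field, value):
    """Build a _search URL that carries the query in the q parameter."""
    params = urlencode({"q": f"{field}:{value}", "pretty": "true"})
    return f"http://localhost:9200/{index}/{doc_type}/_search?{params}"


url = query_string_search_url("megacorp", "employee", "last_name", "Smith")
print(url)
```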

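Querying with the Query DSL

Query string search is handy for ad hoc searches from the command line, but it has its limitations (see the simple search section). Elasticsearch provides a rich, flexible query language called the Query DSL, which lets you build more complex and powerful queries.

The DSL (domain-specific language) is expressed as a JSON request body. We can write the earlier "Smith" query like this: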

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d ' 
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}
'
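
Because the DSL is plain JSON, a request body can be built as an ordinary dictionary and serialized. A minimal sketch, with match_query as a hypothetical helper name:

```python
import json


def match_query(field, text):
    """Build the request body for a match query on a single field."""
    return {"query": {"match": {field: text}}}


body = match_query("last_name", "Smith")
# This string is what would be passed to curl with -d.
print(json.dumps(body, indent=4))
```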

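A more complex search

Let's make the search a little more complex. We still want to find employees with the last name "Smith", but only those older than 30. Our query adds a filter, which lets us run a structured search efficiently: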

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            },
            "query" : {
                "match" : {
                    "last_name" : "smith"
                }
            }
        }
    }
}
'
  • The range filter finds all documents whose age is greater than 30; gt stands for "greater than".
  • The match query is the same one used in the earlier search.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_score" : 0.30685282,
      "_source" : {
        "first_name" : "Jane",
        "last_name" : "Smith",
        "age" : 32,
        "about" : "I like to collect rock albums",
        "interests" : [ "music" ]
      }
    } ]
  }
}
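
The filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions the same intent is usually expressed as a bool query with a filter clause. The sketch below builds such a body; the function name last_name_older_than is hypothetical, and the body has not been run against the cluster used in these notes.

```python
import json


def last_name_older_than(last_name, min_age):
    """Build a bool query: full-text match in must, range condition in filter."""
    return {
        "query": {
            "bool": {
                # Scored part: same match query as before.
                "must": {"match": {"last_name": last_name}},
                # Unscored part: only keeps documents with age > min_age.
                "filter": {"range": {"age": {"gt": min_age}}},
            }
        }
    }


body = last_name_older_than("smith", 30)
print(json.dumps(body, indent=4))
```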

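Full-text search

So far the searches have been simple: looking up a specific name, filtering by age. Let's try something more advanced, full-text search, which traditional databases struggle with.

We are going to search for all employees who enjoy "rock climbing":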

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}
'

As you can see, we use the same match query as before to search the about field for "rock climbing". We get two matching documents:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.16273327,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "1",
      "_score" : 0.16273327,<1>
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests" : [ "sports", "music" ]
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_score" : 0.016878016,<2>
      "_source" : {
        "first_name" : "Jane",
        "last_name" : "Smith",
        "age" : 32,
        "about" : "I like to collect rock albums",
        "interests" : [ "music" ]
      }
    } ]
  }
}
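  • <1><2> The relevance scores of the results.

By default, Elasticsearch sorts the result set by relevance score, that is, by how well each document matches the query. Clearly, the top-ranked John Smith states "rock climbing" explicitly in his about field.

But why does Jane Smith appear in the results as well? Because "rock" is mentioned in her about field. Since only "rock" is mentioned and "climbing" is not, her _score is lower than John's.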
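
The reasoning above can be illustrated with a toy term-overlap check. This is a deliberately simplified sketch, not Elasticsearch's analyzer or scoring formula: it lowercases the text, splits it into words, and lists which query terms each about field contains.

```python
import re


def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def matched_terms(query, text):
    """Return the query terms that also occur in the text."""
    doc_terms = set(tokenize(text))
    return [term for term in tokenize(query) if term in doc_terms]


abouts = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}

overlap = {name: matched_terms("rock climbing", about) for name, about in abouts.items()}
for name, terms in overlap.items():
    print(name, terms)
```

John matches both terms, Jane matches only "rock", and Douglas matches none, which lines up with the ranking and the two hits shown above.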

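Phrase search

Searching for individual words in a field is all well and good, but sometimes you want to match an exact sequence of words, a phrase. For example, we want to find employee records that contain both "rock" and "climbing", adjacent to each other.

To do this, we just change the match query to a match_phrase query: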

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}
'
{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.23013961,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "1",
      "_score" : 0.23013961,
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests" : [ "sports", "music" ]
      }
    } ]
  }
}
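
The difference from a plain match can be sketched locally: a phrase matches only when its tokens appear next to each other, in order. The code below is a simplified illustration with hypothetical helper names, not the way Elasticsearch implements match_phrase internally.

```python
import re


def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(phrase, text):
    """True if the phrase tokens occur consecutively and in order in the text."""
    wanted = tokenize(phrase)
    tokens = tokenize(text)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


john = contains_phrase("rock climbing", "I love to go rock climbing")
jane = contains_phrase("rock climbing", "I like to collect rock albums")
print(john, jane)
```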

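Highlighting our searches

Many applications like to highlight the matched keywords in each search result so the user can see why the document matched the query. Highlighting snippets is very easy in Elasticsearch.

Let's add a highlight parameter to the previous query: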

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }               
    }                   
}        
'

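When we run this query, it hits the same result as before, but the response contains a new section called highlight, which holds the text from the about field with the matched words wrapped in <em></em> tags.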

{
  "took" : 33,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.23013961,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "1",
      "_score" : 0.23013961,
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests" : [ "sports", "music" ]
      },
      "highlight" : {
        "about" : [ "I love to go <em>rock</em> <em>climbing</em>" ]
      }
    } ]
  }
}
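
To make the shape of the highlight fragment concrete, here is a toy highlighter that wraps every word of the text that also appears in the query in <em></em>. It is a sketch under simplifying assumptions (whole-word, case-insensitive matching) and only imitates the output above; the function name highlight is hypothetical.

```python
import re


def highlight(text, query):
    """Wrap each word of text that appears in the query in <em></em>."""
    terms = {term.lower() for term in re.findall(r"\w+", query)}

    def wrap(match):
        word = match.group(0)
        return f"<em>{word}</em>" if word.lower() in terms else word

    return re.sub(r"\w+", wrap, text)


fragment = highlight("I love to go rock climbing", "rock climbing")
print(fragment)
```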

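Aggregations

Analytics

Finally, we have one more requirement to fulfil: letting managers run some analytics over the employee directory. Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. It is much like GROUP BY in SQL, but more powerful. The query below counts the employees who share each interest: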

$  curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}
'

Query result:

{...
  "aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "music",
        "doc_count" : 2
      }, {
        "key" : "forestry",
        "doc_count" : 1
      }, {
        "key" : "sports",
        "doc_count" : 1
      } ]
    }
  }
}

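These figures are not precomputed; they are generated on the fly from the documents that match the current query.

If we want to know the most popular interests among people named "Smith", we just add the appropriate query: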

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query": {
    "match": {
      "last_name": "smith"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }
    }
  }
}
'

The all_interests aggregation now covers only the documents that match the query:

...
  "all_interests": {
     "buckets": [
        {
           "key": "music",
           "doc_count": 2
        },
        {
           "key": "sports",
           "doc_count": 1
        }
     ]
  }
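
The GROUP BY analogy can be checked locally. The sketch below reproduces the bucket counts from the two aggregation results with a Counter over the three sample employees, first over all documents and then over the Smiths only. The function name terms_buckets is hypothetical, and ties are sorted alphabetically here simply to mirror the order of the output shown earlier.

```python
from collections import Counter

employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "last_name": "Smith", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def terms_buckets(docs, field):
    """Count how many documents contain each value of a multi-valued field."""
    # set() makes sure a document is counted once per distinct value.
    counts = Counter(value for doc in docs for value in set(doc[field]))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


all_buckets = terms_buckets(employees, "interests")

# Restricting the documents first plays the role of the query clause.
smiths = [doc for doc in employees if doc["last_name"].lower() == "smith"]
smith_buckets = terms_buckets(smiths, "interests")

print(all_buckets)
print(smith_buckets)
```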

Aggregations also allow hierarchical rollups. For example, let's compute the average age of the employees who share each interest:

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}
'

The aggregation result this time is a bit more complicated, but still easy to understand:

...
  "all_interests": {
     "buckets": [
        {
           "key": "music",
           "doc_count": 2,
           "avg_age": {
              "value": 28.5
           }
        },
        {
           "key": "forestry",
           "doc_count": 1,
           "avg_age": {
              "value": 35
           }
        },
        {
           "key": "sports",
           "doc_count": 1,
           "avg_age": {
              "value": 25
           }
        }
     ]
  }

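This result is richer than the previous one. We still get the list of interests with their counts (the number of employees who have that interest), but each interest now also has an avg_age field showing the average age of the employees who share it.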
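
The nested result can be verified by hand with a few lines of Python: group the ages by interest, then average each group. This is a local sketch over the three sample employees, with avg_age_by_interest as a hypothetical helper; it mirrors the bucket layout above but says nothing about how Elasticsearch computes it.

```python
from collections import defaultdict

employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "last_name": "Smith", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def avg_age_by_interest(docs):
    """Bucket documents by interest and compute the average age per bucket."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    # Largest buckets first, ties alphabetically, to mirror the output above.
    ordered = sorted(ages.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {"key": key, "doc_count": len(values), "avg_age": {"value": sum(values) / len(values)}}
        for key, values in ordered
    ]


buckets = avg_age_by_interest(employees)
for bucket in buckets:
    print(bucket)
```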
