Elasticsearch Study Notes - 8. Queries

Date: 2019-12-07

1. Concepts

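Mapping
How the data in each field is stored
Analysis
How full text is processed so that it becomes searchable
Query DSL (domain-specific query language)
The powerful, flexible query language used by Elasticsearch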

2. The Empty Search

GET /_search

{
   "hits" : {
      "total" :       14,
      "hits" : [
        {
          "_index":   "us",
          "_type":    "tweet",
          "_id":      "7",
          "_score":   1,
          "_source": {
             "date":    "2014-09-17",
             "name":    "John Smith",
             "tweet":   "The Query DSL is really powerful and flexible",
             "user_id": 2
          }
       },
        ... 9 RESULTS REMOVED ...
      ],
      "max_score" :   1
   },
   "took" :           4,
   "_shards" : {
      "failed" :      0,
      "successful" :  10,
      "total" :       10
   },
   "timed_out" :      false
}

hits

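The most important part of the response is hits, which contains a total field giving the total number of matching documents, and a hits array containing the first ten of those documents.

max_score is the highest _score of any document that matches the query.

took

took tells us how many milliseconds the entire search request took to execute.

_shards

The _shards section tells us the total number of shards that took part in the query, and how many of them succeeded and how many failed. We would not normally expect shards to fail, but it can happen. If we suffered a catastrophic failure in which both the primary and the replica copies of the same shard were lost, there would be no copy of that shard left to respond to search requests. In that case, Elasticsearch reports the shard as failed but still returns the results from the remaining shards.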
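The fields described above can be read straight off the parsed response. The sketch below assumes the response shape shown in this section (where hits.total is a plain number) and uses a hypothetical helper named summarize; it is an illustration, not part of any Elasticsearch client.

```python
# A trimmed-down response shaped like the example in this section.
response = {
    "hits": {"total": 14, "max_score": 1, "hits": [{"_id": "7", "_score": 1}]},
    "took": 4,
    "_shards": {"failed": 0, "successful": 10, "total": 10},
    "timed_out": False,
}


def summarize(resp):
    """Return a small summary dict (hypothetical helper)."""
    shards = resp["_shards"]
    return {
        "total_hits": resp["hits"]["total"],
        "returned": len(resp["hits"]["hits"]),
        "took_ms": resp["took"],
        # Results are partial if any shard failed or the request timed out.
        "partial": shards["failed"] > 0 or resp["timed_out"],
    }


summary = summarize(response)
print(summary)
```

Checking the partial flag before trusting total_hits is one way to notice that some shards did not answer.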

3. Multi-index, Multi-type

/_search

Search all types in all indices

/gb/_search

Search all types in the gb index

/gb,us/_search

Search all documents in the gb and us indices

/g*,u*/_search

Search all types in any index whose name starts with g or u

/gb/user/_search

Search the user type in the gb index

/gb,us/user,tweet/_search

Search the user and tweet types in the gb and us indices

/_all/user,tweet/_search

Search the user and tweet types in all indices
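The path patterns listed above follow one rule: an optional comma-separated index list, an optional comma-separated type list, then _search. A minimal sketch of that rule, using a hypothetical helper named search_path (not part of any client library):

```python
def search_path(indices=None, types=None):
    """Build a _search path from optional index and type lists (hypothetical helper)."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        # Types without an explicit index need the _all placeholder.
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())
print(search_path(["gb", "us"]))
print(search_path(["gb", "us"], ["user", "tweet"]))
print(search_path(types=["user", "tweet"]))
```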

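4. Pagination

In the same way that SQL uses the LIMIT keyword to return a single page of results, Elasticsearch accepts the from and size parameters:

size

The number of results to return; defaults to 10

from

The number of initial results to skip; defaults to 0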

GET /_search?size=5&from=5
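Translating a page number into from and size is simple arithmetic. The sketch below assumes 1-based page numbers and uses a hypothetical helper named page_params:

```python
def page_params(page, size=10):
    """Translate a 1-based page number into from/size (hypothetical helper)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


# Page 2 with five results per page corresponds to GET /_search?size=5&from=5
print(page_params(2, size=5))
```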

5. Request Body Search

Empty query

GET /_search
{} 

GET /index_2014*/type1,type2/_search
{}

GET /_search
{
  "from": 30,
  "size": 10
}

Query expressions

GET /_search
{
    "query": YOUR_QUERY_HERE
}

For example, you can use a match query to find tweets whose tweet field contains elasticsearch:

GET /_search
{
    "query": {
        "match": {
            "tweet": "elasticsearch"
        }
    }
}

Combining queries

{
    "bool": {
        "must": { "match":   { "email": "business opportunity" }},
        "should": [
            { "match":       { "starred": true }},
            { "bool": {
                "must":      { "match": { "folder": "inbox" }},
                "must_not":  { "match": { "spam": true }}
            }}
        ],
        "minimum_should_match": 1
    }
}

The most important queries

match_all query

The match_all query simply matches all documents. It is the default query when no query is specified:

{ "match_all": {}}

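match query

High-level full-text queries are typically used to run full-text searches on full-text fields such as the body of an email.
They understand how the queried field is analyzed, and apply each field's analyzer (or search_analyzer) to the query string before executing.

In other words, the query string is tokenized before the search runs.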

{ "match": { "tweet": "About Search" }}


The operator parameter of match. With "operator": "and", a document must contain all of the terms centos, 升 and 級:

GET website/_search
{
  "query": {
    "match": {
        "title":{
          "query":"centos升級",
          "operator":"and"
        }
    }
  }
}
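The difference between the two operator values can be illustrated with a simplified model that only compares terms which have already been analyzed. It ignores scoring and real analysis entirely. The term lists and the helper name matches are assumptions made for illustration, including the assumption that the analyzer splits "centos升級" into centos, 升 and 級 as described above.

```python
def matches(doc_terms, query_terms, operator="or"):
    """Simplified model of match: compare already-analyzed terms only."""
    doc = set(doc_terms)
    if operator == "and":
        return all(term in doc for term in query_terms)
    return any(term in doc for term in query_terms)


query = ["centos", "升", "級"]       # assumed analyzer output for "centos升級"
doc_a = ["centos", "7", "升", "級"]  # hypothetical title containing every term
doc_b = ["centos", "安", "裝"]       # hypothetical title containing only "centos"

print(matches(doc_a, query, "and"))
print(matches(doc_b, query, "and"))
print(matches(doc_b, query, "or"))
```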

multi_match query

The multi_match query runs the same match query on multiple fields:

{
    "multi_match": {
        "query":    "full text search",
        "fields":   [ "title", "body" ]
    }
}

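match_phrase query (phrase query)

The match_phrase query analyzes the query text into terms (the analyzer can be customized). A document is returned only if it satisfies both of the following conditions:

All terms produced by analysis appear in the field
The terms appear in the field in the same order

(1) Create the index and insert data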

PUT test

PUT test/hello/1
{ "content":"World Hello"}

PUT test/hello/2
{ "content":"Hello World"}

PUT test/hello/3
{ "content":"I just said hello world"}

(2) Query "hello world" with match_phrase

GET test/_search
{
  "query": {
    "match_phrase": {
      "content": "hello world"
    }
  }
}

The last two documents match and are returned; document 1 has a different term order from the query, so it does not match.
{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "test",
        "_type": "hello",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "content": "Hello World"
        }
      },
      {
        "_index": "test",
        "_type": "hello",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "content": "I just said hello world"
        }
      }
    ]
  }
}
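The outcome above can be reproduced with a small model of phrase matching. The sketch assumes that lowercasing and splitting on whitespace stands in for the real analyzer, and that the phrase terms must appear next to each other (the default behaviour when no slop is set). The helper names tokenize and phrase_match are hypothetical.

```python
def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for the real analyzer.
    return text.lower().split()


def phrase_match(text, phrase):
    """True if the phrase terms appear consecutively and in order in the text."""
    doc, terms = tokenize(text), tokenize(phrase)
    n = len(terms)
    return any(doc[i:i + n] == terms for i in range(len(doc) - n + 1))


docs = {"1": "World Hello", "2": "Hello World", "3": "I just said hello world"}
matched = [doc_id for doc_id, content in docs.items()
           if phrase_match(content, "hello world")]
print(matched)
```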

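range query

The range query finds numbers or dates that fall within a specified range:

gt   greater than
gte  greater than or equal to
lt   less than
lte  less than or equal to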

{
    "range": {
        "age": {
            "gte":  20,
            "lt":   30
        }
    }
}
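The four keywords map directly onto comparison operators. A minimal sketch of that mapping, with a hypothetical helper named in_range that checks a value against a set of bounds:

```python
import operator

# Map each range keyword to the comparison it stands for.
OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def in_range(value, bounds):
    """True if value satisfies every bound, e.g. {"gte": 20, "lt": 30}."""
    return all(OPS[name](value, limit) for name, limit in bounds.items())


bounds = {"gte": 20, "lt": 30}
print([age for age in (19, 20, 29, 30) if in_range(age, bounds)])
```

With the bounds from the query above, 20 is included and 30 is excluded.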

term query

The term query matches exact values, which may be numbers, dates, Booleans, or not_analyzed strings.
It does not analyze its input text, so it searches for exactly the value supplied:

{ "term": { "age":    26           }}
{ "term": { "date":   "2014-09-01" }}
{ "term": { "public": true         }}
{ "term": { "tag":    "full_text"  }}

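terms query

The terms query is the same as the term query, but lets you specify multiple values to match. If the field contains any of the specified values, the document matches: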

{ "terms": { "tag": [ "search", "full_text", "nosql" ] }}
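Because neither query analyzes its input, matching comes down to exact equality against the stored values. The sketch below models that with plain Python lists; the tag values and the helper names term_match and terms_match are assumptions for illustration.

```python
def term_match(field_values, value):
    """term: the stored value must equal the query value exactly."""
    return value in field_values


def terms_match(field_values, values):
    """terms: at least one of the listed values must be present."""
    return any(value in field_values for value in values)


tags = ["full_text", "lucene"]  # hypothetical not_analyzed tag values of one document

print(term_match(tags, "full_text"))
print(term_match(tags, "Full_Text"))  # differs in case, so no exact match
print(terms_match(tags, ["search", "full_text", "nosql"]))
```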

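exists query and missing query

The exists and missing queries find documents in which the specified field has a value (exists) or has no value (missing).
They are similar in nature to IS NULL (missing) and IS NOT NULL (exists) in SQL.
In versions where the missing query is no longer available, the same effect is obtained by placing an exists query inside the must_not clause of a bool query: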

{
    "exists":   {
        "field":    "title"
    }
}

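Compound queries

Use the bool query to combine multiple queries. It accepts the following parameters:

must
Documents must match these clauses to be included.
must_not
Documents must not match these clauses to be included.
should
If a document matches any of these clauses, its _score is increased; otherwise they have no effect. They are mainly used to refine the relevance score of each document.
filter
Documents must match, but the clause runs in non-scoring, filtering mode. These clauses do not contribute to the score; they only include or exclude documents according to the filter criteria.

The query below finds documents whose title field matches how to make millions
and that are not marked as spam.
Documents that are marked starred or dated 2014 or later rank higher than the others,
and documents that satisfy both rank higher still: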

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }},
            { "range": { "date": { "gte": "2014-01-01" }}}
        ]
    }
}

Adding a filter clause

If we do not want the document's date to affect the score, we can rewrite the previous example with a filter clause:

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }}
        ],
        "filter": {
          "range": { "date": { "gte": "2014-01-01" }} 
        }
    }
}
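When such queries are built in application code, it can help to assemble the bool clauses programmatically and serialize the result as the request body. The sketch below uses only the standard library; the helper bool_query is hypothetical and simply omits clauses that were not supplied.

```python
import json


def bool_query(must=None, must_not=None, should=None, filter_=None):
    """Assemble a bool query, leaving out clauses that were not supplied."""
    clauses = {"must": must, "must_not": must_not, "should": should, "filter": filter_}
    return {"bool": {name: value for name, value in clauses.items() if value is not None}}


query = bool_query(
    must={"match": {"title": "how to make millions"}},
    must_not={"match": {"tag": "spam"}},
    should=[{"match": {"tag": "starred"}}],
    filter_={"range": {"date": {"gte": "2014-01-01"}}},
)
body = {"query": query}  # the JSON body that would accompany GET /_search
print(json.dumps(body, indent=2))
```

Moving the date range from should to filter_ is the only change needed to switch between the two variants shown in this section.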

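Validating queries

The validate-query API checks whether a query is valid. The request below is deliberately malformed (the field name and the match keyword are swapped), so the response reports "valid": false: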

GET /gb/tweet/_validate/query
{
   "query": {
      "tweet" : {
         "match" : "really powerful"
      }
   }
}

{
  "valid" :         false,
  "_shards" : {
    "total" :       1,
    "successful" :  1,
    "failed" :      0
  }
}

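Understanding queries

Adding the explain parameter to a validate request returns a description of how the query is interpreted for each index: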

GET /cars/transactions/_validate/query?explain
{
  "query": {
    "match": {
      "make": "toyota"
    }
  }  
}

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "cars",
      "valid": true,
      "explanation": "+make:toyota #*:*"
    }
  ]
}