Elasticsearch: Structured Search, keyword, and Term Queries

Date 2021-03-17

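Preface

Structured search in Elasticsearch means searching data such as numbers, dates, times, and booleans. These types have a precise format, so they are usually queried with term-level exact matching (term) or prefix matching (prefix). This article also explains the "text" and "keyword" field types introduced in newer versions, as well as the Term query.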

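Structured search

Structured search means searching data that has inherent structure. Dates, times, and numbers are all structured: they have a precise format, and we can apply logical operations to them. Common operations include comparing ranges of numbers or times, determining which of two values is larger, and prefix matching.

Text can be structured too. A set of colored pens has a discrete set of colors: red, green, blue. A blog post may be tagged with the keywords distributed and search. Products on an e-commerce site have UPCs (Universal Product Codes) or some other unique identifier, and these must follow a strictly defined, structured format.

With a structured query, the answer for each document is simply "yes" or "no": it either matches or it does not. You can decide per use case whether a structured search needs scoring, but usually it does not.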

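Exact-value lookup

Let's start with the following example: create and index some documents that represent products. The documents have the fields price, productID, show, createdAt, and tags (price, product ID, whether the product is shown, creation time, and tag information). Only the first document has a tags value.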

POST products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10, "productID" : "XHDK-A-1293-#fJ3", "show":true, "createdAt":"2021-03-03", "tags":"abc" }
{ "index": { "_id": 2 }}
{ "price" : 20, "productID" : "KDKE-B-9947-#kL5", "show":true, "createdAt":"2021-03-04" }
{ "index": { "_id": 3 }}
{ "price" : 30, "productID" : "JODL-X-1937-#pV7", "show":false, "createdAt":"2021-03-05"}
{ "index": { "_id": 4 }}
{ "price" : 30, "productID" : "QQPX-R-3956-#aD8", "show":true, "createdAt":"2021-03-06"}

Numbers

Suppose we want to find all products with a particular price, say 20 yuan. We can use a term query:

GET products/_search
{
  "query": {
    "term": {
      "price": 20
    }
  }
}

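When looking up an exact value, we usually do not want to compute a score for the query; we only want to include or exclude documents. So we wrap the term query in a constant_score query, which runs it in non-scoring (filter) mode and assigns a uniform score of 1.0.

The resulting combination is a constant_score query that contains a term query: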

GET products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "price": 20
        }
      }
    }
  }
}
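
To make the "include or exclude, then assign one score" behaviour concrete, here is a minimal Python sketch. It does not call Elasticsearch; PRODUCTS and constant_score are hypothetical stand-ins that only mimic the effect on the sample prices.

```python
# Hypothetical in-memory stand-in for the four sample products.
PRODUCTS = [
    {"_id": 1, "price": 10},
    {"_id": 2, "price": 20},
    {"_id": 3, "price": 30},
    {"_id": 4, "price": 30},
]


def constant_score(docs, predicate, boost=1.0):
    """Keep documents that pass the filter and give each the same score."""
    return [(doc["_id"], boost) for doc in docs if predicate(doc)]


hits = constant_score(PRODUCTS, lambda doc: doc.get("price") == 20)
print(hits)  # [(2, 1.0)]
```

Every hit carries the same score, so the result is a plain yes/no filter with no relevance ranking.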

For numbers, range queries are also common:

GET products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 10,
            "lte": 20
          }
        }
      }
    }
  }
}

Options supported by range

gt: > greater than
lt: < less than
gte: >= greater than or equal to
lte: <= less than or equal to
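
The four options map directly onto ordinary comparison operators. The sketch below is a plain Python illustration of that mapping applied to the sample prices; RANGE_OPS and in_range are hypothetical names, not part of any Elasticsearch client.

```python
import operator

# Map each range option to the comparison it stands for.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """True when value satisfies every bound, e.g. {"gte": 10, "lte": 20}."""
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())


prices = {1: 10, 2: 20, 3: 30, 4: 30}  # _id -> price, from the sample documents
matched = [
    doc_id
    for doc_id, price in prices.items()
    if in_range(price, {"gte": 10, "lte": 20})
]
print(matched)  # [1, 2]
```

Both bounds are inclusive here, so the products priced 10 and 20 are returned; switching to gt and lt would exclude both.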

Booleans

GET products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "show": true
        }
      }
    }
  }
}

Dates

Search for documents within a given time range:

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "createdAt": {
            "gte": "now-9d"
          }
        }
      }
    }
  }
}

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "createdAt": {
            "gte": "2021-01-05"
          }
        }
      }
    }
  }
}

Date math units

y years
M months
w weeks
d days
H/h hours
m minutes
s seconds
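
To show how an expression such as "now-9d" resolves to a concrete instant, here is a small Python sketch. It is an approximation under stated assumptions: resolve is a hypothetical helper, it handles only the fixed-length units (w, d, h/H, m, s), and it omits months, years, and rounding, which need calendar arithmetic.

```python
import re
from datetime import datetime, timedelta

# Fixed-length units only; months (M) and years (y) need calendar arithmetic
# and are deliberately left out of this sketch.
UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "H": 3600, "m": 60, "s": 1}


def resolve(expr, now):
    """Resolve expressions such as 'now-9d' or 'now+1h-30m' against `now`."""
    if not expr.startswith("now"):
        raise ValueError("this sketch only handles expressions anchored at 'now'")
    rest = expr[3:]
    steps = re.findall(r"([+-])(\d+)([wdhHms])", rest)
    # Reject anything the pattern did not fully consume (e.g. 'now-1M').
    if "".join(sign + num + unit for sign, num, unit in steps) != rest:
        raise ValueError(f"unsupported expression: {expr}")
    result = now
    for sign, amount, unit in steps:
        delta = timedelta(seconds=int(amount) * UNIT_SECONDS[unit])
        result = result + delta if sign == "+" else result - delta
    return result


now = datetime(2021, 3, 17)
print(resolve("now-9d", now))  # 2021-03-08 00:00:00
```

With the reference time fixed at 2021-03-17, "gte": "now-9d" would therefore mean "created on or after 2021-03-08".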

Text

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "productID.keyword": [
            "XHDK-A-1293-#fJ3",
            "KDKE-B-9947-#kL5"
          ]
        }
      }
    }
  }
}

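The "keyword" in "productID.keyword" is not a reserved word. It is the name of a sub-field that Elasticsearch generates automatically for "productID" when the document is inserted without an explicit mapping.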

Handling null

To test that a field exists, use "exists"; to test that it does not, combine "must_not" with "exists".

// Documents that have a "tags" field
POST products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "exists": {
                    "field":"tags"
                }
            }
        }
    }
}

// Documents without a "tags" field. Older versions used the "missing" query, which has since been removed
POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must_not": {
            "exists": {
              "field": "tags"
            }
          }
        }
      }
    }
  }
}
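
The following Python sketch approximates what "exists" and its negation select. It rests on a simplifying assumption: a field counts as present only when it holds at least one non-null value, so null and an empty array are treated as missing. has_value is a hypothetical helper, and documents 3 and 4 are altered from the sample data to show those two cases.

```python
def has_value(doc, field):
    """Approximate `exists`: the field must hold at least one non-null value."""
    value = doc.get(field)
    if isinstance(value, list):
        return any(item is not None for item in value)
    return value is not None


# Document 1 matches the sample data; 3 and 4 are hypothetical variations.
PRODUCTS = [
    {"_id": 1, "tags": "abc"},
    {"_id": 2},
    {"_id": 3, "tags": None},
    {"_id": 4, "tags": []},
]

with_tags = [doc["_id"] for doc in PRODUCTS if has_value(doc, "tags")]
without_tags = [doc["_id"] for doc in PRODUCTS if not has_value(doc, "tags")]
print(with_tags)     # [1]
print(without_tags)  # [2, 3, 4]
```

The second list corresponds to the must_not + exists combination: it is simply the complement of the first.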

Note: do not use "missing" in newer versions. It has been removed; use "must_not" to negate "exists" instead.
Using "missing" fails with the following error:

"reason": "no [query] registered for [missing]"

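keyword

In 2.x, text was stored in string fields.
From 5.0 onward, the string type is deprecated and replaced by the text and keyword types, both of which can store strings.

"text" is for full-text search and "keyword" is for structured search. A "keyword" is similar to an enum in Java. In newer versions, if you do not create a mapping yourself, a string value is automatically mapped as "text", together with a sub-field named "keyword" whose type is "keyword".

In terms of storage, a "text" value is split into terms by an analyzer, while a "keyword" value is kept as-is. Take "Rabbit is jumping": as "text" it is stored as separate lowercased terms, "rabbit", "is", "jumping" with the default standard analyzer, or possibly just "rabbit", "jump" with a stemming analyzer such as english. As "keyword" it is stored as the single term "Rabbit is jumping".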
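
The difference in what gets indexed can be sketched in a few lines of Python. This is only a rough approximation: analyze_text imitates the default standard analyzer by splitting on non-word characters and lowercasing, with no stemming or stop-word removal, and both function names are hypothetical.

```python
import re


def analyze_text(value):
    """Rough approximation of the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def analyze_keyword(value):
    """A keyword field indexes the whole string as one term."""
    return [value]


print(analyze_text("Rabbit is jumping"))     # ['rabbit', 'is', 'jumping']
print(analyze_keyword("Rabbit is jumping"))  # ['Rabbit is jumping']
```

A stemming analyzer would reduce the text terms further, but the keyword side never changes: one term, byte for byte.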

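Term queries

In Elasticsearch, a term query does not analyze its input. It treats the input as a single unit, looks up that exact term in the inverted index, and uses the relevance scoring formula to score every document that contains the term.

For example, the value above ("productID": "QQPX-R-3956-#aD8") is tokenized in the text field "productID" as "qqpx", "r", "3956", "ad8".

"productID.keyword" is of type keyword, so even a match query against it ends up as a Term query.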

// Only "productID.keyword": "qqpx-r-3956-#ad8" returns no data; every other variant returns hits.
// Variants to try in place of the field/value pair below:
//   "productID": "QQPX-R-3956-#aD8"
//   "productID": "qqpx"
//   "productID": "qqpx-r-3956-#ad8"
//   "productID.keyword": "QQPX-R-3956-#aD8"
GET products/_search
{
  "query": {
    "match": {
      "productID.keyword": "qqpx-r-3956-#ad8"
    }
  }
}

// Only "productID": "qqpx" and "productID.keyword": "QQPX-R-3956-#aD8" return data; the other variants do not.
// Variants to try in place of the field/value pair below:
//   "productID": "qqpx"
//   "productID": "qqpx-r-3956-#ad8"
//   "productID.keyword": "QQPX-R-3956-#aD8"
//   "productID.keyword": "qqpx-r-3956-#ad8"
GET products/_search
{
  "query": {
    "term": {
      "productID": "QQPX-R-3956-#aD8"
    }
  }
}
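
The outcomes of the two requests above can be reproduced with a small in-memory model. The Python sketch below is an approximation, not the Elasticsearch implementation: analyze roughly imitates the standard analyzer, and INDEXED, term_query, and match_query are hypothetical names modelling a single document.

```python
import re


def analyze(value):
    """Rough approximation of the standard analyzer (words, lowercased)."""
    return [token.lower() for token in re.findall(r"\w+", value)]


PRODUCT_ID = "QQPX-R-3956-#aD8"

# Terms stored in the inverted index for each field of the sample document.
INDEXED = {
    "productID": set(analyze(PRODUCT_ID)),  # text field: analyzed tokens
    "productID.keyword": {PRODUCT_ID},      # keyword sub-field: original string
}


def term_query(field, value):
    """term: look the input up as-is, with no analysis."""
    return value in INDEXED[field]


def match_query(field, value):
    """match: analyze the input with the field's analyzer, then match any term."""
    terms = analyze(value) if field == "productID" else [value]
    return any(term in INDEXED[field] for term in terms)


inputs = ["QQPX-R-3956-#aD8", "qqpx", "qqpx-r-3956-#ad8"]
for field in INDEXED:
    for value in inputs:
        print(field, repr(value),
              "term:", term_query(field, value),
              "match:", match_query(field, value))
```

On the text field, term succeeds only for an input that is already one of the stored tokens, while match succeeds for all three inputs because it analyzes them first. On the keyword sub-field, both queries behave the same and require the exact original string.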