Introduction to Common Elasticsearch DSL Syntax

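Course environment

CentOS 7.3 x64
JDK version: 1.8 (minimum requirement); recommended: JDK 1.8.0_121
Elasticsearch version: 5.2.0
Baidu Cloud download link for the related packages (password: 0yzd): http://pan.baidu.com/s/1qXQXZRm
Note: for the Elasticsearch and Kibana installation steps, see the Linux tutorial on my GitHub: https://github.com/judasn/Linux-Tutorial/blob/master/ELK-Install-And-Settings.md
Install both Elasticsearch and Kibana. All commands in the rest of this tutorial are run in Kibana's Dev Tools.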

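About the DSL

This is the approach used most often in practice, and it can express complex query conditions.
There is no need to start by working out how to call Elasticsearch from a Java client. Once you know the DSL, the client is only a matter of usage.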

Validating DSL statements and how the score is calculated

For complex queries, it is best to validate them first and check for errors.

GET /product_index/product/_validate/query?explain
{
  "query": {
    "match": { "product_name": "toothbrush" }
  }
}

Basic DSL usage

Query all products:

GET /product_index/product/_search
{
  "query": { "match_all": {} }
}

Query products whose name contains toothbrush, sorted by price in descending order:

GET /product_index/product/_search
{
  "query": {
    "match": { "product_name": "toothbrush" }
  },
  "sort": [
    { "price": "desc" }
  ]
}

Paginated product query. from is the zero-based offset of the first product to return, and size is the number of results to return:

GET /product_index/product/_search
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 1
}

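The from offset for a given page follows directly from the page number. Below is a minimal Python sketch, assuming 1-based page numbers; page_to_from_size and paged_match_all are hypothetical helper names, not part of any Elasticsearch client.

```python
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into the from/size pair used by _search."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # "from" is a zero-based offset, so page 1 starts at offset 0.
    return {"from": (page - 1) * page_size, "size": page_size}


def paged_match_all(page, page_size):
    """Build a match_all request body for the given page (hypothetical helper)."""
    body = {"query": {"match_all": {}}}
    body.update(page_to_from_size(page, page_size))
    return body


print(paged_match_all(3, 10))
```

The printed dictionary can be serialized to JSON and sent as the body of the _search request shown above.
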
Return only specific fields in the results:

GET /product_index/product/_search
{
  "query": { "match_all": {} },
  "_source": [ "product_name", "price" ]
}

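Range operators (official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html)

Operator	Meaning
gte	greater than or equal to
gt	greater than
lte	less than or equal to
lt	less than

Search for products whose name contains toothbrush and whose price is greater than 400 and less than 700: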

GET /product_index/product/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "product_name": "toothbrush" }
      },
      "filter": {
        "range": { "price": { "gt": 400, "lt": 700 } }
      }
    }
  }
}

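full-text search (full-text retrieval backed by the inverted index)
A document appears in the results as long as it matches any one of the terms produced by analyzing the query; the better the match, the higher it ranks.
For example, the query PHILIPS toothbrush is split into two terms: PHILIPS and toothbrush. Any document whose product_name contains either term is returned, and documents that contain both terms rank first.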

GET /product_index/product/_search
{
  "query": {
    "match": { "product_name": "PHILIPS toothbrush" }
  }
}

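phrase search
A document appears in the results only if it matches all of the terms produced by analyzing the query.
For example, the query PHILIPS toothbrush is split into two terms: PHILIPS and toothbrush. Only documents that contain both terms, adjacent and in the same order, are returned.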

GET /product_index/product/_search
{
  "query": {
    "match_phrase": { "product_name": "PHILIPS toothbrush" }
  }
}

Highlight search
Wraps each matched query term in an HTML highlight tag, producing results such as: "<em>PHILIPS</em> <em>toothbrush</em> HX6730/02"

GET /product_index/product/_search
{
  "query": {
    "match": { "product_name": "PHILIPS toothbrush" }
  },
  "highlight": {
    "fields": { "product_name": {} }
  }
}

range usage, for querying numeric or date ranges:

GET /product_index/product/_search
{
  "query": {
    "range": { "price": { "gte": 30.00 } }
  }
}

match usage (compare with term):
The query text is analyzed, and any document that contains at least one of the resulting terms is returned.

GET /product_index/product/_search
{
  "query": {
    "match": { "product_name": "PHILIPS toothbrush" }
  }
}

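match has another mode in which a document must contain every term produced by analysis, rather than any one of them as above. (This is common, so it matters.)
As the JSON below suggests, the only real difference between the query above and the one below is the operator: above it is or, below it is and.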

GET /product_index/product/_search
{
  "query": {
    "match": {
      "product_name": {
        "query": "PHILIPS toothbrush",
        "operator": "and"
      }
    }
  }
}

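match has yet another mode, in which a document must match a given percentage of the analyzed terms. For example, suppose the search text is analyzed into java 程序員 書 推薦, which is 4 terms. To return documents that hit 50% of them, that is, two terms, we can do this:
Of course, the same requirement can also be met by combining must, must_not and should clauses against the same field.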
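
To see how a percentage turns into a required number of terms: Elasticsearch rounds the computed number down. The Python sketch below illustrates that arithmetic only. It assumes a plain positive percentage such as 50% (negative values and combined forms are ignored), and required_matches is a hypothetical helper name.

```python
def required_matches(term_count, percentage):
    """Number of terms that must match for a positive percentage such as "50%"."""
    if not percentage.endswith("%"):
        raise ValueError("expected a value like '50%'")
    percent = int(percentage[:-1])
    if not 0 <= percent <= 100:
        raise ValueError("this sketch only handles 0% to 100%")
    # Integer arithmetic avoids float rounding; the result is rounded down.
    return term_count * percent // 100


print(required_matches(4, "50%"))  # 4 analyzed terms, half of them required
```

With 3 analyzed terms, the same 50% requires only 1 matching term, because 1.5 is rounded down.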

GET /product_index/product/_search
{
  "query": {
    "match": {
      "product_name": {
        "query": "java 程序員 書 推薦",
        "minimum_should_match": "50%"
      }
    }
  }
}

multi_match usage:
Return every document that contains the keyword toothbrush in either the product_name or the product_desc field.

GET /product_index/product/_search
{
  "query": {
    "multi_match": {
      "query": "toothbrush",
      "fields": [ "product_name", "product_desc" ]
    }
  }
}

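multi_match across multiple fields with type cross_fields: the listed fields are treated as one combined field, and with operator and every query term must appear in at least one of them.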

GET /product_index/product/_search
{
  "query": {
    "multi_match": {
      "query": "PHILIPS toothbrush",
      "type": "cross_fields",
      "operator": "and",
      "fields": [ "product_name", "product_desc" ]
    }
  }
}

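match_phrase usage (phrase search) (compare with match):
The query text is matched as a whole phrase: a document is returned only if it contains all of the query terms, adjacent and in the same order.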

GET /product_index/product/_search
{
  "query": {
    "match_phrase": { "product_name": "PHILIPS toothbrush" }
  }
}

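match_phrase + slop (compare with match_phrase):
Before explaining slop, note that the original data is PHILIPS toothbrush HX6730/02, which is analyzed into at least three terms: PHILIPS, toothbrush and HX6730.
As described above, match_phrase requires the query to match the phrase exactly, and PHILIPS HX6730 clearly does not.
Sometimes, though, this kind of inexact match is exactly what we want: the terms only need to be reasonably close to each other, and a few words in between are fine. That is what slop is for. slop = 2 means that terms separated by up to 2 words still count as a match.
Strictly speaking it is not a gap but a number of moves: slop is how many positions the analyzed query terms may be shifted in order to line up with the document. So HX6730 PHILIPS can also match the document, though it needs a larger slop (slop = 5 is enough).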
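
The idea of slop as a number of moves can be modeled in a few lines of Python. This is a simplified sketch, not Lucene's actual phrase scorer: it assumes every query term occurs exactly once in the document, and the token list for PHILIPS toothbrush HX6730/02 is an assumed analysis result. required_slop is a hypothetical helper name.

```python
def required_slop(query_terms, doc_terms):
    """Smallest slop under a simplified position model.

    Assumes every query term occurs exactly once in the document.
    """
    if not query_terms:
        raise ValueError("query_terms must not be empty")
    shifts = []
    for query_pos, term in enumerate(query_terms):
        if doc_terms.count(term) != 1:
            raise ValueError("term %r must occur exactly once in the document" % term)
        # How far this term sits from the position the query expects.
        shifts.append(doc_terms.index(term) - query_pos)
    # The spread between the largest and smallest shift is the number of moves.
    return max(shifts) - min(shifts)


# Assumed tokens for the document "PHILIPS toothbrush HX6730/02".
doc = ["philips", "toothbrush", "hx6730", "02"]
print(required_slop(["philips", "hx6730"], doc))  # query terms in document order
print(required_slop(["hx6730", "philips"], doc))  # reversed order needs more moves
```

In this model, swapping the order of the query terms always costs extra moves, which is why a reversed query needs a noticeably larger slop than the forward one.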

GET /product_index/product/_search
{
  "query": {
    "match_phrase": {
      "product_name": {
        "query": "PHILIPS HX6730",
        "slop": 1
      }
    }
  }
}

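Combining match + match_phrase + slop, to make results both more precise and more numerous
match_phrase is slower than match, so the usual approach is to filter with match first, then use match_phrase to refine the matches and rescore them. This is where rescore comes in; window_size means that the top 10 results on each shard are rescored.
The first query below does not rescore; the second one does.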

GET /product_index/product/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "product_name": { "query": "PHILIPS HX6730" } }
      },
      "should": {
        "match_phrase": { "product_name": { "query": "PHILIPS HX6730", "slop": 10 } }
      }
    }
  }
}

GET /product_index/product/_search
{
  "query": {
    "match": { "product_name": "PHILIPS HX6730" }
  },
  "rescore": {
    "window_size": 10,
    "query": {
      "rescore_query": {
        "match_phrase": { "product_name": { "query": "PHILIPS HX6730", "slop": 10 } }
      }
    }
  }
}

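match_phrase_prefix usage (not common), typically used for search-as-you-type suggestions like the Google search box
max_expansions limits how many terms the prefix may expand to, which helps performance
Overall performance is still poor, however, because the whole inverted index is still scanned. edge_ngram is the recommended way to build this feature.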
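
The reason edge_ngram helps is that the prefixes are generated once at index time, so a prefix lookup becomes an ordinary exact term lookup. The Python sketch below only illustrates which tokens such an approach would index; it is not the Elasticsearch implementation. The parameter names min_gram and max_gram mirror the edge_ngram settings, while the default values used here and the function name edge_ngrams are assumptions for illustration.

```python
def edge_ngrams(term, min_gram=1, max_gram=20):
    """Return the prefixes of term whose lengths fall in [min_gram, max_gram]."""
    if min_gram < 1 or max_gram < min_gram:
        raise ValueError("need 1 <= min_gram <= max_gram")
    longest = min(len(term), max_gram)
    # Each prefix becomes its own indexed token.
    return [term[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("hx6730", min_gram=1, max_gram=3))
```

A user who has typed hx so far then hits the indexed token hx directly, without any scan over the term dictionary.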

GET /product_index/product/_search
{
  "query": {
    "match_phrase_prefix": {
      "product_name": {
        "query": "PHILIPS HX",
        "slop": 5,
        "max_expansions": 20
      }
    }
  }
}

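term usage (compare with match). term is generally used on fields that are not analyzed, because it is an exact-match query: if the queried field is analyzed, its content is split into separate terms, which no longer line up with the exact value being searched for.
So when you define a mapping yourself, fields that should not be analyzed are best declared as not analyzed.
In fact, since Elasticsearch 5.X an analyzed text field also gets, by default, an extra sub-field named keyword. Its type is keyword, it is not analyzed, and by default it keeps values of up to 256 characters. Assuming product_name is an analyzed field, there is a non-analyzed product_name.keyword field, and this sub-field can be used for exact-match queries.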
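
As an illustration of targeting that sub-field, the Python sketch below builds a request body for an exact match. It assumes product_name was mapped with the default keyword sub-field described above; exact_match_body is a hypothetical helper, and the sample value is the product name used elsewhere in this article.

```python
import json


def exact_match_body(field, value, use_keyword_subfield=True):
    """Build a constant_score/term body, optionally against <field>.keyword."""
    # Assumption: the field has a non-analyzed sub-field named "keyword".
    target = field + ".keyword" if use_keyword_subfield else field
    return {"query": {"constant_score": {"filter": {"term": {target: value}}}}}


body = exact_match_body("product_name", "PHILIPS toothbrush HX6730/02")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into Kibana's Dev Tools as the body of a _search request.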

GET /product_index/product/_search
{
  "query": {
    "term": { "product_name": "PHILIPS toothbrush" }
  }
}

GET /product_index/product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "product_name": "PHILIPS toothbrush" }
      }
    }
  }
}

terms usage, similar to IN in a database

GET /product_index/product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": { "product_name": [ "toothbrush", "shell" ] }
      }
    }
  }
}

Differences between query and filter

query only:

GET /product_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "terms": { "product_name": [ "PHILIPS", "toothbrush" ] } },
        { "range": { "price": { "gt": 12.00 } } }
      ]
    }
  }
}

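filter only:
The statement below uses the constant_score query, which can wrap a query or a filter. It assigns a score of 1 to every matching document and ignores TF/IDF, so no relevance score has to be calculated.
A different score can also be specified: https://www.elastic.co/guide/cn/elasticsearch/guide/current/ignoring-tfidf.html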

GET /product_index/product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": { "price": { "gte": 30.00 } }
      }
    }
  }
}

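When query and filter are used together, the filter runs first; see the section on combining multiple search conditions later in this article.
Official documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/_queries_and_filters.html
In terms of search results:
- filter: returns only the data that matches the conditions, without calculating a relevance score
- query: returns the data that matches the conditions and calculates a relevance score, sorting by score in descending order
In terms of performance:
- filter (faster, no sorting): no relevance score is calculated, so no sorting is needed, and the results of the most frequently used filters are cached automatically
  - What is cached is not the document content but a bitset; see: https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_finding_exact_values.html#_internal_filter_operation
- query (slower, sorted): calculates a relevance score and sorts by score in descending order, with no result caching
- Using filter and query together combines the strengths of both, so choose according to your business requirements.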
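
The bitset mentioned above can be pictured with a toy model: one bit per document, set when the document passes the filter, so combining two cached filters is a single bitwise AND. The Python sketch below is only that toy model, not the data structure Elasticsearch actually uses; the document IDs and helper names are made up for illustration.

```python
def to_bitset(matching_doc_ids):
    """Pack the IDs of matching documents into an integer used as a bitset."""
    bits = 0
    for doc_id in matching_doc_ids:
        bits |= 1 << doc_id
    return bits


def doc_ids(bits):
    """Unpack a bitset back into a sorted list of document IDs."""
    ids = []
    doc_id = 0
    while bits:
        if bits & 1:
            ids.append(doc_id)
        bits >>= 1
        doc_id += 1
    return ids


price_filter = to_bitset([0, 2, 3])  # hypothetical docs passing a range filter
name_filter = to_bitset([2, 3, 5])   # hypothetical docs passing a term filter
both = price_filter & name_filter    # documents that pass both filters
print(doc_ids(both))
```

Because no score is involved, the cached bitset can be reused for any later request that contains the same filter.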

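Combining multiple search conditions (most commonly used)

bool can contain: must (must match, similar to = in a database), must_not (must not match, similar to != in a database), should (matching is not mandatory, similar to or in a database), and filter (filtering)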

GET /product_index/product/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "product_name": "PHILIPS toothbrush" } }
      ],
      "should": [
        { "match": { "product_desc": "刷頭" } }
      ],
      "must_not": [
        { "match": { "product_name": "HX6730" } }
      ],
      "filter": {
        "range": { "price": { "gte": 33.00 } }
      }
    }
  }
}

GET /product_index/product/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "product_name": "飛利浦" } },
        {
          "bool": {
            "must": [
              { "term": { "product_desc": "刷頭" } },
              { "term": { "price": 30 } }
            ]
          }
        }
      ]
    }
  }
}

should has one special property: if the bool query has no must clause, at least one of the should clauses must match. We can also use minimum_should_match to require more of them to match.

GET /product_index/product/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "product_name": "java" } },
        { "match": { "product_name": "程序員" } },
        { "match": { "product_name": "書" } },
        { "match": { "product_name": "推薦" } }
      ],
      "minimum_should_match": 3
    }
  }
}

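The example below also uses custom sorting.
Avoid sorting on string fields. A string field is analyzed, and by default Elasticsearch sorts on one of the analyzed terms, so the order is often not what we expect. The fix is to give the field an extra sub-field named raw (under fields) in the mapping, leave that sub-field unanalyzed, and sort on raw instead. See: https://www.elastic.co/guide/cn/elasticsearch/guide/current/multi-fields.html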

GET /product_index/product/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "product_name": "PHILIPS toothbrush" } }
      ],
      "should": [
        { "match": { "product_desc": "刷頭" } }
      ],
      "filter": {
        "bool": {
          "must": [
            { "range": { "price": { "gte": 33.00 } } },
            { "range": { "price": { "lte": 555.55 } } }
          ],
          "must_not": [
            { "term": { "product_name": "HX6730" } }
          ]
        }
      }
    }
  },
  "sort": [
    { "price": { "order": "desc" } }
  ]
}

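boost usage (the default is 1). There is another need when controlling search precision: for example, results for PHILIPS toothbrush should rank above results for Braun toothbrush. We can do this: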

GET /product_index/product/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "product_name": "toothbrush" } }
      ],
      "should": [
        { "match": { "product_name": { "query": "PHILIPS", "boost": 4 } } },
        { "match": { "product_name": { "query": "Braun", "boost": 3 } } }
      ]
    }
  }
}

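dis_max usage, also called the best fields strategy.
Because the query keywords are analyzed, a default bool query across several fields gives a document in which every field matches one or a few terms a higher score than a document in which a single field matches all of the query terms. For us that is not precise enough. We want documents whose best field matches the most keywords to rank first, rather than documents in which each field merely matched one term. For that we can use dis_max.
dis_max alone is usually not enough, though; adding tie_breaker is recommended.
tie_breaker is a decimal between 0 and 1. The scores of the other matching queries are multiplied by tie_breaker and then added to the highest score selected by dis_max. So besides the best score, the scores of the other queries are also taken into account.
On top of dis_max, we can add parameters such as boost and minimum_should_match for more precision. minimum_should_match is used fairly often, because a result in which only one of the query terms matched is basically useless.
Official documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/_best_fields.html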
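
The effect of tie_breaker can be written as a small formula: final score = best score + tie_breaker * (sum of the other matching scores). The Python sketch below applies that formula to made-up sub-query scores; dis_max_score is a hypothetical helper, and real scores depend on the index contents.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Combine per-query scores the way dis_max with tie_breaker is described."""
    matching = [score for score in sub_scores if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    # The other matching queries contribute only a fraction of their score.
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical scores: product_name matched well, product_desc matched weakly.
print(dis_max_score([2.0, 1.0]))                   # only the best field counts
print(dis_max_score([2.0, 1.0], tie_breaker=0.2))  # weak field adds a little
```

With tie_breaker at 0 the result is the pure best-field score, and with tie_breaker at 1 it becomes the plain sum of all matching scores, so values in between trade one behavior off against the other.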

GET /product_index/product/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "product_name": "PHILIPS toothbrush" } },
        { "match": { "product_desc": "iphone shell" } }
      ],
      "tie_breaker": 0.2
    }
  }
}

GET /product_index/product/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "product_name": { "query": "PHILIPS toothbrush", "minimum_should_match": "50%", "boost": 3 }
          }
        },
        {
          "match": {
            "product_desc": { "query": "iphone shell", "minimum_should_match": "50%", "boost": 2 }
          }
        }
      ],
      "tie_breaker": 0.3
    }
  }
}

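prefix search (poor performance; scans the entire inverted index)
For example, given a non-analyzed field product_name and two docs with the values iphone-6 and iphone-7, searching for the prefix iphone finds both.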

GET /product_index/product/_search
{
  "query": {
    "prefix": {
      "product_name": { "value": "iphone" }
    }
  }
}

wildcard search (poor performance; scans the entire inverted index)

GET /product_index/product/_search
{
  "query": {
    "wildcard": {
      "product_name": { "value": "ipho*" }
    }
  }
}

regexp search (poor performance; scans the entire inverted index)

GET /product_index/product/_search
{
  "query": {
    "regexp": { "product_name": "iphone[0-9].+" }
  }
}

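fuzzy search (typo correction)
The fuzziness parameter is the maximum number of corrections (single-character edits) allowed per term. Elasticsearch accepts at most 2, so larger values do not help. AUTO picks the number of allowed edits automatically from the length of the term.
The keyword below is missing one letter o.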

GET /product_index/product/_search
{
  "query": {
    "match": {
      "product_name": {
        "query": "PHILIPS tothbrush",
        "fuzziness": "AUTO",
        "operator": "and"
      }
    }
  }
}

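To make the notion of a correction concrete, the Python sketch below counts the single-character edits between the misspelled and the intended word, and applies the length thresholds documented for AUTO (terms of 1 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). It uses plain Levenshtein distance as a simplification; Elasticsearch by default also counts a swap of two adjacent characters as one edit, which this sketch does not. Both function names are hypothetical.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Edits allowed for a term under the AUTO length thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


typo, intended = "tothbrush", "toothbrush"
print(edit_distance(typo, intended), auto_fuzziness(typo))
```

The misspelling is within the number of edits that AUTO allows for a term of that length, which is why the query above can still find toothbrush.
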