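In Elasticsearch full-text search, one of the queries we use most often is the Multi Match Query, which matches a query string against multiple fields. Elasticsearch supports 5 types of Multi Match, so let's take a closer look at how they differ.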
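According to the official documentation, the types are, in order: best_fields, most_fields, cross_fields, phrase and phrase_prefix.
Here we only consider the first three. The last two can be studied separately, so we skip them for now.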
PUT /gino_product
{
  "mappings": {
    "product": {
      "properties": {
        "productName": {
          "type": "string",
          "analyzer": "fulltext_analyzer",
          "copy_to": [ "bigSearchField" ]
        },
        "brandName": {
          "type": "string",
          "analyzer": "fulltext_analyzer",
          "copy_to": [ "bigSearchField" ],
          "fields": {
            "brandName_pinyin": {
              "type": "string",
              "analyzer": "pinyin_analyzer",
              "search_analyzer": "standard"
            },
            "brandName_keyword": {
              "type": "string",
              "analyzer": "keyword",
              "search_analyzer": "standard"
            }
          }
        },
        "sortName": {
          "type": "string",
          "analyzer": "fulltext_analyzer",
          "copy_to": [ "bigSearchField" ],
          "fields": {
            "sortName_pinyin": {
              "type": "string",
              "analyzer": "pinyin_analyzer",
              "search_analyzer": "standard"
            }
          }
        },
        "productKeyword": {
          "type": "string",
          "analyzer": "fulltext_analyzer",
          "copy_to": [ "bigSearchField" ]
        },
        "bigSearchField": {
          "type": "string",
          "analyzer": "fulltext_analyzer"
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "analysis": {
      "tokenizer": {
        "simple_pinyin": { "type": "pinyin", "first_letter": "none" }
      },
      "analyzer": {
        "fulltext_analyzer": { "type": "ik", "use_smart": true },
        "pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "simple_pinyin",
          "filter": [ "word_delimiter", "lowercase" ]
        }
      }
    }
  }
}
POST /gino_product/product/1
{
  "productName": "耐克女生運動輕跑鞋",
  "brandName": "耐克",
  "sortName": "鞋子",
  "productKeyword": "耐克,潮流,運動,輕跑鞋"
}

POST /gino_product/product/2
{
  "productName": "耐克女生休閒運動服",
  "brandName": "耐克",
  "sortName": "上衣",
  "productKeyword": "耐克,休閒,運動"
}

POST /gino_product/product/3
{
  "productName": "阿迪達斯女生冬季運動板鞋",
  "brandName": "阿迪達斯",
  "sortName": "鞋子",
  "productKeyword": "阿迪達斯,冬季,運動,板鞋"
}

POST /gino_product/product/4
{
  "productName": "阿迪達斯女生冬季運動夾克外套",
  "brandName": "阿迪達斯",
  "sortName": "上衣",
  "productKeyword": "阿迪達斯,冬季,運動,夾克,外套"
}
POST /gino_product/_search
{
  "query": {
    "multi_match": {
      "query": "運動",
      "fields": [
        "brandName^100",
        "brandName.brandName_pinyin^100",
        "brandName.brandName_keyword^100",
        "sortName^80",
        "sortName.sortName_pinyin^80",
        "productName^60",
        "productKeyword^20"
      ],
      "type": <multi-match-type>,
      "operator": "AND"
    }
  }
}
With any of the 3 types, the search returns all 4 products, and in the same order.
POST /gino_product/_search
{
  "query": {
    "multi_match": {
      "query": "運動 上衣",
      "fields": [
        "brandName^100",
        "brandName.brandName_pinyin^100",
        "brandName.brandName_keyword^100",
        "sortName^80",
        "sortName.sortName_pinyin^80",
        "productName^60",
        "productKeyword^20"
      ],
      "type": <multi-match-type>,
      "operator": "AND"
    }
  }
}
This time only cross_fields returns any results, while best_fields and most_fields return nothing. Why?
Using the validate API to compare the differences
POST /gino_product/_validate/query?rewrite=true
{
  "query": {
    "multi_match": {
      "query": "運動 上衣",
      "fields": [
        "brandName^100",
        "brandName.brandName_pinyin^100",
        "brandName.brandName_keyword^100",
        "sortName^80",
        "sortName.sortName_pinyin^80",
        "productName^60",
        "productKeyword^20"
      ],
      "type": <multi-match-type>,
      "operator": "AND"
    }
  }
}
With best_fields, each field is matched separately, using the analyzer and search_analyzer defined for that field in the mapping.
(+brandName:運動 +brandName:上衣)^100.0
| (+brandName.brandName_pinyin:運 +brandName.brandName_pinyin:動 +brandName.brandName_pinyin:上 +brandName.brandName_pinyin:衣)^100.0
| (+brandName.brandName_keyword:運 +brandName.brandName_keyword:動 +brandName.brandName_keyword:上 +brandName.brandName_keyword:衣)^100.0
| (+sortName:運動 +sortName:上衣)^80.0
| (+sortName.sortName_pinyin:運 +sortName.sortName_pinyin:動 +sortName.sortName_pinyin:上 +sortName.sortName_pinyin:衣)^80.0
| (+productName:運動 +productName:上衣)^60.0
| (+productKeyword:運動 +productKeyword:上衣)^20.0
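Read as a predicate, the rewritten query above requires all tokens to be found inside one and the same field. The following is a minimal Python sketch of that check, not Elasticsearch code. The token sets for product 2 are assumptions about what the ik analyzer might emit, and `field_centric_and` is a hypothetical helper name.

```python
# Assumed token sets for product 2; the real output of the ik analyzer may differ.
product_2 = {
    "brandName": {"耐克"},
    "sortName": {"上衣"},
    "productName": {"耐克", "女生", "休閒", "運動", "服"},
    "productKeyword": {"耐克", "休閒", "運動"},
}


def field_centric_and(doc, terms):
    """Return True if at least one single field contains every query term."""
    wanted = set(terms)
    return any(wanted <= tokens for tokens in doc.values())


# "運動" lives in productName/productKeyword and "上衣" lives in sortName,
# so no single field holds both tokens.
print(field_centric_and(product_2, ["運動", "上衣"]))
print(field_centric_and(product_2, ["運動"]))
```

Under these assumed tokens the two-term search fails while the one-term search succeeds, which is consistent with the search results observed earlier.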
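most_fields differs from best_fields in how relevance is scored: best_fields takes the highest score among the matching fields (max), whereas most_fields takes the sum of the scores of all matching fields (sum).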
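The max versus sum distinction can be illustrated with a few lines of Python. This is only a sketch: the per-field scores are made-up numbers, the function names are hypothetical, and refinements such as `tie_breaker` or coordination factors are deliberately left out.

```python
def best_fields_score(field_scores):
    """best_fields: keep only the score of the best matching field."""
    return max(field_scores.values(), default=0.0)


def most_fields_score(field_scores):
    """most_fields: add up the scores of every matching field."""
    return sum(field_scores.values())


# Made-up per-field scores for a single document.
scores = {"sortName": 8.0, "productName": 6.0, "productKeyword": 2.0}

print(best_fields_score(scores))  # 8.0
print(most_fields_score(scores))  # 16.0
```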
(
  (+brandName:運動 +brandName:上衣)^100.0
  (+brandName.brandName_pinyin:運 +brandName.brandName_pinyin:動 +brandName.brandName_pinyin:上 +brandName.brandName_pinyin:衣)^100.0
  (+brandName.brandName_keyword:運 +brandName.brandName_keyword:動 +brandName.brandName_keyword:上 +brandName.brandName_keyword:衣)^100.0
  (+sortName:運動 +sortName:上衣)^80.0
  (+sortName.sortName_pinyin:運 +sortName.sortName_pinyin:動 +sortName.sortName_pinyin:上 +sortName.sortName_pinyin:衣)^80.0
  (+productName:運動 +productName:上衣)^60.0
  (+productKeyword:運動 +productKeyword:上衣)^20.0
)
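For cross_fields, ES first rewrites the query by grouping the fields, and the grouping key is the search_analyzer. In our example, the three fields [brandName.brandName_pinyin, brandName.brandName_keyword, sortName.sortName_pinyin] have standard as their search_analyzer, while the remaining fields use fulltext_analyzer, so the fields end up in two groups.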
(
  (
    +(brandName.brandName_pinyin:運^100.0 | sortName.sortName_pinyin:運^80.0 | brandName.brandName_keyword:運^100.0)
    +(brandName.brandName_pinyin:動^100.0 | sortName.sortName_pinyin:動^80.0 | brandName.brandName_keyword:動^100.0)
    +(brandName.brandName_pinyin:上^100.0 | sortName.sortName_pinyin:上^80.0 | brandName.brandName_keyword:上^100.0)
    +(brandName.brandName_pinyin:衣^100.0 | sortName.sortName_pinyin:衣^80.0 | brandName.brandName_keyword:衣^100.0)
  )
  (
    +(productKeyword:運動^20.0 | brandName:運動^100.0 | sortName:運動^80.0 | productName:運動^60.0)
    +(productKeyword:上衣^20.0 | brandName:上衣^100.0 | sortName:上衣^80.0 | productName:上衣^60.0)
  )
)
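The grouping and the per-term matching seen in the rewritten query can be sketched in Python as follows. This is an illustration, not how Elasticsearch implements it: the search analyzers are read off the mapping above, the token sets for product 2 are assumptions about the ik output, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Search-time analyzer of each queried field, read off the mapping.
SEARCH_ANALYZERS = {
    "brandName": "fulltext_analyzer",
    "brandName.brandName_pinyin": "standard",
    "brandName.brandName_keyword": "standard",
    "sortName": "fulltext_analyzer",
    "sortName.sortName_pinyin": "standard",
    "productName": "fulltext_analyzer",
    "productKeyword": "fulltext_analyzer",
}


def group_by_search_analyzer(field_analyzers):
    """Put fields that share a search analyzer into the same group."""
    groups = defaultdict(list)
    for field, analyzer in field_analyzers.items():
        groups[analyzer].append(field)
    return dict(groups)


def term_centric_and(doc, fields, terms):
    """Every term must appear in at least one field of the group."""
    return all(any(t in doc.get(f, set()) for f in fields) for t in terms)


# Assumed ik tokens for product 2 (only the fulltext_analyzer fields).
product_2 = {
    "brandName": {"耐克"},
    "sortName": {"上衣"},
    "productName": {"耐克", "女生", "休閒", "運動", "服"},
    "productKeyword": {"耐克", "休閒", "運動"},
}

groups = group_by_search_analyzer(SEARCH_ANALYZERS)
print(sorted(groups))  # ['fulltext_analyzer', 'standard']
print(term_centric_and(product_2, groups["fulltext_analyzer"], ["運動", "上衣"]))
```

Within the fulltext_analyzer group, 運動 is found in productName and 上衣 in sortName, so the group matches even though no single field contains both tokens.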
The most common alternative is to search a single combined field, using either the _all field or a copy_to field, such as the bigSearchField field in our mapping.
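Because cross_fields groups fields by search_analyzer, an input such as [運動 shangyi] cannot match any product: with operator AND each group has to match every token on its own, but 運動 can only match in the fulltext_analyzer group and shangyi only in the pinyin fields of the standard group. We should therefore keep the number of groups as small as possible, that is, use one consistent search_analyzer wherever we can, or force an analyzer at search time to override the search_analyzer defined in the mapping.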
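As a sketch of the second option, the request body below sets the `analyzer` parameter of `multi_match`, which applies one analyzer to the query text for all fields instead of each field's own search analyzer. The helper `build_cross_fields_query` is hypothetical, and picking `fulltext_analyzer` as the override is only an assumption for illustration; which analyzer actually suits the indexed tokens has to be checked per index.

```python
import json


def build_cross_fields_query(query, fields, analyzer=None):
    """Build a cross_fields multi_match body, optionally forcing one analyzer."""
    multi_match = {
        "query": query,
        "fields": list(fields),
        "type": "cross_fields",
        "operator": "AND",
    }
    if analyzer is not None:
        # Overrides the per-field search analyzers for this request only.
        multi_match["analyzer"] = analyzer
    return {"query": {"multi_match": multi_match}}


fields = ["brandName^100", "sortName^80", "productName^60", "productKeyword^20"]
body = build_cross_fields_query("運動 上衣", fields, analyzer="fulltext_analyzer")

# The printed JSON can be sent as the body of POST /gino_product/_search.
print(json.dumps(body, ensure_ascii=False, indent=2))
```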
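In the examples above we always set operator to AND, which means every token of the search input must be matched. What happens if we set it to OR, and in which scenarios should OR be used?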
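Be especially careful with OR, because a product is returned as soon as a single token matches. In the search for [運動 上衣] above, for instance, the shoe products are matched as well, which greatly reduces search precision.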
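The loss of precision is easy to reproduce with a toy matcher. This Python sketch treats a product as one bag of tokens; the tokens for product 1 are assumptions about the ik output, and `matches` is a hypothetical helper, not an Elasticsearch API.

```python
def matches(doc_tokens, terms, operator="AND"):
    """AND needs every term to be present, OR needs at least one."""
    hits = [term in doc_tokens for term in terms]
    return all(hits) if operator == "AND" else any(hits)


# Assumed tokens of product 1 (a running shoe) across all of its fields.
product_1 = {"耐克", "女生", "運動", "輕跑鞋", "鞋子", "潮流"}

print(matches(product_1, ["運動", "上衣"], operator="AND"))  # False
print(matches(product_1, ["運動", "上衣"], operator="OR"))   # True
```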
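In some special searches, for example [耐克 阿迪達斯 上衣], operator AND returns no products no matter which multi match type is used (think about why). In this case we can set operator to OR and minimum_should_match to 60%, so that the tops from both 耐克 and 阿迪達斯 can be found. In effect this is a kind of intelligent search fallback.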
POST /gino_product/_search
{
  "query": {
    "multi_match": {
      "query": "耐克 阿迪達斯 上衣",
      "fields": [
        "brandName^100",
        "brandName.brandName_pinyin^100",
        "brandName.brandName_keyword^100",
        "sortName^80",
        "sortName.sortName_pinyin^80",
        "productName^60",
        "productKeyword^20"
      ],
      "type": "cross_fields",
      "operator": "OR",
      "minimum_should_match": "60%"
    }
  }
}
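In the earlier article on Elasticsearch's relevance scoring mechanism we looked at how best_fields and cross_fields compute relevance scores, but the example there used the same search_analyzer for every field. So how is the cross_fields score computed when the fields are split into groups?
Let's reuse the example above and add the explain parameter to find out.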
POST /gino_product/_search
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "運動 上衣",
      "fields": [
        "brandName^100",
        "brandName.brandName_pinyin^100",
        "brandName.brandName_keyword^100",
        "sortName^80",
        "sortName.sortName_pinyin^80",
        "productName^60",
        "productKeyword^20"
      ],
      "type": "cross_fields",
      "operator": "AND"
    }
  }
}
Full ES response: cross_fields_scoring.json
Combining the grouping information from the validate API with the scoring details from explain, we can summarize a scoring formula for cross_fields:
score(q, d) = coord(q, d) * ∑(∑(max(score(t, f))))
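A small Python sketch of this formula is shown below. It reads the outer sum as running over the analyzer groups, the inner sum over the tokens of a group, and max over the fields of that group. That reading, the function name `cross_fields_score`, and all of the numbers are assumptions for illustration, not values taken from the explain output.

```python
def cross_fields_score(groups, coord=1.0):
    """groups: one dict per matching group, mapping term -> {field: score}."""
    total = 0.0
    for group in groups:
        for field_scores in group.values():
            # For each term, only the best scoring field of the group counts.
            total += max(field_scores.values(), default=0.0)
    return coord * total


# Made-up scores: only the fulltext_analyzer group matches the document.
fulltext_group = {
    "運動": {"productName": 0.5, "productKeyword": 0.25},
    "上衣": {"sortName": 0.75},
}

# coord is assumed to be 1 matching group out of 2 groups.
print(cross_fields_score([fulltext_group], coord=0.5))  # 0.625
```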