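This post continues from the previous one: https://segmentfault.com/a/11...
most_fields is field-centric: it ranks highest the documents that match in the most fields.
Suppose we let users search for addresses, and the index contains the following two documents: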
PUT /test_index/_create/1
{
  "street": "5 Poland Street",
  "city": "Poland",
  "country": "United W1V",
  "postcode": "W1V 3DG"
}

PUT /test_index/_create/2
{
  "street": "5 Poland Street W1V",
  "city": "London",
  "country": "United Kingdom",
  "postcode": "3DG"
}
Query in the most_fields style (a bool query with one match clause per field):
GET /test_index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "street": "Poland Street W1V" } },
        { "match": { "city": "Poland Street W1V" } },
        { "match": { "country": "Poland Street W1V" } },
        { "match": { "postcode": "Poland Street W1V" } }
      ]
    }
  }
}
Repeating the query string for every field quickly becomes verbose, so we can simplify it with multi_match:
GET /test_index/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "most_fields",
      "fields": ["street", "city", "country", "postcode"]
    }
  }
}
Result:
{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 2.3835402, "hits" : [ { "_index" : "test_index", "_type" : "_doc", "_id" : "1", "_score" : 2.3835402, "_source" : { "street" : "5 Poland Street", "city" : "Poland", "country" : "United W1V", "postcode" : "W1V 3DG" } }, { "_index" : "test_index", "_type" : "_doc", "_id" : "2", "_score" : 0.99938464, "_source" : { "street" : "5 Poland Street W1V", "city" : "London", "country" : "United Kingdom", "postcode" : "3DG" } } ] } }
With best_fields, doc 2 ranks ahead of doc 1:
GET /test_index/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "best_fields",
      "fields": ["street", "city", "country", "postcode"]
    }
  }
}
Result:
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 0.99938464, "hits" : [ { "_index" : "test_index", "_type" : "_doc", "_id" : "2", "_score" : 0.99938464, "_source" : { "street" : "5 Poland Street W1V", "city" : "London", "country" : "United Kingdom", "postcode" : "3DG" } }, { "_index" : "test_index", "_type" : "_doc", "_id" : "1", "_score" : 0.6931472, "_source" : { "street" : "5 Poland Street", "city" : "Poland", "country" : "United W1V", "postcode" : "W1V 3DG" } } ] } }
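The most_fields approach has the following problems:
(1) It is designed to find the most fields matching any words, rather than the best-matching words across all fields.
(2) It cannot use the operator or minimum_should_match parameters to trim the long tail of low-relevance results, because they are applied to each field separately rather than across fields.
(3) Term frequencies differ from field to field and can interfere with one another, producing poorly ordered results.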
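Problem (1) can be pictured with a small toy model. The sketch below is not Elasticsearch's scoring; it only counts how many fields contain at least one query term, assuming a simplified analyzer (the hypothetical `tokenize` helper lowercases and splits on whitespace). It suggests why doc 1 outranks doc 2 under most_fields even though doc 2 holds the whole query in a single field.

```python
# Toy model of field-centric matching; NOT Elasticsearch's real scoring.
DOCS = {
    "1": {"street": "5 Poland Street", "city": "Poland",
          "country": "United W1V", "postcode": "W1V 3DG"},
    "2": {"street": "5 Poland Street W1V", "city": "London",
          "country": "United Kingdom", "postcode": "3DG"},
}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for the standard analyzer)."""
    return text.lower().split()


def matching_fields(doc, query):
    """Return the fields that contain at least one query term."""
    terms = set(tokenize(query))
    return [field for field, value in doc.items()
            if terms & set(tokenize(value))]


for doc_id, doc in DOCS.items():
    fields = matching_fields(doc, "Poland Street W1V")
    print(doc_id, len(fields), fields)
# 1 4 ['street', 'city', 'country', 'postcode']
# 2 1 ['street']
```

Doc 1 collects a partial match in every field, so a score that adds up per-field matches favors it over doc 2.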
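Having described the problems with most_fields, let us now solve them. The first approach is the copy_to parameter.
With copy_to we can combine several fields into a single field.
Create the following index: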
DELETE /test_index

PUT /test_index
{
  "mappings": {
    "properties": {
      "street": { "type": "text", "copy_to": "full_address" },
      "city": { "type": "text", "copy_to": "full_address" },
      "country": { "type": "text", "copy_to": "full_address" },
      "postcode": { "type": "text", "copy_to": "full_address" },
      "full_address": { "type": "text" }
    }
  }
}
Index the same data as before:
PUT /test_index/_create/1
{
  "street": "5 Poland Street",
  "city": "Poland",
  "country": "United W1V",
  "postcode": "W1V 3DG"
}

PUT /test_index/_create/2
{
  "street": "5 Poland Street W1V",
  "city": "London",
  "country": "United Kingdom",
  "postcode": "3DG"
}
Query:
GET /test_index/_search
{
  "query": {
    "match": {
      "full_address": "Poland Street W1V"
    }
  }
}
Result:
{ "took" : 2, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 0.68370587, "hits" : [ { "_index" : "test_index", "_type" : "_doc", "_id" : "1", "_score" : 0.68370587, "_source" : { "street" : "5 Poland Street", "city" : "Poland", "country" : "United W1V", "postcode" : "W1V 3DG" } }, { "_index" : "test_index", "_type" : "_doc", "_id" : "2", "_score" : 0.5469647, "_source" : { "street" : "5 Poland Street W1V", "city" : "London", "country" : "United Kingdom", "postcode" : "3DG" } } ] } }
Once everything is searchable through the single full_address field, the most_fields problems go away.
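To make the effect of copy_to concrete, here is a minimal sketch of what happens at index time. It is a simplified model, not Elasticsearch code: `COPY_TO` is a hypothetical mirror of the mapping above, and `tokenize` is an assumed stand-in for the analyzer. The point it illustrates is that the copied tokens end up in the indexed full_address field while the original document (what comes back as _source) is left unchanged, which matches the result shown above.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for the standard analyzer)."""
    return text.lower().split()


# Hypothetical, simplified mirror of the mapping: source field -> copy_to target.
COPY_TO = {
    "street": "full_address",
    "city": "full_address",
    "country": "full_address",
    "postcode": "full_address",
}


def index_document(source):
    """Return the per-field token lists an index would hold for this document."""
    indexed = {}
    for field, value in source.items():
        tokens = tokenize(value)
        indexed.setdefault(field, []).extend(tokens)
        target = COPY_TO.get(field)
        if target:
            # copy_to feeds the same tokens into the target field as well.
            indexed.setdefault(target, []).extend(tokens)
    return indexed


doc1 = {"street": "5 Poland Street", "city": "Poland",
        "country": "United W1V", "postcode": "W1V 3DG"}
indexed = index_document(doc1)
print(indexed["full_address"])
print("full_address" in doc1)  # False: the source document is not modified
```

Because all tokens live in one field, a plain match query on full_address sees a single set of term statistics instead of four.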
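The second way to solve the most_fields problems is the cross_fields query type.
If we can use _all (available only in older Elasticsearch versions) or define copy_to before indexing the documents, there is no problem. However, Elasticsearch also offers a search-time solution: the cross_fields type. cross_fields takes a term-centric approach, which is very different from the field-centric approach of best_fields and most_fields. It treats all of the fields as one big field and looks for each term in any of them.
The following explains the difference between field-centric and term-centric.
Running the query:
GET /test_index/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "best_fields",
      "fields": ["street", "city", "country", "postcode"]
    }
  }
}
gives: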
{ "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "valid" : true, "explanations" : [ { "index" : "test_index", "valid" : true, "explanation" : "((postcode:poland postcode:street postcode:w1v) | (country:poland country:street country:w1v) | (city:poland city:street city:w1v) | (street:poland street:street street:w1v))" } ] }
((postcode:poland postcode:street postcode:w1v) |
(country:poland country:street country:w1v) |
(city:poland city:street city:w1v) |
(street:poland street:street street:w1v))
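This is the matching rule: each parenthesized group searches for the terms within one field.
Setting operator to and turns it into: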
((+postcode:poland +postcode:street +postcode:w1v) |
(+country:poland +country:street +country:w1v) |
(+city:poland +city:street +city:w1v) |
(+street:poland +street:street +street:w1v))
This means that all three terms must appear in the same field.
Running the query:
GET /test_index/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "cross_fields",
      "operator": "and",
      "fields": ["street", "city", "country", "postcode"]
    }
  }
}
gives:
{ "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "valid" : true, "explanations" : [ { "index" : "test_index", "valid" : true, "explanation" : "+blended(terms:[postcode:poland, country:poland, city:poland, street:poland]) +blended(terms:[postcode:street, country:street, city:street, street:street]) +blended(terms:[postcode:w1v, country:w1v, city:w1v, street:w1v])" } ] }
+blended(terms:[postcode:poland, country:poland, city:poland, street:poland]) +blended(terms:[postcode:street, country:street, city:street, street:street]) +blended(terms:[postcode:w1v, country:w1v, city:w1v, street:w1v])
This is the rule. In other words, every term must appear in at least one of the fields.
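The two rules can be compared side by side with a pair of boolean predicates. This is only a sketch of the matching logic under operator and, not of scoring, and it assumes a simplified analyzer (the hypothetical `tokenize` helper).

```python
DOCS = {
    "1": {"street": "5 Poland Street", "city": "Poland",
          "country": "United W1V", "postcode": "W1V 3DG"},
    "2": {"street": "5 Poland Street W1V", "city": "London",
          "country": "United Kingdom", "postcode": "3DG"},
}
FIELDS = ["street", "city", "country", "postcode"]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for the standard analyzer)."""
    return text.lower().split()


def field_centric_and(doc, query):
    """Field-centric rule: at least one field must contain every term."""
    terms = set(tokenize(query))
    return any(terms <= set(tokenize(doc[field])) for field in FIELDS)


def term_centric_and(doc, query):
    """Term-centric rule: every term must appear in at least one field."""
    return all(
        any(term in tokenize(doc[field]) for field in FIELDS)
        for term in tokenize(query)
    )


for doc_id, doc in DOCS.items():
    query = "Poland Street W1V"
    print(doc_id, field_centric_and(doc, query), term_centric_and(doc, query))
# 1 False True
# 2 True True
```

Doc 1 spreads the three terms over several fields, so it only matches under the term-centric rule; doc 2 holds all of them in street and matches under both.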
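The cross_fields type first analyzes the query string into a list of terms and then searches for each term in any field. It handles the term-frequency problem by blending the inverse document frequencies across fields, which resolves the most_fields problems described above.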
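The idea of blending inverse document frequencies can be sketched on the two sample documents. This is a rough illustration under stated assumptions: it uses a BM25-style IDF formula and assumes that blending means taking the smallest per-field IDF for a term and applying it to every field. The actual blended query in Elasticsearch is more involved, so treat the helper names and numbers here as hypothetical.

```python
import math

DOCS = {
    "1": {"street": "5 Poland Street", "city": "Poland",
          "country": "United W1V", "postcode": "W1V 3DG"},
    "2": {"street": "5 Poland Street W1V", "city": "London",
          "country": "United Kingdom", "postcode": "3DG"},
}
FIELDS = ["street", "city", "country", "postcode"]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for the standard analyzer)."""
    return text.lower().split()


def doc_freq(term, field):
    """Number of documents whose given field contains the term."""
    return sum(1 for doc in DOCS.values() if term in tokenize(doc[field]))


def idf(df, n_docs):
    """BM25-style inverse document frequency, used here only for illustration."""
    return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))


def per_field_idf(term):
    """IDF of the term in each field where it occurs at least once."""
    n_docs = len(DOCS)
    return {field: idf(doc_freq(term, field), n_docs)
            for field in FIELDS if doc_freq(term, field) > 0}


def blended_idf(term):
    """Assumption: blending uses the smallest per-field IDF for all fields."""
    return min(per_field_idf(term).values())


print(per_field_idf("poland"))  # "poland" is rarer in city than in street
print(blended_idf("poland"))    # the street value, shared by every field
```

Without blending, the rare occurrence of poland in city would carry more weight than the same word in street; with a shared value, the field a term happens to appear in no longer skews its weight.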
Compared with copy_to, cross_fields lets you boost individual fields at query time.
Example:
GET /test_index/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "cross_fields",
      "fields": ["street^2", "city", "country", "postcode"]
    }
  }
}
This gives the street field a boost of 2, while every other field keeps a boost of 1.