How Elasticsearch (ES) Works

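I. When a request reaches the ES cluster, one node is chosen as the coordinating node, which routes the request to the shard that owns the data. Writes always go to the primary shard; reads can be served by the primary or by any of its replicas. With a round-robin distribution strategy and one replica, 4 incoming read requests are split so that 2 hit the primary shard and 2 hit the replica shard.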
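The 2/2 split described above can be reproduced with a toy model. This is only a sketch of plain round-robin selection over one primary and one replica; the names `copies` and `route_round_robin` are made up for illustration and are not part of ES.

```python
from collections import Counter
from itertools import cycle

# Hypothetical labels for the two copies of a single shard.
copies = ["primary", "replica"]


def route_round_robin(num_requests, shard_copies):
    """Assign each read request to the next shard copy in turn."""
    picker = cycle(shard_copies)
    return [next(picker) for _ in range(num_requests)]


assignments = route_round_robin(4, copies)
counts = Counter(assignments)
print(assignments)
print(dict(counts))
```

With more replicas the same loop spreads the reads over all copies, which is why adding replicas can raise read throughput.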

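II. When an index is split across several shards and the data is heavily skewed, search scores can become inaccurate, because the IDF part of the score is computed locally on each shard. For example, if shard1 holds a lot of data, a term's IDF there is low and scores on that shard drop across the board; if the same term appears in only a few documents on shard2, scores there come out higher. Documents that are not very relevant can therefore end up with high scores, and the ranking is distorted. The fix is to keep data distributed as evenly as possible across shards in production, so that shard-local scoring does not skew the overall result.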
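A small calculation makes the skew visible. The sketch below assumes the BM25-style IDF formula, ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents on the shard and n the number containing the term. The shard statistics are invented for illustration.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical per-shard statistics for one search term.
shard1 = {"doc_count": 10000, "doc_freq": 2000}  # large shard, term is common
shard2 = {"doc_count": 100, "doc_freq": 1}       # small shard, term is rare

idf1 = bm25_idf(**shard1)
idf2 = bm25_idf(**shard2)

# IDF if the statistics of both shards were pooled.
idf_global = bm25_idf(
    shard1["doc_count"] + shard2["doc_count"],
    shard1["doc_freq"] + shard2["doc_freq"],
)

print(f"shard1 idf={idf1:.3f}, shard2 idf={idf2:.3f}, pooled idf={idf_global:.3f}")
```

The same term gets a much larger weight on the small shard than on the large one, so a hit on shard2 can outrank a better hit on shard1.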

III. ES has two strategies for combining per-field scores:

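1) most_fields: the default behaviour of a bool/should query. The more fields of a document that match the search keywords, the higher the score. Rule: (sum of the scores of the match clauses that matched) * (number of match clauses the document actually matched) / (total number of match clauses in the query). Note that this coordination factor belongs to the classic scoring of older versions; from ES 6.0 (Lucene 7) on, a bool query simply sums the scores of the matching clauses. Example query: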

GET /index3/type3/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "spark" // this clause can match on title
          }
        },
        {
          "match": {
            "content": "java" // this clause can also match on content
          }
        }
      ]
    }
  }
}
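The rule stated for most_fields can be written out as a few lines of arithmetic. This is a sketch of that formula only, not of ES internals; the function name `should_score` and the per-clause scores are hypothetical.

```python
def should_score(clause_scores):
    """Combine per-clause scores with the rule from the text.

    clause_scores holds one entry per match clause in the query;
    None means the document did not match that clause.
    """
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    return sum(matched) * len(matched) / len(clause_scores)


# Hypothetical scores for the clauses [title:spark, content:java].
doc_a = should_score([0.4, 0.5])   # matches both fields
doc_b = should_score([0.6, None])  # matches title only

print(doc_a, doc_b)
```

Even though doc_b has the single best clause score, doc_a ranks higher because it matches in more fields, which is the behaviour the rule is designed to produce.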

2) best_fields: with dis_max, the document's score is determined by the highest-scoring match clause; how the other fields score does not matter.

GET /index3/type3/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "spark"
          }
        },
        {
          "match": {
            "content": "java"
          }
        }
      ]
    }
  }
}
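As a rough sketch of how dis_max combines clause scores: the result is the best clause score, plus `tie_breaker` times the scores of the other matching clauses. The query above does not set `tie_breaker`, so it stays at its default of 0.0 and only the best field counts. The function name and the scores below are hypothetical.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the remaining clause scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# Hypothetical scores for the clauses [title:spark, content:java].
scores = [0.4, 0.5]

best_only = dis_max_score(scores)                       # default tie_breaker
with_tie_breaker = dis_max_score(scores, tie_breaker=0.3)

print(best_only, with_tie_breaker)
```

Raising `tie_breaker` toward 1.0 moves the behaviour from "best field wins" toward a plain sum of all matching fields.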

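3) Besides most_fields and best_fields, cross_fields is also used quite often. It is not the default multi_match type (that is best_fields) and has to be requested with "type": "cross_fields". In essence it treats the fields listed in "fields": ["name","content"] as if their contents had been combined into one field, so the full-text search behaves like a search against a single combined field, which makes the results more accurate.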

Search request:
GET /index2/type2/_search
{
  "query": {
    "multi_match": {
      "query": "happening like",
      // the query terms are matched against both content and name; because the two fields have different mappings, their scores differ and the ranking may vary
      "fields": ["name","content"],
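      // best_fields: each document's score is that of its highest-scoring matched field, so once the best field is picked the scores of other documents are not necessarily accurate; most_fields: the document's score is combined from the scores of every field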
      "type":"cross_fields",
      "operator":"and"
    }
  }
}
Result:
{
  "took": 36,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.84968257,
    "hits": [
      {
        "_index": "index2",
        "_type": "type2",
        "_id": "2",
        "_score": 0.84968257,
        "_source": {
          "num": 10,
          "title": "他的名字",
          "name": "yes happening like write",
          "content": "happening like"
        }
      },
      {
        "_index": "index2",
        "_type": "type2",
        "_id": "4",
        "_score": 0.8164005,
        "_source": {
          "num": 1000,
          "title": "個人名字",
          "name": "happening like write",
          "content": "happening hello like yeas and he happening like had read a lot about happening hello like"
        }
      },
      {
        "_index": "index2",
        "_type": "type2",
        "_id": "3",
        "_score": 0.5063205,
        "_source": {
          "num": 105,
          "title": "這是誰的名字",
          "name": "happening like write",
          "content": " national  treasure because  of its rare number and cute appearance. Many foreign people are so crazy about  pandas and they can’t watching these  lovely creatures all the time. Though some action"
        }
      }
    ]
  }
}

IV. Two ways to improve full-text search relevance

1) Use boost to raise the score of a clause

GET index3/type3/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "content": {
              "query": "from",
              "boost": 5 // boost multiplies the score of this clause by 5
            }
          }
        },{
          "match": {
            "content": {
              "query": "foot" // without the boost above, the document matching "foot" would score higher
            }
          }
        }
      ]
    }
  }
}
Result:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1.3150566,
    "hits": [
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "1",
        "_score": 1.3150566,
        "_source": {
          "date": "2019-01-02",
          "name": "the little",
          "content": "Half the hello book ideas in his talk were plagiarized from an article I wrote last month.",
          "no": "123"
        }
      },
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "5",
        "_score": 1.3114156,
        "_source": {
          "date": "2019-05-01",
          "name": "http litty",
          "content": "There are hello moments in life when you miss book someone so much that you just want to pick them from your dreams",
          "no": "564",
          "description": "描述"
        }
      },
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "3",
        "_score": 0.28582606,
        "_source": {
          "date": "2019-07-01",
          "name": "very tag",
          "content": "Some of our hello  comrades love book to write long articles with no substance, very much like the foot bindings of a slattern, long as well as smelly",
          "no": "123"
        }
      }
    ]
  }
}
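The effect of the boost can be checked against the scores in the result above. The sketch assumes that boost acts as a plain multiplier on the clause score, so dividing the reported scores of documents 1 and 5 by 5 gives an estimate of their unboosted scores. Document 3 matched only the unboosted "foot" clause.

```python
BOOST = 5

# Scores copied from the result above.
boosted_from = {"1": 1.3150566, "5": 1.3114156}  # matched "from" (boost 5)
foot_only = {"3": 0.28582606}                    # matched "foot" (no boost)

# Assumption: boost is a plain multiplier, so undo it by division.
unboosted_from = {doc_id: s / BOOST for doc_id, s in boosted_from.items()}


def rank(scores):
    """Return document ids ordered by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


with_boost = rank({**boosted_from, **foot_only})
without_boost = rank({**unboosted_from, **foot_only})

print("with boost:   ", with_boost)
print("without boost:", without_boost)
```

Under that assumption document 3 would lead the ranking without the boost, which agrees with the comment on the "foot" clause.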

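2) Use a boosting query with positive and negative clauses to demote results instead of excluding them: documents that also match the negative clause have their score multiplied by negative_boost (0.1 in the query below).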

GET index3/type3/_search
{
  "query": {
    "boosting": {
      // positive clause: matched and scored normally
      "positive": {
        "match": {
          "content": "from"
        }
      },
      // negative clause: documents that also match it have their score multiplied by negative_boost
      "negative": {
        "match": {
          "content": {
            "query": "Half"
          }
        }
      },
      "negative_boost": 0.1
    }
  }
}
Result:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.26228312,
    "hits": [
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "5",
        "_score": 0.26228312,
        "_source": {
          "date": "2019-05-01",
          "name": "http litty",
          "content": "There are hello moments in life when you miss book someone so much that you just want to pick them from your dreams",
          "no": "564",
          "description": "描述"
        }
      },
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "1",
        "_score": 0.026301134,
        "_source": {
          "date": "2019-01-02",
          "name": "the little",
          "content": "Half the hello book ideas in his talk were plagiarized from an article I wrote last month.",
          "no": "123"
        }
      }
    ]
  }
}
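The scores in this result follow from simple arithmetic. In the sketch below the positive-clause scores are assumptions back-derived from the result (document 5 is unaffected, so its score is taken as-is; document 1's reported score is divided by negative_boost). The set of documents matching the negative clause is read off the content fields: only document 1 contains "Half".

```python
NEGATIVE_BOOST = 0.1

# Assumed scores of the positive clause ("from"), back-derived from the result.
positive_scores = {"1": 0.26301134, "5": 0.26228312}

# Documents that also match the negative clause ("Half").
negative_matches = {"1"}


def boosting_score(doc_id):
    """Positive score, multiplied by negative_boost if the negative clause matches."""
    score = positive_scores[doc_id]
    if doc_id in negative_matches:
        score *= NEGATIVE_BOOST
    return score


final_scores = {doc_id: boosting_score(doc_id) for doc_id in positive_scores}
ranking = sorted(final_scores, key=final_scores.get, reverse=True)

print(final_scores)
print(ranking)
```

Document 1 is still returned, only further down the list; that is the difference from a must_not clause, which would drop it entirely.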