ES11 - Full-Text Queries

Date: 2019-12-19

Tags: es11, full-text search

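High-level full-text queries are typically used to run full-text searches on full-text fields, such as the body of an email. They understand how the queried field is analyzed, and they apply each field's analyzer (or search_analyzer) to the query string before executing.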
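To make "applying the field's analyzer to the query string" concrete, here is a toy Python sketch. The `toy_analyze` function is a hypothetical stand-in (extract word characters and lowercase them), not Elasticsearch's real analyzer, and the English sample text is illustrative only. A full-text lookup finds "QUICK fox" because the query is analyzed the same way as the indexed field, while an unanalyzed lookup of "Quick" misses because only "quick" was indexed.

```python
import re
from typing import List


def toy_analyze(text: str) -> List[str]:
    # Hypothetical analyzer: extract runs of word characters and lowercase them.
    return [tok.lower() for tok in re.findall(r"\w+", text)]


# Index time: the field value is analyzed once and its tokens are stored.
INDEXED_TOKENS = set(toy_analyze("Quick Brown Fox"))


def full_text_hit(query: str) -> bool:
    # Full-text query: the query string goes through the same analyzer first.
    return any(tok in INDEXED_TOKENS for tok in toy_analyze(query))


def exact_hit(query: str) -> bool:
    # For contrast: the query string is looked up verbatim, without analysis.
    return query in INDEXED_TOKENS


print(full_text_hit("QUICK fox"))  # True
print(exact_hit("Quick"))          # False: only "quick" was indexed
```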

1.term query

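A term query performs an exact match: the search term is not analyzed (tokenized) before searching, so it must be exactly one of the tokens in the document's token set. Strictly speaking, term is a term-level query rather than a full-text query; it is included here as a contrast to the full-text queries that follow.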

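For example, we can tokenize "週五召開董事會會議 審議及批准更新後的一季報" ("Board meeting on Friday to review and approve the updated Q1 report") with a specified analyzer, here ik_max_word: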

GET telegraph/_analyze
{
  "analyzer": "ik_max_word",
  "text": "週五召開董事會會議 審議及批准更新後的一季報"
}

The resulting token set contains 16 tokens (positions 0 to 15):

{
  "tokens": [
    {
      "token": "週五",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "五",
      "start_offset": 1,
      "end_offset": 2,
      "type": "TYPE_CNUM",
      "position": 1
    },
    {
      "token": "召開",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "董事會",
      "start_offset": 4,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "董事",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "會會",
      "start_offset": 6,
      "end_offset": 8,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "會議",
      "start_offset": 7,
      "end_offset": 9,
      "type": "CN_WORD",
      "position": 6
    },
    {
      "token": "審議",
      "start_offset": 10,
      "end_offset": 12,
      "type": "CN_WORD",
      "position": 7
    },
    {
      "token": "及",
      "start_offset": 12,
      "end_offset": 13,
      "type": "CN_CHAR",
      "position": 8
    },
    {
      "token": "批准",
      "start_offset": 13,
      "end_offset": 15,
      "type": "CN_WORD",
      "position": 9
    },
    {
      "token": "更新",
      "start_offset": 15,
      "end_offset": 17,
      "type": "CN_WORD",
      "position": 10
    },
    {
      "token": "後",
      "start_offset": 17,
      "end_offset": 18,
      "type": "CN_CHAR",
      "position": 11
    },
    {
      "token": "的",
      "start_offset": 18,
      "end_offset": 19,
      "type": "CN_CHAR",
      "position": 12
    },
    {
      "token": "一季",
      "start_offset": 19,
      "end_offset": 21,
      "type": "CN_WORD",
      "position": 13
    },
    {
      "token": "一",
      "start_offset": 19,
      "end_offset": 20,
      "type": "TYPE_CNUM",
      "position": 14
    },
    {
      "token": "季報",
      "start_offset": 20,
      "end_offset": 22,
      "type": "CN_WORD",
      "position": 15
    }
  ]
}
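To double-check the token count, the tokens can be pulled out of the response programmatically. A minimal Python sketch, assuming the response body above has already been fetched; only the `token` values are copied in (offsets, types and positions are omitted), and `token_list` is a hypothetical helper name:

```python
import json

# Abbreviated copy of the _analyze response above: only "token" values kept.
ANALYZE_RESPONSE = json.loads("""
{"tokens": [
  {"token": "週五"}, {"token": "五"}, {"token": "召開"}, {"token": "董事會"},
  {"token": "董事"}, {"token": "會會"}, {"token": "會議"}, {"token": "審議"},
  {"token": "及"}, {"token": "批准"}, {"token": "更新"}, {"token": "後"},
  {"token": "的"}, {"token": "一季"}, {"token": "一"}, {"token": "季報"}
]}
""")


def token_list(response):
    """Return the token strings of an _analyze response, in response order."""
    return [t["token"] for t in response["tokens"]]


tokens = token_list(ANALYZE_RESPONSE)
print(len(tokens))  # 16
```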

Next, we use term to search for documents containing "會議" (meeting):

GET telegraph/_search
{
  "query": {
    "term": {
      "title": {
        "value": "會議"
      }
    }
  }
}

Because the search term "會議" is in the token set, the document is found:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "AZetp2QBW8hrYY3zGJk7",
        "_score": 0.2876821,
        "_source": {
          "title": "週五召開董事會會議 審議及批准更新後的一季報",
          "content": "以審議及批准更新後的2018年第一季度報告",
          "author": "中興通信",
          "pubdate": "2018-07-17T12:33:11"
        }
      }
    ]
  }
}

If we instead search for "董事會會議" (board meeting):

GET telegraph/_search
{
  "query": {
    "term": {
      "title": {
        "value": "董事會會議"
      }
    }
  }
}

Although "董事會會議" is part of the document text, it is not in the token set, so nothing is found:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
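The behavior of term on this document can be modeled as a plain set-membership test. This is a toy Python sketch, not how Elasticsearch evaluates the query internally (it uses an inverted index and scoring); the token set is copied from the _analyze output above and `term_matches` is a hypothetical name.

```python
from typing import Set

# Token set of the title field, copied from the _analyze output above.
TITLE_TOKENS: Set[str] = {
    "週五", "五", "召開", "董事會", "董事", "會會", "會議", "審議",
    "及", "批准", "更新", "後", "的", "一季", "一", "季報",
}


def term_matches(value: str, field_tokens: Set[str]) -> bool:
    # term: the value is NOT analyzed, so it must equal one indexed token exactly.
    return value in field_tokens


print(term_matches("會議", TITLE_TOKENS))        # True
print(term_matches("董事會會議", TITLE_TOKENS))  # False: not a single token
```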

2.match query

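A match query first analyzes the search text into tokens and then matches each resulting token individually. So, in contrast to the exact matching of term, match performs token-based matching.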

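When we search for "河北會議" (Hebei meeting), the search text is first split into "河北" and "會議", and any document containing either of them is returned. We can also use "operator" to set the logical relationship between the split tokens: with "operator" set to "and", a document is returned only if its token set contains both "河北" and "會議". The default "operator" is "or", meaning a document matches as long as its token set contains any one of the tokens.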

GET telegraph/_search
{
  "query": {
    "match": {
      "title": {
        "query": "河北會議"
      }
    }
  }
}

Search results:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.99277425,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "BJetp2QBW8hrYY3zGJk7",
        "_score": 0.99277425,
        "_source": {
          "title": "河北聚焦十大行業推動國際產能合作",
          "content": "河北省政府近日出臺積極參與「一帶一路」建設推動國際產能合作實施方案",
          "author": "財聯社",
          "pubdate": "2018-07-17T14:14:55"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "AZetp2QBW8hrYY3zGJk7",
        "_score": 0.2876821,
        "_source": {
          "title": "週五召開董事會會議 審議及批准更新後的一季報",
          "content": "以審議及批准更新後的2018年第一季度報告",
          "author": "中興通信",
          "pubdate": "2018-07-17T12:33:11"
        }
      }
    ]
  }
}

If we set "operator" to "and":

GET telegraph/_search
{
  "query": {
    "match": {
      "title": {
        "query": "河北會議",
        "operator": "and"
      }
    }
  }
}

Because no document's token set contains both "河北" and "會議", the result is empty.

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
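The or/and logic of match can be sketched in Python as follows. Assumptions: the query text has already been analyzed into tokens (analysis itself is not modeled), document "A" uses a subset of the _analyze output above, document "B" uses a hypothetical tokenization of the Hebei document's title, and results come back in insertion order rather than by relevance score.

```python
from typing import Dict, List, Set

# Toy token sets per document title. "A" is a subset of the _analyze output
# above; "B" is a hypothetical tokenization of the Hebei document's title.
DOCS: Dict[str, Set[str]] = {
    "A": {"週五", "召開", "董事會", "董事", "會會", "會議", "審議", "一季", "季報"},
    "B": {"河北", "聚焦", "十大", "行業", "推動", "國際", "產能"},
}


def match(query_tokens: List[str], operator: str = "or") -> List[str]:
    """Return ids of documents matching already-analyzed query tokens."""
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    # "or": any query token is enough; "and": every query token is required.
    combine = all if operator == "and" else any
    return [doc_id for doc_id, tokens in DOCS.items()
            if combine(tok in tokens for tok in query_tokens)]


# "河北會議" is split into ["河北", "會議"], as described above.
print(match(["河北", "會議"]))         # ['A', 'B']
print(match(["河北", "會議"], "and"))  # []
```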

3.match_phrase query

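A match_phrase query analyzes the query text (the analyzer can be customized), and a document is matched only if all three of the following conditions hold:

All tokens produced from the query appear in the field
The tokens appear in the field in the same order as in the query
The tokens are adjacent to one another (with the default slop of 0)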

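Using the same example as before, searching for "董事會會議" now finds the document. If the tokens were out of order or not adjacent, the document would not be found.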

GET telegraph/_search
{
  "query": {
    "match_phrase": {
      "title":{
        "query": "董事會會議"
      }
    }
  }
}

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.1507283,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "AZetp2QBW8hrYY3zGJk7",
        "_score": 1.1507283,
        "_source": {
          "title": "週五召開董事會會議 審議及批准更新後的一季報",
          "content": "以審議及批准更新後的2018年第一季度報告",
          "author": "中興通信",
          "pubdate": "2018-07-17T12:33:11"
        }
      }
    ]
  }
}
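The three conditions of match_phrase come down to checking token positions. The toy Python sketch below uses the (token, position) pairs from the _analyze output at the start of this article and treats the query as a list of already-analyzed tokens at consecutive positions. Elasticsearch's match_phrase also accepts a `slop` parameter (default 0) that relaxes the adjacency requirement; this sketch only models slop 0, and `build_postings` / `phrase_matches` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (token, position) pairs copied from the _analyze output above.
TITLE_POSITIONS: List[Tuple[str, int]] = [
    ("週五", 0), ("五", 1), ("召開", 2), ("董事會", 3), ("董事", 4),
    ("會會", 5), ("會議", 6), ("審議", 7), ("及", 8), ("批准", 9),
    ("更新", 10), ("後", 11), ("的", 12), ("一季", 13), ("一", 14), ("季報", 15),
]


def build_postings(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    # token -> positions where it occurs in the field
    postings: Dict[str, List[int]] = defaultdict(list)
    for token, pos in pairs:
        postings[token].append(pos)
    return dict(postings)


def phrase_matches(query_tokens: List[str], postings: Dict[str, List[int]]) -> bool:
    # slop = 0: every query token must occur, in order, at consecutive positions.
    if not query_tokens:
        return False
    for start in postings.get(query_tokens[0], []):
        if all(start + i in postings.get(tok, []) for i, tok in enumerate(query_tokens)):
            return True
    return False


POSTINGS = build_postings(TITLE_POSITIONS)
print(phrase_matches(["召開", "董事會"], POSTINGS))  # True: positions 2 and 3
print(phrase_matches(["董事會", "召開"], POSTINGS))  # False: wrong order
print(phrase_matches(["召開", "會議"], POSTINGS))    # False: positions 2 and 6
```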

4.match_phrase_prefix

match_phrase_prefix is similar to match_phrase, except that the last token of the search text only needs to match as a prefix.

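In the example above, the document's token set contains the adjacent tokens "召開" (convene) and "董事會" (board of directors). With match_phrase_prefix, the search text only needs to contain "召開" followed by a prefix of "董事會" to match: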

GET telegraph/_search
{
  "query": {
    "match_phrase_prefix": {
      "title": {
        "query": "召開董"
      }
    }
  }
}

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.8630463,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "AZetp2QBW8hrYY3zGJk7",
        "_score": 0.8630463,
        "_source": {
          "title": "週五召開董事會會議 審議及批准更新後的一季報",
          "content": "以審議及批准更新後的2018年第一季度報告",
          "author": "中興通信",
          "pubdate": "2018-07-17T12:33:11"
        }
      }
    ]
  }
}
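match_phrase_prefix differs from an exact phrase check only in the last token, which needs to be a prefix of a token at the following position. A toy Python sketch, assuming "召開董" is analyzed into ["召開", "董"] (the real ik_max_word output for this query is not shown above). In Elasticsearch the prefix is expanded against the terms in the index, capped by `max_expansions` (default 50); this sketch ignores that and simply checks the document's own tokens. `tokens_by_position` and `phrase_prefix_matches` are hypothetical names.

```python
from typing import Dict, List, Tuple

# (token, position) pairs copied from the _analyze output above.
TITLE_POSITIONS: List[Tuple[str, int]] = [
    ("週五", 0), ("五", 1), ("召開", 2), ("董事會", 3), ("董事", 4),
    ("會會", 5), ("會議", 6), ("審議", 7), ("及", 8), ("批准", 9),
    ("更新", 10), ("後", 11), ("的", 12), ("一季", 13), ("一", 14), ("季報", 15),
]


def tokens_by_position(pairs: List[Tuple[str, int]]) -> Dict[int, List[str]]:
    # position -> tokens at that position
    by_pos: Dict[int, List[str]] = {}
    for token, pos in pairs:
        by_pos.setdefault(pos, []).append(token)
    return by_pos


def phrase_prefix_matches(query_tokens: List[str], by_pos: Dict[int, List[str]]) -> bool:
    # All tokens except the last must match exactly at consecutive positions;
    # the last token only has to be a prefix of a token at the following position.
    if not query_tokens:
        return False
    *exact, prefix = query_tokens
    for start in by_pos:
        exact_ok = all(tok in by_pos.get(start + i, []) for i, tok in enumerate(exact))
        following = by_pos.get(start + len(exact), [])
        if exact_ok and any(t.startswith(prefix) for t in following):
            return True
    return False


BY_POS = tokens_by_position(TITLE_POSITIONS)
print(phrase_prefix_matches(["召開", "董"], BY_POS))  # True: "董事會" at position 3
print(phrase_prefix_matches(["召開", "會"], BY_POS))  # False
```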

5.multi_match

When we want to match against several fields, returning a document if any one of those fields contains a query token, we can use multi_match.

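For example, we search for "聚焦成交"; a document is returned as long as either its "title" or its "content" field contains one of the query's tokens: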

GET telegraph/_search
{
  "query": {
    "multi_match": {
      "query": "聚焦成交",
      "fields": ["title","content"]
    }
  }
}

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.0806551,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "Apetp2QBW8hrYY3zGJk7",
        "_score": 1.0806551,
        "_source": {
          "title": "長生生物再次跌停 三機構拋售近1000萬元",
          "content": "長生生物再次一字跌停，報收19.89元，成交1432萬元",
          "author": "長生生物",
          "pubdate": "2018-07-17T10:03:11"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "BJetp2QBW8hrYY3zGJk7",
        "_score": 0.99277425,
        "_source": {
          "title": "河北聚焦十大行業推動國際產能合作",
          "content": "河北省政府近日出臺積極參與「一帶一路」建設推動國際產能合作實施方案",
          "author": "財聯社",
          "pubdate": "2018-07-17T14:14:55"
        }
      }
    ]
  }
}
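A toy Python sketch of multi_match: a document is returned if any of the listed fields contains any query token. Elasticsearch's default multi_match type is best_fields, which scores a document by its best-matching field; here a crude count of matched tokens per field stands in for the real relevance score. The per-field token sets are hypothetical and abbreviated (keyed by the _id values from the responses above), and "聚焦成交" is assumed to be analyzed into ["聚焦", "成交"].

```python
from typing import Dict, List, Set

# Hypothetical, abbreviated per-field token sets for the three example
# documents, keyed by _id; the real ik_max_word tokenization may differ.
DOCS: Dict[str, Dict[str, Set[str]]] = {
    "Apetp2QBW8hrYY3zGJk7": {
        "title": {"長生", "生物", "跌停", "機構", "拋售"},
        "content": {"長生", "生物", "跌停", "報收", "成交"},
    },
    "BJetp2QBW8hrYY3zGJk7": {
        "title": {"河北", "聚焦", "十大", "行業", "產能"},
        "content": {"河北省", "政府", "一帶一路", "產能"},
    },
    "AZetp2QBW8hrYY3zGJk7": {
        "title": {"召開", "董事會", "會議", "一季", "季報"},
        "content": {"審議", "批准", "季度", "報告"},
    },
}


def multi_match(query_tokens: List[str], fields: List[str]) -> Dict[str, int]:
    """Return {doc_id: toy score} for documents where any listed field has a query token."""
    results: Dict[str, int] = {}
    for doc_id, doc in DOCS.items():
        # Toy best_fields: count matched tokens per field and keep the best field.
        best = max(sum(tok in doc.get(f, set()) for tok in query_tokens) for f in fields)
        if best > 0:
            results[doc_id] = best
    return results


print(multi_match(["聚焦", "成交"], ["title", "content"]))
# {'Apetp2QBW8hrYY3zGJk7': 1, 'BJetp2QBW8hrYY3zGJk7': 1}
```

The first document matches through its content ("成交") and the second through its title ("聚焦"), mirroring the two hits above; the toy scores tie, whereas the real BM25 scores differ.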

6.common_terms

7.query_string

8.simple_query_string
