[ES] match_phrase and regexp

Date: 2019-11-13

Tags: match phrase regexp

When I first started using ES, I could not tell match_phrase and regexp apart, so many of my queries returned results I did not expect. Here is a summary.

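regexp: operates on a single term

match_phrase: operates on the relative positions of multiple terms

What each query returns depends heavily on how the analyzer tokenizes the text.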

For example, take two strings, "HELLO-world" and "hello.WORLD", stored in a field named title.

For "HELLO-world", look at the two queries below. The second one matches; the first does not.

{ "regexp": { "title": "hello-w.*" }} 
{ "match_phrase": { "title": "hello world" }}
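The two clauses above are query fragments. As a minimal sketch, assuming they are sent to the _search endpoint of whichever index holds the title field, each clause would sit under the top-level "query" key of the request body. The helper name search_body is made up for this illustration.

```python
import json

def search_body(clause):
    """Wrap a single query clause in the top-level "query" key of a search request."""
    return {"query": clause}

regexp_body = search_body({"regexp": {"title": "hello-w.*"}})
phrase_body = search_body({"match_phrase": {"title": "hello world"}})

# Print the bodies as they would appear in the request.
print(json.dumps(regexp_body, indent=2))
print(json.dumps(phrase_body, indent=2))
```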

The analysis output shows that HELLO-world was split into two terms, hello and world.

GET _analyze
{        
    "field": "title",
    "text": "HELLO-world"
}
---------------------------
{
  "tokens" : [
    {
      "token" : "hello",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "world",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

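First, the indexed terms contain no uppercase letters: the analyzer converted every character to lowercase. Second, the "-" character is gone.

regexp works on individual terms. Neither hello nor world satisfies the pattern hello-w.*, so nothing matches.

match_phrase works on multiple terms. The query text "hello world" is itself split into the two terms hello and world. Both can be found among the terms of title, and their relative positions fit, so the query matches.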
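The reasoning above can be sketched in Python by working directly on the token list returned by _analyze. This is only a model of the two queries, not how Elasticsearch implements them. Python's re module stands in for Lucene's regular expression syntax (the two agree for these simple patterns), re.fullmatch mirrors the fact that a regexp query has to match the entire term, and the helper names regexp_matches and phrase_matches are invented here.

```python
import re

# Terms and positions for "HELLO-world", copied from the _analyze output.
tokens = [("hello", 0), ("world", 1)]

def regexp_matches(pattern, tokens):
    """True if at least one single term matches the whole pattern."""
    return any(re.fullmatch(pattern, term) for term, _ in tokens)

def phrase_matches(query_terms, tokens):
    """True if the query terms occur in order at consecutive positions."""
    if not query_terms:
        return False
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, set()).add(pos)
    starts = positions.get(query_terms[0], set())
    return any(
        all(start + i in positions.get(term, set())
            for i, term in enumerate(query_terms))
        for start in starts
    )

print(regexp_matches(r"hello-w.*", tokens))         # False
print(phrase_matches(["hello", "world"], tokens))   # True
```

Swapping the order of the query terms to ["world", "hello"] makes phrase_matches return False, which is the "relative position" condition in action.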

Now consider "hello.WORLD":

{ "regexp": { "title": "hello\\.w.*" }} 
{ "match_phrase": { "title": "hello world" }}

This time the first query matches and the second does not.

The tokenization result explains why:
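A note on the doubled backslash in the first query: it is JSON escaping, so the pattern that reaches the regexp query is hello\.w.*, where \. means a literal dot. A small check in Python, with re again standing in for Lucene's regexp syntax (an assumption that holds for this simple pattern):

```python
import json
import re

# The query exactly as written in the request body (raw string, so both backslashes are kept).
raw = r'{ "regexp": { "title": "hello\\.w.*" }}'

# JSON decoding turns the two backslashes into one.
pattern = json.loads(raw)["regexp"]["title"]
print(pattern)  # hello\.w.*

# The escaped dot only accepts a literal "." after "hello".
print(bool(re.fullmatch(pattern, "hello.world")))  # True
print(bool(re.fullmatch(pattern, "helloxworld")))  # False
```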

GET _analyze
{        
    "field": "title",
    "text": "hello.WORLD"
}
-------------------------------
{
  "tokens" : [
    {
      "token" : "hello.world",
      "start_offset" : 0,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
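The response contains a single term, hello.world: the "." between the two words was kept, unlike the "-" in the first string. The regexp hello\.w.* matches that term in full, while match_phrase looks for the separate terms hello and world, and neither exists in the index. The sketch below reproduces this with a toy analyzer. The regular expression in TOKEN is an assumption chosen to mimic the two outputs shown in this post for ASCII input. It is not the analyzer's real rule set, and toy_analyze is a made-up name.

```python
import re

# Toy stand-in for the analyzer, ASCII only: lowercase the text, then take runs of
# letters/digits, extended by any "." that is directly followed by more letters.
# Every other character (such as "-" or a space) separates tokens.
TOKEN = re.compile(r"[a-z0-9]+(?:\.[a-z]+)*")

def toy_analyze(text):
    """Return the list of terms the toy analyzer produces for text."""
    return [m.group() for m in TOKEN.finditer(text.lower())]

doc_terms = toy_analyze("hello.WORLD")
query_terms = toy_analyze("hello world")

print(toy_analyze("HELLO-world"))  # ['hello', 'world']
print(doc_terms)                   # ['hello.world']
print(query_terms)                 # ['hello', 'world']

# match_phrase needs each query term to exist as a term of the document.
print([term in doc_terms for term in query_terms])  # [False, False]

# regexp is tested against the single indexed term as a whole.
print(bool(re.fullmatch(r"hello\.w.*", doc_terms[0])))  # True
```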