Implementing LIKE queries in Elasticsearch

Date: 2019-11-08

Tags: elasticsearch, implementation, query. Category: log analysis

Problem

An Elasticsearch query needs to behave like a MySQL LIKE query. For example, a record whose value is hello中國233 should be found by searching for 中國, and also by searching for llo.

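However, Elasticsearch full-text queries match against analyzed tokens. By default, hello中國233 is tokenized into hello, 中, 國, and 233. A search for hello matches the record, but a search for llo does not, because llo is not one of the indexed tokens.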
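To make the mismatch concrete, here is a small Python sketch that treats the index as a plain list of terms. The token list is copied from the default tokenization described above; the `matches_term` helper is hypothetical and only mimics exact term lookup, not Elasticsearch's actual matching or scoring.

```python
# Terms the default analyzer produces for "hello中國233" (taken from the text above).
indexed_terms = ["hello", "中", "國", "233"]

def matches_term(query_term, terms):
    """Hypothetical helper: a query term matches only if it equals an indexed term."""
    return query_term in terms

print(matches_term("hello", indexed_terms))  # True: "hello" is an indexed term
print(matches_term("llo", indexed_terms))    # False: "llo" is only part of a term
```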

Solution

The query fails because the tokens are too coarse. The fix is to tokenize the content one character at a time, so that hello中國233 becomes h, e, l, l, o, 中, 國, 2, 3, 3.

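Elasticsearch does not ship a ready-made analyzer with this behavior, but a custom analyzer can produce it: a character filter inserts a space after every character, and the whitespace tokenizer then splits the string into single characters.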
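The two stages can be tried outside Elasticsearch with a short Python sketch. It is only an approximation under stated assumptions: `char_analyze` is a hypothetical name, Python's `re` module stands in for the Java regex engine that Elasticsearch uses (so the replacement is written `\1 ` rather than `$1 `), and `str.split()` stands in for the whitespace tokenizer.

```python
import re

def char_analyze(text):
    """Hypothetical stand-in for the custom analyzer: char filter, then whitespace split."""
    # Character filter: the lazy group (.+?) matches one character at a time,
    # and the replacement appends a space after it.
    spaced = re.sub(r"(.+?)", r"\1 ", text)
    # Whitespace tokenizer: split on runs of whitespace, dropping empty pieces.
    return spaced.split()

print(char_analyze("hello中國233"))
# ['h', 'e', 'l', 'l', 'o', '中', '國', '2', '3', '3']
```

One side effect worth noting: whitespace already present in the input is discarded by the split, so spaces never become tokens.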

Result

Default analyzer

PUT /like_search
{
  "mappings": {
    "like_search_type": {
      "properties": {
        "name": {
          "type": "text"
        }
      }
    }
  }
}

PUT /like_search/like_search_type/1
{
  "name": "hello中國233"
}

Analysis result

GET /like_search/_analyze
{
  "text": [
    "hello中國233"
  ]
}
Response:

{
  "tokens": [
    {
      "token": "hello",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "中",
      "start_offset": 5,
      "end_offset": 6,
      "type": "<IDEOGRAPHIC>",
      "position": 1
    },
    {
      "token": "國",
      "start_offset": 6,
      "end_offset": 7,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "233",
      "start_offset": 7,
      "end_offset": 10,
      "type": "<NUM>",
      "position": 3
    }
  ]
}

Elasticsearch uses the standard analyzer by default, so the following query for llo does not find the hello中國233 record.

GET /like_search/_search
{
  "query": {
    "match_phrase": {
      "name": "llo"
    }
  }
}

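Custom analyzer

The analyzer is defined when the index is created. If the like_search index from the previous step still exists, delete it first (DELETE /like_search), because creating an index that already exists fails.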

PUT /like_search
{
  "settings": {
    "analysis": {
      "analyzer": {
        "char_analyzer": {
          "char_filter": [
            "split_by_whitespace_filter"
          ],
          "tokenizer": "whitespace"
        }
      },
      "char_filter": {
        "split_by_whitespace_filter": {
          "type": "pattern_replace",
          "pattern": "(.+?)",
          "replacement": "$1 "
        }
      }
    }
  },
  "mappings": {
    "like_search_type": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "char_analyzer"
        }
      }
    }
  }
}

PUT /like_search/like_search_type/1
{
  "name": "hello中國233"
}

Analysis result

GET /like_search/_analyze
{
  "analyzer": "char_analyzer",
  "text": [
    "hello中國233"
  ]
}
Response:

{
  "tokens": [
    {
      "token": "h",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 0
    },
    {
      "token": "e",
      "start_offset": 1,
      "end_offset": 1,
      "type": "word",
      "position": 1
    },
    {
      "token": "l",
      "start_offset": 2,
      "end_offset": 2,
      "type": "word",
      "position": 2
    },
    {
      "token": "l",
      "start_offset": 3,
      "end_offset": 3,
      "type": "word",
      "position": 3
    },
    {
      "token": "o",
      "start_offset": 4,
      "end_offset": 4,
      "type": "word",
      "position": 4
    },
    {
      "token": "中",
      "start_offset": 5,
      "end_offset": 5,
      "type": "word",
      "position": 5
    },
    {
      "token": "國",
      "start_offset": 6,
      "end_offset": 6,
      "type": "word",
      "position": 6
    },
    {
      "token": "2",
      "start_offset": 7,
      "end_offset": 7,
      "type": "word",
      "position": 7
    },
    {
      "token": "3",
      "start_offset": 8,
      "end_offset": 8,
      "type": "word",
      "position": 8
    },
    {
      "token": "3",
      "start_offset": 9,
      "end_offset": 9,
      "type": "word",
      "position": 9
    }
  ]
}

With the custom analyzer, the following query for llo finds the hello中國233 record.

GET /like_search/_search
{
  "query": {
    "match_phrase": {
      "name": "llo"
    }
  }
}
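Why match_phrase gives LIKE-style behavior here can be illustrated with a rough Python model. It assumes the query string is analyzed with the same single-character analyzer as the field, and that the phrase uses the default slop of 0, so the query tokens must appear at consecutive positions in the same order. The function names `char_tokens` and `phrase_match` are hypothetical, and the model ignores scoring entirely.

```python
def char_tokens(text):
    # One token per non-whitespace character, mirroring the custom analyzer.
    return [ch for ch in text if not ch.isspace()]

def phrase_match(query, document):
    """Hypothetical model of match_phrase with slop 0 on single-character tokens."""
    q, d = char_tokens(query), char_tokens(document)
    if not q:
        return False
    # The query tokens must occur as one contiguous, ordered run of document tokens.
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

doc = "hello中國233"
print(phrase_match("llo", doc))   # True
print(phrase_match("中國", doc))  # True
print(phrase_match("lol", doc))   # False: the characters exist but not in this order
```

The last case is the reason for using match_phrase rather than match: a plain match query would accept any document containing the individual characters, regardless of their order or adjacency.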
