Elasticsearch Chinese Analysis Plugin: Synonym-Analysis

Elasticsearch version: 7.3

Installing the synonym plugin

The plugin version must match the Elasticsearch version.

Plugin download address:

https://github.com/bells/elasticsearch-analysis-dynamic-synonym

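Installation steps:

  1. Create an analysis-dynamic-synonym folder under the elasticsearch-7.3.0/plugins directory.
  2. If you are using the latest master, build it with maven and put the resulting jar into the folder you just created. If you are using a release that is already compiled, copy it in directly. This example uses a version that has to be compiled.
  3. Also add plugin-descriptor.properties and plugin-security.policy; their contents are given below.
  4. Restart the cluster.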
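
Steps 1 to 3 can be scripted. The following is a minimal Python sketch, assuming the jar has already been built and the two config files exist somewhere on disk. The function name install_plugin and its parameters are illustrative only; they are not part of the plugin or of Elasticsearch.

```python
import shutil
from pathlib import Path
from typing import Iterable


def install_plugin(es_home: Path, jar: Path, extra_files: Iterable[Path]) -> Path:
    """Copy the plugin jar and its config files into ES_HOME/plugins.

    es_home     -- Elasticsearch install directory (e.g. elasticsearch-7.3.0)
    jar         -- the compiled plugin jar
    extra_files -- plugin-descriptor.properties, plugin-security.policy,
                   and any dependency jars
    """
    plugin_dir = es_home / "plugins" / "analysis-dynamic-synonym"
    plugin_dir.mkdir(parents=True, exist_ok=True)
    for src in [jar, *extra_files]:
        # Keep the original file name; Elasticsearch looks the files up by name.
        shutil.copy2(src, plugin_dir / src.name)
    return plugin_dir
```

The cluster still has to be restarted afterwards (step 4); the sketch only prepares the directory.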

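The remote synonym dictionary likewise relies on the Last-Modified and ETag HTTP headers to decide whether an update is needed. The approach is the same as the one described in my other post on the IK-Analyze Chinese analysis plugin for Elasticsearch.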
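
To make the update check concrete, here is a minimal Python sketch of the idea: remember the two header values from the last fetch and reload when either one changes. This illustrates the principle only and is not the plugin's actual code; the names read_validators and needs_reload are hypothetical.

```python
from typing import Mapping, Optional, Tuple

# (Last-Modified, ETag) as seen on the previous fetch; None if the header was absent.
Validators = Tuple[Optional[str], Optional[str]]


def read_validators(headers: Mapping[str, str]) -> Validators:
    """Extract Last-Modified and ETag, ignoring header-name case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("last-modified"), lowered.get("etag")


def needs_reload(previous: Validators, headers: Mapping[str, str]) -> bool:
    """Return True when either validator differs from the remembered value."""
    return read_validators(headers) != previous
```

A poller would call needs_reload on every interval and, when it returns True, download the dictionary again and store the new validators.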

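The build produces only a single jar. Installing an Elasticsearch plugin also requires the plugin-descriptor.properties configuration file; Elasticsearch reports an error if it is missing. Its contents are as follows: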

description=Analysis-plugin for synonym
version=5.1.1
name=analysis-dynamic-synonym
classname=com.bellszhu.elasticsearch.plugin.DynamicSynonymPlugin
java.version=1.8
elasticsearch.version=7.3.0
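
Because the elasticsearch.version entry has to match the running cluster, it can be worth checking the descriptor before restarting. The following Python sketch parses the simple key=value form used above; parse_properties and matches_cluster are hypothetical helper names, and the parser deliberately ignores the more exotic features of the Java properties format.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def matches_cluster(descriptor_text: str, cluster_version: str) -> bool:
    """True when the descriptor targets exactly the given Elasticsearch version."""
    return parse_properties(descriptor_text).get("elasticsearch.version") == cluster_version
```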

The plugin-security.policy configuration file is also required:

grant {
  // needed because of the hot reload functionality
  permission java.net.SocketPermission "*", "connect,resolve";
};

If Elasticsearch reports an error such as:

failed to get synonyms : http://10.0.11.1:10002/elasticsearch/synonymDict
access denied ("java.net.SocketPermission" "10.0.11.1:10002" "connect,resolve")

the cause is a missing plugin-security.policy file.

The commons and httpclient jars can be taken from a downloaded older release of the plugin.

All of the plugin's files are as follows:

[Screenshot: listing of the plugin directory]

Testing the synonym plugin

Create an index that points to a remote synonym dictionary:

PUT /full_text_test123
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "ik_syno_smart": {
            "tokenizer": "ik_max_word",
            "type": "custom",
            "filter": [
              "remote_synonym"
            ]
          }
        },
        "filter": {
          "remote_synonym": {
            "type": "dynamic_synonym",
            "synonyms_path": "http://10.0.11.1:10002/elasticsearch/synonymDict",
            "interval": 30
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_syno_smart",
        "search_analyzer": "ik_smart"
      }
    }
  }
}

Add a synonym
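
The service behind synonyms_path is not shown in this post. As a rough sketch of what such an endpoint could look like, the Python code below serves a local text file and sets the Last-Modified and ETag headers that the update check depends on. Everything here is an assumption made for illustration: the file name synonyms.txt, the helper names, the use of a SHA-256 hash as the ETag, and the comma-separated line format (for example a line reading 西紅柿,番茄). It answers both GET and HEAD on the assumption that a client may check the headers without downloading the body.

```python
import hashlib
from email.utils import formatdate
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

SYNONYM_FILE = Path("synonyms.txt")  # hypothetical local dictionary file


def build_response(body: bytes, mtime: float):
    """Return (headers, body) with validators derived from the file state."""
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        "Last-Modified": formatdate(mtime, usegmt=True),
        # Content hash as ETag: it changes whenever the dictionary changes.
        "ETag": '"' + hashlib.sha256(body).hexdigest() + '"',
        "Content-Length": str(len(body)),
    }
    return headers, body


class SynonymHandler(BaseHTTPRequestHandler):
    def _send(self, include_body: bool) -> None:
        raw = SYNONYM_FILE.read_bytes()
        headers, body = build_response(raw, SYNONYM_FILE.stat().st_mtime)
        self.send_response(200)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def do_GET(self) -> None:
        self._send(include_body=True)

    def do_HEAD(self) -> None:
        self._send(include_body=False)


def serve(port: int) -> None:
    """Serve the dictionary until interrupted."""
    HTTPServer(("0.0.0.0", port), SynonymHandler).serve_forever()
```

Calling serve(10002) would expose the file on the port used in the examples here; editing synonyms.txt then changes both validators, so the next poll picks up the new entry.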

Run the analyzer

POST /full_text_test123/_analyze
{
  "text": ["西紅柿"],
  "analyzer": "ik_syno_smart"
}

Result

{
  "tokens" : [
    {
      "token" : "西紅柿",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "番茄",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}
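
A response like this can also be checked programmatically. The sketch below is a small Python helper, with the hypothetical name synonyms_of, that takes an already decoded _analyze response and returns the SYNONYM tokens emitted at the same position as a given term.

```python
def synonyms_of(analyze_response: dict, term: str) -> list:
    """Return SYNONYM tokens that share a position with the given term."""
    tokens = analyze_response.get("tokens", [])
    # Positions at which the original term was emitted.
    positions = {t["position"] for t in tokens if t["token"] == term}
    return [
        t["token"]
        for t in tokens
        if t["type"] == "SYNONYM" and t["position"] in positions
    ]
```

An empty list for a term that should have synonyms suggests the remote dictionary has not been loaded yet or the entry is missing.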