Adding Chinese word segmentation to Elasticsearch

Installing the IK analysis plugin

Download the project from GitHub (I downloaded it to /tmp) and unzip it:

cd /tmp
wget https://github.com/medcl/elasticsearch-analysis-ik/archive/master.zip
unzip master.zip

Change into the elasticsearch-analysis-ik-master directory:

cd elasticsearch-analysis-ik-master/

Then use the mvn command to build the jar, elasticsearch-analysis-ik-1.4.0.jar. This step may take several attempts before it succeeds:

mvn package
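Since the build may need several attempts, the retries can be scripted rather than repeated by hand. The following is a minimal sketch, not part of the IK project: retry is a hypothetical helper name, and the attempt count is an arbitrary choice.

```shell
#!/bin/sh
# retry N command...: run the command up to N times, stopping at the first success.
# "retry" is a hypothetical helper name used for illustration only.
retry() {
    attempts="$1"
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        # Run the command exactly as given; return as soon as it succeeds.
        "$@" && return 0
        echo "attempt $n of $attempts failed" >&2
        n=$((n + 1))
    done
    # Every attempt failed.
    return 1
}
```

With this in place, the build step could be run as retry 3 mvn package.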

Note that mvn requires Maven to be installed. On Ubuntu, Maven can be installed with the following commands:

apt-cache search maven
sudo apt-get install maven
mvn -version

Copy the ik folder under elasticsearch-analysis-ik-master/ to ${ES_HOME}/config/.

Copy elasticsearch-analysis-ik-1.4.0.jar from elasticsearch-analysis-ik-master/target to ${ES_HOME}/lib.
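The two copy steps above can be sketched as one small shell function. This is an illustration under stated assumptions: install_ik is a hypothetical helper name, and the source layout (an ik folder at the top of the unpacked project and the jar under target/) is taken from the steps as written here, so adjust the paths if your checkout differs.

```shell
#!/bin/sh
# install_ik SRC ES_HOME: copy the IK dictionary folder and the built jar
# into an Elasticsearch installation.
# "install_ik" is a hypothetical helper; the paths mirror the steps in the text.
install_ik() {
    src="$1"      # unpacked elasticsearch-analysis-ik-master directory
    es_home="$2"  # Elasticsearch installation directory (${ES_HOME})
    jar="$src/target/elasticsearch-analysis-ik-1.4.0.jar"

    # Fail early if the dictionary folder or the build output is missing.
    [ -d "$src/ik" ] || { echo "missing $src/ik" >&2; return 1; }
    [ -f "$jar" ] || { echo "missing $jar (did mvn package succeed?)" >&2; return 1; }

    mkdir -p "$es_home/config" "$es_home/lib"
    cp -r "$src/ik" "$es_home/config/"
    cp "$jar" "$es_home/lib/"
}
```

For the layout used in this post, the call would look like install_ik /tmp/elasticsearch-analysis-ik-master "$ES_HOME".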

Add the IK configuration to the end of the elasticsearch.yml configuration file under ${ES_HOME}/config/:

index:
  analysis:                   
    analyzer:      
      ik:
          alias: [ik_analyzer]
          type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
          type: ik
          use_smart: false
      ik_smart:
          type: ik
          use_smart: true
index.analysis.analyzer.default.type: ik

In addition, httpclient-4.3.5.jar and httpcore-4.3.2.jar must be placed in ${ES_HOME}/lib.

Testing IK segmentation

Create an index named index:

curl -XPUT http://localhost:9200/index

Create a mapping for the index named index:

curl -XPOST http://localhost:9200/index/fulltext/_mapping -d '
{
    "fulltext": {
        "_all": {
            "analyzer": "ik"
        },
        "properties": {
            "content": {
                "type" : "string",
                "boost" : 8.0,
                "term_vector" : "with_positions_offsets",
                "analyzer" : "ik",
                "include_in_all" : true
            }
        }
    }
}'

Test the analyzer. The request is shown first, followed by the response:

curl -XGET 'localhost:9200/index/_analyze?analyzer=ik&pretty=true' -d '
{
     測試Elasticsearch分詞器
}'

{
  "tokens" : [ {
    "token" : "測試",
    "start_offset" : 9,
    "end_offset" : 11,
    "type" : "CN_WORD",
    "position" : 1
  }, {
    "token" : "elasticsearch",
    "start_offset" : 11,
    "end_offset" : 24,
    "type" : "ENGLISH",
    "position" : 2
  }, {
    "token" : "分詞器",
    "start_offset" : 24,
    "end_offset" : 27,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "分詞",
    "start_offset" : 24,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 4
  }, {
    "token" : "詞",
    "start_offset" : 25,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 5
  }, {
    "token" : "器",
    "start_offset" : 26,
    "end_offset" : 27,
    "type" : "CN_CHAR",
    "position" : 6
  } ]
}
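
To read only the token strings out of a response like the one above, a small filter can help. This is a minimal sketch, assuming the response is pretty-printed (pretty=true) so that each "token" field sits on its own line; extract_tokens is a hypothetical helper name, not part of Elasticsearch or IK.

```shell
#!/bin/sh
# extract_tokens: read a pretty-printed _analyze response on stdin and
# print the value of each "token" field, one per line.
# Assumes one "token" field per input line, as produced with pretty=true.
extract_tokens() {
    sed -n 's/.*"token" *: *"\([^"]*\)".*/\1/p'
}
```

The output of the _analyze request can be piped straight into it, for example curl -XGET 'localhost:9200/index/_analyze?analyzer=ik&pretty=true' -d '...' | extract_tokens, which would list the segmented tokens one per line.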