Elasticsearch: the Chinese analyzer plugin es-ik (author's recommendation)

Prerequisites

What is an inverted index?

Elasticsearch: the role of analyzers

Elasticsearch: how an analyzer works

Elasticsearch: stop words

Elasticsearch: Chinese analyzers

Elasticsearch: several important analyzers

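The default analyzer shipped with Elasticsearch

  1. The default analyzer shipped with Elasticsearch gives poor results on Chinese text.

  Here is a concrete example that shows why the analyzer that ships with es does a poor job of segmenting Chinese.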

[hadoop@HadoopMaster elasticsearch-2.4.3]$ jps
2044 Jps
1979 Elasticsearch
[hadoop@HadoopMaster elasticsearch-2.4.3]$ pwd
/home/hadoop/app/elasticsearch-2.4.3
[hadoop@HadoopMaster elasticsearch-2.4.3]$ curl 'http://192.168.80.10:9200/zhouls/_analyze?pretty=true' -d '{"text":"這裏是好記性不如爛筆頭感嘆號的博客園"}'
{
"tokens" : [ {
"token" : "這",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
}, {
"token" : "裏",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
}, {
"token" : "是",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
}, {
"token" : "好",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 3
}, {
"token" : "記",
"start_offset" : 4,
"end_offset" : 5,
"type" : "<IDEOGRAPHIC>",
"position" : 4
}, {
"token" : "性",
"start_offset" : 5,
"end_offset" : 6,
"type" : "<IDEOGRAPHIC>",
"position" : 5
}, {
"token" : "不",
"start_offset" : 6,
"end_offset" : 7,
"type" : "<IDEOGRAPHIC>",
"position" : 6
}, {
"token" : "如",
"start_offset" : 7,
"end_offset" : 8,
"type" : "<IDEOGRAPHIC>",
"position" : 7
}, {
"token" : "爛",
"start_offset" : 8,
"end_offset" : 9,
"type" : "<IDEOGRAPHIC>",
"position" : 8
}, {
"token" : "筆",
"start_offset" : 9,
"end_offset" : 10,
"type" : "<IDEOGRAPHIC>",
"position" : 9
}, {
"token" : "頭",
"start_offset" : 10,
"end_offset" : 11,
"type" : "<IDEOGRAPHIC>",
"position" : 10
}, {
"token" : "感",
"start_offset" : 11,
"end_offset" : 12,
"type" : "<IDEOGRAPHIC>",
"position" : 11
}, {
"token" : "嘆",
"start_offset" : 12,
"end_offset" : 13,
"type" : "<IDEOGRAPHIC>",
"position" : 12
}, {
"token" : "號",
"start_offset" : 13,
"end_offset" : 14,
"type" : "<IDEOGRAPHIC>",
"position" : 13
}, {
"token" : "的",
"start_offset" : 14,
"end_offset" : 15,
"type" : "<IDEOGRAPHIC>",
"position" : 14
}, {
"token" : "博",
"start_offset" : 15,
"end_offset" : 16,
"type" : "<IDEOGRAPHIC>",
"position" : 15
}, {
"token" : "客",
"start_offset" : 16,
"end_offset" : 17,
"type" : "<IDEOGRAPHIC>",
"position" : 16
}, {
"token" : "園",
"start_offset" : 17,
"end_offset" : 18,
"type" : "<IDEOGRAPHIC>",
"position" : 17
} ]
}
[hadoop@HadoopMaster elasticsearch-2.4.3]$

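Summary

     Anyone who uses Elasticsearch directly on Chinese content will run into an awkward problem: Chinese words are split into individual characters. When you build a chart in Kibana and group by term, every single character ends up as its own group.

     This happens because Elasticsearch uses its default standard analyzer, which splits Chinese words into single characters. Installing es-ik, a Chinese analyzer plugin for es, solves this problem.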
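
To make the problem concrete, here is a minimal Python sketch of what per-character tokenization does to term grouping. It is not Elasticsearch code: the two sample documents and the helper name per_character_tokens are made up for illustration, and the function only imitates the effect shown in the output above.

```python
from collections import Counter


def per_character_tokens(text):
    """Imitate the effect described above: every Chinese character becomes its own term."""
    return [ch for ch in text if not ch.isspace()]


# Hypothetical documents, chosen only for illustration.
docs = ["博客園", "博客"]

term_counts = Counter()
for doc in docs:
    term_counts.update(per_character_tokens(doc))

# Every bucket is a single character; the word "博客" never appears as a term.
print(term_counts.most_common())
```

Grouping by term on such an index can only ever produce single-character buckets, which is exactly the Kibana symptom described above.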

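How to integrate the IK analyzer

   The overall process is as follows:

Step 1: download the IK plugin for es from https://github.com/medcl/elasticsearch-analysis-ik/tree/2.x

Step 2: build the downloaded es-ik source with Maven (mvn clean package -DskipTests)

Step 3: copy elasticsearch-analysis-ik-1.10.3.zip from target/releases into the ES_HOME/plugins/ik directory, then unpack it with the unzip command

    If the unzip command is not available, install it: yum install -y unzip

Step 4: restart the es service

Step 5: test the segmentation: curl 'http://<your-ip>:9200/zhouls/_analyze?analyzer=ik_max_word&pretty=true' -d '{"text":"這裏是好記性不如爛筆頭感嘆號的博客們"}'

   Note: on a single-node es cluster, you only need to install es-ik on that one machine. With three nodes, as in my case, es-ik must be installed on all three, with identical configuration.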
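
If you would rather script the test in step 5 than type the curl command, the request can be assembled as in the rough sketch below. The helper name build_analyze_request is hypothetical, and the sketch assumes the same ES 2.x style call as the curl command (analyzer passed as a query parameter, text in a JSON body). It only builds the request; it does not send it.

```python
import json
from urllib.parse import urlencode


def build_analyze_request(host, index, analyzer, text, port=9200):
    """Return the (url, body) pair for the same _analyze call as the curl command in step 5."""
    query = urlencode({"analyzer": analyzer, "pretty": "true"})
    url = "http://{}:{}/{}/_analyze?{}".format(host, port, index, query)
    # Keep the Chinese text readable in the body and encode it as UTF-8 bytes.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return url, body


url, body = build_analyze_request(
    "192.168.80.10", "zhouls", "ik_max_word", "這裏是好記性不如爛筆頭感嘆號的博客園"
)
print(url)
```

To actually send it, pass url and body to urllib.request.Request and urllib.request.urlopen against a node where es is running.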

elasticsearch-analysis-ik-1.10.0.zip  corresponds to  elasticsearch-2.4.0

elasticsearch-analysis-ik-1.10.3.zip  corresponds to  elasticsearch-2.4.3

  I have already prepared both builds and uploaded them to my CSDN account. You can download them from:

http://download.csdn.net/detail/u010106732/9890897

http://download.csdn.net/detail/u010106732/9890918

https://github.com/medcl/elasticsearch-analysis-ik/tree/v1.10.0

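  Step 1: open https://github.com/ in a browser.

  Step 2: search for the plugin: https://github.com/search?utf8=%E2%9C%93&q=elasticsearch-ik

  Step 3: open https://github.com/medcl/elasticsearch-analysis-ik and click the 2.x branch. Some people use version 2.4.0, and the same steps apply. If you are on 5.x, simply pick the matching branch.

  Step 4: this takes you to https://github.com/medcl/elasticsearch-analysis-ik/tree/2.x

  Step 5: once you have found it, click to download. Here I choose offline installation.

  Step 6: unpack the es-ik archive and take a first look at its directory layout. I put it under D:\ here. This also prepares for the Maven build that follows.

  Step 7: build it with the locally installed Maven.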

 

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Administrator>cd D:\elasticsearch-analysis-ik-2.x

C:\Users\Administrator>d:

D:\elasticsearch-analysis-ik-2.x>mvn

   Next, run the build:

D:\elasticsearch-analysis-ik-2.x>mvn clean package -DskipTests
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building elasticsearch-analysis-ik 1.10.4
[INFO] ------------------------------------------------------------------------
Downloading: http://maven.aliyun.com/nexus/content/repositories/central/org/apac
he/maven/plugins/maven-enforcer-plugin/1.0/maven-enforcer-plugin-1.0.pom
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach
e/maven/plugins/maven-enforcer-plugin/1.0/maven-enforcer-plugin-1.0.pom (7 KB at
2.5 KB/sec)
Downloading: http://maven.aliyun.com/nexus/content/repositories/central/org/apac
he/maven/enforcer/enforcer/1.0/enforcer-1.0.pom
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach
e/maven/enforcer/enforcer/1.0/enforcer-1.0.pom (12 KB at 19.5 KB/sec)
Downloading: http://maven.aliyun.com/nexus/content/repositories/central/org/apac
he/maven/maven-parent/17/maven-parent-17.pom
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach
e/maven/maven-parent/17/maven-parent-17.pom (25 KB at 41.9 KB/sec)
Downloading: http://maven.aliyun.com/nexus/content/repositories/central/org/apac
he/maven/plugins/maven-enforcer-plugin/1.0/maven-enforcer-plugin-1.0.jar
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach
e/maven/plugins/maven-enforcer-plugin/1.0/maven-enforcer-plugin-1.0.jar (22 KB a
t 44.2 KB/sec)
Downloading: http://maven.aliyun.com/nexus/content/repositories/central/org/apac
he/maven/plugins/maven-compiler-plugin/3.5.1/maven-compiler-plugin-3.5.1.pom
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach
e/maven/plugins/maven-compiler-plugin/3.5.1/maven-compiler-plugin-3.5.1.pom (10
KB at 35.3 KB/sec)
Downloading: http://maven.aliyun.com/nexus/content/repositories/central/org/apac
he/maven/plugins/maven-plugins/28/maven-plugins-28.pom
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach
e/maven/plugins/maven-plugins/28/maven-plugins-28.pom (12 KB at 42.1 KB/sec)
Downloading: http://maven.aliyun.com/nexus/content/repositories/central/org/apac
he/maven/maven-parent/27/maven-parent-27.pom
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach
e/maven/maven-parent/27/maven-parent-27.pom (40 KB at 94.0 KB/sec)
Downloading: http://maven.aliyun.com/nexus/content/repositories/central/org/apac
he/apache/17/apache-17.pom
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach

   This takes a while, depending on your network speed.

Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach
e/maven/maven-archiver/2.4/maven-archiver-2.4.jar (20 KB at 19.8 KB/sec)
Downloading: http://maven.aliyun.com/nexus/content/repositories/central/org/apac
he/maven/shared/maven-repository-builder/1.0-alpha-2/maven-repository-builder-1.
0-alpha-2.jar
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach
e/maven/maven-project/2.0.4/maven-project-2.0.4.jar (107 KB at 84.7 KB/sec)
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/codeh
aus/plexus/plexus-utils/2.0.1/plexus-utils-2.0.1.jar (217 KB at 158.7 KB/sec)
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach
e/maven/shared/maven-repository-builder/1.0-alpha-2/maven-repository-builder-1.0
-alpha-2.jar (23 KB at 16.4 KB/sec)
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach
e/maven/maven-model/2.0.4/maven-model-2.0.4.jar (79 KB at 54.3 KB/sec)
Downloaded: http://maven.aliyun.com/nexus/content/repositories/central/org/apach
e/maven/maven-artifact/2.0.4/maven-artifact-2.0.4.jar (79 KB at 52.9 KB/sec)
[INFO] Reading assembly descriptor: D:\elasticsearch-analysis-ik-2.x/src/main/as
semblies/plugin.xml
[INFO] Building zip: D:\elasticsearch-analysis-ik-2.x\target\releases\elasticsea
rch-analysis-ik-1.10.4.zip
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:22 min
[INFO] Finished at: 2017-02-25T14:48:40+08:00
[INFO] Final Memory: 35M/609M
[INFO] ------------------------------------------------------------------------

D:\elasticsearch-analysis-ik-2.x>

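   The build succeeds.

  Note that Maven must already be installed locally (on Windows, in my case) to run the build. If you have not installed it yet, see:

Creating Maven projects in Eclipse with automatic dependency jars (plain and Web projects)

      The final artifact is the zip file under target\releases, as the build log shows.

  Step 8: upload the built zip to each of the three machines, into the $ES_HOME/plugins/ik directory. Note that you need to create the ik directory first.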

[hadoop@HadoopSlave1 elasticsearch-2.4.3]$ pwd
/home/hadoop/app/elasticsearch-2.4.3
[hadoop@HadoopSlave1 elasticsearch-2.4.3]$ ll
total 56
drwxrwxr-x. 2 hadoop hadoop 4096 Feb 22 01:37 bin
drwxrwxr-x. 3 hadoop hadoop 4096 Feb 22 22:43 config
drwxrwxr-x. 3 hadoop hadoop 4096 Feb 22 07:07 data
drwxrwxr-x. 2 hadoop hadoop 4096 Feb 22 01:37 lib
-rw-rw-r--. 1 hadoop hadoop 11358 Aug 24 2016 LICENSE.txt
drwxrwxr-x. 2 hadoop hadoop 4096 Feb 25 05:15 logs
drwxrwxr-x. 5 hadoop hadoop 4096 Dec 8 00:41 modules
-rw-rw-r--. 1 hadoop hadoop 150 Aug 24 2016 NOTICE.txt
drwxrwxr-x. 4 hadoop hadoop 4096 Feb 22 06:02 plugins
-rw-rw-r--. 1 hadoop hadoop 8700 Aug 24 2016 README.textile
[hadoop@HadoopSlave1 elasticsearch-2.4.3]$ cd plugins/
[hadoop@HadoopSlave1 plugins]$ ll
total 8
drwxrwxr-x. 5 hadoop hadoop 4096 Feb 22 06:02 head
drwxrwxr-x. 8 hadoop hadoop 4096 Feb 22 06:02 kopf
[hadoop@HadoopSlave1 plugins]$ mkdir ik
[hadoop@HadoopSlave1 plugins]$ pwd
/home/hadoop/app/elasticsearch-2.4.3/plugins
[hadoop@HadoopSlave1 plugins]$ ll
total 12
drwxrwxr-x. 5 hadoop hadoop 4096 Feb 22 06:02 head
drwxrwxr-x. 2 hadoop hadoop 4096 Feb 25 06:18 ik
drwxrwxr-x. 8 hadoop hadoop 4096 Feb 22 06:02 kopf

[hadoop@HadoopSlave1 plugins]$ cd ik/
[hadoop@HadoopSlave1 ik]$ pwd
/home/hadoop/app/elasticsearch-2.4.3/plugins/ik
[hadoop@HadoopSlave1 ik]$ rz

[hadoop@HadoopSlave1 ik]$ ll
total 4400
-rw-r--r--. 1 hadoop hadoop 4505518 Jan 15 08:59 elasticsearch-analysis-ik-1.10.3.zip
[hadoop@HadoopSlave1 ik]$

  Step 9: stop the es service process

[hadoop@HadoopSlave1 ik]$ jps
1874 Elasticsearch
2078 Jps
[hadoop@HadoopSlave1 ik]$ kill -9 1874
[hadoop@HadoopSlave1 ik]$ jps
2089 Jps
[hadoop@HadoopSlave1 ik]$

  Step 10: unpack the archive with the unzip command. If unzip is not available, install it: yum install -y unzip.


[hadoop@HadoopSlave1 ik]$ unzip elasticsearch-analysis-ik-1.10.3.zip
Archive: elasticsearch-analysis-ik-1.10.3.zip
inflating: elasticsearch-analysis-ik-1.10.3.jar
inflating: httpclient-4.5.2.jar
inflating: httpcore-4.4.4.jar
inflating: commons-logging-1.2.jar
inflating: commons-codec-1.9.jar
inflating: plugin-descriptor.properties
creating: config/
creating: config/custom/
inflating: config/custom/ext_stopword.dic
inflating: config/custom/mydict.dic
inflating: config/custom/single_word.dic
inflating: config/custom/single_word_full.dic
inflating: config/custom/single_word_low_freq.dic
inflating: config/custom/sougou.dic
inflating: config/IKAnalyzer.cfg.xml
inflating: config/main.dic
inflating: config/preposition.dic
inflating: config/quantifier.dic
inflating: config/stopword.dic
inflating: config/suffix.dic
inflating: config/surname.dic
[hadoop@HadoopSlave1 ik]$ ll
total 5828
-rw-r--r--. 1 hadoop hadoop 263965 Dec 1 2015 commons-codec-1.9.jar
-rw-r--r--. 1 hadoop hadoop 61829 Dec 1 2015 commons-logging-1.2.jar
drwxr-xr-x. 3 hadoop hadoop 4096 Jan 1 12:46 config
-rw-r--r--. 1 hadoop hadoop 55998 Jan 1 13:27 elasticsearch-analysis-ik-1.10.3.jar
-rw-r--r--. 1 hadoop hadoop 4505518 Jan 15 08:59 elasticsearch-analysis-ik-1.10.3.zip
-rw-r--r--. 1 hadoop hadoop 736658 Jan 1 13:26 httpclient-4.5.2.jar
-rw-r--r--. 1 hadoop hadoop 326724 Jan 1 13:07 httpcore-4.4.4.jar
-rw-r--r--. 1 hadoop hadoop 2667 Jan 1 13:27 plugin-descriptor.properties

[hadoop@HadoopSlave1 ik]$ 

  Do the same on the other two machines.
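
Since the same unpacking has to be repeated on every node, a small check can catch a machine where something went missing. The sketch below is only an illustration: the helper name missing_plugin_files is hypothetical, and the list of expected entries is simply taken from the unzip listing above, not from any official specification. The demo runs against a throwaway directory instead of a real $ES_HOME.

```python
import os
import tempfile


def missing_plugin_files(plugin_dir):
    """Return the entries from the unzip listing that are absent from plugin_dir."""
    expected = ["plugin-descriptor.properties", "config"]
    missing = [
        name for name in expected
        if not os.path.exists(os.path.join(plugin_dir, name))
    ]
    has_jar = any(
        name.startswith("elasticsearch-analysis-ik-") and name.endswith(".jar")
        for name in os.listdir(plugin_dir)
    )
    if not has_jar:
        missing.append("elasticsearch-analysis-ik-*.jar")
    return missing


# Demo on a temporary directory that imitates the unpacked plugin.
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, "config"))
    for name in ("plugin-descriptor.properties", "elasticsearch-analysis-ik-1.10.3.jar"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_missing = missing_plugin_files(demo_dir)
    print(demo_missing)
```

On a real node you would call it with the path of the plugins/ik directory and expect an empty list.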

  Step 11: restart the es service process on all three machines

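   If you want to see in more detail what changes once the es-ik Chinese analyzer is installed, run bin/elasticsearch in the foreground from $ES_HOME. I only do that here to show you the output. For normal use, start it in the background with bin/elasticsearch -d.

   Step 12: test how Chinese text is segmented now that the es-ik plugin is installed

  Testing with ik_max_word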

[hadoop@HadoopMaster elasticsearch-2.4.3]$ pwd
/home/hadoop/app/elasticsearch-2.4.3
[hadoop@HadoopMaster elasticsearch-2.4.3]$ curl 'http://192.168.80.10:9200/zhouls/_analyze?analyzer=ik_max_word&pretty=true' -d '{"text":"這裏是好記性不如爛筆頭感嘆號的博客園"}'
{
"tokens" : [ {
"token" : "這裏是",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
}, {
"token" : "這裏",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 1
}, {
"token" : "裏",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 2
}, {
"token" : "好記",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 3
}, {
"token" : "記性",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 4
}, {
"token" : "不如",
"start_offset" : 6,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 5
}, {
"token" : "爛",
"start_offset" : 8,
"end_offset" : 9,
"type" : "CN_CHAR",
"position" : 6
}, {
"token" : "筆頭",
"start_offset" : 9,
"end_offset" : 11,
"type" : "CN_WORD",
"position" : 7
}, {
"token" : "筆",
"start_offset" : 9,
"end_offset" : 10,
"type" : "CN_WORD",
"position" : 8
}, {
"token" : "頭",
"start_offset" : 10,
"end_offset" : 11,
"type" : "CN_CHAR",
"position" : 9
}, {
"token" : "感嘆號",
"start_offset" : 11,
"end_offset" : 14,
"type" : "CN_WORD",
"position" : 10
}, {
"token" : "感嘆",
"start_offset" : 11,
"end_offset" : 13,
"type" : "CN_WORD",
"position" : 11
}, {
"token" : "嘆號",
"start_offset" : 12,
"end_offset" : 14,
"type" : "CN_WORD",
"position" : 12
}, {
"token" : "嘆",
"start_offset" : 12,
"end_offset" : 13,
"type" : "CN_WORD",
"position" : 13
}, {
"token" : "號",
"start_offset" : 13,
"end_offset" : 14,
"type" : "CN_CHAR",
"position" : 14
}, {
"token" : "博客園",
"start_offset" : 15,
"end_offset" : 18,
"type" : "CN_WORD",
"position" : 15
}, {
"token" : "博客",
"start_offset" : 15,
"end_offset" : 17,
"type" : "CN_WORD",
"position" : 16
}, {
"token" : "園",
"start_offset" : 17,
"end_offset" : 18,
"type" : "CN_CHAR",
"position" : 17
} ]
}
[hadoop@HadoopMaster elasticsearch-2.4.3]$

[hadoop@HadoopMaster elasticsearch-2.4.3]$ curl 'http://192.168.80.10:9200/zhouls/_analyze?analyzer=ik_max_word&pretty=true' -d '{"text":"咱們是大數據開發技術人員"}'
{
"tokens" : [ {
"token" : "咱們",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
}, {
"token" : "大數",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 1
}, {
"token" : "數據",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 2
}, {
"token" : "開發",
"start_offset" : 6,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 3
}, {
"token" : "發",
"start_offset" : 7,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 4
}, {
"token" : "技術人員",
"start_offset" : 8,
"end_offset" : 12,
"type" : "CN_WORD",
"position" : 5
}, {
"token" : "技術",
"start_offset" : 8,
"end_offset" : 10,
"type" : "CN_WORD",
"position" : 6
}, {
"token" : "人員",
"start_offset" : 10,
"end_offset" : 12,
"type" : "CN_WORD",
"position" : 7
} ]
}
[hadoop@HadoopMaster elasticsearch-2.4.3]$

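    As you can see, the text is now segmented into real words, and the result is much better.

   So why has 「是」 disappeared from the second result? That is the work of the stop-word filtering in the es-ik Chinese analyzer plugin. For details, see:

Elasticsearch: stop-word filtering in IKAnalyzer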
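
The effect is easy to picture with a toy filter. The sketch below is my own illustration, not the plugin's code: the function name drop_stop_words and the two-word stop list are assumptions made for this example, while the plugin itself works from the dictionary files seen in the unzip listing (for example config/stopword.dic).

```python
def drop_stop_words(tokens, stop_words):
    """Remove tokens whose text is a stop word, leaving the other tokens and their offsets untouched."""
    return [t for t in tokens if t["token"] not in stop_words]


# The start of the second test sentence, written out by hand as token dictionaries.
tokens = [
    {"token": "咱們", "start_offset": 0, "end_offset": 2},
    {"token": "是", "start_offset": 2, "end_offset": 3},
    {"token": "大數", "start_offset": 3, "end_offset": 5},
]

# Assumed stop-word set for this sketch only.
stop_words = {"是", "的"}

kept = drop_stop_words(tokens, stop_words)
print([(t["token"], t["start_offset"]) for t in kept])
```

The surviving tokens keep their original offsets, which is why the real output jumps from end_offset 2 straight to start_offset 3.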

The explanation of ik_max_word and ik_smart in the official documentation

      https://github.com/medcl/elasticsearch-analysis-ik/tree/2.x

Testing with ik_smart

[hadoop@HadoopMaster elasticsearch-2.4.3]$ curl 'http://192.168.80.10:9200/zhouls/_analyze?analyzer=ik_smart&pretty=true' -d '{"text":"這裏是好記性不如爛筆頭感嘆號的博客園"}'
{
"tokens" : [ {
"token" : "這裏是",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
}, {
"token" : "好",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_CHAR",
"position" : 1
}, {
"token" : "記性",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 2
}, {
"token" : "不如",
"start_offset" : 6,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 3
}, {
"token" : "爛",
"start_offset" : 8,
"end_offset" : 9,
"type" : "CN_CHAR",
"position" : 4
}, {
"token" : "筆頭",
"start_offset" : 9,
"end_offset" : 11,
"type" : "CN_WORD",
"position" : 5
}, {
"token" : "感嘆號",
"start_offset" : 11,
"end_offset" : 14,
"type" : "CN_WORD",
"position" : 6
}, {
"token" : "博客園",
"start_offset" : 15,
"end_offset" : 18,
"type" : "CN_WORD",
"position" : 7
} ]
}
[hadoop@HadoopMaster elasticsearch-2.4.3]$

[hadoop@HadoopMaster elasticsearch-2.4.3]$ curl 'http://192.168.80.10:9200/zhouls/_analyze?analyzer=ik_smart&pretty=true' -d '{"text":"咱們是大數據開發技術人員"}'
{
"tokens" : [ {
"token" : "咱們",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
}, {
"token" : "大",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_CHAR",
"position" : 1
}, {
"token" : "數據",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 2
}, {
"token" : "開發",
"start_offset" : 6,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 3
}, {
"token" : "技術人員",
"start_offset" : 8,
"end_offset" : 12,
"type" : "CN_WORD",
"position" : 4
} ]
}
[hadoop@HadoopMaster elasticsearch-2.4.3]$
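
The difference between the two modes that shows up in these outputs (ik_max_word returns 技術人員, 技術 and 人員, while ik_smart returns only 技術人員) can be imitated with a toy dictionary matcher. To be clear, this is a simplified sketch and not IK's actual algorithm: the function names and the three-word dictionary are made up for illustration.

```python
def all_matches(text, dictionary):
    """Fine-grained mode: emit every dictionary word found at any position, longest first."""
    found = []
    for start in range(len(text)):
        for end in range(len(text), start, -1):
            if text[start:end] in dictionary:
                found.append(text[start:end])
    return found


def longest_matches(text, dictionary):
    """Coarse mode: scan left to right and keep only the longest word at each position."""
    found, start = [], 0
    while start < len(text):
        for end in range(len(text), start, -1):
            if text[start:end] in dictionary:
                found.append(text[start:end])
                start = end
                break
        else:
            # No dictionary word starts here, so emit the single character.
            found.append(text[start])
            start += 1
    return found


# Hypothetical dictionary for this sketch only.
toy_dictionary = {"技術人員", "技術", "人員"}

print(all_matches("技術人員", toy_dictionary))
print(longest_matches("技術人員", toy_dictionary))
```

The first call lists all three dictionary words, while the second keeps only the longest one, which mirrors the two real results above.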
