Delete the index left over from the previous experiment:
curl -XDELETE http://127.0.0.1:9200/synctest/article
output:
{"acknowledged":true}
Create a new mapping:
curl -XPUT 'http://127.0.0.1:9200/servcie/_mapping/massage' -d '
{
  "massage": {
    "properties": {
      "location": { "type": "geo_point" },
      "name":     { "type": "string" },
      "age":      { "type": "integer" },
      "address":  { "type": "string" },
      "price":    { "type": "double", "index": "not_analyzed" },
      "is_open":  { "type": "boolean" }
    }
  }
}'
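Embedding a long mapping body inline in curl is error-prone; a minimal sketch that builds the same mapping in Python and prints the JSON to use as the `-d` payload (field names and types are taken from the example above):

```python
import json

# Mapping for the "massage" type, mirroring the fields in the curl example above
mapping = {
    "massage": {
        "properties": {
            "location": {"type": "geo_point"},
            "name":     {"type": "string"},
            "age":      {"type": "integer"},
            "address":  {"type": "string"},
            "price":    {"type": "double", "index": "not_analyzed"},
            "is_open":  {"type": "boolean"},
        }
    }
}

# Serialize for use as the -d payload of the curl -XPUT request
body = json.dumps(mapping, indent=2, ensure_ascii=False)
print(body)
```

Building the dict first and serializing it guarantees the payload is valid JSON before it ever reaches the cluster.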
View the newly created mapping:
curl -XGET http://127.0.0.1:9200/servcie/massage/_mapping?pretty
{
  "servcie" : {
    "mappings" : {
      "massage" : {
        "properties" : {
          "address" : {
            "type" : "string"
          },
          "age" : {
            "type" : "integer"
          },
          "is_open" : {
            "type" : "boolean"
          },
          "location" : {
            "type" : "geo_point"
          },
          "name" : {
            "type" : "string"
          },
          "price" : {
            "type" : "double"
          }
        }
      }
    }
  }
}
First, analyze the text 波多菠蘿蜜 with the default (standard) analyzer:
curl -XPOST 'http://127.0.0.1:9200/_analyze?pretty' -d '{"text":"波多菠蘿蜜"}'
{
  "tokens" : [ {
    "token" : "波",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "<IDEOGRAPHIC>",
    "position" : 0
  }, {
    "token" : "多",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "<IDEOGRAPHIC>",
    "position" : 1
  }, {
    "token" : "菠",
    "start_offset" : 2,
    "end_offset" : 3,
    "type" : "<IDEOGRAPHIC>",
    "position" : 2
  }, {
    "token" : "蘿",
    "start_offset" : 3,
    "end_offset" : 4,
    "type" : "<IDEOGRAPHIC>",
    "position" : 3
  }, {
    "token" : "蜜",
    "start_offset" : 4,
    "end_offset" : 5,
    "type" : "<IDEOGRAPHIC>",
    "position" : 4
  } ]
}
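The `_analyze` response is plain JSON and can be post-processed like any other; a small sketch that pulls out just the token strings (the response literal is copied from the output shown above):

```python
import json

# _analyze response for 波多菠蘿蜜 with the default (standard) analyzer, as shown above
raw = '''
{"tokens": [
  {"token": "波", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0},
  {"token": "多", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1},
  {"token": "菠", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2},
  {"token": "蘿", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3},
  {"token": "蜜", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4}
]}
'''

# Collect only the token text, dropping offsets and types
tokens = [t["token"] for t in json.loads(raw)["tokens"]]
print(tokens)
```

Every CJK character becomes its own token: the standard analyzer has no Chinese dictionary, which is exactly why a dedicated Chinese analyzer is needed below.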
An analyzer is composed of a single tokenizer and zero or more token filters.
curl -XPOST 'http://127.0.0.1:9200/_analyze?pretty' -d '{"text":"abc dsf,sdsf"}'
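That tokenizer-plus-filters pipeline can be sketched in plain Python. This toy analyzer (an illustration, not Elasticsearch code) splits on non-alphanumeric characters roughly the way the standard tokenizer treats `abc dsf,sdsf`, then lowercases each token the way the `lowercase` token filter does:

```python
import re

def tokenize(text):
    # Toy tokenizer: split on any run of characters that is not a letter or digit
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def lowercase_filter(tokens):
    # Toy token filter: normalize case, like the standard analyzer's lowercase filter
    return [t.lower() for t in tokens]

def analyze(text, tokenizer=tokenize, filters=(lowercase_filter,)):
    tokens = tokenizer(text)
    for f in filters:  # zero or more token filters, applied in order
        tokens = f(tokens)
    return tokens

print(analyze("abc dsf,sdsf"))  # ['abc', 'dsf', 'sdsf']
```

The design point is that tokenization and filtering are independent stages: you can swap either without touching the other, which is how Elasticsearch lets you compose custom analyzers.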
For Chinese retrieval you also need a Chinese analyzer, and the most widely used one is probably the IK analyzer. Install it:
./bin/plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v1.9.3/elasticsearch-analysis-ik-1.9.3.zip
After restarting Elasticsearch, check that the plugin loaded successfully:
curl -XGET http://localhost:9200/_cat/plugins
Marrow analysis-ik 1.9.3 j
Now analyze the same text with the ik analyzer:
curl -XPOST 'http://127.0.0.1:9200/_analyze?pretty' -d '{"analyzer":"ik","text":"波多菠蘿蜜"}'
{
  "tokens" : [ {
    "token" : "波",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "CN_WORD",
    "position" : 0
  }, {
    "token" : "多",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "CN_CHAR",
    "position" : 1
  }, {
    "token" : "菠蘿蜜",
    "start_offset" : 2,
    "end_offset" : 5,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "菠蘿",
    "start_offset" : 2,
    "end_offset" : 4,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "菠",
    "start_offset" : 2,
    "end_offset" : 3,
    "type" : "CN_WORD",
    "position" : 4
  }, {
    "token" : "蘿",
    "start_offset" : 3,
    "end_offset" : 4,
    "type" : "CN_WORD",
    "position" : 5
  }, {
    "token" : "蜜",
    "start_offset" : 4,
    "end_offset" : 5,
    "type" : "CN_WORD",
    "position" : 6
  } ]
}
You can see that ik now emits 菠蘿 (pineapple) and 菠蘿蜜 (jackfruit) as word tokens.
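Put side by side, the difference between the two analyzers is easy to check programmatically (both token lists are copied from the two `_analyze` responses above):

```python
# Token lists copied from the two _analyze responses above
standard_tokens = {"波", "多", "菠", "蘿", "蜜"}          # standard analyzer: single chars only
ik_tokens = ["波", "多", "菠蘿蜜", "菠蘿", "菠", "蘿", "蜜"]  # ik analyzer output

# ik adds real multi-character words on top of the single characters
multi_char = [t for t in ik_tokens if len(t) > 1]
print(multi_char)                          # the words ik contributes
print(standard_tokens <= set(ik_tokens))   # the single characters are still present
```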
As language evolves and business jargon changes, new words appear that IK's dictionary has not yet collected; even a match_phrase query may then fail to find matching documents. What can we do about that?
For example, suppose we want the slang term 「吊炸天」 to be searchable (IK 1.9.3's dictionary does not include it):
curl -XPOST 'http://127.0.0.1:9200/_analyze?pretty' -d '{"analyzer":"ik","text":"吊炸天天不容"}'
{
  "tokens" : [ {
    "token" : "吊",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "CN_WORD",
    "position" : 0
  }, {
    "token" : "炸",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "CN_CHAR",
    "position" : 1
  }, {
    "token" : "天天",
    "start_offset" : 2,
    "end_offset" : 4,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "不容",
    "start_offset" : 4,
    "end_offset" : 6,
    "type" : "CN_WORD",
    "position" : 3
  } ]
}
If we really need this, we have to modify IK's dictionary.
Edit the file mydict.dic under analysis-ik/config/ik/custom — this file exists specifically for extending the vocabulary. Append the new word at the end, save, and restart Elasticsearch.
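The dictionary edit can be scripted; a minimal sketch that appends an entry to the extension dictionary (the relative path follows the article — adjust it to your Elasticsearch installation; IK expects one word per line):

```python
from pathlib import Path

# Path from the article; adjust to your own Elasticsearch installation
dict_file = Path("analysis-ik/config/ik/custom/mydict.dic")
dict_file.parent.mkdir(parents=True, exist_ok=True)

word = "吊炸天"
existing = dict_file.read_text(encoding="utf-8").splitlines() if dict_file.exists() else []
if word not in existing:
    # Append the new word on its own line, as the dictionary format requires
    with dict_file.open("a", encoding="utf-8") as f:
        f.write(word + "\n")

print(word in dict_file.read_text(encoding="utf-8").splitlines())
```

Remember that Elasticsearch must be restarted afterwards for IK to pick up the new entry.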
curl -XPOST 'http://127.0.0.1:9200/_analyze?pretty' -d '{"analyzer":"ik","text":"吊炸天天不容"}'
{
  "tokens" : [ {
    "token" : "吊炸天",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "CN_WORD",
    "position" : 0
  }, {
    "token" : "吊",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "CN_WORD",
    "position" : 1
  }, {
    "token" : "炸",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "CN_CHAR",
    "position" : 2
  }, {
    "token" : "天天",
    "start_offset" : 2,
    "end_offset" : 4,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "不容",
    "start_offset" : 4,
    "end_offset" : 6,
    "type" : "CN_WORD",
    "position" : 4
  } ]
}
We can see that 「吊炸天」 is now segmented as a token of its own.