This is day 8 of my participation in the August Writing Challenge. For event details, see: August Writing Challenge
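If ❤️ my article helped you, please like and follow. It is the greatest encouragement for me to keep writing technical articles. More past articles are in my personal column.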
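An analyzer is the component dedicated to tokenizing text. It is made up of three parts: character filters, a tokenizer, and token filters.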
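As a rough illustration of the three-part structure, the sketch below chains a character filter, a tokenizer, and a token filter in plain Python. The function names (strip_html, split_on_non_word, lowercase, analyze) are hypothetical stand-ins chosen for this example; this is not how Elasticsearch implements its analyzers, only the order in which the stages run.

```python
import re
from typing import Callable, List


def strip_html(text: str) -> str:
    """Character filter (hypothetical): drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def split_on_non_word(text: str) -> List[str]:
    """Tokenizer (hypothetical): cut the text into runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens: List[str]) -> List[str]:
    """Token filter (hypothetical): normalise every token to lower case."""
    return [t.lower() for t in tokens]


def analyze(
    text: str,
    char_filters: List[Callable[[str], str]],
    tokenizer: Callable[[str], List[str]],
    token_filters: List[Callable[[List[str]], List[str]]],
) -> List[str]:
    # Stage 1: character filters rewrite the raw text.
    for char_filter in char_filters:
        text = char_filter(text)
    # Stage 2: exactly one tokenizer splits the text into tokens.
    tokens = tokenizer(text)
    # Stage 3: token filters modify, add, or remove tokens.
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("<b>Quick</b> brown-foxes", [strip_html], split_on_non_word, [lowercase]))
# prints ['quick', 'brown', 'foxes']
```

The design point is that the stages are independent: swapping only the tokenizer or only one filter yields a different analyzer, which is how the analyzers in the requests that follow differ from one another.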
Analyzer examples:
GET _analyze
{
"analyzer": "standard",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
=================== Result V ===================
{
"tokens" : [
{
"token" : "2",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<NUM>",
"position" : 0
},
{
"token" : "running",
"start_offset" : 2,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
},
......
{
"token" : "evening",
"start_offset" : 62,
"end_offset" : 69,
"type" : "<ALPHANUM>",
"position" : 12
}
]
}
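The start_offset, end_offset, and position values in the response above can be reproduced with a rough approximation. The sketch below assumes plain ASCII input and uses a simple regular expression; the real standard analyzer follows Unicode text segmentation rules and also reports a type field, which is left out here. The function name analyze_with_offsets is made up for this example.

```python
import re

TEXT = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."


def analyze_with_offsets(text):
    """Approximate the standard analyzer on ASCII text.

    Splits on non-word characters, lowercases each token, and records the
    character offsets and the ordinal position of every token.
    """
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index one past the last character
            "position": position,           # 0-based token counter
        })
    return tokens


tokens = analyze_with_offsets(TEXT)
print(tokens[0])   # the token "2" at offsets 0-1, position 0
print(tokens[-1])  # the token "evening" at offsets 62-69, position 12
```

Offsets are what highlighting relies on to find the original characters, while positions are what phrase queries compare, so the two are tracked separately.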
GET _analyze
{
"analyzer": "stop",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
=================== Result V ===================
{
"tokens" : [
{
"token" : "running",
"start_offset" : 2,
"end_offset" : 9,
"type" : "word",
"position" : 0
},
{
"token" : "quick",
"start_offset" : 10,
"end_offset" : 15,
"type" : "word",
"position" : 1
},
......
{
"token" : "evening",
"start_offset" : 62,
"end_offset" : 69,
"type" : "word",
"position" : 11
}
]
}
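In the stop analyzer response, "evening" sits at position 11 rather than 12, and "2" is missing. A minimal sketch of why, assuming ASCII input: the tokenizer only emits runs of letters, so "2" never becomes a token, whereas stop words are removed after tokenization and still consume a position number. STOP_WORDS below is a small illustrative subset chosen for this example, not the full English stop word list that Elasticsearch uses, and stop_analyze is a hypothetical name.

```python
import re

TEXT = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."

# Assumption: a handful of common English stop words, for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def stop_analyze(text, stop_words=STOP_WORDS):
    """Approximate the stop analyzer: letters-only tokens, lowercased,
    with stop words dropped but their positions left as gaps."""
    tokens = []
    # Only letters form tokens, so digits such as "2" take up no position.
    for position, match in enumerate(re.finditer(r"[A-Za-z]+", text)):
        word = match.group().lower()
        if word in stop_words:
            continue  # removed from the output, but the position is still used up
        tokens.append({
            "token": word,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


result = stop_analyze(TEXT)
print([t["token"] for t in result])
print(result[-1]["position"])  # prints 11
```

Keeping the gaps means a phrase query can still tell that "dogs" and "summer" were not adjacent in the original sentence.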
#simple
GET _analyze
{
"analyzer": "simple",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#stop
GET _analyze
{
"analyzer": "stop",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#whitespace
GET _analyze
{
"analyzer": "whitespace",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#keyword
GET _analyze
{
"analyzer": "keyword",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#pattern
GET _analyze
{
"analyzer": "pattern",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#english
GET _analyze
{
"analyzer": "english",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#icu_analyzer
POST _analyze
{
"analyzer": "icu_analyzer",
"text": "他說的確實在理"
}
POST _analyze
{
"analyzer": "standard",
"text": "他說的確實在理"
}
POST _analyze
{
"analyzer": "icu_analyzer",
"text": "這個蘋果不大好吃"
}
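The requests above are listed without their responses. As a rough guide to how the simpler analyzers differ on the English sample sentence, the sketch below approximates four of them in plain Python. These are simplified stand-ins that assume ASCII input, not Elasticsearch's own code. The english analyzer (which stems words) and icu_analyzer (which segments Chinese text) are left out because they cannot be approximated in a few lines.

```python
import re

TEXT = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."


def whitespace(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple(text):
    """Split on anything that is not a letter, then lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def keyword(text):
    """Emit the whole input as a single token, unchanged."""
    return [text]


def pattern(text):
    """Split on runs of non-word characters (\\W+), then lowercase."""
    return [t.lower() for t in re.split(r"\W+", text) if t]


for name, analyzer in [("whitespace", whitespace), ("simple", simple),
                       ("keyword", keyword), ("pattern", pattern)]:
    print(name, analyzer(TEXT))
```

Running it shows the main contrasts: whitespace keeps "Quick", "brown-foxes", and "evening." intact, simple drops the digit "2", pattern keeps "2" but splits on the hyphen, and keyword returns the sentence as one token.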
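Note that neither icu_analyzer nor the IK analyzer ships with Elasticsearch 7.8.0. For icu_analyzer, install the plugin yourself by running the command ./bin/elasticsearch-plugin install analysis-icu and then restart Elasticsearch before using it. IK is a separate third-party plugin and has to be installed on its own.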
IK: supports custom dictionaries and hot updates of the dictionary. gitee.com/mirrors/ela…
THULAC: a Chinese word segmenter from the Natural Language Processing and Computational Social Science Lab at Tsinghua University. gitee.com/puremilk/TH…