1. The default analyzer
The standard analyzer is made up of the following components (a quick _analyze example follows this list):
standard tokenizer: splits text on word boundaries
standard token filter: does nothing
lowercase token filter: converts all letters to lowercase
stop token filter (disabled by default): removes stop words such as a, the, it, and so on
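To see these defaults in action, you can run the built-in standard analyzer directly; a minimal sketch, where the sample sentence is just an illustrative string:

GET /_analyze
{
  "analyzer": "standard",
  "text": "A Dog is in the House"
}

The response should list lowercased tokens (a, dog, is, in, the, house), with the stop words still present because the stop token filter is disabled by default.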
2. Modifying analyzer settings
Enable the English stop words token filter:
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}

GET /my_index/_analyze
{
  "analyzer": "standard",
  "text": "a dog is in the house"
}

GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text": "a dog is in the house"
}
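The first _analyze call should return every word, including a, is, in, and the, while the es_std call should keep only dog and house, since _english_ covers the common English stop words. To use es_std when indexing, reference it in a field mapping; a minimal sketch, assuming a hypothetical title field and the same my_type type used later in this post:

PUT /my_index/_mapping/my_type
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "es_std"
    }
  }
}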
3. Customizing your own analyzer
1. & character conversion
2. Removal of certain stop words
3. Case conversion
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["&=> and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}

GET /my_index/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}

PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}
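If everything is wired up correctly, the _analyze response should contain roughly the tokens tom, and, jerry, are, friend, in, house, haha: html_strip removes the <a> tag, the &_to_and char filter rewrites & to "and", lowercase handles HAHA, and my_stopwords drops "the" and "a". Once the mapping is applied, the content field uses my_analyzer at both index and search time; a minimal usage sketch (the document ID and query text are placeholders, not from the original post):

PUT /my_index/my_type/1
{
  "content": "tom&jerry are a friend in the house"
}

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "content": "tom and jerry"
    }
  }
}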