Adding words to the mmseg segmentation dictionary for coreseek

Date: 2019-11-20


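At work I ran into the name 林書豪 (Jeremy Lin), which the stock segmentation dictionary did not contain. I went through the documentation and wrote up these notes for my own reference and for discussion.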

1. Prepare the list of words to add, normally one word per line, and make sure the file is saved as UTF-8;
    For example:
    --
    林書豪
    --
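2. Use UltraEdit's find-and-replace to bring the word list into the format mmseg expects;
    For example:
    In UltraEdit's replace dialog, replace "^p" with "^t1^px:1^p" (^p is a line break, ^t is a tab). The last word must also end with a line break, otherwise it is not converted.
    The result is:
    --
    林書豪[tab]1
    x:1
    --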

Any other editor or tool that produces the same format works too.
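
The same conversion can also be sketched on the command line. This is only a minimal sketch: the function name to_mmseg_format and the file names words.txt and words_mmseg.txt are made up for illustration, and it assumes a UTF-8 word list with one word per line.

```shell
# Convert a plain word list (one word per line) into the unigram format
# shown above: "word<TAB>1" followed by a line containing "x:1".
# to_mmseg_format, words.txt and words_mmseg.txt are hypothetical names.
to_mmseg_format() {
    # sub() drops a trailing carriage return in case the list was saved
    # with Windows line endings; empty lines are skipped.
    awk '{ sub(/\r$/, "") } length($0) > 0 { printf "%s\t1\nx:1\n", $0 }' "$1"
}

# Demo on a throwaway directory with a one-word list.
work=$(mktemp -d)
printf '林書豪\n' > "$work/words.txt"
to_mmseg_format "$work/words.txt" > "$work/words_mmseg.txt"
cat "$work/words_mmseg.txt"
```

Unlike the editor replacement, this does not depend on the last line ending with a line break.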
3. Paste the converted word list at the end of the original dictionary source unigram.txt, save the result as unigram_new.txt, and copy it into the mmseg directory;
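
Appending can be done from the shell as well. The sketch below is an illustration under assumptions: build_unigram_new is a hypothetical helper, and the demo uses small stand-in files rather than the real dictionary.

```shell
# Build unigram_new.txt from the original dictionary source plus the
# converted word list. build_unigram_new is a hypothetical helper.
# Arguments: original unigram.txt, converted word list, output file.
build_unigram_new() {
    cat "$1" > "$3"
    # If the original does not end with a newline, add one so the first
    # new entry is not glued onto the last existing line.
    if [ -n "$(tail -c 1 "$3")" ]; then
        printf '\n' >> "$3"
    fi
    cat "$2" >> "$3"
}

# Demo with stand-in files; the stand-in original lacks a trailing newline.
demo=$(mktemp -d)
printf '中文\t1\nx:1' > "$demo/unigram.txt"
printf '林書豪\t1\nx:1\n' > "$demo/words_mmseg.txt"
build_unigram_new "$demo/unigram.txt" "$demo/words_mmseg.txt" "$demo/unigram_new.txt"
cat "$demo/unigram_new.txt"
```

With the real files the output path would be /usr/local/mmseg3/etc/unigram_new.txt, the path used in the next step; where the original unigram.txt lives depends on the installation.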

4. Generate the new .uni dictionary

    /usr/local/mmseg3/bin/mmseg -u /usr/local/mmseg3/etc/unigram_new.txt

    This produces the new dictionary file unigram_new.txt.uni

5. Replace the existing uni.lib with the new unigram_new.txt.uni
    mv /usr/local/mmseg3/etc/unigram_new.txt.uni /usr/local/mmseg3/etc/uni.lib
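
Because the mv overwrites the only copy of the working dictionary, it may be worth keeping a backup first. The following is a sketch only; install_dictionary is a hypothetical helper, and the demo runs on stand-in files in a temporary directory rather than on /usr/local/mmseg3/etc.

```shell
# Swap in a new dictionary while keeping a timestamped copy of the old one.
# install_dictionary is a hypothetical helper.
# Arguments: freshly generated .uni file, path of the live uni.lib.
install_dictionary() {
    new_uni=$1
    live_lib=$2
    # Refuse to continue if the generated file is missing or empty.
    [ -s "$new_uni" ] || { echo "missing or empty: $new_uni" >&2; return 1; }
    if [ -f "$live_lib" ]; then
        cp -p "$live_lib" "$live_lib.bak.$(date +%Y%m%d%H%M%S)"
    fi
    mv "$new_uni" "$live_lib"
}

# Demo with stand-in files.
etc=$(mktemp -d)
printf 'old' > "$etc/uni.lib"
printf 'new' > "$etc/unigram_new.txt.uni"
install_dictionary "$etc/unigram_new.txt.uni" "$etc/uni.lib"
ls "$etc"
```

If the new dictionary turns out to be broken, the backup can simply be moved back over uni.lib.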

6. Rebuild the indexes and restart searchd
     /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/c.conf --all --pidfile --rotate
    Stop searchd
      ps auxww | grep searchd
      kill 923230    (use the searchd PID reported by ps; 923230 is only the value from my run)
    Start searchd
    /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/c.conf --console --pidfile

    Run a search for the new word and it should now be found.

Note: searchd must be restarted for the new dictionary to take effect.
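
As an alternative to looking up the PID by hand, Sphinx's searchd (on which coreseek is based) documents a --stop option that reads the PID file named in the configuration. The sketch below assumes the installed coreseek build supports --stop and that c.conf sets pid_file; the SEARCHD and CONF variables and the restart_searchd function are illustrative names, not part of coreseek.

```shell
# Restart searchd through its own --stop option instead of ps + kill.
# SEARCHD, CONF and restart_searchd are hypothetical names; the paths are
# the ones used in the steps above.
SEARCHD=${SEARCHD:-/usr/local/coreseek/bin/searchd}
CONF=${CONF:-/usr/local/coreseek/etc/c.conf}

restart_searchd() {
    # Ask the running daemon to shut down (assumes --stop is supported).
    "$SEARCHD" -c "$CONF" --stop || return 1
    # --stop is asynchronous; the 2-second pause is an arbitrary choice.
    sleep 2
    # Start it again with the same options as in step 6.
    "$SEARCHD" -c "$CONF" --console --pidfile
}
```

Calling restart_searchd after rebuilding the indexes would then cover the stop and start commands of step 6.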

My personal blog is at http://www.iczerd.com; comments and link exchanges are welcome.
