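The previous article touched on this approach; this post treats it as a topic of its own.
The architecture is as follows:
Here the ansj tokenizer uses a Redis component so that vocabulary can be added dynamically.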
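~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
First, be clear about what dynamic support means:
1) words can be added/removed dynamically in memory
2) words can be added/removed dynamically in the file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Start with the second problem: dynamic support in the file.
The AddTermRedisPubSub class shows that file support is provided by the FileUtils class.
Add the following two methods to FileUtils: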
    public static void appendStopWord(String content) {
        try {
            File file = new File(
                    AnsjElasticConfigurator.environment.configFile(),
                    AnsjElasticConfigurator.DEFAULT_STOP_FILE_LIB_PATH); // "ansj/stopLibrary.dic"
            appendFile(content, file);
        } catch (IOException e) {
            logger.error("read exception", e, new Object[0]);
            e.printStackTrace();
        }
    }

    public static void removeStopWord(String content) {
        try {
            File file = new File(
                    AnsjElasticConfigurator.environment.configFile(),
                    AnsjElasticConfigurator.DEFAULT_STOP_FILE_LIB_PATH); // "ansj/stopLibrary.dic"
            removeFile(content, file, false);
        } catch (FileNotFoundException e) {
            logger.error("file not found $ES_HOME/config/ansj/stopLibrary.dic");
            e.printStackTrace();
        } catch (IOException e) {
            logger.error("read exception", e, new Object[0]);
            e.printStackTrace();
        }
    }
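The bodies of appendFile and removeFile are not shown in this post. As a rough illustration of what such helpers do, here is a minimal standalone sketch using only the JDK. The class and method names (DictionaryFileSketch, appendWord, withoutWord, removeWord) are made up for this example, and the exact-match rule is an assumption; the plugin's own removeFile takes an extra flag and may match differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the append/remove helpers; not the plugin's FileUtils. */
class DictionaryFileSketch {

    /** Appends one word as its own line at the end of the dictionary file. */
    static void appendWord(Path dictionary, String word) throws IOException {
        Files.write(dictionary,
                (word + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Returns a copy of the lines without any line that equals the word after trimming. */
    static List<String> withoutWord(List<String> lines, String word) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().equals(word)) {
                kept.add(line);
            }
        }
        return kept;
    }

    /** Rewrites the dictionary file with every line matching the word removed. */
    static void removeWord(Path dictionary, String word) throws IOException {
        List<String> lines = Files.readAllLines(dictionary, StandardCharsets.UTF_8);
        Files.write(dictionary, withoutWord(lines, word), StandardCharsets.UTF_8);
    }
}
```

Keeping the filtering step (withoutWord) separate from the file I/O makes the matching rule easy to check without touching the disk.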
During testing I found that adding a single stop word prints some unnecessary log lines:
    [2014-06-16 11:59:13,847][INFO ][ansj-redis-msg-file ] match is true text iswill
    [2014-06-16 11:59:13,847][INFO ][ansj-redis-msg-file ] match is true text iswith
    [2014-06-16 11:59:13,847][INFO ][ansj-redis-msg-file ] match is true text iswithin
    [2014-06-16 11:59:13,847][INFO ][ansj-redis-msg-file ] match is true text iswithout
The fix is to comment out the following statement in the removeFile method of FileUtils:
    logger.info("match is {} text is{}", new Object[] {
            Boolean.valueOf(match(content, text, head)), text });
Add the following branch to the AddTermRedisPubSub class:
    else if ("stop".equals(msg[0])) {
        if ("c".equals(msg[1])) {
            // add one stop word to memory
            AnsjElasticConfigurator.filter.add(msg[2]);
            // add one stop word to the file
            FileUtils.appendStopWord(msg[2]);
        } else if ("d".equals(msg[1])) {
            // remove one stop word from memory
            AnsjElasticConfigurator.filter.remove(msg[2]);
            // remove one stop word from the file
            FileUtils.removeStopWord(msg[2]);
        }
    }
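To make the control flow easier to follow in isolation, here is a small self-contained sketch of the same dispatch logic. It assumes the Redis message is a colon-separated string such as stop:c:word; that format, the class name StopWordDispatchSketch, and the local filter set are all assumptions made for this example, not code from the plugin.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the message dispatch; not the plugin's AddTermRedisPubSub. */
class StopWordDispatchSketch {

    // Stands in for AnsjElasticConfigurator.filter (the in-memory stop word set).
    static final Set<String> filter = ConcurrentHashMap.newKeySet();

    /**
     * Handles one message, assumed to look like "stop:c:word" or "stop:d:word".
     * Returns true if the message was recognised and applied.
     */
    static boolean onMessage(String message) {
        // A limit of 3 keeps any further ':' characters inside the word itself.
        String[] msg = message.split(":", 3);
        if (msg.length != 3 || !"stop".equals(msg[0])) {
            return false;
        }
        if ("c".equals(msg[1])) {
            filter.add(msg[2]);    // the real handler also appends the word to the file
            return true;
        }
        if ("d".equals(msg[1])) {
            filter.remove(msg[2]); // the real handler also removes the word from the file
            return true;
        }
        return false;
    }
}
```

Checking msg.length before indexing avoids an ArrayIndexOutOfBoundsException on malformed messages, which the original branch does not guard against.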
Finally, stopLibrary.dic must end with a newline; otherwise a newly appended word ends up on the same line as the word that used to be last.
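If you would rather not rely on remembering this, the check can be automated. The following is a minimal sketch under the assumption that the dictionary is a small text file that fits in memory; TrailingNewlineSketch and its methods are hypothetical names, not part of the plugin.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper that guarantees a dictionary file ends with a line feed. */
class TrailingNewlineSketch {

    /** True if the content is empty or already ends with a line feed. */
    static boolean endsWithNewline(byte[] content) {
        return content.length == 0 || content[content.length - 1] == '\n';
    }

    /** Appends a single line feed when the file does not already end with one. */
    static void ensureTrailingNewline(Path file) throws IOException {
        if (!endsWithNewline(Files.readAllBytes(file))) {
            Files.write(file, "\n".getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);
        }
    }
}
```

Calling ensureTrailingNewline once before the first append would make the manual edit unnecessary.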
With that, stop words can be added and removed dynamically through Redis.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
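Next, how to add synonym support to ansj!
In Lucene 4.6, the SynonymFilterFactory in lucene-analyzers-common-4.6.1.jar makes Chinese synonyms very easy to implement:
all it takes is a few lines of code and a synonym dictionary.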
~~~~~~~~~~~~~~~~~~~
First, modify the startup class AnsjElasticConfigurator:
    public static SynonymFilterFactory factory = null;
    public static String DEFAULT_SYNONYM_FILE_LIB_PATH = "ansj/synonyms.dic";

    public static void loadSynonymFilter(Settings settings) {
        Version ver = Version.LUCENE_46;
        Map<String, String> filterArgs = new HashMap<String, String>();
        filterArgs.put("luceneMatchVersion", ver.toString());
        File path = new File(environment.configFile(),
                settings.get("synonyms", DEFAULT_SYNONYM_FILE_LIB_PATH));
        filterArgs.put("synonyms", path.getAbsolutePath());
        logger.info("synonyms.dict absolute path: " + path.getAbsolutePath());
        filterArgs.put("expand", "true");
        factory = new SynonymFilterFactory(filterArgs);
        try {
            factory.inform(new FilesystemResourceLoader());
        } catch (Exception e) {
            // Exception happens here!
            logger.info("load ansj/synonyms.dic fail,detail is as follows:"
                    + e.toString());
        }
    }

    public static void init(Settings indexSettings, Settings settings) {
        if (isLoaded()) {
            return;
        }
        environment = new Environment(indexSettings);
        initConfigPath(settings);
        loadFilter(settings);
        loadSynonymFilter(settings);
        try {
            preheat();
            logger.info("ansj preheat done! It can be used now!");
        } catch (Exception e) {
            logger.error("ansj preheat fail,please check file path.");
        }
        initRedis(settings);
        setLoaded(true);
    }
It compiles successfully.
Put the two compiled class files into elasticsearch-analysis-ansj-0.2.jar, replacing the corresponding files.
Next, modify AnsjIndexAnalysis.java:
    @Override
    protected TokenStreamComponents createComponents(String fieldName,
            final Reader reader) {
        // TODO Auto-generated method stub
        Tokenizer tokenizer = new AnsjTokenizer(new IndexAnalysis(
                new BufferedReader(reader)), reader, filter, pstemming);
        return new TokenStreamComponents(tokenizer,
                AnsjElasticConfigurator.factory.create(tokenizer));
    }
AnsjAnalysis.java
    @Override
    protected TokenStreamComponents createComponents(String fieldName,
            final Reader reader) {
        // TODO Auto-generated method stub
        Tokenizer tokenizer = new AnsjTokenizer(new ToAnalysis(
                new BufferedReader(reader)), reader, filter, pstemming);
        // add by smallblack
        return new TokenStreamComponents(tokenizer,
                AnsjElasticConfigurator.factory.create(tokenizer));
    }
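After they compile, put the class files into ansj_lucene4_plug-1.3.jar, replacing the corresponding files.
Then, before starting es, be sure to add a synonyms.dic file under the ansj directory.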
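SynonymFilterFactory reads the Solr synonym format by default: a line such as `tv, television` lists equivalent terms, a line such as `pc => computer` is a one-way mapping, and lines starting with `#` are comments. To show what expand=true means for such lines, here is a simplified standalone sketch; SynonymLineSketch is a hypothetical name, and the parsing is only an illustration (no multi-word handling or escaping), not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified illustration of Solr-format synonym lines with expand=true. */
class SynonymLineSketch {

    /** Maps each input term to the set of terms it would be expanded to. */
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> mapping = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comments carry no synonyms
            }
            String[] sides = line.split("=>", 2);
            List<String> inputs = terms(sides[0]);
            // Without "=>", every term on the line expands to all terms on the line.
            List<String> outputs = sides.length == 2 ? terms(sides[1]) : inputs;
            for (String input : inputs) {
                mapping.computeIfAbsent(input, k -> new LinkedHashSet<>()).addAll(outputs);
            }
        }
        return mapping;
    }

    /** Splits a comma-separated list and drops surrounding whitespace. */
    static List<String> terms(String side) {
        List<String> result = new ArrayList<>();
        for (String term : side.split(",")) {
            String trimmed = term.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

With this reading, a query for either term of an equivalence line matches documents containing any term on that line, while a one-way mapping only rewrites its left-hand side.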
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So far, however, this is only static support; what we want is dynamic support.
First, modify FileUtils.java:
    public static void appendSynonymWord(String content) {
        try {
            File file = new File(
                    AnsjElasticConfigurator.environment.configFile(),
                    AnsjElasticConfigurator.DEFAULT_SYNONYM_FILE_LIB_PATH); // "ansj/synonyms.dic"
            appendFile(content, file);
        } catch (IOException e) {
            logger.error("read ansj/synonyms.dic exception", e, new Object[0]);
            e.printStackTrace();
        }
    }

    public static void removeSynonymWord(String content) {
        try {
            File file = new File(
                    AnsjElasticConfigurator.environment.configFile(),
                    AnsjElasticConfigurator.DEFAULT_SYNONYM_FILE_LIB_PATH); // "ansj/synonyms.dic"
            removeFile(content, file, false);
        } catch (FileNotFoundException e) {
            logger.error("file not found $ES_HOME/config/ansj/synonyms.dic");
            e.printStackTrace();
        } catch (IOException e) {
            logger.error("read exception", e, new Object[0]);
            e.printStackTrace();
        }
    }
Then modify AddTermRedisPubSub.java:
    } else if ("stop".equals(msg[0])) {
        if ("c".equals(msg[1])) {
            AnsjElasticConfigurator.filter.add(msg[2]);
            FileUtils.appendStopWord(msg[2]);
        } else if ("d".equals(msg[1])) {
            AnsjElasticConfigurator.filter.remove(msg[2]);
            FileUtils.removeStopWord(msg[2]);
        }
    } else if ("syn".equals(msg[0])) {
        if ("c".equals(msg[1])) {
            FileUtils.appendSynonymWord(msg[2]);
        } else if ("d".equals(msg[1])) {
            FileUtils.removeSynonymWord(msg[2]);
        }
        AnsjElasticConfigurator.factory
                .inform(new FilesystemResourceLoader());
    }
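Compile it and add the result to elasticsearch-analysis-ansj-0.2.jar.
Test result:
Then add a synonym.
Check the effect again:
Now try deleting the synonym dynamically.
Check the tokenization again.
It is back to what it was before.
Problem solved!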