Lucene Spell-Check Module

Date: 2019-11-13

Tags: lucene, spell-check module


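Lucene is an open-source search engine toolkit released by Apache. Besides core search functionality, it ships a number of additional modules, one of which is spell checking.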

The spell-check classes live in lucene-suggest-x.xx.x.jar, in the package org.apache.lucene.search.spell. Three classes implement the core functionality: SpellChecker, DirectSpellChecker, and WordBreakSpellChecker.

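The three classes take different approaches to spell checking:

SpellChecker: the original spell checker. Before it can be used, a separate spell-check index has to be built, either from a txt dictionary file or from a field of an existing index.

For an analysis of the SpellChecker source code, see: http://www.tuicool.com/articles/naIBjmui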

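DirectSpellChecker: an improved spell checker that works directly against an existing index, so no separate spell-check index has to be built. Solr's stock spell-check configuration takes this approach through DirectSolrSpellChecker.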

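WordBreakSpellChecker: also works against an existing index without building a new one. It targets words that were wrongly run together or wrongly split apart.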
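To make the word-break idea concrete, here is a minimal sketch in plain Java. It does not use Lucene: the class `WordBreakSketch`, its methods, and the in-memory term set are hypothetical stand-ins for the terms of an existing index, and the brute-force logic is an assumption made for illustration, not WordBreakSpellChecker's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a brute-force stand-in for word-break style suggestions.
class WordBreakSketch {

    // Returns every way to split `word` into two parts that both occur in `terms`.
    static List<String> splitInTwo(String word, Set<String> terms) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i < word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (terms.contains(left) && terms.contains(right)) {
                result.add(left + " " + right);
            }
        }
        return result;
    }

    // Returns the concatenation of two adjacent words if it occurs in `terms`, else null.
    static String combine(String first, String second, Set<String> terms) {
        String joined = first + second;
        return terms.contains(joined) ? joined : null;
    }

    public static void main(String[] args) {
        // Hypothetical terms standing in for one field of an existing index.
        Set<String> terms = new HashSet<>(Arrays.asList("spell", "check", "spellcheck", "lucene"));
        System.out.println(splitInTwo("spellcheck", terms)); // prints [spell check]
        System.out.println(combine("spell", "check", terms)); // prints spellcheck
    }
}
```

In Lucene itself the corresponding entry points are `suggestWordBreaks` and `suggestWordCombinations`, which read terms from an `IndexReader` rather than from a set held in memory.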

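Using SpellChecker:

The spell-check index can be built from three kinds of dictionary:

PlainTextDictionary: builds the index from a txt file

LuceneDictionary: builds the index from one field of an existing index

HighFrequencyDictionary: builds the index from one field of an existing index, but keeps only terms that meet a minimum occurrence rate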
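The occurrence-rate filter of the third option can be pictured with a small plain-Java sketch. Lucene's constructor is `HighFrequencyDictionary(reader, field, thresh)`, where the threshold is a fraction of the documents in the index. Everything else here is a hypothetical illustration: the class `FrequencyFilterSketch`, the document-frequency map, and the exact boundary handling are assumptions, not the library's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: keeps terms that occur in a large enough share of documents.
class FrequencyFilterSketch {

    // Keeps terms whose document frequency is at least threshold * numDocs.
    static List<String> frequentTerms(Map<String, Integer> docFreq, int numDocs, float threshold) {
        int minDocs = (int) (threshold * numDocs);
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : docFreq.entrySet()) {
            if (entry.getValue() >= minDocs) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies for one field of a 100-document index.
        Map<String, Integer> docFreq = new LinkedHashMap<>();
        docFreq.put("beijing", 40);
        docFreq.put("beijink", 1); // a rare misspelling that should be filtered out
        docFreq.put("lucene", 25);
        System.out.println(frequentTerms(docFreq, 100, 0.25f)); // prints [beijing, lucene]
    }
}
```

Filtering this way keeps rare misspellings that happen to be in the source index from being offered as suggestions.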

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.*;
import org.apache.lucene.search.spell.*;
import org.apache.lucene.store.*;

// Directory for the new spell-check index
String spellIndexPath = "D:\\newPath";
// Directory of the existing index
String oriIndexPath = "D:\\oriPath";
// Dictionary file
String dicFilePath = "D:\\txt\\dic.txt";
// Field of the existing index to take terms from (placeholder name)
String fieldName = "fieldname";

// Open the spell-check index directory
Directory directory = FSDirectory.open(new File(spellIndexPath).toPath());

SpellChecker spellChecker = new SpellChecker(directory);

// The next steps build the spell-check index
IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(oriIndexPath).toPath()));
// Build it from a field of the existing index
Dictionary dictionary = new LuceneDictionary(reader, fieldName);
// Or build it from the txt dictionary file
// Dictionary dictionary = new PlainTextDictionary(new File(dicFilePath).toPath());
IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
spellChecker.indexDictionary(dictionary, config, true);

String queryWord = "beijink";
int numSug = 10;
// Run the spell check
String[] suggestions = spellChecker.suggestSimilar(queryWord, numSug);

reader.close();
spellChecker.close();
directory.close();
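The dictionary file that dicFilePath points to is expected to hold one word per line. As a rough sketch of preparing such a file, the plain-Java helper below writes a word list in that layout and reads it back. The class `DictionaryFileSketch`, its method names, and the sample words are hypothetical, and a temporary file is used instead of the path from the snippet above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Illustration only: prepares a word list in a one-word-per-line layout.
class DictionaryFileSketch {

    // Writes the words to a new temporary file, one per line, and returns its path.
    static Path newTempDictionary(List<String> words) {
        try {
            Path file = Files.createTempFile("dic", ".txt");
            return Files.write(file, words, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the word list back, one entry per line.
    static List<String> readWordList(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample words are placeholders for a real dictionary.
        Path file = newTempDictionary(Arrays.asList("beijing", "shanghai", "guangzhou"));
        System.out.println(file + " contains " + readWordList(file));
    }
}
```

The resulting path could then be handed to `new PlainTextDictionary(path)` in place of the LuceneDictionary used above.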

Using DirectSpellChecker:

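import java.io.File;
import org.apache.lucene.index.*;
import org.apache.lucene.search.spell.*;
import org.apache.lucene.store.FSDirectory;

DirectSpellChecker checker = new DirectSpellChecker();
String readerPath = "D:\\path";
IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(readerPath).toPath()));
// Field name and the text to check
Term term = new Term("fieldname", "querytext");
int numSug = 10;
SuggestWord[] suggestions = checker.suggestSimilar(term, numSug, reader);
reader.close();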
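What "checking directly against the existing index" amounts to can be sketched without Lucene: take the terms already present in a field and keep those within a small edit distance of the query. The class `EditDistanceSketch`, the in-memory term list, and the linear scan are assumptions made for illustration. DirectSpellChecker itself allows at most 2 edits by default and finds candidates far more efficiently than this loop does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a linear scan with Levenshtein distance over known terms.
class EditDistanceSketch {

    // Classic dynamic-programming Levenshtein distance using two rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns up to numSug terms within maxEdits of the query, closest first.
    static List<String> suggest(String query, List<String> terms, int maxEdits, int numSug) {
        List<String> candidates = new ArrayList<>();
        for (String term : terms) {
            if (!term.equals(query) && distance(query, term) <= maxEdits) {
                candidates.add(term);
            }
        }
        candidates.sort((x, y) -> {
            int byDistance = Integer.compare(distance(query, x), distance(query, y));
            return byDistance != 0 ? byDistance : x.compareTo(y);
        });
        return new ArrayList<>(candidates.subList(0, Math.min(numSug, candidates.size())));
    }

    public static void main(String[] args) {
        // Hypothetical terms standing in for one field of an existing index.
        List<String> terms = Arrays.asList("beiping", "shanghai", "beijing");
        System.out.println(suggest("beijink", terms, 2, 10)); // prints [beijing, beiping]
    }
}
```

In the real API each returned SuggestWord carries the suggested string together with its score and its frequency in the index, so callers can rank or filter the suggestions further.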
