This article is an illustrated walkthrough of using Lucene together with Paoding:
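1. Download Lucene (official archive: http://archive.apache.org/dist/lucene/java/). This article uses version 2.9.4. After downloading, extract the archive. The basic files needed are listed below:
lucene-core-2.9.4.jar Lucene core jar
lucene-analyzers-2.9.4.jar Lucene analyzers (tokenization) jar
lucene-highlighter-2.9.4.jar Lucene result-highlighting jar
paoding-analysis.jar jar needed for Chinese word segmentation in Lucene
commons-logging.jar logging library
{PAODING_HOME}/dic Paoding dictionary directory (PAODING_HOME is the directory where paoding was extracted)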
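3. Open Eclipse and create a Java Project (neither the project name nor the project path may contain spaces). In this example the Project Name is Paoding.
1_1: In the Paoding project, create a folder named lib to hold all the jars. Copy the jar files listed above into lib, then add every jar in lib to the project's classpath.
1_2: Copy the {PAODING_HOME}/dic directory into the Paoding project's src folder.
[Figure: overall project structure]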
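The layout described in the steps above can be sanity-checked before running anything. The following is only a minimal sketch under stated assumptions: the class ProjectLayoutCheck and its method findProblems are hypothetical helpers invented for this illustration (they are not part of Lucene or Paoding), and the expected file names are simply the ones listed earlier in this article.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene or Paoding: checks the project layout from the steps above.
class ProjectLayoutCheck {

    // Jar names exactly as listed in this article.
    static final List<String> REQUIRED_JARS = Arrays.asList(
            "lucene-core-2.9.4.jar",
            "lucene-analyzers-2.9.4.jar",
            "lucene-highlighter-2.9.4.jar",
            "paoding-analysis.jar",
            "commons-logging.jar");

    // Returns human-readable problems; an empty list means the layout looks as described.
    static List<String> findProblems(File projectDir) {
        List<String> problems = new ArrayList<String>();
        String path = projectDir.getAbsolutePath();
        if (path.contains(" ")) {
            problems.add("project path contains a space: " + path);
        }
        File lib = new File(projectDir, "lib");
        for (String jar : REQUIRED_JARS) {
            if (!new File(lib, jar).isFile()) {
                problems.add("missing lib/" + jar);
            }
        }
        if (!new File(projectDir, "src/dic").isDirectory()) {
            problems.add("missing src/dic (copy of {PAODING_HOME}/dic)");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Assumption: the project directory is passed as the first argument, else the current directory.
        File projectDir = new File(args.length > 0 ? args[0] : ".");
        List<String> problems = findProblems(projectDir);
        if (problems.isEmpty()) {
            System.out.println("layout ok");
        }
        for (String problem : problems) {
            System.out.println(problem);
        }
    }
}
```

This only checks that the files exist on disk. It does not confirm that Eclipse has actually added the jars to the build path.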
4. Create the class TestFileIndex.java. It reads every file matching d:\data\*.txt into memory and writes the contents to the index directory (d:\luceneindex).
TestFileIndex.java
    package com.lixing.paoding.index;

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FilenameFilter;
    import java.io.InputStreamReader;

    import net.paoding.analysis.analyzer.PaodingAnalyzer;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class TestFileIndex {
        public static void main(String[] args) throws Exception {
            String dataDir = "d:/data";
            String indexDir = "d:/luceneindex";

            // Index only *.txt files, as described above.
            File[] files = new File(dataDir).listFiles(new FilenameFilter() {
                public boolean accept(File parent, String name) {
                    return name.toLowerCase().endsWith(".txt");
                }
            });
            // listFiles() returns null when dataDir is missing or is not a directory.
            if (files == null) {
                throw new IllegalStateException("Cannot list directory: " + dataDir);
            }
            System.out.println(files.length);

            Analyzer analyzer = new PaodingAnalyzer();
            Directory dir = FSDirectory.open(new File(indexDir));
            IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);

            for (int i = 0; i < files.length; i++) {
                // Read the whole file into memory; the files are assumed to be GB2312-encoded.
                StringBuilder text = new StringBuilder();
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(new FileInputStream(files[i]), "gb2312"));
                try {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        text.append(line).append("\n");
                    }
                } finally {
                    reader.close(); // also closes the underlying FileInputStream
                }

                // One document per file: both fields are stored and analyzed.
                Document doc = new Document();
                doc.add(new Field("fileName", files[i].getName(), Field.Store.YES, Field.Index.ANALYZED));
                doc.add(new Field("contents", text.toString(), Field.Store.YES, Field.Index.ANALYZED));
                writer.addDocument(doc);
            }

            writer.optimize();
            writer.close();
            dir.close();
            System.out.println("ok");
        }
    }
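The file-reading part of the indexer can be separated from Lucene and tried on its own. The following is only a minimal sketch: the class TextFileLoader and its methods listTextFiles and readAll are hypothetical names introduced for this illustration, and the try-with-resources statement assumes Java 7 or later. The d:/data path and the gb2312 charset in main are the values used in this article.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper for illustration: the file-loading step of the indexer, without Lucene.
class TextFileLoader {

    // Lists regular *.txt files; returns an empty array if dir is missing or not a directory.
    static File[] listTextFiles(File dir) {
        File[] files = dir.listFiles(new FilenameFilter() {
            public boolean accept(File parent, String name) {
                return name.toLowerCase().endsWith(".txt") && new File(parent, name).isFile();
            }
        });
        return files == null ? new File[0] : files;
    }

    // Reads the whole file with the given charset, ending every line with '\n'.
    static String readAll(File file, String charsetName) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charsetName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Directory and charset as used in this article; override them via arguments if needed.
        File dataDir = new File(args.length > 0 ? args[0] : "d:/data");
        String charset = args.length > 1 ? args[1] : "gb2312";
        for (File file : listTextFiles(dataDir)) {
            System.out.println(file.getName() + ": " + readAll(file, charset).length() + " chars");
        }
    }
}
```

Filtering on isFile() skips subdirectories, which would otherwise make FileInputStream throw. Passing the charset as a parameter makes it easier to index files that are not GB2312-encoded.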
5. Create TestFileSearcher.java, which searches the index and prints the matching documents:
TestFileSearcher.java
    package com.lixing.paoding.index;

    import java.io.File;

    import net.paoding.analysis.analyzer.PaodingAnalyzer;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class TestFileSearcher {
        public static void main(String[] args) throws Exception {
            String indexDir = "d:/luceneindex";

            // Use the same analyzer as at index time so query terms are segmented the same way.
            Analyzer analyzer = new PaodingAnalyzer();
            Directory dir = FSDirectory.open(new File(indexDir));
            IndexSearcher searcher = new IndexSearcher(dir, true); // true = open read-only

            // Parse the query text against the "contents" field.
            QueryParser parser = new QueryParser(Version.LUCENE_29, "contents", analyzer);
            Query query = parser.parse("呼救");
            //Term term=new Term("fileName", "大學");
            //TermQuery query=new TermQuery(term);

            // Fetch at most 1000 hits and print the stored fields of each one.
            TopDocs docs = searcher.search(query, 1000);
            ScoreDoc[] hits = docs.scoreDocs;
            System.out.println(hits.length);
            for (int i = 0; i < hits.length; i++) {
                Document doc = searcher.doc(hits[i].doc);
                System.out.print(doc.get("fileName") + "--:\n");
                System.out.println(doc.get("contents") + "\n");
            }

            searcher.close();
            dir.close();
        }
    }
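The searcher above prints the full stored contents of every hit, which gets unwieldy for large files. One possible refinement is to print only a short excerpt around the query text. The following is a plain-Java sketch, not the Lucene highlighter: the class ResultPreview and its method preview are hypothetical names introduced for this illustration. It looks for the literal query string, so it falls back to the start of the text when the analyzer matched a token that differs from the raw query.

```java
// Hypothetical helper for illustration: a plain-Java excerpt, not the Lucene highlighter.
class ResultPreview {

    // Returns up to `radius` characters on each side of the first literal occurrence of `term`.
    // If `term` does not occur, returns the first 2 * radius characters instead.
    static String preview(String contents, String term, int radius) {
        int hit = contents.indexOf(term);
        int start;
        int end;
        if (hit < 0) {
            start = 0;
            end = Math.min(contents.length(), 2 * radius);
        } else {
            start = Math.max(0, hit - radius);
            end = Math.min(contents.length(), hit + term.length() + radius);
        }
        // Flatten line breaks so each hit prints on one line.
        String body = contents.substring(start, end).replace('\n', ' ');
        return (start > 0 ? "..." : "") + body + (end < contents.length() ? "..." : "");
    }

    public static void main(String[] args) {
        // Sample text invented for this demo; in the searcher it would be doc.get("contents").
        String contents = "first line of the file\nsecond line mentions rescue here\nthird line";
        System.out.println(preview(contents, "rescue", 10));
    }
}
```

In the searcher loop, a call such as preview(doc.get("contents"), "呼救", 40) could stand in for printing doc.get("contents") directly.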
This article comes from the 「李新博客」 blog. Please keep this source link when reposting: http://kinglixing.blog.51cto.com/3421535/702663