Lucene Quick Start

Date: 2019-12-06

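1. Environment setup

Import the following jars from lucene-4.10.4:

1. analysis (lucene-analyzers-common)

2. core

3. highlighter

4. queryparser

The code below also uses IKAnalyzer, a third-party Chinese analyzer that is not part of the Lucene distribution and has to be added to the classpath separately.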

2. Building the index

2.1 Create the document object

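// Create the document object
	Document doc = new Document();

	// Add sample data (Store is org.apache.lucene.document.Field.Store)
	// id: StringField is indexed as a single term without analysis; Store.NO means the value cannot be read back from search results
	doc.add(new StringField("id", "11", Store.NO));

	// title: TextField is analyzed into terms; Store.YES keeps the original value for retrieval
	doc.add(new TextField("title", "三分鐘學會lucene",Store.YES));

	// content: analyzed and searchable, but not stored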
	doc.add(new TextField("content", "倒排索引（也稱爲倒排文件）是一種存儲了來自文本"
		+ "中的映射的索引數據結構。好比單詞或者數字，對應到它們在數據庫、一個文件或"
		+ "者一組文件中的位置。它是在文檔檢索系統中使用的最流行的數據結構，在搜索引擎"
		+ "中有大規模使用案例例如咱們使用新華字典查詢漢字，新華字典有偏旁部首的目錄（索引），"
		+ "咱們查字首先查這個目錄，找到這個目錄中對應的偏旁部首，就能夠經過這個目錄中的"
		+ "偏旁部首找到這個字所在的位置（文檔）。",Store.NO));

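In real projects the data comes from files, databases, or web pages.

Scenario 1: searching Word documents

Read the file, convert its contents into a Document object, analyze the text into terms, and write them to the index. A search then runs against the index and leads back to the matching file.

Scenario 2: querying database records

Convert each record into a Document object, analyze it into terms, and add it to the index.

Scenario 3: crawling web pages

Parse each page, convert its data into a Document object, and add it to the index.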
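
The three scenarios share one pipeline: turn the data into documents, split the text into terms, store the terms in an index, and search the index. A minimal sketch of that pipeline in plain Java might look like the following. ToyInvertedIndex and its simple tokenizer are hypothetical stand-ins for illustration only; they are not Lucene classes and do not reflect how Lucene stores its index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted set of ids of the documents containing it.
class ToyInvertedIndex {
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    private final List<String> documents = new ArrayList<>();

    // Stand-in for an analyzer: lowercases and splits on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Adds a document, indexes its terms, and returns the id assigned to it.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Returns the ids of the documents containing the term (empty if there are none).
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    // Returns the original text of a document.
    String get(int docId) {
        return documents.get(docId);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("Learn Lucene in three minutes");
        index.add("An inverted index maps terms to documents");
        index.add("Lucene builds an inverted index");
        System.out.println(index.search("lucene")); // [0, 2]
        System.out.println(index.search("index"));  // [1, 2]
    }
}
```

The lookup goes from term to document ids, which is the direction the dictionary analogy in the sample content describes: find the entry in the index first, then follow it to the document.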

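2.2 Create the index writer

// Directory in which Lucene stores the index
	FSDirectory directory = FSDirectory.open(new File("E:\\Java\\TEMP"));
	// Create the analyzer (IKAnalyzer is a third-party Chinese analyzer)
	Analyzer analyzer = new IKAnalyzer();
	// Create the writer configuration from the Lucene version and the analyzer
	IndexWriterConfig writerConfig = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
	// Create the index writer for that directory with that configuration
	IndexWriter indexWriter = new IndexWriter(directory, writerConfig);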

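IndexWriterConfig is the core configuration object for the index writer.

Parameter 1: the Lucene version being used

Parameter 2: the analyzer used when building the index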

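2.3 Write to the index, commit, and release resources

// Add the document to the index
	indexWriter.addDocument(doc);
	// Commit the changes
	indexWriter.commit();
	// Close the writer to release its resources
	indexWriter.close();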

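3. Searching the index

3.1 Create the index searcher

// Location of the index on disk
	File file = new File("E:\\Java\\TEMP");
	// Open a reader on the index
	DirectoryReader directoryReader = DirectoryReader.open(FSDirectory.open(file));
	// Create the searcher, the core object for querying the index
	IndexSearcher searcher = new IndexSearcher(directoryReader);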

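3.2 Create the query parser and parse the search keyword

// The search keyword
	String key = "三分鐘";
	// Create the query parser for the title field
	QueryParser queryParser = new QueryParser("title",new IKAnalyzer());
	// Analyze the keyword and build the query
	Query parse = queryParser.parse(key);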

QueryParser

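Parameter 1: the default field to search

Parameter 2: the analyzer, which should be the same one used when the index was built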
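
The reason for reusing the indexing analyzer is that a query term only matches when it is identical to a term stored in the index. The sketch below illustrates this with two hypothetical toy analyzers, bigrams and keyword, written in plain Java. Neither is a Lucene or IKAnalyzer class, and the matching rule is a deliberate simplification.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class AnalyzerMismatchSketch {
    // Hypothetical analyzer A: emits overlapping two-character terms.
    static List<String> bigrams(String text) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            terms.add(text.substring(i, i + 2));
        }
        return terms;
    }

    // Hypothetical analyzer B: emits the whole input as a single term.
    static List<String> keyword(String text) {
        List<String> terms = new ArrayList<>();
        terms.add(text);
        return terms;
    }

    // True if the query produces at least one term and every query term is among the indexed terms.
    static boolean matches(String indexedText, Function<String, List<String>> indexAnalyzer,
                           String query, Function<String, List<String>> queryAnalyzer) {
        Set<String> indexed = new HashSet<>(indexAnalyzer.apply(indexedText));
        List<String> queryTerms = queryAnalyzer.apply(query);
        return !queryTerms.isEmpty() && indexed.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Same analyzer on both sides: "luc" becomes "lu" and "uc", both of which are indexed.
        System.out.println(matches("lucene", AnalyzerMismatchSketch::bigrams,
                "luc", AnalyzerMismatchSketch::bigrams)); // true
        // Different analyzers: the single term "luc" was never written to the index.
        System.out.println(matches("lucene", AnalyzerMismatchSketch::bigrams,
                "luc", AnalyzerMismatchSketch::keyword)); // false
    }
}
```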

3.3 Run the query with IndexSearcher

// Run the query and keep the top 10 hits
	TopDocs topDocs = searcher.search(parse, 10);
	// Total number of matching documents
	int totalHits = topDocs.totalHits;
	System.out.println("Total hits: "+totalHits);
	// Array of document ids and scores
	ScoreDoc[] scoreDocs = topDocs.scoreDocs;

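The search returns a summary of the matching documents.

TopDocs: the total number of matching documents, plus the id and score of each returned document

searcher.search(parse, 10) returns the 10 highest-scoring hits

The better a document matches the query, the higher its score

totalHits gives the total number of matching documents

scoreDocs gives the array of document ids and scores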
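
To make the difference between the total hit count and the returned top hits concrete, here is a rough sketch of selecting the N best-scoring hits with a bounded min-heap. TopHitsSketch and Hit are hypothetical names invented for this example, and the scores are made up; this is one common way to pick a top N, not a description of Lucene's internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopHitsSketch {
    // Hypothetical stand-in for a (document id, score) pair.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Keeps the n highest-scoring hits in a min-heap bounded to n entries,
    // then returns them ordered from highest to lowest score.
    static List<Hit> topN(List<Hit> all, int n) {
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (Hit h : all) {
            heap.offer(h);
            if (heap.size() > n) {
                heap.poll(); // drop the current lowest-scoring hit
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }

    public static void main(String[] args) {
        List<Hit> all = Arrays.asList(
                new Hit(0, 0.2f), new Hit(1, 0.9f), new Hit(2, 0.5f), new Hit(3, 0.7f));
        List<Hit> top = topN(all, 2);
        System.out.println("total hits: " + all.size());  // total hits: 4
        System.out.println("returned hits: " + top.size()); // returned hits: 2
        for (Hit h : top) {
            System.out.println(h.doc + " " + h.score); // 1 0.9, then 3 0.7
        }
    }
}
```

In the same way, totalHits can be larger than scoreDocs.length when more than 10 documents match.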

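3.4 Loop over the array, read each document id, and load the document

// Iterate over the hits
	for (ScoreDoc scoreDoc : scoreDocs) {
		// Document id assigned by Lucene
		int docId = scoreDoc.doc;
		System.out.println("Document ID: "+docId);
		// Relevance score of this hit
		float score = scoreDoc.score;
		System.out.println("Score: "+score);
		// Load the document's stored fields by id
		Document doc = searcher.doc(docId);
		// id field: added with Store.NO, so this returns null
		String id = doc.get("id");
		System.out.println("Field id: "+id);
		// title field: added with Store.YES, so the original value is returned
		String title = doc.get("title");
		System.out.println("Field title: "+title);
		// content field: added with Store.NO, so this returns null
		String content = doc.get("content");
		System.out.println("Field content: "+content);
	}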
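
Because the id and content fields were added with Store.NO in section 2.1, doc.get returns null for them even though both fields can still be searched. The toy model below illustrates that distinction between being searchable and being retrievable. StoredFieldSketch is a hypothetical class written for this example and greatly simplifies what Lucene does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class StoredFieldSketch {
    // "field:value" pairs that a search can match.
    private final Set<String> searchable = new HashSet<>();
    // Original values that can be read back after a search.
    private final Map<String, String> stored = new HashMap<>();

    // Every field is made searchable; only fields added with store == true can be read back.
    void add(String name, String value, boolean store) {
        searchable.add(name + ":" + value);
        if (store) {
            stored.put(name, value);
        }
    }

    // True if the exact value was indexed for the field.
    boolean matches(String name, String value) {
        return searchable.contains(name + ":" + value);
    }

    // Returns the stored value, or null when the field was not stored.
    String get(String name) {
        return stored.get(name);
    }

    public static void main(String[] args) {
        StoredFieldSketch doc = new StoredFieldSketch();
        doc.add("id", "11", false);
        doc.add("title", "Learn Lucene", true);
        System.out.println(doc.matches("id", "11")); // true
        System.out.println(doc.get("id"));           // null
        System.out.println(doc.get("title"));        // Learn Lucene
    }
}
```

If the id or content values need to be shown in search results, the corresponding fields would have to be added with Store.YES when the index is built.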
