Lucene Series (2): Using Luke and Basic Operations on Indexed Documents

Date: 2019-11-09

Tags: lucene, series, luke, usage, index, document, basics

Articles in this series:

Lucene Series (1): Quick Start

Lucene Series (2): Using Luke and Basic Operations on Indexed Documents

Lucene Series (3): Querying and Highlighting

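Getting started with Luke

Introduction:

GitHub: github.com/DmitryKey/l…

Download: github.com/DmitryKey/l…

Luke is a GUI tool for development and diagnostics that works with indexes from the Lucene/Solr/Elasticsearch search engines.

It offers the following features: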

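View documents and analyze their field contents (for stored fields)
Search the index
Perform index maintenance: index health checks and index optimization (take a backup before running it)
Read an index from HDFS
Export the index, or part of it, to XML
Test custom Lucene analyzers
Create your own plugins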

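Search engines Luke works with

Apache Lucene. Luke can almost always open an index generated by pure Lucene. Do people still build pure Lucene indexes these days?
Apache Solr. Solr and Lucene share the same code base, so Luke can naturally open Lucene indexes generated by Solr.
Elasticsearch. Elasticsearch uses Lucene as its lowest-level search engine base, so Luke can open its indexes too.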

Download, installation, and basic usage

Download and install the tool

CRUD operations on indexed documents

Create a project and add the Maven dependencies

<dependency>
	<groupId>junit</groupId>
	<artifactId>junit</artifactId>
	<version>4.12</version>
	<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
<!-- Lucene core library -->
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-core</artifactId>
	<version>7.2.1</version>
</dependency>
<!-- Lucene query parser library -->
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-queryparser</artifactId>
	<version>7.2.1</version>
</dependency>
<!-- Additional Lucene analyzers -->
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-analyzers-common</artifactId>
	<version>7.2.1</version>
</dependency>

The examples below are written as unit tests, so the JUnit dependency is included as well (version 4.12, the latest JUnit 4 release as of 2018-03-30).

Test code

Main class:

package lucene_index_crud;

import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

public class Txt1 {
	// Test data used by the methods below
	private String[] ids = { "1", "2", "3" };
	private String[] citys = { "qingdao", "nanjing", "shanghai" };
	private String[] descs = { "Qingdao is a beautiful city.", "Nanjing is a city of culture.",
			"Shanghai is a bustling city." };
	// Directory that holds the index
	private Directory dir;

	// NOTE: the tests below call getWriter(), but the original listing does not show it.
	// This is an assumed minimal version, inferred from the imports above.
	private IndexWriter getWriter() throws Exception {
		Analyzer analyzer = new StandardAnalyzer();
		IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
		return new IndexWriter(dir, iwc);
	}
}

The test methods:

1) Test creating the index

	/**
	 * Create the index.
	 *
	 * @throws Exception if the index cannot be opened or written
	 */
	@Test
	public void testWriteIndex() throws Exception {
		// Path of the directory the index is written to
		dir = FSDirectory.open(Paths.get("D:\\lucene\\index_crud\\indexdata"));
		// try-with-resources closes the writer (and releases its lock) even on failure
		try (IndexWriter writer = getWriter()) {
			for (int i = 0; i < ids.length; i++) {
				// Create a Document, the unit of indexing and search
				Document doc = new Document();
				doc.add(new StringField("id", ids[i], Field.Store.YES));
				doc.add(new StringField("city", citys[i], Field.Store.YES));
				doc.add(new TextField("desc", descs[i], Field.Store.NO));
				// Add the document to the index
				writer.addDocument(doc);
			}
		}
	}
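
The listing above indexes `id` and `city` as `StringField` and `desc` as `TextField`. A `StringField` value is indexed as a single, un-tokenized term, while a `TextField` value is passed through the analyzer and split into terms. `Field.Store.NO` on `desc` means the original text is searchable but cannot be read back from the document. The sketch below is a toy model in plain Java, not Lucene code: the class name `FieldIndexingSketch` and both method names are hypothetical, and the tokenizer (lowercase, split on non-alphanumerics) is only a rough stand-in for what `StandardAnalyzer` does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FieldIndexingSketch {

    // Toy model of a StringField: the whole value becomes exactly one term.
    public static List<String> indexAsSingleToken(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    // Toy model of a TextField: lowercase the value, then split on anything
    // that is not a letter or digit. (Assumption: a simplification of a real analyzer.)
    public static List<String> indexAsAnalyzedText(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [qingdao]
        System.out.println(indexAsSingleToken("qingdao"));
        // prints [qingdao, is, a, beautiful, city]
        System.out.println(indexAsAnalyzedText("Qingdao is a beautiful city."));
    }
}
```

This is why an exact-match lookup such as `new Term("id", "1")` suits the `StringField` fields, whereas `desc` is matched term by term.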

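You can now inspect the resulting index with Luke.

Note: the index must be created first; the test methods that follow only work against an existing index.

2) Test how many documents were written

	/**
	 * Report how many documents have been written.
	 *
	 * @throws Exception if the index cannot be opened
	 */
	@Test
	public void testIndexWriter() throws Exception {
		// Path of the index directory
		dir = FSDirectory.open(Paths.get("D:\\lucene\\index_crud\\indexdata"));
		try (IndexWriter writer = getWriter()) {
			System.out.println("Documents written: " + writer.numDocs());
		}
	}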

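3) Test how many documents can be read

	/**
	 * Report how many documents can be read.
	 *
	 * @throws Exception if the index cannot be opened
	 */
	@Test
	public void testIndexReader() throws Exception {
		// Path of the index directory to read from
		dir = FSDirectory.open(Paths.get("D:\\lucene\\index_crud\\indexdata"));
		try (IndexReader reader = DirectoryReader.open(dir)) {
			// maxDoc still counts deleted documents that have not been merged away
			System.out.println("maxDoc: " + reader.maxDoc());
			// numDocs counts live documents only
			System.out.println("numDocs: " + reader.numDocs());
		}
	}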

4) Test deletion, before merging

	/**
	 * Delete a document without merging afterwards.
	 *
	 * @throws Exception if the index cannot be opened or modified
	 */
	@Test
	public void testDeleteBeforeMerge() throws Exception {
		// Path of the index directory
		dir = FSDirectory.open(Paths.get("D:\\lucene\\index_crud\\indexdata"));
		try (IndexWriter writer = getWriter()) {
			System.out.println("Before delete: " + writer.numDocs());
			// Marks matching documents as deleted; they stay in their segment until a merge
			writer.deleteDocuments(new Term("id", "1"));
			writer.commit();
			System.out.println("writer.maxDoc(): " + writer.maxDoc());
			System.out.println("writer.numDocs(): " + writer.numDocs());
		}
	}
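
The test above prints both `maxDoc()` and `numDocs()` because a delete only marks a document as deleted: the document keeps its slot until its segment is merged, so `maxDoc()` still counts it while `numDocs()` does not. The following is a toy model of that bookkeeping in plain Java, not Lucene's implementation; the class `DeleteBeforeMergeSketch` and its methods are hypothetical names chosen for illustration.

```java
import java.util.BitSet;

public class DeleteBeforeMergeSketch {
    private final String[] ids; // one slot per document ever added to this toy "segment"
    private final BitSet live;  // a set bit means the document in that slot is still live

    public DeleteBeforeMergeSketch(String... ids) {
        this.ids = ids.clone();
        this.live = new BitSet(ids.length);
        this.live.set(0, ids.length);
    }

    // Marks matching documents as deleted without reclaiming their slots.
    public void deleteById(String id) {
        for (int i = 0; i < ids.length; i++) {
            if (ids[i].equals(id)) {
                live.clear(i);
            }
        }
    }

    // Counts every slot, including those marked as deleted.
    public int maxDoc() {
        return ids.length;
    }

    // Counts live documents only.
    public int numDocs() {
        return live.cardinality();
    }

    public static void main(String[] args) {
        DeleteBeforeMergeSketch segment = new DeleteBeforeMergeSketch("1", "2", "3");
        segment.deleteById("1");
        System.out.println("maxDoc=" + segment.maxDoc());   // prints maxDoc=3
        System.out.println("numDocs=" + segment.numDocs()); // prints numDocs=2
    }
}
```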

5) Test deletion, after merging

Before running this test, delete the files under the indexdata directory and run testWriteIndex() again.

	/**
	 * Delete a document and then merge the deletion away.
	 *
	 * @throws Exception if the index cannot be opened or modified
	 */
	@Test
	public void testDeleteAfterMerge() throws Exception {
		// Path of the index directory
		dir = FSDirectory.open(Paths.get("D:\\lucene\\index_crud\\indexdata"));
		try (IndexWriter writer = getWriter()) {
			System.out.println("Before delete: " + writer.numDocs());
			writer.deleteDocuments(new Term("id", "1"));
			// Force a merge of segments that contain deleted documents
			writer.forceMergeDeletes();
			writer.commit();
			System.out.println("writer.maxDoc(): " + writer.maxDoc());
			System.out.println("writer.numDocs(): " + writer.numDocs());
		}
	}
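
The only difference from the previous test is the `forceMergeDeletes()` call, which asks the writer to merge segments that contain deleted documents so that the deleted slots are dropped. As a rough illustration, the sketch below models a merge as copying only the live documents into a new, compacted segment. It is plain Java under simplifying assumptions (one segment, documents reduced to their ids), and `MergeDeletesSketch` and `mergeAwayDeletes` are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class MergeDeletesSketch {

    // Copies only the live documents into a new, compacted toy "segment".
    public static List<String> mergeAwayDeletes(List<String> segment, BitSet live) {
        List<String> merged = new ArrayList<>();
        for (int i = 0; i < segment.size(); i++) {
            if (live.get(i)) {
                merged.add(segment.get(i));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> segment = Arrays.asList("1", "2", "3");
        BitSet live = new BitSet(segment.size());
        live.set(0, segment.size());
        live.clear(0); // mark the document with id "1" as deleted

        // prints before merge: maxDoc=3, numDocs=2
        System.out.println("before merge: maxDoc=" + segment.size() + ", numDocs=" + live.cardinality());

        List<String> merged = mergeAwayDeletes(segment, live);
        // prints after merge: maxDoc=2, numDocs=2
        System.out.println("after merge: maxDoc=" + merged.size() + ", numDocs=" + merged.size());
    }
}
```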

6) Test the update operation

Before running this test, delete the files under the indexdata directory and run testWriteIndex() again.

	/**
	 * Update a document.
	 *
	 * @throws Exception if the index cannot be opened or modified
	 */
	@Test
	public void testUpdate() throws Exception {
		// Path of the index directory
		dir = FSDirectory.open(Paths.get("D:\\lucene\\index_crud\\indexdata"));
		try (IndexWriter writer = getWriter()) {
			Document doc = new Document();
			doc.add(new StringField("id", "1", Field.Store.YES));
			doc.add(new StringField("city", "beijing", Field.Store.YES));
			doc.add(new TextField("desc", "beijing is a city.", Field.Store.NO));
			// Deletes the documents matching the term, then adds the new document
			writer.updateDocument(new Term("id", "1"), doc);
		}
	}
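
`updateDocument` does not modify a document in place: it first deletes the documents containing the given term and then adds the new document. The sketch below is a toy model of that delete-then-add behaviour in plain Java, not Lucene code. The class `UpdateSketch` and its methods are hypothetical, and documents are reduced to an id and a city.

```java
import java.util.ArrayList;
import java.util.List;

public class UpdateSketch {
    // One entry per document slot: {id, city}. Deleted slots stay until a merge.
    private final List<String[]> slots = new ArrayList<>();
    private final List<Boolean> live = new ArrayList<>();

    public void add(String id, String city) {
        slots.add(new String[] { id, city });
        live.add(Boolean.TRUE);
    }

    // Update = mark every live document with this id as deleted, then append the new one.
    public void update(String id, String city) {
        for (int i = 0; i < slots.size(); i++) {
            if (live.get(i) && slots.get(i)[0].equals(id)) {
                live.set(i, Boolean.FALSE);
            }
        }
        add(id, city);
    }

    // Counts every slot, including those marked as deleted.
    public int maxDoc() {
        return slots.size();
    }

    // Counts live documents only.
    public int numDocs() {
        int count = 0;
        for (boolean isLive : live) {
            if (isLive) {
                count++;
            }
        }
        return count;
    }

    // Returns the city of the live document with this id, or null if there is none.
    public String cityOf(String id) {
        for (int i = 0; i < slots.size(); i++) {
            if (live.get(i) && slots.get(i)[0].equals(id)) {
                return slots.get(i)[1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        UpdateSketch index = new UpdateSketch();
        index.add("1", "qingdao");
        index.add("2", "nanjing");
        index.add("3", "shanghai");
        index.update("1", "beijing");
        System.out.println("maxDoc=" + index.maxDoc());   // prints maxDoc=4
        System.out.println("numDocs=" + index.numDocs()); // prints numDocs=3
        System.out.println("city=" + index.cityOf("1"));  // prints city=beijing
    }
}
```

In this model the old version of the document still occupies a slot after the update, which is why the toy `maxDoc()` grows while `numDocs()` stays the same.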

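I plan to pause the Lucene series here for now; these three articles alone are far from enough to master Lucene. All three articles use the latest jars. Lucene evolves quickly, and versions after the 5.x series differ considerably from earlier ones in places; for example, the setBoost method for setting a field's weight was removed after 6.6. Because time was limited, I only skimmed the official Lucene documentation and learned most of the material from a video on the java1234 website, then updated the version and parts of the code. As of 2018-04-01, the jars used in the code above were the latest releases.

Finally, some Lucene learning sites and blogs that I find worthwhile:

Official website: Welcome to Apache Lucene

GitHub: Apache Lucene and Solr

Lucene column

Search Systems 18: Lucene index file structure

Introduction to and usage of Lucene 6.6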