Lucene: Creating an Index

Date: 2019-11-10

Tags: lucene, creating an index

Creating an Index

To index documents, Lucene provides five basic classes: Document, Field, IndexWriter, Analyzer, and Directory. The purpose of each is described below.

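Document

A Document describes a document, which here can be an HTML page, an e-mail, or a text file. A Document object is made up of multiple Field objects. You can think of a Document object as a record in a database, with each Field object as one field of that record.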

Field

A Field object describes one attribute of a document. For example, the subject and the body of an e-mail can be described by two separate Field objects.
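The record-and-field analogy can be sketched in plain Java. The classes below (ToyField, ToyDocument, ToyDocumentDemo) are hypothetical stand-ins invented for illustration and are not Lucene's own Document and Field classes; they only show the shape of the data model, assuming every field is a simple name/value pair of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins for illustration only; these are NOT Lucene's Document and Field classes.
final class ToyField {
    final String name;
    final String value;

    ToyField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

final class ToyDocument {
    private final List<ToyField> fields = new ArrayList<>();

    // Attach one named attribute to the document; returns this so calls can be chained.
    ToyDocument add(ToyField field) {
        fields.add(field);
        return this;
    }

    // Return the value of the first field with the given name, or null if there is none.
    String get(String name) {
        for (ToyField f : fields) {
            if (f.name.equals(name)) {
                return f.value;
            }
        }
        return null;
    }

    int fieldCount() {
        return fields.size();
    }
}

class ToyDocumentDemo {
    public static void main(String[] args) {
        // An e-mail modeled as one record with two fields: subject and body.
        ToyDocument mail = new ToyDocument()
                .add(new ToyField("subject", "Meeting notes"))
                .add(new ToyField("body", "See the attached agenda."));
        System.out.println(mail.get("subject"));
    }
}
```

The e-mail in the demo mirrors the example in the text: one document, with the subject and the body held in two separate fields.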

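Analyzer

Before a document is indexed, its content must first be tokenized, and this is the job of the Analyzer. Analyzer is an abstract class with several implementations; choose the one that suits the language and the application. The Analyzer hands the tokenized content to the IndexWriter, which builds the index.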
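To make the tokenization step concrete, here is a minimal sketch of what an analyzer does at its simplest. ToyAnalyzer and its tokenize method are hypothetical names for this illustration; the sketch assumes that lower-casing and splitting on anything that is not a letter or digit is enough, which is far less than Lucene's real analyzers do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy tokenizer for illustration; Lucene's real analyzers do considerably more.
final class ToyAnalyzer {
    // Lower-case the text, then split it on runs of characters that are neither letters nor digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) { // split() yields an empty first piece when the text starts with a separator
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Lucene indexes plain-text files."));
    }
}
```

A rule this simple also hints at why the choice of analyzer depends on the language: a run of Chinese characters written without spaces would come out of this sketch as a single token.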

IndexWriter

IndexWriter is one of Lucene's core classes for creating an index. Its role is to add Document objects to the index one by one.
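As a rough picture of what "adding a document to the index" means, the sketch below builds a tiny in-memory inverted index: a map from each term to the ids of the documents containing it. ToyIndexWriter, addDocument and lookup are hypothetical names chosen for this illustration; this is not Lucene's IndexWriter, and Lucene's actual index structure is much more elaborate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index for illustration; not Lucene's IndexWriter or its index format.
final class ToyIndexWriter {
    // term -> ids of the documents that contain it, kept in ascending order
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Tokenize the text, record each term against a fresh document id, and return that id.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Ids of all documents containing the term, in ascending order (empty if none).
    List<Integer> lookup(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return Collections.emptyList();
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        ToyIndexWriter writer = new ToyIndexWriter();
        writer.addDocument("Lucene builds an index");
        writer.addDocument("An index speeds up search");
        System.out.println(writer.lookup("index"));
    }
}
```

Because each term points straight at its documents, a lookup does not need to rescan the original text, which is the basic reason an index is built before searching.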

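Directory

This class represents the location where a Lucene index is stored. It is an abstract class that currently has two implementations: FSDirectory, which represents an index stored in the file system, and RAMDirectory, which represents an index stored in memory.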
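The idea of one abstract storage location with a file-system and an in-memory implementation can be sketched as follows. ToyDirectory, ToyRamDirectory and ToyFsDirectory are hypothetical classes written for this illustration, and the writeFile/readFile methods are assumptions of the sketch, not Lucene's Directory API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Toy storage abstraction for illustration; not Lucene's Directory API.
abstract class ToyDirectory {
    abstract void writeFile(String name, byte[] data);
    abstract byte[] readFile(String name);
}

// Keeps every file in a map in memory, in the spirit of RAMDirectory.
final class ToyRamDirectory extends ToyDirectory {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    void writeFile(String name, byte[] data) {
        files.put(name, data.clone());
    }

    @Override
    byte[] readFile(String name) {
        byte[] data = files.get(name);
        if (data == null) {
            throw new NoSuchElementException("no such file: " + name);
        }
        return data.clone();
    }
}

// Stores each file under a folder on disk, in the spirit of FSDirectory.
final class ToyFsDirectory extends ToyDirectory {
    private final Path root;

    ToyFsDirectory(Path root) {
        this.root = root;
    }

    @Override
    void writeFile(String name, byte[] data) {
        try {
            Files.write(root.resolve(name), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    byte[] readFile(String name) {
        try {
            return Files.readAllBytes(root.resolve(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class ToyDirectoryDemo {
    public static void main(String[] args) throws IOException {
        byte[] payload = "segment data".getBytes(StandardCharsets.UTF_8);
        // The calling code is identical for both storage locations.
        ToyDirectory[] dirs = {
            new ToyRamDirectory(),
            new ToyFsDirectory(Files.createTempDirectory("toy-index"))
        };
        for (ToyDirectory dir : dirs) {
            dir.writeFile("segment_1", payload);
            System.out.println(new String(dir.readFile("segment_1"), StandardCharsets.UTF_8));
        }
    }
}
```

The point of the abstraction is visible in the demo loop: code that writes and reads index files does not need to know whether they live on disk or in memory.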

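The following program uses these classes to index every .txt file in a directory:

import java.io.File;
import java.io.FileReader;
import java.io.Reader;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// Written against the early Lucene 1.x API (Field.Text, IndexWriter.optimize);
// later Lucene releases changed or removed these calls.
public class TxtFileIndexer {
    public static void main(String[] args) throws Exception {
        // indexDir is the directory that hosts Lucene's index files
        File indexDir = new File("D:\\luceneIndex");
        // dataDir is the directory that hosts the text files to be indexed
        File dataDir = new File("D:\\luceneData");
        Analyzer luceneAnalyzer = new StandardAnalyzer();
        File[] dataFiles = dataDir.listFiles();
        // listFiles() returns null when dataDir is not a readable directory
        if (dataFiles == null) {
            throw new IllegalStateException("Not a readable directory: " + dataDir);
        }
        // true = create a new index, replacing any existing one in indexDir
        IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);
        long startTime = new Date().getTime();
        for (int i = 0; i < dataFiles.length; i++) {
            if (dataFiles[i].isFile() && dataFiles[i].getName().endsWith(".txt")) {
                System.out.println("Indexing file " + dataFiles[i].getCanonicalPath());
                // One Document per file, with two fields: its path and its contents
                Document document = new Document();
                Reader txtReader = new FileReader(dataFiles[i]);
                document.add(Field.Text("path", dataFiles[i].getCanonicalPath()));
                document.add(Field.Text("contents", txtReader));
                indexWriter.addDocument(document);
            }
        }
        // Merge the index segments, then close the writer
        indexWriter.optimize();
        indexWriter.close();
        long endTime = new Date().getTime();

        System.out.println("It takes " + (endTime - startTime)
                + " milliseconds to create index for the files in directory "
                + dataDir.getPath());
    }
}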
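The file-selection step of the program above (keep only regular files whose names end in .txt) can also be written with java.nio. This is a minimal sketch that uses only the JDK and no Lucene classes; TextFileLister and listTextFiles are hypothetical names, and the temporary directory in main is a stand-in for the data directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TextFileLister {
    // Regular files directly inside dir whose names match *.txt, sorted by path.
    static List<Path> listTextFiles(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) { // skip sub-directories that happen to be named *.txt
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the data directory used in the program above.
        Path dir = Files.createTempDirectory("luceneData");
        Files.write(dir.resolve("a.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("notes.md"), "skip me".getBytes(StandardCharsets.UTF_8));
        for (Path p : listTextFiles(dir)) {
            System.out.println("Would index " + p.getFileName());
        }
    }
}
```

One difference from File.listFiles() is the failure mode: if the path is missing or is not a directory, Files.newDirectoryStream throws an IOException rather than returning null.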
