Lucene Basics (1)

Date: 2019-11-11

Tags: lucene, basics

Using Lucene takes two steps: first build an index, then search the indexed documents.

1. Creating the index

Start with the five essential classes:

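(1) Document: made up of multiple Fields. A Document is comparable to one record in a database, and each Field object to a column of that record.

(2) Field: describes one attribute of a document. Its storage and indexing options are: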

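Option	Description
Field.Store.YES	Stores the field value. Suited to fields shown in search results, for example a file path or a URL.
Field.Store.NO	Does not store the field value, for example the body of an email message.
Field.Index.NO	For fields that are never searched and are only stored, such as a file path.
Field.Index.ANALYZED	For fields that are both indexed and analyzed, for example the body and subject of an email message.
Field.Index.NOT_ANALYZED	For fields that are indexed but not analyzed. The original value is kept as a single unit, for example dates and personal names.

Note: Field.Store.YES and Field.Store.NO are still used in Lucene 6.x. The Field.Index constants belong to the older 3.x API and are not available in 6.x, where the field class determines the behavior instead: StringField is indexed without analysis, TextField is indexed with analysis, and StoredField is stored only.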

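(3) Analyzer: before a document is indexed, its content has to be tokenized. The resulting tokens are handed to IndexWriter, which builds the index.

(4) IndexWriter: adds Documents to the index.

(5) Directory: represents the location where the index is stored.

The following demo is based on Lucene 6.2: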
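To make the roles of these classes more concrete, here is a toy model in plain Java with no Lucene dependency. The class ToyIndex and its methods analyze, addDocument and lookup are hypothetical names invented for this sketch. The real Analyzer and IndexWriter do far more (Unicode-aware tokenization, on-disk segments), so treat this only as an illustration of the idea: text is split into lowercase tokens, and each token is mapped to the ids of the documents that contain it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: NOT Lucene's implementation. All names here are hypothetical.
public class ToyIndex {
    // term -> ids of the documents containing it (an "inverted index")
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Plays the role of an Analyzer: lowercase, then split on anything
    // that is not a letter or a digit.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Plays the role of IndexWriter.addDocument: tokenize and record postings.
    public int addDocument(String content) {
        int docId = nextDocId++;
        for (String token : analyze(content)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Returns the ids of the documents that contain exactly this term.
    public SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("SELECT * FROM users");      // doc 0
        index.addDocument("Select name from orders");  // doc 1
        System.out.println(analyze("SELECT * FROM users")); // [select, from, users]
        System.out.println(index.lookup("select"));         // [0, 1]
        System.out.println(index.lookup("users"));          // [0]
    }
}
```

In Lucene itself the postings live in a Directory rather than in an in-memory map, which is why the Directory class appears in the list.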

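// Imports needed by this test method:
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

    /**
     * Build an index over the .txt files in a directory.
     * @throws IOException
     */
    @Test
    public void createFileIndexTest() throws IOException {
        File fileDir = new File("E:\\doc\\mysql_");
        // Lucene 6.0: FSDirectory.open takes a java.nio.file.Path
        Directory indexDir = FSDirectory.open(Paths.get("E:\\luceneIndex"));
        Analyzer luceneAnalyzer = new StandardAnalyzer();
        // The writer uses this analyzer to tokenize TextField content
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(luceneAnalyzer);
        File[] dataFiles = fileDir.listFiles();
        if (dataFiles == null) { // listFiles() returns null if fileDir cannot be listed
            throw new IOException("Cannot list directory " + fileDir.getPath());
        }
        long start = System.currentTimeMillis();
        // try-with-resources closes the writer even if indexing fails
        try (IndexWriter indexWriter = new IndexWriter(indexDir, indexWriterConfig)) {
            for (File dataFile : dataFiles) {
                if (dataFile.isFile() && dataFile.getName().endsWith(".txt")) {
                    System.out.println("Indexing file " + dataFile.getCanonicalPath());
                    Document document = new Document();
                    Reader reader = new FileReader(dataFile);
                    // path: stored, and indexed as a single untokenized term
                    document.add(new StringField("path", dataFile.getCanonicalPath(), Field.Store.YES));
                    // content: tokenized and indexed, but not stored
                    document.add(new TextField("content", reader));
                    indexWriter.addDocument(document);
                }
            }
        }
        System.out.println("It takes " + (System.currentTimeMillis() - start)
                + " milliseconds to create index for the files in directory "
                + fileDir.getPath());
    }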
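In the demo above, path is added with Field.Store.YES, while content is built from a Reader, and a TextField built from a Reader is indexed but never stored. A search hit can therefore return the path but not the file text. The sketch below models that distinction in plain Java. ToyStore and its methods are hypothetical names invented for this illustration, and the real StringField also indexes the path as a single term, which is left out here for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: NOT Lucene's implementation. All names here are hypothetical.
public class ToyStore {
    // Stored values, kept per document so they can be returned with a hit.
    private final List<Map<String, String>> storedFields = new ArrayList<>();
    // Indexed terms ("field:term" -> document ids), used only for matching.
    private final Map<String, Set<Integer>> indexedTerms = new HashMap<>();

    // Mirrors the demo: "path" is stored, "content" is indexed but not stored.
    public int add(String path, String content) {
        int docId = storedFields.size();
        Map<String, String> stored = new HashMap<>();
        stored.put("path", path);
        storedFields.add(stored);
        for (String token : content.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                indexedTerms.computeIfAbsent("content:" + token, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Matching consults only the indexed terms.
    public Set<Integer> search(String field, String term) {
        return indexedTerms.getOrDefault(field + ":" + term, new TreeSet<>());
    }

    // Only stored fields can be read back; asking for "content" yields null.
    public String getStored(int docId, String field) {
        return storedFields.get(docId).get(field);
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.add("E:\\doc\\a.txt", "select * from t");
        for (int docId : store.search("content", "select")) {
            System.out.println(store.getStored(docId, "path"));    // the stored path
            System.out.println(store.getStored(docId, "content")); // null: not stored
        }
    }
}
```

Not storing large text keeps the index smaller. The application can reopen the file through the stored path when it needs the full content.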

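2. Searching documents

Again, start with five basic classes:

(1) Query: has many implementations. Its job is to wrap user input in a query object that Lucene understands.

(2) Term: the basic unit of search, for example new Term("field", "queryStr"). The first argument names the Field to search and the second is the keyword to look for.

(3) TermQuery: a concrete implementation of Query that matches documents containing a given Term.

(4) IndexSearcher: searches the indexed documents. It accesses the index in read-only mode.

(5) Hits: holds the search results. Hits belongs to old Lucene releases and was removed in 3.0; in 6.x, a search returns TopDocs instead.

The following demo is based on Lucene 6.2: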

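// Imports needed by this test method:
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

    /**
     * Search the indexed file content.
     * @throws IOException
     */
    @Test
    public void searchFileIndexTest() throws IOException {
        String queryStr = "select";
        try (Directory indexDir = FSDirectory.open(Paths.get("E:\\luceneIndex"));
             DirectoryReader reader = DirectoryReader.open(indexDir)) { // Lucene 6.0
            IndexSearcher searcher = new IndexSearcher(reader);
            // TermQuery is not analyzed, so lowercase the term to match the indexed tokens
            Term term = new Term("content", queryStr.toLowerCase());
            TermQuery luceneQuery = new TermQuery(term);
            TopDocs docs = searcher.search(luceneQuery, 10); // top 10 hits
            for (ScoreDoc scoreDoc : docs.scoreDocs) {
                System.out.println(searcher.doc(scoreDoc.doc)); // prints the stored fields
            }
        }
    }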
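The demo lowercases queryStr before building the Term because TermQuery does not pass the query text through an Analyzer. It looks for the term exactly as given, whereas StandardAnalyzer lowercased the tokens at index time. The plain-Java sketch below shows that mismatch, together with a "top N hits" lookup in the spirit of search(query, 10). ExactTermDemo and its methods are hypothetical names, and ranking by raw occurrence count is a simplification that does not reflect Lucene's actual scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: NOT Lucene's implementation. All names here are hypothetical.
public class ExactTermDemo {
    // term -> (document id -> number of occurrences)
    private final Map<String, Map<Integer, Integer>> index = new HashMap<>();
    private int nextDocId = 0;

    // Index time: tokens are lowercased, as an analyzer would do.
    public int addDocument(String content) {
        int docId = nextDocId++;
        for (String token : content.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
            }
        }
        return docId;
    }

    // Query time: the term is matched exactly, with no analysis applied.
    // Returns at most n document ids, most occurrences first.
    public List<Integer> search(String term, int n) {
        Map<Integer, Integer> hits = index.getOrDefault(term, new HashMap<>());
        List<Integer> docIds = new ArrayList<>(hits.keySet());
        docIds.sort((a, b) -> {
            int byCount = Integer.compare(hits.get(b), hits.get(a));
            return byCount != 0 ? byCount : Integer.compare(a, b);
        });
        return docIds.subList(0, Math.min(n, docIds.size()));
    }

    public static void main(String[] args) {
        ExactTermDemo demo = new ExactTermDemo();
        demo.addDocument("SELECT id FROM t");             // doc 0: one occurrence
        demo.addDocument("select a; select b; SELECT c"); // doc 1: three occurrences
        System.out.println(demo.search("SELECT", 10));    // []: the index holds "select"
        System.out.println(demo.search("SELECT".toLowerCase(Locale.ROOT), 10)); // [1, 0]
    }
}
```

When the query comes from free-form user input, running it through the same analyzer that was used at index time is the more general way to keep the two sides consistent.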

References:

https://www.ibm.com/developerworks/cn/java/j-lo-lucene1/

http://codepub.cn/2016/05/20/Lucene-6-0-in-action-2-All-kinds-of-Field-and-sort-operations/

http://www.ibm.com/developerworks/cn/opensource/os-apache-lucenesearch/index.html