Lucene

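1 Lucene is a full-text search toolkit that makes it easy to add full-text search to an application.

2 Full-text search: first tokenize the documents to be searched and build an index from the tokens, then run searches against that index.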

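3 Full-text search workflow

       Indexing flow: collect the data, process the data, build the index

       Search flow: enter the query, the Lucene query runs against the index, the matching results are fetched from the index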
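The two flows above can be sketched without Lucene at all. The class below is a toy inverted index, not Lucene's implementation: the class name `TinyInvertedIndex` and the naive tokenizer (lowercase, split on anything that is not a letter or digit) are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TinyInvertedIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Indexing flow: take a document, tokenize it, record every term in the index.
    public int add(String text) {
        int docId = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Search flow: normalize the query term and look it up in the index.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    // Naive tokenizer (an assumption): lowercase, split on non-letter/non-digit characters.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("Lucene in Action");
        index.add("Solr is built on Lucene");
        index.add("Java concurrency");
        System.out.println(index.search("lucene")); // [0, 1]
        System.out.println(index.search("java"));   // [2]
    }
}
```

The point of the index is that a search never scans the documents themselves; it only looks up the term in the map built at indexing time.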

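IndexWriter is the core component of the indexing process. Through IndexWriter you can create a new index, update an index, and delete from an index. IndexWriter relies on a Directory to store the index.

Directory describes where the index is stored; it wraps the underlying I/O and is responsible for persisting the index. It is an abstract class whose commonly used subclasses include FSDirectory (stores the index in the file system) and RAMDirectory (stores the index in memory).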

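import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

// Written against Lucene 4.10.3. Book, BookDao and BookDaoImpl are the project's own classes.
public class IndexManager {

    @Test
    public void createIndex() throws Exception {
        // Collect the data
        BookDao bookDao = new BookDaoImpl();
        List<Book> books = bookDao.queryBooks();
        List<Document> documents = new ArrayList<>();

        // Process the data: one Document per book, one Field per property
        for (Book book : books) {
            Document document = new Document();
            Field id = new TextField("id", book.getId().toString(), Store.YES);
            Field name = new TextField("name", book.getName(), Store.YES);
            Field price = new TextField("price", book.getPrice().toString(), Store.YES);
            Field detail = new TextField("detail", book.getDetail(), Store.YES);
            document.add(id);
            document.add(name);
            document.add(price);
            document.add(detail);
            documents.add(document);
        }

        // Build the index: the analyzer tokenizes, the directory says where the index lives
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
        Directory directory = FSDirectory.open(new File("E:\\index\\"));
        IndexWriter indexWriter = new IndexWriter(directory, config);

        for (Document d : documents) {
            indexWriter.addDocument(d);
        }
        // close() commits the changes and releases the write lock
        indexWriter.close();
    }
}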

 

5 Query syntax: the operators AND, OR and NOT must be written in uppercase

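    // QueryParser here is org.apache.lucene.queryparser.classic.QueryParser (lucene-queryparser module).
    public void indexSearch() throws Exception {
        // "detail" is the default field, used for terms that carry no field prefix
        QueryParser queryParser = new QueryParser("detail", new StandardAnalyzer());
        // Both terms must match; "大" has no prefix, so it is searched in the default field
        Query query = queryParser.parse("detail:好 AND 大");
        Directory directory = FSDirectory.open(new File("E:\\index\\"));
        IndexReader indexReader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        // Fetch the 10 highest-scoring hits
        TopDocs docs = searcher.search(query, 10);
        ScoreDoc[] scoreDocs = docs.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            // Load the stored fields of each hit by its internal document id
            int docId = scoreDoc.doc;
            Document document = searcher.doc(docId);
            System.out.println(document.get("id"));
            System.out.println(document.get("name"));
            System.out.println(document.get("detail"));
        }
        indexReader.close();
    }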
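To make the operator semantics in the query above concrete, here is a toy model in which each term maps to the set of ids of the documents that contain it. The class `BooleanOps` and the sample document ids are invented for illustration; Lucene evaluates boolean queries over its own postings structures and scores the hits, which this sketch does not attempt.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class BooleanOps {
    // AND: documents that appear in both sets
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // OR: documents that appear in either set
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // a NOT b: documents in a that do not appear in b
    static Set<Integer> not(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical postings: ids of the documents whose detail field contains each term
        Set<Integer> hao = new TreeSet<>(Arrays.asList(1, 2, 4)); // 好
        Set<Integer> da = new TreeSet<>(Arrays.asList(2, 3, 4));  // 大

        System.out.println(and(hao, da)); // [2, 4]
        System.out.println(or(hao, da));  // [1, 2, 3, 4]
        System.out.println(not(hao, da)); // [1]
    }
}
```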

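5 Field attributes

         1 Tokenized: whether the field value is split into terms. Tokenizing is done for the sake of indexing (product name, description, price); a field can also be indexed without being tokenized (product id)

         2 Indexed: whether the field is indexed, that is, whether it can be searched

         3 Stored: whether the field value is stored in the document. Storing is for display: name, price, id, image URL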

    // Same indexing code as before, but each field now uses a type that matches its attributes.
    // Additionally needs org.apache.lucene.document.StringField and FloatField.
    @Test
    public void createIndex() throws Exception {
        BookDao bookDao = new BookDaoImpl();
        List<Book> books = bookDao.queryBooks();
        List<Document> documents = new ArrayList<>();

        for (Book book : books) {
            Document document = new Document();
            // id: indexed as a single term, not tokenized, stored
            Field id = new StringField("id", book.getId().toString(), Store.YES);
            // name: tokenized, indexed, stored
            Field name = new TextField("name", book.getName(), Store.YES);
            // price: indexed as a number, stored
            Field price = new FloatField("price", book.getPrice(), Store.YES);
            // detail: tokenized and indexed but not stored, so it is searchable but cannot be read back
            Field detail = new TextField("detail", book.getDetail(), Store.NO);
            document.add(id);
            document.add(name);
            document.add(price);
            document.add(detail);
            documents.add(document);
        }

        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
        Directory directory = FSDirectory.open(new File("E:\\index\\"));
        IndexWriter indexWriter = new IndexWriter(directory, config);

        for (Document d : documents) {
            indexWriter.addDocument(d);
        }
        indexWriter.close();
    }

   6 Updating the index

    @Test
    public void updateIndex() throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
        Directory directory = FSDirectory.open(new File("E:\\index\\"));
        IndexWriter indexWriter = new IndexWriter(directory, config);
        // The replacement document
        Document document = new Document();
        document.add(new TextField("name", "fdrr", Store.YES));
        // updateDocument deletes every document containing the term name:fddd,
        // then adds the new document; if nothing matches, the document is simply added
        indexWriter.updateDocument(new Term("name", "fddd"), document);
        indexWriter.close();
    }

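 7 Relevance ranking

        Relevance is how closely a search result matches the query keywords. The better the match, the higher the result ranks; the ranking is produced by scoring.

        Scoring takes two steps: 1 compute the weight of each term  2 compute the score from the weights

       Term weight: a term is a word, and the importance of a term to a document is its weight

       Factors that affect a term's weight:

       1 tf (term frequency): how often the term occurs within one document; the higher the tf, the higher the weight

       2 df (document frequency): how many documents the term occurs in; the higher the df, the lower the weight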
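A minimal sketch of how tf and df combine into a weight. The two formulas follow the tf and idf factors documented for Lucene's classic TF-IDF similarity (square root of the frequency, and one plus the log of the inverse document frequency). The real score has more factors, such as length norms and boosts, which are left out here, and the class name `TermWeight` is an assumption.

```java
public class TermWeight {
    // tf factor: grows with the number of occurrences in the document, but sub-linearly
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf factor: shrinks as more documents contain the term
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Weight of a term in one document: tf multiplied by idf
    static double weight(int freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 100;
        // Same term frequency, different document frequency: the rarer term weighs more
        System.out.println("rare term:   " + weight(2, 1, numDocs));
        System.out.println("common term: " + weight(2, 50, numDocs));
        // Same document frequency, different term frequency: more occurrences weigh more
        System.out.println("tf = 1:      " + weight(1, 10, numDocs));
        System.out.println("tf = 4:      " + weight(4, 10, numDocs));
    }
}
```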

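    8   Setting a boost value to influence scoring

         boost is a weighting factor; the default is 1.0f. It can be set when the index is built or when the query is run.

        A boost can be set when the MultiFieldQueryParser is created.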
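The effect of a per-field boost can be sketched with a toy model: each field contributes a raw match score, and the boost multiplies that contribution before the contributions are summed. This is only an illustration of the idea, not Lucene's scoring formula; `BoostSketch`, the field names and the raw scores are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class BoostSketch {
    // Sum the per-field scores, multiplying each by the field's boost (1.0f when none is set)
    static float score(Map<String, Float> fieldScores, Map<String, Float> boosts) {
        float total = 0f;
        for (Map.Entry<String, Float> e : fieldScores.entrySet()) {
            total += e.getValue() * boosts.getOrDefault(e.getKey(), 1.0f);
        }
        return total;
    }

    public static void main(String[] args) {
        // Query-time boosts: a match in "name" counts ten times as much as one in "detail"
        Map<String, Float> boosts = new HashMap<>();
        boosts.put("name", 10f);

        // Hypothetical raw scores: book A matches in the name, book B only in the detail
        Map<String, Float> bookA = new HashMap<>();
        bookA.put("name", 1.0f);
        Map<String, Float> bookB = new HashMap<>();
        bookB.put("detail", 1.0f);

        System.out.println("book A: " + score(bookA, boosts)); // book A: 10.0
        System.out.println("book B: " + score(bookB, boosts)); // book B: 1.0
    }
}
```

Without the boost both books would score the same; with it, the book that matches in the name ranks first.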

 

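 Solr

  1 A full-text search server built on Lucene

       Indexing: the Solr client sends a POST request to the Solr server. The request body is an XML document containing the field information, and the index is maintained through this document.

       Searching: the Solr client sends a GET request to the Solr server, and the server returns an XML document.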
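The two requests can be sketched by building only the payload and the URL, without sending anything. The XML follows Solr's `<add><doc><field name="...">` update format and the query uses the `q` and `wt` parameters of the select handler. The base URL (host, port and the core name `collection1`) and the class `SolrRequestSketch` are assumptions; adjust them to the actual server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrRequestSketch {
    // Body of the POST request: one <field> element per field of the document
    static String addDocumentXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // Escape the characters that are special in XML text and attribute values
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // URL of the GET request: the query goes in q, wt=xml asks for an XML response
    static String selectUrl(String baseUrl, String query) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8") + "&wt=xml";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Assumed base URL of a local Solr core
        String base = "http://localhost:8983/solr/collection1";

        Map<String, String> book = new LinkedHashMap<>();
        book.put("id", "1");
        book.put("name", "Lucene & Solr");

        System.out.println("POST " + base + "/update");
        System.out.println(addDocumentXml(book));
        System.out.println("GET  " + selectUrl(base, "name:lucene"));
    }
}
```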
