Lucene Source Code Analysis -- The Index Creation Process

Document indexing is carried out by DocumentsWriter's internal data-processing chain. DocumentsWriter can add multiple documents concurrently and writes them into a temporary segment; once that is done, IndexWriter and SegmentMerger merge them into a single unified segment.
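For orientation, here is a minimal usage sketch of the public API that enters this pipeline. It assumes Lucene 4.5 (the 4.x line this walkthrough traces); RAMDirectory, StandardAnalyzer, TextField, and Version.LUCENE_45 are standard API in that release.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class AddDocumentExample {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();   // in-memory index for the example
    IndexWriterConfig config =
        new IndexWriterConfig(Version.LUCENE_45, new StandardAnalyzer(Version.LUCENE_45));
    IndexWriter writer = new IndexWriter(dir, config);

    Document doc = new Document();
    doc.add(new TextField("title", "Lucene in Action", Field.Store.YES));

    writer.addDocument(doc);   // step 1 of the call trace below
    writer.close();            // enters the flush chain traced at the end
  }
}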

DocumentsWriter supports multithreaded processing: multiple threads can add documents at the same time, and it assigns each request a DocumentsWriterThreadState object to track that processing. The actual work is done by the indexing chain managed by the DocFieldProcessor that DocumentsWriter builds at initialization; the chain runs through DocFieldConsumers, DocInverter, TermsHash, FreqProxTermsWriter, TermVectorsTermsWriter, NormsWriter, and StoredFieldsWriter in turn, as the sketch below illustrates.
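Every stage in this chain exposes the same per-document lifecycle: startDocument(), addField(...), finishDocument(), and finally flush() when the segment is written. The following is an illustrative sketch of that chained-consumer pattern in isolation; DocConsumer and LoggingStage are hypothetical stand-ins, not Lucene classes.

// Hypothetical stand-in for the chained-consumer pattern the indexing chain follows.
interface DocConsumer {
  void startDocument();                      // reset per-document state
  void addField(String name, String value);  // consume one field of the document
  void finishDocument();                     // fold the document into in-RAM state
  void flush();                              // write accumulated state to the segment
}

class LoggingStage implements DocConsumer {
  private final String name;
  private final DocConsumer next; // downstream stage, or null at the end of the chain

  LoggingStage(String name, DocConsumer next) {
    this.name = name;
    this.next = next;
  }

  @Override public void startDocument() {
    System.out.println(name + ".startDocument");
    if (next != null) next.startDocument();
  }
  @Override public void addField(String field, String value) {
    System.out.println(name + ".addField(" + field + ")");
    if (next != null) next.addField(field, value);
  }
  @Override public void finishDocument() {
    System.out.println(name + ".finishDocument");
    if (next != null) next.finishDocument();
  }
  @Override public void flush() {
    System.out.println(name + ".flush");
    if (next != null) next.flush();
  }
}

Wiring new LoggingStage("DocInverter", new LoggingStage("TermsHash", new LoggingStage("FreqProxTermsWriter", null))) and driving one document through it prints the same nesting of startDocument/finishDocument calls that steps 9-39 below trace through the real classes.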


Source code call trace:

1.IndexWriter:addDocument(doc);

2.IndexWriter:addDocument(doc,analyzer);

3.IndexWriter:updateDocument(term,doc,analyzer);
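Steps 1-3 are thin delegations: an add is just an update with no delete term. In the Lucene 4.x source, the overloads look roughly like this (an approximate reconstruction, not a verbatim quote):

// Approximate shape of the delegation in Lucene 4.x IndexWriter:
public void addDocument(Iterable<? extends IndexableField> doc) throws IOException {
  addDocument(doc, analyzer);          // fall back to the writer's default analyzer
}

public void addDocument(Iterable<? extends IndexableField> doc, Analyzer analyzer)
    throws IOException {
  updateDocument(null, doc, analyzer); // null delTerm: a pure add deletes nothing
}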



4.DocumentsWriter:updateDocument(doc,analyzer,term);

5.DocumentsWriter:preUpdate()

Decides whether the current thread should first perform flush/merge work or proceed directly with the add.
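Conceptually, preUpdate makes the calling indexing thread help drain any pending flushes before its own add proceeds, which is how flushing stays concurrent without a dedicated flush thread. An approximate sketch of that control flow, simplified from the 4.x source (treat the method shapes as indicative, not exact):

// Approximate shape of DocumentsWriter.preUpdate():
private boolean preUpdate() throws IOException {
  ensureOpen();
  boolean maybeMerge = false;
  if (flushControl.anyStalledThreads() || flushControl.numQueuedFlushes() > 0) {
    DocumentsWriterPerThread flushingDWPT;
    while ((flushingDWPT = flushControl.nextPendingFlush()) != null) {
      // steal a DWPT the flush policy marked as pending and flush it (see step 47)
      maybeMerge |= doFlush(flushingDWPT);
    }
  }
  return maybeMerge; // true tells the caller a segment merge may be worthwhile
}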

6.DocumentsWriterPerThread:updateDocument()


public void updateDocument(Iterable<? extends IndexableField> doc, Analyzer analyzer, Term delTerm) throws IOException {
    assert writer.testPoint("DocumentsWriterPerThread addDocument start");
    assert deleteQueue != null;
    // stash the document, analyzer, and next in-RAM doc id in the shared DocState
    docState.doc = doc;
    docState.analyzer = analyzer;
    docState.docID = numDocsInRAM;
    if (segmentInfo == null) {
      // lazily create the segment metadata on the first document (step 7)
      initSegmentInfo();
    }
    if (INFO_VERBOSE && infoStream.isEnabled("DWPT")) {
      infoStream.message("DWPT", Thread.currentThread().getName() + " update delTerm=" + delTerm + " docID=" + docState.docID + " seg=" + segmentInfo.name);
    }
    boolean success = false;
    try {
      try {
        // run the document through the indexing chain, starting at DocFieldProcessor (step 8)
        consumer.processDocument(fieldInfos);
      } finally {
        docState.clear();
      }
      success = true;
    } finally {
      if (!success) {
        if (!aborting) {
          // non-aborting failure: keep the doc id but mark the document as deleted
          deleteDocID(docState.docID);
          numDocsInRAM++;
        } else {
          abort();
        }
      }
    }
    success = false;
    try {
      consumer.finishDocument();
      success = true;
    } finally {
      if (!success) {
        abort();
      }
    }
    // record the delete term (if any) and advance the in-RAM doc count
    finishDocument(delTerm);
  }


7.DocumentsWriterPerThread:initSegmentInfo()

Initializes the basic information of the segment.

8.DocFieldProcessor:processDocument()

This method is the dispatcher for processing a single document: it organizes the data of the document's fields and creates the corresponding DocFieldProcessorPerField objects to process each field in turn.

The method first calls startDocument() on the indexing chain to initialize its state, then iterates over the fields, inserting them into a hash table keyed by a hash of the field name, with values of type DocFieldProcessorPerField. If a field already exists in the table, its FieldInfo is updated (via FieldInfo.update()); if not, a new DocFieldProcessorPerField is created and added to the table. Note that this hash table accumulates the field information of all documents added so far, not just the current one, and FieldInfo.update() merges the field settings of fields that share a name.
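A stripped-down sketch of that per-field aggregation (illustrative only: the real code hashes field names into an open-addressed array rather than a java.util.HashMap, but the effect is the same):

// Hypothetical, simplified rendering of DocFieldProcessor.processDocument's
// per-field bookkeeping.
private final Map<String, DocFieldProcessorPerField> fieldHash = new HashMap<>(); // lives across documents

void processDocumentSketch(Iterable<? extends IndexableField> doc) {
  for (IndexableField field : doc) {
    DocFieldProcessorPerField perField = fieldHash.get(field.name());
    if (perField == null) {
      // first sighting of this field name in the segment: register its
      // FieldInfo (step 17) and create a per-field processor for it
      FieldInfo fi = fieldInfos.addOrUpdate(field.name(), field.fieldType());
      perField = new DocFieldProcessorPerField(this, fi);
      fieldHash.put(field.name(), perField);
    } else {
      // field name seen in an earlier document: merge the settings
      perField.fieldInfo.update(field.fieldType());
    }
    perField.addField(field); // step 18: queue the field for this document
  }
}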


9.DocInverter:startDocument()

10.TermHash:startDocument()

11.FreqProxTermsWriter:startDocument()

12.TermVectorsConsumer:startDocument()

13.NormsConsumer:startDocument()

14.TwoStoredFieldsConsumers:startDocument()

15.StoredFieldsProcessor:startDocument()

  public void reset() {
    numStoredFields = 0;
    storedFields = new IndexableField[1];
    fieldInfos = new FieldInfo[1];
  }

16.DocValuesProcessor:startDocument()

17.FieldInfos:public FieldInfo addOrUpdate(String name, IndexableFieldType fieldType)

18.DocFieldProcessorPerField:public void addField(IndexableField field)

19.TwoStoredFieldsConsumers:public void addField(int docID, IndexableField field, FieldInfo fieldInfo)

20.StoredFieldsProcessor:public void addField(int docID, IndexableField field, FieldInfo fieldInfo)

21.DocValuesProcessor:public void addField(int docID, IndexableField field, FieldInfo fieldInfo)

22.DocInverterPerField:processFields(final IndexableField[] fields, final int count)

23.Field:tokenStream()

24.TermsHashPerField:start(IndexableField f)

25.FreqProxTermsWriterPerField:void start(IndexableField f)

26.TermsHashPerField:void add() throws IOException

27.FreqProxTermsWriterPerField:void newTerm(final int termID)

28.TermsHashPerField:void finish()

29.FreqProxTermsWriterPerField:void finish()

30.NormsConsumerPerField:void finish()

31.DocFieldProcessor:void finishDocument()

32.TwoStoredFieldsConsumers:void finishDocument()

33.StoredFieldsProcessor:void finishDocument()

34.CompressingStoredFieldsWriter:public void finishDocument()

35.DocValuesProcessor:void finishDocument()

36.DocInverter:void finishDocument()

37.NormsConsumer:void finishDocument()

38.TermsHash:void finishDocument()

39.TermVectorsConsumer:void finishDocument(TermsHash termsHash)

40.DocumentsWriterPerThread:private void finishDocument(Term delTerm)

41.DocumentsWriterFlushControl:

42.DocumentsWriter:postUpdate()

43.IndexWriter:close(true)

44.IndexWriter:private void closeInternal(boolean waitForMerges, boolean doFlush)

45.DocumentsWriter:void close()

46.IndexWriter:protected final void flush(boolean triggerMerge, boolean applyAllDeletes)
   IndexWriter:doFlush(boolean applyAllDeletes)
   DocumentsWriter:flushAllThreads()

47.DocumentsWriter:private boolean doFlush(DocumentsWriterPerThread flushingDWPT)
   DocumentsWriterPerThread:FlushedSegment flush()

48.DocFieldProcessor:public void flush(SegmentWriteState state)

49.TwoStoredFieldsConsumers:void flush(SegmentWriteState state)

50.StoredFieldsProcessor:public void flush(SegmentWriteState state)

51.DocValuesProcessor:void flush(SegmentWriteState state)

52.DocInverter:void flush(Map<String, DocFieldConsumerPerField> fieldsToFlush, SegmentWriteState state)

53.TermsHash:void flush(Map<String,InvertedDocConsumerPerField> fieldsToFlush, final SegmentWriteState state)

54.FreqProxTermsWriter:public void flush(Map<String,TermsHashConsumerPerField> fieldsToFlush, final SegmentWriteState state)

55.NormsConsumer:public void flush(Map<String,InvertedDocEndConsumerPerField> fieldsToFlush, SegmentWriteState state)

56.DocumentsWriterPerThread:doAfterFlush()

57.DocFieldProcessor:void doAfterFlush()

58.IndexWriter:protected void doAfterFlush()
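Steps 43-58 are the teardown path: closing (or committing) the writer forces every in-memory DWPT through the flush chain and into an on-disk segment. Continuing the usage sketch from the top of the article, both public calls below funnel into that chain; the step mapping in the comments follows the trace above and is approximate.

// Continuing the earlier AddDocumentExample:
writer.commit();     // durable flush without closing; enters the flush path of step 46
writer.close(true);  // step 43: IndexWriter.close(true)
                     // -> step 44: closeInternal(waitForMerges=true, doFlush=true)
                     // -> step 46: flush(...) -> doFlush(...) -> DocumentsWriter.flushAllThreads()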