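Recently I had to upgrade the Lucene version used in one of our projects. The project was built on Lucene 3.0, a very old release. In the old version we relied on a handful of Lucene features:
1. Searching across multiple index directories.
2. Building a RAMDirectory to hold the index in memory and speed up retrieval.
3. Building a custom Lucene analyzer.
4. Modifying Lucene's default scoring algorithm.
Below I compare the code before and after the migration:
1. Searching multiple index directories
3.0, building the multi-index searcher: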
// Initialize the nationwide index
private boolean InitGlobal(String strRootPath) {
    try {
        IndexSearcher[] searchers = new IndexSearcher[2];
        MultiSearcher globalSearcher = null;
        if (Configution.IsMMap.equalsIgnoreCase("true")) {
            // Load both indexes into memory
            searchers[0] = new IndexSearcher(new RAMDirectory(FSDirectory
                    .open(new File(strRootPath + "/" + GLABOL_INDEX))));
            searchers[1] = new IndexSearcher(new RAMDirectory(FSDirectory
                    .open(new File(strRootPath + "/" + BUS_INDEX))));
            // searchers[2] = new IndexSearcher(new RAMDirectory(FSDirectory
            //         .open(new File(strRootPath + "/" + LU_INDEX))));
            globalSearcher = new MultiSearcher(searchers);
        } else {
            // Search the indexes directly from disk
            searchers[0] = new IndexSearcher(FSDirectory.open(new File(
                    strRootPath + "/" + GLABOL_INDEX)));
            searchers[1] = new IndexSearcher(FSDirectory.open(new File(
                    strRootPath + "/" + BUS_INDEX)));
            // searchers[2] = new IndexSearcher(FSDirectory.open(new File(
            //         strRootPath + "/" + LU_INDEX)));
            globalSearcher = new MultiSearcher(searchers);
        }
        System.out.println("finish Global");

        m_mapIndexName2Searcher.put("0", globalSearcher);
        m_mapAdmin2IndexName.put("0", "0");
        return true;
    } catch (Exception e) {
        e.printStackTrace();
        SearchLog.SearchLog.error("Failed to initialize the nationwide index");
        return false;
    }
}
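OK, this uses MultiSearcher, which is how older Lucene versions search several indexes at once. In newer versions, however, the MultiSearcher class itself has been removed, and that cost me a lot of time. It suggests that Lucene, famous for churning out new versions, does not have the most stable code design: the codebase uses few interfaces and consists mostly of classes and abstract classes.
4.x, building the multi-index searcher: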
// Initialize the nationwide index
private boolean InitGlobal(String strRootPath) {
    try {
        IndexSearcher globalSearcher = null;
        if (Configution.IsMMap.equalsIgnoreCase("true")) {
            // Load both indexes into memory
            IndexReader irGlobal = DirectoryReader.open(new RAMDirectory(
                    FSDirectory.open(new File(strRootPath + "/" + GLABOL_INDEX)),
                    new IOContext()));
            IndexReader irBus = DirectoryReader.open(new RAMDirectory(
                    FSDirectory.open(new File(strRootPath + "/" + BUS_INDEX)),
                    new IOContext()));
            MultiReader mr = new MultiReader(irGlobal, irBus);
            globalSearcher = new IndexSearcher(mr); // replaces new MultiSearcher(searchers)
        } else {
            // Search the indexes directly from disk
            IndexReader irGlobal = DirectoryReader.open(FSDirectory
                    .open(new File(strRootPath + "/" + GLABOL_INDEX)));
            IndexReader irBus = DirectoryReader.open(FSDirectory
                    .open(new File(strRootPath + "/" + BUS_INDEX)));
            MultiReader mr = new MultiReader(irGlobal, irBus);
            globalSearcher = new IndexSearcher(mr); // replaces new MultiSearcher(searchers)
        }
        System.out.println("finish Global");

        m_mapIndexName2Searcher.put("0", globalSearcher);
        m_mapAdmin2IndexName.put("0", "0");
        return true;
    } catch (Exception e) {
        e.printStackTrace();
        SearchLog.SearchLog.error("Failed to initialize the nationwide index");
        return false;
    }
}
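OK. After the change, a plain IndexSearcher replaces MultiSearcher, and a MultiReader passed to it does the work of searching several index directories.
2. Building a RAMDirectory to hold the index in memory
3.0, building the in-memory index directory: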
searchers[0] = new IndexSearcher(new RAMDirectory(FSDirectory
        .open(new File(strRootPath + "/" + GLABOL_INDEX))));
searchers[1] = new IndexSearcher(new RAMDirectory(FSDirectory
        .open(new File(strRootPath + "/" + BUS_INDEX))));
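The source Directory is passed straight to the RAMDirectory constructor. Beware of a pitfall here: the whole index is copied into memory, so with a large index you will be waiting a long time!
4.x, building the in-memory index directory: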
IndexReader irGlobal = DirectoryReader.open(new RAMDirectory(
        FSDirectory.open(new File(strRootPath + "/" + GLABOL_INDEX)),
        new IOContext()));
IndexReader irBus = DirectoryReader.open(new RAMDirectory(
        FSDirectory.open(new File(strRootPath + "/" + BUS_INDEX)),
        new IOContext()));
MultiReader mr = new MultiReader(irGlobal, irBus);
In 4.x the 3.0-style constructor call no longer works: you also have to pass in an IOContext object. Sigh.
3. Custom analyzer
3.0 custom analyzer:
public class SingleAnalyzer extends Analyzer {

    // Pick a tokenizer per field (SingleTokenizer is the project's own class)
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = null;
        if (fieldName.equals("name")) {
            result = new SingleTokenizer(reader);
        }
        if (fieldName.equals("totalcity")) {
            result = new IKTokenizer(reader, false);
        }
        // result = new StandardFilter(result);
        // result = new LowerCaseFilter(result);
        // result = new StopFilter(result, stopSet);
        return result;
    }
}
Overriding the tokenStream method is all it takes. Very simple.
4.x custom analyzer:
public class SingleAnalyzer extends Analyzer {

    // The 3.0 tokenStream(...) override is gone; createComponents replaces it.
    @Override
    protected TokenStreamComponents createComponents(String fieldName,
            Reader reader) {
        // final Tokenizer source = new ChineseTokenizer(reader);
        // return new TokenStreamComponents(source, new ChineseFilter(source));
        Tokenizer source;
        if ("name".equals(fieldName)) {
            source = new SingleTokenizer(reader);
        } else if ("totalcity".equals(fieldName)) {
            source = new IKTokenizer(reader, false);
        } else {
            // Fail early instead of handing a null tokenizer to Lucene
            throw new IllegalArgumentException(
                    "No tokenizer configured for field: " + fieldName);
        }
        // No extra filters: the tokenizer is also the end of the chain
        return new TokenStreamComponents(source, source);
    }
}
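OK, in 4.x you need to override the createComponents method instead.
4. Scoring algorithm
The scoring algorithm barely changed between 3.x and 4.x, but the package did. Sigh.
3.x package: import org.apache.lucene.search.DefaultSimilarity; the class lives in org.apache.lucene.search.
4.x package: import org.apache.lucene.search.similarities.*; the classes live in org.apache.lucene.search.similarities.
5. Query expressions: the change shows up mainly in TermRangeQuery. In 3.x the range bounds are Strings, but in 4.x they became BytesRef, a wrapper around the string's bytes. There are many other detail changes as well.
3.x TermRangeQuery: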
String left = Long
        .toString((long) (rcBound.m_dLeft * COORDINATE_SCALE_FACTOR));
String right = Long
        .toString((long) (rcBound.m_dRight * COORDINATE_SCALE_FACTOR));
String top = Long
        .toString((long) (rcBound.m_dTop * COORDINATE_SCALE_FACTOR));
String bottom = Long
        .toString((long) (rcBound.m_dBottom * COORDINATE_SCALE_FACTOR));

TermRangeQuery query1 = new TermRangeQuery("lon", left, right,
        true, true);
TermRangeQuery query2 = new TermRangeQuery("lat", bottom, top,
        true, true);
searchQuery.add(query1, BooleanClause.Occur.MUST);
searchQuery.add(query2, BooleanClause.Occur.MUST);
4.x TermRangeQuery:
String left = Long
        .toString((long) (rcBound.m_dLeft * COORDINATE_SCALE_FACTOR));
String right = Long
        .toString((long) (rcBound.m_dRight * COORDINATE_SCALE_FACTOR));
String top = Long
        .toString((long) (rcBound.m_dTop * COORDINATE_SCALE_FACTOR));
String bottom = Long
        .toString((long) (rcBound.m_dBottom * COORDINATE_SCALE_FACTOR));

// 4.x: wrap each String bound in a BytesRef
BytesRef brLeft = new BytesRef(left);
BytesRef brRight = new BytesRef(right);
BytesRef brBottom = new BytesRef(bottom);
BytesRef brTop = new BytesRef(top);

TermRangeQuery query1 = new TermRangeQuery("lon", brLeft, brRight,
        true, true);
TermRangeQuery query2 = new TermRangeQuery("lat", brBottom, brTop,
        true, true);
searchQuery.add(query1, BooleanClause.Occur.MUST);
searchQuery.add(query2, BooleanClause.Occur.MUST);
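One thing worth testing after this change: as far as I understand, a term range compares the bounds and the indexed terms as bytes, not as numbers. The coordinates here are encoded with Long.toString, so the range only behaves numerically when every value in a field has the same number of digits. The sketch below uses only the JDK, not Lucene, to show the effect. The class name, compareBytes, inRange and pad are hypothetical helpers made up for illustration, and zero padding is just one possible workaround (it assumes non-negative values and that the index was built with the same padding), not what the project actually does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Illustration only: mimics byte-wise term ordering with plain JDK code.
final class LexicographicRangeDemo {

    // Compare two strings by their unsigned UTF-8 bytes.
    static int compareBytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    // Inclusive range check in byte order, like a range with both ends included.
    static boolean inRange(String term, String lower, String upper) {
        return compareBytes(lower, term) <= 0 && compareBytes(term, upper) <= 0;
    }

    // Hypothetical workaround: fixed-width zero padding for non-negative longs.
    static String pad(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled here");
        }
        return String.format(Locale.ROOT, "%019d", value);
    }

    public static void main(String[] args) {
        // 150 lies between 20 and 1000 numerically, but not in byte order.
        System.out.println(inRange("150", "20", "1000"));               // prints false
        // With equal-width terms, byte order and numeric order agree.
        System.out.println(inRange(pad(150), pad(20), pad(1000)));      // prints true
    }
}
```

If the scaled longitude and latitude values in the index always have the same digit count, the queries above should return the same hits before and after the upgrade; if not, this is one of the places where the functional test is likely to show a difference.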
6. Closing the IndexSearcher
In 3.x you close the searcher by calling its close method directly:
public void UnInit() {
    if (!m_bIsInit)
        return;

    Iterator iter = m_mapIndexName2Searcher.keySet().iterator();
    while (iter.hasNext()) {
        String key = (String) iter.next();
        MultiSearcher val = (MultiSearcher) m_mapIndexName2Searcher
                .get(key);
        try {
            val.close(); // close the searcher
        } catch (IOException e) {
            e.printStackTrace();
            SearchLog.SearchLog.error("Failed to close the tiered index");
        }
    }

    m_mapIndexName2Searcher.clear();
    m_mapAdmin2IndexName.clear();
    m_mapIndexName2Searcher = null;
    m_mapAdmin2IndexName = null;
    m_bIsInit = false;
}
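In 4.x IndexSearcher no longer has a close method. You call getIndexReader and then close the IndexReader instead. Since the reader here is a MultiReader built with the default constructor, closing it also closes its sub-readers: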
public void UnInit() {
    if (!m_bIsInit)
        return;

    Iterator iter = m_mapIndexName2Searcher.keySet().iterator();
    while (iter.hasNext()) {
        String key = (String) iter.next();
        IndexSearcher val = (IndexSearcher) m_mapIndexName2Searcher
                .get(key);
        try {
            val.getIndexReader().close(); // close the reader behind the searcher
        } catch (IOException e) {
            e.printStackTrace();
            SearchLog.SearchLog.error("Failed to close the tiered index");
        }
    }

    m_mapIndexName2Searcher.clear();
    m_mapAdmin2IndexName.clear();
    m_mapIndexName2Searcher = null;
    m_mapAdmin2IndexName = null;
    m_bIsInit = false;
}
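In short, Lucene changes a lot between versions, and an upgrade touches many methods. You have to check each change carefully and experiment before the upgrade works. Once it is done, run a full functional test, because some features may behave differently or even break. Upgrading Lucene is not a pleasant job.
Please credit the source when reposting: http://www.cnblogs.com/likehua/p/4387700.html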