Lucene Series: Search

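Searching with Lucene means building a query, so this post walks through the various Query types. IndexSearcher is the main search class. Its commonly used search methods are: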

TopDocs search(Query query, int n);//find the top n hits for query
TopDocs search(Query query, Filter filter, int n);//find the top n hits for query, applying filter if non-null

Query

Query lives in the org.apache.lucene.search package. It represents the final query syntax tree and is passed to IndexSearcher to run the search.

TermQuery: looks up a single term in a given Field.

Term t = new Term("bookname", "Lucene");//the Field containing the term, then the term text
Query q = new TermQuery(t);

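BooleanQuery: made up of several clauses joined by the boolean logic "and, or, not". BooleanClause.Occur is an enum with the values MUST, MUST_NOT, and SHOULD. Common combinations are:

MUST with MUST: intersection. MUST with MUST_NOT: difference. SHOULD with SHOULD: union.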

void add(Query query, BooleanClause.Occur occur)
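
To make the three combinations above concrete, here is a minimal sketch in plain Java that treats each clause's hits as a set of document IDs. It does not use Lucene at all; the class BooleanSets and its methods are hypothetical names for illustration, and scoring is ignored.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: models each clause's hits as a set of doc IDs.
class BooleanSets {
    // MUST + MUST: a document has to match both clauses (intersection).
    static Set<Integer> must(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // MUST + MUST_NOT: matches of the first clause minus matches of the second (difference).
    static Set<Integer> mustNot(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    // SHOULD + SHOULD: a document matching either clause is a hit (union).
    static Set<Integer> should(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }
}
```

With hits {1, 2, 3} for one clause and {2, 3, 4} for the other, the three methods give the intersection, the difference, and the union of the two sets respectively.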

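NumericRangeQuery/TermRangeQuery: range queries. The range can cover dates, times, or numbers. To leave the lower or upper bound open, pass null for that bound and set the corresponding inclusive flag to false.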

TermRangeQuery(String field, String lowerTerm, String upperTerm, boolean includeLower, boolean includeUpper);
//NumericRangeQuery
static NumericRangeQuery<Double> newDoubleRange(String field, Double min, Double max, boolean minInclusive, boolean maxInclusive);
static NumericRangeQuery<Float> newFloatRange(String field, Float min, Float max, boolean minInclusive, boolean maxInclusive);
static NumericRangeQuery<Integer> newIntRange(String field, Integer min, Integer max, boolean minInclusive, boolean maxInclusive);
static NumericRangeQuery<Integer> newIntRange(String field, int precisionStep, Integer min, Integer max, boolean minInclusive, boolean maxInclusive);
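
The meaning of the null bounds and the inclusive flags can be sketched in plain Java. This is only an illustration of the range semantics described above, not Lucene code; RangeCheck and inRange are hypothetical names.

```java
// Hypothetical illustration of range semantics: a null bound means "open on that side".
class RangeCheck {
    static <T extends Comparable<T>> boolean inRange(
            T value, T min, T max, boolean minInclusive, boolean maxInclusive) {
        if (min != null) {
            int c = value.compareTo(min);
            // Below the lower bound, or equal to it when the bound is exclusive.
            if (c < 0 || (c == 0 && !minInclusive)) {
                return false;
            }
        }
        if (max != null) {
            int c = value.compareTo(max);
            // Above the upper bound, or equal to it when the bound is exclusive.
            if (c > 0 || (c == 0 && !maxInclusive)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, inRange(1990, 1990, 1998, true, true) accepts the boundary year, while passing null as max with maxInclusive set to false accepts any value from 1990 upward.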

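PhraseQuery: phrase search, matching a phrase made up of more than one keyword, such as 中國 or 鋼鐵. You can set the slop, which is the number of other words allowed between the words of the phrase. The default is 0.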

void add(Term term);//add a term to the end of the query phrase
void setSlop(int s);//set the number of other words allowed between words in the query phrase
//sample: a bookname containing "中國" is matched, other combinations are not
PhraseQuery query = new PhraseQuery();
query.add(new Term("bookname", "中"));
query.add(new Term("bookname", "國"));
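
How slop loosens a phrase match can be sketched in plain Java. The code below is a simplified model, not Lucene's scorer: each matched token position is shifted back by its index in the phrase, and the phrase matches when the spread of those shifted positions is at most slop. SloppyPhrase is a hypothetical name, and the sketch assumes the phrase has no repeated terms.

```java
import java.util.List;

// Simplified, hypothetical model of sloppy phrase matching over a token list.
class SloppyPhrase {
    static boolean matches(List<String> doc, List<String> phrase, int slop) {
        return search(doc, phrase, slop, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Tries every position for phrase term i, tracking the min and max of (position - i).
    private static boolean search(List<String> doc, List<String> phrase, int slop,
                                  int i, int min, int max) {
        if (i == phrase.size()) {
            return true; // every term placed with spread <= slop
        }
        for (int pos = 0; pos < doc.size(); pos++) {
            if (!doc.get(pos).equals(phrase.get(i))) {
                continue;
            }
            int shifted = pos - i;
            int newMin = Math.min(min, shifted);
            int newMax = Math.max(max, shifted);
            if (newMax - newMin <= slop && search(doc, phrase, slop, i + 1, newMin, newMax)) {
                return true;
            }
        }
        return false;
    }
}
```

With slop 0 the terms have to be adjacent and in order. One word in between needs slop 1, and under this model two swapped words need slop 2.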

MultiPhraseQuery: for phrases that share a prefix, a suffix, or middle words, such as 中國好聲音 and 美國好聲音.

void add(Term term);//Add a single term at the next position in the phrase.
void add(Term[] terms);//Add multiple terms at the next position in the phrase.
//sample: all terms must be in the same field, otherwise add() throws IllegalArgumentException
MultiPhraseQuery query = new MultiPhraseQuery();
query.add(new Term[]{new Term("song", "中"), new Term("song", "美")});//first position: either 中 or 美
query.add(new Term("song", "國"));
query.add(new Term("song", "好"));
query.add(new Term("song", "聲"));
query.add(new Term("song", "音"));

PrefixQuery: prefix matching.

PrefixQuery query = new PrefixQuery(new Term("bookname","鋼"));//find booknames that start with 鋼

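FuzzyQuery: fuzzy matching. Two strings are compared by counting the operations (inserting, deleting, or changing a letter) needed to turn one into the other. Each operation costs some score, and the end result is the distance (fuzziness) between the two.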

FuzzyQuery(Term term);
FuzzyQuery(Term term, int maxEdits);//maxEdits: an edit distance of at most maxEdits to term
FuzzyQuery(Term term, int maxEdits, int prefixLength);//prefixLength: length of the common (non-fuzzy) prefix
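
The distance described above is the classic edit distance. Below is a minimal plain-Java sketch of it, plus a check that mirrors the meaning of maxEdits and prefixLength. It is an illustration of the idea only, not how Lucene evaluates a FuzzyQuery internally; EditDistance and its method names are hypothetical.

```java
// Hypothetical illustration: Levenshtein distance with a two-row dynamic programming table.
class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // True if candidate shares the first prefixLength characters with term
    // and is within maxEdits edits of it.
    static boolean withinMaxEdits(String term, String candidate, int maxEdits, int prefixLength) {
        String prefix = term.substring(0, Math.min(prefixLength, term.length()));
        return candidate.startsWith(prefix) && levenshtein(term, candidate) <= maxEdits;
    }
}
```

Here "work" and "word" are one edit apart, so they match with maxEdits 1. "work" and "fork" are also one edit apart, but a prefixLength of 1 rules "fork" out because the first letter differs.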

WildcardQuery: uses the '?' and '*' wildcards.

WildcardQuery query = new WildcardQuery(new Term("bookname", "?o*")); 
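
As a rough illustration of the two wildcards, the plain-Java sketch below treats '?' as exactly one character and '*' as any run of characters, including none. WildcardMatch is a hypothetical name, and this is a teaching sketch rather than Lucene's implementation.

```java
// Hypothetical illustration: '?' matches one character, '*' matches zero or more.
class WildcardMatch {
    static boolean matches(String pattern, String text) {
        int p = pattern.length();
        int t = text.length();
        // dp[i][j]: the first i pattern characters match the first j text characters.
        boolean[][] dp = new boolean[p + 1][t + 1];
        dp[0][0] = true;
        for (int i = 1; i <= p; i++) {
            if (pattern.charAt(i - 1) == '*') {
                dp[i][0] = dp[i - 1][0]; // '*' can match the empty string
            }
        }
        for (int i = 1; i <= p; i++) {
            char c = pattern.charAt(i - 1);
            for (int j = 1; j <= t; j++) {
                if (c == '*') {
                    // '*' matches nothing (dp[i-1][j]) or absorbs one more character (dp[i][j-1]).
                    dp[i][j] = dp[i - 1][j] || dp[i][j - 1];
                } else if (c == '?' || c == text.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                }
            }
        }
        return dp[p][t];
    }
}
```

Under these rules the pattern "?o*" from the sample accepts "work" and "to" but rejects "lucene", whose second letter is not 'o'.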

Filter

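A filter acts as a mandatory search condition that restricts the results, for example by the security level of the documents returned. All filters extend org.apache.lucene.search.Filter. Because filter conditions are mostly independent of the query, there is no need to traverse the index for them on every search, so Lucene adds caching to avoid scanning the index over and over just to filter documents.

Commonly used filters are NumericRangeFilter, PrefixFilter, and TermRangeFilter; CachingWrapperFilter, which wraps another Filter to add caching; and FieldCacheRangeFilter and FieldCacheTermsFilter, which cache on a specific Field.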
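
The caching idea can be sketched in plain Java: compute the set of accepted documents once per index reader and reuse it for later searches. This is a conceptual sketch under that assumption, not the actual code of CachingWrapperFilter; CachedDocFilter, bits, readerKey, and computeCount are hypothetical names.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.IntPredicate;

// Hypothetical illustration: caches the accepted doc IDs per reader so the
// filter condition is evaluated only once for each reader.
class CachedDocFilter {
    private final IntPredicate accepts;
    // Weak keys let a cached entry disappear once its reader is no longer referenced.
    private final Map<Object, BitSet> cache = new WeakHashMap<>();
    int computeCount = 0; // how many times the filter was actually evaluated

    CachedDocFilter(IntPredicate accepts) {
        this.accepts = accepts;
    }

    BitSet bits(Object readerKey, int maxDoc) {
        return cache.computeIfAbsent(readerKey, key -> {
            computeCount++;
            BitSet accepted = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (accepts.test(doc)) {
                    accepted.set(doc);
                }
            }
            return accepted;
        });
    }
}
```

Calling bits twice with the same reader key returns the same BitSet, and the filter condition is evaluated only during the first call.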

QueryParser

org.apache.lucene.queryParser parses a query string into a Query. The supported grammar is:

Query  ::= ( Clause )*
Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )

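+ means required, - means excluded, : targets a specific Field, and ? and * are wildcards. Examples:

+bookname:java -bookname:structs: finds docs whose bookname contains java but not structs
publishdate:[1990 TO 1998]: publication date between 1990 and 1998, both ends included
bookname:work~0.5: fuzzy query
bookname:"apache lucene"~5: sloppy phrase query; bookname must contain both apache and lucene, no more than 5 words apart
"God helps": the quotes make the text one complete phrase query instead of separate terms
bookname:(java search): several space-separated words need parentheses; otherwise the second word "search" is not treated as a search on bookname but as a search on the default field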

Commonly used methods:

Query parse(String query);
QueryParser(Version matchVersion, String f, Analyzer a)//the analyzer should be the same one used when building the index

Note:

After building a Query, you can call query.toString() to see what will actually be searched.
