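edismax supports a boost function whose value is multiplied into the score, whereas dismax only offers bf, whose value is added to it. When ranking on several dimensions, the relevance score is really just one of those dimensions, and an additive combination is awkward to tune.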
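To make the additive versus multiplicative difference concrete, here is a small toy sketch. It does not use Solr at all: the Doc class, its fields and the numbers are made up for illustration. It only shows that when the function value lives on a larger scale than the relevance score, adding lets the function dominate the ranking, while multiplying keeps relevance in play.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy comparison of additive (bf-style) and multiplicative (boost-style) scoring. */
final class BoostCombineSketch {

    /** Hypothetical document: a relevance score plus a function value such as popularity. */
    static final class Doc {
        final String id;
        final float relevance;
        final float functionValue;

        Doc(String id, float relevance, float functionValue) {
            this.id = id;
            this.relevance = relevance;
            this.functionValue = functionValue;
        }
    }

    /** bf-style: the function value is added to the relevance score. */
    static float additive(Doc d) {
        return d.relevance + d.functionValue;
    }

    /** boost-style: the function value is multiplied into the relevance score. */
    static float multiplicative(Doc d) {
        return d.relevance * d.functionValue;
    }

    /** Returns document ids ordered by descending combined score. */
    static List<String> rank(List<Doc> docs, boolean multiply) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble(
                (Doc d) -> multiply ? multiplicative(d) : additive(d)).reversed());
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }
}
```

With a weak match that is very popular (relevance 0.2, function value 10) and a strong match that is less popular (relevance 2.0, function value 2.0), the additive ranking puts the weak match first, while the multiplicative ranking puts the strong match first.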
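The dismax implementation is fairly simple and easy to follow. edismax is its enhanced version, and it actually changes quite a lot, as the comparison below shows.
First, the main idea behind dismax parsing:
It first reads the search field names from qf.
The end result of parsing is a single BooleanQuery.
The main query (mainQuery) is parsed first:
- parse the user's search string
- handle altQuery, i.e. decide whether to fall back to the user-defined alternate query string
- parse and assemble the PhraseQuery
Next comes bq, the boost queries that add extra score; they do not change the number of results, only the ranking.
Then comes bf: the function query's value is ultimately added to the document score.
The main code makes this clearer: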
    @Override
    public Query parse() throws ParseException {
      SolrParams solrParams = SolrParams.wrapDefaults(localParams, params);

      // qf: fields to search and their boosts; fall back to the schema's default search field
      queryFields = SolrPluginUtils.parseFieldBoosts(solrParams.getParams(DisMaxParams.QF));
      if (0 == queryFields.size()) {
        queryFields.put(req.getSchema().getDefaultSearchFieldName(), 1.0f);
      }

      // everything is assembled into one BooleanQuery
      BooleanQuery query = new BooleanQuery(true);

      // main query: user query, altQuery fallback, phrase query
      boolean notBlank = addMainQuery(query, solrParams);
      if (!notBlank)
        return null;
      addBoostQuery(query, solrParams);      // bq
      addBoostFunctions(query, solrParams);  // bf

      return query;
    }
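The first step of parse() turns the qf parameter into a map from field name to boost. The following is a simplified stand-in written for this post, not Solr's SolrPluginUtils.parseFieldBoosts itself: the class name, the whitespace splitting and the choice to default a missing boost to 1.0 are all assumptions of the sketch, and Solr's own utility may treat a missing boost differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified, hypothetical stand-in for parsing qf strings such as "title^2.0 body". */
final class FieldBoostSketch {

    /** Parses each "field" or "field^boost" token; a missing boost defaults to 1.0 (sketch assumption). */
    static Map<String, Float> parse(String... qfParams) {
        Map<String, Float> out = new LinkedHashMap<>();
        if (qfParams == null) {
            return out;
        }
        for (String param : qfParams) {
            if (param == null) {
                continue;
            }
            for (String token : param.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                int caret = token.indexOf('^');
                if (caret < 0) {
                    out.put(token, 1.0f);
                } else {
                    out.put(token.substring(0, caret),
                            Float.parseFloat(token.substring(caret + 1)));
                }
            }
        }
        return out;
    }

    /** Mirrors the fallback in parse(): with no qf fields, search the default field with boost 1.0. */
    static Map<String, Float> parseOrDefault(String defaultField, String... qfParams) {
        Map<String, Float> fields = parse(qfParams);
        if (fields.isEmpty()) {
            fields.put(defaultField, 1.0f);
        }
        return fields;
    }
}
```

For example, parseOrDefault("text", "title^2.0 body") maps title to 2.0 and body to 1.0, and parseOrDefault("text") with no qf values falls back to the single field text.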
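edismax follows much the same approach as dismax. The main differences are listed below.
When the search string contains +, OR, NOT or - syntax, edismax ignores mm.
The main code is as follows:
Count the +, OR, NOT and - operators in the search string: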
    List<Clause> clauses = null;
    int numPluses = 0;
    int numMinuses = 0;
    int numOR = 0;
    int numNOT = 0;

    clauses = splitIntoClauses(userQuery, false);
    for (Clause clause : clauses) {
      if (clause.must == '+') numPluses++;
      if (clause.must == '-') numMinuses++;
      if (clause.isBareWord()) {
        String s = clause.val;
        if ("OR".equals(s)) {
          numOR++;
        } else if ("NOT".equals(s)) {
          numNOT++;
        } else if (lowercaseOperators && "or".equals(s)) {
          numOR++;
        }
      }
    }
    // mm is disabled whenever the search string contains +, OR, NOT or -
    boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0;
    if (parsedUserQuery != null && doMinMatched) {
      String minShouldMatch = solrParams.get(DisMaxParams.MM, "100%");
      if (parsedUserQuery instanceof BooleanQuery) {
        SolrPluginUtils.setMinShouldMatch((BooleanQuery)parsedUserQuery, minShouldMatch);
      }
    }
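The decision above can be tried out in isolation with a rough sketch. It replaces splitIntoClauses with plain whitespace splitting, which is an assumption of the sketch: quoted phrases, field prefixes and escaping are not handled. The class and method names are made up.

```java
/** Rough, hypothetical sketch of the "does mm still apply?" decision. */
final class MinMatchSketch {

    /**
     * Returns true when the query contains none of +, -, OR, NOT
     * (and no lowercase "or" when lowercaseOperators is enabled).
     */
    static boolean shouldApplyMm(String userQuery, boolean lowercaseOperators) {
        int operators = 0;
        for (String token : userQuery.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            char first = token.charAt(0);
            if (first == '+' || first == '-') {
                operators++;            // required or prohibited clause
            } else if ("OR".equals(token) || "NOT".equals(token)) {
                operators++;            // explicit boolean operator
            } else if (lowercaseOperators && "or".equals(token)) {
                operators++;            // lowercase operator, only when enabled
            }
        }
        return operators == 0;
    }
}
```

In this sketch, "solr lucene search" keeps mm, while "solr OR lucene" and "+solr lucene" switch it off. As in the excerpt, AND is not counted, so "solr AND lucene" still has mm applied here.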
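Phrase queries: edismax first collects the ordinary clauses, skipping any clause that names a field, is already a phrase query, or is one of the operators "OR", "AND", "NOT" or "TO". Because edismax parses search strings that follow Lucene syntax, it cannot do what dismax does, which is simply to strip the double quotes from the search string and wrap the whole thing in one pair of quotes.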
    // find non-field clauses
    List<Clause> normalClauses = new ArrayList<Clause>(clauses.size());
    for (Clause clause : clauses) {
      if (clause.field != null || clause.isPhrase) continue;
      // check for keywords "AND,OR,TO"
      if (clause.isBareWord()) {
        String s = clause.val.toString();
        // avoid putting explicit operators in the phrase query
        if ("OR".equals(s) || "AND".equals(s) || "NOT".equals(s) || "TO".equals(s)) continue;
      }
      normalClauses.add(clause);
    }
    // full phrase...
    addShingledPhraseQueries(query, normalClauses, phraseFields, 0,
                             tiebreaker, pslop);
    // shingles...
    addShingledPhraseQueries(query, normalClauses, phraseFields2, 2,
                             tiebreaker, pslop);
    addShingledPhraseQueries(query, normalClauses, phraseFields3, 3,
                             tiebreaker, pslop);
Here is how dismax builds its phrase query:
    protected Query getPhraseQuery(String userQuery, SolrPluginUtils.DisjunctionMaxQueryParser pp) throws ParseException {
      String userPhraseQuery = userQuery.replace("\"", "");
      return pp.parse("\"" + userPhraseQuery + "\"");
    }
And here is how edismax does it:
    private void addShingledPhraseQueries(final BooleanQuery mainQuery,
                                          final List<Clause> clauses,
                                          final Map<String,Float> fields,
                                          int shingleSize,
                                          final float tiebreaker,
                                          final int slop)
      throws ParseException {
      if (null == fields || fields.isEmpty() ||
          null == clauses || clauses.size() <= shingleSize )
        return;
      if (0 == shingleSize) shingleSize = clauses.size();
      final int goat = shingleSize-1;
      StringBuilder userPhraseQuery = new StringBuilder();
      for (int i=0; i < clauses.size() - goat; i++) {
        userPhraseQuery.append('"');
        for (int j=0; j <= goat; j++) {
          userPhraseQuery.append(clauses.get(i + j).val);
          userPhraseQuery.append(' ');
        }
        userPhraseQuery.append('"');
        userPhraseQuery.append(' ');
      }
      ExtendedSolrQueryParser pp =
        new ExtendedSolrQueryParser(this, IMPOSSIBLE_FIELD_NAME);
      pp.addAlias(IMPOSSIBLE_FIELD_NAME, tiebreaker, fields);
      pp.setPhraseSlop(slop);
      pp.setRemoveStopFilter(true);
      pp.makeDismax = true;
      pp.minClauseSize = 2;
      Query phrase = pp.parse(userPhraseQuery.toString());
      if (phrase != null) {
        mainQuery.add(phrase, BooleanClause.Occur.SHOULD);
      }
    }
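The string-building loop in addShingledPhraseQueries is easier to see with plain words instead of Clause objects. The sketch below is a standalone illustration with made-up names. It returns the quoted phrases as a list rather than one concatenated string, and it leaves out the parser step (including minClauseSize), so it only covers how the word windows are formed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of how word windows ("shingles") become quoted phrases. */
final class ShingleSketch {

    /**
     * shingleSize 0 means "one phrase over all words"; otherwise every run of
     * shingleSize consecutive words becomes its own quoted phrase.
     */
    static List<String> shingles(List<String> words, int shingleSize) {
        List<String> phrases = new ArrayList<>();
        // Same guard as the excerpt: nothing to do unless there are more words than the window size.
        if (words == null || words.size() <= shingleSize) {
            return phrases;
        }
        int size = (shingleSize == 0) ? words.size() : shingleSize;
        for (int i = 0; i + size <= words.size(); i++) {
            phrases.add("\"" + String.join(" ", words.subList(i, i + size)) + "\"");
        }
        return phrases;
    }
}
```

For the words new, york, city, hotel, size 0 gives the single phrase "new york city hotel", size 2 gives "new york", "york city" and "city hotel", and size 3 gives "new york city" and "york city hotel". A two-word query produces no size-2 shingles, because the guard requires more words than the window size.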
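Another important feature of edismax is the boost query.
Like bq, a boost query does not change the number of results, only the ranking. Its main effect is that the function's value is multiplied into the score. The function itself is parsed in much the same way as bf.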
    Query topQuery = query;
    multBoosts = solrParams.getParams("boost");
    if (multBoosts!=null && multBoosts.length>0) {

      List<ValueSource> boosts = new ArrayList<ValueSource>();
      for (String boostStr : multBoosts) {
        if (boostStr==null || boostStr.length()==0) continue;
        Query boost = subQuery(boostStr, FunctionQParserPlugin.NAME).getQuery();
        ValueSource vs;
        if (boost instanceof FunctionQuery) {
          vs = ((FunctionQuery)boost).getValueSource();
        } else {
          vs = new QueryValueSource(boost, 1.0f);
        }
        boosts.add(vs);
      }

      if (boosts.size()>1) {
        ValueSource prod = new ProductFloatFunction(boosts.toArray(new ValueSource[boosts.size()]));
        topQuery = new BoostedQuery(query, prod);
      } else if (boosts.size() == 1) {
        topQuery = new BoostedQuery(query, boosts.get(0));
      }
    }
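As you can see, the end result is not a BooleanQuery but a BoostedQuery.
It simply takes the sub-query's score and multiplies it by the function query's value. Its main score method is: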
    public float score() throws IOException {
      float score = qWeight * scorer.score() * vals.floatVal(scorer.docID());
      return score>Float.NEGATIVE_INFINITY ? score : -Float.MAX_VALUE;
    }
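The arithmetic in score() can be reproduced without any Lucene classes. The sketch below uses made-up names and plain floats in place of the scorer and the ValueSource. It shows two things from the excerpts: several boost values are combined as a product, and the final comparison replaces a NaN or negative-infinity result with the lowest finite float.

```java
/** Hypothetical plain-float sketch of the BoostedQuery score arithmetic. */
final class BoostedScoreSketch {

    /** Several boost values are combined by multiplication; with none given the product is 1. */
    static float product(float... boosts) {
        float p = 1f;
        for (float b : boosts) {
            p *= b;
        }
        return p;
    }

    /** Sub-query score times boost value, guarded the same way as the excerpt. */
    static float boostedScore(float queryWeight, float subScore, float boostValue) {
        float score = queryWeight * subScore * boostValue;
        // NaN > x is always false, so NaN and negative infinity both take the fallback branch.
        return score > Float.NEGATIVE_INFINITY ? score : -Float.MAX_VALUE;
    }
}
```

For instance, boostedScore(1f, 2f, 3f) is 6.0, while a NaN boost value yields -Float.MAX_VALUE instead of propagating NaN into the ranking.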
When reposting, please credit the source: http://blog.csdn.net/duck_genuine/article/details/8060026