As described earlier, a Solr query arrives as an HTTP request and is handled by the Solr servlet, so the query flow starts at SolrDispatchFilter.doFilter(), which dispatches every kind of HTTP request. Solr supports many query parameters, such as q and fq; this chapter only looks at the select handler and the q parameter. The request issued by the page looks like this: http://localhost:8080/solr/test/select?q=code%3A%E8%BE%BD*+AND+last_modified%3A%5B0+TO+1408454600265%5D+AND+id%3Acheng&wt=json&indent=true (the decoded q is code:辽* AND last_modified:[0 TO 1408454600265] AND id:cheng).
```java
@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
  doFilter(request, response, chain, false);
}
```
Since we only care about select, the actual query starts from the code below: this.execute() is the entry point. Note the writeResponse() call at the end: execute() only collects the doc ids that match the query; writeResponse() later uses those doc ids to fetch the stored fields and write them into the returned result.
```java
// With a valid handler and a valid core...
if( handler != null ) {
  // if not a /select, create the request
  if( solrReq == null ) {
    solrReq = parser.parse( core, path, req );
  }

  if (usingAliases) {
    processAliases(solrReq, aliases, collectionsList);
  }

  final Method reqMethod = Method.getMethod(req.getMethod());
  HttpCacheHeaderUtil.setCacheControlHeader(config, resp, reqMethod);
  // unless we have been explicitly told not to, do cache validation
  // if we fail cache validation, execute the query
  if (config.getHttpCachingConfig().isNever304() ||
      !HttpCacheHeaderUtil.doCacheHeaderValidation(solrReq, req, reqMethod, resp)) {
    SolrQueryResponse solrRsp = new SolrQueryResponse();
    /* even for HEAD requests, we need to execute the handler to
     * ensure we don't get an error (and to make sure the correct
     * QueryResponseWriter is selected and we get the correct
     * Content-Type)
     */
    SolrRequestInfo.setRequestInfo(new SolrRequestInfo(solrReq, solrRsp));
    this.execute( req, handler, solrReq, solrRsp );
    HttpCacheHeaderUtil.checkHttpCachingVeto(solrRsp, resp, reqMethod);
    // add info to http headers
    //TODO: See SOLR-232 and SOLR-267.
    /*try {
      NamedList solrRspHeader = solrRsp.getResponseHeader();
      for (int i=0; i<solrRspHeader.size(); i++) {
        ((javax.servlet.http.HttpServletResponse) response).addHeader(("Solr-" + solrRspHeader.getName(i)), String.valueOf(solrRspHeader.getVal(i)));
      }
    } catch (ClassCastException cce) {
      log.log(Level.WARNING, "exception adding response header log information", cce);
    }*/
    QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
    writeResponse(solrRsp, response, responseWriter, solrReq, reqMethod);
  }
```
execute() then enters SolrCore.execute(): preDecorateResponse() pre-processes the response header information, postDecorateResponse() writes the elapsed time and the results into the response, and handler.handleRequest() carries the query forward.
```java
public void execute(SolrRequestHandler handler, SolrQueryRequest req, SolrQueryResponse rsp) {
  if (handler==null) {
    String msg = "Null Request Handler '" +
      req.getParams().get(CommonParams.QT) + "'";

    if (log.isWarnEnabled()) log.warn(logid + msg + ":" + req);

    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, msg);
  }

  preDecorateResponse(req, rsp);

  // TODO: this doesn't seem to be working correctly and causes problems with the example server and distrib (for example /spell)
  // if (req.getParams().getBool(ShardParams.IS_SHARD,false) && !(handler instanceof SearchHandler))
  //   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,"isShard is only acceptable with search handlers");

  handler.handleRequest(req,rsp);
  postDecorateResponse(handler, req, rsp);

  if (log.isInfoEnabled() && rsp.getToLog().size() > 0) {
    log.info(rsp.getToLogAsString(logid));
  }
}
```
RequestHandlerBase.handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) in turn calls SearchHandler.handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp); only at this point are the query components actually loaded.
The statement below runs the search-related components, including QueryComponent, FacetComponent, MoreLikeThisComponent, HighlightComponent, StatsComponent, DebugComponent and ExpandComponent. Since this article only deals with querying, we step into QueryComponent.java.
```java
for( SearchComponent c : components ) {
  c.process(rb);
}
```
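The loop above is a plain chain-of-components pattern: each registered component gets a chance to act on the shared response builder. A minimal sketch of the same shape (the interface and the StringBuilder standing in for Solr's ResponseBuilder are illustrative, not Solr's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for Solr's SearchComponent; Solr's real interface
// also has prepare(), distributed-search hooks, etc.
interface SearchComponent {
    void process(StringBuilder rb); // Solr passes a ResponseBuilder here
}

public class ComponentChainSketch {
    public static void main(String[] args) {
        List<SearchComponent> components = new ArrayList<>();
        components.add(rb -> rb.append("query;"));     // QueryComponent-like step
        components.add(rb -> rb.append("facet;"));     // FacetComponent-like step
        components.add(rb -> rb.append("highlight;")); // HighlightComponent-like step

        StringBuilder rb = new StringBuilder();
        for (SearchComponent c : components) {
            c.process(rb); // same shape as SearchHandler's loop
        }
        System.out.println(rb); // query;facet;highlight;
    }
}
```

Because every component sees the same builder, QueryComponent can deposit the doc list that FacetComponent and HighlightComponent later read.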
Leaving aside the query handling inside QueryComponent.java for now (the details are covered in later chapters; this chapter is only an overview), QueryComponent.process(ResponseBuilder rb) calls SolrIndexSearcher.search(QueryResult qr, QueryCommand cmd) to run the query, then post-processes the returned results, mainly in doFieldSortValues(rb, searcher) and doPrefetch(rb).
```java
// normal search result
searcher.search(result,cmd);
rb.setResult( result );

ResultContext ctx = new ResultContext();
ctx.docs = rb.getResults().docList;
ctx.query = rb.getQuery();
rsp.add("response", ctx);
rsp.getToLog().add("hits", rb.getResults().docList.matches());

if ( ! rb.req.getParams().getBool(ShardParams.IS_SHARD,false) ) {
  if (null != rb.getNextCursorMark()) {
    rb.rsp.add(CursorMarkParams.CURSOR_MARK_NEXT,
               rb.getNextCursorMark().getSerializedTotem());
  }
}
doFieldSortValues(rb, searcher);
doPrefetch(rb);
```
SolrIndexSearcher.search() itself is simple: it just calls SolrIndexSearcher.getDocListC(), which, as the name suggests, returns the list of matching doc ids. This is where the real query begins. Before searching, Solr first consults the queryResultCache, which stores key-value pairs mapping query conditions to result lists. If the cache already holds an entry for this query, Solr returns it directly; otherwise it runs a normal query and writes the query/result pair into the cache. The capacity of queryResultCache can be configured in the cache section of solrconfig.xml.
```java
// we can try and look up the complete query in the cache.
// we can't do that if filter!=null though (we don't want to
// do hashCode() and equals() for a big DocSet).
if (queryResultCache != null && cmd.getFilter()==null
    && (flags & (NO_CHECK_QCACHE|NO_SET_QCACHE)) != ((NO_CHECK_QCACHE|NO_SET_QCACHE)))
{
  // all of the current flags can be reused during warming,
  // so set all of them on the cache key.
  key = new QueryResultKey(q, cmd.getFilterList(), cmd.getSort(), flags);
  if ((flags & NO_CHECK_QCACHE)==0) {
    superset = queryResultCache.get(key);

    if (superset != null) {
      // check that the cache entry has scores recorded if we need them
      if ((flags & GET_SCORES)==0 || superset.hasScores()) {
        // NOTE: subset() returns null if the DocList has fewer docs than
        // requested
        out.docList = superset.subset(cmd.getOffset(),cmd.getLen());
      }
    }
    if (out.docList != null) {
      // found the docList in the cache... now check if we need the docset too.
      // OPT: possible future optimization - if the doclist contains all the matches,
      // use it to make the docset instead of rerunning the query.
      if (out.docSet==null && ((flags & GET_DOCSET)!=0) ) {
        if (cmd.getFilterList()==null) {
          out.docSet = getDocSet(cmd.getQuery());
        } else {
          List<Query> newList = new ArrayList<>(cmd.getFilterList().size()+1);
          newList.add(cmd.getQuery());
          newList.addAll(cmd.getFilterList());
          out.docSet = getDocSet(newList);
        }
      }
      return;
    }
  }

  // If we are going to generate the result, bump up to the
  // next resultWindowSize for better caching.

  if ((flags & NO_SET_QCACHE) == 0) {
    // handle 0 special case as well as avoid idiv in the common case.
    if (maxDocRequested < queryResultWindowSize) {
      supersetMaxDoc=queryResultWindowSize;
    } else {
      supersetMaxDoc = ((maxDocRequested -1)/queryResultWindowSize + 1)*queryResultWindowSize;
      if (supersetMaxDoc < 0) supersetMaxDoc=maxDocRequested;
    }
  } else {
    key = null; // we won't be caching the result
  }
}
```
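The window-rounding at the end of that block is easy to isolate. A small sketch of the arithmetic (20 is assumed here as queryResultWindowSize, a common solrconfig.xml default):

```java
public class WindowRounding {
    // Mirrors SolrIndexSearcher's rounding: bump the fetch size up to the
    // next multiple of queryResultWindowSize so the cached superset can
    // serve later pages of the same query.
    static int supersetMaxDoc(int maxDocRequested, int windowSize) {
        if (maxDocRequested < windowSize) return windowSize;
        int rounded = ((maxDocRequested - 1) / windowSize + 1) * windowSize;
        return rounded < 0 ? maxDocRequested : rounded; // guard against int overflow
    }

    public static void main(String[] args) {
        // start=20, rows=40: 60 docs requested, already a multiple of 20
        System.out.println(supersetMaxDoc(60, 20)); // 60
        // start=20, rows=45: 65 docs requested, rounded up to 80
        System.out.println(supersetMaxDoc(65, 20)); // 80
    }
}
```

The cached entry therefore always covers whole pages, which is why a follow-up request for the next page can often be answered from queryResultCache without touching the index.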
If there is no matching cache entry, a normal query is run. Here the query takes either a sorted or an unsorted branch (the difference between the two is described in a later article), and finally reaches getDocListNC(qr,cmd). superset.subset() truncates the result: for example, with start=20 and rows=40 Solr actually queries with start=0 and rows=60, i.e. it fetches at least (start + rows) results and then slices out results 20 through 60.
```java
if (useFilterCache) {
  // now actually use the filter cache.
  // for large filters that match few documents, this may be
  // slower than simply re-executing the query.
  if (out.docSet == null) {
    out.docSet = getDocSet(cmd.getQuery(),cmd.getFilter());
    DocSet bigFilt = getDocSet(cmd.getFilterList());
    if (bigFilt != null) out.docSet = out.docSet.intersection(bigFilt);
  }
  // todo: there could be a sortDocSet that could take a list of
  // the filters instead of anding them first...
  // perhaps there should be a multi-docset-iterator
  sortDocSet(qr, cmd);
} else {
  // do it the normal way...
  if ((flags & GET_DOCSET)!=0) {
    // this currently conflates returning the docset for the base query vs
    // the base query and all filters.
    DocSet qDocSet = getDocListAndSetNC(qr,cmd);
    // cache the docSet matching the query w/o filtering
    if (qDocSet!=null && filterCache!=null && !qr.isPartialResults()) filterCache.put(cmd.getQuery(),qDocSet);
  } else {
    getDocListNC(qr,cmd);
  }
  assert null != out.docList : "docList is null";
}

if (null == cmd.getCursorMark()) {
  // Kludge...
  // we can't use DocSlice.subset, even though it should be an identity op
  // because it gets confused by situations where there are lots of matches, but
  // less docs in the slice then were requested, (due to the cursor)
  // so we have to short circuit the call.
  // None of which is really a problem since we can't use caching with
  // cursors anyway, but it still looks weird to have to special case this
  // behavior based on this condition - hence the long explanation.
  superset = out.docList;
  out.docList = superset.subset(cmd.getOffset(),cmd.getLen());
} else {
  // sanity check our cursor assumptions
  assert null == superset : "cursor: superset isn't null";
  assert 0 == cmd.getOffset() : "cursor: command offset mismatch";
  assert 0 == out.docList.offset() : "cursor: docList offset mismatch";
  assert cmd.getLen() >= supersetMaxDoc : "cursor: superset len mismatch: " + cmd.getLen() + " vs " + supersetMaxDoc;
}
```
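The queryResultCache discussed above is essentially a bounded map from (query, filters, sort, flags) to a cached result window. A minimal LRU sketch using LinkedHashMap's access-order mode (Solr's real implementation is an LRUCache or FastLRUCache configured in solrconfig.xml; the Key record and int[] value here are simplified stand-ins for QueryResultKey and DocList):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryCacheSketch {
    // Simplified stand-in for QueryResultKey: query string + filters + sort.
    // Records give us the equals()/hashCode() a cache key needs.
    record Key(String q, List<String> filters, String sort) {}

    static <K, V> Map<K, V> lruCache(int capacity) {
        return new LinkedHashMap<>(16, 0.75f, true) { // true = access order
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict least-recently-used entry
            }
        };
    }

    public static void main(String[] args) {
        Map<Key, int[]> cache = lruCache(2);
        Key k1 = new Key("id:cheng", List.of(), "score desc");
        cache.put(k1, new int[]{3, 7, 42}); // doc ids for this query
        // A repeat of the same query hits the cache instead of the index:
        System.out.println(cache.containsKey(new Key("id:cheng", List.of(), "score desc"))); // true
    }
}
```

Note that the sort is part of the key: the same q with a different sort order produces a different doc ordering and must be cached separately, which is exactly why QueryResultKey includes cmd.getSort().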
SolrIndexSearcher.getDocListNC(qr,cmd) defines a number of Collector inner classes, but they are not relevant to this chapter, so we go straight to the code below. Solr first builds a TopDocsCollector, which holds the full set of results matching the query. If timeAllowed is set on the request, the query goes through the TimeLimitingCollector branch. TimeLimitingCollector is a Collector subclass: with timeAllowed set to, say, 200 ms, the query returns within 200 ms of starting to collect results, whether or not the result set is complete. Notice that the search ultimately calls Lucene's IndexSearcher.search(); from this layer on we are inside Lucene. Finally, Solr reads the total hit count and the priority queue held by the TopDocsCollector.
```java
final TopDocsCollector topCollector = buildTopDocsCollector(len, cmd);
Collector collector = topCollector;
if (terminateEarly) {
  collector = new EarlyTerminatingCollector(collector, cmd.len);
}
if( timeAllowed > 0 ) {
  collector = new TimeLimitingCollector(collector, TimeLimitingCollector.getGlobalCounter(), timeAllowed);
}
if (pf.postFilter != null) {
  pf.postFilter.setLastDelegate(collector);
  collector = pf.postFilter;
}

try {
  super.search(query, luceneFilter, collector);
  if(collector instanceof DelegatingCollector) {
    ((DelegatingCollector)collector).finish();
  }
}
catch( TimeLimitingCollector.TimeExceededException x ) {
  log.warn( "Query: " + query + "; " + x.getMessage() );
  qr.setPartialResults(true);
}

totalHits = topCollector.getTotalHits();
TopDocs topDocs = topCollector.topDocs(0, len);
populateNextCursorMarkFromTopDocs(qr, cmd, topDocs);

maxScore = totalHits>0 ? topDocs.getMaxScore() : 0.0f;
nDocsReturned = topDocs.scoreDocs.length;
ids = new int[nDocsReturned];
scores = (cmd.getFlags()&GET_SCORES)!=0 ? new float[nDocsReturned] : null;
for (int i=0; i<nDocsReturned; i++) {
  ScoreDoc scoreDoc = topDocs.scoreDocs[i];
  ids[i] = scoreDoc.doc;
  if (scores != null) scores[i] = scoreDoc.score;
}
```
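The timeAllowed behaviour above can be sketched without Lucene: wrap a delegate collector and abort collection with an exception once a deadline passes, so the caller keeps whatever was collected as partial results. The class names below mimic Lucene's but are illustrative stand-ins, not the real API:

```java
import java.util.ArrayList;
import java.util.List;

public class TimeLimitSketch {
    static class TimeExceededException extends RuntimeException {}

    interface Collector { void collect(int docId); }

    // Wraps a delegate and aborts once the deadline passes, mirroring
    // TimeLimitingCollector's contract: a timeout yields partial results,
    // not an error response.
    static class TimeLimitingCollector implements Collector {
        private final Collector delegate;
        private final long deadlineNanos;
        TimeLimitingCollector(Collector delegate, long timeAllowedMillis) {
            this.delegate = delegate;
            this.deadlineNanos = System.nanoTime() + timeAllowedMillis * 1_000_000L;
        }
        @Override public void collect(int docId) {
            if (System.nanoTime() > deadlineNanos) throw new TimeExceededException();
            delegate.collect(docId);
        }
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        Collector collector = new TimeLimitingCollector(hits::add, 200); // timeAllowed=200ms
        boolean partialResults = false;
        try {
            for (int doc = 0; doc < 1000; doc++) collector.collect(doc);
        } catch (TimeExceededException x) {
            partialResults = true; // Solr does qr.setPartialResults(true) here
        }
        System.out.println(hits.size() + " docs collected, partial=" + partialResults);
    }
}
```

This is why a timed-out Solr response still carries documents: the exception unwinds the scoring loop, but the TopDocsCollector underneath keeps everything gathered before the deadline.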
Inside Lucene's IndexSearcher.search(), the search iterates over all segments; each AtomicReaderContext carries a segment's information, including its docBase and document count.
For each segment, Weight.bulkScorer() recombines the query clauses: multiple OR conditions are grouped into one clause, and multiple AND conditions are assembled into a list, which is ordered by the term frequency of each clause. Once the clauses are prepared, all doc ids matching the query are read from the segment (how exactly is covered in a later article); that is the job of scorer.score(collector).
```java
/**
 * Lower-level search API.
 *
 * <p>
 * {@link Collector#collect(int)} is called for every document. <br>
 *
 * <p>
 * NOTE: this method executes the searches on all given leaves exclusively.
 * To search across all the searchers leaves use {@link #leafContexts}.
 *
 * @param leaves
 *          the searchers leaves to execute the searches on
 * @param weight
 *          to match documents
 * @param collector
 *          to receive hits
 * @throws BooleanQuery.TooManyClauses If a query would exceed
 *         {@link BooleanQuery#getMaxClauseCount()} clauses.
 */
protected void search(List<AtomicReaderContext> leaves, Weight weight, Collector collector)
    throws IOException {

  // TODO: should we make this
  // threaded...?  the Collector could be sync'd?
  // always use single thread:
  for (AtomicReaderContext ctx : leaves) { // search each subreader
    try {
      collector.setNextReader(ctx);
    } catch (CollectionTerminatedException e) {
      // there is no doc of interest in this reader context
      // continue with the following leaf
      continue;
    }
    BulkScorer scorer = weight.bulkScorer(ctx, !collector.acceptsDocsOutOfOrder(), ctx.reader().getLiveDocs());
    if (scorer != null) {
      try {
        scorer.score(collector);
      } catch (CollectionTerminatedException e) {
        // collection was terminated prematurely
        // continue with the following leaf
      }
    }
  }
}
```
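The per-segment docBase mentioned above maps a segment-local doc id to an index-wide one: each scorer emits local ids, and the collector adds the segment's docBase to turn them into global ids. A tiny sketch of that arithmetic (the segment sizes are made up for illustration):

```java
public class DocBaseSketch {
    public static void main(String[] args) {
        // Three segments with maxDoc 100, 50 and 80: each segment's docBase
        // is the running total of the documents in the segments before it.
        int[] segmentMaxDocs = {100, 50, 80};
        int docBase = 0;
        for (int seg = 0; seg < segmentMaxDocs.length; seg++) {
            int localDocId = 5; // a hit inside this segment
            int globalDocId = docBase + localDocId; // id the collector records
            System.out.println("segment " + seg + ": local 5 -> global " + globalDocId);
            docBase += segmentMaxDocs[seg];
        }
    }
}
```

This prints globals 5, 105 and 155: the same local hit lands at a different global id depending on how many documents precede its segment, which is exactly the information AtomicReaderContext carries into setNextReader().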
At this point all matching doc ids have been collected, but the query result needs to display all the fields, so Solr must later go back to the segments with these doc ids and fetch every stored field. Where exactly that happens is described in detail in a later article.
Summary: Solr's query path is fairly convoluted, with plenty of room for optimization. This article only outlined the overall flow; the details of each step in the query process will be elaborated in subsequent articles.