Suppose you page deep into a result set, say from=1000000 with size=100. Elasticsearch must fetch from+size = 1,000,100 documents from every shard, merge them on the coordinating node, and only then return the final 100. With 5 shards, that is roughly 5,000,500 documents for a single request. Now imagine 100 such requests arriving at once: about 500 million documents in flight, and at roughly 2 KB per document that is close to 1 TB of memory. No reasonable machine can absorb that, which is why ES pagination starts out fast, slows to 4-5 s by around page 100, and after page 1000 simply throws an error.
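The arithmetic above can be sketched as a quick back-of-envelope calculation. The shard count, concurrent request count, and 2 KB document size are the illustrative numbers from the text, not measurements:

```java
// Back-of-envelope estimate of the documents touched by deep from/size paging.
public class DeepPagingCost {
    public static void main(String[] args) {
        long from = 1_000_000;
        long size = 100;
        long shards = 5;
        long concurrentQueries = 100;
        long docSizeBytes = 2 * 1024; // assume ~2 KB per document

        long perShard = from + size;                 // each shard materializes from+size docs
        long perQuery = perShard * shards;           // the coordinator collects all of them
        long allQueries = perQuery * concurrentQueries;
        long memoryGb = allQueries * docSizeBytes / (1024L * 1024 * 1024);

        System.out.println("docs per shard:      " + perShard);   // 1000100
        System.out.println("docs per query:      " + perQuery);   // 5000500
        System.out.println("docs, 100 queries:   " + allQueries); // 500050000
        System.out.println("approx. memory (GB): " + memoryGb);   // 953
    }
}
```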
NativeSearchQueryBuilder query = new NativeSearchQueryBuilder();
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
if (!StringUtils.isEmpty(ulqBean.getStartTime()) && !StringUtils.isEmpty(ulqBean.getEndTime())) {
    boolQuery.must(QueryBuilders.rangeQuery("logTime")
            .from(ulqBean.getStartTime())
            .to(ulqBean.getEndTime()));
}
if (!StringUtils.isEmpty(ulqBean.getSearch())) {
    boolQuery.must(QueryBuilders.boolQuery()
            .should(QueryBuilders.wildcardQuery("content", "*" + ulqBean.getSearch() + "*"))
            .should(QueryBuilders.wildcardQuery("code", "*" + ulqBean.getSearch() + "*"))
            .should(QueryBuilders.wildcardQuery("name", "*" + ulqBean.getSearch() + "*")));
}
// combine both conditions in one bool query; calling withQuery twice would overwrite the first
query.withQuery(boolQuery);
query.withSort(new FieldSortBuilder("logTime").order(SortOrder.DESC));
if (ulqBean.getPageNo() != null && ulqBean.getPageSize() != null) {
    // ES pages are 0-based
    query.withPageable(new PageRequest(ulqBean.getPageNo() - 1, ulqBean.getPageSize()));
}
NativeSearchQuery build = query.build();
org.springframework.data.domain.Page<ConductAudits> conductAuditsPage =
        template.queryForPage(build, ConductAudits.class);
ulqBean.getPagination().setTotal((int) conductAuditsPage.getTotalElements());
ulqBean.getPagination().setList(conductAuditsPage.getContent());
[root@localhost elasticsearch-2.4.6]# curl -XGET 'http://11.12.84.126:9200/_audit_0102/_log_0102/_search?size=2&from=10000&pretty=true'
{
  "error" : {
    "root_cause" : [ {
      "type" : "query_phase_execution_exception",
      "reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [10002]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."
    } ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [ {
      "shard" : 0,
      "index" : "_audit_0102",
      "node" : "f_CQitYESZedx8ZbyZ6bHA",
      "reason" : {
        "type" : "query_phase_execution_exception",
        "reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [10002]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."
      }
    } ]
  },
  "status" : 500
}
If the size of your data set is under control and you really do need deeper pages, you can enlarge the result window with the following command:
curl -XPUT "http://11.12.84.126:9200/_audit_0102/_settings" -d '{
  "index": {
    "max_result_window": 100000
  }
}'
But this only pushes the limit further out; it does not solve deep paging itself. Resource usage still climbs steeply as the page number grows, and an OOM becomes easy to hit.
So if your product manager asks you to paginate the conventional way at that depth, you can tell them plainly: the system does not support paging that deep, and the deeper you go, the worse the performance gets.
That said, deep-paging scenarios do exist in the real world. Sometimes you can persuade the product manager that hardly anyone pages back through long-ago history; other times a single day may produce millions of records. Analyze each scenario on its own terms.
The scroll query takes a one-time snapshot on the first search, then uses the id returned by each request to fetch the next batch, much like a cursor in a relational database: every subsequent "slide" continues from the cursor id produced by the previous call. This performs far better than the paging described above, typically at the millisecond level.

Note: scroll does not support jumping to an arbitrary page.

Use case: queries without strict real-time requirements, such as endless-scroll feeds like Weibo or Toutiao.

The concrete Java implementation:
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
QueryBuilder builder = QueryBuilders.queryStringQuery("123456").field("code");
boolQueryBuilder.must(QueryBuilders.termQuery("logType", "10"))
        .must(builder);
// first query: opens the scroll context
SearchResponse response1 = client.prepareSearch("_audit_0221").setTypes("_log_0221")
        .setQuery(boolQueryBuilder)
        .setSearchType(SearchType.DEFAULT)
        .setSize(10).setScroll(TimeValue.timeValueMinutes(5))
        .addSort("logTime", SortOrder.DESC)
        .execute().actionGet();
for (SearchHit searchHit : response1.getHits().hits()) {
    // biz handle...
}
while (response1.getHits().hits().length > 0) {
    for (SearchHit searchHit : response1.getHits().hits()) {
        System.out.println(searchHit.getSource().toString());
    }
    response1 = client.prepareSearchScroll(response1.getScrollId())
            .setScroll(TimeValue.timeValueMinutes(5))
            .execute().actionGet();
}
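The cursor idea behind scroll can be illustrated with a toy, Elasticsearch-free sketch: offset paging (from/size) has to skip everything before the requested page on every call, while a cursor simply resumes where the previous batch ended. The class and method names here are invented purely for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration (not Elasticsearch code) of why a cursor beats from/size:
// offset paging re-reads everything before the page; a cursor resumes where it left off.
public class CursorVsOffset {
    static List<Integer> data = new ArrayList<>();
    static { for (int i = 0; i < 1000; i++) data.add(i); }

    // from/size style: must skip `from` items on every call
    static List<Integer> offsetPage(int from, int size) {
        return data.subList(from, Math.min(from + size, data.size()));
    }

    // cursor style: remembers its position, each call only reads the next batch
    static int cursor = 0;
    static List<Integer> scrollPage(int size) {
        List<Integer> page = data.subList(cursor, Math.min(cursor + size, data.size()));
        cursor += page.size();
        return page;
    }

    public static void main(String[] args) {
        System.out.println(offsetPage(990, 10)); // deep page: had to skip 990 items to get here
        System.out.println(scrollPage(10));      // first batch: 0 through 9
        System.out.println(scrollPage(10));      // next batch: 10 through 19, resumed from the cursor
    }
}
```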
If the search is a one-off, clear the scroll context afterwards to release the memory it holds:
ClearScrollRequest request = new ClearScrollRequest();
request.addScrollId(scrollId);
client.clearScroll(request);
Use case: I have 5 million users and need to iterate over all of them to push data, with no ordering requirement. This is where scroll-scan fits.
Concrete usage:
SearchResponse response = client.prepareSearch("_audit_0221").setTypes("_log_0221")
        .setQuery(boolQueryBuilder)
        .setSearchType(SearchType.SCAN)
        .setSize(5).setScroll(TimeValue.timeValueMinutes(5))
        .addSort("logTime", SortOrder.DESC)
        .execute().actionGet();
SearchResponse response1 = client.prepareSearchScroll(scrollId)
        .setScroll(TimeValue.timeValueMinutes(5))
        .execute().actionGet();
while (response1.getHits().hits().length > 0) {
    for (SearchHit searchHit : response1.getHits().hits()) {
        System.out.println(searchHit.getSource().toString());
    }
    response1 = client.prepareSearchScroll(response1.getScrollId())
            .setScroll(TimeValue.timeValueMinutes(5))
            .execute().actionGet();
}
QueryBuilder builder = QueryBuilders.boolQuery()
        .filter(QueryBuilders.termQuery("code", "123456"));
SearchQuery searchQuery = new NativeSearchQueryBuilder().withIndices("_audit_0221")
        .withTypes("_log_0221").withQuery(builder)
        .withPageable(new PageRequest(0, 2)).build();
String scrollId = template.scan(searchQuery, 100000, false);
while (true) {
    Page<ConductAudits> scroll = template.scroll(scrollId, 1000, ConductAudits.class);
    if (scroll.getContent().size() == 0) {
        break;
    }
    for (ConductAudits conductAudits : scroll.getContent()) {
        System.out.println(JSON.toJSONString(conductAudits));
    }
}
PS: details differ slightly between Elasticsearch versions, but the principle is the same. Most of the code in this article is based on ES 2.4.6.