GET _search
{ "took": 1, "timed_out": false, "_shards": { "total": 16, "successful": 16, "failed": 0 }, "hits": { "total": 19, "max_score": 1, "hits": [ { "_index": ".kibana", "_type": "config", "_id": "5.2.0", "_score": 1, "_source": { "buildNum": 14695 } }, { "_index": "test_index", "_type": "test_type", "_id": "AWypxxLYFCl_S-ox4wvd", "_score": 1, "_source": { "test_content": "my test" } }, { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": 1, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_doc", "_id": "10", "_score": 1, "_source": { "test_field": "test10 routing _id" } }, { "_index": "test_index", "_type": "test_doc", "_id": "11", "_score": 1, "_routing": "12", "_source": { "test_field": "test routing not _id" } }, { "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 1, "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 1, "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] } }, { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": 1, "_source": { "test_field": "test test" } }, { "_index": "test_index", "_type": "test_type", "_id": "4", "_score": 1, "_source": { "test_field": "test4" } }, { "_index": "test_index", "_type": "test_type", "_id": "2", "_score": 1, "_source": { "test_field": "replaces test2" } } ] } }
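By default, Elasticsearch applies no timeout to a search, so consider the scenario first. Suppose we have a search application that is sensitive to response time, such as an e-commerce site. You cannot make users wait ten minutes for results; they would have left long before, without buying anything.
We therefore need a timeout mechanism: each shard gets only the timeout period to search, and whatever it has found within that window (possibly everything) is returned to the client straight away, instead of waiting until every matching document has been found.
This ensures that a search request completes within the timeout the user specifies, which is what time-sensitive search applications need.
Note: by default Elasticsearch has no search timeout. If your search is very slow and each shard needs several minutes to find all the matching data, the search request will also wait several minutes before it returns.
The diagram below gives a simple picture of the timeout mechanism.
Syntax: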
GET _search?timeout=10ms
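To make the behaviour described above concrete, here is a toy Python model of a per-shard time budget. It is only a sketch, not how Elasticsearch is implemented: the function names `search_shard` and `search`, the fixed per-document cost, and the sample data are all assumptions made for illustration. Each simulated shard stops scanning once its budget is used up and returns whatever it has matched so far, and the response reports `timed_out` as true when any shard was cut short.

```python
def search_shard(docs, term, cost_ms_per_doc, timeout_ms=None):
    """Scan one simulated shard; stop early when the time budget is spent."""
    hits, elapsed = [], 0
    for doc in docs:
        # Hypothetical cost model: every document costs the same fixed time.
        if timeout_ms is not None and elapsed + cost_ms_per_doc > timeout_ms:
            return hits, True  # budget exhausted: return partial hits
        elapsed += cost_ms_per_doc
        if term in doc:
            hits.append(doc)
    return hits, False  # the whole shard was scanned


def search(shards, term, cost_ms_per_doc=1, timeout_ms=None):
    """Give every shard the same budget and merge whatever each one found."""
    all_hits, timed_out = [], False
    for docs in shards:
        hits, cut_short = search_shard(docs, term, cost_ms_per_doc, timeout_ms)
        all_hits.extend(hits)
        timed_out = timed_out or cut_short
    return {"timed_out": timed_out, "hits": all_hits}


shards = [
    ["test 1", "other", "test 2", "test 3"],
    ["test 4", "test 5"],
]
full = search(shards, "test")                 # no timeout: wait for everything
partial = search(shards, "test", timeout_ms=2)  # each shard may scan 2 documents
```

In this model `partial` contains only the hits each shard reached within its budget, which mirrors the idea that a timed-out search returns partial results rather than failing.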
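A note first: older versions of Elasticsearch allow several types in one index, so there is also a multi-type search mode. It is not covered in detail here, because it works essentially the same way as multi-index search, and newer versions of Elasticsearch deprecate types.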
GET /_search
GET /test/_search
GET /test_index,test/_search
GET /test*/_search
GET /_all/_search
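The request forms above (no index, a single index, a comma-separated list, a wildcard, and `_all`) can be modelled with a small Python sketch. This is a toy resolver, not Elasticsearch's own logic: `resolve_indices`, `search_path`, and the list of existing index names are hypothetical, and `fnmatchcase` understands a few pattern characters beyond `*` that are not part of this example.

```python
from fnmatch import fnmatchcase


def resolve_indices(expression, existing):
    """Expand a comma-separated index expression against known index names."""
    if expression in ("", "_all"):
        return sorted(existing)  # no index given, or _all: search everything
    matched = set()
    for pattern in expression.split(","):
        matched.update(name for name in existing if fnmatchcase(name, pattern))
    return sorted(matched)


def search_path(expression=""):
    """Build the request path for a search over the given index expression."""
    return f"/{expression}/_search" if expression else "/_search"


# Assumed set of indices, chosen only for this illustration.
existing = ["ecommerce", "test", "test_index"]

print(search_path("test*"), resolve_indices("test*", existing))
print(search_path("test_index,test"), resolve_indices("test_index,test", existing))
print(search_path("_all"), resolve_indices("_all", existing))
```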
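When a client sends a search request to Elasticsearch, the request is dispatched to all of the primary shards, because each shard holds only part of the data, so any shard may contain documents that match. If a primary shard has replica shards, the request can be served by a replica shard instead.
As shown in the figure below:
In real applications pagination is indispensable; for example, a front-end page almost always shows data to the user one page at a time.
Elasticsearch paginates search results with from + size. from is the zero-based offset of the first result to return, and size is the number of documents to return starting from that offset.
Example: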
GET test_index/test_type/_search?from=0&size=3
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 9, "max_score": 1, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "AWypxxLYFCl_S-ox4wvd", "_score": 1, "_source": { "test_content": "my test" } }, { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": 1, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": 1, "_source": { "test_field": "test test" } } ] } }
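Front-end code usually thinks in page numbers rather than offsets, so a small conversion helper is handy. The sketch below assumes 1-based page numbers; `page_params` and `paginate` are hypothetical helper names, and `paginate` slices a plain Python list only to show which positions a given from/size pair selects.

```python
def page_params(page, size=10):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


def paginate(hits, page, size=10):
    """Apply from/size to an in-memory list, the way a result window is cut."""
    params = page_params(page, size)
    start = params["from"]
    return hits[start:start + params["size"]]


first_page = page_params(1, 3)               # matches from=0&size=3 above
second_page_items = paginate(list(range(9)), 2, 3)
```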
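What is deep paging? Simply put, it means paging very far into the results. Suppose there are 60000 documents in total spread over three primary shards, 20000 per shard, and each page holds 10 documents. If you skip the first 1000 pages (from=10000, size=10), the documents you actually need are hits 10001 to 10010 of the overall ranking.
Do not assume that each shard then returns just 10 documents. That understanding is wrong!
Here is a detailed analysis:
The request may first arrive at a node that holds no shard of this index. That node acts as the coordinating node and forwards the search request to the nodes that hold the index's three shards. In the scenario above, a shard cannot simply hand back its own 10001st to 10010th documents out of its 20000, because it does not know where its documents fall in the global ranking. Each shard therefore returns not 10 documents but its top 10010. With all three shards doing so, the coordinating node receives 30030 documents in total, sorts them by _score (the relevance score), and takes positions 10001 to 10010. Those 10 documents are the page we asked for.
As shown in the figure below:
The deep paging problem, then, is that when from + size reaches too deep, every shard has to return a large amount of data to the coordinating node, which consumes a lot of bandwidth, memory, and CPU.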
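The arithmetic above can be checked with a short simulation. This is a sketch under stated assumptions, not Elasticsearch code: scores are random numbers, documents are dealt round-robin to three shards, and `shard_top` and `deep_page` are hypothetical names for the per-shard step and the merge on the coordinating node.

```python
import heapq
import random


def shard_top(scores, k):
    """One shard's contribution: its k best (score, doc_id) pairs, best first."""
    return heapq.nlargest(k, scores)


def deep_page(shards, from_, size):
    """Coordinating node: gather from+size candidates per shard, then slice."""
    k = from_ + size
    candidates = []
    for scores in shards:
        candidates.extend(shard_top(scores, k))
    candidates.sort(reverse=True)  # global ranking by score, highest first
    return candidates[from_:from_ + size], len(candidates)


random.seed(0)
docs = [(random.random(), doc_id) for doc_id in range(60000)]
shards = [docs[i::3] for i in range(3)]  # 3 shards with 20000 documents each

page, transferred = deep_page(shards, from_=10000, size=10)
print(len(page), transferred)
```

The page holds only 10 documents, yet `transferred` counts 30030 candidates that had to reach the coordinating node. Asking each shard for only 10 documents would not work, because the global hits 10001 to 10010 can be distributed across the shards in any way.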
GET /test_index/test_type/_search?q=test_field:test
{ "took": 7, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 3, "max_score": 0.843298, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": 0.843298, "_source": { "test_field": "test test" } }, { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": 0.43445712, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_type", "_id": "7", "_score": 0.25316024, "_source": { "test_field": "test client 1" } } ] } }
GET /test_index/test_type/_search?q=+test_field:test
{ "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 3, "max_score": 0.843298, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": 0.843298, "_source": { "test_field": "test test" } }, { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": 0.43445712, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_type", "_id": "7", "_score": 0.25316024, "_source": { "test_field": "test client 1" } } ] } }
GET /test_index/test_type/_search?q=-test_field:test
{ "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 6, "max_score": 1, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "AWypxxLYFCl_S-ox4wvd", "_score": 1, "_source": { "test_content": "my test" } }, { "_index": "test_index", "_type": "test_type", "_id": "4", "_score": 1, "_source": { "test_field": "test4" } }, { "_index": "test_index", "_type": "test_type", "_id": "2", "_score": 1, "_source": { "test_field": "replaces test2" } }, { "_index": "test_index", "_type": "test_type", "_id": "1", "_score": 1, "_source": { "test_field1": "test field1", "test_field2": "partial updated test1" } }, { "_index": "test_index", "_type": "test_type", "_id": "11", "_score": 1, "_source": { "num": 0, "tags": [] } }, { "_index": "test_index", "_type": "test_type", "_id": "3", "_score": 1, "_source": { "test_field": "test3" } } ] } }
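For query string search, it is enough to know the q=field:search content syntax and the meaning of + and -: + means the field must contain the term, and - means it must not.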
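The effect of the two prefixes can be imitated with a toy matcher in Python. It is only an illustration of the semantics seen in the responses above: `matches` and `search` are hypothetical helpers, a clause is limited to the form [+|-]field:term, tokenization is a plain whitespace split rather than a real analyzer, and no relevance score is computed. The sample documents are a subset of those in test_index.

```python
def matches(doc, clause):
    """Evaluate one clause of the form [+|-]field:term against a document."""
    required = not clause.startswith("-")  # a leading "-" negates the clause
    field, term = clause.lstrip("+-").split(":", 1)
    tokens = str(doc.get(field, "")).lower().split()
    found = term.lower() in tokens
    return found if required else not found


def search(docs, clause):
    """Return the ids of all documents that satisfy the clause."""
    return [doc_id for doc_id, doc in docs.items() if matches(doc, clause)]


docs = {
    "6": {"test_field": "test test"},
    "8": {"test_field": "test client 2"},
    "7": {"test_field": "test client 1"},
    "4": {"test_field": "test4"},
    "2": {"test_field": "replaces test2"},
    "AWypxxLYFCl_S-ox4wvd": {"test_content": "my test"},
}

must = search(docs, "+test_field:test")
must_not = search(docs, "-test_field:test")
```

Documents such as "test4" land in the must_not group because the whole token test4 is not the term test, which agrees with the -test_field:test response shown earlier.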
GET /test_index/test_type/_search?q=test
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 5, "max_score": 0.843298, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": 0.843298, "_source": { "test_field": "test test" } }, { "_index": "test_index", "_type": "test_type", "_id": "AWypxxLYFCl_S-ox4wvd", "_score": 0.3794414, "_source": { "test_content": "my test" } }, { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": 0.31387395, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_type", "_id": "7", "_score": 0.18232156, "_source": { "test_field": "test client 1" } }, { "_index": "test_index", "_type": "test_type", "_id": "1", "_score": 0.16203022, "_source": { "test_field1": "test field1", "test_field2": "partial updated test1" } } ] } }
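In other words, when a query string does not specify a field, the search runs against _all by default. The _all metadata field is generated at index time: when we insert a document that contains several fields, Elasticsearch automatically concatenates the values of all those fields into one long string. That string is the value of the _all field, and it is indexed as well.
For example:
Given a document:
{ "name": "jack", "age": 26, "email": "jack@sina.com", "address": "guangzhou" }
the string "jack 26 jack@sina.com guangzhou" becomes the value of this document's _all field, and it is tokenized and added to the corresponding inverted index.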
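A few lines of Python can mimic the construction just described. This is a sketch of the idea only: `build_all_field` and `index_all` are hypothetical names, the tokenizer is a whitespace split (a real analyzer would break up values such as the e-mail address differently), and the document id "1" is made up.

```python
from collections import defaultdict


def build_all_field(doc):
    """Join every field value into one string, as described for _all."""
    return " ".join(str(value) for value in doc.values())


def index_all(docs):
    """Build a toy inverted index (token -> document ids) over the _all string."""
    inverted = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in build_all_field(doc).lower().split():
            inverted[token].add(doc_id)
    return inverted


doc = {"name": "jack", "age": 26, "email": "jack@sina.com", "address": "guangzhou"}
all_value = build_all_field(doc)
inverted = index_all({"1": doc})  # "1" is an assumed document id
```

Because every value ends up in the same string, a search for 26 or guangzhou without a field name finds the document even though those values live in different fields.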
Note that query string search is generally not used in production environments.