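Elasticsearch from Basics to Depth (7) Search Engine: the meaning of _search, multi-index search mode, paginated search and the deep paging performance problem, query string search syntax, and _all metadata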

The meaning of _search

What the data returned by a _search query means

GET _search
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 16,
    "successful": 16,
    "failed": 0
  },
  "hits": {
    "total": 19,
    "max_score": 1,
    "hits": [
      {
        "_index": ".kibana",
        "_type": "config",
        "_id": "5.2.0",
        "_score": 1,
        "_source": {
          "buildNum": 14695
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 1,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_doc",
        "_id": "10",
        "_score": 1,
        "_source": {
          "test_field": "test10 routing _id"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_doc",
        "_id": "11",
        "_score": 1,
        "_routing": "12",
        "_source": {
          "test_field": "test routing not _id"
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "jiajieshi yagao",
          "desc": "youxiao fangzhu",
          "price": 25,
          "producer": "jiajieshi producer",
          "tags": [
            "fangzhu"
          ]
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "4",
        "_score": 1,
        "_source": {
          "name": "special yagao",
          "desc": "special meibai",
          "price": 50,
          "producer": "special yagao producer",
          "tags": [
            "meibai"
          ]
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field": "test4"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test_field": "replaces test2"
        }
      }
    ]
  }
}
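  • took: how many milliseconds the whole search request took
  • timed_out: whether the request timed out
  • hits.total: the total number of documents that matched the query. It is a plain number in the response above; from ES 7.0 on it is an object in which value is the count and relation is eq when the count is exact or gte when it is a lower bound
  • hits.max_score: the highest relevance score among all results of this search. The more relevant a document is to the search, the higher its _score and the higher it ranks
  • hits.hits: the documents returned by the query (only the top 10 by default)
  • _shards.total: the total number of shards the request was sent to
  • _shards.successful: the number of those shards on which the query succeeded
  • _shards.skipped: the number of those shards that were skipped (reported by newer ES versions; absent from the response above)
  • _shards.failed: the number of those shards on which the query failed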
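
As a rough illustration of how a client might read these fields, the Python sketch below summarizes a response that has already been parsed into a dict. The helper names (total_hits, summarize) are made up for this example, and the function accepts hits.total both as a plain number and as a value/relation object.

```python
from typing import Any, Dict, Tuple


def total_hits(response: Dict[str, Any]) -> Tuple[int, str]:
    """Return (count, relation) for either shape of hits.total."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # Object form: {"value": 19, "relation": "eq"}
        return total["value"], total["relation"]
    # Plain-number form, as in the response shown above
    return total, "eq"


def summarize(response: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the fields described in the list above into one flat dict."""
    count, relation = total_hits(response)
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "total": count,
        "relation": relation,
        "returned": len(response["hits"]["hits"]),
        "max_score": response["hits"]["max_score"],
        "failed_shards": response["_shards"]["failed"],
    }
```

The difference between total and returned is the reason pagination is needed: total counts every match, while returned counts only the documents in this page of hits.hits.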

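The search timeout mechanism

ES has no timeout by default, so consider the following scenario first. Suppose we have a search application that is very time-sensitive, such as an e-commerce site: you cannot make users wait 10 minutes, because by then they would have left long ago without buying anything.

So we need a timeout mechanism: each shard is only allowed to search within the timeout window and then returns whatever it has found so far (possibly everything) directly to the client, instead of waiting until all the data has been searched before returning.

This ensures that a search request can finish within the timeout the user specifies, which gives good support to time-sensitive search applications.

Note: by default ES has no such timeout. If your search is particularly slow, say every shard needs several minutes to find all the data, the search request will also wait several minutes before it returns.
[Figure: how the timeout mechanism works]

Syntax: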

GET _search?timeout=10ms
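
The idea of returning whatever has been collected once the time budget runs out can be sketched with a toy model in Python. This is not how ES implements timeouts internally: the per-document costs, the function names, and the sequential loop over shards are all assumptions made only to show the partial-result behavior and the timed_out flag.

```python
from typing import List, Optional, Tuple


def search_shard(doc_costs_ms: List[int],
                 timeout_ms: Optional[int] = None) -> Tuple[List[int], bool]:
    """Collect document positions until the shard's time budget is used up."""
    collected: List[int] = []
    elapsed = 0
    for doc_id, cost in enumerate(doc_costs_ms):
        if timeout_ms is not None and elapsed + cost > timeout_ms:
            return collected, True   # budget exhausted: return partial hits
        elapsed += cost
        collected.append(doc_id)
    return collected, False          # the shard finished in time


def search(shards: List[List[int]], timeout_ms: Optional[int] = None) -> dict:
    """Apply the same budget to every shard and merge the partial results."""
    hits = []
    timed_out = False
    for shard_id, costs in enumerate(shards):
        docs, shard_timed_out = search_shard(costs, timeout_ms)
        hits.extend((shard_id, doc) for doc in docs)
        timed_out = timed_out or shard_timed_out
    return {"timed_out": timed_out, "hits": hits}
```

With no timeout the model waits for every document; with a budget it returns fewer hits and sets timed_out to true, which is the trade-off described above.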

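multi-index & multi-type search modes

A note first: in older versions of ES one index could hold multiple types, which is why a multi-type search mode exists. It is not covered in detail here because it works essentially the same way as multi-index search, and newer versions of ES deprecate types.

multi-index search mode

  • /_search: search all data in all indices
    GET /_search
  • /{index}/_search: specify one index and search all data in it
    GET /test/_search
  • /index1,index2/_search: search the data in two indices at once
    GET /test_index,test/_search
  • /*1,*2/_search: match several indices with wildcards and search the data in all of them
    GET /test*/_search
  • /_all/_search: _all stands for every index
    GET /_all/_search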
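
To make the path patterns above concrete, here is a small Python sketch that builds the request path and approximates which indices an expression would select. It only mimics the behavior for illustration: search_path and resolve are hypothetical helpers, and fnmatchcase is a rough stand-in for ES's own wildcard handling (it also accepts ? and [] patterns, which this article does not cover).

```python
from fnmatch import fnmatchcase
from typing import List


def search_path(*indices: str) -> str:
    """Build the _search path for zero, one, or several index expressions."""
    if not indices:
        return "/_search"
    return "/" + ",".join(indices) + "/_search"


def resolve(expression: str, existing: List[str]) -> List[str]:
    """Approximate which existing indices a comma-separated expression selects."""
    matched: List[str] = []
    for pattern in expression.split(","):
        for name in existing:
            selects = pattern == "_all" or fnmatchcase(name, pattern)
            if selects and name not in matched:
                matched.append(name)
    return matched
```

For example, with indices test_index, test, and ecommerce, the expression test* selects the first two, while _all selects all three.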

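A brief look at how search works

When a client sends a query to ES, the request is sent to all primary shards for execution. Each shard holds only part of the data, so every shard may contain results for the search. If a primary shard has replica shards, however, the request can also be sent to a replica shard instead.
[Figure: how a search request is routed to shards]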
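
The routing rule described here (every shard is queried once, on either its primary or one of its replicas) can be sketched as a toy round-robin router in Python. The class name, the copy labels such as P0 and R0, and the round-robin policy itself are assumptions for illustration; the actual copy-selection strategy depends on the ES version.

```python
from itertools import count
from typing import Dict, List


class ShardRouter:
    """Pick one copy (primary or replica) of every shard for each request."""

    def __init__(self, copies_per_shard: Dict[int, List[str]]) -> None:
        # Example layout: {0: ["P0", "R0"], 1: ["P1", "R1"]}
        self.copies = copies_per_shard
        self._requests = count()

    def route(self) -> Dict[int, str]:
        """Return {shard_id: chosen copy}; each shard appears exactly once."""
        n = next(self._requests)
        return {
            shard: copies[n % len(copies)]
            for shard, copies in self.copies.items()
        }
```

Successive calls alternate between primaries and replicas, so replicas share the read load while every shard's part of the data is still searched.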

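Paginated search and deep paging performance explained

In real applications pagination is indispensable. For example, front-end pages usually present data to users one page at a time.

Paginated search in ES

Elasticsearch paginates with from + size. from is the zero-based offset of the first result to return, and size is the number of documents to return starting from that offset.
Example: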

GET test_index/test_type/_search?from=0&size=3

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 1,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "test test"
        }
      }
    ]
  }
}
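
Front ends usually think in page numbers rather than offsets, so the conversion to from and size is worth spelling out. A minimal Python sketch, assuming 1-based page numbers; page_params and search_url are hypothetical helper names:

```python
from typing import Dict


def page_params(page: int, size: int = 10) -> Dict[str, int]:
    """Translate a 1-based page number into from/size values."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be at least 1")
    return {"from": (page - 1) * size, "size": size}


def search_url(target: str, page: int, size: int = 10) -> str:
    """Build the paginated _search path for a target such as 'test_index/test_type'."""
    params = page_params(page, size)
    return f"/{target}/_search?from={params['from']}&size={params['size']}"
```

Page 1 with size 3 gives from=0, which is the request shown above; page 2 would give from=3.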

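The deep paging performance problem

What is deep paging? Simply put, it means paging very deep into the results. Suppose there are 60000 documents in total and three primary shards with 20000 documents each, and each page holds 10 documents. If you ask for page 1001 (from=10000, size=10), the documents you actually need are those ranked 10001~10010.

Be careful not to assume that each shard returns just 10 documents. That is wrong!

Here is a detailed analysis:
The request may first arrive at a node that holds no shard of this index. That node acts as the coordinating node (coordinate node) and forwards the search request to the nodes that hold the index's three shards. In the case above, to fetch the documents ranked 10001~10010 out of 60000, no shard can know in advance which of its own documents fall in that range, so each shard has to return its top 10010 (from + size) entries out of its 20000 documents, not just 10. Each of the 3 shards returns 10010 entries to the coordinating node, which therefore receives 30030 entries in total, sorts them by _score, and then takes entries 10001~10010. Those are the 10 documents of the requested page.
[Figure: deep paging across three shards and a coordinating node]

The deep paging problem, then, is that when from + size reaches too deep, every shard has to return a large amount of data to the coordinating node, which consumes a lot of bandwidth, memory, and CPU. For this reason ES limits from + size with the index.max_result_window setting, which defaults to 10000.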
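
The cost of deep paging can be reproduced with a small simulation in Python. It is only a model of the arithmetic: relevance scores are random numbers, shard_top and coordinate are hypothetical names, and the request is from=10000 with size=10 (the documents ranked 10001~10010) over 3 shards of 20000 documents each.

```python
import heapq
import random
from typing import List, Tuple


def shard_top(scores: List[float], k: int) -> List[float]:
    """Each shard returns its own top k = from + size entries, best first."""
    return heapq.nlargest(k, scores)


def coordinate(shards: List[List[float]], from_: int, size: int) -> Tuple[List[float], int]:
    """Merge the per-shard candidates and cut out the requested page."""
    k = from_ + size
    candidates: List[float] = []
    for shard in shards:
        candidates.extend(shard_top(shard, k))
    candidates.sort(reverse=True)            # global order by score
    return candidates[from_:from_ + size], len(candidates)


rng = random.Random(0)                        # fixed seed for repeatability
shards = [[rng.random() for _ in range(20000)] for _ in range(3)]
page, received = coordinate(shards, from_=10000, size=10)
print(received)   # number of entries the coordinating node had to sort
print(len(page))  # number of entries actually returned to the client
```

The coordinating node receives 3 × 10010 = 30030 entries in order to return 10, and the amount grows with both the depth of the page and the number of shards.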

query string search syntax and how _all metadata works

Basic query string syntax

GET /test_index/test_type/_search?q=test_field:test
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 0.43445712,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 0.25316024,
        "_source": {
          "test_field": "test client 1"
        }
      }
    ]
  }
}
GET /test_index/test_type/_search?q=+test_field:test
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 0.43445712,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 0.25316024,
        "_source": {
          "test_field": "test client 1"
        }
      }
    ]
  }
}
GET /test_index/test_type/_search?q=-test_field:test
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 1,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field": "test4"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test_field": "replaces test2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "test_field1": "test field1",
          "test_field2": "partial updated test1"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "11",
        "_score": 1,
        "_source": {
          "num": 0,
          "tags": []
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "3",
        "_score": 1,
        "_source": {
          "test_field": "test3"
        }
      }
    ]
  }
}

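For query string search it is enough to master the q=field:search content syntax and the meaning of + and -:

  • +: results must match this condition
  • -: results must not match this condition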
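
When these requests are sent from code rather than typed into a console, the q parameter should be URL-encoded, because a literal + in a URL query string is commonly decoded as a space. A minimal Python sketch using the standard library; query_string_url and its must parameter are hypothetical names introduced only for this example:

```python
from typing import Optional
from urllib.parse import urlencode


def query_string_url(target: str, field: str, text: str,
                     must: Optional[bool] = None) -> str:
    """Build a _search URL with a q=field:text parameter.

    must=True adds the + prefix, must=False adds the - prefix,
    and must=None leaves the condition without a prefix.
    """
    prefix = {True: "+", False: "-", None: ""}[must]
    query = urlencode({"q": f"{prefix}{field}:{text}"})
    return f"/{target}/_search?{query}"


print(query_string_url("test_index/test_type", "test_field", "test", must=True))
# /test_index/test_type/_search?q=%2Btest_field%3Atest
```

The encoded %2B is decoded back to + on the server side, so the condition keeps its intended meaning.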

_all metadata

GET /test_index/test_type/_search?q=test
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 0.3794414,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 0.31387395,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 0.18232156,
        "_source": {
          "test_field": "test client 1"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 0.16203022,
        "_source": {
          "test_field1": "test field1",
          "test_field2": "partial updated test1"
        }
      }
    ]
  }
}

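In other words, when a query string search does not specify a field, it searches _all by default. The _all metadata field is generated at indexing time: when we insert a document that contains several fields, ES automatically concatenates the values of all those fields into one long string. That long string is the value of the _all field, and it is indexed as well. This describes the older ES version used in this article; _all was deprecated in ES 6.0 and removed in 7.0.
For example, take the following document: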

{
  "name": "jack",
  "age": 26,
  "email": "jack@sina.com",
  "address": "guangzhou"
}

Then "jack 26 jack@sina.com guangzhou" becomes the value of this document's _all field, which is then analyzed into terms and added to the inverted index.
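
The construction of _all can be imitated in a few lines of Python. This is a toy model under stated assumptions: build_all and inverted_index are hypothetical helpers, and splitting on whitespace stands in for a real analyzer, which tokenizes text differently.

```python
from typing import Any, Dict, Set


def build_all(document: Dict[str, Any]) -> str:
    """Concatenate every field value into one string, as _all does."""
    return " ".join(str(value) for value in document.values())


def inverted_index(docs: Dict[int, Dict[str, Any]]) -> Dict[str, Set[int]]:
    """Map each whitespace-separated term of _all to the ids of matching docs."""
    index: Dict[str, Set[int]] = {}
    for doc_id, doc in docs.items():
        for term in build_all(doc).lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


doc = {"name": "jack", "age": 26, "email": "jack@sina.com", "address": "guangzhou"}
print(build_all(doc))  # jack 26 jack@sina.com guangzhou
```

Because every value lands in the same string, a search without a field name can match a term no matter which field it originally came from.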
Note that the query string search style is generally not used in production.
