Elasticsearch from Basics to Depth (7), Search Engine: the meaning of _search, _multi-index search, paged search and the deep paging performance problem, query string search syntax, and _all metadata

Date: 2019-11-06


The meaning of _search

What the fields in a _search response mean

GET _search

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 16,
    "successful": 16,
    "failed": 0
  },
  "hits": {
    "total": 19,
    "max_score": 1,
    "hits": [
      {
        "_index": ".kibana",
        "_type": "config",
        "_id": "5.2.0",
        "_score": 1,
        "_source": {
          "buildNum": 14695
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 1,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_doc",
        "_id": "10",
        "_score": 1,
        "_source": {
          "test_field": "test10 routing _id"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_doc",
        "_id": "11",
        "_score": 1,
        "_routing": "12",
        "_source": {
          "test_field": "test routing not _id"
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "jiajieshi yagao",
          "desc": "youxiao fangzhu",
          "price": 25,
          "producer": "jiajieshi producer",
          "tags": [
            "fangzhu"
          ]
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "4",
        "_score": 1,
        "_source": {
          "name": "special yagao",
          "desc": "special meibai",
          "price": 50,
          "producer": "special yagao producer",
          "tags": [
            "meibai"
          ]
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field": "test4"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test_field": "replaces test2"
        }
      }
    ]
  }
}

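took: how many milliseconds the whole search request took
timed_out: whether the request timed out
hits.total: the total number of matching documents. In the response above it is a plain number; in newer versions it is an object, where value is the count and relation says how to read it (usually eq, meaning the count is exact)
hits.max_score: the highest relevance score among all results of this search. Every document gets a _score for the search; the more relevant the document, the higher its _score and the higher it ranks
hits.hits: the matching documents that were returned (10 by default)
_shards.total: the number of shards the request was sent to
_shards.successful: how many of those shards ran the query successfully
_shards.skipped: how many of those shards were skipped (reported by newer versions only, so it does not appear in the response above)
_shards.failed: how many of those shards failed to run the query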
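
As a rough illustration of reading these fields, here is a minimal Python sketch. The function name `summarize_search_response` and the trimmed `old_style` dictionary are made up for this example; the sketch only assumes the response has already been parsed into a dict, and it accepts both the plain-number and the object form of hits.total.

```python
def summarize_search_response(resp):
    """Pull the commonly read fields out of a parsed _search response."""
    total = resp["hits"]["total"]
    # Older versions return a plain number; newer ones return
    # {"value": ..., "relation": "eq" | "gte"}.
    if isinstance(total, dict):
        total_value, relation = total["value"], total["relation"]
    else:
        total_value, relation = total, "eq"
    shards = resp["_shards"]
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        "total": total_value,
        "relation": relation,
        "max_score": resp["hits"]["max_score"],
        "returned": len(resp["hits"]["hits"]),
        "all_shards_ok": shards["failed"] == 0
        and shards["successful"] == shards["total"],
    }


# Trimmed copy of the response shown above (only the first hit is kept).
old_style = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 16, "successful": 16, "failed": 0},
    "hits": {
        "total": 19,
        "max_score": 1,
        "hits": [
            {"_index": ".kibana", "_type": "config", "_id": "5.2.0",
             "_score": 1, "_source": {"buildNum": 14695}}
        ],
    },
}

summary = summarize_search_response(old_style)
print(summary)
```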

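The search timeout mechanism

Elasticsearch has no search timeout by default, so first consider a scenario. Suppose we run a search application that is sensitive to latency, such as an e-commerce site. You cannot make users wait 10 minutes; they would have left long before and would not come back to buy anything.

So we need a timeout mechanism: each shard is given only the timeout window, and whatever it has found within that window (possibly everything) is returned to the client straight away, rather than waiting until all the data has been searched.

This ensures that a search request completes within the timeout the user specifies, which suits latency-sensitive search applications.

Note: by default Elasticsearch applies no timeout at all. If your search is very slow and every shard needs several minutes to find all its data, the search request also waits several minutes before it returns.
[Figure: the timeout mechanism]

Syntax: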

GET _search?timeout=10ms
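
The idea of returning partial results can be sketched with a toy model in Python. Nothing here is Elasticsearch code: `search_shard`, `search`, and the fixed per-document cost are assumptions made up for the illustration, and real shards measure wall-clock time rather than counting documents.

```python
def search_shard(docs, cost_ms_per_doc, timeout_ms=None):
    """Toy model of one shard: scan docs until done or the time budget is spent.

    Returns (hits_collected_so_far, timed_out).
    """
    hits, elapsed = [], 0
    for doc in docs:
        if timeout_ms is not None and elapsed + cost_ms_per_doc > timeout_ms:
            return hits, True  # hand back what we have instead of waiting
        elapsed += cost_ms_per_doc
        hits.append(doc)
    return hits, False


def search(shards, cost_ms_per_doc, timeout_ms=None):
    """Collect (possibly partial) hits from every shard."""
    all_hits, timed_out = [], False
    for docs in shards:
        hits, shard_timed_out = search_shard(docs, cost_ms_per_doc, timeout_ms)
        all_hits.extend(hits)
        timed_out = timed_out or shard_timed_out
    return {"timed_out": timed_out, "hits": all_hits}


# Two made-up shards with 100 documents each, 1 ms of work per document.
shards = [list(range(0, 100)), list(range(100, 200))]
partial = search(shards, cost_ms_per_doc=1, timeout_ms=10)
full = search(shards, cost_ms_per_doc=1)
print(partial["timed_out"], len(partial["hits"]))
print(full["timed_out"], len(full["hits"]))
```

With a 10 ms budget each toy shard gets through only its first 10 documents, so the caller sees timed_out set to true together with an incomplete hit list; without a timeout it waits for everything.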

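_multi-index & multi-type search modes

A note first: in older versions of Elasticsearch one index could hold multiple types, which is why a multi-type search mode exists. It is not covered in detail here, because it works much like multi-index search, and newer versions of Elasticsearch deprecate types.

multi-index search mode

/_search: search all data in all indices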
```
GET /_search
```
/{index}/_search: search all data in one specified index
```
GET /test/_search
```
/index1,index2/_search: search the data in two indices at once
```
GET /test_index,test/_search
```
/test*/_search: match multiple indices with a wildcard and search the data in all of them
```
GET /test*/_search
```
/_all/_search: _all stands for all indices
```
GET /_all/_search
```
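
The path patterns above can be put together mechanically. The following Python sketch is only an illustration: `search_path` and `resolve_indices` are hypothetical helpers, the index list is made up, and the wildcard matching is done with Python's `fnmatchcase` as a stand-in for whatever Elasticsearch does internally.

```python
from fnmatch import fnmatchcase


def search_path(*indices):
    """Build the request path for a search over zero or more index expressions."""
    if not indices:
        return "/_search"
    return "/" + ",".join(indices) + "/_search"


def resolve_indices(expressions, existing):
    """Toy resolution of index expressions (names, wildcards, _all) to names."""
    if not expressions or "_all" in expressions:
        return sorted(existing)
    return sorted(
        name for name in existing
        if any(fnmatchcase(name, expr) for expr in expressions)
    )


existing = ["ecommerce", "test", "test_index"]  # made-up cluster contents
print(search_path())
print(search_path("test_index", "test"))
print(search_path("test*"))
print(resolve_indices(["test*"], existing))
print(resolve_indices(["_all"], existing))
```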

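A brief look at how search works

When a client sends a search request to Elasticsearch, the request has to reach every shard of the index, because each shard holds only part of the data, so any shard may contain matching results. For each shard, the request can be served by the primary shard or, if the primary has replica shards, by one of its replicas.
[Figure: how a search request is routed to shards]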
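
To make "one copy per shard" concrete, here is a small Python sketch. The round-robin choice, the `pick_copies` helper, and the shard labels are assumptions for illustration only; how Elasticsearch actually chooses between a primary and its replicas depends on the version and is not described in this article.

```python
from itertools import count

_turns = count()  # shared counter so successive searches rotate between copies


def pick_copies(shard_copies):
    """For each shard number, pick exactly one copy (primary or replica)."""
    turn = next(_turns)
    return {
        shard: copies[turn % len(copies)]
        for shard, copies in shard_copies.items()
    }


# Hypothetical index with 3 shards, each having a primary (P) and a replica (R).
shard_copies = {0: ["P0", "R0"], 1: ["P1", "R1"], 2: ["P2", "R2"]}

first = pick_copies(shard_copies)
second = pick_copies(shard_copies)
print(first)
print(second)
```

Every search still touches all three shards, but the work can be spread across primaries and replicas.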

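Paged search and deep paging performance

Paging is indispensable in real applications; for example, front-end pages usually present data to the user one page at a time.

Paged search in Elasticsearch

Elasticsearch pages search results with from + size. from is the offset of the first result to return (starting at 0), and size is the number of documents to return starting from that offset.
Example: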

GET test_index/test_type/_search?from=0&size=3

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 1,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "test test"
        }
      }
    ]
  }
}
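
Since from is an offset rather than a page number, a UI that works in pages has to convert. A minimal Python sketch follows; the helper names `page_to_from` and `paging_query` are made up for this example and assume 1-based page numbers.

```python
def page_to_from(page, size):
    """Translate a 1-based page number into the from offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    return (page - 1) * size


def paging_query(page, size):
    """Build the from/size part of the query string for a given page."""
    return "from={}&size={}".format(page_to_from(page, size), size)


print(paging_query(1, 3))
print(paging_query(2, 3))
print(paging_query(3, 3))
```

Page 1 with 3 documents per page corresponds to the from=0&size=3 request shown above.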

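The deep paging performance problem

What is deep paging? Simply put, it means paging very far into the results. Suppose there are 60000 documents in total across three primary shards, 20000 per shard, with 10 documents per page. To fetch page 1001 (from=10000, size=10), what you actually need are results 10001~10010.

Do not take this to mean that each shard returns only 10 documents. That is wrong!

A detailed analysis:
The request may first hit a node that holds no shard of this index. That node acts as the coordinating node and forwards the search request to the nodes holding the index's three shards. In the scenario above, no shard can know which of its own documents will land in global positions 10001~10010, so each shard has to return its top 10010 documents (from + size) out of its 20000, not just 10. Each of the 3 shards returns 10010 documents, so the coordinating node receives 30030 documents in total, sorts them by _score, and takes positions 10001~10010, which are the 10 documents of the page we asked for.
[Figure: deep paging across three shards and the coordinating node]

The deep paging problem is that when from + size reaches too deep, every shard has to return a large number of documents to the coordinating node, which consumes a lot of bandwidth, memory, and CPU.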
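
The arithmetic can be checked with a small simulation. This Python sketch is a toy model under stated assumptions: scores are random stand-ins for _score, `shard_top` and `paged_search` are made-up names, and each hit is a (score, doc_id) tuple so that ties sort deterministically.

```python
import heapq
import random


def shard_top(docs, n):
    """Each shard returns its n best (score, doc_id) pairs, best first."""
    return heapq.nlargest(n, docs)


def paged_search(shards, from_, size):
    """Scatter to every shard, then merge and slice on the coordinating node."""
    per_shard = [shard_top(docs, from_ + size) for docs in shards]
    received = sum(len(part) for part in per_shard)
    merged = heapq.nlargest(
        from_ + size, (hit for part in per_shard for hit in part)
    )
    return merged[from_:from_ + size], received


random.seed(0)
# 3 shards x 20000 documents, as in the scenario described above.
shards = [
    [(random.random(), shard * 20000 + i) for i in range(20000)]
    for shard in range(3)
]

page, received = paged_search(shards, from_=10000, size=10)
print(len(page), received)
```

The coordinating node ends up handling 30030 candidate hits in order to hand back 10, which is the cost the text describes.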

query string search syntax and how the _all metadata field works

Basic query string syntax

GET /test_index/test_type/_search?q=test_field:test

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 0.43445712,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 0.25316024,
        "_source": {
          "test_field": "test client 1"
        }
      }
    ]
  }
}

GET /test_index/test_type/_search?q=+test_field:test

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 0.43445712,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 0.25316024,
        "_source": {
          "test_field": "test client 1"
        }
      }
    ]
  }
}

GET /test_index/test_type/_search?q=-test_field:test

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 1,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field": "test4"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test_field": "replaces test2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "test_field1": "test field1",
          "test_field2": "partial updated test1"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "11",
        "_score": 1,
        "_source": {
          "num": 0,
          "tags": []
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "3",
        "_score": 1,
        "_source": {
          "test_field": "test3"
        }
      }
    ]
  }
}

For query string search, it is enough to know the q=field:search content syntax and what + and - mean:

+: results must match this condition
-: results must not match this condition
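
The include and exclude semantics can be reproduced with a toy matcher over the nine test_index documents that appear in the responses above. This Python sketch is not how Elasticsearch parses or scores a query string: `parse_clause` and `query_string_search` are hypothetical, handle a single clause only, use plain whitespace tokenization, and do no relevance scoring.

```python
def parse_clause(q):
    """Split 'field:term', '+field:term' or '-field:term' into its parts."""
    negate = q.startswith("-")
    field, term = q.lstrip("+-").split(":", 1)
    return negate, field, term


def query_string_search(docs, q):
    """Return ids of docs whose field contains (or, with '-', lacks) the term."""
    negate, field, term = parse_clause(q)
    ids = []
    for doc_id, source in docs.items():
        tokens = str(source.get(field, "")).lower().split()
        if (term.lower() in tokens) != negate:
            ids.append(doc_id)
    return ids


docs = {
    "AWypxxLYFCl_S-ox4wvd": {"test_content": "my test"},
    "8": {"test_field": "test client 2"},
    "6": {"test_field": "test test"},
    "4": {"test_field": "test4"},
    "2": {"test_field": "replaces test2"},
    "1": {"test_field1": "test field1", "test_field2": "partial updated test1"},
    "11": {"num": 0, "tags": []},
    "3": {"test_field": "test3"},
    "7": {"test_field": "test client 1"},
}

print(sorted(query_string_search(docs, "test_field:test")))
print(sorted(query_string_search(docs, "+test_field:test")))
print(len(query_string_search(docs, "-test_field:test")))
```

The toy matcher selects the same documents as the three responses above: ids 6, 7 and 8 for the plain and + forms, and the remaining six documents for the - form.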

_all metadata

GET /test_index/test_type/_search?q=test

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 0.3794414,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 0.31387395,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 0.18232156,
        "_source": {
          "test_field": "test client 1"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 0.16203022,
        "_source": {
          "test_field1": "test field1",
          "test_field2": "partial updated test1"
        }
      }
    ]
  }
}

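That is, when you use query string search without specifying a field, the search runs against _all by default. The _all metadata field is produced at indexing time: when we insert a document that contains several fields, Elasticsearch automatically concatenates the values of all those fields into one long string. That long string is the value of the _all field, and it is indexed as well.
For example, given this document:

{
  "name": "jack",
  "age": 26,
  "email": "jack@sina.com",
  "address": "guangzhou"
}

the string "jack 26 jack@sina.com guangzhou" becomes the value of this document's _all field, and it is analyzed and added to the inverted index.
Note that query string search is generally not used in production.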
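
The concatenation step is easy to picture in code. The Python sketch below is only an approximation of the idea described here: `build_all_field` and `tokenize` are made-up helpers, the whitespace tokenizer stands in for a real analyzer, and the document id 1 is invented for the example.

```python
def build_all_field(doc):
    """Join every field value into one string, as described for _all."""
    return " ".join(str(value) for value in doc.values())


def tokenize(text):
    """Whitespace tokenizer; real analyzers are more involved."""
    return text.lower().split()


doc = {"name": "jack", "age": 26, "email": "jack@sina.com", "address": "guangzhou"}
all_value = build_all_field(doc)

# Toy inverted index: token -> set of document ids (1 is a made-up id).
inverted = {}
for token in tokenize(all_value):
    inverted.setdefault(token, set()).add(1)

print(all_value)
print(sorted(inverted))
```

A search for q=jack with no field name would then be looked up in this combined index rather than in any single field.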
