An Elasticsearch Usage Guide Written for Myself

Date: 2019-12-05

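ES is far faster than a relational database at searching large volumes of data. This is neither a professional ES guide nor an analysis of ES internals; it is simply a record of the problems I ran into and the search patterns I used while working with ES. Looking things up in the ES documentation and API reference is laborious, and I kept losing time searching and translating the docs, so I decided to write down the problems and searches I have come across as a simple guide to everyday ES usage for myself.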

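What is a document in ES?

In ES, a document is stored as a JSON object of key-value pairs, and it carries three metadata fields.

_index: where the document is stored

Data is stored and indexed in shards; an index is just a logical namespace that groups one or more shards together. ES manages all of this, so users do not need to care about it.

_type: the class of object the document represents. Since 6.x an index may hold only one type; types were deprecated in 7.x and removed in 8.x.
_id: the document's unique identifier.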

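How does mapping work in ES?

The mapping mechanism determines field types, matching every field to one specific type (string, number, boolean, date, and so on).
The analysis mechanism tokenizes full text in order to build the inverted index used for searching.

Viewing the mapping of a type in an index

ES guesses the type of each field and dynamically generates the mapping between fields and types.

A date field and a string field are indexed differently, so searching them gives different results. Every data type is indexed in its own way.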

GET    /user/_mapping/user

--Response
{
    "user": {
        "mappings": {
            "user": {
                "properties": {
                    "height": {
                        "type": "long"
                    },
                    "nickname": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            }
        }
    }
}
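
To make the idea of dynamic mapping more concrete, here is a rough Python sketch of how field types could be guessed from JSON values. It is a simplification I wrote for this note, not how ES actually implements it: guess_field_type and guess_mapping are hypothetical helpers, only a handful of types are covered, and the date detection assumes plain YYYY-MM-DD strings.

```python
from datetime import datetime


def guess_field_type(value):
    """Guess an ES-like field type for one JSON value (simplified)."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            # Assumption: only plain YYYY-MM-DD strings count as dates here.
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "text"
    if isinstance(value, list) and value:
        # In this sketch an array takes the type of its first element.
        return guess_field_type(value[0])
    if isinstance(value, dict):
        return "object"
    return None  # null or empty array: nothing to infer from


def guess_mapping(doc):
    """Build a {"properties": ...} mapping for a flat document."""
    properties = {}
    for name, value in doc.items():
        field_type = guess_field_type(value)
        if field_type is not None:
            properties[name] = {"type": field_type}
    return {"properties": properties}


mapping = guess_mapping({"nickname": "threads", "height": 175, "tags": ["a", "b"]})
print(mapping)
```

The only subtle point is the order of the checks: testing for bool first keeps true/false from being mapped as long.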

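Exact values and full text

Exact values are definite, such as a date or a number;
Full text is unstructured data, such as: this is a piece of text

The index parameter controls how a string is indexed

Value	Meaning
analyzed	Analyze the string first, then index it (index the field as full text)
not_analyzed	Index the field so that it is searchable, but index the value exactly as given
no	Do not index the field at all

A string field defaults to analyzed. To map a field as an exact value, set it to not_analyzed. (This applies to the string type of ES 2.x and earlier; from 5.x on, string is split into text, which is analyzed, and keyword, which is not.)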

{
    "nickname": {
        "type": "string",
        "index": "not_analyzed"
    }
}
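
The difference between analyzed and not_analyzed is easier to see with a toy inverted index. The sketch below is my own illustration in Python, not ES code: analyze is a stand-in analyzer that only lowercases and splits on non-word characters, and build_inverted_index is a hypothetical helper.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def build_inverted_index(docs, field, analyzed=True):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        value = doc[field]
        # analyzed: one term per token; not_analyzed: the whole value is one term.
        terms = analyze(value) if analyzed else [value]
        for term in terms:
            index[term].add(doc_id)
    return dict(index)


docs = {1: {"nickname": "Quick Brown Fox"}, 2: {"nickname": "Brown Bear"}}
full_text = build_inverted_index(docs, "nickname", analyzed=True)
exact = build_inverted_index(docs, "nickname", analyzed=False)

print(full_text["brown"])    # both documents contain the token "brown"
print(exact.get("brown"))    # no stored value is exactly "brown"
print(exact["Brown Bear"])   # only the whole value matches
```

With the analyzed index a search for a single word finds both documents, while the exact index only answers for the complete original value.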

What an empty search shows about the ES search response

GET    /_search

--Response
{
    "took": 7,                        // time the whole search request took (milliseconds)
    "timed_out": false,               // whether the request timed out
    "_shards": {
        "total": 25,                  // shards that took part in the query
        "successful": 25,             // shards that succeeded
        "skipped": 0,                 // shards that were skipped
        "failed": 0                   // shards that failed
    },
    "hits": {
        "total": 1291,                // number of matching documents
        "max_score": 1,               // highest _score among the results
        "hits": [
            {
                "_index": "feed",
                "_type": "feed",
                "_id": "1JNC42oB07Tkhuy89JSd",
                "_score": 1,
                "_source": {
                }
            }
        ]
    }
}
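
When I only need the headline numbers from a response like this, a small helper is enough. The Python sketch below assumes the response has already been parsed into a dict; summarize_search is a hypothetical name, and the branch for a dict-shaped hits.total is there because newer ES versions (7.x and later) report the count as an object with a value key.

```python
def summarize_search(response):
    """Return (took_ms, total_hits, ids) from a parsed search response."""
    hits = response["hits"]
    total = hits["total"]
    # Assumption: 7.x+ wraps the count as {"value": ..., "relation": ...}.
    if isinstance(total, dict):
        total = total["value"]
    ids = [hit["_id"] for hit in hits["hits"]]
    return response["took"], total, ids


response = {
    "took": 7,
    "timed_out": False,
    "_shards": {"total": 25, "successful": 25, "skipped": 0, "failed": 0},
    "hits": {
        "total": 1291,
        "max_score": 1,
        "hits": [
            {"_index": "feed", "_type": "feed", "_id": "1JNC42oB07Tkhuy89JSd",
             "_score": 1, "_source": {}},
        ],
    },
}
print(summarize_search(response))
```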

Creating an index

Create an index named megacorp

PUT /megacorp

--Response (the index definition, as returned by a follow-up GET /megacorp)
{
    "megacorp": {
        "aliases": {},
        "mappings": {},
        "settings": {
            "index": {
                "creation_date": "1558956839146",        // creation time (epoch timestamp in milliseconds)
                "number_of_shards": "5",
                "number_of_replicas": "1",
                "uuid": "pGBqNmxrR1S8I7_jAYdvBA",        // unique id
                "version": {
                    "created": "6060199"
                },
                "provided_name": "megacorp"
            }
        }
    }
}

Creating a document

PUT /user/user/10001
--Body
{
    "nickname": "你有病啊",
    "height": 180,
    "expect": "我喜歡看_書",
    "tags": ["大學黨", "求偶遇"]
}

--Response
{
    "_index": "user",
    "_type": "user",
    "_id": "10001",
    "_version": 1,        // version number; every ES document has one, and _version increases whenever the document changes (including deletion)
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 1,
    "_primary_term": 1
}

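Using a filter

Search for documents whose height is greater than 170 and whose nickname is threads. Note that the filtered query used here was deprecated in ES 2.0 and removed in 5.0; on later versions the same search is written as a bool query with a filter clause.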

GET    /user/user/_search
--Body
{
    "query": {
        "filtered": {
            "filter": {
                "range": {
                    "height": {
                        "gt": 170
                    }
                }
            },
            "query": {
                "match": {
                    "nickname": "threads"
                }
            }
        }
    }
}
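
Since the filtered query no longer exists in ES 5.x and later, here is a minimal Python sketch of how such a body could be rewritten mechanically into the bool form. filtered_to_bool is a hypothetical helper of mine; it only handles the simple shape shown above and ignores any other top-level keys.

```python
def filtered_to_bool(body):
    """Rewrite {"query": {"filtered": ...}} as an equivalent bool query."""
    filtered = body["query"]["filtered"]
    bool_clause = {}
    if "query" in filtered:
        bool_clause["must"] = filtered["query"]      # scored part
    if "filter" in filtered:
        bool_clause["filter"] = filtered["filter"]   # unscored yes/no part
    return {"query": {"bool": bool_clause}}


legacy = {
    "query": {
        "filtered": {
            "filter": {"range": {"height": {"gt": 170}}},
            "query": {"match": {"nickname": "threads"}},
        }
    }
}
modern = filtered_to_bool(legacy)
print(modern)
```

The query part moves under must so it still contributes to the score, and the filter part moves under filter so it stays a plain yes/no condition.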

Structured queries

A structured query is passed in the query parameter.

GET /_search
{
    "query": SUB_QUERY
}

# Sub-query
{
    "query_name": {
        "argument": value,...
    }
}
# Sub-query that targets a specific field
{
    "query_name": {
        "field_name": {
            "argument": value,...
        }
    }
}

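Combining multiple clauses

Query clauses can be combined, turning simple clauses into one complex query. For example:

A simple clause compares a query string against one field (or several fields).

A compound clause combines other clauses.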

GET /user/user/_search
--Body
{
    "query": {
        "bool": {
            "must": {                            // must: the document must satisfy this clause
                "match": {
                    "nickname": "threads"
                }
            },
            "must_not": {                        // must_not: the document must not satisfy this clause
                "match": {
                    "height": 170
                }
            },
            "should": [                            // should: the document may satisfy the clauses in this array
                {
                    "match": {
                        "nickname": "threads"
                    }
                },
                {
                    "match": {
                        "expect": "我喜歡唱歌、跳舞、打遊戲"
                    }
                }
            ]
        }
    }
}
--Response
{
    "hits": {
        "total": 1,
        "max_score": 0.91862875,
        "hits": [
            {
                "_index": "user",
                "_type": "user",
                "_id": "98047",
                "_score": 0.91862875,
                "_source": {
                    "nickname": "threads",
                    "height": 175,
                    "expect": "我喜歡唱歌、跳舞、打遊戲",
                    "tags": [
                        "工作黨",
                        "吃雞"
                    ]
                }
            }
        ]
    }
}
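
Writing nested bool bodies by hand gets tedious, so I sometimes assemble them in code. The following Python sketch is one possible way to do it; bool_query is a hypothetical helper, and it simply leaves out any clause that was not supplied.

```python
import json


def bool_query(must=None, must_not=None, should=None):
    """Assemble a bool query body, omitting clauses that were not given."""
    clauses = {"must": must, "must_not": must_not, "should": should}
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


body = bool_query(
    must={"match": {"nickname": "threads"}},
    should=[
        {"match": {"nickname": "threads"}},
        {"match": {"expect": "我喜歡唱歌、跳舞、打遊戲"}},
    ],
)
# ensure_ascii=False keeps the Chinese text readable in the printed JSON.
print(json.dumps(body, ensure_ascii=False, indent=4))
```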

Full-text search

GET /user/user/_search
--Body
{
    "query":{
        "match":{
            "expect": "打遊戲"
        }
    }
}

--Response
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 0.8630463,
        "hits": [
            {
                "_index": "user",
                "_type": "user",
                "_id": "98047",
                "_score": 0.8630463,                            // relevance score
                "_source": {
                    "nickname": "threads",
                    "height": 175,
                    "expect": "我喜歡唱歌、跳舞、打遊戲",
                    "tags": [
                        "工作黨",
                        "吃雞"
                    ]
                }
            },
            {
                "_index": "user",
                "_type": "user",
                "_id": "94302",
                "_score": 0.55900055,                    
                "_source": {
                    "nickname": "搖了搖頭",
                    "height": 173,
                    "expect": "我喜歡rap、跳舞、打遊戲",
                    "tags": [
                        "工作黨",
                        "吃飯"
                    ]
                }
            },
            {
                "_index": "user",
                "_type": "user",
                "_id": "91031",
                "_score": 0.53543615,
                "_source": {
                    "nickname": "你有病啊",
                    "height": 180,
                    "expect": "我喜歡學習、逛街、打遊戲",
                    "tags": [
                        "大學黨",
                        "求偶遇"
                    ]
                }
            }
        ]
    }
}

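ES sorts the result set by relevance score, that is, by how well each document matches the query.

Phrase search

Matches several words or a phrase exactly.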

GET /user/user/_search
--Body
{
    "query": {
        "match_phrase": {
            "expect": "學習"
        }
    }
}

--Response
{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 1.357075,
        "hits": [
            {
                "_index": "user",
                "_type": "user",
                "_id": "91031",
                "_score": 1.357075,
                "_source": {
                    "nickname": "你有病啊",
                    "height": 180,
                    "expect": "我喜歡學習、逛街、打遊戲",
                    "tags": [
                        "大學黨",
                        "求偶遇"
                    ]
                }
            }
        ]
    }
}
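
To see why match and match_phrase return different results, it helps to model them on a toy scale. The Python sketch below is only an approximation of my own: it assumes one token per character for Chinese text and ignores scoring entirely. The functions tokens, match and match_phrase are hypothetical stand-ins, not ES APIs.

```python
def tokens(text):
    """Toy tokenizer: one token per character, punctuation and spaces dropped."""
    return [ch for ch in text if ch.isalnum()]


def match(text, query):
    """match-like: true if any query token occurs anywhere in the text."""
    text_tokens = set(tokens(text))
    return any(tok in text_tokens for tok in tokens(query))


def match_phrase(text, query):
    """match_phrase-like: the query tokens must be adjacent and in order."""
    text_tokens, query_tokens = tokens(text), tokens(query)
    n = len(query_tokens)
    return any(text_tokens[i:i + n] == query_tokens
               for i in range(len(text_tokens) - n + 1))


singing = "我喜歡唱歌、跳舞、打遊戲"
studying = "我喜歡學習、逛街、打遊戲"

print(match(singing, "學打"), match_phrase(singing, "學打"))
print(match(studying, "學習"), match_phrase(studying, "學習"))
```

The query 學打 still matches the first text in the match sense because 打 occurs in it, but it is not a phrase match because the two characters never appear side by side.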

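Structured filtering

A filter asks of every document whether a field contains a particular value:

Is the created date in the range xxx to xxx?

Does expect contain 學習?

Is the location field within xx km of the target point?

Performance of filters versus queries

The result set of a filter is quick to compute and easy to cache in memory, needing only 1 bit per document.

A query has to find the matching documents and also compute the relevance of each one, so queries are generally more expensive than filters, and their results cannot be cached.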
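
The idea of caching a filter result compactly can be sketched with a plain Python integer used as a bitset, one bit per document. This is an illustration of the principle only, not the cache ES uses internally; filter_bitset, matching_docs and the cache dict are hypothetical names.

```python
def filter_bitset(docs, predicate):
    """Return an int whose bit i is 1 when docs[i] passes the filter."""
    bits = 0
    for position, doc in enumerate(docs):
        if predicate(doc):
            bits |= 1 << position
    return bits


def matching_docs(docs, bits):
    """Decode a bitset back into the list of matching documents."""
    return [doc for position, doc in enumerate(docs) if (bits >> position) & 1]


docs = [{"height": 175}, {"height": 173}, {"height": 180}]
cache = {}  # filter name -> bitset, reusable by later searches
cache["height>=175"] = filter_bitset(docs, lambda d: d["height"] >= 175)
cache["height<180"] = filter_bitset(docs, lambda d: d["height"] < 180)

# Combining two cached filters is a single bitwise AND.
both = cache["height>=175"] & cache["height<180"]
print(bin(cache["height>=175"]), matching_docs(docs, both))
```

Because a filter only answers yes or no, its result needs no scores and two cached results combine with one bitwise operation.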

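When should you use a filter, and when a query?

As a rule of thumb: use queries for full-text search or whenever relevance scoring is needed, and use filters for everything else.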

Highlighting results

GET /user/user/_search
--Body
{
    "query": {
        "match_phrase": {
            "expect": "學習"
        }
    },
    "highlight": {
        "fields": {
            "expect": {}
        }
    }
}

Sorting

GET    /user/user/_search
--Body
{
    "query": {
        "match_all": {
        }
    },
    "size": 1000,                                        // number of hits to return
    "sort": {
        "nickname.keyword": {                // sort by nickname
            "order": "desc"
        },
        "height": {                                    // then by height
            "order": "desc"
        }
        }
    }
}
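
A multi-field sort like this one behaves the same way as sorting a list on several keys, with the first key as the most significant. Here is a small Python sketch of that behaviour on in-memory dicts; sort_hits is a hypothetical helper and the sample documents are made up for the illustration.

```python
def sort_hits(docs, sort_spec):
    """Sort dicts by a list of (field, order) pairs, first pair most significant."""
    result = list(docs)
    # Python's sort is stable, so apply the least significant key first.
    for field, order in reversed(sort_spec):
        result.sort(key=lambda doc: doc[field], reverse=(order == "desc"))
    return result


docs = [
    {"nickname": "threads", "height": 175},
    {"nickname": "alice", "height": 180},
    {"nickname": "threads", "height": 181},
]
ordered = sort_hits(docs, [("nickname", "desc"), ("height", "desc")])
print([(d["nickname"], d["height"]) for d in ordered])
```

Documents with the same nickname stay grouped together, and height only decides the order within each group.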

Paginating search results

GET    /user/user/_search

--Body
{
    "from": 2,                // like MySQL's OFFSET
    "size": 1                // like MySQL's LIMIT
}
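
In application code I usually think in page numbers rather than offsets, so a tiny conversion helps. The Python sketch below assumes 1-based page numbers; page_to_from_size and paginate are hypothetical helpers, and paginate only imitates the effect on an in-memory list.

```python
def page_to_from_size(page, per_page):
    """Translate a 1-based page number into from/size parameters."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"from": (page - 1) * per_page, "size": per_page}


def paginate(hits, body):
    """Apply from/size to an in-memory list the way OFFSET/LIMIT would."""
    start = body["from"]
    return hits[start:start + body["size"]]


body = page_to_from_size(page=3, per_page=1)
print(body, paginate(["a", "b", "c", "d"], body))
```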

Analytics (aggregations)

GET    /user/user/_search
--Body
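{
    "aggs": {
        "all_tags": {
            "terms": {
                // "field": "tags" buckets by analyzed tokens, which works badly for Chinese: 「學習」 is split into 「學」 and 「習」
                "field": "tags.keyword"    // since 5.x, aggregate on the keyword sub-field to bucket by whole tag values
            },
            "aggs": {
                "avg_height": {
                    "avg": {
                        "field": "height"
                    }
                }
            }
        }
    }
}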
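
What this aggregation computes can be reproduced on a few in-memory documents: one bucket per tag, with the average height of the documents in each bucket. The Python sketch below is my own illustration with made-up sample data; terms_with_avg is a hypothetical helper and the output shape only loosely imitates an ES response.

```python
from collections import defaultdict


def terms_with_avg(docs, terms_field, avg_field):
    """Bucket docs by each value of terms_field and average avg_field per bucket."""
    values_by_key = defaultdict(list)
    for doc in docs:
        for key in doc.get(terms_field, []):
            values_by_key[key].append(doc[avg_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in values_by_key.items()
    ]
    # List the most frequent keys first, breaking ties by key.
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))


docs = [
    {"tags": ["worker", "gamer"], "height": 175},
    {"tags": ["worker", "foodie"], "height": 173},
    {"tags": ["student"], "height": 180},
]
buckets = terms_with_avg(docs, "tags", "height")
for bucket in buckets:
    print(bucket)
```

A document with two tags is counted in both buckets, which is also why a whole-value field such as tags.keyword matters: the bucket key is the entire tag, not a fragment of it.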

A collection of query and filter clauses

`term` filter

term is mainly used to match exact values such as numbers, dates, booleans, or not_analyzed strings

{"term": {"height": 175}}                                        // height is 175 (number)
{"term": {"date": "2014-09-01"}}                        // date is 2014-09-01 (date)
{"term": {"public": true}}                                    // public is true (boolean)
{"term": {"nickname": "threads"}}                        // nickname is threads (full-text field)

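`terms` filter

terms is the same as term, except that it lets you specify several values to match. If the field contains any of the specified values, the document matches.

// Match documents whose nickname is threads or 搖了搖頭 (the keyword sub-field is used to match the Chinese value exactly)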
{
    "query": {
        "terms": {
            "nickname.keyword": ["threads", "搖了搖頭"]
        }
    }
}

`range` filter

The range filter finds data that falls within a specified range.

{
    "query": {"range":{"height":{"gte": 150, "lt": 180}}}
}
// gt:         greater than
// gte:     greater than or equal to
// lt:         less than
// lte:        less than or equal to
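
The four range keywords map directly onto comparison operators, which the following Python sketch spells out. in_range and RANGE_OPS are hypothetical names of mine, and the check works on plain numbers only.

```python
import operator

# Map each range keyword to the comparison it stands for.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """True when value satisfies every bound, e.g. {"gte": 150, "lt": 180}."""
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())


bounds = {"gte": 150, "lt": 180}
print([h for h in (149, 150, 175, 180) if in_range(h, bounds)])
```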

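`exists` and `missing` filters

The exists and missing filters find documents that do or do not contain a given field, much like the IS NOT NULL and IS NULL conditions in SQL. Note that missing was removed in ES 5.0; on later versions, wrap exists in the must_not clause of a bool query instead.

// Find documents that have a nickname field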
{"exists": {"field": "nickname"}}
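
Here is a short Python sketch of both sides: a local check that treats null and empty arrays as absent, and a builder for the bool/must_not/exists form that replaces the missing filter on newer versions. exists and missing_query are hypothetical helpers written for this note.

```python
def exists(doc, field):
    """exists-like check: the field is present with a non-null, non-empty value."""
    value = doc.get(field)
    return value is not None and value != []


def missing_query(field):
    """Build the bool/must_not/exists form that stands in for a missing filter."""
    return {"bool": {"must_not": {"exists": {"field": field}}}}


docs = [{"nickname": "threads"}, {"nickname": None}, {"height": 175}]
print([exists(doc, "nickname") for doc in docs])
print(missing_query("nickname"))
```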

`bool` filter

The bool filter combines the results of several filter conditions with Boolean logic:

must: every condition must match, equivalent to and
must_not: none of the conditions may match, equivalent to not
should: at least one condition must match, equivalent to or

{
    "bool": {
        "must":     {"term": {"nickname": "threads"}},
        "must_not": {"term": {"height": 165}},
        "should": [
            {"term": {"height": 175}},
            {"term": {"nickname": "threads"}}
        ]
    }
}
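
The and / not / or reading of the three clauses can be checked with a small evaluator. This Python sketch follows the description above literally, so whenever should clauses are given at least one of them has to match; a real bool query is more lenient about should when must is present. term and bool_filter are hypothetical helpers, and term here means plain equality on a dict value.

```python
def term(field, value):
    """Return a predicate that is true when doc[field] equals value exactly."""
    return lambda doc: doc.get(field) == value


def bool_filter(must=(), must_not=(), should=()):
    """Combine predicates: all of must, none of must_not, any of should."""
    def predicate(doc):
        if not all(p(doc) for p in must):
            return False
        if any(p(doc) for p in must_not):
            return False
        # With no should clauses there is nothing further to satisfy.
        return not should or any(p(doc) for p in should)
    return predicate


keep = bool_filter(
    must=[term("nickname", "threads")],
    must_not=[term("height", 165)],
    should=[term("height", 175), term("nickname", "threads")],
)
docs = [
    {"nickname": "threads", "height": 175},
    {"nickname": "threads", "height": 165},
    {"nickname": "alice", "height": 175},
]
print([keep(doc) for doc in docs])
```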

`match_all` query

match_all matches every document and is the default when no query is given. (It is often combined with filters.)

{
    "match_all":{}
}

`match` query

match is the standard query: whether you need a full-text search or an exact-value search, it is almost always the one to reach for.

{
    "match": {"nickname": "threads"}
}

`multi_match` query

multi_match builds on the match query and lets you search several fields at once:

// Find documents whose nickname or expect contains threads
{
    "multi_match": {
        "query": "threads",
        "fields": ["nickname", "expect"]
    }
}
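
As far as I understand the default behaviour (the best_fields type), a multi_match is roughly shorthand for one match query per field wrapped in a dis_max query. The Python sketch below writes that expansion out; expand_multi_match is a hypothetical helper, and it ignores options such as tie_breaker and per-field boosts.

```python
def expand_multi_match(query, fields):
    """Spell out a multi_match as one match query per field inside dis_max."""
    return {"dis_max": {"queries": [{"match": {field: query}} for field in fields]}}


expanded = expand_multi_match("threads", ["nickname", "expect"])
print(expanded)
```

Seeing the expanded form makes it clearer that each field is searched with an ordinary match query and the best-matching field drives the score.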

`bool` query

If a bool query has no must clause, at least one should clause has to match. If there is at least one must clause, however, no should clause is required to match.

{
    "query": {
        "bool": {
            "must": {
                "multi_match": {"query": "學習", "fields": ["nickname", "expect"]}
            },
            "must_not": {
                "match": {"height": 175}
            },
            "should": [
                {"match": {"nickname": "threads"}}
            ]
        }
    }
}

Using `filter` for a filtered query

Use filter to put both a query and a filter into one statement

// Documents whose nickname is threads, narrowed down to those whose height is 175
{
    "query": {
        "bool": {
            "must": {"match": {"nickname": "threads"}},
            "filter": {"term": {"height":175}}
        }
    }
}

Analyzing queries

How do you check whether a query is valid?

GET        /user/user/_validate/query
{
    "query": {
      "match_all": {"nickname": "threads"}
  }
}

--Response
{
    "valid": false,
    "error": "ParsingException[[4:13] [match_all] unknown field [nickname], parser not found]; nested: XContentParseException[[4:13] [match_all] unknown field [nickname], parser not found];; org.elasticsearch.common.xcontent.XContentParseException: [4:13] [match_all] unknown field [nickname], parser not found"
}

How do you understand how a query is executed?

GET        /user/user/_validate/query?explain
{
    "query": {
      "match_all": {"nickname": "threads"}
  }
}

--Response
{
    "valid": false,
    "error": "ParsingException[[4:13] [match_all] unknown field [nickname], parser not found]; nested: XContentParseException[[4:13] [match_all] unknown field [nickname], parser not found];; org.elasticsearch.common.xcontent.XContentParseException: [4:13] [match_all] unknown field [nickname], parser not found"
}

// The response shows that the query is invalid

Common exceptions when querying ES

The fielddata exception when aggregating

{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [tags] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
            }
        ],
        ...
    },
    "status": 400
}

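Since 5.x, sorting, aggregations, and similar operations on text fields rely on a separate in-memory data structure (fielddata), which is disabled by default and has to be enabled explicitly.
Enabling fielddata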

PUT user/_mapping/user/
--Body
{
  "properties": {
    "tags": { 
      "type":     "text",
      "fielddata": true
    }
  }
}
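
The error message itself suggests an alternative to enabling fielddata: use a keyword field instead. As a sketch of how I might choose the field name automatically, the Python helper below inspects the properties section of a mapping like the one shown earlier in this guide. aggregatable_field is a hypothetical helper, not part of any ES client.

```python
def aggregatable_field(properties, field):
    """Pick the field name to aggregate or sort on, given mapping properties."""
    spec = properties[field]
    if spec.get("type") != "text":
        return field                      # numbers, dates, keywords work directly
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"  # e.g. nickname.keyword
    if spec.get("fielddata"):
        return field                      # text field with fielddata enabled
    raise ValueError(f"{field} is text without a keyword sub-field or fielddata")


properties = {
    "height": {"type": "long"},
    "nickname": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
}
print(aggregatable_field(properties, "height"))
print(aggregatable_field(properties, "nickname"))
```

Preferring the keyword sub-field avoids loading fielddata into memory, which the error message warns can be expensive.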
