Elasticsearch Search and Analytics

Date: 2021-02-18

Preface

This document is a brief record of some search statements in Elasticsearch (based on an Alibaba Cloud Elasticsearch environment).

Lightweight search

The simplest search is a plain GET request. The following uses GET to retrieve data:

GET /index_name/_search

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "index_name",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 1.0,
        "_source" : {
          "tags" : [
            "gamegroup"
          ],
          "update_time" : 1607156354,
          "@timestamp" : "2021-02-18T03:40:01.235Z",
          "id" : 11,
          "name" : "對馬島之魂",
          "status" : 0,
          "create_time" : 1603073095,
          "@version" : "1"
        }
      },
      ···
    ]
  }
}

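You can see that 10 documents were returned from index_name in the hits array. hits.total.value reports 20 matching documents, but only 10 are returned by default. Note that the response does not just say which documents matched; it also contains the full source data of each document.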

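For simple matches, you can also use a lightweight search: call the _search endpoint in the request path and pass the query condition in the q parameter, as follows: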

GET /index_name/_search?q=name:馬島

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 5.1153607,
    "hits" : [
      {
        "_index" : "index_name",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 5.1153607,
        "_source" : {
          "update_time" : 1607156354,
          "@timestamp" : "2021-02-18T03:40:01.235Z",
          "id" : 11,
          "name" : "對馬島之魂",
          "status" : 0,
          "create_time" : 1603073095,
          "@version" : "1"
        }
      }
    ]
  }
}
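When the lightweight search is issued from a script rather than a console, the q value has to be URL-encoded, since it may contain non-ASCII text. A minimal sketch in Python, assuming only the standard library; the helper name lite_search_path is hypothetical, and the function only builds the request path, it does not contact a cluster:

```python
from urllib.parse import quote


def lite_search_path(index: str, field: str, text: str) -> str:
    """Build the path of a lightweight search: /<index>/_search?q=<field>:<text>."""
    # Percent-encode the query as UTF-8; the colon separating field and text is kept as-is.
    query = quote(f"{field}:{text}", safe=":")
    return f"/{index}/_search?q={query}"


print(lite_search_path("index_name", "name", "馬島"))
```

This sketch does not escape characters that have a special meaning in the query-string syntax itself, so it is only suitable for plain keywords like the one above.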

Searching with Query DSL

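The queries above are the simplest way to search, but they are quite limited in practice, because real search requirements are rarely that simple. Elasticsearch therefore provides a rich, flexible query language called Query DSL, which supports more complex queries. For example: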

GET /index_name/_search
{
  "query": {
    "match": {
      "name": "活動"
    }
  },
  "sort": [
    {
      "id": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 10
}

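The query above finds documents whose name matches 活動, starts at offset 0, returns 10 documents, and sorts them by id in descending order. (When no sort is specified, results are sorted by relevance score, _score, by default.)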
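The from and size values are usually derived from a page number. As a rough sketch, the request body could be generated like this in Python; the helper page_query and its parameters are hypothetical names for illustration, and the function only builds the JSON body:

```python
import json


def page_query(field, text, page, page_size=10, sort_field="id", order="desc"):
    """Build a match query body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": {"match": {field: text}},
        "sort": [{sort_field: {"order": order}}],
        # Offset of the first document on the requested page.
        "from": (page - 1) * page_size,
        "size": page_size,
    }


# Page 1 reproduces a body with "from": 0 and "size": 10.
print(json.dumps(page_query("name", "活動", page=1), ensure_ascii=False, indent=2))
```

Deep pages are not free: by default Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), so this style of paging suits the first pages of a result set.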

More complex queries

Next, a somewhat more complex query: search the data under index_name for documents whose name matches the keyword "馬島" and whose rank_num is greater than 5.

GET /index_name/_search
{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "name" : "馬島" 
                }
            },
            "filter": {
                "range" : {
                    "rank_num" : { "gt" : 5 } 
                }
            }
        }
    }
}
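In this bool query the must clause contributes to the relevance score, while the filter clause only includes or excludes documents. A small sketch of how such a body might be assembled in Python, with the range bounds passed as keyword arguments; bool_range_query is a hypothetical helper, not part of any client library:

```python
def bool_range_query(match_field, text, range_field, *, gt=None, gte=None, lt=None, lte=None):
    """Build a bool query: a scored match clause plus a non-scoring range filter."""
    # Keep only the bounds that were actually supplied.
    candidates = {"gt": gt, "gte": gte, "lt": lt, "lte": lte}
    bounds = {name: value for name, value in candidates.items() if value is not None}
    if not bounds:
        raise ValueError("at least one range bound is required")
    return {
        "query": {
            "bool": {
                "must": {"match": {match_field: text}},
                "filter": {"range": {range_field: bounds}},
            }
        }
    }


body = bool_range_query("name", "馬島", "rank_num", gt=5)
print(body)
```

Calling it with gt=5 yields the same body as the request above; adding lte=100, for instance, would turn the filter into a bounded interval.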

Phrase matching

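A brief note on match versus match_phrase: match analyzes the query text into separate terms and, by default, matches documents that contain any of them, whereas match_phrase requires the terms to appear together, in the same order. So to match a phrase, simply use match_phrase in place of match.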

GET /index_name/_search
{
    "query" : {
        "bool": {
            "must": {
                "match_phrase" : {
                    "name" : "馬島"
                }
            },
            "filter": {
                "range" : {
                    "rank_num" : { "gt" : 5 }
                }
            }
        }
    }
}
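To make the difference between the two query types concrete, here is a deliberately simplified model in Python. It is not how Elasticsearch is implemented: it assumes a hypothetical analyzer that emits one token per character (roughly how Chinese text is commonly split when no dedicated Chinese analyzer is configured) and ignores scoring entirely.

```python
def tokens(text):
    # Hypothetical analyzer: one token per non-space character.
    return [ch for ch in text if not ch.isspace()]


def match(doc, query):
    # Simplified "match": at least one query token occurs anywhere in the document.
    doc_tokens = set(tokens(doc))
    return any(token in doc_tokens for token in tokens(query))


def match_phrase(doc, query):
    # Simplified "match_phrase": all query tokens occur consecutively, in order.
    d, q = tokens(doc), tokens(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


doc = "對馬島之魂"
print(match(doc, "島馬"), match_phrase(doc, "島馬"))  # reversed order
print(match(doc, "馬島"), match_phrase(doc, "馬島"))  # original order
```

In this model the reversed query 島馬 still satisfies match, because both characters occur in the document, but not match_phrase, because they do not occur in that order.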

Highlighted search (behavior in practice differed from the official documentation; to be confirmed)

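Many applications highlight snippets of text in each search result so that users can see why a document matched the query. Retrieving highlighted snippets in Elasticsearch is also easy: add a highlight parameter to the earlier query.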
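As a sketch of what adding the highlight parameter could look like, the Python helper below attaches a highlight section to an existing search body. The helper name with_highlight is hypothetical; the "highlight" / "fields" structure follows the documented search API, but given the discrepancy noted in the heading, treat it as something to verify against your own cluster:

```python
import json


def with_highlight(body, *fields):
    """Return a copy of a search body with a highlight section for the given fields."""
    out = dict(body)  # shallow copy, so the caller's dict is left unchanged
    # An empty object per field asks for that field with default highlight settings.
    out["highlight"] = {"fields": {field: {} for field in fields}}
    return out


base = {"query": {"match_phrase": {"name": "馬島"}}}
print(json.dumps(with_highlight(base, "name"), ensure_ascii=False, indent=2))
```

With default settings, each hit in the response is expected to carry a highlight object next to _source, in which the matched text is wrapped in <em> tags.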

Simple analytics

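Elasticsearch has a feature called aggregations, which lets you generate fine-grained analytics over your data. Aggregations are similar to GROUP BY in SQL, but more powerful. For example, to aggregate on the parent_id field: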

GET /index_name/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "parent_id" }
    }
  }
}

Response:

{
   ...
   "hits": { ... },
   "aggregations": {
      "all_interests": {
         "buckets": [
            {
              "key" : 0,
              "doc_count" : 66422
            },
            {
              "key" : -1,
              "doc_count" : 4716
            },
            {
              "key" : 27490,
              "doc_count" : 1684
            },
            ...
         ]
      }
   }
}

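The result shows how many documents there are for each value of the parent_id field. You can also add a query to the request, so that only the matching documents are aggregated: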

GET /index_name/_search
{
  "query": {
    "match": {
      "platform": "pc"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": { "field": "parent_id" }
    }
  }
}

Response:

{
   ...
   "hits": { ... },
   "aggregations": {
      "all_interests": {
         "buckets": [
            {
              "key" : 0,
              "doc_count" : 52428
            },
            {
              "key" : -1,
              "doc_count" : 3494
            },
            {
              "key" : 27490,
              "doc_count" : 1684
            },
            ...
         ]
      }
   }
}
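Both aggregation requests above share the same shape, with or without a query. A possible Python helper for generating them is sketched below; terms_agg_body and its parameters are hypothetical names. It also sets the top-level "size" to 0, which is an addition to the requests shown above: it asks Elasticsearch to return only the aggregation and no hits, which is usually what an analytics request wants.

```python
def terms_agg_body(agg_name, field, query=None, top_n=10):
    """Build a terms aggregation request body, optionally restricted by a query."""
    body = {
        "size": 0,  # return no hits, only the aggregation result
        "aggs": {
            # "size" inside terms is the number of buckets returned (10 by default).
            agg_name: {"terms": {"field": field, "size": top_n}}
        },
    }
    if query is not None:
        body["query"] = query
    return body


print(terms_agg_body("all_interests", "parent_id"))
print(terms_agg_body("all_interests", "parent_id", query={"match": {"platform": "pc"}}))
```

Raising top_n is one way to see more than the first ten parent_id values, which is presumably why the bucket lists above end in an ellipsis.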

Nested aggregations

Query:

GET /megacorp/employee/_search
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}

Response:

...
  "all_interests": {
     "buckets": [
        {
           "key": "music",
           "doc_count": 2,
           "avg_age": {
              "value": 28.5
           }
        },
        {
           "key": "forestry",
           "doc_count": 1,
           "avg_age": {
              "value": 35
           }
        },
        {
           "key": "sports",
           "doc_count": 1,
           "avg_age": {
              "value": 25
           }
        }
     ]
  }

The response is easy to read: for example, the last bucket shows that sports has 1 document, with an average age of 25.
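On the client side, a nested result like this is often flattened into rows before it is displayed. A minimal Python sketch, using the bucket values from the response above as sample data; flatten_buckets is a hypothetical helper and assumes each bucket carries a single-value metric such as avg:

```python
def flatten_buckets(agg, metric):
    """Turn terms buckets with one nested metric into (key, doc_count, value) rows."""
    return [(b["key"], b["doc_count"], b[metric]["value"]) for b in agg["buckets"]]


# Sample data copied from the all_interests response shown above.
all_interests = {
    "buckets": [
        {"key": "music", "doc_count": 2, "avg_age": {"value": 28.5}},
        {"key": "forestry", "doc_count": 1, "avg_age": {"value": 35}},
        {"key": "sports", "doc_count": 1, "avg_age": {"value": 25}},
    ]
}

for key, count, avg in flatten_buckets(all_interests, "avg_age"):
    print(f"{key}: {count} docs, average age {avg}")
```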
