Painstakingly Compiled: The R of ES (Reads) in One Article, Queries and Aggregations

Date: 2020-09-25


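The index queried throughout this article is company. It has the following fields; here is a sample document:

"id": "1", // id
"name": "張三", // name
"sex": "男", // sex
"age": 49, // age
"birthday": "1970-01-01", // date of birth
"position": "董事長", // job title
"joinTime": "1990-01-01", // hire date, in date format
"modified": "1562167817000", // last-modified time, in milliseconds
"created": "1562167817000"  // creation time, in milliseconds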

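Each search below translates a relational-database (SQL) statement into the corresponding es search API call and its parameters.

Requests are mostly sent with POST, with the search expressed in DSL (the structured query language).

1. Queries

I. Simple search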

【sql】
  select * from company
【ES】There are two ways
  1、GET http://192.168.197.100:9200/company/_search
  2、POST http://192.168.197.100:9200/company/_search
    {
        "query":{"match_all":{}}
    }
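
Since every example in this article is a POST of a DSL body to _search, here is a minimal sketch of how such a request could be assembled in Python using only the standard library. The helper name build_search_request is an assumption made for illustration; the host is the one used in the examples above. The request is only built here, not sent.

```python
import json
import urllib.request

ES_URL = "http://192.168.197.100:9200"  # host taken from the examples above


def build_search_request(index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST _search request carrying a DSL body."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("company", {"query": {"match_all": {}}})

# To run it against a live cluster (not executed here):
#   with urllib.request.urlopen(request) as resp:
#       hits = json.load(resp)["hits"]["hits"]
```

The same helper can carry any of the query or aggregation bodies shown later; only the dictionary passed as body changes.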

II. Exact match (the query text is not analyzed)

【sql】
  select * from company where name='張三'
【ES】
  POST http://192.168.197.100:9200/company/_search
    {
      "query":{
        "term":{"name.keyword":"張三"}
      }
    }

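term is used for exact matching, similar to "=" in a SQL statement. The name field uses the default standard analyzer, which splits "張三" into "張" and "三", so a term query on name would not match the person named "張三". name.keyword, by contrast, is not analyzed, so the whole value is compared.

terms can be used as well; it matches one field against several values, for example: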
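
To make the difference between name and name.keyword concrete, here is a toy model in Python. It is only a sketch: toy_standard_analyze is a hypothetical stand-in that emits one token per character, which roughly mimics what the standard analyzer does to Chinese text, and the inverted index is a plain dictionary rather than anything es actually uses.

```python
def toy_standard_analyze(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer on CJK text: one token per character."""
    return [ch for ch in text if not ch.isspace()]


def build_inverted_index(docs: dict[str, str], analyzed: bool) -> dict[str, set[str]]:
    """Map each indexed token to the ids of the documents that contain it."""
    index: dict[str, set[str]] = {}
    for doc_id, value in docs.items():
        tokens = toy_standard_analyze(value) if analyzed else [value]
        for token in tokens:
            index.setdefault(token, set()).add(doc_id)
    return index


def term_lookup(index: dict[str, set[str]], term: str) -> set[str]:
    """A term query looks the query text up as-is, without analyzing it."""
    return index.get(term, set())


names = {"1": "張三", "2": "李四"}
name_index = build_inverted_index(names, analyzed=True)      # behaves like "name"
keyword_index = build_inverted_index(names, analyzed=False)  # behaves like "name.keyword"
```

In this model, looking up "張三" in name_index finds nothing because only "張" and "三" were indexed, while the same lookup in keyword_index finds document 1.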

【sql】
  select * from company where name in ('張三','李四')
【ES】
  POST http://192.168.197.100:9200/company/_search
    {
      "query": {
        "terms": {
          "name.keyword": ["張三", "李四"]
        }
      }
    }

III. Fuzzy match

【sql】
  select * from company where name like '%張%'
【ES】
  POST http://192.168.197.100:9200/company/_search
    {
      "query": {
        "match": {
          "name": "張"
        }
      }
    }

The query above returns the documents whose name contains the character "張".

IV. Paginated query

【sql】
  select * from company limit 0,10
【ES】
  POST http://192.168.197.100:9200/company/_search
    {
      "from":0,
      "size":10
    }

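[Note] from + size cannot exceed 10000 by default. The limit can be raised (through the index.max_result_window setting), but this is not recommended: es is sharded, so the same query runs on every shard and the results are then merged and sorted, and a very large window can exhaust memory. For example, with 5 shards each returning 10000 documents, 50000 documents have to be sorted before the requested page is cut out.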
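
The arithmetic in the note can be sketched as follows. Both function names are assumptions made for illustration, and the cost model (every shard returns from + size documents to the coordinating node) is the simplified picture described above, not a measurement.

```python
MAX_RESULT_WINDOW = 10000  # default limit on from + size described above


def page_params(page: int, page_size: int) -> dict:
    """Translate a 1-based page number into from/size, refusing deep pages."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    if start + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window")
    return {"from": start, "size": page_size}


def docs_to_sort(from_: int, size: int, shards: int) -> int:
    """Documents the coordinating node must merge: (from + size) per shard."""
    return (from_ + size) * shards
```

For instance, docs_to_sort(9990, 10, 5) gives the 50000 documents mentioned in the note, just to return 10 hits.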

V. Range query with sorting

【sql】
  select * from company where age>=10 and age<=50 order by age desc
【ES】
  POST http://192.168.197.100:9200/company/_search
    {
      "query":{
        "range":{
          "age":{
            "gte":10,
            "lte":50
          }
        }
      },
      "sort":{
        "age":{
          "order":"desc"
        }
      }
    }

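A range query uses range, which accepts four parameters:

(1) gte: greater than or equal to

(2) gt: greater than

(3) lte: less than or equal to

(4) lt: less than

Sorting uses sort; desc is descending and asc is ascending, and several sort fields can be given.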
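
As a small illustration of how the four bounds and the sort order combine, the following Python sketch builds the request body from optional arguments. The function range_query is hypothetical; it only produces the dictionary that would be sent as the POST body.

```python
def range_query(field, *, gte=None, gt=None, lte=None, lt=None, order=None):
    """Build a range query body, optionally sorted on the same field."""
    candidates = {"gte": gte, "gt": gt, "lte": lte, "lt": lt}
    bounds = {name: value for name, value in candidates.items() if value is not None}
    if not bounds:
        raise ValueError("at least one bound is required")
    body = {"query": {"range": {field: bounds}}}
    if order is not None:
        if order not in ("asc", "desc"):
            raise ValueError("order must be 'asc' or 'desc'")
        # a list allows more sort fields to be appended later
        body["sort"] = [{field: {"order": order}}]
    return body


age_query = range_query("age", gte=10, lte=50, order="desc")
```

Here sort is written as a list, which is the form that accepts several sort fields in priority order.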

VI. Multi-field match query

【sql】
  select * from company where sex like '%男%' or name like '%男%'
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
      "query":{
        "multi_match":{
          "query":"男",
          "fields":["name","sex"]
        }
      }
    }

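VII. bool query (structured query)

A structured query has four main parts: must, should, must_not and filter.

(1) must: the conditions are ANDed; all of them must match

(2) should: the conditions are ORed; matching any one is enough. Note that when the bool query also contains must or filter, should clauses are optional by default and only raise the score, unless minimum_should_match is set

(3) must_not: the conditions are ANDed; none of them may match

(4) filter: a filtering clause. Unlike the other clauses it does not compute the _score relevance, so it is faster than a scoring query

For example:

Conditions:

Age from 10 (inclusive) to 50 (exclusive), and sex is 男

Sex must not be 女

id is between 1 and 8, or the position contains the character "董"

Department is 市場部

【sql】
  select * from company where (age>=10 and age<50 and sex='男')
   and (sex!='女') 
   and (id in (1,2,3,4,5,6,7,8) or position like '%董%')
   and departments in ('市場部')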
【ES】
  POST http://192.168.197.100:9200/company/_search
   {
      "query":{
        "bool":{
          "must":[
            {"term":{"sex":"男"}},
            {"range":{
              "age":{
                "gte":10,
                "lt":50
              }
            }}
          ],
          "must_not":[
            {"term":{"sex":"女"}}  
          ],
          "should":[
            {"terms":{"id":[1,2,3,4,5,6,7,8]}},
            {"match":{"position":"董"}}
          ],
          "minimum_should_match":1, //with must present, should is optional unless this is set
          "filter":[
            {"term":{"departments.keyword":"市場部"}}  
          ]
        }
      }
    }

In addition, bool queries can be nested: a complete bool query can be placed inside must, must_not, should or filter.
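
The matching logic of the four clauses can be modelled in a few lines of Python. This is a toy evaluator over in-memory dictionaries, not es code; bool_matches and the sample document are assumptions made for illustration. It follows the documented default that minimum_should_match is 1 only when there are should clauses and no must or filter clauses, and 0 otherwise.

```python
from typing import Callable

Doc = dict
Clause = Callable[[Doc], bool]


def bool_matches(doc, must=(), should=(), must_not=(), filter_=(),
                 minimum_should_match=None) -> bool:
    """Decide whether doc matches a bool query (scoring is ignored)."""
    required = list(must) + list(filter_)
    optional = list(should)
    if not all(clause(doc) for clause in required):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if minimum_should_match is None:
        minimum_should_match = 1 if optional and not required else 0
    return sum(1 for clause in optional if clause(doc)) >= minimum_should_match


is_male = lambda d: d.get("sex") == "男"
age_ok = lambda d: 10 <= d.get("age", -1) < 50
low_id = lambda d: d.get("id") in range(1, 9)
is_director = lambda d: "董" in d.get("position", "")

employee = {"id": 20, "sex": "男", "age": 30, "position": "程序員"}  # hypothetical document

loose = bool_matches(employee, must=[is_male, age_ok], should=[low_id, is_director])
strict = bool_matches(employee, must=[is_male, age_ok], should=[low_id, is_director],
                      minimum_should_match=1)
```

The hypothetical employee satisfies both must conditions but neither should condition, so it matches the loose form and is rejected once minimum_should_match is 1, which is the behaviour an SQL "and (a or b)" expects.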

VIII. Wildcard query

?: matches exactly one character

*: matches zero or more characters

【sql】
  select * from company where departments like '%部'
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
      "query":{
        "wildcard":{
                "departments.keyword":"*部"
            }
      }
    }
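
The two wildcard characters behave much like shell-style patterns, so their semantics can be tried out locally with Python's fnmatch module. This is only an analogy: fnmatch additionally gives special meaning to square brackets, which the wildcard query does not, and the department values below are made up for illustration.

```python
from fnmatch import fnmatchcase


def wildcard_match(value: str, pattern: str) -> bool:
    """Case-sensitive match where ? is one character and * is any run of characters."""
    return fnmatchcase(value, pattern)


departments = ["市場部", "市場部門", "部", "技術部"]  # hypothetical values

ends_with_bu = [d for d in departments if wildcard_match(d, "*部")]
two_then_bu = [d for d in departments if wildcard_match(d, "??部")]
```

Note that "*部" also accepts the bare value "部", because * may match zero characters, while "??部" requires exactly two characters before it.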

IX. Prefix query

【sql】
  select * from company where departments like '市%'
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
      "query":{
        "prefix":{
                "departments.keyword":"市"
            }
      }
    }

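X. Querying null values

Suppose I add a document that has no sex field, or whose sex field is null when it is added. How do we query for such documents?

// add a document
POST http://192.168.197.100:9200/company/_doc
// a document without the sex field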
 {
  "id": "1",
    "name": "張十",
    "age": 54,
    "birthday": "1960-01-01",
    "position": "程序員",
    "joinTime": "1980-01-01",
    "modified": "1562167817000",
    "created": "1562167817000"
}

// a document whose sex field is null
 {
  "id": "1",
    "name": "張十一",
    "age": 64,
    "sex":null,
    "birthday": "1960-01-01",
    "position": "程序員",
    "joinTime": "1980-01-01",
    "modified": "1562167817000",
    "created": "1562167817000"
}

Both cases are queried the same way, with an exists query. For example, the query below matches both of the documents added above.

【sql】
  select * from company where sex is null 
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
    "query":{
      "bool":{
        "must_not":[
          {"exists":
            {"field":"sex"}
          }
        ]
      }
    }
  }
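
Why both documents match can be sketched with a toy version of the exists check in Python. The function below is an illustrative model, not es code: it treats a missing field, a null value, and an array holding only nulls as "does not exist", which is the behaviour the must_not/exists query relies on.

```python
def exists(doc: dict, field: str) -> bool:
    """Toy model of the exists query: the field must hold at least one non-null value."""
    value = doc.get(field)
    if value is None:
        return False
    if isinstance(value, list):
        return any(item is not None for item in value)
    return True


docs = [
    {"name": "張十", "age": 54},                  # no sex field at all
    {"name": "張十一", "age": 64, "sex": None},   # sex explicitly null
    {"name": "張三", "age": 49, "sex": "男"},
]

# Equivalent of: bool -> must_not -> exists(sex)
missing_sex = [d["name"] for d in docs if not exists(d, "sex")]
```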

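2. Filters (the filtered query was removed in es 5)

A filter is very similar to a query: both are used to look up data. The difference is that a filter maintains a cached array recording which documents matched. For instance, if an index holds two documents and a filter matches one but not the other, the array is [1,0], where 1 marks the matching document.

For conditions that are queried frequently, prefer a filter over a query.

The request body of a filter is almost the same as that of a query; it is just wrapped in one more level, filtered.

For example: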

【sql】
  select * from company where departments like '%市%'
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
      "query":{
        "filtered":{
          "filter":{
             "match":{
                "departments":"市"
            }
          }
        }
      }
    }
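
Because filtered is no longer available from es 5 onward, the usual replacement is a bool query with the condition placed in its filter clause. The following Python sketch rewrites a legacy body into that shape; filtered_to_bool is a hypothetical helper and the sample body is made up for illustration.

```python
def filtered_to_bool(body: dict) -> dict:
    """Rewrite {"query": {"filtered": ...}} into the equivalent bool query."""
    filtered = body["query"]["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = [filtered["query"]]     # scoring part
    if "filter" in filtered:
        bool_query["filter"] = [filtered["filter"]]  # non-scoring part
    converted = {key: value for key, value in body.items() if key != "query"}
    converted["query"] = {"bool": bool_query}
    return converted


legacy = {
    "query": {"filtered": {"filter": {"term": {"sex": "男"}}}},
    "size": 10,
}
modern = filtered_to_bool(legacy)
```

The filter clause keeps the properties described above: it does not contribute to _score and its results can be cached.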

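3. Aggregations

Aggregations let you run statistical analysis over es documents, similar to group by in a relational database. There are many other aggregations as well, such as maximum, average and so on.

The syntax is as follows: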

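POST http://192.168.197.100:9200/company/_search
{
  "aggs": {
    "NAME": { // name under which the result is returned
      "AGG_TYPE": { // the concrete aggregation method
        TODO:  // the aggregation body: the field to aggregate on, etc.
      },
      "aggs": { // optional: sub-aggregations are nested here
        TODO:
      }
    }
    TODO:  // further sibling aggregations can be added here
  }
}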

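Aggregation analysis comes in four kinds: metric aggregations, bucket aggregations, pipeline aggregations and matrix aggregations. Metric and bucket aggregations are the most commonly used, and they are what this article covers.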

I. Metric aggregations

(1) max: the maximum value of a field

【sql】
  select max(age) from company 
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
    "aggs":{
      "max_age":{
        "max":{"field":"age"}
      }
    },
    "size":0 // size=0 so that only the aggregation result is returned
  }

The result is as follows:

{
    "aggregations": {
        "max_age": {
            "value": 64
        }
    }
}

(2) min: the minimum value of a field

【sql】
  select min(age) from company 
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
      "aggs":{
        "min_age":{
          "min":{"field":"age"}
        }
      },
      "size":0
    }

The result is as follows:

{
    "aggregations": {
        "min_age": {
            "value": 1
        }
    }
}

(3) sum: the total of a field

【sql】
  select sum(age) from company 
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
      "aggs":{
        "sum_age":{
          "sum":{"field":"age"}
        }
      },
      "size":0
    }

The result is as follows:

{
    "aggregations": {
        "sum_age": {
            "value": 315
        }
    }
}

(4) avg: the average value of a field

【sql】
  select avg(age) from company 
【ES】
  POST http://192.168.197.100:9200/company/_search
 {
    "aggs":{
      "age_avg":{
        "avg":{"field":"age"}
      }
    },
    "size":0
  }

The result is as follows:

{
    "aggregations": {
        "age_avg": {
            "value": 35
        }
    }
}

(5) cardinality: the number of distinct values of a field (the count is approximate)

【sql】
  select count(distinct(sex)) from company 
【ES】
  POST http://192.168.197.100:9200/company/_search
 {
    "aggs":{
      "sex_distinct":{
        "cardinality":{"field":"sex"}
      }
    },
    "size":0
  }

The result is as follows:

{
    "aggregations": {
        "sex_distinct": {
            "value": 2
        }
    }
}

(6) stats aggregation: returns the five metrics count, max, min, avg and sum for a field in one go

【sql】
  select count(age),sum(age),avg(age),max(age),min(age) from company
【ES】
  POST http://192.168.197.100:9200/company/_search
 {
      "aggs":{
        "age_stats":{
          "stats":{"field":"age"}
        }
      },
      "size":0
    }

The result is as follows:

{
   "aggregations": {
        "age_stats": {
            "count": 9,
            "min": 1,
            "max": 64,
            "avg": 35,
            "sum": 315
        }
    }
}

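(7) extended stats aggregation: a step up from stats; it additionally returns the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations

【sql】
  -- I can't write the SQL for this one. A math major who has forgotten all the formulas, shame on me  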
【ES】
  POST http://192.168.197.100:9200/company/_search
 {
      "aggs":{
        "age_extended_stats":{
          "extended_stats":{"field":"age"}
        }
      },
      "size":0
    }

The result is as follows:

{
    "aggregations": {
        "age_extended_stats": {
            "count": 9,
            "min": 1,
            "max": 64,
            "avg": 35,
            "sum": 315,
            "sum_of_squares": 13857,
            "variance": 314.6666666666667,
            "std_deviation": 17.73884626086676,
            "std_deviation_bounds": {
                "upper": 70.47769252173353,
                "lower": -0.4776925217335233
            }
        }
    }
}
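
The extra figures can be reproduced from count, sum and sum_of_squares alone. The short Python check below assumes the variance is the population variance (mean of squares minus squared mean) and that the bounds are the mean plus/minus two standard deviations, which is what the numbers in the result above are consistent with.

```python
import math

# Values copied from the result above
count, total, sum_of_squares = 9, 315, 13857

avg = total / count                              # mean
variance = sum_of_squares / count - avg ** 2     # population variance
std_deviation = math.sqrt(variance)
upper = avg + 2 * std_deviation                  # std_deviation_bounds.upper
lower = avg - 2 * std_deviation                  # std_deviation_bounds.lower

print(avg, variance, std_deviation, upper, lower)
```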

(8) percentiles aggregation: percentile statistics over the values of a field

【ES】
  POST http://192.168.197.100:9200/company/_search
 {
    "aggs":{
      "age_percentiles":{
        "percentiles":{"field":"age"}
      }
    },
    "size":0
  }

The result is as follows:

{
    "aggregations": {
        "age_percentiles": {
            "values": {
                "1.0": 1,
                "5.0": 1,
                "25.0": 26,
                "50.0": 29,
                "75.0": 50.25,
                "95.0": 64,
                "99.0": 64
            }
        }
    }
}

(9) value count aggregation: counts the documents that have a value for a given field

【sql】
  select sum(case when sex is null then 0 else 1 end) from company 
【ES】
  POST http://192.168.197.100:9200/company/_search
 {
    "aggs":{
      "sex_value_count":{
        "value_count":{"field":"sex"}
      }
    },
    "size":0
  }

In this data set there are 8 documents in total, including the two documents without the sex field that I added earlier; value_count counts only the documents that have a value for sex.

II. Bucket aggregations

Bucket aggregations are the equivalent of the group by clause in sql.

(1) terms aggregation: grouped counts

【sql】
  select sex,count(1) from company group by sex
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
    "aggs":{
      "sex_groupby":{
        "terms":{"field":"sex"}
      }
    },
    "size":0
  }

The result is as follows:

{    
  "aggregations": {
        "sex_groupby": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "男",
                    "doc_count": 5
                },
                {
                    "key": "女",
                    "doc_count": 1
                }
            ]
        }
    }
}

(2) Other aggregations on other fields can be nested under a terms grouping

【sql】
  SELECT sex,count(1),AVG(age) from company group by sex
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
    "aggs":{
      "sex_groupby":{
        "terms":{"field":"sex"},
        "aggs":{
          "avg_age":{
            "avg":{"field":"age"}
          }
        }
      }
    },
    "size":0
  }

The result is as follows:

{
    "aggregations": {
        "sex_groupby": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "男",
                    "doc_count": 5,
                    "avg_age": {
                        "value": 33.8
                    }
                },
                {
                    "key": "女",
                    "doc_count": 1,
                    "avg_age": {
                        "value": 27
                    }
                }
            ]
        }
    }
}
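
What a terms bucket with a nested avg computes can be mimicked locally. The Python sketch below is a toy model over a made-up list of documents; terms_with_avg is a hypothetical name. Like the result above, it orders buckets by doc_count descending and puts documents without the grouping field in no bucket.

```python
from collections import defaultdict


def terms_with_avg(docs: list, group_field: str, value_field: str) -> list:
    """Group docs by group_field, then report doc_count and average per bucket."""
    groups = defaultdict(list)
    for doc in docs:
        key = doc.get(group_field)
        if key is None:
            continue  # documents without the field fall into no bucket
        groups[key].append(doc[value_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    buckets.sort(key=lambda bucket: bucket["doc_count"], reverse=True)
    return buckets


sample = [  # hypothetical documents
    {"sex": "女", "age": 27},
    {"sex": "男", "age": 30},
    {"sex": "男", "age": 40},
    {"age": 54},
]
buckets = terms_with_avg(sample, "sex", "age")
```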

(3) filter aggregation: aggregates only the documents that satisfy the filter condition

【sql】
  select sum(age) from company where sex = '男'
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
    "aggs":{
      "sex_filter":{
        "filter":{"term":{"sex":"男"}},
        "aggs":{
          "sum_age":{
            "sum":{"field":"age"}
          }
        }
      }
    },
    "size":0
}

The result is as follows:

{
    "aggregations": {
        "sex_filter": {
            "doc_count": 5,
            "sum_age": {
                "value": 169
            }
        }
    }
}

(4) filters aggregation: several filters at once, one bucket per filter

【sql】
  SELECT sex,count(1),sum(age) from company where sex in ('男','女') group by sex
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
  "aggs":{
    "sex_filter":{
      "filters":{
        "filters":[{"term":{"sex":"男"}},{"term":{"sex":"女"}}]
      },
      "aggs":{
        "sum_age":{
          "sum":{"field":"age"}
        }
      }
    }
  },
  "size":0
}

The result is as follows:

{
    "aggregations": {
        "sex_filter": {
            "buckets": [
                {
                    "doc_count": 5,
                    "sum_age": {
                        "value": 169
                    }
                },
                {
                    "doc_count": 1,
                    "sum_age": {
                        "value": 27
                    }
                }
            ]
        }
    }
}

(6) range aggregation: shows how the data is distributed. In each range, from is inclusive and to is exclusive

【sql】
  SELECT sum(case when age<30 then 1 else 0 end), 
       sum(case when age>=30 and age<50 then 1 else 0 end),
       sum(case when age>=50 then 1 else 0 end)
  from company 
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
    "aggs":{
      "age_range":{
        "range":{
          "field":"age",
          "ranges":[
            {"to":30},
            {"from":30,"to":50},
            {"from":50}
          ]
        }
      }
    },
    "size":0
}

The result is as follows:

{
    "aggregations": {
        "age_range": {
            "buckets": [
                {
                    "key": "*-30.0",
                    "to": 30,
                    "doc_count": 5
                },
                {
                    "key": "30.0-50.0",
                    "from": 30,
                    "to": 50,
                    "doc_count": 2
                },
                {
                    "key": "50.0-*",
                    "from": 50,
                    "doc_count": 2
                }
            ]
        }
    }
}
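
The boundary handling of the range aggregation is easy to get wrong, so here is a small Python model of it. It assumes the documented rule that each bucket includes its from value and excludes its to value; range_buckets and the sample ages are made up for illustration.

```python
def range_buckets(values: list, ranges: list) -> list:
    """Count values per range, with from inclusive and to exclusive."""
    buckets = []
    for spec in ranges:
        low, high = spec.get("from"), spec.get("to")
        count = sum(
            1
            for value in values
            if (low is None or value >= low) and (high is None or value < high)
        )
        buckets.append({"from": low, "to": high, "doc_count": count})
    return buckets


ages = [29, 30, 49, 50]  # hypothetical values sitting on the boundaries
result = range_buckets(ages, [{"to": 30}, {"from": 30, "to": 50}, {"from": 50}])
counts = [bucket["doc_count"] for bucket in result]
```

With these boundaries every value lands in exactly one bucket: 30 goes to the middle bucket and 50 to the last one.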

(7) missing aggregation: counts the documents that lack a given field or hold null for it

【sql】
  SELECT count(1) from company where sex is null
  
【ES】
  POST http://192.168.197.100:9200/company/_search
  {
  "aggs":{
    "missing_sex":{
      "missing":{"field":"sex"}
    }
  },
  "size":0
}

The result is as follows:

{
    "aggregations": {
        "missing_sex": {
            "doc_count": 4
        }
    }
}

The same can be done with a filter aggregation, for example the following, which returns the same result:

 POST http://192.168.197.100:9200/company/_search
  {
  "aggs":{
    "missing_sex":{
      "filter":{
        "bool":{
          "must_not":[
            {"exists":{"field":"sex"}  }
          ]
        }
      }
    }
  },
  "size":0
}