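Readers with a SQL background who are learning Elasticsearch tend, when faced with a query requirement, to think first about how to write it in SQL and only then about how to express it in Elasticsearch's Query DSL. This post shows how one common SQL statement can be implemented in Elasticsearch's query language.
Suppose we have a dataset of cars in which each car has fields such as model and color, and we want the top 2 models, ranked by number of colors, that come in more than 1 distinct color. The data model for a car is as follows: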
{ "model":"modelA", "color":"red" }
Suppose we have a cars table and create the test data with the following statements.
INSERT INTO cars (model,color) VALUES ('A','red');
INSERT INTO cars (model,color) VALUES ('A','white');
INSERT INTO cars (model,color) VALUES ('A','black');
INSERT INTO cars (model,color) VALUES ('A','yellow');
INSERT INTO cars (model,color) VALUES ('B','red');
INSERT INTO cars (model,color) VALUES ('B','white');
INSERT INTO cars (model,color) VALUES ('C','black');
INSERT INTO cars (model,color) VALUES ('C','red');
INSERT INTO cars (model,color) VALUES ('C','white');
INSERT INTO cars (model,color) VALUES ('C','yellow');
INSERT INTO cars (model,color) VALUES ('C','blue');
INSERT INTO cars (model,color) VALUES ('D','red');
INSERT INTO cars (model,color) VALUES ('A','red');
The SQL statement for this requirement is fairly simple:
SELECT model, COUNT(DISTINCT color) color_count
FROM cars
GROUP BY model
HAVING color_count > 1
ORDER BY color_count DESC
LIMIT 2;
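In this query, GROUP BY groups the rows by model, HAVING color_count > 1 keeps only models with more than 1 distinct color, ORDER BY color_count DESC sorts the result by color count in descending order, and LIMIT 2 returns only the first 2 rows.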
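To check what the statement should return before moving to Elasticsearch, here is a minimal sketch that runs it against an in-memory SQLite database instead of MySQL. The function name `top_models` and its parameters are illustrative, and the HAVING clause repeats the aggregate rather than using the alias, which is an assumption made for portability across SQL dialects.

```python
import sqlite3

# Rows mirror the INSERT statements in the article: (model, color).
ROWS = [
    ("A", "red"), ("A", "white"), ("A", "black"), ("A", "yellow"),
    ("B", "red"), ("B", "white"),
    ("C", "black"), ("C", "red"), ("C", "white"), ("C", "yellow"), ("C", "blue"),
    ("D", "red"),
    ("A", "red"),  # repeated color for model A; COUNT(DISTINCT ...) counts it once
]


def top_models(rows, min_colors=1, limit=2):
    """Return (model, color_count) for models with more than min_colors colors."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE cars (model TEXT, color TEXT)")
        conn.executemany("INSERT INTO cars (model, color) VALUES (?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT model, COUNT(DISTINCT color) AS color_count
            FROM cars
            GROUP BY model
            HAVING COUNT(DISTINCT color) > ?
            ORDER BY color_count DESC
            LIMIT ?
            """,
            (min_colors, limit),
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(top_models(ROWS))  # [('C', 5), ('A', 4)]
```

Model C has 5 distinct colors and model A has 4, so these two rows are the target that the Elasticsearch query needs to reproduce.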
So how do we implement this requirement in Elasticsearch?
First we need to insert the test data into Elasticsearch. Here we use the bulk API, as shown below:
POST _bulk
{"index":{"_index":"cars","_type":"doc","_id":"1"}}
{"model":"A","color":"red"}
{"index":{"_index":"cars","_type":"doc","_id":"2"}}
{"model":"A","color":"white"}
{"index":{"_index":"cars","_type":"doc","_id":"3"}}
{"model":"A","color":"black"}
{"index":{"_index":"cars","_type":"doc","_id":"4"}}
{"model":"A","color":"yellow"}
{"index":{"_index":"cars","_type":"doc","_id":"5"}}
{"model":"B","color":"red"}
{"index":{"_index":"cars","_type":"doc","_id":"6"}}
{"model":"B","color":"white"}
{"index":{"_index":"cars","_type":"doc","_id":"7"}}
{"model":"C","color":"black"}
{"index":{"_index":"cars","_type":"doc","_id":"8"}}
{"model":"C","color":"red"}
{"index":{"_index":"cars","_type":"doc","_id":"9"}}
{"model":"C","color":"white"}
{"index":{"_index":"cars","_type":"doc","_id":"10"}}
{"model":"C","color":"yellow"}
{"index":{"_index":"cars","_type":"doc","_id":"11"}}
{"model":"C","color":"blue"}
{"index":{"_index":"cars","_type":"doc","_id":"12"}}
{"model":"D","color":"red"}
{"index":{"_index":"cars","_type":"doc","_id":"13"}}
{"model":"A","color":"red"}
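Here the index is cars and the type is doc, and the data matches the MySQL data exactly. You can run the command above in Kibana's Dev Tools, then run the following query to verify that the data has been stored.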
GET cars/_search
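The bulk request body is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. As a rough sketch, the body could be generated from the same rows used for the SQL table. The helper name `build_bulk_body` is hypothetical, and the `_type` field is kept only because the article's request uses it; later Elasticsearch versions removed mapping types, so it may need to be dropped there.

```python
import json

# Rows mirror the article's test data: (model, color).
ROWS = [
    ("A", "red"), ("A", "white"), ("A", "black"), ("A", "yellow"),
    ("B", "red"), ("B", "white"),
    ("C", "black"), ("C", "red"), ("C", "white"), ("C", "yellow"), ("C", "blue"),
    ("D", "red"),
    ("A", "red"),
]


def build_bulk_body(rows, index="cars", doc_type="doc"):
    """Build an NDJSON bulk body: an action line, then a source line, per row."""
    lines = []
    for doc_id, (model, color) in enumerate(rows, start=1):
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        source = {"model": model, "color": color}
        # Compact separators keep every JSON object on a single line.
        lines.append(json.dumps(action, separators=(",", ":")))
        lines.append(json.dumps(source, separators=(",", ":")))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body(ROWS), end="")
```

The printed text can be pasted under `POST _bulk` in Dev Tools or sent as the body of an HTTP request.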
GROUP BY in SQL corresponds to the Terms Aggregation in Elasticsearch, which is a bucket aggregation. The equivalent of GROUP BY model is shown below:
GET cars/_search
{
  "size": 0,
  "aggs": {
    "models": {
      "terms": {
        "field": "model.keyword"
      }
    }
  }
}
The result is as follows:
{
  "took": 161,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 13, "max_score": 0, "hits": [] },
  "aggregations": {
    "models": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "A", "doc_count": 5 },
        { "key": "C", "doc_count": 5 },
        { "key": "B", "doc_count": 2 },
        { "key": "D", "doc_count": 1 }
      ]
    }
  }
}
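The content under the aggregations key is the result we are after: one bucket per model, with doc_count giving the number of documents in each.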
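Conceptually, a terms aggregation groups documents by a field value and counts each group. The sketch below imitates that in plain Python for the model values in the test data. The function `terms_buckets` is illustrative only; it assumes buckets are ordered by document count descending with ties broken by key ascending, and that at most `size` buckets are returned (10 by default in Elasticsearch).

```python
from collections import Counter

# The model field of the 13 test documents, in insertion order.
MODELS = ["A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "D", "A"]


def terms_buckets(values, size=10):
    """Group values into buckets shaped like a terms aggregation response."""
    counts = Counter(values)
    # Assumed ordering: doc_count descending, then key ascending on ties.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


if __name__ == "__main__":
    for bucket in terms_buckets(MODELS):
        print(bucket)
```

For this data the sketch yields the same four buckets, in the same order, as the response above.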
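The SQL statement also contains COUNT(DISTINCT color) color_count, which computes the number of colors for each model. In Elasticsearch this calls for a metric aggregation, Cardinality, which counts distinct values. The query is as follows: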
GET cars/_search
{
  "size": 0,
  "aggs": {
    "models": {
      "terms": {
        "field": "model.keyword"
      },
      "aggs": {
        "color_count": {
          "cardinality": {
            "field": "color.keyword"
          }
        }
      }
    }
  }
}
It returns the following:
{
  "took": 74,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 13, "max_score": 0, "hits": [] },
  "aggregations": {
    "models": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "A", "doc_count": 5, "color_count": { "value": 4 } },
        { "key": "C", "doc_count": 5, "color_count": { "value": 5 } },
        { "key": "B", "doc_count": 2, "color_count": { "value": 2 } },
        { "key": "D", "doc_count": 1, "color_count": { "value": 1 } }
      ]
    }
  }
}
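In the result, color_count is the number of distinct colors for each model. However, every model is returned, and we only want models with more than 1 color, so we still need to add a filter condition.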
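The distinct count per bucket can be pictured as collecting each model's colors into a set and taking its size. The sketch below does exactly that; `color_cardinality` is a hypothetical helper, and it computes an exact count, whereas the Cardinality aggregation is an approximate algorithm that is expected to be accurate for small value sets like this one.

```python
from collections import defaultdict

# Rows mirror the article's test data: (model, color).
ROWS = [
    ("A", "red"), ("A", "white"), ("A", "black"), ("A", "yellow"),
    ("B", "red"), ("B", "white"),
    ("C", "black"), ("C", "red"), ("C", "white"), ("C", "yellow"), ("C", "blue"),
    ("D", "red"),
    ("A", "red"),
]


def color_cardinality(rows):
    """Return the exact number of distinct colors for each model."""
    colors_by_model = defaultdict(set)
    for model, color in rows:
        # Adding to a set discards repeats, such as the second ("A", "red").
        colors_by_model[model].add(color)
    return {model: len(colors) for model, colors in colors_by_model.items()}


if __name__ == "__main__":
    print(color_cardinality(ROWS))
```

Model A has 5 documents but only 4 distinct colors, which is why doc_count and color_count differ for that bucket.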
HAVING color_count > 1 corresponds to the Bucket Selector aggregation in Elasticsearch. The query is as follows:
GET cars/_search
{
  "size": 0,
  "aggs": {
    "models": {
      "terms": {
        "field": "model.keyword"
      },
      "aggs": {
        "color_count": {
          "cardinality": {
            "field": "color.keyword"
          }
        },
        "color_count_filter": {
          "bucket_selector": {
            "buckets_path": {
              "colorCount": "color_count"
            },
            "script": "params.colorCount>1"
          }
        }
      }
    }
  }
}
The result is as follows:
{
  "took": 39,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 13, "max_score": 0, "hits": [] },
  "aggregations": {
    "models": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "A", "doc_count": 5, "color_count": { "value": 4 } },
        { "key": "C", "doc_count": 5, "color_count": { "value": 5 } },
        { "key": "B", "doc_count": 2, "color_count": { "value": 2 } }
      ]
    }
  }
}
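The result now contains only models with more than 1 color, but notice that C, which has the most colors, is not in first place, so we still need to sort the buckets.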
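The bucket selector can be thought of as a predicate applied to each bucket after its metrics are computed: `buckets_path` binds a script variable to a metric, and the script decides whether the bucket is kept. The sketch below imitates that on buckets shaped like the earlier response. The function `bucket_selector` and the use of a Python callable in place of the script are assumptions made for illustration.

```python
# Buckets shaped like the response of the terms + cardinality query.
BUCKETS = [
    {"key": "A", "doc_count": 5, "color_count": {"value": 4}},
    {"key": "C", "doc_count": 5, "color_count": {"value": 5}},
    {"key": "B", "doc_count": 2, "color_count": {"value": 2}},
    {"key": "D", "doc_count": 1, "color_count": {"value": 1}},
]


def bucket_selector(buckets, buckets_path, predicate):
    """Keep the buckets for which predicate(params) is true.

    buckets_path maps a variable name to a metric name, mirroring the
    buckets_path object in the query.
    """
    kept = []
    for bucket in buckets:
        params = {var: bucket[metric]["value"] for var, metric in buckets_path.items()}
        if predicate(params):
            kept.append(bucket)
    return kept


if __name__ == "__main__":
    selected = bucket_selector(
        BUCKETS,
        {"colorCount": "color_count"},
        lambda params: params["colorCount"] > 1,  # stands in for the script
    )
    print([bucket["key"] for bucket in selected])  # ['A', 'C', 'B']
```

Filtering does not reorder anything, so the surviving buckets keep the order the terms aggregation gave them.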
ORDER BY color_count DESC LIMIT 2 can be implemented in Elasticsearch with the Bucket Sort aggregation. The query is as follows:
GET cars/_search
{
  "size": 0,
  "aggs": {
    "models": {
      "terms": {
        "field": "model.keyword"
      },
      "aggs": {
        "color_count": {
          "cardinality": {
            "field": "color.keyword"
          }
        },
        "color_count_filter": {
          "bucket_selector": {
            "buckets_path": {
              "colorCount": "color_count"
            },
            "script": "params.colorCount>1"
          }
        },
        "color_count_sort": {
          "bucket_sort": {
            "sort": {
              "color_count": "desc"
            },
            "size": 2
          }
        }
      }
    }
  }
}
The result is as follows:
{
  "took": 32,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 13, "max_score": 0, "hits": [] },
  "aggregations": {
    "models": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "C", "doc_count": 5, "color_count": { "value": 5 } },
        { "key": "A", "doc_count": 5, "color_count": { "value": 4 } }
      ]
    }
  }
}
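Bucket sort plays the role of ORDER BY plus LIMIT: it reorders the buckets by a metric and then truncates the list. A minimal sketch of that step, applied to buckets shaped like the filtered response, follows. The function `bucket_sort` and its parameter names are illustrative. One assumption worth keeping in mind is that the sort only sees buckets the parent terms aggregation actually returned, so with many models the terms `size` may need to be raised for the top 2 to be correct.

```python
# Buckets shaped like the response after the bucket selector has run.
BUCKETS = [
    {"key": "A", "doc_count": 5, "color_count": {"value": 4}},
    {"key": "C", "doc_count": 5, "color_count": {"value": 5}},
    {"key": "B", "doc_count": 2, "color_count": {"value": 2}},
]


def bucket_sort(buckets, metric, order="desc", size=None, offset=0):
    """Sort buckets by a metric value, then keep `size` buckets from `offset`."""
    ordered = sorted(
        buckets,
        key=lambda bucket: bucket[metric]["value"],
        reverse=(order == "desc"),
    )
    end = None if size is None else offset + size
    return ordered[offset:end]


if __name__ == "__main__":
    top = bucket_sort(BUCKETS, "color_count", order="desc", size=2)
    print([(bucket["key"], bucket["color_count"]["value"]) for bucket in top])
    # [('C', 5), ('A', 4)]
```

The output lists C and then A, the same two models, in the same order, that the SQL statement selects.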
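With that, we have implemented the functionality of the SQL statement as an Elasticsearch query. Comparing the two, you will find the Elasticsearch query considerably more verbose, but it is not without structure: each SQL clause maps to one aggregation. As you grow more familiar with the common syntax, writing these queries will come more and more naturally.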