Elasticsearch Usage Summary

Date: 2019-11-11


Original source: https://www.2cto.com/database/201612/580142.html

ELK practical notes: http://www.cnblogs.com/xing901022/p/4704319.html

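Saved here for my own later reference.

After stepping through the Java code in a debugger, copy the generated query and run it in Kibana's Console. Query tool: Console - Kibana

Elasticsearch 2.3.3 Java API documentation: https://www.blog-china.cn/template/documentHtml/1484101683485.html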

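Beyond its official positioning, Elasticsearch is, in plain terms, a document-oriented NoSQL database that uses JSON as its document serialization format. What sets it apart is that it uses Lucene at its core for all indexing and search, so the content of every document can be indexed, searched, sorted, and filtered. It also provides rich aggregations for multi-dimensional analysis of the data. All external communication goes through a REST API, meaning the client and server talk over HTTP.
First, the basic storage concepts. They are compared with MySQL here to make the meaning of each one clearer.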

Elasticsearch	MySQL
index (noun)	database
doc type (document type)	table
document	row
field	column
mapping	schema
query DSL (query language)	SQL

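Next, the inverted index (see the official explanation). The inverted index is the cornerstone of a search engine and the reason Elasticsearch can do fast full-text search. In short, it takes two steps over a document's content: tokenize it, then build a term-to-document list. For example, suppose we have the following two documents: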

1. {"content": "The quick brown fox jumped over the lazy dog"}
2. {"content": "Quick brown foxes leap over lazy dogs in summer"}

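Elasticsearch runs an analyzer over the content field to split it into terms, then builds the list shown below from which terms appear in which documents; √ means the term appears in that document. To search for "quick brown", we only need to look up which documents each term appears in. If several documents match, they can be scored by how well they match, so the most relevant documents come first.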

Term	Doc_1	Doc_2
Quick		√
The	√
brown	√	√
dog	√
dogs		√
fox	√
foxes		√
in		√
jumped	√
lazy	√	√
leap		√
over	√	√
quick	√
summer		√
the	√
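
The two steps above (tokenize, then build the term-to-document list) can be sketched in a few lines of Python. This is only an illustration and not how Lucene is implemented: the tokenizer simply splits on non-letters and keeps case so that the result matches the table, and the scoring (count of matching query terms) is an assumed stand-in for real relevance scoring. The names tokenize, build_inverted_index, and search are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Simplified tokenizer: runs of letters, case preserved (as in the table).
    return re.findall(r"[A-Za-z]+", text)


def build_inverted_index(docs):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc["content"]):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Toy scoring: one point per query term that appears in the document.
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {
    1: {"content": "The quick brown fox jumped over the lazy dog"},
    2: {"content": "Quick brown foxes leap over lazy dogs in summer"},
}
index = build_inverted_index(docs)
results = search(index, "quick brown")
print(results)
```

With this case-sensitive tokenizer, document 1 matches both query terms while document 2 matches only "brown", so document 1 ranks first.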

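Finally, back to the mapping concept mentioned above. Just as MySQL declares each column's data type, index type, and so on in the database schema, Elasticsearch uses a mapping for this. Typically a mapping declares a field's data type, whether the field is analyzed for the inverted index, and which analyzer to use when it is. By default, Elasticsearch builds an inverted index for all string data using the standard analyzer.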

View the mapping: GET https://localhost:9200/<index name>/_mapping
NOTE: here the index is blog and the doc type is test
{
    "blog": {
        "mappings": {
            "test": {
                "properties": {
                    "activity_type": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "address": {
                        "type": "string",
                        "analyzer": "ik_smart"
                    },
                    "happy_party_id": {
                        "type": "integer"
                    },
                    "last_update_time": {
                        "type": "date",
                        "format": "yyyy-MM-dd HH:mm:ss"
                    }
                }
            }
        }
    }
}
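
As a rough illustration of how to read such a mapping, the following Python sketch walks the properties shown above and reports how each field is handled. The helper name summarize_mapping is hypothetical, and falling back to the standard analyzer when none is declared reflects the default described earlier, not anything read from a live cluster.

```python
def summarize_mapping(mapping, index, doc_type):
    # Walk index -> mappings -> doc type -> properties, as in the response above.
    properties = mapping[index]["mappings"][doc_type]["properties"]
    summary = {}
    for field, spec in properties.items():
        if spec["type"] != "string":
            # Non-string fields are not run through an analyzer.
            summary[field] = spec["type"]
        elif spec.get("index") == "not_analyzed":
            summary[field] = "string, exact value (not analyzed)"
        else:
            analyzer = spec.get("analyzer", "standard")
            summary[field] = "string, analyzed with " + analyzer
    return summary


mapping = {
    "blog": {
        "mappings": {
            "test": {
                "properties": {
                    "activity_type": {"type": "string", "index": "not_analyzed"},
                    "address": {"type": "string", "analyzer": "ik_smart"},
                    "happy_party_id": {"type": "integer"},
                    "last_update_time": {
                        "type": "date",
                        "format": "yyyy-MM-dd HH:mm:ss",
                    },
                }
            }
        }
    }
}

summary = summarize_mapping(mapping, "blog", "test")
for field, description in sorted(summary.items()):
    print(field, "->", description)
```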

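Inserting Data

In MySQL, we must create the database and table and declare the schema before inserting data. In Elasticsearch, data can be inserted directly: the system automatically creates the missing index and doc type and builds a mapping for the fields. Semi-structured data usually has a structure that changes over time, and we cannot know in advance which fields a document will contain. If every insert required creating the index, type, and mapping up front, Elasticsearch would lose its advantage as a NoSQL store.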

Insert data directly: POST https://localhost:9200/blog/test
{
    "count": 5,
    "desc": "hello world"
}

View the index mapping: GET https://localhost:9200/blog/_mapping
{
    "blog": {
        "mappings": {
            "test": {
                "properties": {
                    "count": {
                        "type": "long"
                    },
                    "desc": {
                        "type": "string"
                    }
                }
            }
        }
    }
}
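
The automatic mapping above can be approximated with a small Python sketch. guess_field_type is a hypothetical helper that covers only a few JSON scalar types as Elasticsearch 2.x maps them; real dynamic mapping also handles objects, arrays, and date detection, which are deliberately left out here.

```python
import json


def guess_field_type(value):
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise ValueError("type not covered by this sketch: %r" % (value,))


def infer_properties(document):
    # Build a mapping-like "properties" dict for a flat document.
    return {field: {"type": guess_field_type(value)}
            for field, value in document.items()}


document = json.loads('{"count": 5, "desc": "hello world"}')
properties = infer_properties(document)
print(json.dumps(properties, indent=4))
```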

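This flexibility has limits, however. As noted above, Elasticsearch by default builds an inverted index for all string data using the standard analyzer, so what if some fields should not be analyzed? Elasticsearch provides index templates, which can carry dynamic templates, to set a default mapping for a group of indices: any index whose name matches the template's pattern has its fields mapped with the mapping the template defines.
The example below creates a template named blog that automatically matches indices whose names start with "blog_" and builds their mapping. For every string field in a document it adds a .raw sub-field that is not analyzed, so the original value is kept as a single exact term. ELK does the same: if you look at the Elasticsearch templates in an ELK system, you will find one named logstash.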

Create the template: POST https://localhost:9200/_template/blog
{
    "template": "blog_*",
    "mappings": {
        "_default_": {
            "dynamic_templates": [{
                "string_fields": {
                    "mapping": {
                        "type": "string",
                        "fields": {
                            "raw": {
                                "index": "not_analyzed",
                                "ignore_above": 256,
                                "type": "string"
                            }
                        }
                    },
                    "match_mapping_type": "string"
                }
            }],
            "properties": {
                "timestamp": {
                    "doc_values": true,
                    "type": "date"
                }
            },
            "_all": {
                "enabled": false
            }
        }
    }
}

Insert data directly: POST https://localhost:9200/blog_2016-12-25/test
{
    "count": 5,
    "desc": "hello world"
}
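
To make the name matching concrete, here is a minimal Python sketch of how an index name is checked against template patterns. It uses shell-style wildcards from the standard library as a stand-in for the matching Elasticsearch performs; the function matching_templates is hypothetical, and the "logstash-*" pattern is an assumption about the logstash template rather than something stated in this article.

```python
from fnmatch import fnmatchcase


def matching_templates(index_name, templates):
    # Return the names of all templates whose pattern matches the index name.
    return sorted(
        name for name, pattern in templates.items()
        if fnmatchcase(index_name, pattern)
    )


# Template name -> index name pattern (the "template" field in the request body).
templates = {
    "blog": "blog_*",
    "logstash": "logstash-*",  # assumed pattern, for illustration only
}

print(matching_templates("blog_2016-12-25", templates))
print(matching_templates("blog", templates))
```

The index blog_2016-12-25 picks up the blog template, while the plain blog index from the earlier example does not, because its name lacks the "blog_" prefix.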

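Another topic under inserts is bulk insertion. Elasticsearch provides the bulk API for batch operations: you can freely combine the operations and data you want in one request and send them to Elasticsearch in a single call. The format is as follows.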

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

For example:
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
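
The "optional_source" part of the format is easiest to see by reading such a body back. The Python sketch below pairs each action line with its source line, treating delete as the one action in the example that carries no source. parse_bulk is a hypothetical helper for illustration, not part of any Elasticsearch client.

```python
import json

# Actions that are not followed by a source line.
ACTIONS_WITHOUT_SOURCE = {"delete"}


def parse_bulk(body):
    # Split the newline-delimited body and skip empty lines.
    lines = [line for line in body.split("\n") if line.strip()]
    operations = []
    position = 0
    while position < len(lines):
        # Each action line is a JSON object with exactly one key: the action.
        action, meta = next(iter(json.loads(lines[position]).items()))
        position += 1
        source = None
        if action not in ACTIONS_WITHOUT_SOURCE:
            source = json.loads(lines[position])
            position += 1
        operations.append((action, meta, source))
    return operations


body = "\n".join([
    '{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }',
    '{ "field1" : "value1" }',
    '{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }',
    '{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }',
    '{ "field1" : "value3" }',
    '{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }',
    '{ "doc" : {"field2" : "value2"} }',
]) + "\n"

operations = parse_bulk(body)
for action, meta, source in operations:
    print(action, meta["_id"], source)
```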

If all operations target the same index and doc type, you can specify the index and type in the REST API path instead. A bulk insert looks like this:

Bulk insert: POST https://localhost:9200/blog_2016-12-24/test/_bulk
{"index": {}}
{"count": 5, "desc": "hello world 111"}
{"index": {}}
{"count": 6, "desc": "hello world 222"}
{"index": {}}
{"count": 7, "desc": "hello world 333"}
{"index": {}}
{"count": 8, "desc": "hello world 444"}

View the inserted documents: GET https://localhost:9200/blog_2016-12-24/test/_search
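
A request body like the one above is tedious to write by hand. The following Python sketch builds it from a list of documents; build_bulk_body is a hypothetical helper, and it only produces the text of the body, leaving the HTTP call to whatever client you use. Each line must end with a newline, including the last one.

```python
import json


def build_bulk_body(documents):
    # One empty "index" action line followed by one source line per document.
    lines = []
    for document in documents:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(document))
    # The bulk body is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n"


documents = [
    {"count": 5, "desc": "hello world 111"},
    {"count": 6, "desc": "hello world 222"},
    {"count": 7, "desc": "hello world 333"},
    {"count": 8, "desc": "hello world 444"},
]
bulk_body = build_bulk_body(documents)
print(bulk_body, end="")
```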

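Querying Data

Elasticsearch's query syntax (query DSL) has two parts, query and filter; the difference is whether results are matched exactly or by relevance. A filter asks "does the field value in the document equal the given value?", and the answer is yes or no. A query asks "how well does the field value match the given value?"; it computes a relevance score for each document and sorts the matching documents by that score.
In practice, keep two points in mind. First, exact-value filters should run on fields that are not analyzed, i.e. the .raw sub-field added by the mapping above. Second, filters are usually used to narrow the search scope and queries to do the searching, so the two are used together. In the examples below, note how the three queries differ in how they are written: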

1. Using only a query:
POST https://localhost:9200/user_action/_search
Returns all documents whose page_name field contains wechat
size sets the number of documents returned; by default Elasticsearch returns the first 10
{
    "query": {
        "bool": {
            "must": [{
                "match": {
                    "page_name": "wechat"
                }
            },
            {
                "range": {
                    "timestamp": {
                        "gte": 1481218631,
                        "lte": 1481258231,
                        "format": "epoch_second"
                    }
                }
            }]
        }
    },
    "size": 2
}

2. Using only a filter:
POST https://localhost:9200/user_action/_search
Returns all documents whose page_name field value equals "example.cn/wechat/view.html"
{
    "filter": {
        "bool": {
            "must": [{
                "term": {
                    "page_name.raw": "example.cn/wechat/view.html"
                }
            },
            {
                "range": {
                    "timestamp": {
                        "gte": 1481218631,
                        "lte": 1481258231,
                        "format": "epoch_second"
                    }
                }
            }]
        }
    },
    "size": 2
}

3. Using a query and a filter together:
POST https://localhost:9200/user_action/_search
Returns all documents whose page_name field value equals "job.gikoo.cn/wechat/view.html"
{
    "query": {
        "bool": {
            "filter": [{
                "bool": {
                    "must": [{
                        "term": {
                            "page_name.raw": "job.gikoo.cn/wechat/view.html"
                        }
                    },
                    {
                        "range": {
                            "timestamp": {
                                "gte": 1481218631,
                                "lte": 1481258231,
                                "format": "epoch_second"
                            }
                        }
                    }]
                }
            }]
        }
    },
    "size": 2
}
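
The advice to narrow with filters and search with a query can be sketched as a small Python builder that assembles the request body. build_search_body is a hypothetical helper; it places the exact-match term and the time range in the bool filter clause as a flat list, which is an assumed simplification of the nested bool/must form shown in example 3, and adds a scored match clause only when a keyword is given.

```python
import json


def build_search_body(page_url, start, end, keyword=None, size=10):
    # Yes/no conditions go in "filter": they narrow the result set without scoring.
    bool_query = {
        "filter": [
            {"term": {"page_name.raw": page_url}},
            {"range": {"timestamp": {
                "gte": start,
                "lte": end,
                "format": "epoch_second",
            }}},
        ]
    }
    # The relevance-scored part goes in "must" and is optional.
    if keyword is not None:
        bool_query["must"] = [{"match": {"page_name": keyword}}]
    return {"query": {"bool": bool_query}, "size": size}


filter_only = build_search_body(
    "example.cn/wechat/view.html", 1481218631, 1481258231, size=2)
with_keyword = build_search_body(
    "example.cn/wechat/view.html", 1481218631, 1481258231,
    keyword="wechat", size=2)
print(json.dumps(with_keyword, indent=4))
```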

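Aggregation Analysis

Just as aggregation in MySQL consists of grouping and aggregate computation, aggregation in Elasticsearch has two parts: Buckets and Metrics. Buckets correspond to GROUP BY in SQL, while Metrics correspond to SQL aggregate functions such as COUNT, SUM, MAX, and MIN. Aggregation naturally involves grouping by several fields. In MySQL, "group by c1, c2, c3" is enough, but Elasticsearch has no such syntax. Instead it offers nested Buckets, a design that arguably fits the way people think. The following examples show how it works: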

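1. The simplest aggregation query
POST https://localhost:9200/user_action/_search
For simplicity, the query conditions are omitted here
Group the matching documents by company
There are two size settings here: the size=0 alongside aggs means the response contains no search hits, only aggregation results; the size inside terms is the number of buckets returned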
{
    "aggs": {
        "company_terms": {
            "terms": {
                "field": "company",
                "size": 2
            }
        }
    },
    "size": 0
}
 
2. Buckets combined with a Metric
POST https://localhost:9200/user_action/_search
Group the matching documents by company and get the time of each company's most recent action
{
    "aggs": {
        "company_terms": {
            "terms": {
                "field": "company",
                "size": 2
            },
            "aggs": {
                "latest_record": {
                    "max": {
                        "field": "timestamp"
                    }
                }
            }
        }
    },
    "size": 0
}
 
3. Nested Buckets
POST https://localhost:9200/user_action/_search
Group the matching documents by company, then group each company's documents by store, and get the time of each store's most recent action
{
    "aggs": {
        "company_terms": {
            "terms": {
                "field": "company",
                "size": 1
            },
            "aggs": {
                "store_terms": {
                    "terms": {
                        "field": "store",
                        "size": 2
                    },
                    "aggs": {
                        "latest_record": {
                            "max": {
                                "field": "timestamp"
                            }
                        }
                    }
                }
            }
        }
    },
    "size": 0
}
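
What the nested Buckets in example 3 compute can be mimicked on in-memory data. The Python sketch below groups records by company, then by store, and keeps the maximum timestamp per store. The records and the function latest_by_company_and_store are invented for illustration, and the sketch ignores the size limits on the terms buckets.

```python
def latest_by_company_and_store(records):
    # Outer bucket: company. Inner bucket: store. Metric: max(timestamp).
    buckets = {}
    for record in records:
        stores = buckets.setdefault(record["company"], {})
        current = stores.get(record["store"])
        if current is None or record["timestamp"] > current:
            stores[record["store"]] = record["timestamp"]
    return buckets


# Hypothetical records standing in for documents in the user_action index.
records = [
    {"company": "acme", "store": "north", "timestamp": 1481218700},
    {"company": "acme", "store": "north", "timestamp": 1481250000},
    {"company": "acme", "store": "south", "timestamp": 1481230000},
    {"company": "globex", "store": "east", "timestamp": 1481240000},
]

latest = latest_by_company_and_store(records)
for company, stores in sorted(latest.items()):
    for store, timestamp in sorted(stores.items()):
        print(company, store, timestamp)
```

This is the same result that "group by company, store" with MAX(timestamp) would give in SQL, which is why nesting Buckets can replace a multi-column GROUP BY.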
