Core Concepts in Detail

1. Documents

In Elasticsearch, documents are stored as JSON and can have complex structures, for example:

{
    "_index": "haoke",
    "_type": "user",
    "_id": "1007",
    "_version": 1,
    "_score": 1,
    "_source": {
        "id": 1007,
        "name": "seven",
        "age": 20,
        "sex": "女",
        "card": {
            "card_number": "123456789"
        }
    }
}

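Here, card is a complex object: a nested Card object.

1. Metadata

Field	Description
_index	Where the document is stored
_type	The class of object that the document represents
_id	The unique identifier of the document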

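_index

An index is similar to a "database" in a relational database: it is where we store and index related data.
Note:
In reality, our data is stored and indexed in shards; an index is just a logical namespace that groups one or more shards together. However, this is an internal detail: our application does not need to care about shards at all. As far as the application is concerned, documents live in an index. Elasticsearch takes care of the rest.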

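_type

In an application, we use objects to represent "things" such as a user, a blog post, a comment, or an email. Every object belongs to a class, and the class defines the properties or data associated with the object. An object of the user class might have a name, a gender, an age, and an email address.

In a relational database, we usually store objects of the same class in one table because they share the same structure. Likewise, in Elasticsearch we use documents of the same type to represent the same kind of "thing", because their data structures are also the same.

Every type has its own mapping, or schema definition, much like the columns of a table in a traditional database. Documents of all types are stored in the same index, but the mapping of each type tells Elasticsearch how documents of that type should be indexed.

A _type name can be uppercase or lowercase, but it must not begin with an underscore and must not contain commas. The examples in this article use user as the type name. Note that mapping types are deprecated as of Elasticsearch 7.0 and removed in 8.0, where every document has the single type _doc.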

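_id

The _id is simply a string that, combined with _index and _type, uniquely identifies a document in Elasticsearch. When creating a document, you can supply your own _id or let Elasticsearch generate one for you (auto-generated IDs are 20-character, URL-safe strings).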

2. Query responses

1. pretty

Append the pretty parameter to the request URL to make the returned JSON easier to read:

GET /haoke/user/1007?pretty

# Response
{
    "_index": "haoke",
    "_type": "user",
    "_id": "1007",
    "_version": 1,
    "_seq_no": 8,
    "_primary_term": 2,
    "found": true,
    "_source": {
        "id": 1007,
        "name": "seven",
        "age": 20,
        "sex": "女",
        "card": {
            "card_number": "123456789"
        }
    }
}

2. Specifying response fields

If we do not need every field in the response, we can request only the fields we want:

GET /haoke/user/1007?_source=id,name

# Response
{
    "_index": "haoke",
    "_type": "user",
    "_id": "1007",
    "_version": 1,
    "_seq_no": 8,
    "_primary_term": 2,
    "found": true,
    "_source": {
        "name": "seven",
        "id": 1007
    }
}

To return only the original document without any metadata:

GET /haoke/user/1007/_source

# Response

{
    "id": 1007,
    "name": "seven",
    "age": 20,
    "sex": "女",
    "card": {
        "card_number": "123456789"
    }
}

Original document restricted to selected fields:

GET /haoke/user/1007/_source?_source=id,name
# Response
{
    "name": "seven",
    "id": 1007
}
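
The request paths used in this section follow a simple pattern, which can be sketched in Python. This is only an illustration: the helper name document_url and its parameters are made up for this example and are not part of any Elasticsearch client.

```python
from urllib.parse import quote, urlencode


def document_url(index, doc_type, doc_id, fields=None, source_only=False):
    """Build the request path for fetching a single document.

    fields      -- optional list of _source fields to return
    source_only -- if True, target the /_source endpoint (no metadata)
    """
    parts = (index, doc_type, doc_id)
    path = "/" + "/".join(quote(str(part), safe="") for part in parts)
    if source_only:
        path += "/_source"
    if fields:
        # Keep commas literal so the path looks like the examples above.
        path += "?" + urlencode({"_source": ",".join(fields)}, safe=",")
    return path


print(document_url("haoke", "user", 1007, fields=["id", "name"]))
# /haoke/user/1007?_source=id,name
print(document_url("haoke", "user", 1007, fields=["id", "name"], source_only=True))
# /haoke/user/1007/_source?_source=id,name
```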

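3. Checking whether a document exists

If we only need to know whether a document exists and do not care about its contents, we can use HEAD.
When the document exists:

HEAD /haoke/user/1007

# Document exists; the response body is empty
Status: 200 OK

When the document does not exist: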

HEAD /haoke/user/1009

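# Document does not exist
Status: 404 Not Found

Of course, this only tells you that the document did not exist at the moment of the query; it says nothing about whether it will still be missing a few milliseconds later. Another process may create the document in the meantime.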
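
A rough sketch of the same check from Python, using only the standard library. The function name document_exists and the injectable open_url parameter are assumptions made for this example (the latter exists only so the function can be exercised without a running cluster); the host and port in the usage note are also assumed.

```python
import urllib.error
import urllib.request


def document_exists(url, open_url=urllib.request.urlopen):
    """Return True if a HEAD request for the document URL answers 200.

    A 404 means the document was missing at the moment of the request;
    any other HTTP error is re-raised to the caller.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with open_url(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

Against a local node this might be called as document_exists("http://localhost:9200/haoke/user/1007"). As noted above, a False result is only a snapshot, so code that creates the document afterwards should still be prepared for a conflict.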

4. Batch operations

4.1 Batch retrieval

POST /haoke/user/_mget
# Request body
{
    "ids":["1001","1002"]
}


# Response

{
    "docs": [
        {
            "_index": "haoke",
            "_type": "user",
            "_id": "1001",
            "_version": 3,
            "_seq_no": 3,
            "_primary_term": 2,
            "found": true,
            "_source": {
                "id": 1001,
                "name": "張三",
                "age": 23,
                "sex": "女"
            }
        },
        {
            "_index": "haoke",
            "_type": "user",
            "_id": "1002",
            "_version": 1,
            "_seq_no": 0,
            "_primary_term": 1,
            "found": true,
            "_source": {
                "id": 1002,
                "name": "張三",
                "age": 20,
                "sex": "男"
            }
        }
    ]
}

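If one of the documents does not exist, the rest of the response is unaffected; check the found value of each entry to see whether the document was retrieved.

found: false indicates that the document does not exist:

POST /haoke/user/_mget
# Request body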

{
    "ids":["1001","1006"]
}
# Response
{
    "docs": [
        {
            "_index": "haoke",
            "_type": "user",
            "_id": "1001",
            "_version": 3,
            "_seq_no": 3,
            "_primary_term": 2,
            "found": true,
            "_source": {
                "id": 1001,
                "name": "張三",
                "age": 23,
                "sex": "女"
            }
        },
        {
            "_index": "haoke",
            "_type": "user",
            "_id": "1006",
            "found": false
        }
    ]
}
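
Since a missing document does not fail the whole request, client code has to inspect found for every entry. A minimal sketch of that check follows; split_mget_docs is a hypothetical helper name, and the sample dictionary is a trimmed copy of the response above.

```python
def split_mget_docs(response):
    """Separate an _mget response into found sources and missing IDs."""
    found, missing = {}, []
    for doc in response.get("docs", []):
        if doc.get("found"):
            found[doc["_id"]] = doc["_source"]
        else:
            missing.append(doc["_id"])
    return found, missing


sample = {
    "docs": [
        {"_index": "haoke", "_type": "user", "_id": "1001", "found": True,
         "_source": {"id": 1001, "age": 23}},
        {"_index": "haoke", "_type": "user", "_id": "1006", "found": False},
    ]
}
found, missing = split_mget_docs(sample)
print(sorted(found), missing)  # ['1001'] ['1006']
```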

4.2 _bulk operations

Elasticsearch supports batched inserts, updates, and deletes, all through the _bulk API.
Request format:

{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n
...
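
The format above is newline-delimited JSON, and the trailing newline is easy to forget when building the body by hand. One possible way to generate it is sketched below; build_bulk_body and its tuple layout are assumptions made for this example, not an Elasticsearch API.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) tuples into a bulk request body.

    source may be None for actions that carry no request body, such as delete.
    """
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # Every line, including the last one, must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("create", {"_index": "haoke", "_type": "user", "_id": 2001},
     {"id": 2001, "name": "name1", "age": 20}),
    ("delete", {"_index": "haoke", "_type": "user", "_id": 2002}, None),
])
print(body, end="")
```

Each JSON object must stay on a single line, which is why the sketch uses json.dumps without indentation.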

4.2.1 Batch insert

Example

Note: the request body must end with a newline after the last line.

POST /haoke/user/_bulk

# Request body:

{"create":{"_index":"haoke","_type":"user","_id":2001}}
{"id":2001,"name":"name1","age": 20,"sex": "男"}
{"create":{"_index":"haoke","_type":"user","_id":2002}}
{"id":2002,"name":"name2","age": 20,"sex": "男"}
{"create":{"_index":"haoke","_type":"user","_id":2003}}
{"id":2003,"name":"name3","age": 20,"sex": "男"}

# Response

{
    "took": 11,
    "errors": false,
    "items": [
        {
            "create": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2001",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 9,
                "_primary_term": 2,
                "status": 201
            }
        },
        {
            "create": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2002",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 3,
                "_primary_term": 2,
                "status": 201
            }
        },
        {
            "create": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2003",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 4,
                "_primary_term": 2,
                "status": 201
            }
        }
    ]
}

4.2.2 Batch delete

POST /haoke/user/_bulk

# Request body
{"delete":{"_index":"haoke","_type":"user","_id":2001}}
{"delete":{"_index":"haoke","_type":"user","_id":2002}}
{"delete":{"_index":"haoke","_type":"user","_id":2003}}

# Response
{
    "took": 11,
    "errors": false,
    "items": [
        {
            "delete": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2001",
                "_version": 2,
                "result": "deleted",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 10,
                "_primary_term": 2,
                "status": 200
            }
        },
        {
            "delete": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2002",
                "_version": 2,
                "result": "deleted",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 5,
                "_primary_term": 2,
                "status": 200
            }
        },
        {
            "delete": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2003",
                "_version": 2,
                "result": "deleted",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 6,
                "_primary_term": 2,
                "status": 200
            }
        }
    ]
}

Other operations work in a similar way.
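
Because a bulk response reports a status per item, it is worth scanning it rather than relying on the HTTP status of the request as a whole. The following is a sketch under the assumption that the response has the shape shown above; failed_bulk_items is a hypothetical helper, and the sample with a 409 conflict is invented for illustration.

```python
def failed_bulk_items(response):
    """Return (action, _id, status) for every item that did not succeed."""
    failures = []
    if not response.get("errors"):
        return failures
    for item in response.get("items", []):
        # Each item is a one-key dict such as {"create": {...}}.
        for action, result in item.items():
            if result.get("status", 0) >= 400:
                failures.append((action, result.get("_id"), result.get("status")))
    return failures


# Hypothetical response: the second create hits an ID that already exists.
sample = {
    "errors": True,
    "items": [
        {"create": {"_id": "2001", "status": 201}},
        {"create": {"_id": "2002", "status": 409}},
    ],
}
print(failed_bulk_items(sample))  # [('create', '2002', 409)]
```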
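How large should a single request be for the best performance?

The entire bulk request has to be loaded into memory on the node that receives it, so the larger the request, the less memory is available to other requests. There is an optimal bulk request size; beyond it, performance stops improving and may even degrade.
The optimal size is not a fixed number. It depends entirely on your hardware, the size and complexity of your documents, and your indexing and search load.
Fortunately, the sweet spot is easy to find: try indexing typical documents in batches of increasing size. When performance starts to drop, your batch size is too large. A good starting point is between 1,000 and 5,000 documents per batch, or fewer if your documents are very large.
It is often useful to look at the physical size of a batch as well: one thousand 1 kB documents is very different from one thousand 1 MB documents. A good batch size is roughly 5 to 15 MB.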
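
One way to respect both limits at once is to cut a batch whenever either the document count or the serialized size would be exceeded. The sketch below assumes a default of 1,000 documents and 5 MB, taken from the low end of the ranges above; chunk_documents is a hypothetical helper, and the size estimate ignores the action lines, so treat it as approximate.

```python
import json


def chunk_documents(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of documents limited by count and approximate byte size."""
    batch, batch_bytes = [], 0
    for doc in docs:
        # +1 accounts for the newline that terminates each line.
        size = len(json.dumps(doc, ensure_ascii=False).encode("utf-8")) + 1
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


docs = [{"id": i, "name": f"name{i}"} for i in range(2500)]
print([len(batch) for batch in chunk_documents(docs)])  # [1000, 1000, 500]
```

A single document larger than max_bytes still ends up in a batch of its own rather than being dropped.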

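5. Pagination

Just as SQL uses the LIMIT keyword to return a single page of results, Elasticsearch accepts the from and size parameters:

size: the number of results to return, default 10
from: the number of initial results to skip, default 0

Example:

GET /_search?size=1&from=1

# Response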
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "haoke",
                "_type": "user",
                "_id": "2003",
                "_score": 1.0,
                "_source": {
                    "id": 2003,
                    "name": "name3",
                    "age": 20,
                    "sex": "男"
                }
            }
        ]
    }
}
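
Applications usually think in page numbers rather than offsets. A small sketch of the conversion, assuming 1-based page numbers (page_params is a hypothetical helper name):

```python
def page_params(page, size=10):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


print(page_params(1))          # {'from': 0, 'size': 10}
print(page_params(2, size=1))  # {'from': 1, 'size': 1}
```

The second call reproduces the size=1&from=1 request shown above: the second page when each page holds one result.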

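Deep pagination in a distributed system
To understand why deep pagination is problematic, suppose we are searching an index with 5 primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the requesting node, which then sorts all 50 results to select the overall top 10.

Now suppose we request page 1,001: results 10,001 to 10,010. Everything works the same way, except that each shard must produce its top 10,010 results. The requesting node then sorts all 50,050 results and discards 50,040 of them!

As you can see, in a distributed system the cost of sorting results grows rapidly the deeper we page. This is one reason why web search engines typically do not return more than about 1,000 results for a query.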
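
The arithmetic behind this example can be written down directly. The sketch below only models the counting argument from the text; deep_paging_cost is a hypothetical name, and real clusters involve further costs that it ignores.

```python
def deep_paging_cost(from_, size=10, primary_shards=5):
    """Count the results produced, gathered, and discarded for one page.

    Each shard must return its top (from_ + size) results, and the
    requesting node keeps only `size` of everything it gathers.
    """
    per_shard = from_ + size
    gathered = per_shard * primary_shards
    discarded = gathered - size
    return per_shard, gathered, discarded


print(deep_paging_cost(0))       # (10, 50, 40)
print(deep_paging_cost(10000))   # (10010, 50050, 50040)
```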

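6. Mappings

The indices created and the data inserted so far had their field types inferred automatically by Elasticsearch. Sometimes we need to declare field types explicitly, because the inferred types may not match what we actually need.
For string fields, keep the following points in mind:

The string type was widely used in older versions of Elasticsearch. Starting with Elasticsearch 5.x, string is no longer supported and is replaced by the text and keyword types.
Use the text type for a field that needs full-text search, such as the body of an email or a product description. The content of a text field is analyzed: before the inverted index is built, an analyzer splits the string into individual terms. text fields are not used for sorting and are rarely used for aggregations.
Use the keyword type for structured content such as email addresses, hostnames, status codes, and tags, and for any field that needs filtering (for example, finding the blog posts whose status is published), sorting, or aggregation. A keyword field can only be matched by its exact value.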
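
The difference between the two types can be illustrated with a toy model. To be clear, this is not how Elasticsearch is implemented: the analyze function below simply lowercases and splits on non-word characters, which is a stand-in chosen for this example, and all three function names are hypothetical.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase the text and split it on non-word characters."""
    return [term for term in re.split(r"\W+", text.lower()) if term]


def matches_text(field_value, query):
    """A text field matches if it shares any analyzed term with the query."""
    return bool(set(analyze(field_value)) & set(analyze(query)))


def matches_keyword(field_value, query):
    """A keyword field matches only on the exact, unanalyzed value."""
    return field_value == query


print(analyze("Quick brown FOX"))                 # ['quick', 'brown', 'fox']
print(matches_text("Quick brown FOX", "fox"))     # True
print(matches_keyword("published", "Published"))  # False
print(matches_keyword("published", "published"))  # True
```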

6.1 Creating an index with explicit types

PUT /ela
# Request body
{
    "settings": {
        "index": {
            "number_of_shards": "1"
        }
    },
    "mappings": {
            "properties": {
                "name": { "type": "text"},
                "age": {"type": "integer" },
                "mail": {"type": "keyword" },
                "hobby": {  "type": "text"}
            }
    }
}
# Response
{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "ela"
}

6.2 Viewing the mapping that was created:

GET /ela/_mapping

# Response
{
    "ela": {
        "mappings": {
            "properties": {
                "age": {
                    "type": "integer"
                },
                "hobby": {
                    "type": "text"
                },
                "mail": {
                    "type": "keyword"
                },
                "name": {
                    "type": "text"
                }
            }
        }
    }
}

6.3 Inserting data

POST /ela/_bulk

# Request body

{"index":{"_index":"ela"}}
{"name":"張三","age": 20,"mail": "111@qq.com","hobby":"羽毛球、乒乓球、足球"}
{"index":{"_index":"ela"}}
{"name":"李四","age": 21,"mail": "222@qq.com","hobby":"羽毛球、乒乓球、足球、籃球"}
{"index":{"_index":"ela"}}
{"name":"王五","age": 22,"mail": "333@qq.com","hobby":"羽毛球、籃球、游泳、聽音樂"}
{"index":{"_index":"ela"}}
{"name":"趙六","age": 23,"mail": "444@qq.com","hobby":"跑步、游泳"}
{"index":{"_index":"ela"}}
{"name":"孫七","age": 24,"mail": "555@qq.com","hobby":"聽音樂、看電影"}


# Response
{
    "took": 12,
    "errors": false,
    "items": [
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 0,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "175njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 1,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2L5njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 2,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2b5njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 3,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2r5njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 4,
                "_primary_term": 1,
                "status": 201
            }
        }
    ]
}

6.4 Testing a search

POST /ela/_search

# Request body
{
    "query": {
        "match": {
            "hobby": "音樂"
        }
    }
}

# Response
{
    "took": 18,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.9159472,
        "hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2r5njnUBAdKB-kbRFcG6",
                "_score": 1.9159472,
                "_source": {
                    "name": "孫七",
                    "age": 24,
                    "mail": "555@qq.com",
                    "hobby": "聽音樂、看電影"
                }
            },
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2L5njnUBAdKB-kbRFcG6",
                "_score": 1.5506182,
                "_source": {
                    "name": "王五",
                    "age": 22,
                    "mail": "333@qq.com",
                    "hobby": "羽毛球、籃球、游泳、聽音樂"
                }
            }
        ]
    }
}

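7. Structured queries

7.1 term query

The term query is used to match exact values such as numbers, dates, booleans, or unanalyzed strings (keyword fields; not_analyzed strings in older versions):

POST /ela/_search

# Request body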
{
    "query": {
        "term": {
            "age":20
        }
    }
}
# Response
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "張三",
                    "age": 20,
                    "mail": "111@qq.com",
                    "hobby": "羽毛球、乒乓球、足球"
                }
            }
        ]
    }
}

7.2 terms query

terms is similar to term but lets you specify several values to match. If the field contains any of the specified values, the document matches:

POST /ela/_search
{
    "query": {
        "terms": {
            "age":[20,21]
        }
    }
}
# Response
"hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "張三",
                    "age": 20,
                    "mail": "111@qq.com",
                    "hobby": "羽毛球、乒乓球、足球"
                }
            },
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "175njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "李四",
                    "age": 21,
                    "mail": "222@qq.com",
                    "hobby": "羽毛球、乒乓球、足球、籃球"
                }
            }
]

7.3 range query

The range query finds documents whose values fall within a specified range:

POST /ela/_search

# Request body
{
    "query": {
        "range": {
            "age":{
                "gte":20,
                "lt":22
            }
        }
    }
}
# Response
"hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "張三",
                    "age": 20,
                    "mail": "111@qq.com",
                    "hobby": "羽毛球、乒乓球、足球"
                }
            },
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "175njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "李四",
                    "age": 21,
                    "mail": "222@qq.com",
                    "hobby": "羽毛球、乒乓球、足球、籃球"
                }
            }
        ]

The range operators are:
gt :: greater than
gte :: greater than or equal to
lt :: less than
lte :: less than or equal to
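
To make the operator semantics concrete, here is a small sketch that applies the same four operators to plain Python values. It models only the comparison logic; RANGE_OPERATORS and in_range are names invented for this example.

```python
import operator

# Map each range operator to the matching Python comparison.
RANGE_OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True if value satisfies every bound, e.g. {"gte": 20, "lt": 22}."""
    return all(RANGE_OPERATORS[name](value, bound) for name, bound in bounds.items())


ages = [20, 21, 22, 23, 24]
print([age for age in ages if in_range(age, {"gte": 20, "lt": 22})])  # [20, 21]
```

The result matches the two documents (ages 20 and 21) returned by the range query above.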

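7.4 exists query

The exists query finds documents that contain a value for the specified field, similar to the IS NOT NULL condition in SQL. Wrapping it in the must_not clause of a bool query finds documents that lack the field, similar to IS NULL.

POST /ela/_search
# Request body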
{
    "query": {
        "exists": {
            "field": "age"    
        }
    }
}

Reference: https://blog.csdn.net/qq_29202513/article/details/103710554

7.5 match query

The match query is the standard query; whether you need a full-text search or an exact-value search, it is usually the one to reach for.

POST /ela/_search
{
    "query": {
        "match": {
            "hobby": "羽毛球"
        }
    }
}

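If you use match on a full-text field, it analyzes the query string with the field's analyzer before running the search. If you use it on a field that holds an exact value, such as a number, a date, a boolean, or a keyword string, it searches for that exact value: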

{ "match": { "age": 26 }}
{ "match": { "date": "2014-09-01" }}
{ "match": { "public": true }}
{ "match": { "tag": "full_text" }}

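7.6 bool query

The bool query combines the results of several queries with Boolean logic. It accepts the following clauses:

must :: all of these conditions must match, equivalent to AND.
must_not :: none of these conditions may match, equivalent to NOT.
should :: at least one of these conditions should match, equivalent to OR.

Each clause accepts either a single query or an array of queries: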

{
    "bool": {
        "must": { "term": { "folder": "inbox" }},
        "must_not": { "term": { "tag": "spam" }},
        "should": [
            { "term": { "starred": true }},
            { "term": { "unread": true }}
        ]
    }
}
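
The matching rules can be modelled in a few lines of Python. This is a simplified sketch that handles only term clauses and ignores scoring; the function names are hypothetical. It also assumes Elasticsearch's documented default for minimum_should_match: should clauses are required only when the bool query has no must or filter clause, and otherwise they merely influence the score.

```python
def as_list(clauses):
    """Normalize a clause value that may be missing, a dict, or a list."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def term_matches(doc, clause):
    """Evaluate a single {"term": {field: value}} clause against a dict."""
    (field, expected), = clause["term"].items()
    return doc.get(field) == expected


def bool_matches(doc, bool_query):
    """Decide whether doc matches a bool query made of term clauses."""
    required = as_list(bool_query.get("must")) + as_list(bool_query.get("filter"))
    must_not = as_list(bool_query.get("must_not"))
    should = as_list(bool_query.get("should"))
    if not all(term_matches(doc, clause) for clause in required):
        return False
    if any(term_matches(doc, clause) for clause in must_not):
        return False
    if should and not required:
        # With no must/filter clause, at least one should clause must match.
        return any(term_matches(doc, clause) for clause in should)
    return True


query = {
    "must": {"term": {"folder": "inbox"}},
    "must_not": {"term": {"tag": "spam"}},
    "should": [{"term": {"starred": True}}, {"term": {"unread": True}}],
}
print(bool_matches({"folder": "inbox", "tag": "work"}, query))  # True
print(bool_matches({"folder": "inbox", "tag": "spam"}, query))  # False
```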

8. Filtering

Elasticsearch also supports filters, which can wrap queries such as term, range, and match.
Example: find users whose age is 20.

POST /ela/_search
# Request body
{
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "age": 20
                }
            }
        }
    }
}
# Response
"hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_score": 0.0,
                "_source": {
                    "name": "張三",
                    "age": 20,
                    "mail": "111@qq.com",
                    "hobby": "羽毛球、乒乓球、足球"
                }
            }
        ]

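Queries versus filters

A filter asks of every document whether the field value contains the specified value: a yes-or-no question.
A query asks how well the field value of every document matches the specified value.
A query calculates how relevant each document is to the query, assigns it a relevance score (_score), and sorts the matching documents by that score. This notion of relevance suits full-text search, where there is seldom a single completely correct answer.
The result of a filter is a simple list of matching documents that is quick to compute and easy to cache in memory, requiring only 1 bit per document. These cached filter results can be reused very efficiently by subsequent requests.
A query must not only find matching documents but also calculate the relevance of each one, so queries are generally more expensive than filters, and query results are not cacheable.

Recommendation: for exact-match searches, prefer filters, because their results can be cached.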
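
In practice the two are often combined: the scoring part goes into must and the exact-match conditions go into filter. Below is a sketch of building such a request body in Python; filtered_search is a hypothetical helper, and the field names come from the ela index used earlier.

```python
import json


def filtered_search(scoring_query=None, filters=None):
    """Build a search body whose filters do not contribute to _score."""
    bool_query = {}
    if scoring_query is not None:
        bool_query["must"] = scoring_query
    if filters:
        bool_query["filter"] = list(filters)
    return {"query": {"bool": bool_query}}


body = filtered_search(
    scoring_query={"match": {"hobby": "音樂"}},
    filters=[{"range": {"age": {"gte": 20, "lt": 25}}}],
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting dictionary can be sent as the body of POST /ela/_search with any HTTP client.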
