Elasticsearch Core Concepts in Detail

I. Documents

In Elasticsearch, documents are stored as JSON and can have a complex structure, for example:

{
    "_index": "haoke",
    "_type": "user",
    "_id": "1007",
    "_version": 1,
    "_score": 1,
    "_source": {
        "id": 1007,
        "name": "seven",
        "age": 20,
        "sex": "女",
        "card": {
            "card_number": "123456789"
        }
    }
}

Here, card is a complex object: a nested Card object.


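1. Metadata

Field     Description
_index    Where the document is stored
_type     The class of object the document represents
_id       The unique identifier of the document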

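_index

An index is similar to a "database" in a relational database: it is where we store and index related data.
Tip:
In fact, our data is stored and indexed in shards; an index is just a logical namespace that groups one or more shards together. However, this is an internal detail: our application does not need to care about shards at all. As far as the application is concerned, documents live in an index. Elasticsearch takes care of the rest.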

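_type

In an application, we use objects to represent things such as a user, a blog post, a comment, or an email. Each object belongs to a class, which defines the properties or data associated with the object. Objects of the user class might hold a name, a gender, an age, and an email address.

In a relational database, we usually store objects of the same class in one table, because they share the same structure. Likewise, in Elasticsearch we use documents of the same type to represent the same kind of thing, because their data structures are also the same.

Every type has its own mapping, or schema definition, much like the columns of a table in a traditional database. Documents of all types can be stored in the same index, but the type's mapping tells Elasticsearch how documents of that type should be indexed.

A _type name may be lowercase or uppercase, but must not begin with an underscore and must not contain commas. The examples here use user as the type name.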

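_id

The id is just a string. Combined with _index and _type, it uniquely identifies a document in Elasticsearch. When creating a document, you can supply your own _id or let Elasticsearch generate one for you (a 20-character, URL-safe string).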
 

II. Query Responses

1. pretty

Appending the pretty parameter to the request URL makes the returned JSON easier to read:

GET /haoke/user/1007?pretty

# Response
{
    "_index": "haoke",
    "_type": "user",
    "_id": "1007",
    "_version": 1,
    "_seq_no": 8,
    "_primary_term": 2,
    "found": true,
    "_source": {
        "id": 1007,
        "name": "seven",
        "age": 20,
        "sex": "女",
        "card": {
            "card_number": "123456789"
        }
    }
}

2. Selecting response fields

If we do not need every field in the response, we can name the fields to return:
 

GET /haoke/user/1007?_source=id,name

# Response
{
    "_index": "haoke",
    "_type": "user",
    "_id": "1007",
    "_version": 1,
    "_seq_no": 8,
    "_primary_term": 2,
    "found": true,
    "_source": {
        "name": "seven",
        "id": 1007
    }
}

To return only the original document, without the metadata:

GET /haoke/user/1007/_source

# Response

{
    "id": 1007,
    "name": "seven",
    "age": 20,
    "sex": "女",
    "card": {
        "card_number": "123456789"
    }
}

Original document plus selected fields:

GET /haoke/user/1007/_source?_source=id,name
# Response
{
    "name": "seven",
    "id": 1007
}
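
The effect of the _source parameter can be imitated on the client side. The Python sketch below only illustrates the semantics for top-level field names; filter_source is a hypothetical helper written for this example, not part of any Elasticsearch client.

```python
def filter_source(source, fields):
    """Keep only the listed top-level fields, mimicking ?_source=a,b."""
    wanted = [name.strip() for name in fields.split(",") if name.strip()]
    return {name: source[name] for name in wanted if name in source}


doc = {
    "id": 1007,
    "name": "seven",
    "age": 20,
    "sex": "女",
    "card": {"card_number": "123456789"},
}

# Equivalent in spirit to GET /haoke/user/1007/_source?_source=id,name
print(filter_source(doc, "id,name"))
```

Filtering on the server is normally preferable, since unwanted fields never cross the network.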

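3. Checking whether a document exists

If we only need to know whether a document exists, not its content, we can use a HEAD request.
When the document exists: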

HEAD /haoke/user/1007

# Document exists: the response body is empty
Status: 200 OK

When the document does not exist:

HEAD /haoke/user/1009

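# Document does not exist
Status: 404 Not Found

Of course, this only tells you that the document did not exist at the moment of the request; it says nothing about a few milliseconds later. Another process may create the document in the meantime.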

4. Batch operations

4.1 Batch retrieval

POST /haoke/user/_mget
# Request body
{
    "ids":["1001","1002"]
}


# Response

{
    "docs": [
        {
            "_index": "haoke",
            "_type": "user",
            "_id": "1001",
            "_version": 3,
            "_seq_no": 3,
            "_primary_term": 2,
            "found": true,
            "_source": {
                "id": 1001,
                "name": "張三",
                "age": 23,
                "sex": "女"
            }
        },
        {
            "_index": "haoke",
            "_type": "user",
            "_id": "1002",
            "_version": 1,
            "_seq_no": 0,
            "_primary_term": 1,
            "found": true,
            "_source": {
                "id": 1002,
                "name": "張三",
                "age": 20,
                "sex": "男"
            }
        }
    ]
}

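If one of the documents does not exist, the rest of the response is unaffected. Check the found value of each entry to see whether that document was retrieved.

found: false indicates that the document does not exist.

POST /haoke/user/_mget
# Request body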

{
    "ids":["1001","1006"]
}
# Response
{
    "docs": [
        {
            "_index": "haoke",
            "_type": "user",
            "_id": "1001",
            "_version": 3,
            "_seq_no": 3,
            "_primary_term": 2,
            "found": true,
            "_source": {
                "id": 1001,
                "name": "張三",
                "age": 23,
                "sex": "女"
            }
        },
        {
            "_index": "haoke",
            "_type": "user",
            "_id": "1006",
            "found": false
        }
    ]
}
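
Because a missing document does not fail the whole request, client code has to inspect found per entry. A minimal Python sketch of that check follows; split_mget is a hypothetical helper, and the response literal is an abbreviated copy of the one above.

```python
def split_mget(response):
    """Return (sources_by_id, missing_ids) from an _mget response dict."""
    found, missing = {}, []
    for doc in response.get("docs", []):
        if doc.get("found"):
            found[doc["_id"]] = doc["_source"]
        else:
            missing.append(doc["_id"])
    return found, missing


response = {
    "docs": [
        {"_index": "haoke", "_type": "user", "_id": "1001", "found": True,
         "_source": {"id": 1001, "name": "張三", "age": 23, "sex": "女"}},
        {"_index": "haoke", "_type": "user", "_id": "1006", "found": False},
    ]
}

found, missing = split_mget(response)
print(sorted(found), missing)
```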

4.2 _bulk operations

Elasticsearch supports batched insert, update, and delete operations, all through the _bulk API.
Request format:

{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n
...
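
The newline-delimited format is easy to get wrong by hand, so it is often generated. The Python sketch below builds such a body from (action, metadata, document) triples; build_bulk_body is a hypothetical helper, and sending the body over HTTP is left out.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, document) triples as newline-delimited JSON.

    `document` is None for actions that carry no request body, such as delete.
    """
    lines = []
    for action, metadata, document in actions:
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if document is not None:
            lines.append(json.dumps(document, ensure_ascii=False))
    # Every line, including the last one, must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("create", {"_index": "haoke", "_type": "user", "_id": 2001},
     {"id": 2001, "name": "name1", "age": 20, "sex": "男"}),
    ("delete", {"_index": "haoke", "_type": "user", "_id": 2002}, None),
])
print(body, end="")
```

Each JSON object stays on a single line; pretty-printed JSON would break the format.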

4.2.1 Batch insert

Example

Note: the request body must end with a newline.

POST /haoke/user/_bulk

# Request body

{"create":{"_index":"haoke","_type":"user","_id":2001}}
{"id":2001,"name":"name1","age": 20,"sex": "男"}
{"create":{"_index":"haoke","_type":"user","_id":2002}}
{"id":2002,"name":"name2","age": 20,"sex": "男"}
{"create":{"_index":"haoke","_type":"user","_id":2003}}
{"id":2003,"name":"name3","age": 20,"sex": "男"}

# Response

{
    "took": 11,
    "errors": false,
    "items": [
        {
            "create": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2001",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 9,
                "_primary_term": 2,
                "status": 201
            }
        },
        {
            "create": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2002",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 3,
                "_primary_term": 2,
                "status": 201
            }
        },
        {
            "create": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2003",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 4,
                "_primary_term": 2,
                "status": 201
            }
        }
    ]
}

4.2.2 Batch delete

POST /haoke/user/_bulk

# Request body
{"delete":{"_index":"haoke","_type":"user","_id":2001}}
{"delete":{"_index":"haoke","_type":"user","_id":2002}}
{"delete":{"_index":"haoke","_type":"user","_id":2003}}

# Response
{
    "took": 11,
    "errors": false,
    "items": [
        {
            "delete": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2001",
                "_version": 2,
                "result": "deleted",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 10,
                "_primary_term": 2,
                "status": 200
            }
        },
        {
            "delete": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2002",
                "_version": 2,
                "result": "deleted",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 5,
                "_primary_term": 2,
                "status": 200
            }
        },
        {
            "delete": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2003",
                "_version": 2,
                "result": "deleted",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 6,
                "_primary_term": 2,
                "status": 200
            }
        }
    ]
}

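The other bulk actions (index, update) work in a similar way.
How large should a single request be for the best performance?

  • The whole bulk request has to be loaded into memory on the node that receives it, so the larger the request, the less memory is left for other requests. There is an optimal bulk size; beyond it, performance no longer improves and may even drop.
  • The optimal size is not a fixed number. It depends entirely on your hardware, the size and complexity of your documents, and your indexing and search load.
  • Fortunately, this sweet spot is easy to find: bulk-index typical documents in batches of increasing size, and when performance starts to drop, your batch is too large. A good starting point is between 1000 and 5000 documents per batch; use smaller batches if your documents are very large.
  • It is often useful to look at the physical size of a batch. One thousand 1 kB documents are very different from one thousand 1 MB documents. A good batch stays between 5 and 15 MB.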
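
The two limits above, document count and physical size, can be applied together when cutting a stream of documents into batches. The Python sketch below is one way to do that; batch_documents and its default limits are assumptions chosen for illustration, and the sizes measured are those of the serialized documents only, without the action lines.

```python
import json


def batch_documents(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of documents, closing a batch when either limit is reached."""
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))
        # Close the current batch before it would exceed a limit.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


docs = [{"id": i, "name": f"name{i}"} for i in range(2500)]
sizes = [len(batch) for batch in batch_documents(docs, max_docs=1000)]
print(sizes)
```

A document larger than max_bytes still goes out, alone in its own batch, rather than being dropped.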

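5. Pagination

Just as SQL uses the LIMIT keyword to return a single page of results, Elasticsearch accepts the from and size parameters:

size: the number of results to return, default 10
from: the number of leading results to skip, default 0

Example: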

GET /_search?size=1&from=1

# Response
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "haoke",
                "_type": "user",
                "_id": "2003",
                "_score": 1.0,
                "_source": {
                    "id": 2003,
                    "name": "name3",
                    "age": 20,
                    "sex": "男"
                }
            }
        ]
    }
}

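Deep pagination in a distributed system
To understand why deep paging is problematic, suppose we are searching an index with 5 primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the requesting node, which then sorts all 50 results to select the overall top 10.

Now suppose we ask for a page deep in the result set: results 10001 to 10010. Everything works the same way, except that each shard must produce its top 10010 results. The requesting node then sorts all 50050 results and discards 50040 of them!

As you can see, in a distributed system the cost of sorting results grows steeply the deeper we page. This is why web search engines do not return more than 1000 results for any query.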
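
The arithmetic behind these numbers can be written down directly. The Python sketch below is a simplified model of the description above, not a measurement of a real cluster; deep_paging_cost is a hypothetical helper.

```python
def deep_paging_cost(shards, from_, size):
    """Return (per_shard, gathered, discarded) result counts for one page."""
    per_shard = from_ + size       # each shard produces its top from + size hits
    gathered = shards * per_shard  # the requesting node sorts all of them
    discarded = gathered - size    # only `size` hits are returned to the client
    return per_shard, gathered, discarded


print(deep_paging_cost(shards=5, from_=0, size=10))      # results 1 to 10
print(deep_paging_cost(shards=5, from_=10000, size=10))  # results 10001 to 10010
```

In this model the work grows with shards × (from + size), so the depth of the page matters far more than the page size.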

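6. Mapping

So far, when we created indices and inserted data, Elasticsearch detected the field types automatically. Sometimes we need to declare field types explicitly, because the automatically detected type may not match what we actually need.
This matters most for string fields:

  • The string type was common in older versions of Elasticsearch. Starting with Elasticsearch 5.x, string is no longer supported and is replaced by the text and keyword types.
  • The text type is for fields that need full-text search, such as the body of an email or a product description. The content of a text field is analyzed: before the inverted index is built, an analyzer splits the string into individual terms. text fields are not used for sorting and are rarely used for aggregations.
  • The keyword type is for structured fields such as email addresses, hostnames, status codes, and tags. Use it when the field needs filtering (for example, finding the published blog posts whose status is published), sorting, or aggregations. A keyword field can be found only by its exact value.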
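
The difference between the two types can be pictured with a toy model in Python. This is only a sketch: analyze is a deliberately crude stand-in that lowercases and splits on non-alphanumeric ASCII characters, which is an assumption made for illustration and not how Elasticsearch's real analyzers behave in detail.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, then split on non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def matches_text(field_value, query):
    """text-like behaviour: any analyzed query token occurs among the field's tokens."""
    tokens = set(analyze(field_value))
    return any(token in tokens for token in analyze(query))


def matches_keyword(field_value, query):
    """keyword-like behaviour: the whole value must be identical."""
    return field_value == query


print(matches_text("Quick brown fox", "QUICK"))    # analyzed on both sides
print(matches_keyword("published", "published"))   # exact value
print(matches_keyword("published", "Published"))   # case differs, no match
```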
     

6.1 Creating an index with explicit types

PUT /ela
# Request body
{
    "settings": {
        "index": {
            "number_of_shards": "1"
        }
    },
    "mappings": {
        "properties": {
            "name": { "type": "text" },
            "age": { "type": "integer" },
            "mail": { "type": "keyword" },
            "hobby": { "type": "text" }
        }
    }
}
# Response
{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "ela"
}

6.2 Viewing the created mapping

GET /ela/_mapping

# Response
{
    "ela": {
        "mappings": {
            "properties": {
                "age": {
                    "type": "integer"
                },
                "hobby": {
                    "type": "text"
                },
                "mail": {
                    "type": "keyword"
                },
                "name": {
                    "type": "text"
                }
            }
        }
    }
}

6.3 Inserting data

POST /ela/_bulk

# Request body

{"index":{"_index":"ela"}}
{"name":"張三","age": 20,"mail": "111@qq.com","hobby":"羽毛球、乒乓球、足球"}
{"index":{"_index":"ela"}}
{"name":"李四","age": 21,"mail": "222@qq.com","hobby":"羽毛球、乒乓球、足球、籃球"}
{"index":{"_index":"ela"}}
{"name":"王五","age": 22,"mail": "333@qq.com","hobby":"羽毛球、籃球、游泳、聽音樂"}
{"index":{"_index":"ela"}}
{"name":"趙六","age": 23,"mail": "444@qq.com","hobby":"跑步、游泳"}
{"index":{"_index":"ela"}}
{"name":"孫七","age": 24,"mail": "555@qq.com","hobby":"聽音樂、看電影"}


# Response
{
    "took": 12,
    "errors": false,
    "items": [
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 0,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "175njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 1,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2L5njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 2,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2b5njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 3,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2r5njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 4,
                "_primary_term": 1,
                "status": 201
            }
        }
    ]
}

 

6.4 Testing search

POST /ela/_search

# Request body
{
    "query": {
        "match": {
            "hobby": "音樂"
        }
    }
}

# Response
{
    "took": 18,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.9159472,
        "hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2r5njnUBAdKB-kbRFcG6",
                "_score": 1.9159472,
                "_source": {
                    "name": "孫七",
                    "age": 24,
                    "mail": "555@qq.com",
                    "hobby": "聽音樂、看電影"
                }
            },
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2L5njnUBAdKB-kbRFcG6",
                "_score": 1.5506182,
                "_source": {
                    "name": "王五",
                    "age": 22,
                    "mail": "333@qq.com",
                    "hobby": "羽毛球、籃球、游泳、聽音樂"
                }
            }
        ]
    }
}

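7. Structured queries

7.1 term query

term is mainly used for exact matching of values such as numbers, dates, booleans, or not_analyzed strings (string fields that are not analyzed, that is, keyword fields in current versions).

POST /ela/_search

# Request body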
{
    "query": {
        "term": {
            "age":20
        }
    }
}
# Response
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "張三",
                    "age": 20,
                    "mail": "111@qq.com",
                    "hobby": "羽毛球、乒乓球、足球"
                }
            }
        ]
    }
}

7.2 terms query

terms is similar to term, but it accepts several values to match. A document matches when the field contains any of the listed values.
 

POST /ela/_search
{
    "query": {
        "terms": {
            "age":[20,21]
        }
    }
}
# Response (hits only)
"hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "張三",
                    "age": 20,
                    "mail": "111@qq.com",
                    "hobby": "羽毛球、乒乓球、足球"
                }
            },
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "175njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "李四",
                    "age": 21,
                    "mail": "222@qq.com",
                    "hobby": "羽毛球、乒乓球、足球、籃球"
                }
            }
]

7.3 range query

The range query finds the documents whose field value falls within a given range:

POST /ela/_search

# Request body
{
    "query": {
        "range": {
            "age":{
                "gte":20,
                "lt":22
            }
        }
    }
}
# Response (hits only)
"hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "張三",
                    "age": 20,
                    "mail": "111@qq.com",
                    "hobby": "羽毛球、乒乓球、足球"
                }
            },
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "175njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "李四",
                    "age": 21,
                    "mail": "222@qq.com",
                    "hobby": "羽毛球、乒乓球、足球、籃球"
                }
            }
        ]

The range operators are:
gt :: greater than
gte :: greater than or equal to
lt :: less than
lte :: less than or equal to
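
The four operators map directly onto ordinary comparisons. The Python sketch below evaluates a range clause against plain values to show how the bounds combine; in_range and RANGE_OPS are hypothetical names used only for this illustration.

```python
import operator

# Each range operator corresponds to one comparison function.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """True when `value` satisfies every bound, e.g. {"gte": 20, "lt": 22}."""
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())


ages = [20, 21, 22, 23, 24]
print([age for age in ages if in_range(age, {"gte": 20, "lt": 22})])
```

With the ages stored in the ela index, the bounds gte 20 and lt 22 select 20 and 21, which agrees with the two hits in the response above.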


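7.4 exists query

The exists query finds documents that contain a given field. Negated with must_not, it finds documents that lack the field. This is similar to the IS NOT NULL and IS NULL conditions in SQL.

POST /ela/_search
# Request body
{
    "query": {
        "exists": {
            "field": "age"
        }
    }
}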

Reference: https://blog.csdn.net/qq_29202513/article/details/103710554

7.5 match query

match is the standard query: whether you need a full-text search or an exact-value search, it is usually the one to reach for.

POST /ela/_search
{
    "query": {
        "match": {
            "hobby": "羽毛球"
        }
    }
}

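If you run a match query against a full-text field, it analyzes the query string with that field's analyzer before searching. If you run it against a field holding an exact value, such as a number, a date, a boolean, or a not_analyzed string, it searches for that exact value: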

{ "match": { "age": 26 }}
{ "match": { "date": "2014-09-01" }}
{ "match": { "public": true }}
{ "match": { "tag": "full_text" }}

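7.6 bool query

The bool query combines the results of several query clauses with boolean logic. It supports the following operators:

  • must :: every clause must match, like and
  • must_not :: no clause may match, like not
  • should :: at least one clause should match, like or (when must or filter clauses are present, should clauses are optional and only raise the score)

Each of these parameters accepts either a single query clause or an array of clauses: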
 

{
    "bool": {
        "must": { "term": { "folder": "inbox" }},
        "must_not": { "term": { "tag": "spam" }},
        "should": [
            { "term": { "starred": true }},
            { "term": { "unread": true }}
        ]
    }
}
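
To make the and / not / or reading concrete, the Python sketch below evaluates such a bool clause against plain dictionaries. It is a simplified model under stated assumptions: only term clauses are handled, scoring and filter clauses are ignored, and as_list, term_matches, and bool_matches are hypothetical helpers.

```python
def as_list(clauses):
    """A bool parameter may hold one clause or a list of clauses."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def term_matches(doc, clause):
    """Evaluate a {"term": {field: value}} clause against a flat dict."""
    field, value = next(iter(clause["term"].items()))
    return doc.get(field) == value


def bool_matches(doc, query):
    """Apply must / must_not / should with their and / not / or meaning."""
    body = query["bool"]
    must = as_list(body.get("must"))
    must_not = as_list(body.get("must_not"))
    should = as_list(body.get("should"))
    if not all(term_matches(doc, clause) for clause in must):
        return False
    if any(term_matches(doc, clause) for clause in must_not):
        return False
    # Without a must clause, at least one should clause has to match.
    if should and not must:
        return any(term_matches(doc, clause) for clause in should)
    return True


query = {"bool": {
    "must": {"term": {"folder": "inbox"}},
    "must_not": {"term": {"tag": "spam"}},
    "should": [{"term": {"starred": True}}, {"term": {"unread": True}}],
}}
print(bool_matches({"folder": "inbox", "tag": "work", "starred": True}, query))
print(bool_matches({"folder": "inbox", "tag": "spam", "starred": True}, query))
```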

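8. Filtered queries

Elasticsearch also supports filters, built from clauses such as term, range, and match.
Example: find the users who are 20 years old.

POST /ela/_search
# Request body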
{
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "age": 20
                }
            }
        }
    }
}
# Response (hits only)
"hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_score": 0.0,
                "_source": {
                    "name": "張三",
                    "age": 20,
                    "mail": "111@qq.com",
                    "hobby": "羽毛球、乒乓球、足球"
                }
            }
        ]

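Queries versus filters

  • A filter asks of every document whether a field contains a specific value: a yes-or-no question.
  • A query asks how well each document's field value matches a specific value.
  • A query computes how relevant each document is, assigns it a relevance _score, and sorts the matching documents by that score. This kind of scoring suits full-text search, where there may be no perfectly matching result.
  • The output of a filter is a simple list of matching documents, which is quick to compute and easy to cache in memory, needing only 1 bit per document. Cached filter results can be reused very efficiently by later requests.
  • A query must not only find the matching documents but also compute the relevance of each one, so queries are generally more expensive than filters, and query results are not cacheable.

Recommendation: for exact-match searches, prefer filters, because filter results can be cached.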
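
One way to follow this advice is to keep full-text conditions in must, where they are scored, and exact-value conditions in filter, where they are not. The Python sketch below builds such a request body; filtered_search is a hypothetical helper, and the field names come from the ela index used earlier.

```python
import json


def filtered_search(full_text=None, exact=None):
    """Build a bool query: scored match clauses in must, exact values in filter."""
    bool_body = {}
    if full_text:
        bool_body["must"] = [{"match": {f: v}} for f, v in full_text.items()]
    if exact:
        bool_body["filter"] = [{"term": {f: v}} for f, v in exact.items()]
    return {"query": {"bool": bool_body}}


body = filtered_search(full_text={"hobby": "音樂"}, exact={"age": 22})
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting dictionary can be sent as the body of POST /ela/_search with any HTTP client.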
