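Structured search targets structured data such as dates, times, and numbers. Such data has a well-defined format, so we can apply logical operations to it, such as range checks and comparisons. The result of these operations is black and white: a document either satisfies the condition and is in the result set, or it does not and is left out. There is no notion of similarity.

This topic involves a lot of search examples. We use a "music app" as the main case background and build searches and data analyses around it. This helps us understand the practical goal and reinforces what we have learned along the way.

For now, a song has the following fields: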
| Name | Field | Type | Remark |
| :---- | :--: | :--: | -----: |
| ID | id | keyword | Document ID |
| Artist | author | text | |
| Song title | name | text | |
| Lyrics | content | text | |
| Language | language | text | |
| Tags | tags | text | |
| Song length | length | long | In seconds |
| Likes | likes | long | Incremented by 1 on each "like" click |
| Released | isRelease | boolean | true = released, false = not released |
| Release date | releaseDate | date | |
The index mapping, defined by hand, is as follows:

    PUT /music
    {
      "mappings": {
        "children": {
          "properties": {
            "id": { "type": "keyword" },
            "author_first_name": { "type": "text", "analyzer": "english" },
            "author_last_name": { "type": "text", "analyzer": "english" },
            "author": {
              "type": "text",
              "analyzer": "english",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            },
            "name": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            },
            "content": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            },
            "language": { "type": "text", "analyzer": "english", "fielddata": true },
            "tags": { "type": "text", "analyzer": "english" },
            "length": { "type": "long" },
            "likes": { "type": "long" },
            "isRelease": { "type": "boolean" },
            "releaseDate": { "type": "date" }
          }
        }
      }
    }

We load a batch of data in advance:

    POST /music/children/_bulk
    { "index": { "_id": 1 }}
    { "id" : "34116101-7fa2-5630-a1a4-1735e19d2834", "author_first_name":"Peter", "author_last_name":"Gymbo", "author" : "Peter Gymbo", "name": "gymbo", "content":"I hava a friend who loves smile, gymbo is his name", "language":"english", "tags":["enlighten","gymbo","friend"], "length":53, "likes": 5, "isRelease":true, "releaseDate": "2019-12-20" }
    { "index": { "_id": 2 }}
    { "id" : "34117101-54cb-59a1-9b7a-82adb46fa58d", "author_first_name":"John", "author_last_name":"Smith", "author" : "John Smith", "name": "wake me, shark me", "content":"don't let me sleep too late, gonna get up brightly early in the morning", "language":"english", "tags":["wake","early","morning"], "length":55, "likes": 8, "isRelease":true, "releaseDate": "2019-12-21" }
    { "index": { "_id": 3 }}
    { "id" : "34117201-8d01-49d4-a495-69634ae67017", "author_first_name":"Jimmie", "author_last_name":"Davis", "author" : "Jimmie Davis", "name": "you are my sunshine", "content":"you are my sunshine, my only sunshine, you make me happy, when skies are gray", "language":"english", "tags":["sunshine","happy"], "length":65, "likes": 12, "isRelease":true, "releaseDate": "2019-12-22" }
    { "index": { "_id": 4 }}
    { "id" : "55fa74f7-35f3-4313-a678-18c19c918a78", "author_first_name":"Peter", "author_last_name":"Raffi", "author" : "Peter Raffi", "name": "brush your teeth", "content":"When you wake up in the morning it's a quarter to one, and you want to have a little fun You brush your teeth", "language":"english", "tags":"teeth", "length":45, "likes": 17, "isRelease":true, "releaseDate": "2019-12-22" }
    { "index": { "_id": 5 }}
    { "id" : "1740e61c-63da-474f-9058-c2ab3c4f0b0a", "author_first_name":"Jean", "author_last_name":"Ritchie", "author" : "Jean Ritchie", "name": "love somebody", "content":"love somebody, yes I do", "language":"english", "tags":"love", "length":38, "likes": 3, "isRelease":true, "releaseDate": "2019-12-22" }

Given the mapping design, we can look documents up by ID, by date, and by other exact values.

    GET /music/children/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "term": { "id": "34116101-7fa2-5630-a1a4-1735e19d2834" }
          }
        }
      }
    }

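Note that the ID field is declared as keyword, so the ID is not tokenized at index time. If the type were text, the UUID value would be split into tokens when indexed, and a term query on the full UUID would then find nothing.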
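
To make the keyword versus text difference concrete, here is a minimal Python sketch of a toy inverted index. The `analyze_text` function is only a rough stand-in for a text analyzer (lowercase, split on non-alphanumeric characters) and is an assumption for illustration; the tokens a real Elasticsearch analyzer emits may differ.

```python
import re

def analyze_keyword(value):
    # A keyword field indexes the whole value as one single term.
    return [value]

def analyze_text(value):
    # Rough stand-in for a text analyzer (assumption, not the real one):
    # lowercase the value and split on anything that is not a digit or letter.
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]

def build_inverted_index(docs, analyzer):
    # Maps each term to the set of document numbers that contain it.
    index = {}
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index.setdefault(term, set()).add(doc_id)
    return index

def term_lookup(index, term):
    # A term query looks the term up exactly as given; it is not analyzed.
    return sorted(index.get(term, set()))

ids = {
    1: "34116101-7fa2-5630-a1a4-1735e19d2834",
    2: "34117101-54cb-59a1-9b7a-82adb46fa58d",
}
keyword_index = build_inverted_index(ids, analyze_keyword)
text_index = build_inverted_index(ids, analyze_text)

print(term_lookup(keyword_index, ids[1]))  # [1]
print(term_lookup(text_index, ids[1]))     # []
print(term_lookup(text_index, "7fa2"))     # [1]
```

In the toy text index only the fragments exist as terms, so the full UUID has nothing to match against.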

    GET /music/children/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "term": { "releaseDate": "2019-12-21" }
          }
        }
      }
    }

    GET /music/children/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "term": { "length": 53 }
          }
        }
      }
    }

    GET /music/children/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "term": { "isRelease": true }
          }
        }
      }
    }

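The four small examples above show that exact-value search is naturally supported for keyword, date, numeric, and boolean values.

Each of those four examples filters on a single condition. Real requirements will involve several conditions, but the principle stays the same: however complex a search requirement is, it is composed of basic conditions one by one. Let's look at a few simple examples of combined filters.

First, a quick review of the boolean logic covered earlier. The first query below matches songs that were released on 2019-12-20 or have the given ID, and that were not released on 2019-12-21. The second nests one bool inside another: it matches songs that have the first ID, or that have the second ID and were also released on 2019-12-20.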

    GET /music/children/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "bool": {
              "should": [
                { "term": { "releaseDate": "2019-12-20" } },
                { "term": { "id": "2a8f4288-c0a9-5c9b-8f99-67339b66f4c0" } }
              ],
              "must_not": {
                "term": { "releaseDate": "2019-12-21" }
              }
            }
          }
        }
      }
    }

    GET /music/children/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "bool": {
              "should": [
                { "term": { "id": "2a8f4288-c0a9-5c9b-8f99-67339b66f4c0" } },
                {
                  "bool": {
                    "must": [
                      { "term": { "id": "34116101-7fa2-5630-a1a4-1735e19d2834" } },
                      { "term": { "releaseDate": "2019-12-20" } }
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    }

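
The logic of these two combined filters can be checked locally with a small Python sketch. `term` and `bool_filter` are hypothetical helpers, not an Elasticsearch client API, and the rule that at least one `should` clause must match whenever any are present is a simplification that holds for the two queries above.

```python
# Reduced copies of three sample documents (only the fields the filters use).
SONGS = [
    {"_id": 1, "id": "34116101-7fa2-5630-a1a4-1735e19d2834", "releaseDate": "2019-12-20"},
    {"_id": 2, "id": "34117101-54cb-59a1-9b7a-82adb46fa58d", "releaseDate": "2019-12-21"},
    {"_id": 3, "id": "34117201-8d01-49d4-a495-69634ae67017", "releaseDate": "2019-12-22"},
]

def term(field, value):
    # Exact-value match on a single field.
    return lambda doc: doc.get(field) == value

def bool_filter(must=(), should=(), must_not=()):
    # Simplified bool semantics (assumption): every must clause matches,
    # no must_not clause matches, and if should clauses are present
    # at least one of them matches.
    def matches(doc):
        if not all(clause(doc) for clause in must):
            return False
        if any(clause(doc) for clause in must_not):
            return False
        if should and not any(clause(doc) for clause in should):
            return False
        return True
    return matches

def matching_ids(predicate):
    return [doc["_id"] for doc in SONGS if predicate(doc)]

first = bool_filter(
    should=[
        term("releaseDate", "2019-12-20"),
        term("id", "2a8f4288-c0a9-5c9b-8f99-67339b66f4c0"),
    ],
    must_not=[term("releaseDate", "2019-12-21")],
)

second = bool_filter(
    should=[
        term("id", "2a8f4288-c0a9-5c9b-8f99-67339b66f4c0"),
        bool_filter(must=[
            term("id", "34116101-7fa2-5630-a1a4-1735e19d2834"),
            term("releaseDate", "2019-12-20"),
        ]),
    ],
)

print(matching_ids(first))   # [1]
print(matching_ids(second))  # [1]
```

Because the nested `bool_filter` is itself just a predicate, it can be dropped into a `should` list the same way the nested bool clause is in the request.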
The terms query searches for several values at once, similar to the IN clause in MySQL.

    GET /music/children/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "terms": {
              "id": [
                "34116101-7fa2-5630-a1a4-1735e19d2834",
                "99268c7e-8308-569a-a975-bbce7d3f9a8e"
              ]
            }
          }
        }
      }
    }

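Fields of type long and date support range queries, using gt, lt, gte, and lte to express the bounds. They serve the same purpose as MySQL's >, <, >=, <= and BETWEEN ... AND.

For a range query on a long field, use the range expression directly: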

    GET /music/children/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "range": {
              "length": { "gte": 45, "lte": 60 }
            }
          }
        }
      }
    }

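
As a quick worked example of what the four bounds mean, the sketch below applies them to the `length` values of the five sample songs. `range_filter` is a hypothetical helper written for illustration, not part of any Elasticsearch client.

```python
# length (seconds) of the five sample songs, keyed by document _id.
LENGTHS = {1: 53, 2: 55, 3: 65, 4: 45, 5: 38}

def range_filter(values, gt=None, gte=None, lt=None, lte=None):
    # Each bound is optional; a value must satisfy every bound that is given.
    def in_range(v):
        return ((gt is None or v > gt)
                and (gte is None or v >= gte)
                and (lt is None or v < lt)
                and (lte is None or v <= lte))
    return sorted(doc_id for doc_id, v in values.items() if in_range(v))

print(range_filter(LENGTHS, gte=45, lte=60))  # [1, 2, 4]
print(range_filter(LENGTHS, gt=45, lt=60))    # [1, 2]
```

The only difference between the two calls is whether the boundary value 45 is included, which is exactly the gte versus gt distinction.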
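For range searches on dates, besides writing the date literally with the usual range expressions, you can also use +1d and -1d to add to or subtract from a given date. For example, "2019-12-21||-1d" means "2019-12-20", and now-1d means one day before the current time, which is rather neat.

Here is an example: search for songs released after the day before 2019-12-21, that is, after 2019-12-20.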

    GET /music/children/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "range": {
              "releaseDate": { "gt": "2019-12-21||-1d" }
            }
          }
        }
      }
    }

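
How such a date math expression resolves can be sketched in Python. This is a deliberately small model and not Elasticsearch's parser: the hypothetical `resolve_date_math` handles only `yyyy-MM-dd` anchors or `now`, only the `d` unit, and works at whole-day granularity.

```python
import re
from datetime import date, datetime, timedelta

# Either "now" followed directly by operations, or a yyyy-MM-dd anchor
# followed by "||" and one or more operations.
_DATE_MATH = re.compile(
    r"^(?:now(?P<now_ops>(?:[+-]\d+d)*)"
    r"|(?P<anchor>\d{4}-\d{2}-\d{2})(?:\|\|(?P<ops>(?:[+-]\d+d)+))?)$"
)

def resolve_date_math(expr, today=None):
    m = _DATE_MATH.match(expr)
    if m is None:
        raise ValueError(f"unsupported date math expression: {expr!r}")
    if m.group("anchor") is None:
        # "now" anchor; `today` can be injected to keep results reproducible.
        base = today if today is not None else date.today()
        ops = m.group("now_ops")
    else:
        base = datetime.strptime(m.group("anchor"), "%Y-%m-%d").date()
        ops = m.group("ops") or ""
    # Apply each +Nd / -Nd step in order.
    for sign, amount in re.findall(r"([+-])(\d+)d", ops):
        step = timedelta(days=int(amount))
        base = base + step if sign == "+" else base - step
    return base

print(resolve_date_math("2019-12-21||-1d"))                   # 2019-12-20
print(resolve_date_math("now-1d", today=date(2019, 12, 21)))  # 2019-12-20

# Release dates of the five sample songs, compared at day granularity.
RELEASE_DATES = {
    1: date(2019, 12, 20), 2: date(2019, 12, 21), 3: date(2019, 12, 22),
    4: date(2019, 12, 22), 5: date(2019, 12, 22),
}
lower = resolve_date_math("2019-12-21||-1d")
print(sorted(i for i, d in RELEASE_DATES.items() if d > lower))  # [2, 3, 4, 5]
```

With a strict gt bound the song released on 2019-12-20 itself is excluded; use gte to include it.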
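The inverted index does not accept empty values when it is built. This means that null values in any form, such as null, [], or [null], cannot be stored in the inverted index. So how do we handle them?

Elasticsearch provides two queries for this, similar to MySQL's IS NOT NULL and IS NULL.

The exists query returns the documents in which the given field has a value, like IS NOT NULL in MySQL.

The tags field in our case is optional, so some records may hold a null value. To find all records that do have a tags value, the request is: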

    GET /music/children/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "exists": { "field": "tags" }
          }
        }
      }
    }

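The missing query used to be expressed with the keyword missing. Its effect is the opposite of exists, and it resembles IS NULL in MySQL. It is no longer available in 6.x, so we use must_not combined with exists to get the same effect.

Using the tags field again, this finds the documents where tags is empty: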

    GET /music/children/_search
    {
      "query": {
        "bool": {
          "must_not": {
            "exists": { "field": "tags" }
          }
        }
      }
    }

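
The "has a value" rule behind these two queries can be modeled with a short Python sketch. `has_indexed_value` is a hypothetical helper, and the six documents are made up for illustration; it treats null, [], [null], and an absent field as missing, matching the forms listed above.

```python
def has_indexed_value(doc, field):
    # A field counts as present only if at least one non-null value
    # would reach the inverted index.
    value = doc.get(field)
    if value is None:
        return False          # absent field or explicit null
    if isinstance(value, list):
        return any(item is not None for item in value)  # [] and [null] are empty
    return True

# Made-up documents covering each form of "no value".
DOCS = {
    1: {"tags": ["enlighten", "gymbo", "friend"]},
    2: {"tags": "teeth"},
    3: {"tags": None},
    4: {"tags": []},
    5: {"tags": [None]},
    6: {},
}

exists = [i for i, d in DOCS.items() if has_indexed_value(d, "tags")]
missing = [i for i, d in DOCS.items() if not has_indexed_value(d, "tags")]

print(exists)   # [1, 2]
print(missing)  # [3, 4, 5, 6]
```

The `missing` list is simply the negation of `exists`, which mirrors wrapping exists in must_not.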
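Why are filters so efficient? Besides the set-based (bitset) design that makes the filtering itself fast, Elasticsearch also caches filter results where appropriate.

Here is a brief look at how Elasticsearch treats filters compared with queries.

The advantage of a filter over a query is that its result is cached, so the next time the same filter is used the inverted index does not need to be consulted again. In most cases filters run before queries.

A query computes a relevance score for each document against the search condition and then sorts by that score.

A filter simply selects the data you want. It computes no relevance score and does no sorting.

Cache updates are smart and incremental. When a document is added or modified, the new document is added to the bitset, rather than the cache being dropped or recomputed from scratch.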
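
The caching idea described above can be illustrated with a toy Python model. This is a sketch of the concept only and makes no claim about how Elasticsearch or Lucene implement it: `FilterCache` is a hypothetical class that stores one integer bitset per filter, reuses it on repeated calls, and sets a single bit when a document is added.

```python
class FilterCache:
    """Toy model: one bitset (a Python int) per cached filter."""

    def __init__(self):
        self.docs = []      # position in this list = bit position
        self.bitsets = {}   # cache key -> (predicate, bitset)
        self.scans = 0      # how many times a full scan was needed

    def add_document(self, doc):
        pos = len(self.docs)
        self.docs.append(doc)
        # Incremental update: only the new document's bit is evaluated,
        # existing cached bitsets are neither dropped nor rebuilt.
        for key, (predicate, bits) in list(self.bitsets.items()):
            if predicate(doc):
                self.bitsets[key] = (predicate, bits | (1 << pos))

    def filter(self, key, predicate):
        if key not in self.bitsets:
            # Cache miss: scan every document once and remember the result.
            self.scans += 1
            bits = 0
            for pos, doc in enumerate(self.docs):
                if predicate(doc):
                    bits |= 1 << pos
            self.bitsets[key] = (predicate, bits)
        return self.bitsets[key][1]

def doc_positions(bits):
    # Positions of the set bits, lowest first.
    return [pos for pos in range(bits.bit_length()) if (bits >> pos) & 1]

cache = FilterCache()
for length in (53, 55, 65, 45, 38):
    cache.add_document({"length": length, "isRelease": True})

def is_short(doc):
    return doc["length"] <= 55

def is_released(doc):
    return doc["isRelease"]

short = cache.filter("length<=55", is_short)
released = cache.filter("isRelease:true", is_released)
print(doc_positions(short & released))  # [0, 1, 3, 4]

# A new, unreleased song: its bit is added to the cached bitsets in place.
cache.add_document({"length": 50, "isRelease": False})
short = cache.filter("length<=55", is_short)
released = cache.filter("isRelease:true", is_released)
print(doc_positions(short))             # [0, 1, 3, 4, 5]
print(doc_positions(short & released))  # [0, 1, 3, 4]
print(cache.scans)                      # 2
```

Combining two filters is a single bitwise AND on the cached bitsets, and the scan counter stays at 2 because the repeated calls are served from the cache.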
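The first half of this article consists mostly of examples and can be read quickly. The second half introduces how filters work and how their results are cached, which is worth understanding. Thanks for reading.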