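1、Creating an index in Es
1. Create the index:
The earlier post on installing and using the Es plugins covered creating an index with custom analyzers and creating a type as two separate steps. In fact, the type can be created, and its analyzer assigned, in the same request that creates the index.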
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_smart_pinyin": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["my_pinyin", "word_delimiter"]
        },
        "ik_max_word_pinyin": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["my_pinyin", "word_delimiter"]
        }
      },
      "filter": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": true,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "id": { "type": "integer" },
        "name": { "type": "text", "analyzer": "ik_max_word_pinyin" },
        "age": { "type": "integer" }
      }
    }
  }
}
2. Add data
POST /my_index/my_type/_bulk
{ "index": { "_id":1}}
{ "id":1,"name": "張三","age":20}
{ "index": { "_id": 2}}
{ "id":2,"name": "張四","age":22}
{ "index": { "_id": 3}}
{ "id":3,"name": "張三李四王五","age":20}
3. View the mapping
GET /my_index/my_type/_mapping
Result:
{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "age": { "type": "integer" },
          "id": { "type": "integer" },
          "name": { "type": "text", "analyzer": "ik_max_word_pinyin" }
        }
      }
    }
  }
}
2、Integrating with Java (Es must already be configured in the project; there are plenty of examples online to refer to)
1. Create the Es entity class
package com.example.es_query_list.entity.es;
import lombok.Getter;
import lombok.Setter;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
@Setter
@Getter
@Document(indexName = "my_index",type = "my_type")
public class User {
@Id
private Integer id;
private String name;
private Integer age;
}
2. Create the DAO layer
package com.example.es_query_list.repository.es;

import com.example.es_query_list.entity.es.User;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface EsUserRepository extends ElasticsearchRepository<User, Integer> {
}
3、With the groundwork done, start querying
1. Exact-value queries
Querying a non-text field:
GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "age": { "value": "20" }
    }
  }
}
Result:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "name": "張三", "age": 20 } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "name": "李四", "age": 20 } }
    ]
  }
}
2. Querying a text field
A term query on the name field for the full value 張三李四王五 returns no hits:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 0, "max_score": null, "hits": [] }
}
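At this point you may be wondering why the result is empty: why does an exact match fail to find the exact value I typed in? As mentioned before, the field was given an analyzer when the type was created, and a term that the analyzer never produced cannot be found. Let's prove it.
First, check which tokens the query text is analyzed into: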
GET my_index/_analyze
{
  "text": "張三李四王五",
  "analyzer": "ik_max_word"
}
Result:
{
  "tokens": [
    { "token": "張三李四", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 },
    { "token": "張三", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1 },
    { "token": "三", "start_offset": 1, "end_offset": 2, "type": "TYPE_CNUM", "position": 2 },
    { "token": "李四", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 3 },
    { "token": "四", "start_offset": 3, "end_offset": 4, "type": "TYPE_CNUM", "position": 4 },
    { "token": "王", "start_offset": 4, "end_offset": 5, "type": "CN_CHAR", "position": 5 },
    { "token": "五", "start_offset": 5, "end_offset": 6, "type": "TYPE_CNUM", "position": 6 }
  ]
}
The result shows that 張三李四王五 is never emitted as the single token 張三李四王五, which is why the query comes back empty.
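To make the reasoning concrete, here is a minimal Java sketch of what a term lookup does. It is only a model, not how Elasticsearch is implemented: the class and method names are hypothetical, and the token set is simply copied from the _analyze output above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TermLookupSketch {
    // Tokens copied from the _analyze output for "張三李四王五".
    static final Set<String> NAME_TOKENS = new LinkedHashSet<>(
            List.of("張三李四", "張三", "三", "李四", "四", "王", "五"));

    // A term query does not analyze its input: it matches only when the
    // exact string is one of the tokens stored for the field.
    static boolean termMatches(Set<String> indexedTokens, String term) {
        return indexedTokens.contains(term);
    }
}
```

With this model, looking up 張三李四王五 in NAME_TOKENS fails while 張三 succeeds. A field that stores the whole value as one token (which is what a keyword field does) would match the full string.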
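To match the whole value exactly, add a keyword sub-field to name (the sub-field name, keyword here, is your own choice):
POST /my_index/_mapping/my_type
{
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "ik_max_word_pinyin",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}
Once the mapping is updated, delete the existing data and index it again. A term query on name.keyword will then find the document.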
In Java:
// template is assumed to be an injected ElasticsearchTemplate
public List<User> termQuery() {
    QueryBuilder queryBuilder = QueryBuilders.termQuery("age", 20);
    // QueryBuilder queryBuilder = QueryBuilders.termQuery("name.keyword", "張三李四王五");
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withIndices("my_index")
            .withTypes("my_type")
            .withQuery(queryBuilder)
            .build();
    List<User> list = template.queryForList(searchQuery, User.class);
    return list;
}
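4、Combining filters
Note: the official guide is out of date on this point. As of 5.x, filtered has been replaced by bool ("The filtered query is replaced by the bool query").
A bool filter is made up of three parts: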
{
  "bool": {
    "must": [],
    "should": [],
    "must_not": []
  }
}
must
All of these clauses must match. Equivalent to AND.
must_not
None of these clauses may match. Equivalent to NOT.
should
At least one of these clauses must match. Equivalent to OR.
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "age": 20 } },
        { "term": { "age": 30 } }
      ],
      "must": {
        "term": { "name.keyword": "張三" }
      }
    }
  }
}
public List<User> boolQuery() {
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.should(QueryBuilders.termQuery("age", 20));
    boolQueryBuilder.should(QueryBuilders.termQuery("age", 30));
    boolQueryBuilder.must(QueryBuilders.termQuery("name.keyword", "張三"));
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withIndices("my_index")
            .withTypes("my_type")
            .withQuery(boolQueryBuilder)
            .build();
    List<User> list = template.queryForList(searchQuery, User.class);
    return list;
}
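The AND / OR / NOT reading of must, should and must_not can be modelled in plain Java with predicates. This is only a sketch of the filter semantics as described above, with hypothetical names (BoolSketch, Doc, matchingIds); it ignores scoring entirely. One caveat, as far as I know: in a real bool query that also has a must clause, should clauses are optional by default and only affect the score, unless minimum_should_match is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class BoolSketch {
    static final class Doc {
        final int id;
        final String name;
        final int age;
        Doc(int id, String name, int age) { this.id = id; this.name = name; this.age = age; }
    }

    static final List<Predicate<Doc>> NONE = List.of();

    // must: all match (AND); must_not: none match (NOT);
    // should: at least one matches (OR), when any should clause is given.
    static boolean bool(Doc d, List<Predicate<Doc>> must,
                        List<Predicate<Doc>> should, List<Predicate<Doc>> mustNot) {
        boolean allMust = must.stream().allMatch(p -> p.test(d));
        boolean noneMustNot = mustNot.stream().noneMatch(p -> p.test(d));
        boolean anyShould = should.isEmpty() || should.stream().anyMatch(p -> p.test(d));
        return allMust && noneMustNot && anyShould;
    }

    // should: [ age == 20, bool { must: [ name == "李四" ] } ]
    // A bool is itself just a predicate, so it nests inside another bool.
    static List<Integer> matchingIds(List<Doc> docs) {
        Predicate<Doc> ageIs20 = d -> d.age == 20;
        Predicate<Doc> nameIsLiSi = d -> d.name.equals("李四");
        Predicate<Doc> innerBool = d -> bool(d, List.of(nameIsLiSi), NONE, NONE);
        List<Integer> ids = new ArrayList<>();
        for (Doc d : docs) {
            if (bool(d, NONE, List.of(ageIs20, innerBool), NONE)) ids.add(d.id);
        }
        return ids;
    }
}
```

Run against the three documents indexed earlier (張三/20, 張四/22, 張三李四王五/20), matchingIds returns ids 1 and 3: both match through the age clause, and no document has the exact name 李四.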
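Although bool is a compound filter that accepts multiple sub-filters, keep in mind that a bool filter is itself still just a filter. This means one bool filter can be placed inside another, which lets us express arbitrarily complex Boolean logic.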
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "age": 20 } },
        { "bool": {
            "must": [
              { "term": { "name.keyword": { "value": "李四" } } }
            ]
        }}
      ]
    }
  }
}
Result:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "id": 1, "name": "張三", "age": 20 } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "id": 3, "name": "張三李四王五", "age": 20 } }
    ]
  }
}
Because the term and bool filters are siblings, both sitting inside the outer should clause, a returned document has to match at least one of them.
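Term clauses that sit side by side inside a must clause all have to match. The inner must above holds a single term clause on name.keyword, so the nested bool only matches documents whose name is exactly 李四.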
5、Finding multiple exact values
GET my_index/my_type/_search
{
  "query": {
    "terms": {
      "age": [ 20, 22 ]
    }
  }
}
Result:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": 1, "_source": { "id": 2, "name": "張四", "age": 22 } },
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "id": 1, "name": "張三", "age": 20 } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "id": 3, "name": "張三李四王五", "age": 20 } }
    ]
  }
}
It is important to understand that term and terms are contains operations, not equals checks.
// list holds the values to match, for example Arrays.asList(20, 22)
TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("age", list);
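The difference between contains and equals is easiest to see on a multi-valued field. The sketch below is a plain-Java model with hypothetical names, treating a field as the set of terms indexed for it; it is not Elasticsearch code.

```java
import java.util.List;
import java.util.Set;

final class ContainsSketch {
    // term: the field contains the value (it may hold other values too).
    static boolean term(Set<String> fieldTerms, String value) {
        return fieldTerms.contains(value);
    }

    // terms: the field contains at least one of the listed values.
    static boolean terms(Set<String> fieldTerms, List<String> values) {
        return values.stream().anyMatch(fieldTerms::contains);
    }
}
```

A field holding both search and open_source still matches term("search"), even though the field as a whole is not equal to search.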
6、Range queries
1. Numeric ranges
GET my_index/my_type/_search
{
  "query": {
    "range": {
      "age": { "gte": 10, "lte": 20 }
    }
  }
}
Result:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "id": 1, "name": "張三", "age": 20 } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "id": 3, "name": "張三李四王五", "age": 20 } }
    ]
  }
}
RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("age").gte(10).lte(20);
2. Date ranges
Update the type to add a date field:
POST /my_index/_mapping/my_type
{
"properties": {
"date":{
"type":"date",
"format":"yyyy-MM-dd"
}
}
}
Add data:
POST /my_index/my_type/_bulk
{ "index": { "_id":4}}
{ "id":4,"name": "趙六","age":20,"date":"2018-10-1"}
{ "index": { "_id": 5}}
{ "id":5,"name": "對七","age":22,"date":"2018-11-20"}
{ "index": { "_id": 6}}
{ "id":6,"name": "王八","age":20,"date":"2018-7-28"}
Query:
GET my_index/my_type/_search
{
  "query": {
    "range": {
      "date": { "gte": "2018-10-20", "lte": "2018-11-29" }
    }
  }
}
Result:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "5", "_score": 1, "_source": { "id": 5, "name": "對七", "age": 22, "date": "2018-11-20" } }
    ]
  }
}
RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("date").gte("2018-10-20").lte("2018-11-29");
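A date range compares points in time, not strings. The following Java sketch (hypothetical class name, standard java.time only) applies the same gte/lte bounds to the three sample dates. The lenient pattern yyyy-M-d is my own choice so that values such as 2018-10-1 parse; it is not the mapping's format.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateRangeSketch {
    // Single-letter M and d accept one- or two-digit month and day values.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-M-d");

    // Inclusive on both ends, like gte / lte.
    static boolean inRange(String value, String gte, String lte) {
        LocalDate d = LocalDate.parse(value, FMT);
        return !d.isBefore(LocalDate.parse(gte, FMT))
                && !d.isAfter(LocalDate.parse(lte, FMT));
    }
}
```

Only 2018-11-20 falls between 2018-10-20 and 2018-11-29, matching the single hit above. Comparing the raw strings would go wrong: "2018-7-28" sorts after "2018-10-20" lexicographically, although it is the earlier date.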
7、Handling null values
1. Add data
POST /my_index/posts/_bulk
{ "index": { "_id": "1" }}
{ "tags" : ["search"] }
{ "index": { "_id": "2" }}
{ "tags" : ["search", "open_source"] }
{ "index": { "_id": "3" }}
{ "other_field" : "some data" }
{ "index": { "_id": "4" }}
{ "tags" : null }
{ "index": { "_id": "5" }}
{ "tags" : ["search", null] }
2. Find documents where a given field exists
constant_score skips score calculation; every hit gets a score of 1 by default.
GET /my_index/posts/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "exists": { "field": "tags" }
      }
    }
  }
}
Result:
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "posts", "_id": "5", "_score": 1, "_source": { "tags": [ "search", null ] } },
      { "_index": "my_index", "_type": "posts", "_id": "2", "_score": 1, "_source": { "tags": [ "search", "open_source" ] } },
      { "_index": "my_index", "_type": "posts", "_id": "1", "_score": 1, "_source": { "tags": [ "search" ] } }
    ]
  }
}
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.filter(QueryBuilders.constantScoreQuery(QueryBuilders.existsQuery("tags")));
3. Find documents where a given field is missing
Note: the missing filter/query was removed in ES 5.
GET /my_index/posts/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "constant_score": {
            "filter": {
              "exists": { "field": "tags" }
            }
        }}
      ]
    }
  }
}
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "posts", "_id": "4", "_score": 1, "_source": { "tags": null } },
      { "_index": "my_index", "_type": "posts", "_id": "3", "_score": 1, "_source": { "other_field": "some data" } }
    ]
  }
}
Note on null handling: a field whose content is empty is treated as null.
boolQueryBuilder.mustNot(QueryBuilders.boolQuery().filter(QueryBuilders.constantScoreQuery(QueryBuilders.existsQuery("tags"))));
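The two result sets above follow from one rule: a field counts as existing only if the document holds at least one non-null value for it. Here is a small Java model of that rule over the five sample posts. The names (ExistsSketch, idsWhere) are hypothetical and the code is a sketch of the semantics, not of Elasticsearch internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class ExistsSketch {
    // exists: the field is present and has at least one non-null value.
    static boolean exists(Map<String, List<String>> doc, String field) {
        List<String> values = doc.get(field);
        return values != null && values.stream().anyMatch(Objects::nonNull);
    }

    static Map<String, List<String>> doc(String field, List<String> values) {
        Map<String, List<String>> m = new HashMap<>();
        m.put(field, values);
        return m;
    }

    // The five documents from the bulk request, keyed by _id.
    static Map<Integer, Map<String, List<String>>> posts() {
        Map<Integer, Map<String, List<String>>> posts = new LinkedHashMap<>();
        posts.put(1, doc("tags", Arrays.asList("search")));
        posts.put(2, doc("tags", Arrays.asList("search", "open_source")));
        posts.put(3, doc("other_field", Arrays.asList("some data")));
        posts.put(4, doc("tags", null));
        posts.put(5, doc("tags", Arrays.asList("search", null)));
        return posts;
    }

    // shouldExist = true models exists; false models must_not + exists.
    static List<Integer> idsWhere(boolean shouldExist) {
        List<Integer> ids = new ArrayList<>();
        posts().forEach((id, d) -> {
            if (exists(d, "tags") == shouldExist) ids.add(id);
        });
        return ids;
    }
}
```

idsWhere(true) gives ids 1, 2 and 5, and idsWhere(false) gives 3 and 4, the same documents as the two queries (the order of hits in Elasticsearch may differ).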
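8、About caching
1. The core mechanism
At its core, a bitset records which documents match a filter. Elasticsearch aggressively caches these bitsets for later use. Once cached, a bitset can be reused wherever the same filter appears, without evaluating the whole filter again.
These cached bitsets are "smart": they are updated incrementally. When we index new documents, only those new documents need to be added to the existing bitsets, rather than recomputing the entire cache over and over. Like the rest of the system, filters are real-time, so we don't need to worry about cache expiry.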
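2. Independent filter caching
The bitsets belonging to a query component are independent of the rest of the search request. This means that, once cached, a query can be reused across multiple search requests; it does not depend on the context of the query around it. Caching can therefore speed up the most frequently used parts of your queries without wasting overhead on the less frequent, more volatile parts.
Likewise, if a single request reuses the same non-scoring query, its cached bitset can be reused by every instance within that search.
Let's look at an example query that finds emails matching either of the following conditions:
Conditions (example): (1) in the inbox and unread; (2) not in the inbox but marked important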
GET /inbox/emails/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            { "bool": {                                            (1)
                "must": [
                  { "term": { "folder": "inbox" }},
                  { "term": { "read": false }}
                ]
            }},
            { "bool": {                                            (2)
                "must_not": { "term": { "folder": "inbox" } },
                "must": { "term": { "important": true } }
            }}
          ]
        }
      }
    }
  }
}
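Clauses 1 and 2 share one filter (folder = inbox), so they use the same bitset.
Although one of the inbox clauses is a must clause and the other a must_not clause, the two clauses themselves are identical. The bitset is therefore computed once, when the first clause runs, and then cached for the other. When the query is executed again, the inbox filter is already cached, so both clauses use the cached bitset.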
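This ties in nicely with the composability of the query DSL. A filter is easy to move anywhere in the query, or to reuse in several places within the same query. That is not just convenient for developers; it also benefits performance directly.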
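The reuse described above can be sketched with java.util.BitSet. This is a toy model under my own assumptions (a map keyed by a filter string, four made-up emails, hypothetical class and method names), not Elasticsearch's actual cache. It shows the inbox bitset being computed once and then used by both the must and the must_not clause.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

final class BitsetCacheSketch {
    private final Map<String, BitSet> cache = new HashMap<>();
    int computations = 0; // how many times a filter was actually evaluated

    // Return the cached bitset for a filter, computing it only on first use.
    BitSet bitsetFor(String filterKey, int docCount, IntPredicate matches) {
        return cache.computeIfAbsent(filterKey, k -> {
            computations++;
            BitSet bits = new BitSet(docCount);
            for (int doc = 0; doc < docCount; doc++) {
                if (matches.test(doc)) bits.set(doc);
            }
            return bits;
        });
    }

    // (inbox AND unread) OR (important AND NOT inbox) over four sample emails.
    static BitSet inboxExample(BitsetCacheSketch c) {
        String[] folder = {"inbox", "inbox", "archive", "archive"};
        boolean[] read = {false, true, false, true};
        boolean[] important = {false, false, true, false};
        int n = folder.length;

        // Clause 1: must folder=inbox, must read=false. Clone so the cached copy stays intact.
        BitSet clause1 = (BitSet) c.bitsetFor("folder:inbox", n, d -> folder[d].equals("inbox")).clone();
        clause1.and(c.bitsetFor("read:false", n, d -> !read[d]));

        // Clause 2: must important=true, must_not folder=inbox (served from the cache).
        BitSet clause2 = (BitSet) c.bitsetFor("important:true", n, d -> important[d]).clone();
        clause2.andNot(c.bitsetFor("folder:inbox", n, d -> folder[d].equals("inbox")));

        clause1.or(clause2); // should: either clause may match
        return clause1;
    }
}
```

With the sample data, emails 0 and 2 match, and although the inbox filter is requested twice, only three filters are ever evaluated.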
3. Automatic caching behavior
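In earlier versions of Elasticsearch, the default behavior was to cache everything that could be cached. This often meant the system cached bitsets too aggressively, and the resulting cache churn hurt performance. On top of that, many filters are quick to evaluate but noticeably slower to cache (and to reuse from the cache). Caching such filters makes little sense, because it is simpler to just execute them again.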
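Checking the inverted index is very fast, and most query components are rarely reused. Take a term filter on a "user_id" field: with millions of users, any particular user ID occurs only rarely. Caching a bitset for this filter is not worthwhile, because the cached result will most likely be evicted before it is ever reused.
This kind of cache churn can seriously hurt performance. Worse, it makes it hard for developers to tell caches that perform well apart from caches that are useless.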
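To address this, Elasticsearch caches queries automatically based on how often they are used. If a non-scoring query has been used a number of times (the number depends on the query type) within the last 256 queries, it becomes a candidate for caching. However, not every segment is guaranteed to cache its bitset. Only segments holding more than 10,000 documents (or more than 3% of the total document count) will cache it. Small segments are fast to search and merge, so caching them is of little value.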
Once cached, a non-scoring bitset stays in the cache until it is evicted. Eviction is LRU-based: when the cache is full, the least recently used filter is evicted.
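LRU eviction itself is easy to demonstrate in Java with an access-ordered LinkedHashMap. The class below is a generic illustration with a hypothetical name and a tiny capacity; it makes no claim about how Elasticsearch sizes or implements its cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruSketch(int capacity) {
        // accessOrder = true: entries are ordered from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called after each put; returning true drops the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

With a capacity of 2, putting folder:inbox and read:false, then reading folder:inbox, then putting important:true evicts read:false, because it is the entry that was used least recently.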