Query DSL (source: http://www.elasticsearch.cn/guide/reference/query-dsl/)
elasticsearch provides a complete Query DSL (domain-specific language) based on JSON. It includes simple queries such as term or prefix, as well as compound queries such as bool. Queries can also be associated with filters, as in the filtered or constant_score queries.
You can think of the Query DSL as an abstract syntax tree (AST) of queries. Certain queries can contain other queries (e.g. bool), some queries can contain filters (e.g. constant_score), and some can contain both a query and a filter (e.g. filtered). You can take any query from the set of queries ES supports, or any filter from the set of filters, and compose them, which allows building arbitrarily complex (and possibly quite interesting) queries. Very flexible.
Both queries and filters can be used in various APIs, for example as a search query or a facet filter. This chapter describes the queries and filters that can be used to build such an AST.
Tip: filters are very useful because they are faster than plain queries (no document scoring is performed) and can be cached automatically.
Filters and Caching
Filters are great candidates for caching: caching a filter's result set does not require much memory, and other queries can reuse the cached filter (note: with the same parameters), making execution extremely fast.
The results of some filters lend themselves easily to caching; the difference caching makes is simply whether the filter's result set is stored in the cache. Filters such as term, terms, prefix, and range are cached by default, and it is recommended to use these filters rather than queries with the equivalent effect.
Other filters, generally ones that work by loading field data into memory, are not cached by default. These filters are already extremely fast, and caching their results would require extra work to make them reusable by other queries. Filters in this category, including geo, numeric_range, and script, do not cache their results by default.
The last type of filter combines other filters: and, not, and or. These do not cache results, since they mainly operate on the inlined filters they wrap, which handle their own caching.
All filters allow setting a _cache element to explicitly control caching, and a _cache_key element to use as the cache key. This is very useful when filtering over large sets (such as a terms filter with many elements).
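As a sketch of the _cache and _cache_key elements described above (the tag field, its values, and the key name are assumptions for illustration):

```json
{
    "filtered" : {
        "query" : { "match_all" : {} },
        "filter" : {
            "terms" : {
                "tag" : ["wow", "elasticsearch", "search"],
                "_cache" : true,
                "_cache_key" : "my_tag_terms"
            }
        }
    }
}
```

Providing an explicit _cache_key lets repeated requests with the same large terms list hit the same cache entry.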
Text Query
A query of the text family can process all sorts of text. For example:
{
"text" : {
"message" : "this is a test"
}
}
Note, even though its name is text, it can also be used to match exact values (similar to term) such as numbers and dates.
Here, message is the name of a field; replace it with the name of the field you actually use (this can also be the _all field).
Types of Text Queries
boolean
The default text query is of type boolean, meaning the provided text is analyzed and a boolean query is constructed from it. The operator flag can be set to or or and to control the boolean clauses (defaults to or).
analyzer can be set to control which analyzer performs the analysis process on the text. It defaults to the analyzer defined in the field mapping, or, if none is defined, to the index's default analyzer.
fuzziness can be set to a value (depending on the relevant type; for string types it should be a value between 0.0 and 1.0) to construct fuzzy queries for each analyzed term. prefix_length and max_expansions can be set in this case to control the fuzzy process.
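For example, a fuzzy text query might look like the following sketch (the parameter values are illustrative, and message is assumed to be the field name):

```json
{
    "text" : {
        "message" : {
            "query" : "this is a testt",
            "fuzziness" : 0.6,
            "prefix_length" : 2,
            "max_expansions" : 10
        }
    }
}
```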
The following example uses additional parameters (note the structural change; message is the field name):
{
"text" : {
"message" : {
"query" : "this is a test",
"operator" : "and"
}
}
}
phrase
The text_phrase query analyzes the text and creates a phrase query out of it. For example:
{
"text_phrase" : {
"message" : "this is a test"
}
}
Since text_phrase is just a type of the text query, it can also be written as follows:
{
"text" : {
"message" : {
"query" : "this is a test",
"type" : "phrase"
}
}
}
A phrase query maintains order of the terms up to a configurable slop (which defaults to 0).
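For example, a phrase query with a non-default slop might look like this sketch (the slop value is illustrative):

```json
{
    "text_phrase" : {
        "message" : {
            "query" : "this test",
            "slop" : 2
        }
    }
}
```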
The analyzer can be set to control which analyzer performs the analysis process on the text. It defaults to the field's explicit mapping definition, or to the default search analyzer. For example:
{
"text_phrase" : {
"message" : {
"query" : "this is a test",
"analyzer" : "my_analyzer"
}
}
}
text_phrase_prefix
The text_phrase_prefix is the same as text_phrase, except that it allows for prefix matches on the last term in the text. For example:
{
"text_phrase_prefix" : {
"message" : "this is a test"
}
}
Or:
{
"text" : {
"message" : {
"query" : "this is a test",
"type" : "phrase_prefix"
}
}
}
It accepts the same parameters as the phrase type. In addition, it also accepts a max_expansions parameter that controls how many prefixes the last term will be expanded into. It is highly recommended to set it to an acceptable value to control the execution time of the query. For example:
{
"text_phrase_prefix" : {
"message" : {
"query" : "this is a test",
"max_expansions" : 10
}
}
}
Comparison to query_string / field
The text family of queries does not go through a "query parsing" process. It does not support field name prefixes, wildcard characters, or other "advanced" features. For this reason, the chances of it failing are very small to non-existent, and it provides excellent behavior when you just want to analyze some text and run it as a query (which is usually what a search text box does). Also, the phrase_prefix type can provide a great "as you type" behavior for automatically loading search results.
Bool Query
一個由其餘類型查詢組合而成的文檔匹配查詢, 對應Lucene的 BooleanQuery. 它能夠由一個或者多個查詢語句構成, 每種語句都有它們的匹配條件. 可能的匹配條件以下:
Occur Description
must 匹配的文檔必須知足該查詢語句.
should 匹配的文檔能夠知足該查詢語句. 若是一個布爾查詢(Bool Query)不包含 must 查詢語句, 那麼匹配的文檔必須知足其中一個或多個 should 查詢語句, 可使用 minimum_number_should_match 參數來設定最低知足的數量.
must_not 匹配的文檔必須不知足該查詢語句. 注意, 不能只用一個 must_not 查詢語句來搜索文檔.
布爾查詢(Bool Query)也支持 disable_coord 參數 (默認爲 false).
{
"bool" : {
"must" : {
"term" : { "user" : "kimchy" }
},
"must_not" : {
"range" : {
"age" : { "from" : 10, "to" : 20 }
}
},
"should" : [
{
"term" : { "tag" : "wow" }
},
{
"term" : { "tag" : "elasticsearch" }
}
],
"minimum_number_should_match" : 1,
"boost" : 1.0
}
}
Boosting Query
The boosting query can be used to effectively demote results that match a given query. Unlike the 「NOT」 clause in bool query, this still selects documents that contain undesirable terms, but reduces their overall score.
{
"boosting" : {
"positive" : {
"term" : {
"field1" : "value1"
}
},
"negative" : {
"term" : {
"field2" : "value2"
}
},
"negative_boost" : 0.2
}
}
Ids Query
Filters documents to only those with the provided ids. Note, this filter does not require the _id field to be indexed, since it works using the _uid field.
{
"ids" : {
"type" : "my_type"
"values" : ["1", "4", "100"]
}
}
The type is optional and can be omitted; it can also accept an array of type values.
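For instance, omitting the type, or passing several types (the type names are illustrative):

```json
{
    "ids" : {
        "type" : ["my_type", "my_other_type"],
        "values" : ["1", "4", "100"]
    }
}
```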
Custom Score Query
The custom_score query wraps another query and customizes its scoring, using a script expression to compute the score from (numeric) values in the document. Here is a simple example:
"custom_score" : {
"query" : {
....
},
"script" : "_score * doc['my_numeric_field'].value"
}
In addition to document fields, the script can use the _score parameter to access the score of the wrapped query.
Script Parameters
Scripts are cached for faster execution. If the script has parameters it needs to take into account, the preferred way is to reuse the same script and provide the parameters to it:
"custom_score" : {
"query" : {
....
},
"params" : {
"param1" : 2,
"param2" : 3.1
},
"script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
}
Constant Score Query
A query that wraps a filter or another query and simply returns a constant score equal to the query boost for every document in the filter. Maps to Lucene ConstantScoreQuery.
{
"constant_score" : {
"filter" : {
"term" : { "user" : "kimchy"}
},
"boost" : 1.2
}
}
The filter object can hold only filter elements, not queries. Filters can be much faster compared to queries since they don’t perform any scoring, especially when they are cached.
A query can also be wrapped in a constant_score query:
{
"constant_score" : {
"query" : {
"term" : { "user" : "kimchy"}
},
"boost" : 1.2
}
}
Dis Max Query
A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.
This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost, not the sum of the field scores (as Boolean Query would give). If the query is 「albino elephant」 this ensures that 「albino」 matching one field and 「elephant」 matching another gets a higher score than 「albino」 matching both fields. To get this result, use both Boolean Query and DisjunctionMax Query: for each term a DisjunctionMaxQuery searches for it in each field, while the set of these DisjunctionMaxQuery’s is combined into a BooleanQuery.
The tie breaker capability allows results that include the same term in multiple fields to be judged better than results that include this term in only the best of those multiple fields, without confusing this with the better case of two different terms in the multiple fields.The default tie_breaker is 0.0.
This query maps to Lucene DisjunctionMaxQuery.
{
"dis_max" : {
"tie_breaker" : 0.7,
"boost" : 1.2,
"queries" : [
{
"term" : { "age" : 34 }
},
{
"term" : { "age" : 35 }
}
]
}
}
Field Query
A query that executes a query string against a specific field. It is a simplified version of the query_string query (it sets default_field to the field this query is executed against). In its simplest form:
{
"field" : {
"name.first" : "+something -else"
}
}
Most of the query_string parameters are allowed with the field query as well; in that case, the query should be formatted as follows:
{
"field" : {
"name.first" : {
"query" : "+something -else",
"boost" : 2.0,
"enable_position_increments": false
}
}
}
Filtered Query
Corresponds to Lucene's FilteredQuery; it applies a filter to the results of a query.
{
"filtered" : {
"query" : {
"term" : { "tag" : "wow" }
},
"filter" : {
"range" : {
"age" : { "from" : 10, "to" : 20 }
}
}
}
}
The filter object in this DSL can hold only filter elements, not queries. Filters can be much faster than queries, since they do not perform any scoring, especially when their results are cached.
Fuzzy Like This Query
The fuzzy_like_this query finds documents that are "like" the provided text by running it against one or more fields.
{
"fuzzy_like_this" : {
"fields" : ["name.first", "name.last"],
"like_text" : "text like this one",
"max_query_terms" : 12
}
}
fuzzy_like_this can be shortened to flt.
The fuzzy_like_this top level parameters include:
Parameter Description
fields A list of the fields to run the more like this query against. Defaults to the _all field.
like_text The text to find documents like it, required.
ignore_tf Should term frequency be ignored. Defaults to false.
max_query_terms The maximum number of query terms that will be included in any generated query. Defaults to 25.
min_similarity The minimum similarity of the term variants. Defaults to 0.5.
prefix_length Length of required common prefix on variant terms. Defaults to 0.
boost Sets the boost value of the query. Defaults to 1.0.
analyzer The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.
How it Works
Fuzzifies ALL terms provided as strings and then picks the best n differentiating terms. In effect this mixes the behaviour of FuzzyQuery and MoreLikeThis, but with special consideration of fuzzy scoring factors. This generally produces good results for queries where users may provide details in a number of fields, have no knowledge of boolean query syntax, and also want a degree of fuzzy matching and a fast query.
For each source term the fuzzy variants are held in a BooleanQuery with no coord factor (because we are not looking for matches on multiple variants in any one doc). Additionally, a specialized TermQuery is used for variants and does not use that variant term’s IDF because this would favour rarer terms eg misspellings. Instead, all variants use the same IDF ranking (the one for the source query term) and this is factored into the variant’s boost. If the source query term does not exist in the index the average IDF of the variants is used.
Fuzzy Like This Field Query
The fuzzy_like_this_field query is the same as the fuzzy_like_this query, except that it runs against a single field. It provides nicer query DSL over the generic fuzzy_like_this query, and supports typed fields (automatically wrapping a typed field with a type filter to match only on the specific type).
{
"fuzzy_like_this_field" : {
"name.first" : {
"like_text" : "text like this one",
"max_query_terms" : 12
}
}
}
fuzzy_like_this_field can be shortened to flt_field.
The fuzzy_like_this_field top level parameters include:
Parameter Description
like_text The text to find documents like it, required.
ignore_tf Should term frequency be ignored. Defaults to false.
max_query_terms The maximum number of query terms that will be included in any generated query. Defaults to 25.
min_similarity The minimum similarity of the term variants. Defaults to 0.5.
prefix_length Length of required common prefix on variant terms. Defaults to 0.
boost Sets the boost value of the query. Defaults to 1.0.
analyzer The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.
Fuzzy Query
A fuzzy query that uses similarity based on the Levenshtein (edit distance) algorithm.
Warning: this query is not very scalable with its default prefix length of 0: in that case, every term will be enumerated and an edit score calculated for it, unless max_expansions is set.
Here is a simple example:
{
"fuzzy" : { "user" : "ki" }
}
More complex settings can be set (the values here are the default values):
{
"fuzzy" : {
"user" : {
"value" : "ki",
"boost" : 1.0,
"min_similarity" : 0.5,
"prefix_length" : 0
}
}
}
The max_expansions parameter (unbounded by default) controls the number of terms the fuzzy query will expand to.
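For example, bounding the expansion might look like this sketch (the values are illustrative):

```json
{
    "fuzzy" : {
        "user" : {
            "value" : "ki",
            "min_similarity" : 0.5,
            "max_expansions" : 50
        }
    }
}
```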
Numeric / Date Fuzzy
A fuzzy query on a numeric field will result in a range query "around" the value, using the min_similarity value. For example:
{
"fuzzy" : {
"price" : {
"value" : 12,
"min_similarity" : 2
}
}
}
This will result in a range query between 10 and 14 (12 ± 2). The same applies to dates, with support for a time format for the min_similarity field:
{
"fuzzy" : {
"created" : {
"value" : "2010-02-05T12:05:07",
"min_similarity" : "1d"
}
}
}
In the mapping, numeric and date types allow configuring a fuzzy_factor value (defaults to 1), which the fuzzy value is multiplied by when used in a query_string type query. For example, for dates, a fuzzy factor of "1d" will multiply whatever fuzzy value is provided in min_similarity by it. Note, this is explicitly supported since the query_string query only allows similarity values between 0.0 and 1.0.
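A mapping sketch for the fuzzy_factor setting described above (the type and field names are assumptions for illustration):

```json
{
    "my_type" : {
        "properties" : {
            "created" : {
                "type" : "date",
                "fuzzy_factor" : "1d"
            }
        }
    }
}
```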
Has Child Query
The has_child query merely wraps a has_child filter into a constant_score query. Its syntax is the same as the has_child filter:
{
"has_child" : {
"type" : "blog_tag"
"query" : {
"term" : {
"tag" : "something"
}
}
}
}
Scope
A _scope can be defined on the filter allowing to run facets on the same scope name that will work against the child documents. For example:
{
"has_child" : {
"_scope" : "my_scope",
"type" : "blog_tag"
"query" : {
"term" : {
"tag" : "something"
}
}
}
}
Memory Considerations
With the current implementation, all _id values are loaded into memory (heap) to support fast lookups, so make sure there is enough memory to hold them.
Match All Query
A query that matches all documents. Maps to Lucene MatchAllDocsQuery.
{
"match_all" : { }
}
Which can also have boost associated with it:
{
"match_all" : { "boost" : 1.2 }
}
Index Time Boost
When indexing, a boost value can either be associated on the document level, or per field. The match all query does not take boosting into account by default. In order to take boosting into account, the norms_field needs to be provided in order to explicitly specify which field the boosting will be done on (Note, this will result in slower execution time). For example:
{
"match_all" : { "norms_field" : "my_field" }
}
More Like This Query
The more_like_this query finds documents that are "like" the provided text by running it against one or more fields.
{
"more_like_this" : {
"fields" : ["name.first", "name.last"],
"like_text" : "text like this one",
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
more_like_this can be shortened to mlt.
The more_like_this top level parameters include:
Parameter Description
fields A list of the fields to run the more like this query against. Defaults to the _all field.
like_text The text to find documents like it, required.
percent_terms_to_match The percentage of terms to match on (float value). Defaults to 0.3 (30 percent).
min_term_freq The frequency below which terms will be ignored in the source doc. The default frequency is 2.
max_query_terms The maximum number of query terms that will be included in any generated query. Defaults to 25.
stop_words An array of stop words. Any word in this set is considered 「uninteresting」 and ignored. Even if your Analyzer allows stopwords, you might want to tell the MoreLikeThis code to ignore them, as for the purposes of document similarity it seems reasonable to assume that 「a stop word is never interesting」.
min_doc_freq Words will be ignored if they do not occur in at least this many docs. Defaults to 5.
max_doc_freq The maximum frequency in which words may still appear. Words that appear in more than this many docs will be ignored. Defaults to unbounded.
min_word_len The minimum word length below which words will be ignored. Defaults to 0.
max_word_len The maximum word length above which words will be ignored. Defaults to unbounded (0).
boost_terms Sets the boost factor to use when boosting terms. Defaults to 1.
boost Sets the boost value of the query. Defaults to 1.0.
analyzer The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.
More Like This Field Query
The more_like_this_field query is the same as the more_like_this query, except that it runs against a single field. It provides nicer query DSL over the generic more_like_this query, and supports typed fields (automatically wrapping a typed field with a type filter to match only on the specific type).
{
"more_like_this_field" : {
"name.first" : {
"like_text" : "text like this one",
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}
more_like_this_field can be shortened to mlt_field.
The more_like_this_field top level parameters include:
Parameter Description
like_text The text to find documents like it, required.
percent_terms_to_match The percentage of terms to match on (float value). Defaults to 0.3 (30 percent).
min_term_freq The frequency below which terms will be ignored in the source doc. The default frequency is 2.
max_query_terms The maximum number of query terms that will be included in any generated query. Defaults to 25.
stop_words An array of stop words. Any word in this set is considered 「uninteresting」 and ignored. Even if your Analyzer allows stopwords, you might want to tell the MoreLikeThis code to ignore them, as for the purposes of document similarity it seems reasonable to assume that 「a stop word is never interesting」.
min_doc_freq Words will be ignored if they do not occur in at least this many docs. Defaults to 5.
max_doc_freq The maximum frequency in which words may still appear. Words that appear in more than this many docs will be ignored. Defaults to unbounded.
min_word_len The minimum word length below which words will be ignored. Defaults to 0.
max_word_len The maximum word length above which words will be ignored. Defaults to unbounded (0).
boost_terms Sets the boost factor to use when boosting terms. Defaults to 1.
boost Sets the boost value of the query. Defaults to 1.0.
analyzer The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.
Prefix Query
Matches documents that have fields containing terms with a specified prefix (not analyzed). The prefix query maps to Lucene PrefixQuery. The following matches documents where the user field contains a term that starts with ki:
{
"prefix" : { "user" : "ki" }
}
A boost can also be associated with the query:
{
"prefix" : { "user" : { "value" : "ki", "boost" : 2.0 } }
}
Or :
{
"prefix" : { "user" : { "prefix" : "ki", "boost" : 2.0 } }
}
This multi-term query allows controlling how it gets rewritten using the rewrite parameter.
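For example, forcing a boolean rewrite with constant scoring might look like this sketch (the rewrite value shown is assumed to be one of the supported options):

```json
{
    "prefix" : {
        "user" : {
            "value" : "ki",
            "rewrite" : "constant_score_boolean"
        }
    }
}
```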
Query String Query
A query that uses a query parser to parse its content. Here is an example:
{
"query_string" : {
"default_field" : "content",
"query" : "this AND that OR thus"
}
}
The query_string top level parameters include:
Parameter Description
query The actual query to be parsed.
default_field The default field for query terms if no prefix field is specified. Defaults to the _all field.
default_operator The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR.
analyzer The analyzer name used to analyze the query string.
allow_leading_wildcard When set, * or ? are allowed as the first character. Defaults to true.
lowercase_expanded_terms Whether terms of wildcard, prefix, fuzzy, and range queries are automatically lower-cased or not (since they are not analyzed). Defaults to true.
enable_position_increments Set to true to enable position increments in result queries. Defaults to true.
fuzzy_prefix_length Set the prefix length for fuzzy queries. Default is 0.
fuzzy_min_sim Set the minimum similarity for fuzzy queries. Defaults to 0.5
phrase_slop Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is 0.
boost Sets the boost value of the query. Defaults to 1.0.
analyze_wildcard By default, wildcards terms in a query string are not analyzed. By setting this value to true, a best effort will be made to analyze those as well.
auto_generate_phrase_queries Defaults to false.
minimum_should_match A percent value (for example 20%) controlling how many 「should」 clauses in the resulting boolean query should match.
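For instance, requiring a fifth of the optional clauses to match might look like this sketch (the field name and values are illustrative):

```json
{
    "query_string" : {
        "default_field" : "content",
        "query" : "this that thus",
        "minimum_should_match" : "20%"
    }
}
```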
When a multi term query is being generated, one can control how it gets rewritten using the rewrite parameter.
Multi Field
The query_string query can also run against multiple fields. This works by internally creating several queries for the same query string, one per provided field (each with its default_field set to that field). Since several queries are generated, they can be combined automatically using either a dis_max query or a simple bool query. For example (the name field is boosted by 5 using the ^5 notation):
{
"query_string" : {
"fields" : ["content", "name^5"],
"query" : "this AND that OR thus",
"use_dis_max" : true
}
}
When running the query_string query against multiple fields, the following additional parameters are allowed:
Parameter Description
use_dis_max Should the queries be combined using dis_max (set it to true), or a bool query (set it to false). Defaults to true.
tie_breaker When using dis_max, the disjunction max tie breaker. Defaults to 0.
The fields parameter can also include pattern based field names, allowing to automatically expand to the relevant fields (dynamically introduced fields included). For example:
{
"query_string" : {
"fields" : ["content", "name.*^5"],
"query" : "this AND that OR thus",
"use_dis_max" : true
}
}
Syntax Extension
There are several syntax extensions to the Lucene query language.
missing / exists
The _exists_ and _missing_ syntax allows matching docs where a given field exists (has a value) or is missing. The syntax is _exists_:field1 or _missing_:field1, and it can be used anywhere a query string is used.
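For example, reusing the name.first and name.last fields from the earlier examples:

```json
{
    "query_string" : {
        "query" : "_exists_:name.first AND _missing_:name.last"
    }
}
```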
Range Query
Matches documents with fields that have terms within a certain range. The type of the Lucene query depends on the field type: for string fields it is a TermRangeQuery, while for number/date fields it is a NumericRangeQuery. The following example returns all documents where age is between 10 and 20:
{
"range" : {
"age" : {
"from" : 10,
"to" : 20,
"include_lower" : true,
"include_upper": false,
"boost" : 2.0
}
}
}
The range query top level parameters include:
Name Description
from The lower bound. Defaults to unbounded.
to The upper bound. Defaults to unbounded.
include_lower Should the from bound (if set) be inclusive or not. Defaults to true.
include_upper Should the to bound (if set) be inclusive or not. Defaults to true.
gt Same as setting from to the value, and include_lower to false.
gte Same as setting from to the value,and include_lower to true.
lt Same as setting to to the value, and include_upper to false.
lte Same as setting to to the value, and include_upper to true.
boost Sets the boost value of the query. Defaults to 1.0.
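The gt/gte/lt/lte shorthands let the earlier example be written more compactly with the same bounds (lower inclusive, upper exclusive):

```json
{
    "range" : {
        "age" : {
            "gte" : 10,
            "lt" : 20,
            "boost" : 2.0
        }
    }
}
```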
Span First Query
Matches spans near the beginning of a field. The span first query maps to Lucene SpanFirstQuery. Here is an example:
{
"span_first" : {
"match" : {
"span_term" : { "user" : "kimchy" }
},
"end" : 3
}
}
The match clause can be any other span type query. The end controls the maximum end position permitted in a match.
Span Near Query
Matches spans which are near one another. One can specify slop, the maximum number of intervening unmatched positions, as well as whether matches are required to be in order. The span near query maps to Lucene SpanNearQuery. Here is an example:
{
"span_near" : {
"clauses" : [
{ "span_term" : { "field" : "value1" } },
{ "span_term" : { "field" : "value2" } },
{ "span_term" : { "field" : "value3" } }
],
"slop" : 12,
"in_order" : false,
"collect_payloads" : false
}
}
The clauses element is a list of one or more other span type queries and the slop controls the maximum number of intervening unmatched positions permitted.
Span Not Query
Removes matches which overlap with another span query. The span not query maps to Lucene SpanNotQuery. Here is an example:
{
"span_not" : {
"include" : {
"span_term" : { "field1" : "value1" }
},
"exclude" : {
"span_term" : { "field2" : "value2" }
}
}
}
The include and exclude clauses can be any span type query. The include clause is the span query whose matches are filtered, and the exclude clause is the span query whose matches must not overlap those returned.
Span Or Query
Matches the union of its span clauses. The span or query maps to Lucene SpanOrQuery. Here is an example:
{
"span_or" : {
"clauses" : [
{ "span_term" : { "field" : "value1" } },
{ "span_term" : { "field" : "value2" } },
{ "span_term" : { "field" : "value3" } }
]
}
}
The clauses element is a list of one or more other span type queries.
Span Term Query
Matches spans containing a term. The span term query maps to Lucene SpanTermQuery. Here is an example:
{
"span_term" : { "user" : "kimchy" }
}
A boost can also be associated with the query:
{
"span_term" : { "user" : { "value" : "kimchy", "boost" : 2.0 } }
}
Or :
{
"span_term" : { "user" : { "term" : "kimchy", "boost" : 2.0 } }
}
Top Children Query
The top_children query runs the child query with an estimated hits size and, out of the hit docs, aggregates them into parent docs. If there aren't enough parent docs matching the requested from/size of the search request, it is run again with a wider (more hits) search.
The top_children query also provides scoring capabilities, with the ability to specify max, sum or avg as the score type.
One downside of using the top_children is that if there are more child docs matching the required hits when executing the child query, then the total_hits result of the search response will be incorrect.
How many hits are asked for in the first child query run is controlled using the factor parameter (defaults to 5). For example, when asking for 10 docs with from 0, the child query will execute expecting 50 hits. If not enough parents are found (in our example, 10), and there are still more child docs to query, then the requested hits are expanded by multiplying by the incremental_factor (defaults to 2).
The required parameters are the query and type (the child type to execute the query on). Here is an example with all different parameters, including the default values:
{
"top_children" : {
"type": "blog_tag",
"query" : {
"term" : {
"tag" : "something"
}
},
"score" : "max",
"factor" : 5,
"incremental_factor" : 2
}
}
Scope
A _scope can be defined on the query allowing to run facets on the same scope name that will work against the child documents. For example:
{
"top_children" : {
"_scope" : "my_scope",
"type": "blog_tag",
"query" : {
"term" : {
"tag" : "something"
}
}
}
}
Memory Considerations
With the current implementation, all _id values are loaded to memory (heap) in order to support fast lookups, so make sure there is enough mem for it.
Wildcard Query
Matches documents that have fields matching a wildcard expression (not analyzed). Supported wildcards are *, which matches any character sequence (including the empty one), and ?, which matches any single character. Note this query can be slow, as it needs to iterate over many terms. In order to prevent extremely slow wildcard queries, a wildcard term should not start with one of the wildcards * or ?. The wildcard query maps to Lucene WildcardQuery.
{
"wildcard" : { "user" : "ki*y" }
}
A boost can also be associated with the query:
{
"wildcard" : { "user" : { "value" : "ki*y", "boost" : 2.0 } }
}
Or :
{
"wildcard" : { "user" : { "wildcard" : "ki*y", "boost" : 2.0 } }
}
This multi-term query allows controlling how it gets rewritten using the rewrite parameter.
Nested Query
The nested query allows querying nested objects / docs (see nested mapping). The query is executed against the nested objects / docs as if they were indexed as separate docs (internally, they are) and results in the root parent doc (or parent nested mapping). Here is a sample mapping we will work with:
{
"type1" : {
"properties" : {
"obj1" : {
"type" : "nested"
}
}
}
}
And here is a sample nested query usage:
{
"nested" : {
"path" : "obj1",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"text" : {"obj1.name" : "blue"}
},
{
"range" : {"obj1.count" : {"gt" : 5}}
}
]
}
}
}
}
The path points to the nested object path, and the query (or filter) holds the query to run on the nested docs matching that path, joined back to the root parent docs.
The score_mode allows setting how matching inner children affect the scoring of the parent. It defaults to avg, but can be total, max or none.
Multi level nesting is automatically supported, and detected, resulting in an inner nested query to automatically match the relevant nesting level (and not root) if it exists within another nested query.
Custom Filters Score Query
The custom_filters_score query allows executing a query and, for hits matching a provided filter (in order), using either a boost or a script associated with that filter to compute the score. Here is an example:
{
"custom_filters_score" : {
"query" : {
"match_all" : {}
},
"filters" : [
{
"filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } },
"boost" : "3"
},
{
"filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } },
"boost" : "2"
}
]
}
}
This can considerably simplify and speed up parameterized scoring, since filters are easily cached for faster performance, and a per-filter boost or script is considerably simpler than a full custom scoring query.
Score Mode
A score_mode can be defined to control how multiple matching filters affect the score. By default it is set to first, meaning the first matching filter controls the score of the result. It can also be set to max/total/avg, which aggregates the result from all matching filters using that aggregation type.
Script
A script can be used instead of boost for more complex score calculations, with optional params and lang elements (at the same level as query and filters).
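A sketch of the script variant, reusing the age filter from the example above (the my_numeric_field name, the factor parameter, and the mvel lang are assumptions for illustration):

```json
{
    "custom_filters_score" : {
        "query" : {
            "match_all" : {}
        },
        "filters" : [
            {
                "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } },
                "script" : "_score * doc['my_numeric_field'].value * factor"
            }
        ],
        "params" : {
            "factor" : 2
        },
        "lang" : "mvel"
    }
}
```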