The analyzer parameter specifies the tokenizer (more accurately, the analyzer) for a field, and it applies to both indexing and querying. The following example configures the ik analyzers:
PUT http://192.168.20.46:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_smart",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}

POST http://192.168.20.46:9200/my_index/my_type/1
{
  "content": "我是中國人,我愛個人祖國"
}

POST http://192.168.20.46:9200/my_index/_search?pretty
{
  "query": {
    "match": { "content": "祖國" }
  }
}
The normalizer parameter configures normalization that is applied before indexing, for example lowercasing all characters. Example:
PUT http://node1:9200/my_index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  },
  "mappings": {
    "my_data": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

POST http://node1:9200/my_index/my_data/1
{
  "foo": "Zhangsan"
}

POST http://node1:9200/my_index/_search
{
  "query": {
    "match": { "foo": "ZHANGSAN" }
  }
}
A more detailed explanation: http://www.javashuo.com/article/p-elsqybdx-ew.html
The boost parameter sets a field's weight. For example, to make a keyword that appears in the title field weigh twice as much as one that appears in the content field, define the mapping as follows (content keeps the default boost of 1):
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title":   { "type": "text", "boost": 2 },
        "content": { "type": "text" }
      }
    }
  }
}
The same weighting can also be applied at query time:
POST http://node1:9200/my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown fox",
        "boost": 2
      }
    }
  }
}
Specifying boost at query time is the recommended approach: a boost hard-coded in the mapping cannot be changed without reindexing the documents, whereas a query-time boost achieves the same effect and can be adjusted freely.
The coerce parameter cleans up dirty data; it defaults to true. The integer 5 might be submitted as the string "5" or as the floating-point number 5.0, and coerce controls whether such values are converted to the declared type:
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_data": {
      "properties": {
        "number_one": { "type": "integer" },
        "number_two": { "type": "integer", "coerce": false }
      }
    }
  }
}

POST http://node1:9200/my_index/my_data/1
{
  "number_one": "10"
}

POST http://node1:9200/my_index/my_data/2
{
  "number_two": "10"
}
The mapping declares number_one as an integer; although the submitted value is a string, the first document is still indexed successfully. The number_two field has coerce disabled, so the second document is rejected.
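Coercion can also be switched off for a whole index through the index.mapping.coerce setting, with field-level coerce still able to override it. A minimal sketch under that assumption (the index name coerce_test is illustrative):

PUT http://node1:9200/coerce_test
{
  "settings": {
    "index.mapping.coerce": false
  },
  "mappings": {
    "my_data": {
      "properties": {
        "number_one": { "type": "integer", "coerce": true },
        "number_two": { "type": "integer" }
      }
    }
  }
}

Here "10" would still be accepted for number_one because the field-level setting wins, while number_two inherits the index-wide coerce of false and rejects it.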
The copy_to parameter builds a custom _all-style field; in other words, several fields can be merged into one combined field. For example, first_name and last_name can be copied into a full_name field.
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_data": {
      "properties": {
        "first_name": { "type": "text", "copy_to": "full_name" },
        "last_name":  { "type": "text", "copy_to": "full_name" },
        "full_name":  { "type": "text" }
      }
    }
  }
}

POST http://node1:9200/my_index/my_data/1
{
  "first_name": "John",
  "last_name": "Smith"
}

POST http://node1:9200/my_index/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}
doc_values speeds up sorting and aggregations: while the inverted index is being built, an additional column-oriented store is written alongside it, trading disk space for query time. It is enabled by default and can be disabled on fields that will definitely never be sorted or aggregated on.
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": { "type": "keyword" },
        "session_id":  { "type": "keyword", "doc_values": false }
      }
    }
  }
}
Note: the text type does not support doc_values.
The dynamic parameter controls how newly detected fields are handled. It accepts three values:

Value | Behaviour |
---|---|
true | (default) newly detected fields are added to the mapping |
false | newly detected fields are ignored; they remain in _source but are not indexed and cannot be searched |
strict | an exception is thrown and the document is rejected when an unknown field is encountered |

Example:
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_data": {
      "dynamic": false,
      "properties": {
        "user": {
          "properties": {
            "name": { "type": "text" },
            "social_networks": {
              "dynamic": true,
              "properties": {}
            }
          }
        }
      }
    }
  }
}
PS: strict is not a boolean, so the value must be quoted.
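As a minimal sketch (the index, type, and field names here are illustrative), a mapping with dynamic set to strict rejects any document that contains a field not declared in the mapping:

PUT http://node1:9200/strict_index
{
  "mappings": {
    "my_type": {
      "dynamic": "strict",
      "properties": {
        "name": { "type": "text" }
      }
    }
  }
}

POST http://node1:9200/strict_index/my_type/1   --> strict_dynamic_mapping_exception
{
  "name": "John",
  "unknown_field": "whatever"
}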
Elasticsearch indexes all fields by default. For a field with enabled set to false, Elasticsearch skips the content entirely: the field can only be retrieved from _source and cannot be searched. Such a field may also hold values of any type.
PUT http://node1:9200/my_index
{
  "mappings": {
    "session": {
      "properties": {
        "user_id":      { "type": "keyword" },
        "last_updated": { "type": "date" },
        "session_data": { "enabled": false }
      }
    }
  }
}

POST http://node1:9200/my_index/session/session_1
{
  "user_id": "kimchy",
  "session_data": {
    "arbitrary_object": {
      "some_array": [ "foo", "bar", { "baz": 2 } ]
    }
  },
  "last_updated": "2015-12-06T18:20:22"
}

POST http://node1:9200/my_index/session/session_2
{
  "user_id": "jpountz",
  "session_data": "none",
  "last_updated": "2015-12-06T18:22:13"
}
Search answers the question "which documents contain this term?"; aggregations answer the opposite question, "which terms does this document contain?". Most fields generate doc_values at index time for this purpose, but text fields do not support doc_values.
Instead, text fields rely on a query-time structure called fielddata, which is built the first time the field is used for aggregation, sorting, or scripting. Elasticsearch builds it by reading the inverted index from disk, inverting the term-document relation, and holding the result in the Java heap.
fielddata is disabled on text fields by default because enabling it consumes a great deal of memory. Before you turn it on, think hard about why you need to aggregate, sort, or run scripts on a text field; in most cases it is not what you actually want.
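If you do decide you need it, fielddata is enabled per field in the mapping; a minimal sketch (the field name my_field is illustrative):

PUT http://node1:9200/my_index/_mapping/my_type
{
  "properties": {
    "my_field": {
      "type": "text",
      "fielddata": true
    }
  }
}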
"New York" is analyzed into the terms "new" and "york", so aggregating on the text field yields two buckets, "new" and "york", when what you probably want is a single "New York" bucket. The usual fix is to add an un-analyzed keyword sub-field:
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword" }
          }
        }
      }
    }
  }
}
With this mapping, my_field is used for full-text search and my_field.keyword for aggregations, sorting, and scripts.
The format parameter specifies the date format for date fields:
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd"
        }
      }
    }
  }
}
More built-in date formats: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html
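Several formats can be combined with ||, in which case each incoming value is tried against the formats from left to right. A minimal sketch (the created_at field name is illustrative):

PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "created_at": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}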
ignore_above sets the maximum length for values that will be indexed or stored; values longer than the limit are ignored:
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 15
        }
      }
    }
  }
}

POST http://node1:9200/my_index/my_type/1
{
  "message": "Syntax error"
}

POST http://node1:9200/my_index/my_type/2
{
  "message": "Syntax error with some long stacktrace"
}

POST http://node1:9200/my_index/_search
{
  "size": 0,
  "aggs": {
    "messages": {
      "terms": { "field": "message" }
    }
  }
}
The mapping limits the message field to 15 characters. The value in the first document is shorter than 15 characters and is indexed; the value in the second exceeds the limit and is not, so the terms aggregation returns only "Syntax error":
{
  "took": 50,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 2, "max_score": 0, "hits": [] },
  "aggregations": {
    "messages": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "Syntax error", "doc_count": 1 }
      ]
    }
  }
}
ignore_malformed ignores malformed values. Take a login field: some people may submit a date, others an email address. Normally, indexing a value of the wrong type throws an exception and the whole document fails to index. With ignore_malformed set to true the exception is swallowed: the offending field is not indexed, but the rest of the document is indexed normally.
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": { "type": "integer", "ignore_malformed": true },
        "number_two": { "type": "integer" }
      }
    }
  }
}

POST http://node1:9200/my_index/my_type/1
{
  "text": "Some text value",
  "number_one": "foo"
}

POST http://node1:9200/my_index/my_type/2
{
  "text": "Some text value",
  "number_one": 123
}

POST http://node1:9200/my_index/my_type/3   --> error
{
  "text": "Some text value",
  "number_two": "abc"
}

GET http://node1:9200/my_index/_search

{
  "took": 21,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 1,
        "_source": { "text": "Some text value", "number_one": 123 }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": { "text": "Some text value", "number_one": "foo" }
      }
    ]
  }
}
In the example above, number_one accepts integers and has ignore_malformed enabled, so document 1 is written and indexed successfully even though the value is a string; number_two also accepts integers but keeps the default ignore_malformed of false, so document 3 fails.
include_in_all controls whether a field is included in the _all field. It defaults to true, except when index is set to no.
In the example below, title and content are included in _all while date is not:
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title":   { "type": "text" },
        "content": { "type": "text" },
        "date": {
          "type": "text",
          "include_in_all": false
        }
      }
    }
  }
}
include_in_all can also be set at the type or object level. Below, all fields of my_type are excluded from _all by default; author.first_name, author.last_name, and editor.last_name are included:
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "include_in_all": false,
      "properties": {
        "title": { "type": "text" },
        "author": {
          "include_in_all": true,
          "properties": {
            "first_name": { "type": "text" },
            "last_name":  { "type": "text" }
          }
        },
        "editor": {
          "properties": {
            "first_name": { "type": "text" },
            "last_name":  { "type": "text", "include_in_all": true }
          }
        }
      }
    }
  }
}
The index parameter controls whether a field is indexed; a field that is not indexed cannot be searched. It accepts true or false.
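A minimal sketch (the index and field names are illustrative): a field with index disabled is still returned as part of _source, but queries against it are rejected.

PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "internal_notes": {
          "type": "keyword",
          "index": false
        }
      }
    }
  }
}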
index_options controls which information is written to the inverted index at index time. It accepts the following values (a short mapping sketch follows the table):
Value | What is stored |
---|---|
docs | document numbers only |
freqs | document numbers and term frequencies |
positions | document numbers, term frequencies, and term positions; positions enable proximity and phrase queries |
offsets | document numbers, term frequencies, term positions, and the start/end character offsets of each term; offsets enable the postings highlighter |
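A minimal sketch of setting index_options in a mapping (the field name is illustrative):

PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "index_options": "offsets"
        }
      }
    }
  }
}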
The fields parameter lets the same text be indexed in several different ways. For example, a string field can be indexed as text for full-text search and as keyword for aggregations and sorting.
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        }
      }
    }
  }
}

POST http://node1:9200/my_index/my_type/1
{
  "city": "New York"
}

POST http://node1:9200/my_index/my_type/2
{
  "city": "York"
}

POST http://node1:9200/my_index/_search
{
  "query": {
    "match": { "city": "york" }
  },
  "sort": { "city.raw": "asc" },
  "aggs": {
    "cities": {
      "terms": { "field": "city.raw" }
    }
  }
}

{
  "took": 141,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": null,
        "_source": { "city": "New York" },
        "sort": [ "New York" ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": null,
        "_source": { "city": "York" },
        "sort": [ "York" ]
      }
    ]
  },
  "aggregations": {
    "cities": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "New York", "doc_count": 1 },
        { "key": "York", "doc_count": 1 }
      ]
    }
  }
}
The norms parameter stores normalization factors that are used to compute a document's relevance score at query time. Norms are useful for scoring but consume a fair amount of disk space, so disable them on fields that do not need to be scored.
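A minimal sketch of disabling norms on a field (the tags field name is illustrative):

PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "tags": {
          "type": "text",
          "norms": false
        }
      }
    }
  }
}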
A field whose value is null is not indexed and cannot be searched. The null_value parameter substitutes an explicit value for null so that the field becomes indexable and searchable. Example:
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": {
          "type": "keyword",
          "null_value": "NULL"
        }
      }
    }
  }
}

POST http://node1:9200/my_index/my_type/1
{
  "status_code": null
}

POST http://node1:9200/my_index/my_type/2
{
  "status_code": []
}

POST http://node1:9200/my_index/_search
{
  "query": {
    "term": { "status_code": "NULL" }
  }
}
Document 1 is returned because its status_code is null; document 2 is not, because its status_code is an empty array, which is not the same as null.
To support proximity and phrase queries, position information for each term is recorded when a text field is analyzed. Consider a field whose value is an array:
"names": [ "John Abraham", "Lincoln Smith"]
To keep the two values apart, a position gap is inserted between the last term of one value (Abraham) and the first term of the next (Lincoln); the default gap is 100. With this default, a phrase query for "Abraham Lincoln" finds nothing:
POST http://node1:9200/my_index/groups/1
{
  "names": [ "John Abraham", "Lincoln Smith" ]
}

// no match
POST http://node1:9200/my_index/groups/_search
{
  "query": {
    "match_phrase": {
      "names": {
        "query": "Abraham Lincoln"
      }
    }
  }
}
Specifying a slop greater than 100 makes the phrase query match:
// match
POST http://node1:9200/my_index/groups/_search
{
  "query": {
    "match_phrase": {
      "names": {
        "query": "Abraham Lincoln",
        "slop": 101
      }
    }
  }
}
The gap itself is configured in the mapping with the position_increment_gap parameter:
PUT http://node1:9200/my_index
{
  "mappings": {
    "groups": {
      "properties": {
        "names": {
          "type": "text",
          "position_increment_gap": 0
        }
      }
    }
  }
}

POST http://node1:9200/my_index/groups/1
{
  "names": [ "John Abraham", "Lincoln Smith" ]
}

POST http://node1:9200/my_index/groups/_search
{
  "query": {
    "match_phrase": {
      "names": {
        "query": "Abraham Lincoln"
      }
    }
  }
}
With the gap set to 0, the phrase query now finds the document.
Object and nested fields declare their sub-fields with the properties parameter.
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "manager": {
          "properties": {
            "age":  { "type": "integer" },
            "name": { "type": "text" }
          }
        },
        "employees": {
          "type": "nested",
          "properties": {
            "age":  { "type": "integer" },
            "name": { "type": "text" }
          }
        }
      }
    }
  }
}

POST http://node1:9200/my_index/my_type/1
{
  "region": "US",
  "manager": {
    "name": "Alice White",
    "age": 30
  },
  "employees": [
    { "name": "John Smith", "age": 34 },
    { "name": "Peter Brown", "age": 26 }
  ]
}
You can then search and aggregate on manager.name, manager.age, employees.age, and so on. (Not fully verified; revisit later.)
POST http://node1:9200/my_index/_search
{
  "query": {
    "match": { "manager.name": "Alice White" }
  },
  "aggs": {
    "Employees": {
      "nested": { "path": "employees" },
      "aggs": {
        "Employee Ages": {
          "histogram": {
            "field": "employees.age",
            "interval": 5
          }
        }
      }
    }
  }
}
In most cases the same analyzer should be used at index time and at search time, so that the analyzed query terms match the terms stored in the index. Sometimes a different search analyzer is needed, for example when using the edge_ngram filter to implement autocomplete.
By default queries use the analyzer given by the analyzer property, but this can be overridden with search_analyzer. Example:
PUT http://node1:9200/my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "autocomplete_filter" ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}

POST http://node1:9200/my_index/my_type/1
{
  "text": "Quick Brown Fox"
}

POST http://node1:9200/my_index/_search
{
  "query": {
    "match": {
      "text": {
        "query": "Quick Br",
        "operator": "and"
      }
    }
  }
}
The similarity parameter selects the scoring model for a field; it accepts three values, demonstrated below:
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "default_field":     { "type": "text" },
        "classic_field":     { "type": "text", "similarity": "classic" },
        "boolean_sim_field": { "type": "text", "similarity": "boolean" }
      }
    }
  }
}
default_field uses the default BM25 model, classic_field uses the classic TF/IDF model, and boolean_sim_field uses the boolean model.
By default a field is indexed and searchable but not stored separately. That is usually fine, because the _source field keeps a copy of the whole original document. The store parameter becomes useful when, for example, a document has a title, a date, and a very large content field, and you only want to fetch the title and the date:
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title":   { "type": "text", "store": true },
        "date":    { "type": "date", "store": true },
        "content": { "type": "text" }
      }
    }
  }
}

POST http://node1:9200/my_index/my_type/1
{
  "title": "Some short title",
  "date": "2015-01-01",
  "content": "A very long content field..."
}

POST http://node1:9200/my_index/_search
{
  "stored_fields": [ "title", "date" ]
}

{
  "took": 12,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "fields": {
          "date": [ "2015-01-01T00:00:00.000Z" ],
          "title": [ "Some short title" ]
        }
      }
    ]
  }
}
Stored fields are always returned as arrays. If you need the original value, fetch it from _source instead.
A term vector records the following information produced when the text is analyzed: the terms themselves, the position (order) of each term, and the start and end character offsets that map each term back to the original string.
The term_vector parameter accepts the following values:
Value | Meaning |
---|---|
no | default; term vectors are not stored |
yes | only the terms are stored |
with_positions | terms and term positions are stored |
with_offsets | terms and character offsets are stored |
with_positions_offsets | terms, term positions, and character offsets are stored |
Example:
PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  }
}

POST http://node1:9200/my_index/my_type/1
{
  "text": "Quick brown fox"
}

POST http://node1:9200/my_index/_search
{
  "query": {
    "match": { "text": "brown fox" }
  },
  "highlight": {
    "fields": { "text": {} }
  }
}

{
  "took": 89,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.5063205,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.5063205,
        "_source": { "text": "Quick brown fox" },
        "highlight": {
          "text": [ "Quick <em>brown</em> <em>fox</em>" ]
        }
      }
    ]
  }
}