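Elasticsearch provides a rich set of parameters for defining document fields, such as the field's analyzer, boost, date format, and similarity model. The official documentation covers the definition and usage of each parameter: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping-params.html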
The analyzer applies at both index time and search time: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/analyzer.html
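To test the analyzer parameter, first install an analyzer plugin. Download the plugin version that matches your Elasticsearch release from https://github.com/medcl/elasticsearch-analysis-ik/releases (here, elasticsearch-analysis-ik-6.2.3.zip), copy it into the plugins folder of the Elasticsearch installation directory, unzip it, delete the zip file, and restart Elasticsearch (the plugin only takes effect after a restart).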
Create the index:
DELETE my_index
PUT my_index
Analyze a sample string with the ik_smart analyzer:
GET my_index/_analyze
{
  "analyzer": "ik_smart",
  "text": "安徽省長江流域"
}
Result:
{ "tokens": [ { "token": "安徽省", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0 }, { "token": "長江流域", "start_offset": 3, "end_offset": 7, "type": "CN_WORD", "position": 1 } ] }
Define the mapping and specify the analyzer for the field:
PUT my_index/fulltext/_mapping
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word"
    }
  }
}
Add documents:
PUT my_index/fulltext/1
{ "content": "軟件測試是很是複雜的工做" }
PUT my_index/fulltext/2
{ "content": "發改委表示,上半年審覈批准固定資產項目102個" }
PUT my_index/fulltext/3
{ "content": "全球最大資產管理公司貝萊德成立區塊鏈研究組" }
PUT my_index/fulltext/4
{ "content": "資本投資瘋狂,工業產能過剩" }
Search by keyword:
GET my_index/fulltext/_search
{
  "query": {
    "match": { "content": "資產" }
  }
}
Query result:
{ "took": 11, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": 0.5897495, "hits": [ { "_index": "my_index", "_type": "fulltext", "_id": "2", "_score": 0.5897495, "_source": { "content": "發改委表示,上半年審覈批准固定資產項目102個" } }, { "_index": "my_index", "_type": "fulltext", "_id": "3", "_score": 0.2876821, "_source": { "content": "全球最大資產管理公司貝萊德成立區塊鏈研究組" } } ] } }
The normalizer parameter standardizes keyword values before they are indexed, for example by converting all characters to lowercase.
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/normalizer.html
Define the mapping:
DELETE my_index
PUT my_index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "foo": { "type": "keyword", "normalizer": "my_normalizer" }
      }
    }
  }
}
Index documents, then search:
PUT my_index/my_type/1
{ "foo": "BÀR" }
PUT my_index/my_type/2
{ "foo": "bar" }
PUT my_index/my_type/3
{ "foo": "baz" }
POST my_index/_refresh
GET my_index/_search
{
  "query": {
    "match": { "foo": "BAR" }
  }
}
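Because the foo field is normalized at index time, "BÀR" is indexed as "bar" (the _source still holds the original "BÀR"). At search time the query term "BAR" is likewise converted to "bar" before matching, so documents 1 and 2 both match: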
{ "took": 7, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": 0.2876821, "hits": [ { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": 0.2876821, "_source": { "foo": "bar" } }, { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 0.2876821, "_source": { "foo": "BÀR" } } ] } }
A terms aggregation shows how many documents each indexed term of the foo field covers:
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "foo_terms": {
      "terms": { "field": "foo" }
    }
  }
}
The term "bar" is indexed for 2 documents and "baz" for 1:
{ "took": 14, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 3, "max_score": 0, "hits": [] }, "aggregations": { "foo_terms": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "bar", "doc_count": 2 }, { "key": "baz", "doc_count": 1 } ] } } }
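The effect of the lowercase and asciifolding filters on this data can be approximated in plain Python. This is only a rough sketch, assuming Latin text with accents; the `normalize` function is a hypothetical stand-in, not Elasticsearch's actual filter implementation.

```python
import unicodedata
from collections import Counter


def normalize(value: str) -> str:
    """Approximate a lowercase + asciifolding normalizer for accented Latin text."""
    lowered = value.lower()
    # NFKD splits an accented letter into the base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", lowered)
    # Drop the combining marks so only the base letters remain.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Same values as the three documents indexed above.
docs = {1: "BÀR", 2: "bar", 3: "baz"}

# What a terms aggregation on the normalized field would count.
indexed_terms = Counter(normalize(v) for v in docs.values())

# The query term goes through the same normalization before matching.
query_term = normalize("BAR")
matching_ids = sorted(i for i, v in docs.items() if normalize(v) == query_term)

print(indexed_terms)
print(matching_ids)
```

Both the stored values and the query term pass through the same function, which is why documents 1 and 2 end up under the single term "bar".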
A boost value controls the relative weight of each query clause. It defaults to 1; a boost greater than 1 increases the relative weight of that clause.
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping-boost.html#mapping-boost
DELETE my_index
PUT my_index
PUT my_index/my_type/1
{ "title": "quick brown fox" }
GET my_index/_search
{
  "query": {
    "match": {
      "title": { "query": "quick brown fox", "boost": 2 }
    }
  }
}
With the boost set to 2 (the default is 1), the result is:
{ "took": 5, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 1.7260926, "hits": [ { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1.7260926, "_source": { "title": "quick brown fox" } } ] } }
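The _score above can be roughly reproduced by hand. The sketch below assumes the BM25 similarity with its default parameters (k1 = 1.2, b = 0.75) and a single document with three terms on the shard being scored; the function name and its arguments are hypothetical and only illustrate how the boost multiplies the clause score.

```python
import math


def bm25_term_score(doc_count, doc_freq, term_freq, field_len, avg_field_len,
                    k1=1.2, b=0.75):
    """Score one term with a BM25 formula (assumed defaults k1=1.2, b=0.75)."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    tf_norm = (term_freq * (k1 + 1)) / (
        term_freq + k1 * (1 - b + b * field_len / avg_field_len)
    )
    return idf * tf_norm


# One document, each query term occurs once, field length equals the average.
per_term = bm25_term_score(doc_count=1, doc_freq=1, term_freq=1,
                           field_len=3, avg_field_len=3.0)

unboosted = 3 * per_term   # "quick", "brown", "fox" each contribute once
boosted = 2 * unboosted    # boost: 2 doubles the clause score

print(round(per_term, 7), round(boosted, 5))
```

Under these assumptions the boosted value comes out at about 1.7261, in line with the _score returned above, and the per-term value is close to the 0.2876821 seen in the earlier single-term results.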
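Data is not always clean. The type of a value in JSON does not necessarily match the type defined for that field; for example, the JSON string "5" may well be intended as the number 5. coerce defaults to true, in which case Elasticsearch automatically converts "5" to 5 when indexing.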
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/coerce.html#coerce
Create the index and define the mapping. The document has two fields, both of type integer, one of which has coerce disabled:
DELETE my_index
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": { "type": "integer" },
        "number_tow": { "type": "integer", "coerce": false }
      }
    }
  }
}
Index the data:
PUT my_index/my_type/1
{ "number_one": "5" }
PUT my_index/my_type/2
{ "number_tow": "5" }
The first request succeeds; the second fails:
{ "error": { "root_cause": [ { "type": "mapper_parsing_exception", "reason": "failed to parse [number_tow]" } ], "type": "mapper_parsing_exception", "reason": "failed to parse [number_tow]", "caused_by": { "type": "illegal_argument_exception", "reason": "Integer value passed as String" } }, "status": 400 }
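The difference between the two fields can be modelled with a small parsing function. This is a simplified sketch covering only integers and strings; `parse_integer` is a hypothetical helper, not part of any Elasticsearch client.

```python
def parse_integer(value, coerce=True):
    """Parse a value for an integer field, optionally coercing strings."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if isinstance(value, str):
        if not coerce:
            # Mirrors the rejection shown above when coerce is false.
            raise ValueError("Integer value passed as String")
        return int(value)
    raise TypeError(f"unsupported value for integer field: {value!r}")


print(parse_integer("5"))  # coerce enabled: the string is converted

try:
    parse_integer("5", coerce=False)
except ValueError as exc:
    print("rejected:", exc)
```

With coercion enabled the string is converted silently; with it disabled the same input is rejected, which matches the behaviour of number_one and number_tow above.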
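The copy_to parameter lets you build custom _all-style fields. In other words, the values of several fields can be copied into one combined field. For example, first_name and last_name can be combined into a full_name field.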
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/copy-to.html
Create the index and define the mapping with three fields, "first_name", "last_name", and "full_name"; the values of first_name and last_name are copied to full_name.
DELETE my_index
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "first_name": { "type": "text", "copy_to": "full_name" },
        "last_name": { "type": "text", "copy_to": "full_name" },
        "full_name": { "type": "text" }
      }
    }
  }
}
Index a document and query it:
PUT my_index/my_type/1
{ "first_name": "John", "last_name": "Smith" }
GET my_index/my_type/_search
{
  "query": {
    "match": { "full_name": "John Smith" }
  }
}
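You can still query first_name or last_name individually, or query full_name to match against the values of both fields at once. Note that full_name does not appear in the returned _source: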
{ "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 0.5753642, "hits": [ { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 0.5753642, "_source": { "first_name": "John", "last_name": "Smith" } } ] } }
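What copy_to does at index time can be pictured as follows. This is a minimal sketch in plain Python; `index_document` and the dictionaries are hypothetical and only model which values end up indexed under which field.

```python
def index_document(source, copy_to):
    """Return the values indexed per field; the source dict is left untouched."""
    indexed = {}
    for field, value in source.items():
        indexed.setdefault(field, []).append(value)
        target = copy_to.get(field)
        if target is not None:
            # The value is also indexed under the copy_to target field.
            indexed.setdefault(target, []).append(value)
    return indexed


source = {"first_name": "John", "last_name": "Smith"}
copy_rules = {"first_name": "full_name", "last_name": "full_name"}

indexed = index_document(source, copy_rules)
print(indexed["full_name"])    # values searchable through full_name
print("full_name" in source)   # the original document is unchanged
```

The copy happens only in the indexed representation, which is why the search on full_name matches while the _source in the response contains just first_name and last_name.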
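doc_values exist to speed up sorting and aggregations: alongside the inverted index, Elasticsearch builds an additional column-oriented structure at index time, trading disk space for time. They are enabled by default and can be disabled for fields that you are sure will never be sorted or aggregated on.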
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/doc-values.html#doc-values
DELETE my_index
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": { "type": "keyword" },
        "session_id": { "type": "keyword", "doc_values": false }
      }
    }
  }
}
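The two access patterns can be contrasted in a few lines of Python. This is a conceptual sketch only: the status_code values are made-up sample data, and real doc values are an on-disk columnar format rather than a dictionary.

```python
from collections import defaultdict

# Hypothetical status_code values keyed by document id.
docs = {1: "200", 2: "404", 3: "200"}

# Inverted index: term -> document ids. Good for "which docs contain X?".
inverted_index = defaultdict(list)
for doc_id, term in docs.items():
    inverted_index[term].append(doc_id)

# Doc values: document id -> value. Good for "what is the value of doc N?",
# which is the lookup that sorting and aggregations need.
doc_values = dict(docs)

matching = inverted_index["200"]                      # search by term
sorted_ids = sorted(doc_values, key=doc_values.get)   # sort by field value

print(matching, sorted_ids)
```

Sorting with only the inverted index would mean scanning every term to find each document's value, which is the cost that the extra column structure avoids.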
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/dynamic.html
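The dynamic setting controls how newly detected fields are handled. It accepts three values:
true: newly detected fields are added to the mapping (default).
false: newly detected fields are ignored; new fields must be added to the mapping explicitly.
strict: if a new field is detected, an exception is thrown and the document is rejected.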
Define the index:
DELETE my_index
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic": "strict",
      "properties": {
        "title": { "type": "text" }
      }
    }
  }
}
Index a document:
PUT my_index/my_type/2
{
  "title": "this is a test",
  "content": "上半年上海市貨幣信貸運行平穩 我的住房貸款增速回落"
}
Because the content field is not defined in the mapping and dynamic is set to strict, indexing fails with an exception:
{ "error": { "root_cause": [ { "type": "strict_dynamic_mapping_exception", "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed" } ], "type": "strict_dynamic_mapping_exception", "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed" }, "status": 400 }
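The three dynamic modes can be summarised as a small decision function. This is a simplified model under the assumption that a mapping is just a set of field names; `apply_dynamic` and `StrictDynamicMappingError` are hypothetical names used only for illustration.

```python
class StrictDynamicMappingError(Exception):
    """Raised when an unmapped field arrives and dynamic is 'strict'."""


def apply_dynamic(mapped_fields, document, dynamic="true"):
    """Return the fields that get indexed; may add new fields to mapped_fields."""
    indexed = set()
    for field in document:
        if field in mapped_fields:
            indexed.add(field)
        elif dynamic == "true":
            mapped_fields.add(field)   # the mapping grows automatically
            indexed.add(field)
        elif dynamic == "false":
            continue                   # field is ignored, not indexed
        elif dynamic == "strict":
            raise StrictDynamicMappingError(
                f"dynamic introduction of [{field}] is not allowed"
            )
        else:
            raise ValueError(f"unknown dynamic setting: {dynamic!r}")
    return indexed


doc = {"title": "this is a test", "content": "some text"}

print(apply_dynamic({"title"}, doc, "true"))
print(apply_dynamic({"title"}, doc, "false"))
try:
    apply_dynamic({"title"}, doc, "strict")
except StrictDynamicMappingError as exc:
    print("rejected:", exc)
```

Only the strict mode rejects the whole document; the false mode accepts it but leaves the unmapped field out of the index.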
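By default Elasticsearch indexes all fields. For a field with enabled set to false, Elasticsearch skips the field's content entirely: the value can still be retrieved from _source, but it cannot be searched.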
Create the index and insert data as follows:
DELETE my_index
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": { "enabled": false }
      }
    }
  }
}
PUT my_index/my_type/1
{ "name": "sean", "title": "this is a test" }
Search on name:
GET /my_index/_search
{
  "query": {
    "match": { "name": "sean" }
  }
}
Because enabled is set to false for the name field, it cannot be used as a search condition, so there are no hits:
{ "took": 8, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 0, "max_score": null, "hits": [] } }
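The split between what is stored and what is searchable can be sketched like this. It is a toy model, assuming exact-match lookups on whole field values; `build_index` and `search` are hypothetical helpers, not Elasticsearch APIs.

```python
def build_index(source, disabled_fields):
    """Keep the full source, but index only fields that are not disabled."""
    indexed = {f: v for f, v in source.items() if f not in disabled_fields}
    return {"_source": dict(source), "indexed": indexed}


def search(entry, field, value):
    """Return True if the indexed (searchable) copy of the field matches."""
    return entry["indexed"].get(field) == value


entry = build_index({"name": "sean", "title": "this is a test"},
                    disabled_fields={"name"})

print(search(entry, "name", "sean"))              # disabled field: no match
print(search(entry, "title", "this is a test"))   # enabled field: match
print(entry["_source"]["name"])                   # still retrievable
```

The value of name is kept in the stored source but never reaches the searchable structure, which corresponds to the empty hit list above.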