ElasticSearch 5.x in Practice, day05_04: Mapping Parameters

Date: 2019-12-12


3. Mapping Parameters

3.1 analyzer

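Specifies the analyzer for a field ("analyzer" is a more accurate name than "tokenizer"). It applies to both indexing and searching. The configuration below uses the ik analyzers: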

PUT http://192.168.20.46:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_smart",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}

POST http://192.168.20.46:9200/my_index/my_type/1
{
	"content":"我是中國人，我愛個人祖國"
}

POST http://192.168.20.46:9200/my_index/_search?pretty
{
	"query":{
		"match":{
			"content":"祖國"
		}
	}
}
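
To see how an analyzer splits the sample sentence, the _analyze API can be called on the index. This is a minimal sketch, assuming the ik plugin is installed on the node. The body below would be sent with POST http://192.168.20.46:9200/my_index/_analyze, and replacing ik_smart with ik_max_word shows the other analyzer's output for comparison:

```json
{
  "analyzer": "ik_smart",
  "text": "我是中國人，我愛個人祖國"
}
```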

3.2 normalizer

normalizer standardizes a keyword value before it is indexed, for example by converting all characters to lowercase. Example:

PUT http://node1:9200/my_index
{
	"settings":{
		"analysis":{
			"normalizer":{
				"my_normalizer":{
					"type":"custom",
					"char_filter":[],
					"filter":[
						"lowercase",
						"asciifolding"
					]
				}
			}
		}
	},
	"mappings":{
		"my_data":{
			"properties":{
				"foo":{
					"type":"keyword",
					"normalizer":"my_normalizer"
				}
			}
		}
	}
}

POST http://node1:9200/my_index/my_data/1
{
	"foo":"Zhangsan"
}

POST http://node1:9200/my_index/_search
{
	"query":{
		"match":{
			"foo":"ZHANGSAN"
		}
	}
}

Detailed explanation: http://www.javashuo.com/article/p-elsqybdx-ew.html

3.3 boost

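boost sets the weight of a field. For example, to make a keyword match in the title field count twice as much as a match in the content field, define the mapping as follows; the content field keeps the default boost of 1.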

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"properties":{
				"title":{
					"type":"text",
					"boost":2
				},
				"content":{
					"type":"text"
				}
			}
		}
	}
}

Likewise, specifying the boost at query time has the same effect:

POST http://node1:9200/my_index/_search
{
    "query": {
        "match" : {
            "title": {
                "query": "quick brown fox",
                "boost": 2
            }
        }
    }
}

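Setting boost at query time is recommended. A boost hard-coded in the mapping cannot be changed without reindexing the documents, whereas a query-time boost achieves the same effect.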
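
As an illustration of the query-time approach across both fields, a multi_match query can boost title with the ^ notation. This is only a sketch, assuming an index with the title and content fields used in this section; the body would be sent with POST http://node1:9200/my_index/_search:

```json
{
  "query": {
    "multi_match": {
      "query": "quick brown fox",
      "fields": ["title^2", "content"]
    }
  }
}
```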

3.4 coerce

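coerce cleans up dirty data; its default value is true. The integer 5 might arrive as the string "5" or as the floating-point number 5.0, and coerce converts such values to the field's type:

Strings are coerced to integers
Floating-point numbers are coerced to integers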

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_data":{
			"properties":{
				"number_one":{
					"type": "integer"
				},
				"number_two":{
					"type":"integer",
					"coerce":false
				}
			}
		}
	}
}

POST http://node1:9200/my_index/my_data/1
{
	"number_one":"10"
}

POST http://node1:9200/my_index/my_data/2
{
	"number_two":"10"
}

The mapping declares number_one as an integer, so although the indexed value is a string, the document is accepted. number_two has coerce disabled, so indexing document 2 fails.
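
coerce can also be switched off for a whole index with the index.mapping.coerce setting and then re-enabled on individual fields. The sketch below assumes a fresh index created with PUT http://node1:9200/my_index; here number_one would still accept "10", while number_two would inherit the index-level false and reject it:

```json
{
  "settings": {
    "index.mapping.coerce": false
  },
  "mappings": {
    "my_data": {
      "properties": {
        "number_one": {
          "type": "integer",
          "coerce": true
        },
        "number_two": {
          "type": "integer"
        }
      }
    }
  }
}
```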

3.5 copy_to

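copy_to builds a custom _all-style field. In other words, the values of several fields can be copied into a single combined field. For example, first_name and last_name can be merged into a full_name field.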

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_data":{
			"properties":{
				"first_name":{
					"type":"text",
					"copy_to":"full_name"
				},
				"last_name":{
					"type":"text",
					"copy_to":"full_name"
				},
				"full_name":{
					"type":"text"
				}
			}
		}
	}
}
POST http://node1:9200/my_index/my_data/1
{
  "first_name": "John",
  "last_name": "Smith"
}
POST http://node1:9200/my_index/_search
{
	"query":{
		"match":{
			"full_name":{
				"query":"John Smith",
				"operator":"and"
			}
		}
	}
}

3.6 doc_values

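doc_values speed up sorting and aggregations. When the inverted index is built, an additional column-oriented structure is written alongside it, trading space for time. doc_values are enabled by default; they can be disabled for fields that will definitely never be sorted or aggregated on.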

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"properties":{
				"status_code":{
					"type":"keyword"
				},
				"session_id":{
					"type":"keyword",
					"doc_values":false
				}
			}
		}
	}
}

Note: the text type does not support doc_values.

3.7 dynamic

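dynamic controls how newly detected fields are handled. It accepts three values:

true: newly detected fields are added to the mapping (default).
false: newly detected fields are ignored; new fields must be added explicitly.
strict: if a new field is detected, an exception is thrown and the document is rejected.

Example: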

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_data":{
			"dynamic":false,
			"properties":{
				"user":{
					"properties":{
						"name":{
							"type":"text"
						},
						"social_networks":{
							"dynamic":true,
							"properties":{}
						}
					}
				}
			}
		}
	}
}

Note: strict is not a boolean, so it must be quoted.
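
For comparison, a strict variant might look like the following sketch, sent with PUT http://node1:9200/my_index when creating the index. With this mapping, a document containing any field other than user.name should be rejected, because inner objects inherit the dynamic setting of their parent:

```json
{
  "mappings": {
    "my_data": {
      "dynamic": "strict",
      "properties": {
        "user": {
          "properties": {
            "name": { "type": "text" }
          }
        }
      }
    }
  }
}
```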

3.8 enabled

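Elasticsearch indexes all fields by default. For a field with enabled set to false, Elasticsearch skips its content entirely: the value can still be retrieved from _source but is not searchable, and it may be of any type.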

PUT http://node1:9200/my_index
{
  "mappings": {
    "session": {
      "properties": {
        "user_id": { "type": "keyword" },
        "last_updated": { "type": "date" },
        "session_data": { "enabled": false }
      }
    }
  }
}

POST http://node1:9200/my_index/session/session_1
{
  "user_id": "kimchy",
  "session_data": { 
    "arbitrary_object": {
      "some_array": [ "foo", "bar", { "baz": 2 } ]
    }
  },
  "last_updated": "2015-12-06T18:20:22"
}

POST http://node1:9200/my_index/session/session_2
{
  "user_id": "jpountz",
  "session_data": "none", 
  "last_updated": "2015-12-06T18:22:13"
}

3.9 fielddata

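Search answers the question "which documents contain this term?", while aggregations answer the opposite question, "which terms does this document contain?". Most fields generate doc_values at index time for this purpose, but text fields do not support doc_values.

Instead, text fields use a data structure called fielddata, which is built the first time the field is used for aggregations, sorting, or in a script. Elasticsearch builds it by reading the inverted index from disk, inverting the term-to-document relationship, and storing the result in the JVM heap.

fielddata is disabled on text fields by default, because enabling it consumes a lot of memory. Before enabling fielddata on a text field, consider why you want to aggregate or sort on a text field at all. In most cases it does not make sense.

"New York" is analyzed into "new" and "york", so aggregating on the text field produces two buckets, "new" and "york", when what you probably want is a single "New York" bucket. In that case, add an unanalyzed keyword sub-field: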

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"properties":{
				"my_field":{
					"type":"text",
					"fields":{
						"keyword":{
							"type":"keyword"
						}
					}
				}
			}
		}
	}
}

With the mapping above, use my_field for full-text search and my_field.keyword for aggregations, sorting, and scripts.
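
If fielddata really is needed on the text field itself, it can be enabled through the put-mapping API. The following is a sketch, assuming the my_field mapping above already exists; the body repeats the existing definition and adds fielddata, and would be sent with PUT http://node1:9200/my_index/_mapping/my_type:

```json
{
  "properties": {
    "my_field": {
      "type": "text",
      "fielddata": true,
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}
```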

3.10 format

format specifies the date format that a date field accepts:

PUT http://node1:9200/my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type":   "date",
          "format": "yyyy-MM-dd"
        }
      }
    }
  }
}
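
A date field can also accept several formats, separated by ||, which are tried in turn. A minimal sketch of such a mapping, sent with PUT http://node1:9200/my_index; the particular combination of formats is only an example:

```json
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}
```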

3.11 ignore_above

ignore_above sets the maximum length of strings that will be indexed and stored; longer strings are ignored:

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"properties":{
				"message":{
					"type":"keyword",
					"ignore_above":15
				}
			}
		}
	}
}

POST http://node1:9200/my_index/my_type/1
{
  "message": "Syntax error"
}


POST http://node1:9200/my_index/my_type/2
{
  "message": "Syntax error with some long stacktrace"
}
POST http://node1:9200/my_index/_search
{
  "size": 0, 
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}

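The mapping sets ignore_above to 15 for the message field. The value in the first document is shorter than 15 characters, so it is indexed. The value in the second document is longer than 15 characters, so it is not indexed, and the aggregation returns only "Syntax error". The result: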

{
    "took": 50,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "messages": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "Syntax error",
                    "doc_count": 1
                }
            ]
        }
    }
}

3.12 ignore_malformed

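ignore_malformed tolerates malformed data. Take a login field: some users may supply a date, others an email address. By default, indexing a value of the wrong type into a field throws an exception and the whole document is rejected. If ignore_malformed is set to true, the exception is suppressed: the malformed field is not indexed, while the other fields in the document are indexed normally.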

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"properties":{
				"number_one":{
					"type":"integer",
					"ignore_malformed":true
				},
				"number_two":{
					"type":"integer"
				}
			}
		}
	}
}

POST http://node1:9200/my_index/my_type/1
{
  "text":       "Some text value",
  "number_one": "foo" 
}
POST http://node1:9200/my_index/my_type/2
{
  "text":       "Some text value",
  "number_one": 123 
}

POST http://node1:9200/my_index/my_type/3  --> error
{
  "text":       "Some text value",
  "number_two": "abc" 
}


GET http://node1:9200/my_index/_search

{
    "took": 21,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 1,
        "hits": [
            {
                "_index": "my_index",
                "_type": "my_type",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "text": "Some text value",
                    "number_one": 123
                }
            },
            {
                "_index": "my_index",
                "_type": "my_type",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "text": "Some text value",
                    "number_one": "foo"
                }
            }
        ]
    }
}

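In the example above, number_one accepts integers and has ignore_malformed set to true, so document 1 is accepted even though its number_one value is a string; that value is simply not indexed. number_two accepts integers and keeps the default ignore_malformed of false, so document 3 is rejected.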
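
The same behaviour can be set for a whole index with index.mapping.ignore_malformed and then overridden per field. A sketch, assuming a fresh index created with PUT http://node1:9200/my_index; here number_one inherits the index-level true, while number_two opts back out:

```json
{
  "settings": {
    "index.mapping.ignore_malformed": true
  },
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer"
        },
        "number_two": {
          "type": "integer",
          "ignore_malformed": false
        }
      }
    }
  }
}
```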

3.13 include_in_all

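include_in_all controls whether a field is included in the _all field. It is enabled by default, unless the field is not indexed (index set to false).
In the example below, title and content are included in _all, while date is not.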

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"properties":{
				"title":{
					"type":"text"
				},
				"content":{
					"type":"text"
				},
				"date":{
					"type":"text",
					"include_in_all":false
				}
			}
		}
	}
}

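include_in_all can also be set at the type level and on object fields, in which case sub-fields inherit the setting. Below, all fields in my_type are excluded from _all by default; author.first_name and author.last_name are included, and of the editor fields only editor.last_name is included: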

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"include_in_all":false,
			"properties":{
				"title":{"type":"text"},
				"author":{
					"include_in_all":true,
					"properties":{
						"first_name":{"type":"text"},
						"last_name":{"type":"text"}
					}
				},
				"editor":{
					"properties":{
						"first_name":{"type":"text"},
						"last_name":{"type":"text","include_in_all":true}
					}
				}
			}
		}
	}
}

3.14 index

index controls whether a field is indexed. A field that is not indexed is not searchable. Accepted values are true and false.
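
A minimal sketch of a mapping that keeps one field out of the index, sent with PUT http://node1:9200/my_index. The field name internal_note is hypothetical; its value would still be returned in _source, but queries against it would not find anything:

```json
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": { "type": "text" },
        "internal_note": { "type": "text", "index": false }
      }
    }
  }
}
```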

3.15 index_options

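index_options controls what information is stored in the inverted index. It accepts the following values:

Value	Effect
docs	Only the document number is indexed
freqs	Document numbers and term frequencies are indexed
positions	Document numbers, term frequencies, and term positions are indexed; positions are used for proximity and phrase queries
offsets	Document numbers, term frequencies, positions, and start and end character offsets are indexed; offsets are used by the postings highlighter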
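
As a sketch of how the option is set, the mapping below stores offsets for a text field so that the postings highlighter can use them. It would be sent with PUT http://node1:9200/my_index; the field name text is just an example:

```json
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "index_options": "offsets"
        }
      }
    }
  }
}
```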

3.16 fields

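fields (multi-fields) let the same value be indexed in different ways. For example, a string field can be indexed as text for full-text search and as keyword for sorting and aggregations.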

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"properties":{
				"city":{
					"type":"text",
					"fields":{
						"raw":{
							"type":"keyword"
						}
					}
				}
			}
		}
	}
}

POST http://node1:9200/my_index/my_type/1
{
	"city":"New York"
}

POST http://node1:9200/my_index/my_type/2
{
	"city":"York"
}

POST http://node1:9200/my_index/_search
{
	"query":{
		"match":{
			"city":"york"
		}
	},
	"sort":{
		"city.raw":"asc"
	},
	"aggs":{
		"cities":{
			"terms":{
				"field":"city.raw"
			}
		}
	}
}

{
    "took": 141,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": null,
        "hits": [
            {
                "_index": "my_index",
                "_type": "my_type",
                "_id": "1",
                "_score": null,
                "_source": {
                    "city": "New York"
                },
                "sort": [
                    "New York"
                ]
            },
            {
                "_index": "my_index",
                "_type": "my_type",
                "_id": "2",
                "_score": null,
                "_source": {
                    "city": "York"
                },
                "sort": [
                    "York"
                ]
            }
        ]
    },
    "aggregations": {
        "cities": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "New York",
                    "doc_count": 1
                },
                {
                    "key": "York",
                    "doc_count": 1
                }
            ]
        }
    }
}

3.17 norms

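norms store normalization factors that are used at query time to compute a document's relevance score. Although norms are useful for scoring, they consume a fair amount of disk space; if you do not need scoring on a particular field, it is best to disable norms on it.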
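
A sketch of a mapping that turns norms off for a field used only for filtering, sent with PUT http://node1:9200/my_index. The field name title is an assumption for illustration:

```json
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": { "type": "text", "norms": false }
      }
    }
  }
}
```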

3.18 null_value

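A null value cannot be indexed or searched. The null_value parameter replaces explicit null values with the specified value so that the field can be indexed and searched. Example: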

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"properties":{
				"status_code":{
					"type":"keyword",
					"null_value":"NULL"
				}
			}
		}
	}
}

POST http://node1:9200/my_index/my_type/1
{
	"status_code":null
}

POST http://node1:9200/my_index/my_type/2
{
	"status_code":[]
}
POST http://node1:9200/my_index/_search
{
	"query":{
		"term":{
			"status_code":"NULL"
		}
	}
}

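Document 1 is found because its status_code is an explicit null, which is replaced by NULL. Document 2 is not found because its status_code is an empty array, which is not an explicit null.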

3.19 position_increment_gap

To support proximity and phrase queries, analyzed text fields take term positions into account. Consider a field whose value is an array:

"names": [ "John Abraham", "Lincoln Smith"]

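To keep the first value separate from the second, a gap is inserted between the positions of Abraham and Lincoln in the index; the default gap is 100. In the example below, the phrase query "Abraham Lincoln" therefore finds nothing: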

POST http://node1:9200/my_index/groups/1
{
	"names": [ "John Abraham", "Lincoln Smith"]
}
// No match
POST http://node1:9200/my_index/groups/_search
{
	"query":{
		"match_phrase":{
			"names":{
				 "query": "Abraham Lincoln"
			}
		}
	}
}

With a slop greater than 100, the query matches:

// Matches
POST http://node1:9200/my_index/groups/_search
{
	"query":{
		"match_phrase":{
			"names":{
				 "query": "Abraham Lincoln" ,
				 "slop":101
			}
		}
	}
}

The gap can be set in the mapping with the position_increment_gap parameter:

PUT http://node1:9200/my_index
{
	"mappings":{
		"groups":{
			"properties":{
				"names":{
					"type":"text",
					"position_increment_gap":0
				}
			}
		}
	}
}

POST http://node1:9200/my_index/groups/1
{
	"names": [ "John Abraham", "Lincoln Smith"]
}

POST http://node1:9200/my_index/groups/_search
{
	"query":{
		"match_phrase":{
			"names":{
				 "query": "Abraham Lincoln" 
			}
		}
	}
}

The phrase query now matches the document.

3.20 properties

Object and nested fields contain sub-fields, which are declared with the properties parameter.

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"properties":{
				"manager":{
					"properties":{
						"age":{"type":"integer"},
						"name":{"type":"text"}
					}
				},
				"employees":{
					"type":"nested",
					"properties":{
						"age":{"type":"integer"},
						"name":{"type":"text"}
					}
				}
			}
		}
	}
}
POST http://node1:9200/my_index/my_type/1
{
  "region": "US",
  "manager": {
    "name": "Alice White",
    "age": 30
  },
  "employees": [
    {
      "name": "John Smith",
      "age": 34
    },
    {
      "name": "Peter Brown",
      "age": 26
    }
  ]
}

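manager.name and manager.age can then be used in searches, aggregations, and so on. (The author notes that this example has not yet been verified and is to be revisited.)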

POST http://node1:9200/my_index/_search
{
  "query": {
    "match": {
      "manager.name": "Alice White" 
    }
  },
  "aggs": {
    "Employees": {
      "nested": {
        "path": "employees"
      },
      "aggs": {
        "Employee Ages": {
          "histogram": {
            "field": "employees.age", 
            "interval": 5
          }
        }
      }
    }
  }
}

3.21 search_analyzer

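In most cases the same analyzer should be used at index time and at search time, so that the terms in the query have the same form as the terms in the index. Sometimes, though, a different analyzer is needed at search time, for example when using the edge_ngram token filter for autocomplete.

By default, queries use the analyzer given by the analyzer parameter, but this can be overridden with search_analyzer. Example: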

PUT http://node1:9200/my_index
{
	"settings":{
		"analysis":{
			"filter":{
				"autocomplete_filter":{
					"type":"edge_ngram",
					"min_gram":1,
					"max_gram":20
				}
			},
			"analyzer":{
				"autocomplete":{
					"type":"custom",
					"tokenizer":"standard",
					"filter":[
						"lowercase",
						"autocomplete_filter"
						]
				}
			}
		}
	},
	"mappings":{
		"my_type":{
			"properties":{
				"text":{
					"type":"text",
					"analyzer":"autocomplete",
					"search_analyzer":"standard"
				}
			}
		}
	}
}

POST http://node1:9200/my_index/my_type/1
{
  "text": "Quick Brown Fox" 
}
POST http://node1:9200/my_index/_search
{
  "query": {
    "match": {
      "text": {
        "query": "Quick Br", 
        "operator": "and"
      }
    }
  }
}
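
To see why the query "Quick Br" matches, it can help to look at what the index-time analyzer produces. A sketch, assuming the index above exists; sending the body below with POST http://node1:9200/my_index/_analyze should list edge n-grams such as q, qu, qui and b, br, bro, whereas the standard analyzer used at search time leaves the query as the two terms quick and br:

```json
{
  "analyzer": "autocomplete",
  "text": "Quick Brown Fox"
}
```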

3.22 similarity

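similarity specifies the scoring model for a field. Three values are available:

BM25: the default scoring model in Elasticsearch and Lucene
classic: TF/IDF scoring
boolean: boolean-model scoring

Example: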

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"properties":{
				"default_field":{
					"type":"text"
				},
				"classic_field":{
					"type":"text",
					"similarity":"classic"
				},
				"boolean_sim_field":{
					"type":"text",
					"similarity":"boolean"
				}
			}
		}
	}
}

default_field uses BM25 automatically, classic_field uses the classic TF/IDF model, and boolean_sim_field uses the boolean model.

3.23 store

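By default, field values are indexed so that they are searchable, but they are not stored. This is usually fine, because the _source field keeps a copy of the original document. In some cases the store parameter makes sense, for example when a document has title, date, and a very large content field and you want to retrieve only title and date: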

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"properties":{
				"title":{
					"type":"text",
					"store":true
				},
				"date":{
					"type":"date",
					"store":true
				},
				"content":{
					"type":"text"
				}
			}
		}
	}
}
POST http://node1:9200/my_index/my_type/1
{
  "title":   "Some short title",
  "date":    "2015-01-01",
  "content": "A very long content field..."
}
POST http://node1:9200/my_index/_search
{
  "stored_fields": [ "title", "date"] 
}

{
    "took": 12,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "my_index",
                "_type": "my_type",
                "_id": "1",
                "_score": 1,
                "fields": {
                    "date": [
                        "2015-01-01T00:00:00.000Z"
                    ],
                    "title": [
                        "Some short title"
                    ]
                }
            }
        ]
    }
}

Stored fields are always returned as arrays. To get a value in its original form, read it from _source instead.
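
When the original form matters more than avoiding the large content field on disk, source filtering is an alternative worth considering. A sketch, sent with POST http://node1:9200/my_index/_search; it returns title and date exactly as they were indexed, though Elasticsearch still has to load the full _source for each hit:

```json
{
  "_source": ["title", "date"]
}
```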

3.24 term_vector

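Term vectors contain the following information about the terms produced by analysis:

A list of terms
The position of each term
The start and end character offsets mapping each term to its place in the original string

term_vector accepts the following values:

Value	Meaning
no	Default; no term vectors are stored
yes	Only the terms in the field are stored
with_positions	Terms and positions are stored
with_offsets	Terms and character offsets are stored
with_positions_offsets	Terms, positions, and character offsets are stored

Example: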

PUT http://node1:9200/my_index
{
	"mappings":{
		"my_type":{
			"properties":{
				"text":{
					"type":"text",
					"term_vector":"with_positions_offsets"
				}
			}
		}
	}
}

POST http://node1:9200/my_index/my_type/1
{
  "text": "Quick brown fox"
}

POST http://node1:9200/my_index/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {} 
    }
  }
}

{
    "took": 89,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.5063205,
        "hits": [
            {
                "_index": "my_index",
                "_type": "my_type",
                "_id": "1",
                "_score": 0.5063205,
                "_source": {
                    "text": "Quick brown fox"
                },
                "highlight": {
                    "text": [
                        "Quick <em>brown</em> <em>fox</em>"
                    ]
                }
            }
        ]
    }
}
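
To inspect what was actually stored for the document, the term vectors API can be queried directly. A sketch, assuming document 1 from the example above; the body would be sent with POST http://node1:9200/my_index/my_type/1/_termvectors and asks for positions and offsets of the text field only:

```json
{
  "fields": ["text"],
  "positions": true,
  "offsets": true,
  "term_statistics": false,
  "field_statistics": false
}
```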
