ES9-mapping parameters

Date: 2019-12-19

Tags: es9, mapping, parameters

1.Overview

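Elasticsearch provides a rich set of mapping parameters for defining document fields, such as a field's analyzer, boost, date format, and retrieval (similarity) model. The official reference describes the definition and usage of each parameter: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping-params.html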

2.analyzer

The analyzer applies at both index time and search time: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/analyzer.html

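To test the analyzer parameter, first install an analysis plugin. Download the plugin version that matches your Elasticsearch version from https://github.com/medcl/elasticsearch-analysis-ik/releases (this walkthrough uses elasticsearch-analysis-ik-6.2.3.zip), copy it into the plugins folder under the Elasticsearch installation directory, unzip it, delete the zip file, and restart Elasticsearch (the plugin only takes effect after a restart).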

Create the index:

DELETE my_index

PUT my_index

Tokenize with ik_smart:

GET my_index/_analyze
{
  "analyzer": "ik_smart",
  "text": "安徽省長江流域"
}

Result:

{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "長江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
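
The start_offset and end_offset values in the response above are character positions in the input text. A minimal Python sketch, assuming the offsets can be used directly as Python string indices (which holds for this input), recovers each token by slicing. The helper name slice_by_offsets is made up for illustration.

```python
# Text and token offsets copied from the _analyze response above.
text = "安徽省長江流域"
tokens = [
    {"token": "安徽省", "start_offset": 0, "end_offset": 3},
    {"token": "長江流域", "start_offset": 3, "end_offset": 7},
]


def slice_by_offsets(source_text, token):
    """Return the substring that a token's offsets point at."""
    return source_text[token["start_offset"]:token["end_offset"]]


# Each slice should reproduce the token text reported by the analyzer.
recovered = [slice_by_offsets(text, t) for t in tokens]
for token, piece in zip(tokens, recovered):
    print(token["token"], piece, token["token"] == piece)
```

This is also why highlighting can mark the matched span in the original text: the offsets point back into the unmodified input.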

Define the mapping and specify the field's analyzer:

PUT my_index/fulltext/_mapping
{
  "properties": {
    "content":{
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word"
    }
  }
}

Add documents:

PUT my_index/fulltext/1
{
  "content":"軟件測試是很是複雜的工做"
}

PUT my_index/fulltext/2
{
  "content":"發改委表示，上半年審覈批准固定資產項目102個"
}

PUT my_index/fulltext/3
{
  "content":"全球最大資產管理公司貝萊德成立區塊鏈研究組"
}

PUT my_index/fulltext/4
{
  "content":"資本投資瘋狂，工業產能過剩"
}

Search by keyword, here "資產" (assets):

GET my_index/fulltext/_search
{
  "query": {
    "match": {
      "content": "資產"
    }
  }
}

Query result:

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5897495,
    "hits": [
      {
        "_index": "my_index",
        "_type": "fulltext",
        "_id": "2",
        "_score": 0.5897495,
        "_source": {
          "content": "發改委表示，上半年審覈批准固定資產項目102個"
        }
      },
      {
        "_index": "my_index",
        "_type": "fulltext",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "content": "全球最大資產管理公司貝萊德成立區塊鏈研究組"
        }
      }
    ]
  }
}

3.normalizer

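A normalizer standardizes a keyword field's value before it is indexed or searched, for example by converting all characters to lowercase.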

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/normalizer.html

Define the mapping:

DELETE my_index

PUT my_index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

Index the documents and search:

PUT my_index/my_type/1
{
  "foo": "BÀR"
}

PUT my_index/my_type/2
{
  "foo": "bar"
}

PUT my_index/my_type/3
{
  "foo": "baz"
}

POST my_index/_refresh

GET my_index/_search
{
  "query": {
    "match": {
      "foo": "BAR"
    }
  }
}

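Because the foo field is normalized at index time, "BÀR" is indexed as "bar" (the _source still shows the original value). At search time the query term "BAR" is likewise converted to "bar" before matching, so documents 1 and 2 both match.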

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "foo": "bar"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "foo": "BÀR"
        }
      }
    ]
  }
}

A terms aggregation shows how many documents were indexed under each normalized value of the "foo" field:

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}

The result shows that "bar" was indexed for 2 documents and "baz" for 1:

{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "bar",
          "doc_count": 2
        },
        {
          "key": "baz",
          "doc_count": 1
        }
      ]
    }
  }
}
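
The effect of my_normalizer can be approximated in plain Python. This is a toy model, not Elasticsearch's implementation: it assumes that lowercasing plus stripping combining accents is a good enough stand-in for the lowercase and asciifolding filters on this sample data. The function name normalize is made up for illustration.

```python
import unicodedata
from collections import Counter


def normalize(value):
    """Rough stand-in for the lowercase + asciifolding filters."""
    lowered = value.lower()
    # Decompose accented letters, then drop the combining accent marks.
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The three documents indexed above (id -> original foo value).
docs = {"1": "BÀR", "2": "bar", "3": "baz"}

# Index time: each value is normalized before it is stored as a term.
indexed_terms = {doc_id: normalize(value) for doc_id, value in docs.items()}

# Search time: the query string goes through the same normalizer.
query_term = normalize("BAR")
hits = sorted(d for d, term in indexed_terms.items() if term == query_term)

# Terms aggregation: count documents per normalized term.
buckets = Counter(indexed_terms.values())

print(hits)
print(dict(buckets))
```

The hits and bucket counts line up with the two responses above: documents 1 and 2 match, and the buckets are "bar" with 2 documents and "baz" with 1.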

4.boost

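You can control the relative weight of each query clause by specifying a boost value, which defaults to 1. A boost greater than 1 increases the relative weight of that clause.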

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping-boost.html#mapping-boost

DELETE my_index

PUT my_index

PUT my_index/my_type/1
{
  "title":"quick brown fox"
}

GET my_index/_search
{
    "query": {
        "match" : {
            "title": {
                "query": "quick brown fox",
                "boost":2
            }
        }
    }
}

Here the boost is set to 2 (the default is 1):

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.7260926,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1.7260926,
        "_source": {
          "title": "quick brown fox"
        }
      }
    ]
  }
}
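
To see where the boost enters the score, here is a back-of-the-envelope sketch in Python. It assumes the default BM25 similarity with k1 = 1.2 and b = 0.75, and that the shard holding the document contains only that one 3-term document; both are assumptions about this setup, not something the response states. The function names are made up for illustration.

```python
import math


def bm25_term_score(doc_count, doc_freq, term_freq, field_len, avg_field_len,
                    k1=1.2, b=0.75):
    """BM25 score of one term in one document, before any boost."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    tf_norm = (term_freq * (k1 + 1)) / (
        term_freq + k1 * (1 - b + b * field_len / avg_field_len)
    )
    return idf * tf_norm


def match_score(terms, boost=1.0):
    """Sum the per-term scores and scale the total by the boost."""
    # Assumption: a single 3-term document in the shard, and every query
    # term occurs in it exactly once.
    per_term = bm25_term_score(doc_count=1, doc_freq=1, term_freq=1,
                               field_len=3, avg_field_len=3)
    return boost * per_term * len(terms)


terms = ["quick", "brown", "fox"]
unboosted = match_score(terms)
boosted = match_score(terms, boost=2)
print(round(unboosted, 4), round(boosted, 4))
```

Under these assumptions the boosted value is exactly twice the unboosted one and lands close to the max_score of 1.7260926 in the response above, which suggests the boost acts as a plain multiplier on the clause's score.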

5.coerce

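Data is not always clean. In JSON, the type of a property's value is not necessarily the type the field was defined with; for example, the JSON string "5" may really mean the number 5. coerce defaults to true, so Elasticsearch automatically converts "5" to 5 when indexing it.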

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/coerce.html#coerce

Create the index and define the mapping: the document has two fields, both of type integer, and one of them has coerce disabled.

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type":{
      "properties": {
        "number_one":{
          "type": "integer"
        },
        "number_tow":{
          "type": "integer",
          "coerce":false
        }
      }
    }
  }
}

Index the data:

PUT my_index/my_type/1
{
  "number_one":"5"
}

PUT my_index/my_type/2
{
  "number_tow":"5"
}

The first request succeeds; the second fails:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [number_tow]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [number_tow]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Integer value passed as String"
    }
  },
  "status": 400
}
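
The difference between the two fields can be sketched with a small Python function. This is a toy model of the behaviour described above, not Elasticsearch's parser; parse_integer is a hypothetical name, and its error messages are illustrative only.

```python
def parse_integer(value, coerce=True):
    """Toy model of how an integer field accepts or rejects a JSON value."""
    # A real JSON integer is always accepted (bool is excluded on purpose,
    # because Python treats True/False as ints).
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if not coerce:
        # With coerce disabled, anything that is not already an integer fails.
        raise ValueError(f"{type(value).__name__} value rejected: coerce is false")
    if isinstance(value, str):
        # Strings are parsed as numbers; any fractional part is truncated.
        return int(float(value))
    if isinstance(value, float):
        return int(value)
    raise ValueError(f"cannot coerce {value!r} to an integer")


# number_one uses the default (coerce = true): "5" becomes 5.
print(parse_integer("5"))

# number_tow has coerce = false: the same string is rejected.
try:
    parse_integer("5", coerce=False)
except ValueError as err:
    print("rejected:", err)
```

Disabling coerce is a way to make dirty input fail loudly at index time instead of being silently converted.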

6.copy_to

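The copy_to parameter lets you build a custom _all-style field. In other words, the values of several fields can be copied into one combined field. For example, first_name and last_name can be combined into a full_name field.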

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/copy-to.html

Create the index and define the mapping with three fields, "first_name", "last_name", and "full_name"; the values of first_name and last_name are copied to full_name.

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type":{
      "properties": {
        "first_name":{
          "type": "text",
          "copy_to": "full_name"
        },
        "last_name":{
          "type": "text",
          "copy_to": "full_name"
        },
        "full_name":{
          "type": "text"
        }
      }
    }
  }
}

Index the data and search:

PUT my_index/my_type/1
{
  "first_name":"John",
  "last_name":"Smith"
}

GET my_index/my_type/_search
{
  "query": {
    "match": {
      "full_name": "John Smith"
    }
  }
}

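You can still query first_name and last_name individually by their own values, and you can also query full_name to match against the values of both first_name and last_name at once. Note that full_name does not appear in the returned _source: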

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "first_name": "John",
          "last_name": "Smith"
        }
      }
    ]
  }
}
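
What copy_to does at index time can be pictured with a few lines of Python. This is a toy model under the assumption that copying simply appends the source field's value to the target field's indexed values; apply_copy_to and copy_rules are made-up names.

```python
def apply_copy_to(source, copy_rules):
    """Return the values that would be indexed per field; source is untouched."""
    indexed = {field: [value] for field, value in source.items()}
    for field, target in copy_rules.items():
        if field in source:
            # The value is copied into the target field's indexed values only.
            indexed.setdefault(target, []).append(source[field])
    return indexed


# Document and copy_to settings taken from the example above.
source = {"first_name": "John", "last_name": "Smith"}
copy_rules = {"first_name": "full_name", "last_name": "full_name"}

indexed = apply_copy_to(source, copy_rules)
print(indexed["full_name"])
print("full_name" in source)
```

The copy affects only what is indexed, which matches the response above: the query on full_name finds the document, yet _source contains only first_name and last_name.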

7.doc_values

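doc_values speed up sorting and aggregations: when the inverted index is built, an additional column-oriented store is written alongside it, trading disk space for time. They are enabled by default on the field types that support them, and can be disabled for fields that you are sure will never be sorted or aggregated on.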

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/doc-values.html#doc-values

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type":{
      "properties": {
        "status_code":{
          "type": "keyword"
        },
        "session_id":{
          "type": "keyword",
          "doc_values":false
        }
      }
    }
  }
}
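
The two structures answer different questions, which a small Python sketch can illustrate. The sample status_code values below are invented for the illustration, and the dictionaries are only a conceptual picture of an inverted index and a column store, not how Lucene lays them out on disk.

```python
from collections import defaultdict

# Hypothetical documents (id -> source); the values are made up.
docs = {
    1: {"status_code": "200"},
    2: {"status_code": "404"},
    3: {"status_code": "200"},
}

inverted_index = defaultdict(list)  # term -> document ids (used for search)
doc_values = {}                     # document id -> value (sort/aggregate)

for doc_id, source in docs.items():
    value = source["status_code"]
    inverted_index[value].append(doc_id)
    doc_values[doc_id] = value

# Search: "which documents contain the term 200?" reads the inverted index.
matching = inverted_index["200"]

# Aggregation and sorting: "what is the value for document N?" reads the column.
counts = defaultdict(int)
for doc_id in docs:
    counts[doc_values[doc_id]] += 1
sorted_ids = sorted(docs, key=lambda doc_id: doc_values[doc_id])

print(matching, dict(counts), sorted_ids)
```

A field such as session_id with doc_values set to false keeps only the first structure, so it stays searchable but cannot be sorted or aggregated on.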

8.dynamic

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/dynamic.html

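The dynamic parameter controls how newly detected fields are handled. It accepts three values:

true: newly detected fields are added to the mapping (default).

false: newly detected fields are ignored; new fields must be added to the mapping explicitly.

strict: if a new field is detected, an exception is thrown and the document is rejected.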

Define the index:

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic":"strict",
      "properties": {
        "title":{
          "type": "text"
        }
      }
    }
  }
}

Index a document:

PUT my_index/my_type/2
{
  "title":"this is a test",
  "content":"上半年上海市貨幣信貸運行平穩 我的住房貸款增速回落"
}

Because the content field is not defined in the mapping and dynamic is set to strict, indexing fails with an exception:

{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
  },
  "status": 400
}
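
The three dynamic settings can be compared side by side with a toy Python model. It is a sketch of the behaviour listed above, not Elasticsearch code: index_document and StrictDynamicMappingError are hypothetical names, and mapping every new field as text is a simplification.

```python
class StrictDynamicMappingError(Exception):
    """Hypothetical stand-in for strict_dynamic_mapping_exception."""


def index_document(mapping, doc, dynamic="true"):
    """Return (updated mapping, indexed fields) for one document."""
    new_mapping = dict(mapping)
    indexed = {}
    for field, value in doc.items():
        if field in new_mapping:
            indexed[field] = value
        elif dynamic == "true":
            # Simplification: every new field is mapped as text.
            new_mapping[field] = {"type": "text"}
            indexed[field] = value
        elif dynamic == "false":
            # The field is ignored: not mapped and not indexed.
            continue
        elif dynamic == "strict":
            raise StrictDynamicMappingError(
                f"dynamic introduction of [{field}] is not allowed"
            )
        else:
            raise ValueError(f"unknown dynamic setting: {dynamic!r}")
    return new_mapping, indexed


mapping = {"title": {"type": "text"}}
doc = {"title": "this is a test", "content": "new field"}

for setting in ("true", "false", "strict"):
    try:
        updated, indexed = index_document(mapping, doc, dynamic=setting)
        print(setting, sorted(updated), sorted(indexed))
    except StrictDynamicMappingError as err:
        print(setting, "rejected:", err)
```

With strict, the whole document is rejected rather than just the unknown field, which is what the error response above shows.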

9.enabled

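Elasticsearch indexes all fields by default. For a field with enabled set to false, Elasticsearch skips parsing the field's content entirely; the value can still be retrieved from _source, but it is not searchable.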

Create the index and insert data as follows:

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type":{
      "properties": {
        "name":{
          "enabled":false
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "name":"sean",
  "title":"this is a test"
}

Search on name:

GET /my_index/_search
{
  "query": {
    "match": {
      "name": "sean"
    }
  }
}

Because the name field has enabled set to false, it cannot be used as a search condition, so the query returns no hits:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
