Elasticsearch from Basics to Advanced (11): Index Management

Basic index operations

  • Create an index
    PUT /{index}
    {
      "settings": {},
      "mappings": {
        "properties": {
        }
      }
    }

    Example of creating an index:

    PUT /my_index
    {
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      },
      "mappings": {
        "my_type":{
          "properties": {
            "my_field":{
              "type": "text"
            }
          }
        }
      }
    }
  • Update index settings
    PUT /{index}/_settings
    {
        "settings": {}
    }
    
    PUT /my_index/_settings
    {
      "settings": {
        "number_of_replicas": 1
      }
    }
  • Delete an index
    DELETE /{index}

    Examples:

    DELETE /my_index
    DELETE /index_one,index_two
    DELETE /index_*
    DELETE /_all

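    The delete index API can also be applied to several indices at once by using a comma-separated list, or to all indices by using _all or * as the index name (be careful!).
    To disable deleting indices through wildcards or _all, set action.destructive_requires_name to true in elasticsearch.yml. This setting can also be changed through the cluster update settings API.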
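
    Before running a wildcard delete, it can help to preview which index names an expression would hit. The following is a minimal Python sketch of that idea under a simplified matching model (comma-separated parts, * wildcards, and _all); it is not the server's actual resolution logic, and the function name resolve_delete_targets is hypothetical.

```python
from fnmatch import fnmatchcase


def resolve_delete_targets(expression, existing, destructive_requires_name=False):
    """Return the sorted index names that DELETE /{expression} would match
    in this simplified model (not the server's real resolution logic)."""
    targets = set()
    for part in expression.split(","):
        part = part.strip()
        is_broad = part == "_all" or "*" in part
        if is_broad and destructive_requires_name:
            # Mirrors the intent of action.destructive_requires_name:
            # only explicit index names are accepted.
            raise ValueError(f"wildcards and _all are not allowed: {part!r}")
        pattern = "*" if part == "_all" else part
        targets.update(name for name in existing if fnmatchcase(name, pattern))
    return sorted(targets)
```

    With the indices index_one, index_two, and my_index, the expression index_* resolves to the first two, while the same call with destructive_requires_name=True is rejected.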

Modifying analyzers and defining your own

Elasticsearch ships with a variety of built-in analyzers that can be used in any index without further configuration:

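standard analyzer:
The standard analyzer divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most punctuation, lowercases terms, and supports removing stop words.
simple analyzer:
The simple analyzer divides text into terms whenever it encounters a character that is not a letter, then lowercases all terms.
whitespace analyzer:
The whitespace analyzer divides text into terms whenever it encounters a whitespace character. It does not lowercase terms.
stop analyzer:
The stop analyzer is like the simple analyzer, but also supports removing stop words.
keyword analyzer:
The keyword analyzer is a "noop" analyzer that accepts whatever text it is given and outputs exactly the same text as a single term. The text is not tokenized, so matching is exact.
pattern analyzer:
The pattern analyzer uses a regular expression to split text into terms. It supports lowercasing and stop words.
language analyzers:
Elasticsearch provides many language-specific analyzers, such as english or french.
fingerprint analyzer:
The fingerprint analyzer is a specialist analyzer that creates a fingerprint which can be used for duplicate detection.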
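
To make the differences between a few of these analyzers concrete, here is a rough Python approximation of the whitespace, simple, stop, and keyword analyzers. It is only a sketch based on the descriptions above, not Elasticsearch's implementation; the function names and the short stop-word set are illustrative assumptions.

```python
import re

# Illustrative stop-word subset (an assumption, not the full built-in list).
DEFAULT_STOPWORDS = frozenset({"a", "is", "in", "the"})


def whitespace_analyzer(text):
    # Split on whitespace only; case and punctuation are left untouched.
    return text.split()


def simple_analyzer(text):
    # Split wherever a non-letter appears, then lowercase every term.
    return [term.lower() for term in re.findall(r"[^\W\d_]+", text)]


def stop_analyzer(text, stopwords=DEFAULT_STOPWORDS):
    # Same as simple_analyzer, plus stop-word removal.
    return [term for term in simple_analyzer(text) if term not in stopwords]


def keyword_analyzer(text):
    # No tokenization at all: the whole input is a single term.
    return [text]
```

For the input "The 2 QUICK Brown-Foxes.", whitespace_analyzer keeps "Brown-Foxes." as one term with its case, while simple_analyzer drops the digit and punctuation and lowercases the rest.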

Changing analyzer settings

  • Enable the English stop-word token filter
    PUT /my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "es_std":{
              "type":"standard",
              "stopwords":"_english_"
            }
          }
        }
      }
    }
  • Test the analyzer
    Using the built-in standard analyzer
    # standard analyzer
    GET /my_index/_analyze
    {
      "analyzer": "standard",
      "text": "a dog is in the house"
    }
    {
      "tokens": [
        {
          "token": "a",
          "start_offset": 0,
          "end_offset": 1,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "dog",
          "start_offset": 2,
          "end_offset": 5,
          "type": "<ALPHANUM>",
          "position": 1
        },
        {
          "token": "is",
          "start_offset": 6,
          "end_offset": 8,
          "type": "<ALPHANUM>",
          "position": 2
        },
        {
          "token": "in",
          "start_offset": 9,
          "end_offset": 11,
          "type": "<ALPHANUM>",
          "position": 3
        },
        {
          "token": "the",
          "start_offset": 12,
          "end_offset": 15,
          "type": "<ALPHANUM>",
          "position": 4
        },
        {
          "token": "house",
          "start_offset": 16,
          "end_offset": 21,
          "type": "<ALPHANUM>",
          "position": 5
        }
      ]
    }

    Using the es_std analyzer defined above, with English stop words

    # es_std analyzer (English stop words)
    GET /my_index/_analyze
    {
      "analyzer": "es_std",
      "text": "a dog is in the house"
    }
    {
      "tokens": [
        {
          "token": "dog",
          "start_offset": 2,
          "end_offset": 5,
          "type": "<ALPHANUM>",
          "position": 1
        },
        {
          "token": "house",
          "start_offset": 16,
          "end_offset": 21,
          "type": "<ALPHANUM>",
          "position": 5
        }
      ]
    }

Defining a custom analyzer

PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and":{
          "type":"mapping",
          "mappings":["&=>and"]
        }
      },
      "filter": {
        "my_stopwords":{
          "type":"stop",
          "stopwords":["the","a"]
        }
      },
      "analyzer": {
        "my_analyzer":{
          "type": "custom",
          "char_filter":["html_strip","&_to_and"],
          "tokenizer":"standard",
          "filter":["lowercase", "my_stopwords"]
        }
      }
    }
  }
}

Testing the analyzer

GET /my_index/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}
{
  "tokens": [
    {
      "token": "tomandjerry",
      "start_offset": 0,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "are",
      "start_offset": 10,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "friend",
      "start_offset": 16,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "in",
      "start_offset": 23,
      "end_offset": 25,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "house",
      "start_offset": 30,
      "end_offset": 35,
      "type": "<ALPHANUM>",
      "position": 6
    },
    {
      "token": "haha",
      "start_offset": 42,
      "end_offset": 46,
      "type": "<ALPHANUM>",
      "position": 7
    }
  ]
}
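
The order in which my_analyzer processes text (character filters, then the tokenizer, then token filters) can be imitated in a few lines of Python. This is a simplified sketch: html_strip and the standard tokenizer are approximated with regular expressions, so it only reproduces the terms for simple inputs like the one above, not offsets or positions.

```python
import re

STOPWORDS = {"the", "a"}  # same list as the my_stopwords filter


def my_analyzer(text):
    # char_filter 1 (html_strip, approximated): replace tag-like spans with a space.
    text = re.sub(r"<[^>]+>", " ", text)
    # char_filter 2 (&_to_and): apply the "& => and" mapping.
    text = text.replace("&", "and")
    # tokenizer (standard, approximated): runs of word characters.
    tokens = re.findall(r"\w+", text)
    # token filters, in order: lowercase, then stop-word removal.
    tokens = [token.lower() for token in tokens]
    return [token for token in tokens if token not in STOPWORDS]
```

Calling my_analyzer("tom&jerry are a friend in the house, <a>, HAHA!!") yields the same six terms as the _analyze response: tomandjerry, are, friend, in, house, haha.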

Applying the custom analyzer to a field

PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}

A closer look at the mapping root object

  • root object
    The root object is the mapping JSON for a given type. It contains properties, metadata fields (_id, _source, _type), settings (such as analyzer), and other settings (such as include_in_all).
    PUT /my_index
    {
      "mappings": {
        "my_type": {
          "properties": {}
        }
      }
    }
  • properties

    PUT /my_index/_mapping/my_type
    {
      "properties": {
        "title": {
          "type": "text"
        }
      }
    }
  • _source

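    Benefits

    (1) Queries return the complete document directly; there is no need to fetch the document ID first and then send a second request for the document
    (2) Partial updates are implemented on top of _source
    (3) Reindexing works directly from _source, with no need to query the data from a database (or another external store) and modify it
    (4) The returned fields can be customized based on _source
    (5) Debugging queries is easier, because _source is directly visible

    If none of these benefits are needed, _source can be disabled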

    PUT /my_index/_mapping/my_type2
    {
      "_source": {"enabled": false}
    }
  • _all
    Packs the values of all fields together into a single _all field and indexes it. When a search does not specify any field, it runs against the _all field.
    PUT /my_index/_mapping/my_type3
    {
      "_all": {"enabled": false}
    }

    include_in_all can also be set at the field level to control whether that field's value is included in the _all field

    PUT /my_index/_mapping/my_type4
    {
      "properties": {
        "my_field": {
          "type": "text",
          "include_in_all": false
        }
      }
    }
  • Identity metadata
    _index,_type,_id
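
The effect of _all and include_in_all can be pictured as a simple concatenation of field values. The Python sketch below models only that idea (values joined with spaces, fields opted out through include_in_all); it ignores analysis and nested objects, and build_all_field is a hypothetical name.

```python
def build_all_field(doc, properties, all_enabled=True):
    """Concatenate field values into one _all string, honoring include_in_all.

    Returns None when _all is disabled for the type.
    """
    if not all_enabled:
        return None
    parts = []
    for field, value in doc.items():
        # Fields are included unless their mapping sets include_in_all to false.
        if properties.get(field, {}).get("include_in_all", True):
            parts.append(str(value))
    return " ".join(parts)
```

For a document {"title": "my article", "my_field": "secret"} with include_in_all set to false on my_field, only "my article" ends up in the combined string, so a search without a field name would not match "secret".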

Customizing your dynamic mapping strategy

The dynamic parameter

  • true: unknown fields are mapped dynamically
  • false: unknown fields are ignored
  • strict: unknown fields cause an error

Example:

PUT my_index
{
  "mappings": {
    "my_type":{
      "dynamic": "strict",
      "properties": {
        "title":{
          "type": "text"
        },
        "address":{
          "type": "object",
          "dynamic":"true"
        }
      }
    }
  }
}
PUT /my_index/my_type/1
{
  "title": "my article",
  "content": "this is my article",
  "address": {
    "province": "guangdong",
    "city": "guangzhou"
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
  },
  "status": 400
}
PUT /my_index/my_type/1
{
  "title": "my article",
  "address": {
    "province": "guangdong",
    "city": "guangzhou"
  }
}

GET /my_index/_mapping/my_type

{
  "my_index": {
    "mappings": {
      "my_type": {
        "dynamic": "strict",
        "properties": {
          "address": {
            "dynamic": "true",
            "properties": {
              "city": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "province": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "title": {
            "type": "text"
          }
        }
      }
    }
  }
}
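
The behaviour shown above, strict at the top level with dynamic mapping re-enabled under address, can be modelled with a short recursive check. This Python sketch is a simplified model of the three dynamic modes and of inner objects inheriting the setting from their parent; check_document is a hypothetical helper, not an Elasticsearch API.

```python
def check_document(doc, mapping, inherited="true", path=""):
    """Return the paths of fields that would be added dynamically.

    Raises ValueError for an unknown field under "strict" and silently
    skips unknown fields under "false".
    """
    # An object without its own "dynamic" setting inherits the parent's.
    dynamic = str(mapping.get("dynamic", inherited)).lower()
    properties = mapping.get("properties", {})
    added = []
    for field, value in doc.items():
        full = f"{path}{field}"
        known = properties.get(field)
        if known is None:
            if dynamic == "strict":
                raise ValueError(f"dynamic introduction of [{full}] is not allowed")
            if dynamic == "true":
                added.append(full)
            continue
        if isinstance(value, dict):
            # Recurse into known object fields with the current setting as default.
            added += check_document(value, known, dynamic, full + ".")
    return added
```

Run against the mapping above, the first document is rejected because of content, while the second one passes and reports address.province and address.city as dynamically added.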

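Customizing the dynamic mapping strategy

  1. date_detection
    By default, Elasticsearch recognizes dates in certain formats, such as yyyy-MM-dd. However, if the first value that arrives for a field is 2017-01-01, the field is dynamically mapped as date; if a value such as "hello world" arrives later, indexing fails. The fix is to disable date_detection for the type and, where needed, map specific fields as date explicitly.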
    PUT /my_index/_mapping/my_type
    {
        "date_detection": false
    }
  2. dynamic template
    PUT my_index
    {
      "mappings": {
        "my_type":{
          "dynamic_templates": [
            {
              "en":{
                "match":"*_en",
                "match_mapping_type": "string",
                "mapping": {
                  "type":"text",
                  "analyzer":"english"
                }
              }
            }
          ]
        }
      }
    }

    Index some sample data

    PUT /my_index/my_type/1
    {
      "title": "this is my first article"
    }
    
    PUT /my_index/my_type/2
    {
      "title_en": "this is my first article"
    }

    Field that matches no template

    GET /my_index/my_type/_search
    {
      "query": {
        "match": {
          "title":"is"
        }
      }
    }
    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.2824934,
        "hits": [
          {
            "_index": "my_index",
            "_type": "my_type",
            "_id": "1",
            "_score": 0.2824934,
            "_source": {
              "title": "this is my first article"
            }
          }
        ]
      }
    }

    Field that matches the template

    GET /my_index/my_type/_search
    {
      "query": {
        "match": {
          "title_en":"is"
        }
      }
    }
    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
      }
    }

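    Here title matches no dynamic template, so it uses the default standard analyzer, which does not remove stop words: "is" enters the inverted index, and searching for "is" finds the document. title_en matches the dynamic template and therefore uses the english analyzer, which removes stop words: "is" is filtered out, so searching for "is" finds nothing.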
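
    How a field name is matched against the "*_en" pattern can be sketched in Python with glob-style matching. This is a simplified model for new string fields only: it ignores match_mapping_type and the keyword sub-field that dynamic mapping also adds, it writes the mapped type as text, and the names TEMPLATES and mapping_for_new_string_field are hypothetical.

```python
from fnmatch import fnmatchcase

# The "en" template from the example, reduced to the parts this sketch uses.
TEMPLATES = [
    {"name": "en", "match": "*_en",
     "mapping": {"type": "text", "analyzer": "english"}},
]


def mapping_for_new_string_field(field, templates=TEMPLATES):
    """Return the mapping a new string field receives; the first matching template wins."""
    for template in templates:
        if fnmatchcase(field, template["match"]):
            return dict(template["mapping"])
    # No template matched: a plain text field, analyzed with the default
    # standard analyzer.
    return {"type": "text"}
```

    title_en picks up the english analyzer, while title falls through to the default, which is why the two searches for "is" behave differently.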

Zero-downtime reindexing with scroll + bulk + index aliases

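  1. Data preparation
    A field's mapping cannot be modified. To change a field, create a new index with the new mapping, then read the data out in batches and write it into the new index with the bulk API. For the batch reads, the scroll API is recommended, together with multiple threads reindexing concurrently: each scroll queries the data for one date range and is handed to one thread.
    Initially the data was inserted relying on dynamic mapping, but some values happened to be in a date format such as 2017-01-01, so the title field was automatically mapped as date when it should have been a string type.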
    DELETE /my_index
    PUT /my_index/my_type/1
    {
      "title": "2017-01-01"
    }
    
    PUT /my_index/my_type/2
    {
      "title": "2017-01-02"
    }
    
    PUT /my_index/my_type/3
    {
      "title": "2017-01-03"
    }
    GET /my_index/my_type/_search
    {
      "query": {
        "match_all": {}
      }
    }
    
    
    
    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
          {
            "_index": "my_index",
            "_type": "my_type",
            "_id": "2",
            "_score": 1,
            "_source": {
              "title": "2017-01-02"
            }
          },
          {
            "_index": "my_index",
            "_type": "my_type",
            "_id": "1",
            "_score": 1,
            "_source": {
              "title": "2017-01-01"
            }
          },
          {
            "_index": "my_index",
            "_type": "my_type",
            "_id": "3",
            "_score": 1,
            "_source": {
              "title": "2017-01-03"
            }
          }
        ]
      }
    }
  2. When a string title value is later added to the index, an error is raised
    PUT /my_index/my_type/4
    {
      "title": "my first article"
    }
    {
      "error": {
        "root_cause": [
          {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse [title]"
          }
        ],
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [title]",
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "Invalid format: \"my first article\""
        }
      },
      "status": 400
    }
  3. Changing the type of title at this point is not possible
    PUT /my_index/_mapping/my_type
    {
      "properties": {
        "title": {
          "type": "text"
        }
      }
    }
    {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "mapper [title] of different type, current_type [date], merged_type [text]"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "mapper [title] of different type, current_type [date], merged_type [text]"
      },
      "status": 400
    } 
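  4. The only option now is to reindex: create a new index, read the data out of the old index, and load it into the new index

  5. Suppose the old index is named old_index and the new one new_index, and a Java application is already using old_index. Must the Java application be stopped, reconfigured to use new_index, and then restarted? That would mean application downtime and reduced availability

  6. So give the Java application an alias that points to the old index. The application works through the goods_index alias, which at this point actually points to the old my_index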
    PUT /my_index/_alias/goods_index
  7. Create a new index in which title is mapped as a string type (text)
    PUT my_index_new
    {
      "mappings": {
        "my_type":{
          "properties": {
            "title":{
              "type": "text"
            }
          }
        }
      }
    }
  8. Use the scroll API to read the data out in batches
    GET my_index/_search?scroll=1m
    {
      "query": {
        "match_all": {}
      },
      "sort": [
        "_doc"
      ],
      "size": 1
    }
    {
      "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAARhWFjFMZHFMRnF4UVFxNHhnMk1waElfZ3cAAAAAAAEYWBYxTGRxTEZxeFFRcTR4ZzJNcGhJX2d3AAAAAAABGFoWMUxkcUxGcXhRUXE0eGcyTXBoSV9ndwAAAAAAARhXFjFMZHFMRnF4UVFxNHhnMk1waElfZ3cAAAAAAAEYWRYxTGRxTEZxeFFRcTR4ZzJNcGhJX2d3",
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 3,
        "max_score": null,
        "hits": [
          {
            "_index": "my_index",
            "_type": "my_type",
            "_id": "2",
            "_score": null,
            "_source": {
              "title": "2017-01-02"
            },
            "sort": [
              0
            ]
          }
        ]
      }
    }
  9. Use the bulk API to write each batch returned by scroll into the new index
    POST _bulk
    {"index":{"_index": "my_index_new", "_type": "my_type", "_id": "2"}}
    {"title":"2017-01-02"}
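  10. Repeat steps 8 and 9, reading batch after batch and writing each one into the new index with the bulk API
  11. Switch the goods_index alias over to my_index_new. The Java application then reads the new index's data through the alias without being stopped: zero downtime, high availability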
    POST _aliases
    {
      "actions": [
        {
          "remove": {
            "index": "my_index",
            "alias": "goods_index"
          }
        },
        {
          "add": {
            "index": "my_index_new",
            "alias": "goods_index"
          }
        }
      ]
    }
  12. Check that querying through the goods_index alias works
    GET /goods_index/my_type/_search
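
The loop in steps 8 to 10 can be sketched in Python without tying it to a particular client library. In this minimal sketch the HTTP calls are injected as two hypothetical callables: fetch_page(scroll_id) returns a (scroll_id, hits) pair, issuing the initial search when scroll_id is None, and send_bulk(body) posts an NDJSON body to _bulk. Both names and signatures are assumptions made for illustration.

```python
import json


def hits_to_bulk_body(hits, target_index):
    """Turn scroll hits into an NDJSON body for POST _bulk:
    one action line plus one source line per hit."""
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def reindex(fetch_page, send_bulk, target_index):
    """Repeat steps 8-9 until a scroll page comes back empty.

    Returns the number of documents copied.
    """
    copied = 0
    scroll_id = None
    while True:
        scroll_id, hits = fetch_page(scroll_id)
        if not hits:
            return copied
        send_bulk(hits_to_bulk_body(hits, target_index))
        copied += len(hits)
```

Keeping the transport behind callables makes the loop easy to try against in-memory data first; in a real run, each callable would wrap the scroll and bulk requests shown in steps 8 and 9.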

Switching indices transparently to clients via aliases

Format:

POST /_aliases
{
    "actions" : [
        { "remove" : { "index" : "test1", "alias" : "alias1" } },
        { "add" : { "index" : "test2", "alias" : "alias1" } }
    ]
}
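
When the swap is triggered from application code, the request body can be built programmatically. The helper below is a small Python sketch (build_alias_swap is a hypothetical name) that produces the same actions structure as above; sending the remove and add actions in a single _aliases request is what lets the alias move atomically, so clients never see it pointing at nothing.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the POST /_aliases body that moves an alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def to_request_body(payload):
    # Serialize the payload as the JSON body of the request.
    return json.dumps(payload)
```

For example, build_alias_swap("goods_index", "my_index", "my_index_new") reproduces the body used in step 11 of the reindexing walkthrough.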