Elasticsearch Study Notes (35): Elasticsearch Index Management

Basic index operations

Creating an index

PUT /{index}
{
  "settings": {},
  "mappings": {
    "properties": {
    }
  }
}

Example of creating an index:

PUT /test_index
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 5
  },
  "mappings": {
    "properties": {
      "field1": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "ctime": {
        "type": "date"
      }
    }
  }
}

Updating index settings

Note that only dynamic settings (such as number_of_replicas) can be changed on an existing index; static settings such as number_of_shards cannot be modified after the index has been created.

PUT /{index}/_settings
{
    "settings": {}
}

PUT /test_index/_settings
{
  "settings": {
    "number_of_replicas": 2
  }
}

Deleting an index

DELETE /{index}

The delete index API can also be applied to multiple indices by passing a comma-separated list, or to all indices by using _all or * as the index name (use with caution!).

To disallow deleting indices via wildcards or _all, set action.destructive_requires_name to true in the configuration. This setting can also be changed through the cluster update settings API.
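For example, the safeguard can be turned on at runtime through the cluster update settings API (a minimal sketch; "persistent" makes the change survive a cluster restart):

```
PUT /_cluster/settings
{
  "persistent": {
    "action.destructive_requires_name": true
  }
}
```

With this in place, DELETE /_all and wildcard deletes are rejected; indices must be deleted by their full name.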

Modifying analyzers and defining your own analyzer

Elasticsearch ships with a variety of built-in analyzers that can be used in any index without further configuration:

standard analyzer:
The standard analyzer divides text on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most punctuation, lowercases terms, and supports removing stop words.
simple analyzer:
The simple analyzer divides text into terms whenever it encounters a character that is not a letter, then lowercases all terms.
whitespace analyzer:
The whitespace analyzer divides text into terms whenever it encounters a whitespace character. It does not lowercase terms.
stop analyzer:
The stop analyzer is like the simple analyzer, but additionally supports removing stop words.
keyword analyzer:
The keyword analyzer is a "noop" analyzer: it accepts whatever text it is given and outputs the exact same text as a single term. In other words, it does not tokenize at all, and is used for exact matching.
pattern analyzer:
The pattern analyzer uses a regular expression to split text into terms. It supports lowercasing and stop words.
language analyzers:
Elasticsearch provides many language-specific analyzers, such as english or french.
fingerprint analyzer:
The fingerprint analyzer is a specialist analyzer that creates a fingerprint of the input, which can be used for duplicate detection.
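The _analyze API can be called without any index to try the built-in analyzers directly. For example, the whitespace analyzer splits only on whitespace and keeps the original case, so the text below produces the three tokens The, QUICK, and brown-fox:

```
GET /_analyze
{
  "analyzer": "whitespace",
  "text": "The QUICK brown-fox"
}
```

Running the same text through the simple analyzer instead would split on the hyphen and lowercase everything.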

Modifying analyzer settings

Enable the english stop-words token filter:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}
GET /my_index/_analyze
{
  "analyzer": "standard",
  "text": "a dog is in the house"
}
{
  "tokens" : [
    {
      "token" : "a",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "dog",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "is",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "in",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "the",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "house",
      "start_offset" : 16,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 5
    }
  ]
}
GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text": "a dog is in the house"
}
{
  "tokens" : [
    {
      "token" : "dog",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "house",
      "start_offset" : 16,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 5
    }
  ]
}

Defining a custom analyzer

PUT /test_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["&=>and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}
GET /test_index/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}
{
  "tokens" : [
    {
      "token" : "tomandjerry",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "are",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "friend",
      "start_offset" : 16,
      "end_offset" : 22,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "in",
      "start_offset" : 23,
      "end_offset" : 25,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "house",
      "start_offset" : 30,
      "end_offset" : 35,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "haha",
      "start_offset" : 42,
      "end_offset" : 46,
      "type" : "<ALPHANUM>",
      "position" : 7
    }
  ]
}

Customizing the dynamic mapping strategy

The dynamic parameter

true: unknown fields are added to the mapping via dynamic mapping
false: unknown fields are ignored
strict: encountering an unknown field raises an error
Example:

PUT /test_index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "text"
      },
      "address": {
        "type": "object",
        "dynamic": "true"
      }
    }
  }
}
PUT /test_index/_doc/1
{
  "title": "my article",
  "content": "this is my article",
  "address": {
    "province": "guangdong",
    "city": "guangzhou"
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [content] within [_doc] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [content] within [_doc] is not allowed"
  },
  "status": 400
}
PUT /test_index/_doc/1
{
  "title": "my article",
  "address": {
    "province": "guangdong",
    "city": "guangzhou"
  }
}
{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

date_detection

By default, Elasticsearch recognizes date values in certain formats, such as yyyy-MM-dd. The problem is that if the first value to arrive for a field looks like 2017-01-01, the field is dynamically mapped as date; if a later document then sends a value like "hello world" for the same field, indexing fails. The solution is to disable date_detection for the index and, where needed, explicitly map specific fields as date.

PUT /{index}
{
  "mappings": {
    "date_detection": false
  }
}
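For instance, date detection could be disabled for an index while still mapping a chosen field as date explicitly (my_index and ctime here are just illustrative names):

```
PUT /my_index
{
  "mappings": {
    "date_detection": false,
    "properties": {
      "ctime": {
        "type": "date"
      }
    }
  }
}
```

With this mapping, only ctime is treated as a date; any other string field, date-shaped or not, is mapped as text/keyword by the normal dynamic mapping rules.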

dynamic template

"dynamic_templates": [
    {
        "my_template_name": {
            ... match conditions ...
            "mapping": {...}
        }
    }
]

Example:

PUT /test_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "en": {
          "match": "*_en",
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }  
    ]
  }
}
PUT /test_index/_doc/1
{
  "title": "this is my first article"
}
PUT /test_index/_doc/2
{
  "title_en": "this is my first article"
}
GET /test_index/_mapping
{
  "test_index" : {
    "mappings" : {
      "dynamic_templates" : [
        {
          "en" : {
            "match" : "*_en",
            "match_mapping_type" : "string",
            "mapping" : {
              "analyzer" : "english",
              "type" : "text"
            }
          }
        }
      ],
      "properties" : {
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title_en" : {
          "type" : "text",
          "analyzer" : "english"
        }
      }
    }
  }
}
GET /test_index/_search?q=is
{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "title" : "this is my first article"
        }
      }
    ]
  }
}

Here, title did not match any dynamic template, so it defaults to the standard analyzer, which does not remove stop words; is ends up in the inverted index, and searching for is finds the document. title_en, however, matched the dynamic template and uses the english analyzer, which removes stop words; is is filtered out, so searching for is does not match it.
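This can be verified directly with the _analyze API by targeting a field (assuming the test_index defined above still exists): analyzing the sentence against title keeps is, while analyzing it against title_en drops the stop words.

```
GET /test_index/_analyze
{
  "field": "title_en",
  "text": "this is my first article"
}
```

The response should contain only the stemmed content terms (e.g. first, articl), with this, is, and my removed by the english analyzer.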

Zero-downtime index rebuilds with scroll + bulk + index aliases

1. Rebuilding an index

A field's mapping cannot be modified once it is set. To change a field, you have to create a new index with the new mapping, query the data out of the old index in batches, and write it into the new index with the bulk API. For the batch reads, the scroll API is recommended, and the reindexing can be parallelized across multiple threads, for example by having each scroll fetch one date range of data and handing it to one thread.
(1) Initially, data was inserted relying on dynamic mapping, but some values happened to be in a date format like 2017-01-01, so the title field was automatically mapped as date when it should really have been a string type.

PUT /test_index/_doc/1
{
  "title": "2017-01-01"
}

GET /test_index/_mapping

{
  "test_index" : {
    "mappings" : {
      "properties" : {
        "title" : {
          "type" : "date"
        }
      }
    }
  }
}

(2) Later, when a string title value is indexed, it fails:

PUT /test_index/_doc/2
{
  "title": "my first article"
}

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [title] of type [date] in document with id '2'"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [title] of type [date] in document with id '2'",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "failed to parse date field [my first article] with format [strict_date_optional_time||epoch_millis]",
      "caused_by": {
        "type": "date_time_parse_exception",
        "reason": "Failed to parse with all enclosed parsers"
      }
    }
  },
  "status": 400
}

(3) At this point it is impossible to change the type of title:

PUT /test_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      }
    }
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "resource_already_exists_exception",
        "reason": "index [test_index/mZALkQ8IQV67SjCVqkhq4g] already exists",
        "index_uuid": "mZALkQ8IQV67SjCVqkhq4g",
        "index": "test_index"
      }
    ],
    "type": "resource_already_exists_exception",
    "reason": "index [test_index/mZALkQ8IQV67SjCVqkhq4g] already exists",
    "index_uuid": "mZALkQ8IQV67SjCVqkhq4g",
    "index": "test_index"
  },
  "status": 400
}

(4) The only option now is to reindex: create a new index, query the data out of the old index, and import it into the new one.
(5) Say the old index is old_index and the new one is new_index. Client applications are already using old_index. Do we really have to stop the applications, change the index they use to new_index, and restart them?
(6) No. Instead, give the applications an alias that points at the old index; the applications keep using the alias while it still resolves to the old index.
Format:

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "test",
        "alias": "alias1"
      }
    }
  ]
}
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "test_index",
        "alias": "test_index_alias"
      }
    }
  ]
}

(7) Create a new index, with title mapped as text:

PUT /test_index_new
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      }
    }
  }
}

(8) Use the scroll API to query the data out in batches:

GET /test_index/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "sort": [
    "_doc"
  ],
  "size": 1
}
{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACz3UWUC1iLVRFdnlRT3lsTXlFY01FaEFwUQ==",
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "title" : "2017-01-01"
        },
        "sort" : [
          0
        ]
      }
    ]
  }
}
POST /_search/scroll
{
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAC0GYWUC1iLVRFdnlRT3lsTXlFY01FaEFwUQ==",
  "scroll": "1m"
}
{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAC0GYWUC1iLVRFdnlRT3lsTXlFY01FaEFwUQ==",
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

(9) Use the bulk API to write each batch of scrolled data into the new index:

POST /_bulk
{"index": {"_index": "test_index_new", "_id": "1"}}
{"title": "2017-01-01"}

GET /test_index_new/_search
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_index_new",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "2017-01-01"
        }
      }
    ]
  }
}

(10) Repeat the loop, querying batch after batch and bulk-writing each one into the new index.
(11) Switch test_index_alias over to test_index_new. Applications then read the new index's data through the alias, with no downtime and no loss of availability:

POST /_aliases
{
    "actions" : [
        { "remove" : { "index" : "test_index", "alias" : "test_index_alias" } },
        { "add" : { "index" : "test_index_new", "alias" : "test_index_alias" } }
    ]
}

GET /test_index_alias/_search

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_index_new",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "2017-01-01"
        }
      }
    ]
  }
}
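As an aside, on recent Elasticsearch versions the manual scroll + bulk loop in steps (8) through (10) can be replaced by a single server-side call to the _reindex API, which copies documents from one index to another:

```
POST /_reindex
{
  "source": {
    "index": "test_index"
  },
  "dest": {
    "index": "test_index_new"
  }
}
```

The alias switch in step (11) works the same way afterwards; the manual approach remains useful when you need fine-grained control over batching and threading.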

2. Switching indices transparently for clients via aliases

Format:

POST /_aliases
{
    "actions" : [
        { "remove" : { "index" : "test1", "alias" : "alias1" } },
        { "add" : { "index" : "test2", "alias" : "alias1" } }
    ]
}

Note: the _aliases request body is ordinary JSON, so line breaks and pretty-printing are fine here. It is the _bulk API that is line-sensitive: each action line and each document line must be a single line of JSON (NDJSON), otherwise the request fails to parse.
