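Elasticsearch's Infuriating Quirks
This post records the process of changing a field type in an Elasticsearch mapping.
Our team uses Elasticsearch as a log classification, search, and analysis service, with a _mapping similar to the following: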
    {
      "settings": {
        "number_of_shards": 20
      },
      "mappings": {
        "client": {
          "properties": {
            "ip": {
              "type": "long"
            },
            "cost": {
              "type": "long"
            }
          }
        }
      }
    }
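Now here is the problem: an IP address such as "127.0.0.1" in the log output cannot be converted to a long in Elasticsearch (it fails with java.lang.NumberFormatException), so we have to change the field to the string type or the ip type (Elasticsearch supports it; see mapping-core-types for the data types) to get the result we want.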
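To illustrate why the value is rejected, here is a minimal Python sketch. It is not Elasticsearch code, and the helper names `parses_as_long` and `ip_to_long` are made up for this example. A dotted-quad string is not a valid integer literal, so a direct parse fails; storing an IPv4 address in a numeric field would need an explicit conversion first.

```python
import ipaddress


def parses_as_long(text):
    """Return True if the raw string can be parsed directly as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def ip_to_long(ip_text):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip_text))


if __name__ == "__main__":
    print(parses_as_long("127.0.0.1"))  # False: the dots make it an invalid integer
    print(parses_as_long("42"))         # True
    print(ip_to_long("127.0.0.1"))      # 2130706433
```

The log lines in this post carry the raw string, so the field type has to change instead.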
The goal is clear: just change the field type of ip in the mapping.
I looked around elasticsearch.org. Heh, a simple update should do it:
    curl -XPUT localhost:8301/store/client/_mapping -d '
    {
      "client": {
        "properties": {
          "local_ip": {
            "type": "string",
            "store": "yes"
          }
        }
      }
    }'
The error response:
    {"error":"MergeMappingException[Merge failed with failures {[mapper [local_ip] of different type, current_type [long], merged_type [string]]}]","status":400}
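The merge failure above boils down to a field appearing in both mappings with different types. As a rough sketch (the function name `find_type_conflicts` is hypothetical, and it only looks at top-level entries of a "properties" section, not nested objects), such conflicts could be spotted client-side before sending the update:

```python
def find_type_conflicts(current_props, proposed_props):
    """Return (field, current_type, proposed_type) for every field whose type would change.

    Both arguments are assumed to be dicts shaped like the "properties"
    section of a mapping, e.g. {"local_ip": {"type": "long"}}.
    """
    conflicts = []
    for field, spec in proposed_props.items():
        existing = current_props.get(field)
        if existing is not None and existing.get("type") != spec.get("type"):
            conflicts.append((field, existing.get("type"), spec.get("type")))
    return conflicts


if __name__ == "__main__":
    current = {"local_ip": {"type": "long"}, "cost": {"type": "long"}}
    proposed = {"local_ip": {"type": "string", "store": "yes"}}
    print(find_type_conflicts(current, proposed))
```

Adding a brand-new field produces no conflict in this check; only a changed type on an existing field does.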
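Seriously? I only want to turn a long into a string, and it fails (at the product level, Elasticsearch really ought to support this kind of lossless conversion). No luck.
I Googled for similar cases (example).
In one thread I found a definitive answer from an Elasticsearch developer: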
"You can't change existing mapping type, you need to create a new index with the correct mapping and index the data again."
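Thinking it over, this is rather painful. Whether the cause lies in Elasticsearch itself or in the underlying Lucene, having to reindex every field of all existing data just to modify one field is an outrageous approach, yet the Elasticsearch developers see nothing unreasonable about it.
I browsed around the Elasticsearch site, which says: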
(http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/)
the problem — why you can’t change mappings
You can only find that which is stored in your index. In order to make your data searchable, your database needs to know what type of data each field contains and how it should be indexed. If you switch a field type from e.g. a string to a date, all of the data for that field that you already have indexed becomes useless. One way or another, you need to reindex that field.
...
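OK, that passage is reasonable: if I change the type of a field, that field needs to be reindexed. Any database would have to do the same; fair enough.
Reading on to "reindexing your data", what a letdown. Its "reindexing your data" does not reindex only the modified field; it creates a new index and reindexes all the fields. Outrageous.
Complaints aside, there is no escaping this, so I will do it their way.
First, create a new index: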
    curl -XPUT localhost:8305/store_v2 -d '
    {
      "settings": {
        "number_of_shards": 20
      },
      "mappings": {
        "client": {
          "properties": {
            "ip": {
              "type": "string"
            },
            "cost": {
              "type": "long"
            }
          }
        }
      }
    }'
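Wait: now that I have created a new index, surely the client code that writes to Elasticsearch will not need to change? I took a look, and there is a solution: set up an alias (much like a C++ reference) and use it to decouple clients from the index behind it. Seeing that, I breathed a sigh of relief.
The issue now is that this is a live service that cannot be stopped, so I need a way to copy the data into my new index.
The Elasticsearch site says: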
pull the documents in from your old index, using a scrolled search and index them into the new index using the bulk API. Many of the client APIs provide a reindex() method which will do all of this for you. Once you are done, you can delete the old index.
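The first sentence sounds lovely, but after searching around I found no concrete example anywhere, not even via Google. How am I supposed to migrate the data?
The second sentence, the client APIs, looks like the only workable route.
I am most comfortable with Python, so I went straight for pyes. After installing a pile of dependencies, I finally got it running: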
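    import pyes

    # The original snippet did not show how conn was created;
    # the address below is assumed from the curl commands in this post.
    conn = pyes.ES('localhost:8305')

    # Scan the whole old index, reading 1000 documents per scroll batch.
    search = pyes.query.MatchAllQuery().search(bulk_read=1000)
    hits = conn.search(search, 'store_v1', 'client', scan=True, scroll="30m", model=lambda _, hit: hit)
    for hit in hits:
        # print hit
        # Queue each document for the new index under its original _id.
        conn.index(hit['_source'], 'store_v2', 'client', hit['_id'], bulk=True)
    conn.flush()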
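For readers who want to follow the "scrolled search plus bulk API" route from the quote without a client library, the sketch below shows only the body that the bulk API expects: one action line followed by one source line per document, newline-delimited. The function name `build_bulk_body` is made up for illustration, and the `_type` entry matches the typed mappings used in this post; I have not checked what pyes itself sends on the wire.

```python
import json


def build_bulk_body(hits, target_index, doc_type):
    """Turn scroll hits into a newline-delimited bulk API request body.

    Each hit is assumed to be a dict with "_id" and "_source" keys,
    as returned by a search against the old index.
    """
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index, "_type": doc_type, "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [{"_id": "1", "_source": {"ip": "127.0.0.1", "cost": 3}}]
    print(build_bulk_body(sample, "store_v2", "client"))
```

Reusing the original `_id` keeps the copy idempotent: running it again overwrites documents rather than duplicating them.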
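It took a little over an hour, after which the new index was essentially consistent with the old one. As for the documents written in the moments while the copy was finishing, I did not bother; our accuracy requirements are not that strict, so good enough.
Next, repoint the alias (if you were not already using an alias before changing the mapping, then get ready to cry):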
    curl -XPOST localhost:8305/_aliases -d '
    {
      "actions": [
        { "remove": { "alias": "store", "index": "store_v1" }},
        { "add": { "alias": "store", "index": "store_v2" }}
      ]
    }
    '
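The same request body can be generated from a script, which helps when the swap has to be repeated for later versions such as a future store_v3. This is a small sketch; `build_alias_swap` is a hypothetical helper, and it only builds the payload for the _aliases endpoint rather than sending it.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the _aliases payload that moves an alias from one index to another.

    Both actions go in a single request so the alias is repointed in one step.
    """
    return {
        "actions": [
            {"remove": {"alias": alias, "index": old_index}},
            {"add": {"alias": alias, "index": new_index}},
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_alias_swap("store", "store_v1", "store_v2"), indent=2))
```

Keeping the remove and add in one request avoids a window in which the alias points at no index.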
Ta-da, the new index is now catching up on data.
Once the new index has caught up, delete the old index:
    curl -XDELETE localhost:8303/store_v1
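Since the delete is irreversible, it may be worth comparing document counts first. The sketch below assumes the _count endpoint (GET /<index>/_count, which returns a JSON object with a "count" key) is reachable over HTTP; the helper names, the base URL, and the tolerance parameter are assumptions for illustration, not something from the original setup.

```python
import json
from urllib.request import urlopen


def fetch_count(base_url, index, opener=urlopen):
    """Read the document count of an index via its _count endpoint."""
    with opener("%s/%s/_count" % (base_url, index)) as resp:
        return json.loads(resp.read().decode("utf-8"))["count"]


def caught_up(old_count, new_count, tolerance=0):
    """Return True if the new index holds at least old_count - tolerance documents."""
    return new_count + tolerance >= old_count


if __name__ == "__main__":
    base = "http://localhost:8303"  # assumed address, taken from the curl command
    old = fetch_count(base, "store_v1")
    new = fetch_count(base, "store_v2")
    print("safe to delete store_v1:", caught_up(old, new))
```

A count comparison is only a coarse check; it does not prove that individual documents match.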
And with that, we are done!
That Elasticsearch manages to make such a simple task this complicated is truly something...