Reference link: http://blog.coinidea.com/elasticsearch-1264.html
After years of working with databases like MySQL, I hit a big pitfall with Elasticsearch one day. When posting a mapping, I failed to distinguish `analyzed` from `not_analyzed` on one field, and that single slip caused every value in the column to be tokenized. The index held roughly 150 million documents. I naively assumed the field's property could simply be changed in place, the way you would alter a column in MySQL. But ES is built on Lucene, and there is no such shortcut: you either delete the index and re-import everything, or you reindex. Reindexing means creating a new index and copying the old index's data into it. Tutorials on this are easy to find online, for example:
http://blog.csdn.net/loveyaqin1990/article/details/77684599
https://www.cnblogs.com/wmx3ng/p/4112993.html
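The root cause above is worth making concrete: the target index must be created with the corrected mapping *before* any data is copied into it, otherwise dynamic mapping will recreate the same analyzed field. A minimal sketch, assuming a legacy 1.x/2.x cluster with `string` fields, a type named `raw`, and a hypothetical field name `tag` (the index and field names here are placeholders, not from the original post):

```php
<?php
require 'vendor/autoload.php';

$client = Elasticsearch\ClientBuilder::create()->build();

// Create the target index with the field explicitly set to not_analyzed,
// so values are stored as single exact terms instead of being tokenized.
$client->indices()->create(array(
    'index' => 'index-01-reindex',      // hypothetical target index
    'body'  => array(
        'mappings' => array(
            'raw' => array(             // the type used later in the copy script
                'properties' => array(
                    'tag' => array(     // hypothetical field name
                        'type'  => 'string',
                        'index' => 'not_analyzed'
                    )
                )
            )
        )
    )
));
```

On ES 5.x+ the equivalent is `'type' => 'keyword'`; `string` with `not_analyzed` is the pre-5.x form that matches the era of this post.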
Concrete implementations, however, are scarce online; after a long search I only found one written in Python. This article presents a migration implementation based on the official Elasticsearch PHP SDK, using scroll and the bulk API.
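Before hand-rolling the copy with scroll + bulk, note that Elasticsearch 2.3+ ships a built-in `_reindex` API, which the PHP SDK exposes as `$client->reindex()`. If your cluster and SDK versions support it, the whole migration collapses to one call (index names here are illustrative):

```php
<?php
require 'vendor/autoload.php';

$client = Elasticsearch\ClientBuilder::create()->build();

// Server-side copy: ES streams documents from source to dest itself,
// no client-side scroll loop needed. Requires ES >= 2.3.
$client->reindex(array(
    'body' => array(
        'source' => array('index' => 'index-01'),
        'dest'   => array('index' => 'index-01-reindex')
    )
));
```

The scroll + bulk approach below is still useful on older clusters, or when you need to transform documents on the way through.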
<?php
require 'vendor/autoload.php';

$hosts['hosts'] = array(
    "host"   => '127.0.0.1',
    "port"   => '9200',
    'scheme' => 'http'
);

$client = Elasticsearch\ClientBuilder::create()
    ->setSSLVerification(false)
    ->setHosts($hosts)
    ->build();

for ($i = 1; $i <= 10; $i++) {
    // Source indices are named index-01 .. index-10
    if ($i != 10) {
        $params['index'] = 'index-0' . $i;
    } else {
        $params['index'] = 'index-' . $i;
    }
    echo $params["index"] . "\r\n";

    $params['type']   = 'raw';
    $params['scroll'] = '120s';  // keep the scroll context alive for 120s
    $params["size"]   = 50000;   // documents per batch; lower this if bulk requests get too large
    $params["body"]   = array(
        "query" => array(
            "match_all" => array()
        )
    );

    $response = $client->search($params);
    $step = 1;

    while (isset($response['hits']['hits']) && count($response['hits']['hits']) > 0) {
        echo $step++ . "\t";

        // Bulk-index the current batch into the new index BEFORE scrolling on.
        // (Scrolling first, as in a naive loop, silently drops the first batch
        // returned by the initial search.)
        $bulk = array(
            'index' => $params['index'] . "-reindex",
            'type'  => $params['type']
        );
        foreach ($response["hits"]["hits"] as $key => $val) {
            $bulk['body'][] = array(
                'index' => array('_id' => $val['_id'])  // preserve original IDs
            );
            $bulk['body'][] = $val['_source'];
        }
        $res = $client->bulk($bulk);
        unset($bulk);

        // Fetch the next batch
        $scroll_id = $response['_scroll_id'];
        unset($response);
        $response = $client->scroll(
            array(
                "scroll_id" => $scroll_id,
                "scroll"    => "120s"
            )
        );
    }
}
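One small cleanup worth adding after each index finishes: a scroll context holds cluster resources until its timeout expires, so it is polite to release it explicitly. A sketch, assuming the same SDK version as above (newer client versions move `scroll_id` into the request body):

```php
<?php
// After the while loop for an index completes, release its scroll context
// instead of letting it linger for the remaining 120s timeout.
if (isset($scroll_id)) {
    $client->clearScroll(array(
        "scroll_id" => $scroll_id
    ));
}
```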