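Delete-by-query was removed from core Elasticsearch in 2.0. According to the official documentation, delete-by-query forced a refresh, which could cause an OutOfMemoryError during concurrent indexing, and it could also leave primary and replica shards inconsistent. The official recommendation is to use the scroll/scan API to retrieve the IDs of the matching documents and then delete those documents in bulk by ID.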
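The recommended pattern can be sketched without a cluster. The following is a minimal, hypothetical in-memory stand-in, not Elasticsearch code: `InMemoryIndex`, `deleteByQuery`, and `batchSize` are illustrative names. It first takes a fixed snapshot of the matching IDs (the role the scroll plays), then deletes them one page at a time by ID (the role of each bulk request).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for an index; these names are
// illustrative only and are not Elasticsearch APIs.
class InMemoryIndex {
    final Map<String, String> docs = new TreeMap<>();

    // Snapshot the matching IDs, then delete them page by page by ID.
    long deleteByQuery(Predicate<String> query, int batchSize) {
        // Snapshot: deletions below do not change this list
        List<String> matchingIds = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (query.test(e.getValue())) {
                matchingIds.add(e.getKey());
            }
        }
        long deleted = 0;
        for (int from = 0; from < matchingIds.size(); from += batchSize) {
            // One page of IDs, removed together like a single bulk request
            List<String> page = matchingIds.subList(from, Math.min(from + batchSize, matchingIds.size()));
            for (String id : page) {
                if (docs.remove(id) != null) {
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

For example, with ten documents of which five match, a batch size of 3 removes the matches in pages of 3 and 2. Separating "find IDs" from "delete by ID" is what lets each delete batch stay small and independent.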
Following that recommendation, I wrote a delete-by-query plugin.
The core code is as follows:
```java
// Open a scroll over all documents matching the query (60 s keep-alive)
SearchResponse scrollResp = client.prepareSearch(index).setTypes(type)
        .setScroll(new TimeValue(60000)).setSize(defaultBatchSize).setQuery(query)
        .execute().actionGet();
long total = scrollResp.getHits().getTotalHits();
// Stop on an empty page: a bulk request with no actions fails validation
while (scrollResp.getHits().getHits().length > 0) {
    // Delete every hit in the current page with one bulk request
    BulkRequestBuilder requestBuilder = client.prepareBulk().setRefresh(true);
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        requestBuilder.add(new DeleteRequest(index, type, hit.getId()));
    }
    BulkResponse response = requestBuilder.execute().actionGet();
    if (response.hasFailures()) {
        for (BulkItemResponse item : response) {
            if (item.isFailed()) {
                LOGGER.warn(item.getFailureMessage());
            }
        }
    }
    total = total - response.getItems().length;
    LOGGER.info("removed " + response.getItems().length + " rows, " + total + " rows remaining ...");
    // Fetch the next page and renew the scroll keep-alive
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(60000)).execute().actionGet();
}
// Release the scroll context on the server
client.prepareClearScroll().addScrollId(scrollResp.getScrollId()).execute().actionGet();
```
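A scroll iterates over a point-in-time view of the matching documents, so deleting them mid-iteration does not shift later pages. Paging with an increasing from/size offset would instead re-run the query against the live index, and because each bulk request above sets refresh to true, the deletions become visible immediately and later documents slide into positions the loop has already passed. The hypothetical comparison below (`OffsetVsSnapshot`, `offsetPaging`, and `snapshotPaging` are illustrative names, and a plain list stands in for the index) shows the effect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison; a List<Integer> stands in for the live index.
class OffsetVsSnapshot {
    // Re-query the live data each round, skipping `from` hits, then delete the page.
    static int offsetPaging(List<Integer> live, int batchSize) {
        int deleted = 0;
        for (int from = 0; from < live.size(); from += batchSize) {
            // Copy the page first: a subList view of `live` breaks once `live` changes
            List<Integer> page = new ArrayList<>(live.subList(from, Math.min(from + batchSize, live.size())));
            live.removeAll(page);
            deleted += page.size();
        }
        return deleted;
    }

    // Page over a snapshot taken once, like a scroll context.
    static int snapshotPaging(List<Integer> live, int batchSize) {
        List<Integer> snapshot = new ArrayList<>(live);
        int deleted = 0;
        for (int from = 0; from < snapshot.size(); from += batchSize) {
            List<Integer> page = snapshot.subList(from, Math.min(from + batchSize, snapshot.size()));
            live.removeAll(page);
            deleted += page.size();
        }
        return deleted;
    }
}
```

With ten documents and a batch size of 3, offset paging deletes 0-2, then 6-8, and stops with four documents left behind. Snapshot paging deletes all ten.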
I packaged this as a plugin, available at: https://github.com/weiyuc/delete_by_query
The plugin currently targets Elasticsearch 2.4.1; other versions should only need minor modifications.