About Elasticsearch delete by query

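The delete by query API was removed from Elasticsearch core in 2.0. According to the official documentation, delete by query forces a refresh, which can cause an OutOfMemoryError while documents are being indexed concurrently, and it can also leave the primary and replica shards inconsistent. The official recommendation is to use the scroll/scan API to find the IDs of the matching documents and then delete them in bulk by ID.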

Following that recommendation, I wrote a delete by query plugin.

The main code is as follows:

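    // Fragment from the plugin, written against the Elasticsearch 2.4.x Java API.
    // client, index, type, query, defaultBatchSize and LOGGER are defined elsewhere.
    import org.elasticsearch.action.bulk.BulkItemResponse;
    import org.elasticsearch.action.bulk.BulkRequestBuilder;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.delete.DeleteRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.search.SearchHit;

    // Open a scroll over every document that matches the query.
    SearchResponse scrollResp = client.prepareSearch(index).setTypes(type)
            .setScroll(new TimeValue(60000)).setSize(defaultBatchSize).setQuery(query)
            .execute().actionGet();
    long total = scrollResp.getHits().getTotalHits();

    // Stop as soon as a page comes back empty. Checking before building the
    // bulk request also avoids sending an empty bulk when nothing matches.
    while (scrollResp.getHits().getHits().length > 0) {
        BulkRequestBuilder requestBuilder = client.prepareBulk().setRefresh(true);

        // Queue one delete per hit on the current scroll page.
        for (SearchHit hit : scrollResp.getHits().getHits()) {
            requestBuilder.add(new DeleteRequest(index, type, hit.getId()));
        }

        BulkResponse response = requestBuilder.execute().actionGet();
        if (response.hasFailures()) {
            for (BulkItemResponse item : response) {
                if (item.isFailed()) {
                    LOGGER.warn(item.getFailureMessage());
                }
            }
        }

        // Note: this count includes items whose delete failed.
        total = total - response.getItems().length;
        LOGGER.info("has removed " + response.getItems().length + " rows, remain " + total + " rows ...");

        // Fetch the next page, keeping the scroll context alive for another 60 s.
        scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                .setScroll(new TimeValue(60000)).execute().actionGet();
    }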
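
To try the control flow without a running cluster, the same "collect matching IDs, then delete them batch by batch" pattern can be sketched against an in-memory map. This is only an illustration under stated assumptions: the class ScrollDeleteSketch and its methods are hypothetical names, the map stands in for an index, and the predicate stands in for the query. None of it is part of the plugin or of the Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for the scroll + bulk delete pattern.
public class ScrollDeleteSketch {

    // Stand-in for the scroll phase: snapshot the IDs of all matching documents.
    static List<String> collectMatchingIds(Map<String, String> store, Predicate<String> query) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : store.entrySet()) {
            if (query.test(entry.getValue())) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    // Stand-in for the bulk phase: delete the IDs one batch at a time.
    // Returns the number of documents actually removed.
    static int deleteMatching(Map<String, String> store, Predicate<String> query, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> ids = collectMatchingIds(store, query);
        int deleted = 0;
        // With no matches the loop body never runs, so no empty batch is issued.
        for (int from = 0; from < ids.size(); from += batchSize) {
            List<String> batch = ids.subList(from, Math.min(from + batchSize, ids.size()));
            for (String id : batch) {
                if (store.remove(id) != null) {
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        Map<String, String> store = new LinkedHashMap<>();
        store.put("1", "old-a");
        store.put("2", "new-b");
        store.put("3", "old-c");
        store.put("4", "old-d");
        store.put("5", "new-e");

        int deleted = deleteMatching(store, value -> value.startsWith("old"), 2);
        // prints "deleted 3, remaining [2, 5]"
        System.out.println("deleted " + deleted + ", remaining " + store.keySet());
    }
}
```

The IDs are snapshotted before anything is deleted, which loosely mirrors how a scroll keeps returning the result set as it was when the search started, even while documents are being removed.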

The plugin is available at: https://github.com/weiyuc/delete_by_query
It currently targets Elasticsearch 2.4.1. For other versions, it should work after minor modifications.
