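Elasticsearch CRUD: REST and the Java API

CRUD (Create, Retrieve, Update, Delete) names the four basic operations of a database system: creating, querying, modifying, and deleting records. Elasticsearch can be treated as a NoSQL database (ES was built as a search engine, but I prefer to think of it as a NoSQL store with powerful full-text search), so the same four operations apply to it.

The examples below are based on Elasticsearch 2.4.

Create

By default, the ES REST interface listens on port 9200, and the Java client connects on port 9300.

A Create operation indexes a document into an index; if the index does not exist, ES creates it automatically: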

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{<json data>}'
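
For readers who prefer to stay in Java even for the REST call, the curl command above can be sketched with the JDK's own HTTP client (Java 11+). This is only an illustration, not part of the ES Java API: the class name, the helper method, and the sample document are my own assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class IndexRequestSketch {
    // Builds (but does not send) a PUT to /<index>/<type>/<id>, mirroring the curl call.
    static HttpRequest buildIndexRequest(String baseUrl, String index, String type,
                                         String id, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/" + type + "/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        // The document body is a made-up example.
        HttpRequest request = buildIndexRequest(
                "http://localhost:9200", "twitter", "tweet", "1", "{\"user\":\"kimchy\"}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) against a running node.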

The Java API ("org.elasticsearch" % "elasticsearch" % "2.4.1") connects to the ES cluster through a TransportClient, and all of the CRUD operations below build on it.

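import java.net.InetAddress;
import java.util.concurrent.TimeUnit;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// Client-side settings; replace <cluster name> with the name of your cluster.
final Settings settings = Settings.settingsBuilder()
          .put("client.transport.sniff", true)
          .put("client.transport.ping_timeout", 20, TimeUnit.SECONDS)
          .put("client", true)
          .put("data", false)
          .put("cluster.name", "<cluster name>")
          .build();

// Connect to one or more nodes on the transport port (9300 by default).
Client client = TransportClient.builder()
              .settings(settings).build()
              .addTransportAddresses(
                      new InetSocketTransportAddress(InetAddress.getByName("host1"), 9300),
                      new InetSocketTransportAddress(InetAddress.getByName("host2"), 9300));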

Use the Index Java API to create an index or to index a document:

import org.elasticsearch.action.index.IndexResponse;

IndexResponse response = client.prepareIndex("twitter", "tweet")
    .setSource(documentJson)
    .get();

Retrieve

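The ES query DSL falls roughly into two kinds:

  • Query DSL, used mainly with bool, match, and the like; it plays the role of the where clause in SQL;
  • Aggregations, which play the role of group by in SQL and are subdivided into three classes:
  1. Bucketing: the aggregate function can only be count(*), i.e. the number of matching docs; sub-aggs can be nested inside;
  2. Metric: much more flexible than Bucketing, and works with aggregate functions such as avg, max, and sum, but cannot contain nested sub-aggs;
  3. Pipeline: takes the results of other aggs as input instead of operating directly on the document set.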
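
As a rough illustration of the first two classes, here is a sketch that assembles a request body with a terms bucket and an avg metric nested inside it. The helper and the field names (city, age) are hypothetical; only the terms and avg aggregation types come from ES itself.

```java
final class BucketMetricAggSketch {
    // Builds a search body: a terms bucket on bucketField with an avg metric nested inside.
    // "size":0 at the top level asks for aggregation results only, no hits.
    static String termsWithAvg(String bucketField, int size, String metricField) {
        return "{\"size\":0,\"aggs\":{\"by_" + bucketField + "\":{"
             + "\"terms\":{\"field\":\"" + bucketField + "\",\"size\":" + size + "},"
             + "\"aggs\":{\"avg_" + metricField + "\":{\"avg\":{\"field\":\"" + metricField + "\"}}}"
             + "}}}";
    }

    public static void main(String[] args) {
        // Hypothetical fields: group by city, average age per bucket.
        System.out.println(termsWithAvg("city", 10, "age"));
    }
}
```

The bucket carries the sub-aggs and the metric is the leaf, which matches the nesting rule in the list above.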

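The ES Query DSL is so powerful that a short post like this cannot cover it fully, so I give only two simple examples. In an earlier project I used ES 1.7, and later found that the DSL syntax changed considerably from 2.0.0-beta1 onward: filtered, and, or, and similar queries were deprecated in favor of bool. The corresponding Java API supports method chaining, which is very pleasant to write together with Java 8.

Over REST, DSL queries go through the _search endpoint: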

$ curl -XGET 'localhost:9200/<index>/_search?pretty' -d'{<dsl>}'

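A real-world case: a List<List<String>> idsList serves as the filter condition, where the outer list is combined with AND and each inner list with OR; then aggs are run over several fields (the keys of bucketSizeMap). A Java 8 implementation: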

BoolQueryBuilder mustQueryBuilder = boolQuery();
if (!(idsList.size() == 1 && idsList.get(0).isEmpty())) {
  mustQueryBuilder = idsList.stream().reduce(
          boolQuery(),
          (mustQB, ids) -> {
            BoolQueryBuilder shouldQB = ids.stream().reduce(boolQuery(),
                    (qb, id) -> qb.should(termQuery(SearchSystem.getEsType(id, idMap), id)),
                    BoolQueryBuilder::should);
            return mustQB.must(shouldQB);
          },
          BoolQueryBuilder::must);
}
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(indexName)
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(mustQueryBuilder);

for (Map.Entry<String, Integer> entry : bucketSizeMap.entrySet()) {
  AggregationBuilder aggregationBuilder = AggregationBuilders
          .terms(entry.getKey())
          .field(entry.getKey()).size(entry.getValue());
  searchRequestBuilder.addAggregation(aggregationBuilder);
}
SearchResponse response = searchRequestBuilder.execute().actionGet();
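
To make the must/should nesting above easier to reason about, here is a plain-Java sketch of the Boolean logic it encodes, with no ES dependency. It models only matching, not scoring, and the class and method names are my own.

```java
import java.util.List;
import java.util.Set;

final class AndOfOrsSketch {
    // A document matches when EVERY inner list (must) has AT LEAST ONE id
    // present in the document (should). Mirrors the guard in the ES code:
    // a single empty inner list means "no filter".
    static boolean matches(List<List<String>> idsList, Set<String> docIds) {
        if (idsList.size() == 1 && idsList.get(0).isEmpty()) {
            return true;
        }
        return idsList.stream()
                .allMatch(ids -> ids.stream().anyMatch(docIds::contains));
    }

    public static void main(String[] args) {
        List<List<String>> filter = List.of(List.of("a", "b"), List.of("c"));
        System.out.println(matches(filter, Set.of("b", "c"))); // (a OR b) AND c
        System.out.println(matches(filter, Set.of("a")));      // c is missing
    }
}
```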

Bucket aggregations include a filter aggregation, which runs the aggs only over documents that satisfy the filter:

aggs:
    <aggs_name>:
        filter:
        aggs:

It is functionally equivalent to a filter query followed by aggs:

query:
    bool:
        filter:
aggs:

In my tests, however, filter query + aggs ran faster than the filter aggregation.
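
The two shapes are easier to compare when written out in full. The sketch below builds both request bodies as strings; the field names (gender, city) and the aggregation name "filtered" are placeholders of mine, not taken from the project.

```java
final class FilterAggVsFilterQuerySketch {
    // {"term":{"<field>":"<value>"}}
    static String term(String field, String value) {
        return "{\"term\":{\"" + field + "\":\"" + value + "\"}}";
    }

    // {"by_<field>":{"terms":{"field":"<field>"}}}
    static String termsAgg(String field) {
        return "{\"by_" + field + "\":{\"terms\":{\"field\":\"" + field + "\"}}}";
    }

    // Form 1: the filter lives inside the aggregation tree.
    static String filterAgg(String field, String value, String bucketField) {
        return "{\"aggs\":{\"filtered\":{\"filter\":" + term(field, value)
             + ",\"aggs\":" + termsAgg(bucketField) + "}}}";
    }

    // Form 2: the filter lives in the query; aggs run over the query's hits.
    static String filterQueryPlusAggs(String field, String value, String bucketField) {
        return "{\"query\":{\"bool\":{\"filter\":" + term(field, value)
             + "}},\"aggs\":" + termsAgg(bucketField) + "}";
    }

    public static void main(String[] args) {
        System.out.println(filterAgg("gender", "male", "city"));
        System.out.println(filterQueryPlusAggs("gender", "male", "city"));
    }
}
```

Either body can be sent to the _search endpoint; which one is faster on a given cluster is something to measure rather than assume.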

Update

Update works at the document level: only a specific document can be updated. Over REST, use the _update endpoint:

$ curl -XPOST 'localhost:9200/<_index>/<_type>/<_id>/_update' -d '{<data>}'

The Java API offers two ways to do this: UpdateRequest with client.update, or prepareUpdate:

// case 1
UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
        .startObject()
            .field("gender", "male")
        .endObject());
client.update(updateRequest).get();

// case 2
client.prepareUpdate("ttl", "doc", "1")
        .setDoc(jsonBuilder()               
            .startObject()
                .field("gender", "male")
            .endObject())
        .get();
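
On the REST side, a partial update like the two cases above sends the changed fields wrapped under a "doc" key. A minimal sketch with the JDK HTTP client (Java 11+) follows; the class and helper names are assumptions of mine, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class UpdateRequestSketch {
    // The _update endpoint expects the partial document under "doc".
    static String partialDocBody(String docJson) {
        return "{\"doc\":" + docJson + "}";
    }

    // Builds a POST to /<index>/<type>/<id>/_update.
    static HttpRequest buildUpdateRequest(String baseUrl, String index, String type,
                                          String id, String docJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/" + type + "/" + id + "/_update"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(partialDocBody(docJson)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildUpdateRequest(
                "http://localhost:9200", "index", "type", "1", "{\"gender\":\"male\"}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```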

Delete

A delete is usually preceded by a check that the index exists. The REST call and the Java API for the existence check are, respectively:

$ curl -XHEAD -i 'http://localhost:9200/twitter'

client.admin().indices()
    .prepareExists(indexName)
    .execute().actionGet().isExists();

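ES offers delete operations at three levels of granularity:

  • delete an entire index;
  • delete one type within an index;
  • delete a specific document.

Note that the delete-mapping API was removed in Elasticsearch 2.0, so deleting a type applies to 1.x only; on 2.x the usual route is to delete and recreate the index.

REST interface:

# delete complete index
$ curl -XDELETE 'http://localhost:9200/<indexname>'
# delete a type in index
$ curl -XDELETE 'http://localhost:9200/<indexname>/<typename>'
# delete a particular document
$ curl -XDELETE 'http://localhost:9200/<indexname>/<typename>/<documentId>'

Java API: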

// delete complete index
client.admin().indices().delete(new DeleteIndexRequest("<indexname>")).actionGet();

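// delete a type in index
// (caution: the delete API treats the id literally, so "*" is not a wildcard;
//  on 2.x, removing every document of a type needs the delete-by-query plugin)
client.prepareDelete().setIndex("<indexname>").setType("<typename>").setId("*").execute().actionGet();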

// delete a particular document
client.prepareDelete().setIndex("<indexname>").setType("<typename>").setId("<documentId>").execute().actionGet();

// or
DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
    .execute()
    .actionGet();
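
The exists-then-delete sequence can also be sketched against the REST endpoints with the JDK HTTP client (Java 11+). The class and method names here are hypothetical, and the requests are only constructed, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class DeleteRequestSketch {
    // HEAD /<index>: a 200 response means the index exists, 404 means it does not.
    static HttpRequest indexExists(String baseUrl, String index) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // DELETE /<index>
    static HttpRequest deleteIndex(String baseUrl, String index) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index))
                .DELETE()
                .build();
    }

    // DELETE /<index>/<type>/<id>
    static HttpRequest deleteDocument(String baseUrl, String index, String type, String id) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/" + type + "/" + id))
                .DELETE()
                .build();
    }

    public static void main(String[] args) {
        String base = "http://localhost:9200";
        System.out.println(indexExists(base, "twitter").method());
        System.out.println(deleteDocument(base, "twitter", "tweet", "1").uri());
    }
}
```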