Getting Started with Elasticsearch

Date: 2019-11-15

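Preface

In the past, when building a search feature, we usually queried with SQL LIKE. This works fine while the data set is small, but as the data keeps growing it becomes very inefficient and hurts the user experience.

Why are LIKE queries inefficient?

A query such as like '%a%' cannot use an ordinary index, because an index is just a sorted ordering of the values and a pattern with a leading wildcard gives it nothing to seek on. A prefix pattern such as like 'a%' can use the index, but prefix matching often does not meet our needs, which is why Elasticsearch becomes necessary.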
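To make the difference concrete, here is a minimal Java sketch. It assumes a sorted `TreeSet` as a stand-in for a database index; the class name `LikeIndexDemo` and the sample values are hypothetical, and a real database index is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class LikeIndexDemo {
    // Stand-in for an index: the column values kept in sorted order.
    static final TreeSet<String> INDEX = new TreeSet<>(
            Arrays.asList("alpha", "apple", "banana", "grape", "papaya"));

    // like 'prefix%': seek to the first candidate, stop at the first non-match.
    static List<String> prefixSearch(String prefix) {
        List<String> out = new ArrayList<>();
        for (String s : INDEX.tailSet(prefix, true)) {
            if (!s.startsWith(prefix)) {
                break; // sorted order guarantees no later entry can match
            }
            out.add(s);
        }
        return out;
    }

    // like '%part%': the ordering does not help, so every entry is examined.
    static List<String> containsSearch(String part) {
        List<String> out = new ArrayList<>();
        for (String s : INDEX) {
            if (s.contains(part)) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(prefixSearch("a"));    // [alpha, apple]
        System.out.println(containsSearch("ap")); // [apple, grape, papaya]
    }
}
```

The prefix search touches only the matching entries plus one, while the substring search has to walk the whole set, which is the cost that grows with the data.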

Introduction

Chinese-language tutorial: http://learnes.net/getting_started/installing_es.html

1. What is Elasticsearch?

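Elasticsearch is a search engine built on top of the full-text search engine Apache Lucene(TM). Lucene is arguably the most advanced and most efficient full-featured open-source search engine framework available today.

However, Lucene is only a framework. To take full advantage of it you need to use Java and integrate Lucene into your program. Worse, you need to learn a great deal before you understand how it works; Lucene really is very complex.

Elasticsearch uses Lucene as its internal engine, but for full-text search you only need its uniform, ready-made API and do not have to understand the complex Lucene internals behind it.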

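Of course, Elasticsearch is more than just Lucene. Besides full-text search, it also provides the following:

Distributed real-time document storage in which every field is indexed and searchable.
A distributed search engine with real-time analytics.
The ability to scale to hundreds of servers and handle petabytes of structured or unstructured data.

Elasticsearch is very easy to get started with. It ships with many sensible defaults, which spare beginners from facing complex theory right away. It works as soon as it is installed, so you can become productive with very little learning.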
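What "every field is indexed" buys you can be illustrated with a toy inverted index, the kind of structure full-text engines are built around. This is only a sketch under simplifying assumptions: the class `TinyInvertedIndex` is hypothetical, tokenization is a plain lowercase split on non-word characters, and nothing here reflects how Lucene actually stores its data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TinyInvertedIndex {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split the text into lowercase terms and record the document id under each.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A lookup is a single map access instead of a scan over every document.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex idx = new TinyInvertedIndex();
        idx.add(1, "trying out Elasticsearch");
        idx.add(2, "Elasticsearch is built on Lucene");
        System.out.println(idx.search("elasticsearch")); // [1, 2]
        System.out.println(idx.search("lucene"));        // [2]
    }
}
```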

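Performance comparison with a database

Test environment: more than 4 million records

Oracle: 9.3 seconds

Elasticsearch: 264 milliseconds

As you can see, the database query took 9.3 seconds while Elasticsearch needed only 264 milliseconds, roughly a 35-fold difference. Of course, this is only the result of my own test; the actual figures also depend on the environment.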

Installation

Download Elasticsearch: elasticsearch.org/download.

Unpack the downloaded package and change to the bin directory.

On Linux, run: ./elasticsearch

On Windows, run: elasticsearch.bat
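Once the node is running, a quick way to confirm it is up is to request its root HTTP endpoint. The sketch below assumes the node runs on localhost and listens on the default HTTP port 9200; the class `NodeCheck` and its helper methods are hypothetical names, so adjust host and port to your setup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class NodeCheck {
    // Build the root URL of a node, for example http://localhost:9200/
    static String rootUrl(String host, int port) {
        return "http://" + host + ":" + port + "/";
    }

    // Issue a GET request and return the response body as a string.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try (InputStream in = conn.getInputStream();
             Scanner sc = new Scanner(in, StandardCharsets.UTF_8.name())) {
            sc.useDelimiter("\\A"); // read the whole stream as one token
            return sc.hasNext() ? sc.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String url = rootUrl("localhost", 9200); // assumed host and default HTTP port
        try {
            System.out.println(fetch(url));
        } catch (IOException e) {
            System.out.println("No node reachable at " + url + ": " + e.getMessage());
        }
    }
}
```

A running node answers with a small JSON document describing itself; if nothing is listening, the sketch prints the connection error instead of throwing.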

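Data

Documents are indexed (stored and made searchable) through the index API. First, though, we need to decide where a document lives. A document is uniquely identified by its index, type, and id. We can supply an _id ourselves or let the index API generate one for us.

index: the index, similar to a database

type: the type, similar to a table

id: the primary key

shard: a shard is a low-level worker unit that holds only a slice of all the data in an index. An index can point to one or more shards.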
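The sketch below illustrates how index, type, and id together address one document, and how an id can be mapped to a shard by taking a hash modulo the number of primary shards. It is a simplification: `DocumentAddress` is a hypothetical class, `String.hashCode()` is only a stand-in for the hash function Elasticsearch uses internally, and the shard count of 5 is just an example value.

```java
public class DocumentAddress {
    // Path that identifies one document: /{index}/{type}/{id}
    static String path(String index, String type, String id) {
        return "/" + index + "/" + type + "/" + id;
    }

    // Simplified routing: shard = hash(id) mod number of primary shards.
    // String.hashCode() is a stand-in here, not the real hash function.
    static int shardFor(String id, int primaryShards) {
        return Math.floorMod(id.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        System.out.println(path("twitter", "tweet", "1")); // /twitter/tweet/1
        for (String id : new String[] {"1", "2", "3", "4"}) {
            System.out.println("id " + id + " -> shard " + shardFor(id, 5));
        }
    }
}
```

Because the shard is derived from the id, the same id always lands on the same shard, which is what lets a lookup by id go straight to the right place.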

Using the API

package com.sunsharing.idream.elasticsearch;

import org.apache.lucene.index.Terms;
import org.apache.lucene.util.QueryBuilder;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.count.CountResponse;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.get.MultiGetItemResponse;
import org.elasticsearch.action.get.MultiGetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.MultiSearchResponse;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.index.query.*;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.search.sort.SortParseElement;

import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Date;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ExecutionException;

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
import static org.elasticsearch.index.query.QueryBuilders.termQuery;

public class Elasticsearch {
    /**
     * Example: obtain a client.
     *
     * @return a transport client for the node, or null if the host cannot be resolved
     */
    public static Client getClient() {
        Client client = null;
        try {
            client = TransportClient.builder().build()
                    .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("192.168.2.112"), 9300));
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
        return client;
    }

    /**
     * Example: build a JSON document.
     *
     * @throws IOException if the builder fails to write the content
     */
    public static void buildJson() throws IOException {
        XContentBuilder builder = jsonBuilder()
                .startObject()
                .field("user", "kimchy")
                .field("postDate", new Date())
                .field("message", "trying out Elasticsearch")
                .endObject();
        System.out.println(builder.string());
    }

    /**
     * Example: add a document.
     *
     * @param index  index name
     * @param type   type name
     * @param source document source as a JSON string
     */
    public static void add(String index, String type, String source) {
        Client client = getClient();
        // No document id is passed, so Elasticsearch generates one (its own UUID-style id by default)
        IndexResponse response = client.prepareIndex(index, type).setSource(source).get();
        // index name
        System.out.println(response.getIndex());
        // document id
        System.out.println(response.getId());
        // type name
        System.out.println(response.getType());
        // version number; it increases each time the document is overwritten
        System.out.println(response.getVersion());
        // whether the document was created; false if it already existed and was overwritten
        System.out.println(response.isCreated());
        // close the client
        client.close();
    }

    /**
     * Example: get a document.
     *
     * @param index index name
     * @param type  type name
     * @param id    document id
     */
    public static void get(String index, String type, String id) {
        Client client = getClient();
        GetResponse response = client.prepareGet(index, type, id).get();
        // the document content (available in several formats)
        Map<String, Object> sourceMap = response.getSource();
        String sourceString = response.getSourceAsString();
        byte[] sourceByte = response.getSourceAsBytes();
        // whether the document exists
        boolean isExists = response.isExists();
        client.close();
    }

    /**
     * Example: delete a document.
     *
     * @param index index name
     * @param type  type name
     * @param id    document id
     */
    public static void delete(String index, String type, String id) {
        Client client = getClient();
        DeleteResponse response = client.prepareDelete(index, type, id).get();
        // whether the document was found
        System.out.println(response.isFound());
        client.close();
    }

    /**
     * Example: update a document.
     *
     * @param index index name
     * @param type  type name
     * @param id    document id
     */
    public static void update(String index, String type, String id) {
        Client client = getClient();
        UpdateRequest updateRequest = new UpdateRequest();
        updateRequest.index(index);
        updateRequest.type(type);
        updateRequest.id(id);

        try {
            updateRequest.doc(jsonBuilder()
                    .startObject()
                            // the field to modify
                    .field("message", "aaa")
                    .endObject());
            client.update(updateRequest).get();
            // an alternative way to do the same thing
//            client.prepareUpdate(index, type, id)
//                    .setDoc(jsonBuilder()
//                            .startObject()
//                            .field("gender", "male")
//                            .endObject())
//                    .get();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        } finally {
            client.close();
        }
    }


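    /**
     * Fetch a group of documents with the multi get API.
     */
    public static void multiGet() {
        Client client = getClient();
        MultiGetResponse multiGetItemResponses = client.prepareMultiGet()
                .add("testindex", "tweet", "1")
                        // several documents of the same index/type can be fetched at once
                .add("testindex", "tweet", "2", "3", "4")
                        // documents from other indices/types can be fetched as well
                .add("cisp", "type", "foo")
                .get();

        for (MultiGetItemResponse itemResponse : multiGetItemResponses) {
            // A failed item (for example, a missing index) carries no response;
            // skip it here to avoid a NullPointerException below
            if (itemResponse.isFailed()) {
                System.out.println(itemResponse.getFailure().getMessage());
                continue;
            }
            GetResponse response = itemResponse.getResponse();
            if (response.isExists()) {
                String json = response.getSourceAsString();
                System.out.println(json);
            }
        }
        client.close();
    }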

    /**
     * The bulk API performs several operations in a single request.
     */
    public static void bulkApi() {
        Client client = getClient();
        BulkRequestBuilder bulkRequest = client.prepareBulk();

        try {
            bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
                            .setSource(jsonBuilder()
                                            .startObject()
                                            .field("user", "kimchy")
                                            .field("postDate", new Date())
                                            .field("message", "trying out Elasticsearch")
                                            .endObject()
                            )
            );

            bulkRequest.add(client.prepareDelete("twitter", "tweet", "2"));

            BulkResponse bulkResponse = bulkRequest.get();
            if (bulkResponse.hasFailures()) {
                // handle the failures
                System.out.println(bulkResponse.buildFailureMessage());
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            client.close();
        }
    }

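    /**
     * Example: search.
     */
    public static void search() {
        Client client = getClient();
        // match every document
        MatchAllQueryBuilder qb = QueryBuilders.matchAllQuery();
        // full-text search across several fields
        MultiMatchQueryBuilder qb1 = QueryBuilders.multiMatchQuery("同", "worksNum", "picAddr", "userInfo.name");
        // term-level query: an exact-value condition, typically used on structured data
        // such as code values, enums, dates, or ages
        TermQueryBuilder qb2 = QueryBuilders.termQuery("userInfo.sex", "1");
        // several terms at once
        TermsQueryBuilder qb3 = QueryBuilders.termsQuery("tags", "blue", "pill");
        // numeric range: 5 <= price <= 10
        RangeQueryBuilder qb4 = QueryBuilders.rangeQuery("price").from(5).to(10);
        SearchResponse response = client.prepareSearch("story")
                .setTypes("picstory")
                        // QUERY_AND_FETCH: the query is sent to all shards of the index, and each shard returns the
                        // documents together with their computed ranking information. This is the fastest search type
                        // because, unlike the ones below, each shard is queried only once. However, the total number of
                        // results returned by the shards can be n times the requested size (n being the number of shards).
                        // QUERY_THEN_FETCH: the default when no search type is specified. It works in roughly two steps.
                        // First, the request is sent to all shards and each shard returns only sorting and ranking
                        // information (not the documents themselves); the results are then re-sorted by score and the
                        // top size documents are selected. Second, those documents are fetched from the relevant shards.
                        // The number of documents returned equals the requested size.
                        // DFS_QUERY_AND_FETCH: same as QUERY_AND_FETCH, except for an initial scatter phase that
                        // computes distributed term frequencies for more accurate scoring.
                        // DFS_QUERY_THEN_FETCH: same as QUERY_THEN_FETCH, except for an initial scatter phase that
                        // computes distributed term frequencies for more accurate scoring.
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                        // the query object
                .setQuery(qb2)                 // Query
                        // filter: 12 <= age <= 18
                //.setPostFilter(QueryBuilders.rangeQuery("age").from(12).to(18))     // Filter
                        // return up to 60 hits starting at offset 0 (sorted by relevance by default);
                        // setExplain(true) adds a score explanation to each hit
                .setFrom(0).setSize(60).setExplain(true)
                .execute()
                .actionGet();

        // a search without any parameters is also possible
        //SearchResponse response1 = client.prepareSearch().execute().actionGet();
        System.out.println(response.getHits().totalHits());
        client.close();
    }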

    /**
     * A search request returns a single "page" of results, whereas the scroll API can retrieve
     * large numbers of results (even all of them), much like a cursor in a traditional database.
     * Scrolling is not meant for real-time user requests but for processing large amounts of data,
     * similar to the stored procedures we often write to process data (that is my understanding).
     */
    public static void scroll() {
        Client client = getClient();
        SearchResponse scrollResp = client.prepareSearch("testindex")
                .addSort(SortParseElement.DOC_FIELD_NAME, SortOrder.ASC)
                        // tells Elasticsearch how long to keep the search context alive
                .setScroll(new TimeValue(60000)) // in milliseconds, i.e. 60 seconds
                .setQuery(termQuery("gender", "male"))
                .setSize(1).execute().actionGet();
        // keep scrolling until no more hits are returned
        while (true) {
            for (SearchHit hit : scrollResp.getHits().getHits()) {
                // process the hit
                System.out.println(hit.getSourceAsString());
            }
            // The previous response contains a scroll_id, which is passed to the scroll API
            // to retrieve the next batch of results.
            scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
            // leave the loop once there is no more data
            if (scrollResp.getHits().getHits().length == 0) {
                break;
            }
        }
        client.close();
    }

    /**
     * multiSearch runs several searches in one request and returns all results together.
     */
    public static void multiSearch() {
        Client client = getClient();
        SearchRequestBuilder srb1 = client
                .prepareSearch().setQuery(QueryBuilders.queryStringQuery("elasticsearch")).setSize(1);
        // match query
        SearchRequestBuilder srb2 = client
                .prepareSearch().setQuery(QueryBuilders.matchQuery("name", "kimchy")).setSize(1);

        MultiSearchResponse sr = client.prepareMultiSearch()
                .add(srb1)
                .add(srb2)
                .execute().actionGet();

        // the response of every individual request is available
        long nbHits = 0;
        for (MultiSearchResponse.Item item : sr.getResponses()) {
            SearchResponse response = item.getResponse();
            nbHits += response.getHits().getTotalHits();
            System.out.println(response.getHits().getTotalHits());
        }
        System.out.println(nbHits);
        client.close();
    }

    /**
     * Aggregation query, comparable to GROUP BY in a traditional database.
     */
    public static void aggregation() {
        Client client = getClient();
        SearchResponse sr = client.prepareSearch()
                .setQuery(QueryBuilders.matchAllQuery())
                .addAggregation(
                        AggregationBuilders.terms("colors").field("color")
                ).execute().actionGet();

        // read the aggregation results
        org.elasticsearch.search.aggregations.bucket.terms.Terms colors = sr.getAggregations().get("colors");
        for (org.elasticsearch.search.aggregations.bucket.terms.Terms.Bucket bucket : colors.getBuckets()) {
            System.out.println("key: " + bucket.getKey() + "  doc count: " + bucket.getDocCount() + "  ");
        }
        client.close();
    }

    /**
     * Stop searching once a fixed number of documents has been collected.
     *
     * @param docsNum maximum number of documents to collect on each shard
     */
    public static void teminateAfter(int docsNum) {
        Client client = getClient();
        SearchResponse sr = client.prepareSearch("testindex")
                // each shard stops collecting after docsNum documents
                .setTerminateAfter(docsNum)
                .get();

        if (sr.isTerminatedEarly()) {
            System.out.println(sr.getHits().totalHits());
        }
        client.close();
    }

    /**
     * Get the document count (no longer recommended as of the 2.3 API).
     */
    public static void count() {
        Client client = getClient();
        CountResponse response = client.prepareCount("testindex")
                .setQuery(termQuery("user", "panda"))
                .execute()
                .actionGet();
        System.out.println(response.getCount());
        client.close();
    }


}
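The search type comments in search() describe a two-step process for QUERY_THEN_FETCH. The following plain-Java simulation sketches that idea without any Elasticsearch dependency. Everything in it is hypothetical (the class `QueryThenFetchDemo`, the `Hit` type, the sample scores); it only mirrors the description above and is not how the server is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QueryThenFetchDemo {
    // One scored hit as a shard would report it in the query phase.
    static final class Hit {
        final String docId;
        final double score;

        Hit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Return the best `size` hits of a list, highest score first.
    static List<Hit> top(List<Hit> hits, int size) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(b.score, a.score));
        return new ArrayList<>(sorted.subList(0, Math.min(size, sorted.size())));
    }

    // Query phase on every shard, then a merge on the coordinating node.
    static List<String> queryThenFetch(List<List<Hit>> shards, int size) {
        List<Hit> merged = new ArrayList<>();
        for (List<Hit> shard : shards) {
            merged.addAll(top(shard, size)); // each shard sends ids and scores only
        }
        List<String> ids = new ArrayList<>();
        for (Hit h : top(merged, size)) {
            ids.add(h.docId);
        }
        return ids; // the fetch phase would load just these documents
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = Arrays.asList(
                Arrays.asList(new Hit("a", 0.9), new Hit("b", 0.2), new Hit("c", 0.1)),
                Arrays.asList(new Hit("d", 0.8), new Hit("e", 0.7), new Hit("f", 0.3)));
        System.out.println(queryThenFetch(shards, 2)); // [a, d]
    }
}
```

With two shards and a size of 2, the merge step keeps exactly two ids. A QUERY_AND_FETCH style search would instead hand back the two best documents of each shard, four in total, which is the "n times the size" effect mentioned in the comments.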

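Points to note when using it

1. The JDK version must be 1.7 or later, and the client and the server must run the same JDK version; otherwise they cannot recognize each other.

2. Searches for meaningless terms, such as a single letter, are not supported (what counts as searchable depends on how the field is analyzed).

3. elasticsearch-jdbc no longer supports Windows from version 2.0 onward, so do not bother trying it on Windows.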