ElasticSearch 6.x Practical Tutorial: Simple Search and the Java Client (Part 1)

Date: 2019-11-06

Tags: elasticsearch6.x, elasticsearch, practical tutorial, simple search, java, client. Column: log analysis


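Chapter 5 - Simple Search

"In the crowd I searched for him a thousand times"

Search is the core of ES. This chapter covers some basic, simple searches.

Mastering the RESTful search API of ES is like mastering SQL for a relational database. Although the Java client API spares us from writing RESTful requests by hand, in production you will inevitably have to run queries against the live cluster to pull statistics for product managers and others.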

Data preparation

First create an Index named user and a Type named student. The Mapping has the fields shown below.

Create the Index named user: PUT http://localhost:9200/user

Create the Type named student and set the analyzer of the name field to ik_smart.

POST http://localhost:9200/user/student/_mapping
{
 "properties":{
     "name":{
         "type":"text",
         "analyzer":"ik_smart"
     },
     "age":{
         "type":"short"
     }
 }
}

Following the previous chapter on analyzers, we assign the ik_smart analyzer to every text field.

Insert the following data.

POST http://localhost:9200/user/student
{
    "name":"kevin",
    "age":25
}

POST http://localhost:9200/user/student
{
    "name":"kangkang",
    "age":26
}

POST http://localhost:9200/user/student
{
    "name":"mike",
    "age":22
}

POST http://localhost:9200/user/student
{
    "name":"kevin2",
    "age":25
}

POST http://localhost:9200/user/student
{
    "name":"kevin yu",
    "age":21
}

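By number of query conditions

Unconditional search

GET http://localhost:9200/user/student/_search?pretty

This lists the documents of the student Type in the user Index; the response contains the documents just inserted.

Single-condition search

ES queries fall mainly into two kinds: term exact search and match fuzzy search.

term exact search

Let's use term to search for documents whose name is "kevin".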

POST http://localhost:9200/user/student/_search?pretty
{
    "query":{
        "term":{
            "name":"kevin"
        }
    }
}

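Since term is an exact search, by analogy with a relational database it should be equivalent to =, so the result should contain exactly one document. Surprisingly, two documents come back: name="kevin" and name="kevin yu". This looks like a fuzzy search, yet name="kevin2" is not returned. Let's look at the result of match before drawing conclusions.

match fuzzy search

Likewise, search for documents whose name is "kevin".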

POST http://localhost:9200/user/student/_search?pretty
{
    "query":{
        "match":{
            "name":"kevin"
        }
    }
}

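The match search also returns two documents: name="kevin" and name="kevin yu". Again, name="kevin2" does not appear in the result.

The reason is that the "exact" and "fuzzy" of term and match refer to the search keyword: term does not analyze the keyword before searching, whereas match analyzes the keyword and then searches. For example, if we search for name="kevin yu", term does not analyze the keyword, so it looks up "kevin yu" as a whole, while match analyzes the keyword and looks for documents containing the term "kevin" or the term "yu" (the default operator of match is or). The name field is of type text and is analyzed with ik_smart, so even the document "kevin yu" is indexed as the two terms "kevin" and "yu". No single term "kevin yu" exists in the index, which is why the term search for "kevin yu" finds nothing.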
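The difference can be illustrated with a toy inverted index. This is only a sketch under stated assumptions, not how ES is implemented: the class TermVsMatchSketch is hypothetical, and a lower-case whitespace splitter stands in for ik_smart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy inverted index (hypothetical, for illustration only).
final class TermVsMatchSketch {

    // term -> names of the documents that contain that term
    private final Map<String, Set<String>> index = new LinkedHashMap<>();

    // Stand-in analyzer (assumption): lower-case, then split on whitespace.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Indexing always analyzes the field value.
    void add(String name) {
        for (String token : analyze(name)) {
            index.computeIfAbsent(token, k -> new LinkedHashSet<>()).add(name);
        }
    }

    // term: the keyword is looked up as-is, without analysis.
    List<String> term(String keyword) {
        return new ArrayList<>(index.getOrDefault(keyword, Collections.emptySet()));
    }

    // match: the keyword is analyzed first; a document matches if it contains any token.
    List<String> match(String keyword) {
        Set<String> hits = new LinkedHashSet<>();
        for (String token : analyze(keyword)) {
            hits.addAll(index.getOrDefault(token, Collections.emptySet()));
        }
        return new ArrayList<>(hits);
    }

    // The five sample names inserted during data preparation.
    static TermVsMatchSketch sample() {
        TermVsMatchSketch s = new TermVsMatchSketch();
        for (String n : new String[] {"kevin", "kangkang", "mike", "kevin2", "kevin yu"}) {
            s.add(n);
        }
        return s;
    }

    public static void main(String[] args) {
        TermVsMatchSketch s = sample();
        System.out.println(s.term("kevin"));     // [kevin, kevin yu]
        System.out.println(s.term("kevin yu"));  // []
        System.out.println(s.match("kevin yu")); // [kevin, kevin yu]
    }
}
```

The empty result of term("kevin yu") mirrors the behaviour described above: the index holds only the separate terms "kevin" and "yu".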

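If you do need a term search for name="kevin yu" to return "kevin yu", give the field a keyword type when you define the Mapping.

To keep the rest of the chapter working, delete the index with DELETE http://localhost:9200/user, then recreate the index and insert the data as described at the start. The only change is in the Mapping, where the name field is now defined as follows: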

{
    "properties":{
        "name":{
            "type":"text",
            "analyzer":"ik_smart",
            "fields":{
                "keyword":{
                    "type":"keyword",
                    "ignore_above":256
                }
            }
        },
        "age":{
            "type":"integer"
        }
    }
}

Once the index has been recreated and the data inserted, run the term search for name="kevin yu" again.

POST http://localhost:9200/user/student/_search
{
    "query":{
        "term":{
            "name.keyword":"kevin yu"
        }
    }
}

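This returns one document, name="kevin yu". A match search on name.keyword likewise returns name="kevin yu", because name.keyword is never analyzed.

If the index already exists and its Mapping is defined, changing the name field directly (for example by adding the keyword sub-field) may succeed, but documents indexed earlier cannot be found through the changed field, because ES does not re-index existing documents. If the change is really needed, either add a new field or rebuild the index.

So rather than calling match a fuzzy search, it is better to call it an analyzed search, because it analyzes the search keyword; and rather than calling term an exact search, it is better to call it a non-analyzed search, because it does not analyze the search keyword.

match has many more advanced variants: match_phrase (phrase query), match_phrase_prefix (phrase prefix query), multi_match (multi-field query) and so on. They are covered in detail in the chapter on complex search.

Fuzzy search similar to like

The wildcard query.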

POST http://localhost:9200/user/student/_search?pretty
{
  "query": {
    "wildcard": {
      "name": "*kevin*"
    }
  }
}

The result returned by ES includes name="kevin", name="kevin2" and name="kevin yu".
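A wildcard pattern is compared against the indexed terms, with * standing for any run of characters and ? for exactly one character. The following is a minimal sketch of that matching rule only; the class WildcardSketch is hypothetical and translates the pattern into a java.util.regex.Pattern rather than reproducing ES internals.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating wildcard semantics on a single term.
final class WildcardSketch {

    // Translate a wildcard pattern into a regular expression:
    // '*' -> any run of characters, '?' -> exactly one character,
    // everything else is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True when the whole term matches the wildcard pattern.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*kevin*", "kevin"));  // true
        System.out.println(matches("*kevin*", "kevin2")); // true
        System.out.println(matches("*kevin*", "mike"));   // false
        System.out.println(matches("kev?n", "kevon"));    // true
    }
}
```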

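fuzzy: a smarter fuzzy search

fuzzy is also a fuzzy query, and it looks "smarter". It resembles the way the Sogou input method tolerates spelling mistakes and still finds what you want. For example, when we query for documents whose name is "kevin" but accidentally type "kevon", it still returns results.

POST http://localhost:9200/user/student/_search?pretty
{
  "query": {
    "fuzzy": {
      "name": "kevon"
    }
  }
}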

The result returned by ES includes name="kevin" and name="kevin yu".
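The fuzzy query is based on edit distance: a term matches when it can be reached from the search keyword within a small number of single-character edits. The sketch below computes the plain Levenshtein distance; it is an illustration under assumptions (the class EditDistanceSketch is hypothetical, and it ignores transpositions, which ES can also count as a single edit).

```java
// Hypothetical helper: classic two-row dynamic-programming Levenshtein distance.
final class EditDistanceSketch {

    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kevon", "kevin"));  // 1
        System.out.println(levenshtein("kevon", "kevin2")); // 2
    }
}
```

With one allowed edit, "kevon" reaches the term "kevin" but not "kevin2", which is consistent with the result above.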

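Multi-condition search

The sections above covered simple searches with a single condition, along with exact and fuzzy search (non-analyzed versus analyzed). This part covers simple searches with several conditions.

When a search needs several conditions, the relationship between them is "and", "or" or "not", just like and, or and not in a relational database.

In ES the keyword for "and" is must, the keyword for "or" is should, and the keyword for "not" is must_not.

must, should and must_not together form what ES calls a bool query. To combine several conditions, use these keywords together with term, match and the other queries described above.

Exact query (term, the search keyword is not analyzed) for students with name="kevin" and age=25.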

POST http://localhost:9200/user/student/_search?pretty
{
    "query":{
        "bool":{
            "must":[{
                "term":{
                    "name.keyword":"kevin"
                }
            },{
                "term":{
                    "age":25
                }
            }]
        }
    }
}

Returns the document with name="kevin" and age=25.

Exact query (term, the search keyword is not analyzed) for students with name="kevin" or age=21.

POST http://localhost:9200/user/student/_search?pretty
{
    "query":{
        "bool":{
            "should":[{
                "term":{
                    "name.keyword":"kevin"
                }
            },{
                "term":{
                    "age":21
                }
            }]
        }
    }
}

Returns the documents name="kevin", age=25 and name="kevin yu", age=21.

Exact query (term, the search keyword is not analyzed) for students with name!="kevin" and age=25.

POST http://localhost:9200/user/student/_search?pretty
{
    "query":{
        "bool":{
            "must":[{
                "term":{
                    "age":25
                }
            }],
            "must_not":[{
                "term":{
                    "name.keyword":"kevin"
                }
            }]
        }
    }
}

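Returns the document with name="kevin2".

If a query contains must together with should and must_not, the must and must_not clauses are all required (an "and" relationship), whereas the should clauses become optional by default: they no longer filter documents and only raise the score of the documents they match. To require at least one should clause in that case, set minimum_should_match to 1.

In multi-condition queries, combining the query logic (must, should, must_not) with the query precision (term, match) yields a very rich set of query conditions.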
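The filtering side of a bool query can be sketched with plain Java predicates. This is a simplified model under assumptions, not the ES implementation: the class BoolQuerySketch and its helpers are hypothetical, scoring is ignored, and exact matching on the name stands in for a term query on name.keyword.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory model of must / should / must_not.
final class BoolQuerySketch {

    static final class Student {
        final String name;
        final int age;
        Student(String name, int age) { this.name = name; this.age = age; }
    }

    // The five sample documents from the data preparation step.
    static final List<Student> STUDENTS = Arrays.asList(
            new Student("kevin", 25), new Student("kangkang", 26), new Student("mike", 22),
            new Student("kevin2", 25), new Student("kevin yu", 21));

    static Predicate<Student> nameIs(String name) { return s -> s.name.equals(name); }
    static Predicate<Student> ageIs(int age) { return s -> s.age == age; }

    // A document is kept when every must clause matches, no must_not clause matches,
    // and, only if there is no must clause, at least one should clause matches.
    static List<String> bool(List<Predicate<Student>> must,
                             List<Predicate<Student>> should,
                             List<Predicate<Student>> mustNot) {
        List<String> names = new ArrayList<>();
        for (Student s : STUDENTS) {
            boolean shouldOk = !must.isEmpty() || should.isEmpty()
                    || should.stream().anyMatch(p -> p.test(s));
            boolean ok = must.stream().allMatch(p -> p.test(s))
                    && mustNot.stream().noneMatch(p -> p.test(s))
                    && shouldOk;
            if (ok) {
                names.add(s.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Predicate<Student>> none = Collections.emptyList();
        // name = kevin AND age = 25
        System.out.println(bool(Arrays.asList(nameIs("kevin"), ageIs(25)), none, none)); // [kevin]
        // name = kevin OR age = 21
        System.out.println(bool(none, Arrays.asList(nameIs("kevin"), ageIs(21)), none)); // [kevin, kevin yu]
        // age = 25 AND NOT name = kevin
        System.out.println(bool(Arrays.asList(ageIs(25)), none, Arrays.asList(nameIs("kevin")))); // [kevin2]
    }
}
```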

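By equality and range

The sections above covered exact queries, fuzzy queries, and queries combined with "and", "or" and "not". Almost all of them were equality queries. Real-world queries also include range queries (greater than, less than; range), exists queries (exists), and queries for documents in which a field does not exist (formerly missing).

Range query

The keyword for a range query is range. It supports greater than (gt), greater than or equal to (gte), less than (lt) and less than or equal to (lte).

Query students with age > 25.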

POST http://localhost:9200/user/student/_search?pretty
{
    "query":{
        "range":{
            "age":{
                "gt":25
            }
        }
    }
}

Returns the document with name="kangkang".

Query students with age >= 21 and age < 26.

POST http://localhost:9200/user/student/_search?pretty
{
    "query":{
        "range":{
            "age":{
                "gte":21,
                "lt":26
            }
        }
    }
}

Query students with age >= 21 and age < 26 and name="kevin".

POST http://localhost:9200/user/student/_search?pretty
{
    "query":{
        "bool":{
            "must":[{
                "term":{
                    "name":"kevin"
                }
            },{
                "range":{
                    "age":{
                        "gte":21,
                        "lt":26
                    }
                }
            }]
        }
    }
}

Exists query

An exists query finds the documents in which a given field exists.

Query the documents that have a name field.

POST http://localhost:9200/user/student/_search?pretty
{
    "query":{
        "exists":{
            "field":"name"
        }   
    }
}

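Not-exists query

As the name suggests, a not-exists query finds the documents in which a given field does not exist. Earlier versions of ES had a missing query for this; later versions dropped it, because must_not combined with exists achieves the same thing.

Query the documents that have no name field.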

POST http://localhost:9200/user/student/_search?pretty
{
    "query":{
        "bool":{
            "must_not":{
                "exists":{
                    "field":"name"
                }
            }
        }   
    }
}

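Paginated search

Any discussion of pagination in ES eventually runs into the deep pagination problem. This chapter sets that problem aside and only shows how to run a paginated query in ES.

A paginated query in ES uses the keywords from and size: from is the offset of the first document to return, and size is the number of documents to return per query.

Query the total number of documents

POST http://localhost:9200/user/student/_search?pretty

The hits.total field of the response gives the total number of documents.

Paginated (one document per page) fuzzy query (match, the search keyword is analyzed) for name="kevin"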

POST http://localhost:9200/user/student/_search?pretty
{
    "query":{
        "match":{
            "name":"kevin"
        }
    },
    "from":0,
    "size":1
}

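Combined with the total number of documents, this is enough to build a simple paginated query.

Paginated queries usually also need the data sorted. MySQL uses the order by keyword; ES uses the sort keyword to specify the sort field and whether the order is ascending or descending.

Paginated (one document per page) query for students with age >= 21 and age <= 26, sorted by age in descending order.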

POST http://localhost:9200/user/student/_search?pretty
{
    "query":{
        "range":{
            "age":{
                "gte":21,
                "lte":26
            }
        }
    },
    "from":0,
    "size":1,
    "sort":{
        "age":{
            "order":"desc"
        }
    }
}

ES sorts a field in ascending order by default. If you do not specify the order for the sort field, sort can be written simply as "sort":"age".
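How from, size and sort interact can be sketched on an in-memory list. This is only an illustration under assumptions: the class FromSizeSketch and its method names are hypothetical, and the list of ages stands in for the documents matched by the range query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper modelling sort + from/size on a list of ages.
final class FromSizeSketch {

    // Offset (from) of a 1-based page number.
    static int fromForPage(int page, int size) {
        return (page - 1) * size;
    }

    // Number of pages needed for total documents, rounding up.
    static int totalPages(int total, int size) {
        return (total + size - 1) / size;
    }

    // Sort first, then skip 'from' entries and keep at most 'size' entries.
    static List<Integer> page(List<Integer> ages, int from, int size, boolean desc) {
        List<Integer> sorted = new ArrayList<>(ages);
        sorted.sort(desc ? Comparator.<Integer>reverseOrder() : Comparator.<Integer>naturalOrder());
        int start = Math.min(from, sorted.size());
        int end = Math.min(start + size, sorted.size());
        return new ArrayList<>(sorted.subList(start, end));
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(25, 26, 22, 25, 21);
        System.out.println(page(ages, 0, 1, true));                 // [26]
        System.out.println(page(ages, fromForPage(2, 1), 1, true)); // [25]
        System.out.println(totalPages(ages.size(), 2));             // 3
    }
}
```

The page count comes from the total number of documents, which is why the total is queried before paginating.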

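Chapter 6 - Java Client (Part 1)

ES offers several ways to use a Java client:

TransportClient, which connects to the ES cluster over a socket; the transport layer serializes Java objects
RestClient, which sends requests to the ES cluster over HTTP

TransportClient is currently the more commonly used way to connect to ES. However, the ES team has stated that TransportClient will be removed in a future version, leaving only RestClient.

Likewise, the Spring ecosystem offers its own way of working with ES: Spring Data ElasticSearch. This chapter first shows a Spring Boot project that works with ES through Spring Data ElasticSearch, and then a project, also built on Spring Boot, that uses the TransportClient provided by ES.

Spring Data ElasticSearch

Complete code for this section (best read alongside the source): https://github.com/yu-linfeng/elasticsearch6.x_tutorial/tree/master/code/spring-data-elasticsearch

With Spring Data ElasticSearch everything becomes remarkably simple. You do not even need to write a class that connects to the ES service; a single configuration entry telling it where the ES service is makes it work out of the box.

As a follow-up to the two chapters on the simple API and simple search, the examples in this section build on the example from the previous chapter.

Create a Spring Boot project in IDEA and select Spring Data ElasticSearch during creation. The main steps are as follows:

Step 1: create the project and choose Spring Initializr.

Step 2: choose the Spring Boot dependency NoSQL -> Spring Data ElasticSearch.

Once the Spring Boot project with Spring Data ElasticSearch is created, the usual ES routine is to define the Index, the Type and the Mapping. Doing so in Spring Data ElasticSearch is very simple. A document in ES essentially corresponds to a data structure, so Spring Data ElasticSearch asks us to map the ES document model to a Java object.

Define the StudentPO object, which declares the Index and the Type. The Mapping is loaded from an external json file (the Mapping in json format is the one defined in the simple search chapter).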

package com.coderbuff.es.easy.domain;

import lombok.Getter;
import lombok.Setter;
import lombok.ToString;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Mapping;

import java.io.Serializable;

/**
 * PO corresponding to the ES mapping
 * Created by OKevin on 2019-06-26 22:52
 */
@Getter
@Setter
@ToString
@Document(indexName = "user", type = "student")
@Mapping(mappingPath = "student_mapping.json")
public class StudentPO implements Serializable {

    private String id;

    /**
     * Name
     */
    private String name;

    /**
     * Age
     */
    private Integer age;
}

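Spring Data ElasticSearch hides so many of the details of working with ES that it really does work out of the box. It operates on ES mainly through the ElasticsearchRepository interface; for our own business logic we only need to extend that interface and add our own methods.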

package com.coderbuff.es.easy.dao;

import com.coderbuff.es.easy.domain.StudentPO;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import org.springframework.stereotype.Repository;

/**
 * Created by OKevin on 2019-06-26 23:45
 */
@Repository
public interface StudentRepository extends ElasticsearchRepository<StudentPO, String> {
}

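ElasticsearchTemplate is arguably the most important class in Spring Data ElasticSearch. It wraps the ES Java API, and operations such as creating an index all go through it. To use it in Spring it must first be injected, which means a bean has to be instantiated. Spring Data ElasticSearch has already taken care of everything: defining spring.data.elasticsearch.cluster-nodes=127.0.0.1:9300 in application.properties is all it takes (some online tutorials still define a bean in applicationContext.xml; it turns out that, after years of being "poisoned" by classic Spring, we find Spring Boot far smarter than we imagined).

A unit test creates the Index and the Type and defines the Mapping.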

package com.coderbuff.es;

import com.coderbuff.es.easy.domain.StudentPO;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@SpringBootTest
public class SpringDataElasticsearchApplicationTests {

    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;

    /**
     * Tests creating the Index, the Type and the Mapping definition
     */
    @Test
    public void createIndex() {
        elasticsearchTemplate.createIndex(StudentPO.class);
        elasticsearchTemplate.putMapping(StudentPO.class);
    }
}

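Send the request GET http://localhost:9200/user to see the index created through Spring Data ElasticSearch.

With the index created, the next step is to define the interface that operates on student documents. The implementation of the StudentService interface works with ES by composing StudentRepository. StudentRepository extends the ElasticsearchRepository interface, whose implementation already provides the basic data operations; saving, updating and deleting each take a single line of code. Builder classes are also provided for queries and pagination. The "hardest" part is actually not implementing these methods but constructing the query parameter, SearchQuery. There are two ways to create a SearchQuery instance:

Build a NativeSearchQueryBuilder and construct the query parameters through chained calls.
Build a NativeSearchQuery and pass the query parameters to its constructor.

Take "a non-paginated range and term query for age >= 21 and age < 26 and name=kevin" as an example.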

SearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.boolQuery()
                        .must(QueryBuilders.rangeQuery("age").gte(21).lt(26))
                        .must(QueryBuilders.termQuery("name", "kevin"))).build();

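Constructing search conditions requires a reasonably clear understanding of the ES query structure. If you have worked through the two chapters on the simple API and simple search, learning how to construct them is just a matter of practice. The examples from the earlier chapters are not verified one by one here; be sure to practise alongside the code (https://github.com/yu-linfeng/elasticsearch6.x_tutorial/tree/master/code/spring-data-elasticsearch).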