Elasticsearch Basics

Date: 2019-11-30

Running REST commands with cURL

You can issue cURL requests against Elasticsearch, which makes it easy to try out the framework from a command-line shell.

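Elasticsearch is schemaless, which means it can ingest whatever documents you give it and process them for later querying. Everything in Elasticsearch is stored as a document, so your first exercise is to store a document that contains song lyrics. Start by creating an index, which is a container for all of your document types, similar to a database in a relational database system such as MySQL. Then insert a document into the index so that you can query the document's data.

Create an index

The general format for Elasticsearch commands is: REST VERB HOST:9200/index/doc-type, where REST VERB is PUT, GET, or DELETE. (Use the cURL -X prefix to specify the HTTP method explicitly.)

To create an index, run this command in your shell: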

curl -XPUT "http://localhost:9200/music/"

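Insert a document

To create a type under the /music index, insert a document. In this first example, your document contains data, including one line of the lyrics, for "Deck the Halls," a traditional Christmas song originally written by Welsh poet John Ceirog Hughes in 1885.

To insert the "Deck the Halls" document into the index, run this command (type this and the tutorial's other cURL commands on a single line):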

curl -XPUT "http://localhost:9200/music/lyrics/1" -d '
{ "name": "Deck the Halls", "year": 1885, "lyrics": "Fa la la la la" }'

The preceding command uses the PUT verb to add a document to the /lyrics document type and assigns the document an ID of 1. The URL path takes the form index/doctype/ID.

View the document

To view the document, use a simple GET command:

curl -XGET "http://localhost:9200/music/lyrics/1"

Elasticsearch responds with the JSON content that you previously PUT into the index:

{"_index":"music","_type":"lyrics","_id":"1","_version":1,"found":true,"_source":
{ "name": "Deck the Halls", "year": 1885, "lyrics": "Fa la la la la" }}

Update the document

What if you realize that the date is wrong and want to change it to 1886? Run this command to update the document:

curl -XPUT "http://localhost:9200/music/lyrics/1" -d '{ "name": 
"Deck the Halls", "year": 1886, "lyrics": "Fa la la la la" }'

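Because this command uses the same unique ID of 1, the document is updated.

Delete the document (but not yet)

Don't delete the document yet; it is enough to know how to delete it: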

curl -XDELETE "http://localhost:9200/music/lyrics/1"
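
The PUT, GET, and DELETE commands above can also be issued from Java without any Elasticsearch library. The following is a minimal sketch, assuming Java 11 or later (for java.net.http) and a node at http://localhost:9200. The class and method names (DocumentRequests, putDocument, and so on) are hypothetical helpers, not part of any Elasticsearch API. The sketch only builds the requests; sending them requires a running node.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class DocumentRequests {
    // Assumed base URL of the local node used throughout this tutorial.
    static final String BASE = "http://localhost:9200";

    /** Builds the document URI in the form index/doctype/ID. */
    static URI documentUri(String index, String type, String id) {
        return URI.create(BASE + "/" + index + "/" + type + "/" + id);
    }

    /** PUT request that stores (or overwrites) a document under the given ID. */
    static HttpRequest putDocument(String index, String type, String id, String json) {
        return HttpRequest.newBuilder(documentUri(index, type, id))
                // Elasticsearch 6.0 and later reject request bodies without a content type.
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    /** GET request that reads the document back. */
    static HttpRequest getDocument(String index, String type, String id) {
        return HttpRequest.newBuilder(documentUri(index, type, id)).GET().build();
    }

    /** DELETE request that removes the document. */
    static HttpRequest deleteDocument(String index, String type, String id) {
        return HttpRequest.newBuilder(documentUri(index, type, id)).DELETE().build();
    }
}
```

To send a request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). For example, putDocument("music", "lyrics", "1", json) corresponds to the update command shown earlier.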

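Insert a document from a file

Here's another trick. You can insert a document from the command line by using the contents of a file. Try this method to add another document, this one for the traditional song "Ballad of Casey Jones." Copy Listing 1 into a file named caseyjones.json; alternatively, use the caseyjones.json file from the sample code package (see Download). Put the file wherever it's convenient to run the cURL command against it. (In the code download, the file is in the root directory.)

Listing 1. JSON document for "Ballad of Casey Jones"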

{
  "artist": "Wallace Saunders",
  "year": 1909,
  "styles": ["traditional"],
  "album": "Unknown",
  "name": "Ballad of Casey Jones",
  "lyrics": "Come all you rounders if you want to hear
The story of a brave engineer
Casey Jones was the rounder's name....
Come all you rounders if you want to hear
The story of a brave engineer
Casey Jones was the rounder's name
On the six-eight wheeler, boys, he won his fame
The caller called Casey at half past four
He kissed his wife at the station door
He mounted to the cabin with the orders in his hand
And he took his farewell trip to that promis'd land

Chorus:
Casey Jones--mounted to his cabin
Casey Jones--with his orders in his hand
Casey Jones--mounted to his cabin
And he took his... land"
}

Run this command to PUT the document into your music index:

curl -XPUT "http://localhost:9200/music/lyrics/2" -d @caseyjones.json

While you're at it, save the contents of Listing 2 (another folk song, "Walking Boss") to a file named walking.json.

Listing 2. "Walking Boss" JSON

{
  "artist": "Clarence Ashley",
  "year": 1920,
  "name": "Walking Boss",
  "styles": ["folk","protest"],
  "album": "Traditional",
  "lyrics": "Walkin' boss
Walkin' boss
Walkin' boss
I don't belong to you

I belong
I belong
I belong
To that steel driving crew

Well you work one day
Work one day
Work one day
Then go lay around the shanty two"
}

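Push this document into the index:

curl -XPUT "http://localhost:9200/music/lyrics/3" -d @walking.json

Searching with the REST API

It's time to run a basic query that does a bit more than the simple GET you ran to find the "Deck the Halls" document. The document URL has a built-in _search endpoint for this purpose. Find all songs that contain the word you in their lyrics: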

curl -XGET "http://localhost:9200/music/lyrics/_search?q=lyrics:'you'"

The q parameter denotes a query.

The response is:

{"took":107,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
"hits":{"total":2,"max_score":0.15625,"hits":[
{"_index":"music","_type":"lyrics","_id":"2","_score":0.15625,"_source":
{"artist": "Wallace Saunders","year": 1909,"styles": ["traditional"],"album": "Unknown","name": "Ballad of Casey Jones","lyrics": "Come all you rounders if you want to hear The story of a brave engineer Casey Jones was the rounder's name.... Come all you rounders if you want to hear The story of a brave engineer Casey Jones was the rounder's name On the six-eight wheeler, boys, he won his fame The caller called Casey at half past four He kissed his wife at the station door He mounted to the cabin with the orders in his hand And he took his farewell trip to that promis'd land Chorus: Casey Jones--mounted to his cabin Casey Jones--with his orders in his hand Casey Jones--mounted to his cabin And he took his... land"}},
{"_index":"music","_type":"lyrics","_id":"3","_score":0.06780553,"_source":
{"artist": "Clarence Ashley","year": 1920,"name": "Walking Boss","styles": ["folk","protest"],"album": "Traditional","lyrics": "Walkin' boss Walkin' boss Walkin' boss I don't belong to you I belong I belong I belong To that steel driving crew Well you work one day Work one day Work one day Then go lay around the shanty two"}}]}}

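Using other comparators

Various other comparators are also available. For example, find all songs written before 1900:

curl -XGET "http://localhost:9200/music/lyrics/_search?q=year:<1900"

This query returns the complete "Deck the Halls" document, the only song in the index with a year earlier than 1900.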
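
When a search URL is built in code rather than typed into a shell, characters such as : and < in the query string should be percent-encoded. The following is a minimal sketch, assuming Java 10 or later and the same local node at http://localhost:9200; SearchUris and search are hypothetical names used only for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SearchUris {
    /** Builds a _search URI whose q parameter is percent-encoded. */
    static URI search(String index, String type, String query) {
        // URLEncoder applies form encoding: ':' becomes %3A, '<' becomes %3C,
        // and a space becomes '+'.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create("http://localhost:9200/" + index + "/" + type + "/_search?q=" + q);
    }
}
```

Calling SearchUris.search("music", "lyrics", "year:<1900") yields a URI whose raw query is q=year%3A%3C1900, which decodes back to the query string used in the cURL command.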

Limiting fields

To limit the fields that you see in the results, add the fields parameter to your query:

curl -XGET "http://localhost:9200/music/lyrics/_search?q=year:>1900&fields=year"

Examining the search return object

Listing 3 shows the data that Elasticsearch returns from the preceding query.

Listing 3. Query results

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [{
            "_index": "music",
            "_type": "lyrics",
            "_id": "2",
            "_score": 1.0,
            "fields": {
                "year": [1909]
            }
        }, {
            "_index": "music",
            "_type": "lyrics",
            "_id": "3",
            "_score": 1.0,
            "fields": {
                "year": [1920]
            }
        }]
    }
}

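In the results, Elasticsearch gives you several JSON objects. The first object contains metadata about the request: how many milliseconds the request took (took) and whether it timed out (timed_out). The _shards field reflects the fact that Elasticsearch is a clustered service. Even in this single-node local deployment, Elasticsearch is logically clustered into shards.

Continuing through the search results in Listing 3, you can see that the hits object contains:

The total field, which tells you how many results were obtained
max_score, which comes into play in full-text search
The actual results

The actual results include the fields property because you added the fields parameter to the query. Otherwise, the results would include _source, containing the complete matching documents. The _index, _type, and _id fields are self-explanatory; _score is the relevance score of the hit in a full-text search. These four fields are always returned in the results.

Using the JSON query DSL

Searches based on query strings quickly become complicated. For more advanced queries, Elasticsearch provides a domain-specific language (DSL) based entirely on JSON. For example, to search for every song whose album value is traditional, create a query.json file that contains: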

{
    "query" : {
        "match" : {
            "album" : "Traditional"
        }
    }
}

Then run:

curl -XGET "http://localhost:9200/music/lyrics/_search" -d @query.json

Using Elasticsearch from Java code

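The power of Elasticsearch becomes apparent when you use it through a language API. I'll now introduce the Java API, and you'll run searches from an application. See the Download section for the accompanying sample code. The application uses the Spark microframework, so it's quick to set up.

Sample application

Create a directory for a new project, then run (type the command on a single line):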

mvn archetype:generate -DgroupId=com.dw -DartifactId=es-demo 
-DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

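To generate a project for use in Eclipse, cd into the project directory that Maven created and run mvn eclipse:eclipse.

In Eclipse, select File > Import > Existing Project into Workspace. Navigate to the folder where you ran Maven, select the project, and click Finish.

In Eclipse, you can see a basic Java project layout, including the pom.xml file in the root directory and a com.dw.App.java main class file. Add the dependencies that you need to the pom.xml file. Listing 4 shows the complete pom.xml file.

Listing 4. Complete pom.xml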

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.dw</groupId>
  <artifactId>es-demo</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>es-demo</name>
  <url>http://maven.apache.org</url>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <compilerVersion>1.8</compilerVersion>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>com.sparkjava</groupId>
      <artifactId>spark-core</artifactId>
      <version>2.3</version>
    </dependency>
    <dependency>
      <groupId>com.sparkjava</groupId>
      <artifactId>spark-template-mustache</artifactId>
      <version>2.3</version>
    </dependency>
    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch</artifactId>
      <version>2.1.1</version>
    </dependency>
  </dependencies>
</project>

The dependencies in Listing 4 pull in the Spark framework core, Spark Mustache template support (the listings that follow use MustacheTemplateEngine), and Elasticsearch. Note also that I set the <source> version to Java 8, which Spark requires (because it makes heavy use of lambdas).

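I don't know about you, but I've built a lot of RESTful applications lately, so for a change of pace, you'll give this application a more traditional "submit-and-load" UI.

In Eclipse, right-click the project in the navigator and select Configure > Convert to Maven Project so that Eclipse can resolve the Maven dependencies. Go to the project, right-click it, and then select Maven > Update Project.

Java client configuration

The Java client for Elasticsearch is very powerful; it can spin up an embedded instance and run administrative tasks when necessary. But here I focus on running application tasks against the node that you already have running.

When you run a Java application with Elasticsearch, two modes of operation are available. The application can take a more active or a more passive role in the Elasticsearch cluster. In the more active case (called the Node Client), the application instance receives requests from the cluster and determines which node should handle each request, just as a normal node does. (The application can even host indexes and serve requests.) The other mode, called the Transport Client, forwards every request to another Elasticsearch node, which determines the final target.

Getting the Transport Client

For the demo application, choose the Transport Client (through the initialization done in App.java) and keep the Elasticsearch processing done by the application to a minimum: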

Client client = TransportClient.builder().build()
   .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));

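If you connect to an Elasticsearch cluster, the builder can accept multiple addresses. (In this case, you have only a single localhost node.) Connect to port 9300, rather than to 9200 as you did with cURL for the REST API. The Java client uses this special port, and using port 9200 does not work. (Other Elasticsearch clients, the Python client among them, do use 9200 to access the REST API.)

Create the client when the server starts and use it throughout request processing. Spark renders the page through a Java implementation of the Mustache template engine, and Spark defines the request endpoints, but I won't explain these simple use cases in much detail. (See Resources for links to detailed information about Spark.)

The application's index page shows what the Java client can do:

The UI:

Renders the list of existing songs
Provides a button for adding a song
Enables searching by artist and lyrics
Returns results with the matches highlighted

Searching and handling results

In Listing 5, the root URL / is mapped to the index.mustache page.

Listing 5. Basic search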

Spark.get("/", (request, response) -> {
    SearchResponse searchResponse =
        client.prepareSearch("music").setTypes("lyrics").execute().actionGet();
    SearchHit[] hits = searchResponse.getHits().getHits();

    Map<String, Object> attributes = new HashMap<>();
    attributes.put("songs", hits);

    return new ModelAndView(attributes, "index.mustache");
}, new MustacheTemplateEngine());

The interesting part of Listing 5 begins with:

SearchResponse searchResponse = client.prepareSearch("music").setTypes("lyrics").execute().actionGet();

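This line shows simple usage of the search API. Use the prepareSearch method to specify an index (music, in this case), then execute the query. The query essentially says, "Give me all of the records in the music index." It also sets the document type to lyrics, but that isn't necessary in this simple use case because the index contains only one document type. In a larger application, this kind of setting would be needed. This API call is analogous to the curl -XGET "http://localhost:9200/music/lyrics/_search" call that you saw earlier.

The SearchResponse object contains interesting features (such as the hit count and scoring), but for now you want only an array of results, which you get with searchResponse.getHits().getHits();.

Finally, add the results array to the view context and let Mustache render it. The Mustache template looks like this:

Listing 6. index.mustache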

<html>
<body>
<form name="" action="/search">
  <input type="text" name="artist" placeholder="Artist"></input>
  <input type="text" name="query" placeholder="lyric"></input>
  <button type="submit">Search</button>
</form>
<button onclick="window.location='/add'">Add</button>
<ul>
{{#songs}}
  <li>{{id}} - {{getSource.name}} - {{getSource.year}}
    {{#getHighlightFields}} -
      {{#lyrics.getFragments}}
        {{#.}}{{{.}}}{{/.}}
      {{/lyrics.getFragments}}
    {{/getHighlightFields}}
  </li>
{{/songs}}
</ul>

</body>
</html>

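Advanced queries and match highlighting

To support more advanced queries and highlighting of the matches, use /search, as shown here:

Listing 7. Searching and highlighting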

Spark.get("/search", (request, response) -> {
    SearchRequestBuilder srb = client.prepareSearch("music").setTypes("lyrics");

    String lyricParam = request.queryParams("query");
    QueryBuilder lyricQuery = null;
    if (lyricParam != null && lyricParam.trim().length() > 0) {
        lyricQuery = QueryBuilders.matchQuery("lyrics", lyricParam);
    }
    String artistParam = request.queryParams("artist");
    QueryBuilder artistQuery = null;
    if (artistParam != null && artistParam.trim().length() > 0) {
        artistQuery = QueryBuilders.matchQuery("artist", artistParam);
    }

    if (lyricQuery != null && artistQuery == null) {
        srb.setQuery(lyricQuery).addHighlightedField("lyrics", 0, 0);
    } else if (lyricQuery == null && artistQuery != null) {
        srb.setQuery(artistQuery);
    } else if (lyricQuery != null && artistQuery != null) {
        srb.setQuery(QueryBuilders.andQuery(artistQuery, lyricQuery))
           .addHighlightedField("lyrics", 0, 0);
    }

    SearchResponse searchResponse = srb.execute().actionGet();
    SearchHit[] hits = searchResponse.getHits().getHits();

    Map<String, Object> attributes = new HashMap<>();
    attributes.put("songs", hits);

    return new ModelAndView(attributes, "index.mustache");
}, new MustacheTemplateEngine());

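In Listing 7, the first interesting API usage to note is QueryBuilders.matchQuery("lyrics", lyricParam);. This is where you set up the query against the lyrics field. Also note QueryBuilders.andQuery(artistQuery, lyricQuery), which is one way to merge the artist and lyrics parts of the query into an AND query. (andQuery is deprecated in Elasticsearch 2.x in favor of a bool query.)

The .addHighlightedField("lyrics", 0, 0); call tells Elasticsearch to generate search-hit highlighting on the lyrics field. The second and third parameters specify fragments of unlimited size and an unlimited number of fragments, respectively.

When the search results are rendered, the highlighted results are placed into the HTML. Elasticsearch generates valid HTML for this, using <em> tags to mark where the matched strings occur.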
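
For comparison, the AND combination and the highlighting request in Listing 7 can also be expressed in the JSON query DSL and sent to the _search endpoint. The sketch below builds such a request body as a plain string. It uses a bool query with must clauses in place of andQuery, and the class and method names (LyricsSearchBody, matchClause, build) are hypothetical. It escapes only quotes and backslashes, so treat it as an illustration rather than production code.

```java
import java.util.ArrayList;
import java.util.List;

final class LyricsSearchBody {
    /** Returns a match clause, or null when the parameter is blank (as in Listing 7). */
    static String matchClause(String field, String value) {
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        // Escape backslashes and double quotes so the value stays a valid JSON string.
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"match\":{\"" + field + "\":\"" + escaped + "\"}}";
    }

    /** Combines the non-blank clauses with AND semantics using bool/must. */
    static String build(String artist, String lyrics) {
        List<String> must = new ArrayList<>();
        String artistClause = matchClause("artist", artist);
        String lyricsClause = matchClause("lyrics", lyrics);
        if (artistClause != null) must.add(artistClause);
        if (lyricsClause != null) must.add(lyricsClause);

        if (must.isEmpty()) {
            // No search terms: fall back to matching every document.
            return "{\"query\":{\"match_all\":{}}}";
        }
        StringBuilder body = new StringBuilder("{\"query\":{\"bool\":{\"must\":[")
                .append(String.join(",", must))
                .append("]}}");
        if (lyricsClause != null) {
            // number_of_fragments 0 asks for the whole field, highlighted.
            body.append(",\"highlight\":{\"fields\":{\"lyrics\":{\"number_of_fragments\":0}}}");
        }
        return body.append("}").toString();
    }
}
```

As in Listing 7, highlighting is requested only when a lyrics term is present, because that is the only field the template renders fragments for.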

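Inserting documents

Let's look at how to insert documents into the index programmatically. Listing 8 shows the add process.

Listing 8. Inserting into the index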

Spark.post("/save", (request, response) -> {
      StringBuilder json = new StringBuilder("{");
      json.append("\"name\":\""+request.raw().getParameter("name")+"\",");
      json.append("\"artist\":\""+request.raw().getParameter("artist")+"\",");
      json.append("\"year\":"+request.raw().getParameter("year")+",");
      json.append("\"album\":\""+request.raw().getParameter("album")+"\",");
      json.append("\"lyrics\":\""+request.raw().getParameter("lyrics")+"\"}");

      IndexRequest indexRequest = new IndexRequest("music", "lyrics",
          UUID.randomUUID().toString());
      indexRequest.source(json.toString());
      IndexResponse esResponse = client.index(indexRequest).actionGet();

      Map<String, Object> attributes = new HashMap<>();
      return new ModelAndView(attributes, "index.mustache");
    }, new MustacheTemplateEngine());

The JSON string is generated directly with a StringBuilder. In a production application, use a library such as Boon or Jackson instead.

The part that does the Elasticsearch work is:

IndexRequest indexRequest = new IndexRequest("music", "lyrics", UUID.randomUUID().toString());

In this example, a UUID is used to generate the ID.
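
Listing 8 pastes request parameters straight into the JSON text, so a double quote or a line break in the lyrics would produce an invalid document. If a JSON library is not available, the values can at least be escaped first. The following is a minimal sketch that uses only the JDK; JsonStrings, quote, and song are hypothetical names, and a library such as Jackson remains the safer choice.

```java
final class JsonStrings {
    /** Returns the value as a quoted JSON string literal with required escapes applied. */
    static String quote(String value) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Other control characters need the four-digit hex escape form.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    /** Builds a small song document; the year is a number, so it is not quoted. */
    static String song(String name, int year, String lyrics) {
        return "{\"name\":" + quote(name)
                + ",\"year\":" + year
                + ",\"lyrics\":" + quote(lyrics) + "}";
    }
}
```

The resulting string can be passed to indexRequest.source(...) in place of the hand-built JSON in Listing 8. Parsing the year with Integer.parseInt before calling song also rejects non-numeric input early.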
