Elasticsearch Quick Start (2): Storing Data in ES

Date: 2019-12-11

Tags: Elasticsearch, quick start, storing data

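Preface

Before getting hands-on, you need to understand a few basic concepts.

Recommended reading
Elasticsearch: The Definitive Guide, available at https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html , gives a very detailed and comprehensive explanation.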

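Some ES concepts

index

Roughly equivalent to a database in MySQL.

type

Roughly equivalent to a table in MySQL. (The examples in this article were run on ES 5.x; mapping types were deprecated in 6.x and removed in later versions.)

document

Roughly equivalent to a row (one record) in MySQL.

field

Roughly equivalent to a column in MySQL.

Node

A single server, identified by a name.

Cluster

One or more nodes organized together.

Shard

The ability to split one body of data into many smaller pieces, which allows horizontal partitioning and capacity scaling. Multiple shards can serve requests in parallel, improving performance and throughput.

Replica

A copy of the data; if one node fails, the remaining nodes can take over.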

Inverted index

See https://www.elastic.co/guide/cn/elasticsearch/guide/current/inverted-index.html
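
As a rough illustration of the idea behind an inverted index, the Python sketch below maps each term to the ids of the documents that contain it. The function name, the sample documents and the naive lowercase-and-split analysis are all assumptions made for illustration; ES uses real analyzers and a far more compact on-disk structure.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis (assumption): lowercase, then split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "The quick brown fox",
    2: "Quick brown dogs",
    3: "The lazy dog",
}
inverted = build_inverted_index(docs)
print(inverted["quick"])
print(inverted["the"])
```

Looking up a term is then a single dictionary access, which is why searching by term is fast compared with scanning every document.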

Indices & types

Basic index operations

Creating an index

The following command creates an index:

PUT job
{
  "settings":{
    "index":{
      "number_of_shards":5,
      "number_of_replicas":1
    }
  }
}

{
  "acknowledged": true,
  "shards_acknowledged": true
}

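Elasticsearch uses shards to distribute data across the cluster. A shard is a container for data: documents are stored in shards, and shards are allocated to the nodes in the cluster.
As your cluster grows or shrinks, Elasticsearch automatically migrates shards between nodes so that the data stays evenly distributed.

A shard is either a primary shard or a replica shard. Every document in an index belongs to exactly one primary shard, so the number of primary shards determines the maximum amount of data the index can hold.

A replica shard is just a copy of a primary shard. Replicas serve as redundant backups that protect against data loss on hardware failure, and they also serve read operations such as searching and retrieving documents.

In the example above, the index has 5 primary shards and 1 replica of each.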
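
The Definitive Guide linked above describes document routing as shard = hash(routing) % number_of_primary_shards, where routing defaults to the document id. The Python sketch below illustrates that formula only; zlib.crc32 is a stand-in hash chosen for this example (ES uses its own hash function), and both function names are hypothetical.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Illustrate shard = hash(routing) % number_of_primary_shards."""
    # Stand-in hash (assumption): ES does not use CRC32 internally.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def total_shard_copies(primaries, replicas):
    """Each primary shard exists once, plus once per configured replica."""
    return primaries * (1 + replicas)


for doc_id in ["1", "2", "3"]:
    print(doc_id, "->", pick_shard(doc_id, 5))

# With the settings used above: 5 primaries, 1 replica each.
print(total_shard_copies(5, 1))
```

Because the result depends on the number of primary shards, changing that number would send existing ids to different shards, which is one reason it is fixed when the index is created, while the replica count can be changed later.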

Viewing index information

GET job

This shows the information for the job index:

{
  "job": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "creation_date": "1502342603160",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "LGalsb3eRKeGb5SbWCxO8w",
        "version": {
          "created": "5010199"
        },
        "provided_name": "job"
      }
    }
  }
}

You can also view just one part of it:

GET job/_settings

This shows only the settings of the job index:

{
  "job": {
    "settings": {
      "index": {
        "creation_date": "1502342603160",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "LGalsb3eRKeGb5SbWCxO8w",
        "version": {
          "created": "5010199"
        },
        "provided_name": "job"
      }
    }
  }
}

Modifying index settings

For example, to change the number of replicas to 2:

PUT job/_settings
{
  "number_of_replicas":2
}

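Mappings

When creating an index, we can define a mapping in advance that specifies each field and its data type, which helps ES manage the data. Take an article store as an example: the keywords field of an article should be treated as whole terms, while the body field must be tokenized by a Chinese analyzer.

Setting a mapping tells ES the rules for these fields.

For more detail, see: https://www.elastic.co/guide/cn/elasticsearch/guide/current/mapping-intro.html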

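Data types

Elasticsearch supports the following types:

Strings: text, keyword (note: versions before 5 had a string type; from 5.0 it is replaced by text and keyword)
Numbers: byte, short, integer, long, float, double
Boolean: boolean
Dates: date
Complex types: such as object and nested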

Viewing mappings

Run

GET job/_mapping

to view all mappings under the job index.

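Default mapping

When you store data into an index without defining a mapping for a field, ES automatically assigns a type based on the actual data.
For example, insert a document with the following statement: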

PUT job/type1/1
{
  "title":"abc",
  "words":123,
  "date":"2017-01-01",
  "isok":true
}

Then view the mapping; the result is:

{
  "job": {
    "mappings": {
      "type1": {
        "properties": {
          "date": {
            "type": "date"
          },
          "isok": {
            "type": "boolean"
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "words": {
            "type": "long"
          }
        }
      }
    }
  }
}

As you can see, ES mapped each field automatically based on the type of its value.
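
To make the inference above concrete, here is a small Python sketch that reproduces the result for this one document. It is a simplification under stated assumptions: the function names are hypothetical, and date detection here only recognizes the yyyy-MM-dd form, whereas ES checks its own list of date formats.

```python
from datetime import datetime


def guess_field_type(value):
    """Guess a mapping for one JSON value, imitating the result shown above."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        try:
            # Assumption: only the yyyy-MM-dd form is treated as a date here.
            datetime.strptime(value, "%Y-%m-%d")
            return {"type": "date"}
        except ValueError:
            return {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
    raise TypeError("unsupported value in this sketch: %r" % (value,))


def guess_mapping(doc):
    """Build a properties section for a flat document."""
    return {"properties": {name: guess_field_type(v) for name, v in doc.items()}}


doc = {"title": "abc", "words": 123, "date": "2017-01-01", "isok": True}
mapping = guess_mapping(doc)
print(mapping["properties"]["title"]["type"])
print(mapping["properties"]["words"]["type"])
```

Note that the string field gets both a text mapping and a keyword sub-field, matching the title field in the response above.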

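Setting mappings

When creating an index you can set the mapping rules; the format looks like the response returned when viewing mappings above. If the job index created earlier still exists, delete it first (DELETE job), because creating an index that already exists returns an error.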

PUT job
{
  "mappings":{
    "type2":{
      "properties":{
        "title":{
          "type":"keyword"
        },
        "salary":{
          "type":"integer"
        },
        "desc":{
          "type":"text",
          "analyzer": "ik_max_word"
        },
        "date":{
          "type":"date",
          "format":"yyyy-MM-dd"
        }
      }
    }
  }
}

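Note that the desc field above specifies an analyzer, in this case a custom (non-built-in) one. es-rtf installs two analyzers by default, ik_smart and ik_max_word; the difference is that the latter produces more terms.
Fields of type text are analyzed (tokenized) and then indexed, while keyword fields are not analyzed.

Automatic conversion

Once the index and mapping exist, field values are automatically converted to the types given in the mapping when a document is inserted. For example, inserting "123" into an integer field makes ES try to convert the string. If the conversion is not possible, an error is returned and the document is not inserted.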
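
The behaviour described above can be imitated in a few lines of Python. This is only a sketch of the two cases mentioned in the text (a numeric string is converted, a non-numeric string is rejected); the function name is hypothetical and ES's actual coercion rules cover more cases than this.

```python
def coerce_to_integer(value):
    """Return value as an int, or raise ValueError if it cannot be converted."""
    if isinstance(value, bool):
        # Booleans are not treated as numbers in this sketch.
        raise ValueError("cannot convert boolean to integer: %r" % (value,))
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        # int() raises ValueError for strings such as "abc".
        return int(value.strip())
    raise ValueError("cannot convert to integer: %r" % (value,))


print(coerce_to_integer("123"))

try:
    coerce_to_integer("abc")
except ValueError as exc:
    print("rejected:", exc)
```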

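Documents

A "document" is what you would call a record. Documents can be created, deleted and modified.

Inserting a document

You can specify the document id, i.e. PUT index_name/type_name/id.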

PUT job/type2/1
{
  "title":"Python工程師",
  "salary":1000,
  "desc":"1. 參與devops相關係統開發，包括雲資源管理平臺，cmdb平臺、資源申請流程、基礎支撐平臺開發；2. 參與公司業務系統及自動化運維平臺的開發；3. 積累並規範化系統開發的最佳實踐並文檔化；4. 完善並遵照團隊的編碼規範，編寫高質量、結構清晰、易讀、易維護的代碼。",
  "date":"2017-08-08"
}

Response:

{
  "_index": "job",
  "_type": "type2",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

You can also omit the id, in which case one is assigned automatically. Note that this requires POST.

POST job/type2/
{
  "title":"Python工程師2",
  "salary":1000,
  "desc":"1. 參與devops相關係統開發，包括雲資源管理平臺，cmdb平臺、資源申請流程、基礎支撐平臺開發；2. 參與公司業務系統及自動化運維平臺的開發；3. 積累並規範化系統開發的最佳實踐並文檔化；4. 完善並遵照團隊的編碼規範，編寫高質量、結構清晰、易讀、易維護的代碼。",
  "date":"2017-08-08"
}

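Viewing a document

Simply use GET:

GET job/type2/1

This returns the document (the output below was captured after the modifications described later in this article, which is why _version is 3):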

{
  "_index": "job",
  "_type": "type2",
  "_id": "1",
  "_version": 3,
  "found": true,
  "_source": {
    "title": "Java",
    "salary": 2000,
    "desc": "易維護的代碼",
    "date": "2017-08-08"
  }
}

You can also view only some of the fields in _source:

GET job/type2/1?_source=title,salary

{
  "_index": "job",
  "_type": "type2",
  "_id": "1",
  "_version": 3,
  "found": true,
  "_source": {
    "title": "Java",
    "salary": 2000
  }
}
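
The effect of the _source parameter can be pictured as a simple key filter. The Python sketch below mimics it for a comma-separated list of exact field names; the function name is hypothetical, and features such as wildcard patterns are deliberately left out.

```python
def filter_source(source, fields):
    """Keep only the fields named in a comma-separated string."""
    wanted = [name.strip() for name in fields.split(",")]
    return {key: value for key, value in source.items() if key in wanted}


stored = {
    "title": "Java",
    "salary": 2000,
    "desc": "易維護的代碼",
    "date": "2017-08-08",
}
print(filter_source(stored, "title,salary"))
```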

Modifying a document

One way is a full overwrite with PUT: the old document is deleted and replaced by the new one.

PUT job/type2/1
{
  "title":"Java",
  "salary":1400,
  "desc":"易維護的代碼",
  "date":"2017-08-08"
}

The other way is POST with _update, which modifies only some of the fields.

POST job/type2/1/_update
{
  "doc":{
    "salary":2000
  }
}
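
The difference between the two approaches can be sketched with plain Python dictionaries. Both function names are hypothetical, and the merge here is shallow, which is enough for the flat documents used in this article but is not a full model of how ES merges nested objects.

```python
def full_replace(stored, new_doc):
    """PUT semantics: the stored document is discarded entirely."""
    return dict(new_doc)


def partial_update(stored, partial):
    """_update with a doc body: listed fields are overwritten, the rest kept."""
    merged = dict(stored)
    merged.update(partial)
    return merged


stored = {"title": "Java", "salary": 1400, "desc": "易維護的代碼", "date": "2017-08-08"}

# Only salary changes; title, desc and date are kept.
print(partial_update(stored, {"salary": 2000}))

# Every field not present in the new body is lost.
print(full_replace(stored, {"salary": 2000}))
```

This is why a full overwrite must always send the complete document, while a partial update only needs the changed fields.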

Deleting a document

Documents can be deleted with DELETE:

DELETE job/type2/1

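Retrieving multiple documents with mget

See: https://www.elastic.co/guide/cn/elasticsearch/guide/current/_Retrieving_Multiple_Documents.html
Combining several lookups into one request reduces the number of round trips and improves efficiency.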

GET _mget
{
   "docs" : [
      {
         "_index" : "job",
         "_type" :  "type2",
         "_id" :    1
      },
      {
         "_index" : "job",
         "_type" :  "type2",
         "_id" :    2,
         "_source": "salary"
      }
   ]
}

The response contains one entry per requested document (here the second one was not found):

{
  "docs": [
    {
      "_index": "job",
      "_type": "type2",
      "_id": "1",
      "_version": 3,
      "found": true,
      "_source": {
        "title": "Java",
        "salary": 2000,
        "desc": "易維護的代碼",
        "date": "2017-08-08"
      }
    },
    {
      "_index": "job",
      "_type": "type2",
      "_id": "2",
      "found": false
    }
  ]
}

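A shorthand is also available. For example, when the index and type are the same for every document, looking up two ids can be written as:

GET job/type2/_mget
{
  "ids":["1", "2"]
}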
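
To show that the shorthand and the long form ask for the same documents, the Python sketch below expands an ids list into the equivalent docs body. The function name is hypothetical and the code only builds the JSON; it does not contact a server.

```python
import json


def expand_ids(index, doc_type, ids):
    """Turn the ids shorthand into the equivalent full mget body."""
    return {
        "docs": [
            {"_index": index, "_type": doc_type, "_id": doc_id}
            for doc_id in ids
        ]
    }


body = expand_ids("job", "type2", ["1", "2"])
print(json.dumps(body, indent=2))
```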

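Batch operations with bulk

The bulk API allows multiple create, index, update or delete requests to be made in a single step.

For details, see: https://www.elastic.co/guide/cn/elasticsearch/guide/current/bulk.html

A bulk request has a somewhat unusual format:

{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n ...

Normally two lines make up one request: the first line gives the action and its metadata, and the second line carries the data for that action. A delete request, however, has only one line.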
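
Since the body is newline-delimited JSON rather than a single JSON object, it is usually assembled by a program. The Python sketch below builds such a body from a list of operations; the function name and the tuple layout are assumptions for this example, and the code only produces the text without sending it anywhere.

```python
import json


def build_bulk_body(operations):
    """Build a bulk body from (action, metadata, body_or_None) tuples."""
    lines = []
    for action, metadata, body in operations:
        # First line of each request: the action and its metadata.
        lines.append(json.dumps({action: metadata}))
        # delete has no second line; every other action needs a body.
        if action != "delete":
            if body is None:
                raise ValueError("action %r requires a body" % action)
            lines.append(json.dumps(body))
    # Every line, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"


operations = [
    ("delete", {"_index": "website", "_type": "blog", "_id": "123"}, None),
    ("create", {"_index": "website", "_type": "blog", "_id": "123"},
     {"title": "My first blog post"}),
    ("index", {"_index": "website", "_type": "blog"},
     {"title": "My second blog post"}),
]
bulk_body = build_bulk_body(operations)
print(bulk_body, end="")
```

json.dumps never emits raw newlines inside a value with its default settings, so each JSON object stays on one line as the format requires.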

POST _bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }} 
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title":    "My first blog post" }
{ "index":  { "_index": "website", "_type": "blog" }}
{ "title":    "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }

The response lists the processing status of each request.

{
   "took": 4,
   "errors": false, 
   "items": [
      {  "delete": {
            "_index":   "website",
            "_type":    "blog",
            "_id":      "123",
            "_version": 2,
            "status":   200,
            "found":    true
      }},
      {  "create": {
            "_index":   "website",
            "_type":    "blog",
            "_id":      "123",
            "_version": 3,
            "status":   201
      }},
      {  "create": {
            "_index":   "website",
            "_type":    "blog",
            "_id":      "EiwfApScQiiy7TIKFxRCTw",
            "_version": 1,
            "status":   201
      }},
      {  "update": {
            "_index":   "website",
            "_type":    "blog",
            "_id":      "123",
            "_version": 4,
            "status":   200
      }}
   ]
}
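
Because each item carries its own status, a client typically walks the items list to find the requests that failed. The Python sketch below does this for a response shaped like the one above; the function name and the sample response (including its 409 entry) are made up for illustration.

```python
def failed_items(bulk_response):
    """Return (action, id, status) for every item whose status is not 2xx."""
    failures = []
    for item in bulk_response["items"]:
        # Each item has a single key: the name of the action.
        for action, result in item.items():
            status = result.get("status", 0)
            if status >= 300:
                failures.append((action, result.get("_id"), status))
    return failures


# Hypothetical response: the delete succeeded, the create hit a conflict.
sample_response = {
    "took": 4,
    "errors": True,
    "items": [
        {"delete": {"_index": "website", "_type": "blog", "_id": "123", "status": 200}},
        {"create": {"_index": "website", "_type": "blog", "_id": "123", "status": 409}},
    ],
}
print(failed_items(sample_response))
```

Checking the top-level errors flag first avoids scanning the items when everything succeeded.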

With the operations above, data can be written into ES in an organized way. The next article will cover searching and querying.
