(2) Basic Concepts

Basic Concepts

A few concepts are core to Elasticsearch. Understanding them from the outset will greatly ease the learning process.

Near Realtime (NRT)

Elasticsearch is a near-real-time search platform. This means there is a slight latency (normally one second) between the time you index a document and the time it becomes searchable.
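
The delay between indexing and searchability can be pictured with a toy model. The sketch below is only an illustration, not how Elasticsearch is implemented: the class ToyIndex and its methods are hypothetical names, and the periodic refresh is replaced by an explicit refresh() call.

```python
class ToyIndex:
    """Toy model of near-real-time search: writes are buffered until a refresh."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet visible to search
        self._searchable = []  # visible to search

    def index(self, doc):
        # A newly indexed document lands in the buffer first.
        self._buffer.append(doc)

    def refresh(self):
        # Assumption: stands in for the periodic step (about once per second,
        # per the text) that makes buffered documents searchable.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term):
        return [d for d in self._searchable if term in d.get("text", "")]


idx = ToyIndex()
idx.index({"text": "hello world"})
before = idx.search("hello")  # searched before any refresh
idx.refresh()
after = idx.search("hello")   # searched after the refresh
```

Here `before` is empty and `after` contains the document, which mirrors the short window in which an indexed document is not yet searchable.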

Cluster

A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which by default is "elasticsearch". This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.
 
Make sure that you don't reuse the same cluster names in different environments; otherwise you might end up with nodes joining the wrong cluster. For instance, you could use logging-dev, logging-stage, and logging-prod for the development, staging, and production clusters.
 
Note that it is valid and perfectly fine to have a cluster with only a single node in it. Furthermore, you may also have multiple independent clusters each with its own unique cluster name.

Node

A node is a single server that is part of your cluster, stores your data, and participates in the cluster's indexing and search capabilities. Just like a cluster, a node is identified by a name, which by default is a random Universally Unique IDentifier (UUID) assigned to the node at startup. You can define any node name you want if you do not want the default. This name is important for administration purposes, where you want to identify which servers in your network correspond to which nodes in your Elasticsearch cluster.
 
A node can be configured to join a specific cluster by the cluster name. By default, each node is set up to join a cluster named elasticsearch. This means that if you start up a number of nodes on your network and they can discover each other, they will all automatically form and join a single cluster named elasticsearch.
 
In a single cluster, you can have as many nodes as you want. Furthermore, if there are no other Elasticsearch nodes currently running on your network, starting a single node will by default form a new single-node cluster named elasticsearch.

Index

An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data. An index is identified by a name (which must be all lowercase), and this name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.

In a single cluster, you can define as many indexes as you want.
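
The lowercase requirement can be checked before an index is created. The sketch below covers only the rule stated above; the function name is hypothetical, and Elasticsearch enforces further naming restrictions that this check does not attempt to reproduce.

```python
def is_lowercase_index_name(name: str) -> bool:
    """Check only the rule from the text: a non-empty, all-lowercase name."""
    return bool(name) and name == name.lower()


candidates = ["customer", "Customer", "orders-2018"]
accepted = [name for name in candidates if is_lowercase_index_name(name)]
```

With these illustrative candidates, "Customer" is rejected because of its capital letter.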
 

Type

Warning

Deprecated in 6.0.0.  See Removal of mapping types

A type used to be a logical category/partition of your index that allowed you to store different types of documents in the same index, e.g. one type for users and another type for blog posts. It is no longer possible to create multiple types in an index, and the whole concept of types will be removed in a later version. See Removal of mapping types for more.

Document

A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order. A document is expressed in JSON (JavaScript Object Notation), which is a ubiquitous internet data interchange format.
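
As an illustration, a customer document might be built and serialized as follows. The field names and values are made up for this sketch; the text does not prescribe any particular document shape.

```python
import json

# Hypothetical customer document; any JSON-serializable structure would do.
customer = {"name": "John Doe", "age": 42}

# Serialize to the JSON text that would be sent as the document body.
body = json.dumps(customer, sort_keys=True)

# Parsing the JSON text recovers the original structure.
restored = json.loads(body)
```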
 
Within an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, a document actually must be indexed/assigned to a type inside an index.

Shards & Replicas

An index can potentially store a large amount of data that exceeds the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node, or may be too slow to serve search requests from a single node alone.
 
To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.
 
Sharding is important for two primary reasons:
  1. It allows you to horizontally split/scale your content volume.
  2. It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), thus increasing performance/throughput.
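
The idea of splitting content across shards can be sketched with a simple hash-and-modulo rule. This is a minimal illustration under stated assumptions: zlib.crc32 is a stand-in hash chosen because it is deterministic, not the hash Elasticsearch uses, and the function names and document IDs are hypothetical.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    # Assumption: crc32 is only a deterministic stand-in for the real routing hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard each one is routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


layout = distribute([f"doc-{i}" for i in range(100)], 5)
```

Because the shard is derived from the document ID and the shard count, the same ID always maps to the same shard as long as the number of primary shards stays fixed.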
 
The mechanics of how a shard is distributed, and of how its documents are aggregated back into search results, are completely managed by Elasticsearch and are transparent to you as the user.
 
In a network/cloud environment where failures can be expected anytime, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.
 
Replication is important for two primary reasons:
  1. It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
  2. It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.
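
The rule that a replica never shares a node with its primary can be sketched as a simple placement routine. This is a toy round-robin scheme with one replica per shard, not Elasticsearch's actual allocation logic; the function and node names are hypothetical.

```python
def allocate(num_primaries, nodes):
    """Place each primary round-robin and its single replica on the next node."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to keep replicas apart from primaries")
    placement = {}
    for shard in range(num_primaries):
        primary = nodes[shard % len(nodes)]
        # The next node in the ring always differs from the primary's node
        # when there are at least two nodes.
        replica = nodes[(shard + 1) % len(nodes)]
        placement[shard] = {"primary": primary, "replica": replica}
    return placement


plan = allocate(5, ["node-a", "node-b", "node-c"])
```

If any single node in this plan goes offline, every shard still has a copy on another node.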
 
To summarize, each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times. Once replicated, each index will have primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards).
 
The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may also change the number of replicas dynamically at any time. You can change the number of shards for an existing index using the _shrink and _split APIs; however, this is not a trivial task, and planning for the correct number of shards in advance is the better approach.
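
The two settings can be expressed as JSON request bodies. The sketch below only builds the bodies and sends nothing; it assumes the first is used when creating an index (a PUT on the index name) and the second when updating the settings of an existing index (a PUT on its _settings endpoint). The helper function names are hypothetical.

```python
import json


def create_index_body(shards: int, replicas: int) -> str:
    """Body for creating an index with explicit shard and replica counts."""
    return json.dumps(
        {"settings": {"index": {"number_of_shards": shards,
                                "number_of_replicas": replicas}}}
    )


def update_replicas_body(replicas: int) -> str:
    """Body for changing only the replica count of an existing index."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


create_body = create_index_body(3, 2)
update_body = update_replicas_body(1)
```

Note that only the replica count appears in the update body, matching the point above that the number of primary shards is not changed this way.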
 
By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica. This means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica), for a total of 10 shards per index.
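
The arithmetic generalizes to any combination of primaries and replicas: each replica is one more complete copy of all the primary shards. A minimal sketch, with a hypothetical function name and the defaults taken from the text:

```python
def total_shards(primaries: int = 5, replicas: int = 1) -> int:
    """Total shards for one index: the primaries plus one full copy per replica."""
    return primaries * (1 + replicas)


default_total = total_shards()      # 5 primaries, 1 replica
custom_total = total_shards(3, 2)   # 3 primaries, 2 replicas
```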
 
Note

Each Elasticsearch shard is a Lucene index. There is a maximum number of documents you can have in a single Lucene index. As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.

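
The per-shard document limit gives a lower bound on the number of primary shards an index needs. The sketch below is only that lower bound: the function name is hypothetical, and practical shard sizing depends on factors (disk, memory, query load) that it ignores.

```python
# Limit from the note above: Integer.MAX_VALUE - 128.
LUCENE_MAX_DOCS = (2**31 - 1) - 128


def min_primary_shards(doc_count: int) -> int:
    """Smallest shard count that keeps every shard within the Lucene limit."""
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-doc_count // LUCENE_MAX_DOCS))


needed = min_primary_shards(10_000_000_000)  # ten billion documents
```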
 
With that out of the way, let’s get started with the fun part…