Elasticsearch Study Notes (Part 1)

Date: 2019-11-11

Elasticsearch

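Elasticsearch (ES for short below) is an efficient full-text search engine written in Java and built on Lucene. It provides a distributed, multi-tenant full-text search and analytics service behind a RESTful web interface, is released as open source under the Apache license, and is currently a popular enterprise search engine. It is designed for cloud environments and supports near-real-time search. It can search logs or transaction data to analyze business trends, collect logs, and diagnose system bottlenecks or track how a system is running; it can drive alerting (continuously run a query over some metric and raise a warning when the value exceeds a threshold); and it can pinpoint key information within millions of records.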

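PS: Lucene is not a complete full-text search application. It is a full-text indexing engine toolkit written in Java that can easily be embedded in all kinds of applications to give them full-text indexing and retrieval. For details, see "Lucene: An Introduction to a Java-Based Full-Text Search Engine".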

Basic Concepts

To understand ES, first get the following concepts straight. They make ES easier to learn and help avoid some common misunderstandings:

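Near Real Time (NRT)

ES is not a standard database. Unlike MongoDB, it focuses on searching the data it stores. Note therefore that its reads and writes are not real time: a document that has just been stored cannot be searched immediately. By default it becomes searchable after the next index refresh, which runs about once per second. How you read the data matters, though. ES supports both fetching a document directly and searching for it, and "near real time" refers specifically to search.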
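To make the difference between fetching and searching concrete, here is a minimal toy model in Python. It is only a sketch of the visible behavior, not how ES is implemented; the class `ToyIndex` and all of its method names are made up for illustration.

```python
class ToyIndex:
    """Toy in-memory model of near-real-time search (not the ES implementation)."""

    def __init__(self):
        self._store = {}       # every accepted document, keyed by id
        self._searchable = {}  # snapshot visible to search, updated on refresh

    def index(self, doc_id, text):
        # The write is accepted right away, but search does not see it yet.
        self._store[doc_id] = text

    def get(self, doc_id):
        # Fetching by id reads the latest accepted write.
        return self._store.get(doc_id)

    def refresh(self):
        # Stand-in for the periodic refresh that exposes new documents to search.
        self._searchable = dict(self._store)

    def search(self, term):
        # Search only looks at the last refreshed snapshot.
        return sorted(i for i, t in self._searchable.items() if term in t)


idx = ToyIndex()
idx.index("1", "hello elasticsearch")
fetched = idx.get("1")         # available immediately
before = idx.search("hello")   # nothing has been refreshed yet
idx.refresh()
after = idx.search("hello")    # now the document is searchable
print(fetched, before, after)
```

In this model the document can be fetched by id as soon as it is written, while search only finds it after `refresh()` has run.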

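Cluster

In ES the cluster is largely transparent to the user. You only need to give the cluster a name (the default is elasticsearch); at startup, every node configured with that cluster name joins the same cluster by default. You do not have to do anything else: master election and cluster management happen automatically.

Note that a cluster with only a single node is perfectly valid. You can also run several independent clusters, each with its own unique cluster name.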

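Node

A node is a single server that is part of a cluster; it stores data and takes part in the cluster's indexing and search capabilities. Like a cluster, a node is identified by a name, which by default is a random universally unique identifier (UUID) assigned to the node at startup. A node can be configured to join a specific cluster by cluster name. By default, every node is set up to join a cluster named elasticsearch, which means that if you start several nodes on your network, and assuming they can discover each other, they will automatically form and join a single cluster named elasticsearch.

Note that a single cluster can contain as many nodes as you want. Furthermore, if no other Elasticsearch nodes are currently running on your network, starting a single node will by default form a new single-node cluster named elasticsearch.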

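Index

An index is a collection of documents that share somewhat similar characteristics. An index is identified by a name (which must be all lowercase), and this name is used to refer to the index when indexing, searching, updating, and deleting the documents in it. In a single cluster you can define as many indexes as you want.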

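Type

A type can be thought of as a logical partition of an index, used to identify sets of documents with different fields. However, ES still treats the index as the coarse-grained storage unit, so all the types defined under an index are stored together in that one index. As a result, fields that have the same name in different types but different definitions run into mapping conflicts. Types are deprecated as of 6.0.0.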
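The conflict described above can be illustrated with a small Python sketch. It is a toy check over plain dictionaries, not an ES API; the function `find_field_conflicts` and the sample type and field names are hypothetical.

```python
def find_field_conflicts(type_mappings):
    """Return field names declared with different data types across types."""
    seen = {}          # field name -> first data type seen for it
    conflicts = set()
    for fields in type_mappings.values():
        for field, data_type in fields.items():
            # setdefault records the first definition and returns it afterwards.
            if seen.setdefault(field, data_type) != data_type:
                conflicts.add(field)
    return sorted(conflicts)


# Hypothetical mappings for two types that share one index.
mappings = {
    "user": {"name": "text", "age": "integer"},
    "order": {"age": "keyword", "total": "float"},
}
conflicts = find_field_conflicts(mappings)
print(conflicts)
```

Because both types live in the same index, the field `age` cannot be an integer in one type and a keyword in the other.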

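Document

A document is the basic unit of information that can be stored, and it is expressed in JSON. Within an index/type you can store as many documents as you want.

Note that although a document physically resides in an index, it must actually be indexed into, or assigned to, a type inside an index.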
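As a rough sketch of how index, type, and document fit together, the Python snippet below builds the pieces of a request that would store one JSON document in a 6.x cluster. The helper `build_index_request`, the index name `customer`, the type name `doc`, and the sample document are assumptions made up for illustration; nothing is sent over the network.

```python
import json


def build_index_request(index, doc_type, doc_id, document):
    """Return (method, path, body) for storing one document."""
    if index != index.lower():
        raise ValueError("index names must be all lowercase")
    # In 6.x a document is addressed as /<index>/<type>/<id>.
    path = "/{}/{}/{}".format(index, doc_type, doc_id)
    body = json.dumps(document, sort_keys=True)
    return "PUT", path, body


method, path, body = build_index_request(
    "customer", "doc", 1, {"name": "John Doe"})
print(method, path, body)
```

The returned triple could then be handed to curl or any HTTP client pointed at the node.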

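Shards and Replicas

In ES an index is divided into shards. Each shard is an independent Lucene index that can handle search, analysis, and storage on its own. When you create an index, you simply define the number of shards you want. Each shard is in itself a fully functional and independent "index" that can be hosted on any node in the cluster. In a network or cloud environment where failures can happen at any time, it is very useful, and strongly recommended, to have a failover mechanism in case a shard or node goes offline or disappears for whatever reason. To this end, Elasticsearch lets you make one or more copies of an index's shards, called replica shards, or replicas for short.

To summarize, each index can be split into multiple shards. An index can also be replicated zero times (meaning no replicas) or more. Once replicated, each index has primary shards (the original shards that were copied from) and replica shards (the copies of the primary shards). The number of shards and replicas can be defined per index when the index is created. After the index is created, you can change the number of replicas dynamically at any time, but you cannot change the number of shards afterwards. By default, each index in the Elasticsearch version used here (6.x) is allocated 5 primary shards and 1 replica, which means that with at least two nodes in the cluster, the index has 5 primary shards and another 5 replica shards (1 complete replica), for a total of 10 shards per index.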
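The shard arithmetic above can be checked with a short Python sketch. The helper names `create_index_body` and `total_shards` are made up for illustration; `number_of_shards` and `number_of_replicas` are the index settings that would go in the body of a create-index request.

```python
import json


def create_index_body(primaries=5, replicas=1):
    """JSON body for a create-index request with explicit shard settings."""
    return json.dumps({"settings": {"index": {
        "number_of_shards": primaries,
        "number_of_replicas": replicas,
    }}})


def total_shards(primaries, replicas):
    # Every replica is one more full copy of all primary shards.
    return primaries * (1 + replicas)


print(create_index_body())
print(total_shards(5, 1))
```

With the 6.x defaults this gives 5 x (1 + 1) = 10 shards, matching the total stated above. Only the replica count can be changed after the index exists.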

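Installation

ES requires at least Java 8, so before installing ES make sure Java 8 is installed and its environment variables are configured. I won't cover that here; see my earlier post "Linux Service Deployment: Java (Part 1)".

This post covers a single-machine setup. On Linux, the usual approach is to download elasticsearch-6.0.1.tar.gz with wget; for other environments or installation methods, see the official documentation.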

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.0.1.tar.gz

Extract

tar -xvf elasticsearch-6.0.1.tar.gz

Start

cd elasticsearch-6.0.1
./bin/elasticsearch

If startup fails with the error "max virtual memory areas vm.max_map_count [65530] is too low", run the following command.

sudo sysctl -w vm.max_map_count=262144
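Before or after running that command, the current value can be checked from Python. This is a minimal sketch that assumes a Linux host where the setting is exposed at /proc/sys/vm/max_map_count; the function names are made up for illustration.

```python
REQUIRED_MAX_MAP_COUNT = 262144  # value used in the sysctl command above


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the current kernel setting (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


def needs_increase(current, required=REQUIRED_MAX_MAP_COUNT):
    """True when the current limit is below what ES asks for."""
    return current < required


# 65530 is the value reported in the error message above.
print(needs_increase(65530))
```

On a real host, `needs_increase(read_max_map_count())` tells you whether the sysctl command is still needed. Note that a value set with `sysctl -w` does not survive a reboot; persisting it usually means adding the setting to /etc/sysctl.conf.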

Change the cluster and node names

./bin/elasticsearch -Ecluster.name=my_cluster -Enode.name=my_node

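By default, Elasticsearch uses port 9200 to provide access to its REST API. Once startup completes, open another terminal window and send a request to that port; the response describes the node.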

curl localhost:9200

{
  "name" : "my_node",
  "cluster_name" : "my_cluster",
  "cluster_uuid" : "tf9250XhQ6ee4h7YI11anA",
  "version" : {
    "number" : "6.0.1",
    "build_hash" : "19c13d0",
    "build_date" : "2018-10-24T20:44:24.823Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}
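The same check can be scripted. The Python sketch below uses only the standard library; `fetch_info` and `summarize` are hypothetical helper names, and the sample data is an abridged copy of the response above so that the snippet runs without a server.

```python
import json
from urllib.request import urlopen


def fetch_info(url="http://localhost:9200"):
    """GET the root endpoint of a running node and decode the JSON reply."""
    with urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(info):
    """Pick out node name, cluster name, and version from the reply."""
    return "{} @ {} (ES {})".format(
        info["name"], info["cluster_name"], info["version"]["number"])


# Abridged copy of the response shown above, used instead of a live request.
sample = json.loads("""
{"name": "my_node", "cluster_name": "my_cluster",
 "version": {"number": "6.0.1"}, "tagline": "You Know, for Search"}
""")
summary = summarize(sample)
print(summary)
```

With a node running locally, `summarize(fetch_info())` would produce the same kind of summary from the live response.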

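PS: By default, ES only allows access from the local machine. To allow remote access, edit config/elasticsearch.yml in the Elasticsearch installation directory, uncomment network.host, and set its value to 0.0.0.0 so that anyone can connect (do not do this in production), then restart ES. You can also change the cluster and node names in the same file.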

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster [cluster name]:
#
cluster.name: my_cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node [node name]:
#
node.name: my_node
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma) [data path]:
#
#path.data: /path/to/data
#
# Path to log files [log path]:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6) [bind IP address]:
#
network.host: 0.0.0.0
#
# Set a custom port for HTTP [port]:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started [cluster setup]:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1) [split-brain protection]:
#
#discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
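
The split-brain comment in the Discovery section gives the quorum as (total number of master-eligible nodes / 2 + 1). A minimal Python sketch of that formula follows, assuming the division is integer division; the function name simply mirrors the setting and is not an ES API.

```python
def minimum_master_nodes(master_eligible):
    """Quorum from the config comment: master-eligible nodes / 2 + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division, so 3 nodes give 2 and 5 nodes give 3.
    return master_eligible // 2 + 1


for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n))
```

The result is the value that would go into discovery.zen.minimum_master_nodes for a cluster with that many master-eligible nodes.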