Continued from the previous post: Notes on Upgrading Elasticsearch 5.1.1 to 6.7.2 (1)
Picking up where we left off: startup had failed. Looking closely at the installation output, a few warnings caught my attention:
```
Updating / installing...
   1:elasticsearch-0:6.7.2-1
warning: /etc/elasticsearch/elasticsearch.yml created as /etc/elasticsearch/elasticsearch.yml.rpmnew
warning: /etc/sysconfig/elasticsearch created as /etc/sysconfig/elasticsearch.rpmnew
warning: /usr/lib/systemd/system/elasticsearch.service created as /usr/lib/systemd/system/elasticsearch.service.rpmnew
```
Clearly, because the config files already existed, the Elasticsearch installer did not overwrite them; instead it dropped the new versions next to the old ones with a .rpmnew suffix. So we need to merge the Elasticsearch config files by hand. This is a standard part of any Elasticsearch upgrade, so anyone doing one should already be familiar with it and I won't go into detail.
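The merge step can be rehearsed like this. A minimal sketch using throwaway copies, since the real files live under /etc/elasticsearch; the file contents below are stand-ins, not the actual configs:

```shell
# Rehearse the .rpmnew merge on scratch copies (the real files are
# /etc/elasticsearch/elasticsearch.yml and elasticsearch.yml.rpmnew).
old=$(mktemp) && new=$(mktemp)
printf 'cluster.name: my-cluster\npath.data: /var/lib/elasticsearch\n' > "$old"
printf 'cluster.name: my-cluster\npath.data: /var/lib/elasticsearch\ndiscovery.zen.minimum_master_nodes: 1\n' > "$new"
# See what the 6.7.2 package changed or added before merging by hand:
changes=$(diff -u "$old" "$new" || true)
echo "$changes"
# After folding the changes into elasticsearch.yml, delete the .rpmnew copy.
rm -f "$old" "$new"
```

On the real host the same `diff -u` (or `vimdiff`) against the `.rpmnew` file shows exactly which defaults the new version introduced.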
With the config files merged, you would think it can start now, right? Naive! It immediately threw another error:
```
[root@LPT0268 elasticsearch]# service elasticsearch start
Starting elasticsearch (via systemctl):                    [  OK  ]
[yuliangwang@LPT0268 ~]$ systemctl status elasticsearch.service
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2019-06-28 16:51:37 CST; 4s ago
     Docs: http://www.elastic.co
  Process: 11905 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet -Edefault.path.logs=${LOG_DIR} -Edefault.path.data=${DATA_DIR} -Edefault.path.conf=${CONF_DIR} (code=exited, status=1/FAILURE)
  Process: 13624 ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=203/EXEC)
 Main PID: 11905 (code=exited, status=1/FAILURE)
```
As you can see, the error was still there, and the /var/log/elasticsearch/ directory contained no logs at all. I was stuck on this for quite a while, until on a whim I went to the $ES_HOME/bin directory and ran the elasticsearch script directly. At last, an actual error message:
```
Jul 01 10:18:06 LPT0268 elasticsearch[1345]: Exception in thread "main" org.elasticsearch.bootstrap.BootstrapException: java.io.EOFException: read past EOF: SimpleFSIndexInput(path="/etc/ela...rch.keystore")
Jul 01 10:18:06 LPT0268 elasticsearch[1345]: Likely root cause: java.io.EOFException: read past EOF: SimpleFSIndexInput(path="/etc/elasticsearch/elasticsearch.keystore")
Jul 01 10:18:06 LPT0268 elasticsearch[1345]:         at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:336)
Jul 01 10:18:06 LPT0268 elasticsearch[1345]:         at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
Jul 01 10:18:06 LPT0268 elasticsearch[1345]:         at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
Jul 01 10:18:06 LPT0268 elasticsearch[1345]:         at org.apache.lucene.store.DataInput.readInt(DataInput.java:101)
Jul 01 10:18:06 LPT0268 elasticsearch[1345]:         at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)
Jul 01 10:18:06 LPT0268 systemd[1]: elasticsearch.service: main process exited, code=exited, status=1/FAILURE
Jul 01 10:18:06 LPT0268 systemd[1]: Unit elasticsearch.service entered failed state.
Jul 01 10:18:06 LPT0268 systemd[1]: elasticsearch.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
```
So the culprit was the elasticsearch.keystore file we created by hand earlier. Checking the documentation revealed that you must never create elasticsearch.keystore manually: it is Elasticsearch's built-in secure-settings keystore file, and it has to be created with the dedicated command:
```shell
sudo bin/elasticsearch-keystore create
```
Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/secure-settings.html
The official docs also explain why our config-file edits had no effect on the error: Elasticsearch loads elasticsearch.keystore first, and only then reads the config file.
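The fix, then, is to delete the hand-made file and let the tool recreate it. A sketch of that cleanup, rehearsed here against a scratch directory so it runs anywhere; on the real host the path is /etc/elasticsearch and the commented commands need root:

```shell
# Scratch directory standing in for /etc/elasticsearch.
conf=$(mktemp -d)
touch "$conf/elasticsearch.keystore"   # stand-in for the hand-created (corrupt) file
rm -f "$conf/elasticsearch.keystore"   # remove it, then recreate it properly:
# sudo /usr/share/elasticsearch/bin/elasticsearch-keystore create
# (make sure the resulting file is readable by the elasticsearch user)
test ! -e "$conf/elasticsearch.keystore" && echo "stale keystore removed"
```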
Start ES again, and... still an error (inner collapse; it took a whole weekend of fixing before I recovered):
```
[yuliangwang@LPT0268 bin]$ systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/elasticsearch.service.d
           └─override.conf
   Active: failed (Result: exit-code) since Mon 2019-07-01 11:04:32 CST; 20s ago
     Docs: http://www.elastic.co
  Process: 4898 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet (code=exited, status=78)
 Main PID: 4898 (code=exited, status=78)
```
The good news is that we finally have logs. Open the log under /var/log/elasticsearch/; the relevant part reads:
```
[2019-07-01T10:54:12,406][WARN ][o.e.b.JNANatives] [unknown] Unable to lock JVM Memory: error=12, reason=Cannot allocate memory
[2019-07-01T10:54:12,409][WARN ][o.e.b.JNANatives] [unknown] This can result in part of the JVM being swapped out.
[2019-07-01T10:54:12,409][WARN ][o.e.b.JNANatives] [unknown] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536
[2019-07-01T10:54:12,409][WARN ][o.e.b.JNANatives] [unknown] These can be adjusted by modifying /etc/security/limits.conf, for example:
	# allow user 'elasticsearch' mlockall
	elasticsearch soft memlock unlimited
	elasticsearch hard memlock unlimited
[2019-07-01T10:54:12,409][WARN ][o.e.b.JNANatives] [unknown] If you are logged in interactively, you will have to re-login for the new limits to take effect.
......
[2019-07-01T10:54:21,051][ERROR][o.e.b.Bootstrap] [G1bC4Hf] node validation exception
[1] bootstrap checks failed
[1]: memory locking requested for elasticsearch process but memory is not locked
```
This clearly points at memory locking. It happens because we enabled the swap-disabling setting in the config:
```yaml
bootstrap.memory_lock: true
```
Disabling swap keeps the OS from paging JVM memory out to disk, which according to the official docs prevents very slow GCs:
https://www.elastic.co/guide/en/elasticsearch/reference/6.7/setup-configuration-memory.html
Following the docs, create /etc/systemd/system/elasticsearch.service.d/override.conf with:

```ini
[Service]
LimitMEMLOCK=infinity
```
Then reload systemd:

```shell
sudo systemctl daemon-reload
```
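The whole drop-in step can be sketched like this, written against a scratch path so it runs anywhere; on the real host the file is the /etc/systemd/system/elasticsearch.service.d/override.conf above and the commented commands need root:

```shell
# Write the drop-in to a scratch location (real path:
# /etc/systemd/system/elasticsearch.service.d/override.conf).
dropin_dir=$(mktemp -d)
printf '[Service]\nLimitMEMLOCK=infinity\n' > "$dropin_dir/override.conf"
cat "$dropin_dir/override.conf"
# Then, on the real host:
#   sudo systemctl daemon-reload
#   sudo systemctl restart elasticsearch
#   systemctl show elasticsearch | grep LimitMEMLOCK   # should now report infinity
```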
Start it once more, and it finally comes up:
```
[root@LPT0268 elasticsearch]# sudo service elasticsearch restart
Restarting elasticsearch (via systemctl):                  [  OK  ]
[root@LPT0268 elasticsearch]# systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/elasticsearch.service.d
           └─override.conf
   Active: active (running) since Mon 2019-07-01 13:57:51 CST; 14s ago
     Docs: http://www.elastic.co
 Main PID: 15294 (java)
   CGroup: /system.slice/elasticsearch.service
           ├─15294 /bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negat...
           └─15375 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller
```
Port 9200 now reports version 6.7.2:
```json
{
  "name": "G1bC4Hf",
  "cluster_name": "psylocke-fws-oy",
  "cluster_uuid": "PDI23Ik4TAGx10mMocqGLQ",
  "version": {
    "number": "6.7.2",
    "build_flavor": "default",
    "build_type": "rpm",
    "build_hash": "56c6e48",
    "build_date": "2019-04-29T09:05:50.290371Z",
    "build_snapshot": false,
    "lucene_version": "7.7.0",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}
```
After upgrading each machine in the cluster in turn, start the cluster and check its health with `GET _cat/health`. At this point the cluster is red; `GET _cat/shards` shows that all primary shards have started, but the replica shards are still unassigned. Re-enable cluster routing with the following command:
```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
```
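If you are not working in Kibana's Dev Tools, the same call can be made with curl; the host and port below are the defaults and may differ in your setup, and the curl line is commented out so the sketch runs without a live cluster:

```shell
# Build and sanity-check the request body, then send it to the cluster.
body='{"transient":{"cluster.routing.allocation.enable":"all"}}'
echo "$body" | python3 -m json.tool     # validate the JSON payload
# curl -XPUT 'http://localhost:9200/_cluster/settings' \
#      -H 'Content-Type: application/json' -d "$body"
```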
The cluster turns yellow and begins to recover:
```
1561970784 08:46:24 psylocke-fws-oy yellow 1 1 17 17 0 0 15 0 - 53.1%
```
Once the cluster has fully recovered, the upgrade is complete!