When recovering a MariaDB Galera Cluster after a failure, you will often find that a node refuses to start, and starting the service reports an error:
systemctl start mariadb
Job for mariadb.service failed because the control process exited with error code. See "systemctl status mariadb.service" and "journalctl -xe" for details.
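Normally, if the cluster still has at least one surviving node, an offline node only needs to run systemctl start mariadb to rejoin the cluster. If all nodes have gone offline, however, the error above appears, and the startup order has to be determined manually. First check the /var/lib/mysql/grastate.dat file on each node. Taking a test environment as an example, the mariadb service is currently stopped on both nodes, and the contents of their grastate.dat files are: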
# GALERA saved state
version: 2.1
uuid: 44f8dbe5-1271-11eb-8206-1e1a48859dc8
seqno: 157035
safe_to_bootstrap: 0

# GALERA saved state
version: 2.1
uuid: 44f8dbe5-1271-11eb-8206-1e1a48859dc8
seqno: 157036
safe_to_bootstrap: 1
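Both files carry the same uuid, but seqno and safe_to_bootstrap differ. The node with the highest seqno holds the most recently committed state, so it is the one to start first, and it usually has safe_to_bootstrap=1 (here, the node with seqno 157036). Start that node with galera_new_cluster, then start the remaining nodes with systemctl start mariadb, and the cluster recovers.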
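The selection rule above can be sketched in a few lines of Python. This is only an illustration, not part of MariaDB or Galera: the helper names `parse_grastate` and `pick_bootstrap_node` and the node labels `node1` and `node2` are made up for this example, and the file contents are pasted in as strings rather than read from /var/lib/mysql/grastate.dat on each host.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GraState:
    uuid: str
    seqno: int
    safe_to_bootstrap: int


def parse_grastate(text: str) -> GraState:
    """Parse the 'key: value' lines of a grastate.dat file (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '# GALERA saved state' header
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return GraState(
        uuid=fields["uuid"],
        seqno=int(fields["seqno"]),
        safe_to_bootstrap=int(fields.get("safe_to_bootstrap", "0")),
    )


def pick_bootstrap_node(states: Dict[str, GraState]) -> str:
    """Return the name of the node with the highest seqno."""
    if len({s.uuid for s in states.values()}) != 1:
        raise ValueError("nodes report different cluster uuids")
    return max(states, key=lambda name: states[name].seqno)


# Contents of grastate.dat from the two test nodes (labels are assumed).
NODE1 = """# GALERA saved state
version: 2.1
uuid: 44f8dbe5-1271-11eb-8206-1e1a48859dc8
seqno: 157035
safe_to_bootstrap: 0
"""
NODE2 = """# GALERA saved state
version: 2.1
uuid: 44f8dbe5-1271-11eb-8206-1e1a48859dc8
seqno: 157036
safe_to_bootstrap: 1
"""

states = {"node1": parse_grastate(NODE1), "node2": parse_grastate(NODE2)}
first = pick_bootstrap_node(states)
print(first)  # node2
```

The node it names is the one to start with galera_new_cluster. The sketch compares seqno only and treats safe_to_bootstrap as a cross-check, which matches the two files shown here.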