Galera MariaDB Cluster Recovery Strategy

Date: 2019-12-09

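1 Galera MariaDB
MariaDB is a database that can be regarded as a fork of MySQL. MySQL was acquired by Sun, and Sun was later acquired by Oracle, so MySQL faced the risk of becoming closed source. Michael Widenius, the founder of MySQL, left Sun and developed a new branch based on the MySQL code, named it MariaDB, and made it fully open source.

Galera here means Galera Cluster, a shared-nothing, highly redundant high-availability solution designed for databases. galera mariadb is a MariaDB cluster with the Galera plugin integrated. Galera is multi-master by nature, so galera mariadb is not a traditional primary/standby cluster but a multi-master architecture.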

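2 How galera mariadb is configured
One of my blog posts on OpenStack high-availability modules has a section that describes how to set up galera mariadb (2.2.1 Database service high-availability configuration): OpenStack High-Availability Solutions and Configuration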

3 Some basic galera mariadb concepts
(1) Database state of the current node

MariaDB [(none)]> show status like 'wsrep_local_state_comment';
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+

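State reference table:

State	Description
Open	The node has started successfully and is trying to connect to the cluster
Primary	The node is already in the cluster; this state occurs while a donor is being selected for data synchronization when a new node joins
Joiner	The node is waiting to receive, or is currently receiving, the state transfer files
Joined	The node has finished the state transfer but part of its data is not yet up to date; it is catching up with the cluster
Synced	The node is serving normally; its data is consistent with the cluster's data
Donor	The node has been chosen as the donor and is performing a full data sync for a newly joined node; during this time it does not serve clients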
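
As an illustration, the state value can be pulled out of the boxed output shown above with a few lines of Python. This is only a sketch: the names `parse_status_value` and `is_synced` are made up for this example, and it assumes the text comes from the mysql client's table format.

```python
def parse_status_value(table_text, variable="wsrep_local_state_comment"):
    """Return the value of `variable` from mysql's boxed table output, or None."""
    for line in table_text.splitlines():
        line = line.strip()
        # Data rows start with "|"; border rows start with "+".
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] == variable:
            return cells[1]
    return None


def is_synced(table_text):
    """True only when the node reports the Synced state."""
    return parse_status_value(table_text) == "Synced"
```

A monitoring script could feed it the captured output of the `show status` query and treat anything other than Synced as "not ready to serve".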

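(2) Primary Component
When the network fails, the cluster may be split into several smaller groups of nodes because of connectivity problems, but only one of them may continue to modify data. That part of the cluster is called the Primary Component.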
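
Which part stays primary is decided by quorum: roughly, the part that still holds more than half of the nodes. A minimal sketch of that rule follows. It assumes every node has equal weight and ignores nodes that left the cluster gracefully, and `is_primary_component` is a hypothetical helper name, not a Galera API.

```python
def is_primary_component(partition_size, previous_cluster_size):
    """Simplified quorum rule: strictly more than half of the previous membership.

    Assumption: all nodes carry the same weight.
    """
    return 2 * partition_size > previous_cluster_size
```

With three nodes, a partition of two keeps quorum and a partition of one does not. With four nodes split two and two, neither side has quorum, which is one reason odd node counts are usually preferred.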

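(3) GTID
GTID stands for Global Transaction ID. It consists of a UUID and a sequence number offset. It is the cluster-wide global transaction id defined in the wsrep API: an ordered id that uniquely identifies each state change in the cluster and its offset in the queue.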
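
To make the two parts concrete, here is a small sketch that splits a GTID written in the usual `<uuid>:<seqno>` text form and compares two of them. The names `Gtid`, `parse_gtid` and `is_newer` are invented for this example.

```python
from collections import namedtuple

Gtid = namedtuple("Gtid", ["uuid", "seqno"])


def parse_gtid(text):
    """Split '<uuid>:<seqno>' into its two parts."""
    uuid, _, seqno = text.strip().rpartition(":")
    if not uuid:
        raise ValueError("expected '<uuid>:<seqno>', got %r" % (text,))
    return Gtid(uuid, int(seqno))


def is_newer(a, b):
    """True if a is ahead of b; only meaningful within the same cluster UUID."""
    if a.uuid != b.uuid:
        raise ValueError("GTIDs belong to different cluster histories")
    return a.seqno > b.seqno
```

Sequence numbers are only comparable when the UUID part is the same, which is why the comparison refuses to mix them.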

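(4) SST
SST stands for State Snapshot Transfer: a complete copy of the data (a full copy) is transferred from one node to another. When a new node joins the cluster, it synchronizes its data from an existing node by starting a state snapshot transfer.
Galera has two different approaches to state transfer:
<1> Logical transfer: uses the mysqldump command. This is a blocking method.
<2> Physical transfer: uses rsync, rsync_wan, xtrabackup or similar methods to copy the data directly between servers; the receiving server starts its service after the copy has finished.
The SST method can be changed in the configuration file:
wsrep_sst_method=rsync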

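(5) IST
IST stands for Incremental State Transfer: a node in the cluster identifies the transactions that the joining node is missing and sends only those, rather than copying the full data set as SST does. The most common case is a node that was already part of the cluster and was merely shut down and restarted; when it rejoins the cluster it is synchronized through IST.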

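(6) grastate.dat
This file shows the uuid and seqno recorded by the node, which together form the GTID described above. When a node leaves the Galera cluster cleanly, it writes the GTID value to this file, as shown below: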

[root@abc3 ~]# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 30ae87da-8e8e-11e8-810c-6a8da854119b
seqno: 33557
safe_to_bootstrap: 0

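If the database service on the node is running, seqno is -1. It also stays -1 if the node stopped abnormally, for example after a power loss.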
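
Reading the file programmatically is straightforward. The sketch below parses the `key: value` lines shown above into a dictionary; `parse_grastate` is a name chosen for this example, and it assumes the file has exactly the simple layout of the sample.

```python
def parse_grastate(text):
    """Parse grastate.dat content into a dict, skipping comments and blank lines."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    # Convert the fields that the recovery logic compares numerically.
    if "seqno" in state:
        state["seqno"] = int(state["seqno"])
    if "safe_to_bootstrap" in state:
        state["safe_to_bootstrap"] = state["safe_to_bootstrap"] == "1"
    return state
```

A caller would typically pass in the content of /var/lib/mysql/grastate.dat and look at `state["seqno"]`.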

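(7) gvwstate.dat
When a node forms or changes the Primary Component, it creates or updates this file so that the node keeps the latest Primary Component state. If the node shuts down cleanly, the file is deleted.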
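
The recovery strategy later compares two fields of this file. The sketch below shows one way to do that comparison. It assumes the file contains a `my_uuid: <uuid>` line and a `view_id: <type> <uuid> <seq>` line; that layout is an assumption of this example and should be checked against a real file. `gvwstate_uuid_matches_view` is a hypothetical helper name.

```python
def gvwstate_uuid_matches_view(text):
    """True if my_uuid equals the UUID recorded in the view_id line.

    Assumed layout (not verified here):
        my_uuid: <uuid>
        view_id: <view type> <uuid> <sequence>
    """
    my_uuid = None
    view_uuid = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "my_uuid:":
            my_uuid = fields[1]
        elif len(fields) >= 3 and fields[0] == "view_id:":
            view_uuid = fields[2]
    return my_uuid is not None and my_uuid == view_uuid
```

If the file is missing or either field is absent, the function returns False, so such a node is never picked on this basis.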

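4 Recovering from some failure scenarios
(1) Scenario 1

One node is down. Usually it is enough to restart the service on that node.

(2) Scenario 2

All nodes are down. You cannot simply restart every service; you have to find the node with the most recent state and start it with the --wsrep-new-cluster option. Once that node is up, start the service on the other nodes normally.
The key question here is how to find the node with the most recent state. Section 5 describes the strategy for finding it.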

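5 Recovery strategy and automatic recovery script
(1) Recovery strategy
<1> First check whether the database service is running on any node in the cluster. If it is, simply start the local service.
<2> If the database service is down on every node, find the node with the most recent state and start it with the --wsrep-new-cluster option. Once it is up, start the service on the other nodes normally.
Strategy for finding the most recent node:
First read the seqno value from each node's grastate.dat; the node with the largest value is the most recent. If seqno is -1 on every node, compare my_uuid with view_id in each node's gvwstate.dat; the node where they are equal becomes the first node to start, and once it is up the other nodes start normally. If no node is found this way either, manual intervention is needed.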
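
The selection rule can be condensed into one pure function, which makes it easy to test separately from any networking. This is a sketch of the strategy as described, not a drop-in part of any script: `choose_bootstrap_node` and its two dictionary arguments (node IP to seqno, node IP to 0/1 match flag) are names assumed for this example, and ties are broken by picking the smallest IP string so that every node reaches the same answer.

```python
def choose_bootstrap_node(seqnos, uuid_matches_view):
    """Pick the node that should start with --wsrep-new-cluster, or None.

    seqnos: dict of node IP -> seqno from grastate.dat (-1 if unknown)
    uuid_matches_view: dict of node IP -> 1/0 from the gvwstate.dat comparison
    """
    best = max(seqnos.values()) if seqnos else -1
    if best > -1:
        # Highest seqno wins; the smallest IP string breaks ties deterministically.
        return min(ip for ip, seqno in seqnos.items() if seqno == best)
    # Every seqno is -1: fall back to the gvwstate.dat comparison.
    candidates = sorted(ip for ip, ok in uuid_matches_view.items() if ok)
    return candidates[0] if candidates else None
```

A return value of None corresponds to the case where an administrator has to step in.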
Below is the automatic recovery script I wrote:

#!/usr/bin/python2
# -*- coding: utf-8 -*-

import json
import logging
import os
import socket
import sys
import time
import traceback

# Initialize the logger
logger = logging.getLogger("check-or-recover-galera")
log_file = '/var/log/check-or-recover-galera/check-or-recover-galera.log'
if not os.path.exists(log_file):
    os.system('mkdir -p /var/log/check-or-recover-galera/')
    os.system('touch ' + log_file)

formatter = logging.Formatter('%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s')
file_handler = logging.FileHandler(log_file)
file_handler.setFormatter(formatter)

logger.addHandler(file_handler)
logger.setLevel(logging.DEBUG)

PORT = 10000
BUFF_SIZE = 10240

def test_connect_ok(ip):
    client_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_sock.settimeout(3)
    client_sock.connect((ip, PORT))
    client_sock.close()

# This function requires a process on the remote node that listens on PORT and handles the commands
def send_request(ip, data, timeout=60):
    test_connect_ok(ip)
    client_sock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    client_sock.settimeout(timeout)
    client_sock.connect((ip, PORT))
    client_sock.send(data)
    ret_data = client_sock.recv(BUFF_SIZE)
    client_sock.close()
    return ret_data
    
def remote_send_request(ip, data, timeout=60):
    res_remote = send_request(ip, json.dumps(data), timeout=timeout)
    if res_remote is None or res_remote == '':
        raise Exception('res_remote is null')
    res_remote = json.loads(res_remote)
    if res_remote['ret_state'] != 'success':
        raise Exception('ret_state is not success')
    return res_remote
    
# Assumes the vmbr0 interface carries the local IP
def get_local_ip():
    cmd_out = os.popen('cat /etc/sysconfig/network-scripts/ifcfg-vmbr0 2>/dev/null |grep IPADDR').read()
    if cmd_out and cmd_out != '':
        cmd_out = cmd_out.strip()
        cmd_out = cmd_out.replace('"', '').replace(' ', '')
        tmp = cmd_out.split('=')
        if len(tmp) >= 2:
            ip = tmp[1]
            return ip
    return None
    
# Get the seqno value of every node
def get_all_nodes_seqno(node_ips_arr):
    seqno_dict = {}
    data = {'req_type': 'get_seqno'}
    for node_ip in node_ips_arr:
        try:
            res_remote = remote_send_request(node_ip, data)
            # Normalize to int in case the remote side returns the value as a string
            seqno_dict[node_ip] = int(res_remote['seqno'])
        except Exception:
            seqno_dict[node_ip] = -1
            logger.error(traceback.format_exc())
    return seqno_dict

# Get, for every node, the result of comparing my_uuid and view_id in its gvwstate.dat
def get_all_nodes_uv_is_equal(node_ips_arr):
    uv_equal_dict = {}
    data = {'req_type': 'get_uv_equal_value'}
    for node_ip in node_ips_arr:
        try:
            res_remote = remote_send_request(node_ip, data)
            uv_equal_dict[node_ip] = res_remote['equal']
        except Exception:
            uv_equal_dict[node_ip] = 0
            logger.error(traceback.format_exc())
    return uv_equal_dict

# Check whether the local mariadb service is already running
def check_is_active_now():
    is_active = os.popen('systemctl is-active mysqld_safe 2>/dev/null').read()
    is_active = is_active.strip()
    if is_active and is_active == 'active':
        logger.info('the mariadb is already up')
        return True
    return False
    
# Start this node as the first node (bootstraps a new cluster)
def start_mariadb_with_wsrep():
    os.system("sed -i 's/--wsrep-new-cluster//' /usr/lib/systemd/system/mysqld_safe.service")
    os.system("sed -i 's/user=mysql/user=mysql --wsrep-new-cluster/' /usr/lib/systemd/system/mysqld_safe.service")
    os.system("sed -i 's/safe_to_bootstrap:.*/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat")
    os.system('systemctl daemon-reload')
    os.system('systemctl start mysqld_safe')
    # Restore the unit file to its original content
    os.system("sed -i 's/--wsrep-new-cluster//' /usr/lib/systemd/system/mysqld_safe.service")
    os.system('systemctl daemon-reload')
    time.sleep(10)
    if check_is_active_now() is True:
        return True
    else:
        logger.error('use option wsrep-new-cluster start mariadb failed')
    return False
    
    
def main():
    while True:
        try:
            time.sleep(10)
            # First check whether the local mariadb has already come up by itself
            if check_is_active_now() is True:
                time.sleep(60)
                continue
            
            # Check that the thintaskd service has started; if not, wait and retry on the next loop
            is_thintaskd_active = os.popen('/etc/init.d/thintaskd status 2>/dev/null |grep active |grep running').read()
            if not is_thintaskd_active or is_thintaskd_active == '':
                logger.info('wait thintaskd service start')
                time.sleep(5)
                continue
            
            # Get the IPs of all nodes in the current galera cluster
            node_ips_info = os.popen("cat /etc/my.cnf.d/mariadb-server.cnf |grep '^wsrep_cluster_address'").read()
            node_ips_str = node_ips_info.split('gcomm://')[1]
            node_ips_str = node_ips_str.strip()
            node_ips_arr = node_ips_str.split(',')
            
            # Check whether mariadb is already running on any other node
            data = {'req_type': 'check_mariadb_service'}
            has_mariadb_service_on = False
            for node_ip in node_ips_arr:
                try:
                    res_remote = remote_send_request(node_ip, data)
                    state = res_remote['state']
                    if state == 'active':
                        has_mariadb_service_on = True
                        # Found a node that is running
                        logger.info('find the running mariadb service node:' + node_ip)
                        # Simply start the local service
                        os.system('systemctl start mysqld_safe')
                        time.sleep(10)
                        if check_is_active_now() is True:
                            time.sleep(60)
                        else:
                            logger.info('start mariadb service error')
                        break
                except Exception as e:
                    logger.error(traceback.format_exc())
                    logger.error('check_mariadb_service for ' + node_ip + ' failed, error:' + str(e))
            if has_mariadb_service_on is True:
                continue
                    
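            # If mariadb is not running on any node, pick one node to start first
            seqno_dict = get_all_nodes_seqno(node_ips_arr)
            logger.info('get seqno_dict:%s', seqno_dict)
            # Pick the first node by seqno. Start from -1 so that nodes reporting -1
            # are never selected here and the gvwstate.dat fallback below stays reachable
            first_boot_node = None
            max_seqno = -1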
            for key in seqno_dict:
                if seqno_dict[key] > max_seqno:
                    max_seqno = seqno_dict[key]
                    first_boot_node = key
            if first_boot_node is not None:
                logger.info('find the first_boot_node by seqno, first_boot_node:' + first_boot_node)
                # If the first node is this node, start it; otherwise wait for the other node to come up
                if first_boot_node == get_local_ip():
                    if start_mariadb_with_wsrep() is True:
                        time.sleep(60)
                else:
                    logger.info('wait node ' + first_boot_node + ' start mariadb service')
                    time.sleep(5)
                continue
            else:
                logger.info("all node's seqno is -1")
                
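            # If every node's seqno is -1, all hosts probably stopped abnormally, e.g. after a power loss
            # In that case compare my_uuid and view_id in gvwstate.dat to decide which node to start from
            # When the cluster is stopped cleanly, that file is deleted
            uv_equal_dict = get_all_nodes_uv_is_equal(node_ips_arr)
            # Use the returned values to find the first node: 1 means yes, 0 means no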
            for key in uv_equal_dict:
                if uv_equal_dict[key] == 1:
                    first_boot_node = key
                    logger.info('find the first_boot_node by uv_equal_dict, first_boot_node:' + first_boot_node)
                    break
            if first_boot_node is not None:
                # If the first node is this node, start it; otherwise wait for the other node to come up
                if first_boot_node == get_local_ip():
                    if start_mariadb_with_wsrep() is True:
                        time.sleep(60)
                else:
                    logger.info('wait node ' + first_boot_node + ' start mariadb service')
                    time.sleep(5)
                continue
            else:
                logger.info("can not find first_boot_node by gvwstate.dat file")
                
            # If no first node was found after all the steps above, manual intervention is needed (or a node could be picked at random)
            logger.error('can not find first_boot_node, maybe you should ask admin to deal with this problem')
            time.sleep(5)
        except Exception as e:
            logger.error(traceback.format_exc())
            logger.error('error:' + str(e))
        
if __name__ == "__main__":
    sys.exit(main())
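
The script depends on a process on every node that listens on PORT and answers its JSON requests, and that process is not shown here. The following is a hypothetical sketch of the request handling only, written as a pure function. The request types and the reply keys (`ret_state`, `state`, `seqno`, `equal`) are the ones the script above reads; the function name `handle_request`, its arguments, and the `'error'` value are assumptions made for this example.

```python
import json


def handle_request(raw_request, service_state, seqno, uuid_matches_view):
    """Build the JSON reply for one request from the recovery script.

    service_state: e.g. the output of `systemctl is-active`, such as 'active'
    seqno: the seqno read from the local grastate.dat
    uuid_matches_view: whether my_uuid equals view_id in the local gvwstate.dat
    """
    try:
        request = json.loads(raw_request)
    except ValueError:
        request = None
    req_type = request.get("req_type") if isinstance(request, dict) else None

    if req_type == "check_mariadb_service":
        reply = {"state": service_state}
    elif req_type == "get_seqno":
        reply = {"seqno": seqno}
    elif req_type == "get_uv_equal_value":
        reply = {"equal": 1 if uuid_matches_view else 0}
    else:
        # Unknown or malformed request; the client treats anything but 'success' as failure.
        return json.dumps({"ret_state": "error"})
    reply["ret_state"] = "success"
    return json.dumps(reply)
```

A real agent would wrap this in a socket server bound to PORT and gather the three local values before calling it.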

Below is the custom mysqld_safe.service unit file; you can place it at /usr/lib/systemd/system/mysqld_safe.service

[Unit]
Description=Thinputer API Server
After=syslog.target network.target

[Service]
Type=notify
NotifyAccess=all
TimeoutStartSec=0
User=root

ExecStartPre=/usr/libexec/mysql-check-socket
ExecStartPre=/usr/libexec/mysql-prepare-db-dir %n
ExecStart=/bin/mysqld_safe --defaults-file=/etc/my.cnf.d/mariadb-server.cnf --user=mysql


[Install]
WantedBy=multi-user.target
