MHA (Master HA) monitors the master node and can fail over automatically to one of the slaves by promoting that slave to be the new master. It is built on top of master/slave replication and also needs some cooperation on the client side. MHA currently supports mainly a one-master, multiple-slave architecture: to build MHA, a replication cluster needs at least three database servers, one master and two slaves, that is one acting as master, one as standby master and one as an ordinary slave. If budget allows, a dedicated server can also be used as the MHA monitoring/management node.
How MHA works
Related packages
The lab environment is shown below
First, download the MHA packages from the official site. Note: ××× is required.
vim /etc/my.cnf
Add the following to the [mysqld] block:
[mysqld]
log-bin                   #enable binary logging
server_id=1               #id that must be unique among all nodes
innodb_file_per_table     #store each table's data and structure in separate files
skip_name_resolve=1       #skip DNS resolution
yum install mha4mysql-node-0.54-1.el5.noarch.rpm
systemctl start mariadb
systemctl enable mariadb
"mysql_secure_installation" 第一項問你:輸入root密碼 回車便可,由於沒有 第二項問你:須要設置root密碼麼, 第三項問你:須要刪除空帳號用戶麼, 第四項問你:禁止root用戶遠程登入麼, 第五項問你:須要刪除test測試數據庫麼, 第六項問你:如今從新加載權限表嗎 ,
1. Create a user account with replication privileges
GRANT REPLICATION SLAVE ON *.* TO 'repluser'@'HOST' IDENTIFIED BY 'replpass';
Explanation:
'repluser'@'HOST' : the user name plus the host IP or network it may log in from; a network segment is written with %, e.g. 10.0.0.%
IDENTIFIED BY : sets the password
*.* : all databases, all tables
GRANT REPLICATION SLAVE : allows this user to replicate data
In short, this command authorizes repluser to copy everything in the databases.
2. Create the MHA management user
GRANT ALL ON *.* TO 'mha'@'HOST' IDENTIFIED BY 'replpass';
Explanation:
'mha'@'HOST' : the user name plus the host IP or network it may log in from; a network segment is written with %, e.g. 10.0.0.%
IDENTIFIED BY : sets the password
*.* : all databases, all tables
GRANT ALL : this user gets all privileges
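For reference, a filled-in version of these two grants, as a minimal sketch that assumes a 192.168.68.0/24 lab network and the placeholder password replpass used above (adjust both to your environment), run on the master:

GRANT REPLICATION SLAVE ON *.* TO 'repluser'@'192.168.68.%' IDENTIFIED BY 'replpass';
GRANT ALL ON *.* TO 'mha'@'192.168.68.%' IDENTIFIED BY 'replpass';
FLUSH PRIVILEGES;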
vim /etc/my.cnf
server_id=2               #id that must be unique among all nodes
read_only                 #ordinary users only get read access; superusers are not restricted
log-bin
relay_log_purge=0         #do not purge relay logs automatically
skip_name_resolve=1
innodb_file_per_table
"Note: normally a slave in MySQL master/slave replication does not need binary logging (log-bin).
Why enable it here? Because under MHA management, when the original master goes down, MHA
automatically promotes the slave whose data lags the least to be the new master, and a master
must have binary logging enabled, so log-bin has to be added."
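After restarting MariaDB, a quick sanity check that the options above took effect (a sketch using the standard variable names):

MariaDB [(none)]> SHOW VARIABLES WHERE Variable_name IN ('server_id','log_bin','read_only','relay_log_purge');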
yum install mha4mysql-node-0.54-1.el5.noarch.rpm
systemctl start mariadb
systemctl enable mariadb
"mysql_secure_installation" 第一項問你:輸入root密碼 回車便可,由於沒有 第二項問你:須要設置root密碼麼, 第三項問你:須要刪除空帳號用戶麼, 第四項問你:禁止root用戶遠程登入麼, 第五項問你:須要刪除test測試數據庫麼, 第六項問你:如今從新加載權限表嗎 ,
1. Connect to the master using the account that has replication privileges
CHANGE MASTER TO
  MASTER_HOST='master_host',            #the master's IP
  MASTER_USER='repluser',               #the user authorized on the master
  MASTER_PASSWORD='replpass',           #that user's password
  MASTER_LOG_FILE='mysql-bin.xxxxx',    #which master binary log to start replicating from
  MASTER_LOG_POS=#;                     #binary log position; run show master logs; on the master to find it
2. Start the replication threads IO_THREAD and SQL_THREAD
START SLAVE;
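As a filled-in illustration, using the master address that appears in the status output below; the log file and position are examples only, take the real values from show master logs; on your master:

CHANGE MASTER TO
  MASTER_HOST='192.168.68.17',
  MASTER_USER='repluser',
  MASTER_PASSWORD='replpass',
  MASTER_LOG_FILE='mariadb-bin.000001',
  MASTER_LOG_POS=245;
START SLAVE;

Then verify with show slave status\G, as in the output that follows.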
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.68.17
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000001
          Read_Master_Log_Pos: 557
               Relay_Log_File: mariadb-relay-bin.000002
                Relay_Log_Pos: 843
        Relay_Master_Log_File: mariadb-bin.000001
             Slave_IO_Running: Yes    "key field: No means this thread is not running"
            Slave_SQL_Running: Yes    "key field: No means this thread is not running"
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 557
              Relay_Log_Space: 1139
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0    "replication lag in seconds; 0 means in sync in real time"
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
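For a quick health check from the shell, the three fields highlighted above can be filtered directly (a sketch; assumes a passwordless local login, otherwise add the -u and -p options):

mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'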
1. Create a database on M1:
MariaDB [(none)]> create database a1;
Query OK, 1 row affected (0.00 sec)
M1 [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| a1                 |
| mysql              |
| performance_schema |
| test               |
+--------------------+
5 rows in set (0.00 sec)
2. Check that it was replicated to S1 and S2:
S1 [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| a1                 |
| mysql              |
| performance_schema |
| test               |
+--------------------+
5 rows in set (0.01 sec)
S2 [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| a1                 |
| mysql              |
| performance_schema |
| test               |
+--------------------+
5 rows in set (0.01 sec)
① Generate a key pair on any one host, for example on M1
ssh-keygen
At this point a .ssh/ directory is created on the local host:
ls .ssh/
id_rsa  id_rsa.pub
② Then copy the key to the local host M1 itself
ssh-copy-id M1_IP
The local .ssh directory now contains two more files:
ls .ssh/
authorized_keys  id_rsa  id_rsa.pub  known_hosts
③ Copy the whole .ssh directory to all of the other hosts
ssh-copy-id MHA_IP
ssh-copy-id S1_IP
ssh-copy-id S2_IP
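Note that ssh-copy-id only installs the public key on the target host. To share the identical key pair on every host, which is what step ③ and the explanation further down describe, one possible way (a sketch assuming the root account; MHA_IP, S1_IP and S2_IP are placeholders for the real addresses) is to copy the directory itself:

scp -rp ~/.ssh root@MHA_IP:/root/
scp -rp ~/.ssh root@S1_IP:/root/
scp -rp ~/.ssh root@S2_IP:/root/
chmod 700 /root/.ssh && chmod 600 /root/.ssh/*    # run on each target host if permissions need fixing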
④ Test the SSH connections; if no password is asked for, it works
[root@MHA ~]# ssh 192.168.68.7
Last login: Fri Mar 30 13:21:52 2018 from 192.168.68.1
[root@master ~]#
How this works: every host is using the same key pair, so to each other they effectively look like a single host. Never let this .ssh directory leak; anyone who gets it can log in to any of these machines without a password, which is exactly why this shortcut should only be used inside a LAN. If you want to do this over the public Internet, generate a key pair on every host and copy it to every other host, or copy all the keys to one host and then distribute that host's .ssh/authorized_keys file to all the other hosts.
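A sketch of the per-host variant mentioned above (run the same steps on every host; host1, host2 and hostN are placeholders for the other machines):

ssh-keygen -t rsa                  # give this host its own key pair
ssh-copy-id root@host1             # then push the public key to every other host
ssh-copy-id root@host2
ssh-copy-id root@hostN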
Note: the EPEL repository has to be configured; the Aliyun EPEL mirror is good enough.
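One common way to set that up on CentOS 7 (a sketch; the URL is the Aliyun mirror's published repo file, adjust the release number for other versions):

wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
yum clean all && yum makecache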
yum install mha4mysql-manager-0.55-1.el5.noarch.rpm mha4mysql-node-0.54-1.el5.noarch.rpm
mkdir -pv /etc/mha/
vim /etc/mha/app1.cnf    #add the following
[server default]                          #default settings for all nodes
user=mha                                  #MHA user (the management user created inside MySQL)
password=123123                           #its password
manager_workdir=/data/mha/test/           #manager working directory
manager_log=/data/mha/test/manager.log    #log file
remote_workdir=/data/mha/test/            #working directory on the nodes
ssh_user=root                             #SSH user
repl_user=repluser                        #replication user (the one created inside MySQL for replication)
repl_password=123123                      #its password
ping_interval=1                           #heartbeat interval in seconds
[server1]                                 #node name
hostname=172.18.30.1                      #node address
candidate_master=1                        #may be promoted to master
[server2]                                 #node name
hostname=172.18.30.2                      #node address
candidate_master=1                        #may be promoted to master
[server3]                                 #node name
hostname=172.18.30.4                      #node address
[root@test ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Sat Mar 31 19:18:47 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 31 19:18:47 2018 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Mar 31 19:18:47 2018 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Mar 31 19:18:47 2018 - [info] Starting SSH connection tests..
Sat Mar 31 19:18:49 2018 - [debug]
Sat Mar 31 19:18:47 2018 - [debug]  Connecting via SSH from root@172.18.30.107(172.18.30.107:22) to root@172.18.30.108(172.18.30.108:22)..
Sat Mar 31 19:18:48 2018 - [debug]   ok.
Sat Mar 31 19:18:48 2018 - [debug]  Connecting via SSH from root@172.18.30.107(172.18.30.107:22) to root@172.18.30.109(172.18.30.109:22)..
Sat Mar 31 19:18:49 2018 - [debug]   ok.
Sat Mar 31 19:18:50 2018 - [debug]
Sat Mar 31 19:18:47 2018 - [debug]  Connecting via SSH from root@172.18.30.108(172.18.30.108:22) to root@172.18.30.107(172.18.30.107:22)..
Sat Mar 31 19:18:49 2018 - [debug]   ok.
Sat Mar 31 19:18:49 2018 - [debug]  Connecting via SSH from root@172.18.30.108(172.18.30.108:22) to root@172.18.30.109(172.18.30.109:22)..
Sat Mar 31 19:18:49 2018 - [debug]   ok.
Sat Mar 31 19:18:50 2018 - [debug]
Sat Mar 31 19:18:48 2018 - [debug]  Connecting via SSH from root@172.18.30.109(172.18.30.109:22) to root@172.18.30.107(172.18.30.107:22)..
Sat Mar 31 19:18:49 2018 - [debug]   ok.
Sat Mar 31 19:18:49 2018 - [debug]  Connecting via SSH from root@172.18.30.109(172.18.30.109:22) to root@172.18.30.108(172.18.30.108:22)..
Sat Mar 31 19:18:50 2018 - [debug]   ok.
Sat Mar 31 19:18:50 2018 - [info] All SSH connection tests passed successfully.
[root@test ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Sat Mar 31 19:19:26 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 31 19:19:26 2018 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Mar 31 19:19:26 2018 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Mar 31 19:19:26 2018 - [info] MHA::MasterMonitor version 0.56.
Sat Mar 31 19:19:28 2018 - [info] GTID failover mode = 0
Sat Mar 31 19:19:28 2018 - [info] Dead Servers:
Sat Mar 31 19:19:28 2018 - [info] Alive Servers:
Sat Mar 31 19:19:28 2018 - [info]   172.18.30.107(172.18.30.107:3306)
Sat Mar 31 19:19:28 2018 - [info]   172.18.30.108(172.18.30.108:3306)
Sat Mar 31 19:19:28 2018 - [info]   172.18.30.109(172.18.30.109:3306)
Sat Mar 31 19:19:28 2018 - [info] Alive Slaves:
Sat Mar 31 19:19:28 2018 - [info]   172.18.30.108(172.18.30.108:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled
Sat Mar 31 19:19:28 2018 - [info]     Replicating from 172.18.30.107(172.18.30.107:3306)
Sat Mar 31 19:19:28 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Mar 31 19:19:28 2018 - [info]   172.18.30.109(172.18.30.109:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled
Sat Mar 31 19:19:28 2018 - [info]     Replicating from 172.18.30.107(172.18.30.107:3306)
Sat Mar 31 19:19:28 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Mar 31 19:19:28 2018 - [info] Current Alive Master: 172.18.30.107(172.18.30.107:3306)
Sat Mar 31 19:19:28 2018 - [info] Checking slave configurations..
Sat Mar 31 19:19:28 2018 - [info]  read_only=1 is not set on slave 172.18.30.108(172.18.30.108:3306).
Sat Mar 31 19:19:28 2018 - [warning]  relay_log_purge=0 is not set on slave 172.18.30.108(172.18.30.108:3306).
Sat Mar 31 19:19:28 2018 - [info]  read_only=1 is not set on slave 172.18.30.109(172.18.30.109:3306).
Sat Mar 31 19:19:28 2018 - [warning]  relay_log_purge=0 is not set on slave 172.18.30.109(172.18.30.109:3306).
Sat Mar 31 19:19:28 2018 - [info] Checking replication filtering settings..
Sat Mar 31 19:19:28 2018 - [info]  binlog_do_db= , binlog_ignore_db=
Sat Mar 31 19:19:28 2018 - [info]  Replication filtering check ok.
Sat Mar 31 19:19:28 2018 - [info] GTID (with auto-pos) is not supported
Sat Mar 31 19:19:28 2018 - [info] Starting SSH connection tests..
Sat Mar 31 19:19:31 2018 - [info] All SSH connection tests passed successfully.
Sat Mar 31 19:19:31 2018 - [info] Checking MHA Node version..
Sat Mar 31 19:19:32 2018 - [info]  Version check ok.
Sat Mar 31 19:19:32 2018 - [info] Checking SSH publickey authentication settings on the current master..
Sat Mar 31 19:19:33 2018 - [info] HealthCheck: SSH to 172.18.30.107 is reachable.
Sat Mar 31 19:19:34 2018 - [info] Master MHA Node version is 0.56.
Sat Mar 31 19:19:34 2018 - [info] Checking recovery script configurations on 172.18.30.107(172.18.30.107:3306)..
Sat Mar 31 19:19:34 2018 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.56 --start_file=mariadb-bin.000001
Sat Mar 31 19:19:34 2018 - [info]   Connecting to root@172.18.30.107(172.18.30.107:22)..
  Creating /data/mastermha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mariadb-bin.000001
Sat Mar 31 19:19:34 2018 - [info] Binlog setting check done.
Sat Mar 31 19:19:34 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Mar 31 19:19:34 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=172.18.30.108 --slave_ip=172.18.30.108 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Mar 31 19:19:34 2018 - [info]   Connecting to root@172.18.30.108(172.18.30.108:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat Mar 31 19:19:35 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=172.18.30.109 --slave_ip=172.18.30.109 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Mar 31 19:19:35 2018 - [info]   Connecting to root@172.18.30.109(172.18.30.109:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat Mar 31 19:19:36 2018 - [info] Slaves settings check done.
Sat Mar 31 19:19:36 2018 - [info]
172.18.30.107(172.18.30.107:3306) (current master)
 +--172.18.30.108(172.18.30.108:3306)
 +--172.18.30.109(172.18.30.109:3306)
Sat Mar 31 19:19:36 2018 - [info] Checking replication health on 172.18.30.108..
Sat Mar 31 19:19:36 2018 - [info]  ok.
Sat Mar 31 19:19:36 2018 - [info] Checking replication health on 172.18.30.109..
Sat Mar 31 19:19:36 2018 - [info]  ok.
Sat Mar 31 19:19:36 2018 - [warning] master_ip_failover_script is not defined.
Sat Mar 31 19:19:36 2018 - [warning] shutdown_script is not defined.
Sat Mar 31 19:19:36 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
[root@test ~]# masterha_manager --conf=/etc/mha/app1.cnf
Sat Mar 31 19:23:26 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 31 19:23:26 2018 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Mar 31 19:23:26 2018 - [info] Reading server configuration from /etc/mha/app1.cnf..
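Started like this, masterha_manager stays in the foreground and occupies the terminal. A common pattern for keeping it running in the background and checking on it is sketched below; the log path is assumed to match the manager_log set in app1.cnf above, and masterha_check_status ships with the manager package:

nohup masterha_manager --conf=/etc/mha/app1.cnf &> /data/mha/test/manager.log &
masterha_check_status --conf=/etc/mha/app1.cnf    # reports the current master while the manager is running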
We now manually take the M1 host down. In theory MHA should promote S1 to take over as master and automatically kick the dead M1 out, leaving S1 as the master and S2 as its slave.
1. Stop the mariadb service on M1
systemctl stop mariadb
2. Log in to S1 and check its state
MariaDB [(none)]> show master logs;
+--------------------+-----------+
| Log_name           | File_size |
+--------------------+-----------+
| mariadb-bin.000001 |       715 |
| mariadb-bin.000002 |       245 |
+--------------------+-----------+
2 rows in set (0.00 sec)

MariaDB [(none)]> show slave status;
Empty set (0.00 sec)

MariaDB [(none)]> show variables like '%read_only%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| innodb_read_only | OFF   |
| read_only        | OFF   |
| tx_read_only     | OFF   |
+------------------+-------+
3 rows in set (0.00 sec)
"We can see that MHA has completed the switchover, and the read_only option that was set on this host earlier has also been turned off."
3. Log in to S2 and check its state
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.18.30.108
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000002
          Read_Master_Log_Pos: 245
               Relay_Log_File: mariadb-relay-bin.000002
                Relay_Log_Pos: 531
        Relay_Master_Log_File: mariadb-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 245
              Relay_Log_Space: 827
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 2
1 row in set (0.00 sec)
"As you can see, S2 now replicates from S1. The failover test is complete."
Error 1
[root@test ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Sat Mar 31 20:14:26 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 31 20:14:26 2018 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Mar 31 20:14:26 2018 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Mar 31 20:14:26 2018 - [error][/usr/share/perl5/vendor_perl/MHA/Config.pm, ln383] Block name "server" is invalid. Block name must be "server default" or start from "server"(+ non-whitespace characters).
Block name "server" is invalid. Block name must be "server default" or start from "server"(+ non-whitespace characters). at /usr/share/perl5/vendor_perl/MHA/SSHCheck.pm line 148.
"Note: the cause is that the managed-node blocks in the hand-written MHA configuration file were not numbered; each one was just [server]. The broken file is below:"
[server default]
user=mha
password=centos
manager_workdir=/data/mastermha/app1/
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root
repl_user=repluser
repl_password=centos
ping_interval=1
"[server]" <-
hostname=172.18.30.107
candidate_master=1
"[server]" <-
hostname=172.18.30.108
candidate_master=1
"[server]" <-
hostname=172.18.30.109
candidate_master=1
"The correct configuration; look carefully at what is different:"
[server default]
user=mha
password=centos
manager_workdir=/data/mastermha/app1/
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root
repl_user=repluser
repl_password=centos
ping_interval=1
"[server1]"
hostname=172.18.30.107
candidate_master=1
"[server2]"
hostname=172.18.30.108
candidate_master=1
"[server3]"
hostname=172.18.30.109
candidate_master=1
Error 2
If you see the following error, it means a Perl library cannot be found; create a symlink on every node:
[root@centos-MHA ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Can't locate MHA/SSHCheck.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/bin/masterha_check_ssh line 25.
[root@centos-MHA ~]# ln -s /usr/lib/perl5/vendor_perl/MHA/ /usr/lib64/perl5/vendor_perl/
[root@centos-M1 ~]# ln -s /usr/lib/perl5/vendor_perl/MHA/ /usr/lib64/perl5/vendor_perl/
[root@centos-S1 ~]# ln -s /usr/lib/perl5/vendor_perl/MHA/ /usr/lib64/perl5/vendor_perl/
[root@centos-S2 ~]# ln -s /usr/lib/perl5/vendor_perl/MHA/ /usr/lib64/perl5/vendor_perl/