It has been a few months since I last updated this blog and the weeds have grown tall, so it is time to pull them. This post is about using Consul to build high availability for Redis and MySQL. At my previous company MySQL ran one instance per host, so MHA plus a VIP was enough for high availability. At my current company MySQL runs multiple instances per host, so that approach clearly does not fit. We later implemented failover that called a DNS API to update domain records, but that still is not as convenient as doing it with Consul; the advantages will become clear below. Running multiple Redis instances per host is completely normal, and making that highly available is not easy either. You can of course use Sentinel and have a script call a DNS API to change the DNS records after a failover, but that is not very elegant. Some will ask why not use Codis or Redis Cluster. Those are fine solutions, but they do not suit us: they are not flexible enough and do not handle hot-key data well. So what is Consul? Read on:
Consul is an open source tool from HashiCorp (the company behind Vagrant). Written in Go and lightweight, it provides service discovery and configuration for distributed systems. Compared with similar products it is more of a one-stop solution: Consul ships with a KV store, service registration/discovery, health checks, HTTP and DNS APIs, a web UI, and more. Official site: https://www.consul.io/. Other mainstream open source products for service discovery and configuration are ZooKeeper and etcd.
Advantages of Consul:
1. Multi-datacenter support, with LAN and WAN traffic listening on different ports. A multi-datacenter cluster avoids the single point of failure of one datacenter; ZooKeeper and etcd do not offer multi-datacenter support.
2. Built-in health checks; etcd does not provide them.
3. Both HTTP and DNS interfaces. ZooKeeper is comparatively complex to integrate with, and etcd only offers HTTP; Consul has DNS support plus a REST API.
4. An official web management UI; etcd has none.
5. Simple to deploy and operations-friendly: no dependencies, just copy the Go binary over and you are done. One program does everything, and it can be pushed out with Ansible.
A comparison table of Consul and other service discovery tools:
Consul architecture and roles
1. A Consul cluster consists of the nodes that run the Consul agent. Within the cluster there are two roles: server and client.
2. The server and client roles have nothing to do with the application services running on the cluster; they are roles at the Consul level.
3. Consul server: maintains the state of the Consul cluster, provides data consistency, and answers RPC requests. The official recommendation is to run at least three Consul servers. The servers elect a leader among themselves using the Raft protocol, and the Consul data on the server nodes is kept strongly consistent. Servers talk to local clients over the LAN and to other datacenters over the WAN. Consul client: maintains only its own state and forwards HTTP and DNS requests to the servers.
4. Consul supports multiple datacenters. Each datacenter runs its own set of Consul servers; datacenters communicate with each other via the gossip protocol, and consistency is provided by the Raft algorithm.
That is enough of the basics; see the official documentation for more detail. Now let's set up Consul and see how to use it to make Redis and MySQL highly available.
Test environment (in production, deploy 3 or 5 Consul servers):
consul server:192.168.0.10
consul client:192.168.0.20,192.168.0.30,192.168.0.40
Installing Consul is very easy: download it from https://www.consul.io/downloads.html and unzip it; it is a single binary with nothing else. I am using version 0.9.2 here. After downloading, unpack the binary into /usr/local/bin and it is ready to use, with no dependencies. Install it on all four servers above.
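For reference, the install can be scripted roughly as follows (a sketch assuming the Linux amd64 0.9.2 build from the HashiCorp releases site; adjust the URL for your platform and version):

cd /tmp
wget https://releases.hashicorp.com/consul/0.9.2/consul_0.9.2_linux_amd64.zip
unzip consul_0.9.2_linux_amd64.zip
cp consul /usr/local/bin/ && chmod +x /usr/local/bin/consul
consul version   # should report Consul v0.9.2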
Create directories on all four machines: one for the configuration files, one for data, and one for the Redis and MySQL health check scripts.
mkdir /etc/consul.d/ -p && mkdir /data/consul/ -p
mkdir /data/consul/shell -p
Then write the parameters into configuration files. Strictly speaking you do not have to: everything can be passed on the command line, but that is harder to manage.
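For illustration only, the server configuration shown next could be expressed as command-line flags along these lines (a sketch, not how I actually run it):

consul agent -server -bootstrap-expect=1 -datacenter=dc1 \
  -data-dir=/data/consul -bind=192.168.0.10 -client=192.168.0.10 -ui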
Configuration file for the consul server (192.168.0.10) (see the official docs or the reference links at the end of this post for what each parameter means):
[root@db-server-yayun-01 ~]# cat /etc/consul.d/server.json
{
  "data_dir": "/data/consul",
  "datacenter": "dc1",
  "log_level": "INFO",
  "server": true,
  "bootstrap_expect": 1,
  "bind_addr": "192.168.0.10",
  "client_addr": "192.168.0.10",
  "ui": true
}
[root@db-server-yayun-01 ~]#
Configuration file for the consul clients (192.168.0.20, 192.168.0.30, 192.168.0.40):
[root@db-server-yayun-02 ~]# cat /etc/consul.d/client.json
{
  "data_dir": "/data/consul",
  "enable_script_checks": true,
  "bind_addr": "192.168.0.20",
  "retry_join": ["192.168.0.10"],
  "retry_interval": "30s",
  "rejoin_after_leave": true,
  "start_join": ["192.168.0.10"]
}
[root@db-server-yayun-02 ~]#
The configuration files on the three client servers are almost identical; the only difference is bind_addr, which you change to each server's own IP. My test machines are virtual machines with multiple NICs, so I have to specify it explicitly; otherwise you could bind 0.0.0.0.
Now start the consul server first:
nohup consul agent -config-dir=/etc/consul.d > /data/consul/consul.log &
Check the log:
[root@db-server-yayun-01 consul]# cat consul.log
==> WARNING: BootstrapExpect Mode is specified as 1; this is the same as Bootstrap mode.
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v0.9.2'
           Node ID: '5e612623-ec5b-386c-19be-d38876a9a46f'
         Node name: 'db-server-yayun-01'
        Datacenter: 'dc1'
            Server: true (bootstrap: true)
       Client Addr: 192.168.0.10 (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 192.168.0.10 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2017/12/09 09:49:53 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:192.168.0.10:8300 Address:192.168.0.10:8300}]
    2017/12/09 09:49:53 [INFO] raft: Node at 192.168.0.10:8300 [Follower] entering Follower state (Leader: "")
    2017/12/09 09:49:53 [INFO] serf: EventMemberJoin: db-server-yayun-01.dc1 192.168.0.10
    2017/12/09 09:49:53 [INFO] serf: EventMemberJoin: db-server-yayun-01 192.168.0.10
    2017/12/09 09:49:53 [INFO] agent: Started DNS server 192.168.0.10:8600 (udp)
    2017/12/09 09:49:53 [INFO] consul: Adding LAN server db-server-yayun-01 (Addr: tcp/192.168.0.10:8300) (DC: dc1)
    2017/12/09 09:49:53 [INFO] consul: Handled member-join event for server "db-server-yayun-01.dc1" in area "wan"
    2017/12/09 09:49:53 [INFO] agent: Started DNS server 192.168.0.10:8600 (tcp)
    2017/12/09 09:49:53 [INFO] agent: Started HTTP server on 192.168.0.10:8500
    2017/12/09 09:50:00 [ERR] agent: failed to sync remote state: No cluster leader
    2017/12/09 09:50:00 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2017/12/09 09:50:00 [INFO] raft: Node at 192.168.0.10:8300 [Candidate] entering Candidate state in term 2
    2017/12/09 09:50:00 [INFO] raft: Election won. Tally: 1
    2017/12/09 09:50:00 [INFO] raft: Node at 192.168.0.10:8300 [Leader] entering Leader state
    2017/12/09 09:50:00 [INFO] consul: cluster leadership acquired
    2017/12/09 09:50:00 [INFO] consul: New leader elected: db-server-yayun-01
    2017/12/09 09:50:00 [INFO] consul: member 'db-server-yayun-01' joined, marking health alive
    2017/12/09 09:50:03 [INFO] agent: Synced node info
From the log you can see (HTTP: 8500, HTTPS: -1, DNS: 8600): the HTTP port defaults to 8500, used for reload and the web UI, and the DNS port is 8600, used for DNS resolution. You can also see that this machine became the leader, consul: New leader elected: db-server-yayun-01, since it is the only server. In production you should definitely run 3 or 5 servers.
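As a quick sanity check of the DNS interface (assuming dig is available), the built-in consul service should resolve to the server's address:

dig @192.168.0.10 -p 8600 consul.service.consul +short
# expected to print 192.168.0.10 while this is the only server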
Now start the three clients; the start command is the same on all of them. Then look at the log on one of the clients:
nohup consul agent -config-dir=/etc/consul.d > /data/consul/consul.log &
[root@db-server-yayun-02 consul]# cat /data/consul/consul.log
==> Starting Consul agent...
==> Joining cluster...
    Join completed. Synced with 1 initial agents
==> Consul agent running!
           Version: 'v0.9.2'
           Node ID: '0ec901ab-6c66-2461-95e6-50a77a28ed72'
         Node name: 'db-server-yayun-02'
        Datacenter: 'dc1'
            Server: false (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 192.168.0.20 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2017/12/09 10:06:10 [INFO] serf: EventMemberJoin: db-server-yayun-02 192.168.0.20
    2017/12/09 10:06:10 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
    2017/12/09 10:06:10 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
    2017/12/09 10:06:10 [INFO] agent: Started HTTP server on 127.0.0.1:8500
    2017/12/09 10:06:10 [INFO] agent: (LAN) joining: [192.168.0.10]
    2017/12/09 10:06:10 [INFO] agent: Retry join is supported for: aws azure gce softlayer
    2017/12/09 10:06:10 [INFO] agent: Joining cluster...
    2017/12/09 10:06:10 [INFO] agent: (LAN) joining: [192.168.0.10]
    2017/12/09 10:06:10 [INFO] serf: EventMemberJoin: db-server-yayun-01 192.168.0.10
    2017/12/09 10:06:10 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2017/12/09 10:06:10 [INFO] consul: adding server db-server-yayun-01 (Addr: tcp/192.168.0.10:8300) (DC: dc1)
    2017/12/09 10:06:10 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2017/12/09 10:06:10 [INFO] agent: Join completed. Synced with 1 initial agents
    2017/12/09 10:06:10 [INFO] agent: Synced node info
Note the lines agent: Join completed. Synced with 1 initial agents and Server: false (bootstrap: false); that is the difference between a client and a server.
Let's run a couple of commands to look at the cluster:
[root@db-server-yayun-02 ~]# consul members
Node                Address            Status  Type    Build  Protocol  DC
db-server-yayun-01  192.168.0.10:8301  alive   server  0.9.2  2         dc1
db-server-yayun-02  192.168.0.20:8301  alive   client  0.9.2  2         dc1
db-server-yayun-03  192.168.0.30:8301  alive   client  0.9.2  2         dc1
db-server-yayun-04  192.168.0.40:8301  alive   client  0.9.2  2         dc1
[root@db-server-yayun-02 ~]#
[root@db-server-yayun-02 ~]# consul operator raft list-peers
Node                ID                 Address            State   Voter  RaftProtocol
db-server-yayun-01  192.168.0.10:8300  192.168.0.10:8300  leader  true   2
[root@db-server-yayun-02 ~]#
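The same information is also exposed through the HTTP API on port 8500; for example (a sketch using the standard status and catalog endpoints):

curl -s http://192.168.0.10:8500/v1/status/leader    # current Raft leader, e.g. "192.168.0.10:8300"
curl -s http://192.168.0.10:8500/v1/status/peers     # Raft peer set
curl -s http://192.168.0.10:8500/v1/catalog/nodes    # all nodes in the datacenter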
Now let's look at the web UI that ships with Consul; it is very lightweight. Visit: http://192.168.0.10:8500/ui/
At this point the Consul cluster is up. Simple, isn't it? But as you can see, the client nodes have not registered any services yet; the UI shows 0 services. That is what comes next: how exactly do we make Redis and MySQL highly available? Let's get started.
Consul use case 1 (Redis Sentinel)
(1) In our Redis Sentinel architecture, Sentinel is deployed on the servers, but the application side does not use the Jedis Sentinel driver to discover the Redis master automatically; it connects directly to the master's IP. When the master dies and another Redis node takes over as the new master, the application configuration has to be changed by hand to point at the new master.
(2) The Redis client driver has no read/write-splitting configuration, so there is no good way yet to load-balance reads across the slaves. Our applications all support read/write splitting, so this part does not affect us.
(3) Consul can cover both needs. We define two DNS services: a master service that uses Consul's own health checks to discover the new master automatically, and a slave service whose DNS name round-robins across the IPs of the Redis slaves.
The architecture looks like this:
The same can be done for MySQL high availability, with MHA playing the role that Sentinel plays here. The architecture looks like this:
Below I walk through the Redis setup. I will not repeat it for MySQL, but I include the MySQL health check scripts; the idea is exactly the same.
Consul service definitions (Redis)
The Consul cluster is already up: the server is 192.168.0.10 and the clients are .20 through .40. We will use 20 as the Redis master and 30 and 40 as Redis slaves. Now define the services (the definitions must exist on 20, 30 and 40):
The configuration files on 20, 30 and 40 are identical except for the address field, which must be set to each server's own IP.
[root@db-server-yayun-02 consul.d]# pwd
/etc/consul.d
[root@db-server-yayun-02 consul.d]# ll
total 12
-rw-r--r--. 1 root root 221 Dec 9 09:44 client.json
-rw-r--r--. 1 root root 319 Dec 9 10:48 r-6029-redis-test.json
-rw-r--r--. 1 root root 321 Dec 9 10:48 w-6029-redis-test.json
[root@db-server-yayun-02 consul.d]#
Service definition for the master:
[root@db-server-yayun-02 consul.d]# cat w-6029-redis-test.json
{
  "services": [
    {
      "name": "w-6029-redis-test",
      "tags": [
        "master-test-6029"
      ],
      "address": "192.168.0.20",
      "port": 6029,
      "checks": [
        {
          "script": "/data/consul/shell/check_redis_master.sh 6029 ",
          "interval": "15s"
        }
      ]
    }
  ]
}
[root@db-server-yayun-02 consul.d]#
Service definition for the slave:
[root@db-server-yayun-02 consul.d]# cat r-6029-redis-test.json
{
  "services": [
    {
      "name": "r-6029-redis-test",
      "tags": [
        "slave-test-6029"
      ],
      "address": "192.168.0.20",
      "port": 6029,
      "checks": [
        {
          "script": "/data/consul/shell/check_redis_slave.sh 6029 ",
          "interval": "15s"
        }
      ]
    }
  ]
}
[root@db-server-yayun-02 consul.d]#
Once every agent has registered, there are two domain names:
w-6029-redis-test.service.consul (resolves to the single master IP)
r-6029-redis-test.service.consul (resolves to the two slave IPs; a client gets one of them at random)
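Once a resolver forwards the consul domain to Consul (see the DNS options at the end of this post), an application or a quick test can connect straight through these names; a rough sketch:

# writes go through the master name, reads through the slave name
redis-cli -h w-6029-redis-test.service.consul -p 6029 set foo bar
redis-cli -h r-6029-redis-test.service.consul -p 6029 get foo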
其中"script": "/data/consul/shell/check_redis_slave.sh 6029 "表明對redis 6029端口進行健康檢查,關於更多健康檢查請查看官網介紹。
[root@db-server-yayun-03 shell]# pwd
/data/consul/shell
[root@db-server-yayun-03 shell]# ll
total 16
-rwxr-xr-x. 1 root root  480 Dec 9 10:56 check_mysql_master.sh
-rwxr-xr-x. 1 root root 3004 Dec 9 10:55 check_mysql_slave.sh
-rwxr-xr-x. 1 root root  254 Dec 9 10:51 check_redis_master.sh
-rwxr-xr-x. 1 root root  379 Dec 9 10:51 check_redis_slave.sh
[root@db-server-yayun-03 shell]#
There are four scripts under /data/consul/shell, used for the Redis and MySQL health checks. They are fairly simple. Roughly: if there is only a master, both reads and writes go to the master; if a slave is available, reads go to the slave; and if a slave's replication is broken or lagging, that slave's read service is not registered.
[root@db-server-yayun-03 shell]# cat check_redis_master.sh
#!/bin/bash
# Consul check for the write service: pass (exit 0) only if the local Redis on this port is a master.
myport=$1
auth=$2
if [ ! -n "$auth" ]
then
    auth='\"\"'
fi
comm="/usr/local/bin/redis-cli -p $myport -a $auth "
role=`echo 'INFO Replication'|$comm |grep -Ec 'role:master'`
echo 'INFO Replication'|$comm

# Any exit code other than 0/1 marks the check as critical in Consul.
if [ $role -ne 1 ]
then
    exit 2
fi
[root@db-server-yayun-03 shell]#
[root@db-server-yayun-03 shell]# cat check_redis_slave.sh
#!/bin/bash
# Consul check for the read service: pass if this Redis is a healthy slave (replication link up),
# or a standalone master with no slaves (so reads fall back to the master).
myport=$1
auth=$2
if [ ! -n "$auth" ]
then
    auth='\"\"'
fi
comm="/usr/local/bin/redis-cli -p $myport -a $auth "
role=`echo 'INFO Replication'|$comm |grep -Ec '^role:slave|^master_link_status:up'`
single=`echo 'INFO Replication'|$comm |grep -Ec '^role:master|^connected_slaves:0'`
echo 'INFO Replication'|$comm

if [ $role -ne 2 -a $single -ne 2 ]
then
    exit 2
fi
[root@db-server-yayun-03 shell]#
[root@db-server-yayun-02 shell]# cat check_mysql_master.sh
#!/bin/bash
# Consul check for the MySQL write service: pass only when this instance is a live master (not a slave).
port=$1
user="root"
passwod="123"
comm="/usr/local/mysql/bin/mysql -u$user -h 127.0.0.1 -P $port -p$passwod"
slave_info=`$comm -e "show slave status" |wc -l`
value=`$comm -Nse "select 1"`

# Check whether this instance is a slave
if [ $slave_info -ne 0 ]
then
    echo "MySQL $port Instance is Slave........"
    $comm -e "show slave status\G" | egrep -w "Master_Host|Master_User|Master_Port|Master_Log_File|Read_Master_Log_Pos|Relay_Log_File|Relay_Log_Pos|Relay_Master_Log_File|Slave_IO_Running|Slave_SQL_Running|Exec_Master_Log_Pos|Relay_Log_Space|Seconds_Behind_Master"
    exit 2
fi

# Check whether MySQL is alive
if [ -z $value ]
then
    exit 2
fi

echo "MySQL $port Instance is Master........"
$comm -e "select * from information_schema.PROCESSLIST where user='repl' and COMMAND like '%Dump%'"
[root@db-server-yayun-02 shell]#
[root@db-server-yayun-02 shell]# cat check_mysql_slave.sh
#!/bin/bash
# Consul check for the MySQL read service: pass when reads should be served from this instance.
port=$1
user="root"
passwod="123"
repl_check_user="root"
repl_check_pwd="123"
master_comm="/usr/local/mysql/bin/mysql -u$user -h 127.0.0.1 -P $port -p$passwod"
slave_comm="/usr/local/mysql/bin/mysql -u$repl_check_user -P $port -p$repl_check_pwd"

# Check whether MySQL is alive
value=`$master_comm -Nse "select 1"`
if [ -z $value ]
then
    echo "MySQL Server is Down....."
    exit 2
fi

get_slave_count=0
is_slave_role=0
slave_mode_repl_delay=0
master_mode_repl_delay=0
master_mode_repl_dead=0
slave_mode_repl_status=0
max_delay=120

get_slave_hosts=`$master_comm -Nse "select substring_index(HOST,':',1) from information_schema.PROCESSLIST where user='repl' and COMMAND like '%Binlog Dump%';" `
get_slave_count=`$master_comm -Nse "select count(1) from information_schema.PROCESSLIST where user='repl' and COMMAND like '%Binlog Dump%';" `
is_slave_role=`$master_comm -e "show slave status\G"|grep -Ewc "Slave_SQL_Running|Slave_IO_Running"`

### Standalone mode (if get_slave_count=0 and is_slave_role=0)
function single_mode
{
    if [ $get_slave_count -eq 0 -a $is_slave_role -eq 0 ]
    then
        echo "MySQL $port Instance is Single Master........"
        exit 0
    fi
}

### Slave mode (if get_slave_count=0 and is_slave_role=2)
function slave_mode
{
    # If this is a slave, it must not be lagging
    if [ $is_slave_role -ge 2 ]
    then
        echo "MySQL $port Instance is Slave........"
        $master_comm -e "show slave status\G" | egrep -w "Master_Host|Master_User|Master_Port|Master_Log_File|Read_Master_Log_Pos|Relay_Log_File|Relay_Log_Pos|Relay_Master_Log_File|Slave_IO_Running|Slave_SQL_Running|Exec_Master_Log_Pos|Relay_Log_Space|Seconds_Behind_Master"
        slave_mode_repl_delay=`$master_comm -e "show slave status\G" | grep -w "Seconds_Behind_Master" | awk '{print $NF}'`
        slave_mode_repl_status=`$master_comm -e "show slave status\G"|grep -Ec "Slave_IO_Running: Yes|Slave_SQL_Running: Yes"`
        if [ X"$slave_mode_repl_delay" == X"NULL" ]
        then
            slave_mode_repl_delay=99999
        fi
        if [ $slave_mode_repl_delay != "NULL" -a $slave_mode_repl_delay -lt $max_delay -a $slave_mode_repl_status -ge 2 ]
        then
            exit 0
        fi
    fi
}

function master_mode
{
    ### If this is a master, it is only readable when its slaves are lagging or replication is broken
    if [ $get_slave_count -gt 0 -a $is_slave_role -eq 0 ]
    then
        echo "MySQL $port Instance is Master........"
        $master_comm -e "select * from information_schema.PROCESSLIST where user='repl' and COMMAND like '%Dump%'"
        for my_slave in $get_slave_hosts
        do
            master_mode_repl_delay=`$slave_comm -h $my_slave -e "show slave status\G" | grep -w "Seconds_Behind_Master" | awk '{print $NF}' `
            master_mode_repl_thread=`$slave_comm -h $my_slave -e "show slave status\G"|grep -Ec "Slave_IO_Running: Yes|Slave_SQL_Running: Yes"`
            if [ X"$master_mode_repl_delay" == X"NULL" ]
            then
                master_mode_repl_delay=99999
            fi
            if [ $master_mode_repl_delay -lt $max_delay -a $master_mode_repl_thread -ge 2 ]
            then
                exit 2
            fi
        done
        exit 0
    fi
}

single_mode
slave_mode
master_mode
exit 2
[root@db-server-yayun-02 shell]#
"name": "r-6029-redis-test",這個就是域名了,默認後綴是servers.consul,consul能夠利用domain參數修改。配置文件生成之後安裝redis,搭建主從複製(省略)。主從複製完成之後就能夠從新reload consul了。redis info信息:
127.0.0.1:6029> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.0.40,port=6029,state=online,offset=6786,lag=0
slave1:ip=192.168.0.30,port=6029,state=online,offset=6786,lag=1
master_repl_offset:6786
repl_backlog_active:1
repl_backlog_size:67108864
repl_backlog_first_byte_offset:2
repl_backlog_histlen:6785
127.0.0.1:6029>
Reload Consul (on the three clients, 20 through 40):
[root@db-server-yayun-02 ~]# consul reload
Configuration reload triggered
[root@db-server-yayun-02 ~]#
Check the Consul log on one of the nodes (20):
[root@db-server-yayun-02 consul]# tail -f consul.log
2017/12/09 10:09:59 [INFO] serf: EventMemberJoin: db-server-yayun-04 192.168.0.40
2017/12/09 11:14:55 [INFO] Caught signal: hangup
2017/12/09 11:14:55 [INFO] Reloading configuration...
2017/12/09 11:14:55 [INFO] agent: Synced service 'r-6029-redis-test'
2017/12/09 11:14:55 [INFO] agent: Synced service 'w-6029-redis-test'
2017/12/09 11:14:55 [INFO] agent: Synced check 'service:w-6029-redis-test'
2017/12/09 11:15:00 [WARN] agent: Check 'service:r-6029-redis-test' is now critical
2017/12/09 11:15:15 [WARN] agent: Check 'service:r-6029-redis-test' is now critical
2017/12/09 11:15:30 [WARN] agent: Check 'service:r-6029-redis-test' is now critical
2017/12/09 11:15:45 [WARN] agent: Check 'service:r-6029-redis-test' is now critical
You can see that both r-6029-redis-test and w-6029-redis-test have been registered, but only w-6029-redis-test, the write service, passes its check: the Redis on server 20 is the master, so the slave service's check naturally stays critical there. Let's look at the web UI.
Each of the three client nodes has registered both services, and you can also see the custom output from our check scripts:
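The HTTP health API shows the same picture; for example, listing only the passing (i.e. registered and healthy) instances of each service (a sketch):

curl -s "http://192.168.0.10:8500/v1/health/service/w-6029-redis-test?passing"   # the master, 192.168.0.20
curl -s "http://192.168.0.10:8500/v1/health/service/r-6029-redis-test?passing"   # the healthy slaves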
Now let's use DNS to verify that resolution does what we want. We registered two services, r-6029-redis-test and w-6029-redis-test, which gives us two domain names: r-6029-redis-test.service.consul and w-6029-redis-test.service.consul. Check them with dig:
[root@db-server-yayun-02 ~]# dig @192.168.0.10 -p 8600 r-6029-redis-test.service.consul

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> @192.168.0.10 -p 8600 r-6029-redis-test.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34508
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;r-6029-redis-test.service.consul. IN A

;; ANSWER SECTION:
r-6029-redis-test.service.consul. 0 IN A 192.168.0.30
r-6029-redis-test.service.consul. 0 IN A 192.168.0.40

;; Query time: 1 msec
;; SERVER: 192.168.0.10#8600(192.168.0.10)
;; WHEN: Sat Dec 9 11:26:38 2017
;; MSG SIZE rcvd: 82

[root@db-server-yayun-02 ~]#
The read domain r-6029-redis-test.service.consul resolves to both slave servers, so we can load-balance reads across the slaves. What about the write domain?
[root@db-server-yayun-02 ~]# dig @192.168.0.10 -p 8600 w-6029-redis-test.service.consul

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> @192.168.0.10 -p 8600 w-6029-redis-test.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7451
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;w-6029-redis-test.service.consul. IN A

;; ANSWER SECTION:
w-6029-redis-test.service.consul. 0 IN A 192.168.0.20

;; Query time: 1 msec
;; SERVER: 192.168.0.10#8600(192.168.0.10)
;; WHEN: Sat Dec 9 11:27:59 2017
;; MSG SIZE rcvd: 66

[root@db-server-yayun-02 ~]#
As expected, it resolves to 20. Now, what happens if we shut down one of the slaves?
[root@db-server-yayun-03 ~]# ifconfig eth1 | grep -oP '(?<=inet addr:)\S+'
192.168.0.30
[root@db-server-yayun-03 ~]# pgrep -fl redis-server | awk '{print $1}' | xargs kill
[root@db-server-yayun-03 ~]#
127.0.0.1:6029> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.0.40,port=6029,state=online,offset=8200,lag=0
master_repl_offset:8200
repl_backlog_active:1
repl_backlog_size:67108864
repl_backlog_first_byte_offset:2
repl_backlog_histlen:8199
127.0.0.1:6029>
Only one slave is left. Let's dig the read domain again:
[root@db-server-yayun-02 ~]# dig @192.168.0.10 -p 8600 r-6029-redis-test.service.consul

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> @192.168.0.10 -p 8600 r-6029-redis-test.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41984
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;r-6029-redis-test.service.consul. IN A

;; ANSWER SECTION:
r-6029-redis-test.service.consul. 0 IN A 192.168.0.40

;; Query time: 8 msec
;; SERVER: 192.168.0.10#8600(192.168.0.10)
;; WHEN: Sat Dec 9 11:32:46 2017
;; MSG SIZE rcvd: 66

[root@db-server-yayun-02 ~]#
The dead slave has been dropped from the rotation. What if I also shut down the slave on 40?
[root@db-server-yayun-04 shell]# ifconfig eth1 | grep -oP '(?<=inet addr:)\S+'
192.168.0.40
[root@db-server-yayun-04 shell]# pgrep -fl redis-server | awk '{print $1}' | xargs kill
[root@db-server-yayun-04 shell]#
Now Redis has no usable slave at all, so both reads and writes go to the master.
[root@db-server-yayun-02 ~]# dig @192.168.0.10 -p 8600 r-6029-redis-test.service.consul

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> @192.168.0.10 -p 8600 r-6029-redis-test.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58564
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;r-6029-redis-test.service.consul. IN A

;; ANSWER SECTION:
r-6029-redis-test.service.consul. 0 IN A 192.168.0.20

;; Query time: 4 msec
;; SERVER: 192.168.0.10#8600(192.168.0.10)
;; WHEN: Sat Dec 9 11:35:11 2017
;; MSG SIZE rcvd: 66

[root@db-server-yayun-02 ~]# dig @192.168.0.10 -p 8600 w-6029-redis-test.service.consul

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> @192.168.0.10 -p 8600 w-6029-redis-test.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56965
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;w-6029-redis-test.service.consul. IN A

;; ANSWER SECTION:
w-6029-redis-test.service.consul. 0 IN A 192.168.0.20

;; Query time: 5 msec
;; SERVER: 192.168.0.10#8600(192.168.0.10)
;; WHEN: Sat Dec 9 11:35:16 2017
;; MSG SIZE rcvd: 66

[root@db-server-yayun-02 ~]#
That is enough of this kind of testing; next let's combine it with Sentinel for real high availability. I will restore the environment first: 20 is the master, 30 and 40 are slaves, and 10 runs Sentinel (in production, deploy 3 or 5 Sentinels as well). There is already a Sentinel on 10 listening on port 36029, so I just add monitoring of 6029 on 20:
127.0.0.1:36029> sentinel monitor my-test-6029 192.168.0.20 6029 1
OK
127.0.0.1:36029>
127.0.0.1:36029> info Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=my-test-6029,status=ok,address=192.168.0.20:6029,slaves=2,sentinels=1
127.0.0.1:36029>
Check the read and write domains again now that the environment is restored:
[root@db-server-yayun-02 ~]# dig @192.168.0.10 -p 8600 w-6029-redis-test.service.consul

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> @192.168.0.10 -p 8600 w-6029-redis-test.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62669
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;w-6029-redis-test.service.consul. IN A

;; ANSWER SECTION:
w-6029-redis-test.service.consul. 0 IN A 192.168.0.20

;; Query time: 2 msec
;; SERVER: 192.168.0.10#8600(192.168.0.10)
;; WHEN: Sat Dec 9 11:43:04 2017
;; MSG SIZE rcvd: 66

[root@db-server-yayun-02 ~]# dig @192.168.0.10 -p 8600 r-6029-redis-test.service.consul

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> @192.168.0.10 -p 8600 r-6029-redis-test.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41305
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;r-6029-redis-test.service.consul. IN A

;; ANSWER SECTION:
r-6029-redis-test.service.consul. 0 IN A 192.168.0.30
r-6029-redis-test.service.consul. 0 IN A 192.168.0.40

;; Query time: 2 msec
;; SERVER: 192.168.0.10#8600(192.168.0.10)
;; WHEN: Sat Dec 9 11:43:08 2017
;; MSG SIZE rcvd: 82

[root@db-server-yayun-02 ~]#
Everything looks normal again. Now kill the Redis master:
[root@db-server-yayun-02 ~]# ifconfig eth1 | grep -oP '(?<=inet addr:)\S+'
192.168.0.20
[root@db-server-yayun-02 ~]# pgrep -fl redis-server | awk '{print $1}' | xargs kill
Look at the Sentinel info:
127.0.0.1:36029> info Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=my-test-6029,status=ok,address=192.168.0.30:6029,slaves=2,sentinels=1
127.0.0.1:36029>
The master is now 30. dig the domains and see:
[root@db-server-yayun-02 ~]# dig @192.168.0.10 -p 8600 w-6029-redis-test.service.consul

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> @192.168.0.10 -p 8600 w-6029-redis-test.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55527
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;w-6029-redis-test.service.consul. IN A

;; ANSWER SECTION:
w-6029-redis-test.service.consul. 0 IN A 192.168.0.30

;; Query time: 2 msec
;; SERVER: 192.168.0.10#8600(192.168.0.10)
;; WHEN: Sat Dec 9 11:45:46 2017
;; MSG SIZE rcvd: 66

[root@db-server-yayun-02 ~]# dig @192.168.0.10 -p 8600 r-6029-redis-test.service.consul

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> @192.168.0.10 -p 8600 r-6029-redis-test.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11563
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;r-6029-redis-test.service.consul. IN A

;; ANSWER SECTION:
r-6029-redis-test.service.consul. 0 IN A 192.168.0.40

;; Query time: 1 msec
;; SERVER: 192.168.0.10#8600(192.168.0.10)
;; WHEN: Sat Dec 9 11:45:50 2017
;; MSG SIZE rcvd: 66

[root@db-server-yayun-02 ~]#
OK, that is exactly the result we wanted. Finally, a word about DNS.
On the application side you point a DNS server IP at something that can resolve the consul-suffixed domains. There are three options for resolution and forwarding:
1. Have the existing internal DNS servers forward consul-suffixed queries to the Consul servers (this is what we use in production).
2. Point all DNS at Consul's DNS server and use the recursors option to forward non-consul queries to the original DNS servers (see the fragment after this list).
3. Use dnsmasq: server=/consul/10.16.X.X#8600 to resolve the consul-suffixed names.
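For option 2, the recursors setting is just a list of upstream resolvers in the agent configuration; a hypothetical fragment (the IP stands in for your original internal DNS server):

{
  "recursors": ["10.0.0.53"]
}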
Our internal DNS is BIND, and the Consul site has an example of how to set up the forwarding: https://www.consul.io/docs/guides/forwarding.html. We also load-tested Consul's DNS and found no performance problems.
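For reference, a minimal BIND forwarding stanza for named.conf in the spirit of that guide (a sketch: 192.168.0.10 is the test Consul server here, and you may also need to disable DNSSEC validation for this zone):

zone "consul" IN {
    type forward;
    forward only;
    forwarders { 192.168.0.10 port 8600; };
};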
References:
https://book-consul-guide.vnzmi.com/
http://www.liangxiansen.cn/2017/04/06/consul/
Summary:
For MySQL and Redis deployments with multiple instances per host, Consul, combined with MHA or Sentinel, makes high availability straightforward to achieve. Its biggest advantages are that it is lightweight, convenient and simple. If your applications support read/write splitting it is even handier, and losing one or more slaves does not affect the service.