Redis Master-Slave Replication (Read/Write Splitting) and Sentinel (Master Failover) Configuration

Date: 2019-11-05

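Redis replication is powerful: one master can have several slaves, and each slave can in turn have slaves of its own, which makes it possible to build a multi-level server cluster architecture.
Official site: https://redis.io/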

Environment:
Master:

[root@Master ~]# uname -a
Linux Master.Redis 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@Master ~]#

Slave:

[root@Slave ~]# uname -a
Linux Slave.Redis 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@Slave ~]#

In this experiment two instances run on the Slave host, on ports 6979 and 6980.

Redis version:
redis-4.0.10

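Redis master-slave replication (read/write splitting) configuration

Depending on whether the whole dataset is transferred, Redis replication is either a full synchronization or an incremental synchronization:
1. Full synchronization
A full synchronization normally happens while a slave is initializing, when it needs a copy of all the data on the master. The steps are:
  1) The slave connects to the master and sends the SYNC command (PSYNC in Redis 2.8 and later);
  2) On receiving SYNC, the master runs BGSAVE to generate an RDB file and uses a buffer to record every write command executed from then on;
  3) When BGSAVE finishes, the master sends the snapshot file to all slaves and keeps recording the write commands executed during the transfer;
  4) On receiving the snapshot, the slave discards all of its old data and loads the snapshot;
  5) Once the snapshot has been sent, the master starts sending the buffered write commands to the slave;
  6) The slave finishes loading the snapshot, starts accepting command requests, and executes the write commands from the master's buffer.

Once these steps are complete, the slave's data is fully initialized and it can serve read requests from clients.

2. Incremental synchronization
Incremental replication is the process by which writes on the master are propagated to a slave once the slave has been initialized and is working normally. Each time the master executes a write command it sends the same command to the slave, which receives and executes it.

3. Redis synchronization strategy
When master and slave first connect, a full synchronization is performed; after it finishes, incremental synchronization takes over. If necessary, a slave can request a full synchronization at any time. The Redis strategy is always to attempt an incremental synchronization first and, if that fails, to have the slave perform a full synchronization.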

Installation

1. The Redis master needs no special configuration; configure it as usual.
2. The Redis slave must name its master in its configuration file.

Master

[root@Master ~]# mkdir /opt/soft
[root@Master ~]# cd /opt/soft/
[root@Master soft]# wget http://download.redis.io/releases/redis-4.0.10.tar.gz
[root@Master soft]# tar -zxvf redis-4.0.10.tar.gz
[root@Master soft]# cd redis-4.0.10
[root@Master redis-4.0.10]# make
......
Hint: It's a good idea to run 'make test' ;)

make[1]: Leaving directory `/opt/soft/redis-4.0.10/src'
[root@Master redis-4.0.10]# cd src/
[root@Master src]# make test
......

\o/ All tests passed without errors!

Cleanup: may take some time... OK
[root@Master src]# mkdir -p /opt/redis/{logs,etc,data,bin}
[root@Master src]# cp redis-server redis-cli redis-sentinel /opt/redis/bin/
[root@Master src]# vim /opt/redis/etc/redis.conf
[root@Master src]# cat /opt/redis/etc/redis.conf |grep -v "#"|sed '/^[[:space:]]*$/d'
bind 0.0.0.0
protected-mode yes
port 6979
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
supervised no
pidfile /var/run/redis_6979.pid
loglevel notice
logfile "../logs/redis.log"
databases 16
always-show-logo yes
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir ../data/
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
requirepass 51cto
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
slave-lazy-flush no
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble no
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
[root@Master src]# touch /opt/redis/logs/redis.log
[root@Master src]# echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf
[root@Master src]#  sysctl -p
[root@Master src]# echo '511' > /proc/sys/net/core/somaxconn
[root@Master src]# echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled 
[root@Master src]# cd /opt/redis/bin/
[root@Master bin]# ./redis-server ../etc/redis.conf 
[root@Master bin]# tail -500f ../logs/redis.log 
9084:C 30 Jun 23:16:40.766 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
9084:C 30 Jun 23:16:40.766 # Redis version=4.0.10, bits=64, commit=00000000, modified=0, pid=9084, just started
9084:C 30 Jun 23:16:40.766 # Configuration loaded
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 4.0.10 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6979
 |    `-._   `._    /     _.-'    |     PID: 9085
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

9085:M 30 Jun 23:16:40.770 # Server initialized
9085:M 30 Jun 23:16:40.771 * Ready to accept connections

After installing Redis on the Slave, copy the whole /opt/redis directory from the Master to it:

[root@Master bin]# scp -r /opt/redis root@10.15.43.16:/opt/

Edit the Redis configuration on the Slave:

[root@Slave redis]# cat /opt/redis/etc/redis_6979.conf |grep -v "#"|sed '/^[[:space:]]*$/d'
bind 0.0.0.0
protected-mode yes
port 6979
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
supervised no
pidfile /var/run/redis_6979.pid
loglevel notice
logfile "../logs/redis_6979.log"
databases 16
always-show-logo yes
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump_6979.rdb
dir ../data/
# master address and port
slaveof 10.15.43.15 6979
# the master's password; required when the master sets requirepass
masterauth 51cto
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
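# Slave priority: an integer shown in the Redis INFO output. When the master stops working properly, Sentinel uses it to choose which slave to promote to master.
# A slave with a lower number is preferred: given three slaves with priorities 10, 100, and 25, Sentinel picks the one with priority 10.
# 0 is a special priority that marks a slave as ineligible, so a slave with priority 0 is never promoted by Sentinel.
slave-priority 100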
requirepass 51cto
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
slave-lazy-flush no
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble no
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
[root@Slave redis]# echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf
[root@Slave redis]# sysctl -p
[root@Slave redis]# echo '511' > /proc/sys/net/core/somaxconn
[root@Slave redis]# echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
[root@Slave redis]# cd bin/
[root@Slave bin]# ./redis-server ../etc/redis_6979.conf 
[root@Slave bin]# tail -500f ../logs/redis_6979.log 
9025:C 30 Jun 23:31:06.487 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
9025:C 30 Jun 23:31:06.487 # Redis version=4.0.10, bits=64, commit=00000000, modified=0, pid=9025, just started
9025:C 30 Jun 23:31:06.487 # Configuration loaded
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 4.0.10 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6979
 |    `-._   `._    /     _.-'    |     PID: 9026
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

9026:S 30 Jun 23:31:06.491 # Server initialized
9026:S 30 Jun 23:31:06.491 * DB loaded from disk: 0.000 seconds
9026:S 30 Jun 23:31:06.491 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
9026:S 30 Jun 23:31:06.491 * Ready to accept connections
9026:S 30 Jun 23:31:06.491 * Connecting to MASTER 10.15.43.15:6979
9026:S 30 Jun 23:31:06.492 * MASTER <-> SLAVE sync started
9026:S 30 Jun 23:31:06.492 * Non blocking connect for SYNC fired the event.
9026:S 30 Jun 23:31:06.492 * Master replied to PING, replication can continue...
9026:S 30 Jun 23:31:06.494 * Trying a partial resynchronization (request 88f6e9e624a3f8af365809c46e8a3ee2f16945b7:1).
9026:S 30 Jun 23:31:06.494 * Successful partial resynchronization with master.
9026:S 30 Jun 23:31:06.494 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.

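Start a second instance on the Slave host: copy the slave configuration file above and change the port to 6980 (the slaveof line must keep pointing at port 6979 on the master).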

[root@Slave bin]# cp ../etc/redis_6979.conf ../etc/redis_6980.conf 
[root@Slave bin]# sed -i '/^slaveof/!s/6979/6980/g' ../etc/redis_6980.conf
[root@Slave bin]# touch ../logs/redis_6980.log
[root@Slave bin]# ./redis-server ../etc/redis_6980.conf
[root@Slave bin]# tail -500f ../logs/redis_6980.log
1974:C 01 Jul 11:12:09.351 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1974:C 01 Jul 11:12:09.351 # Redis version=4.0.10, bits=64, commit=00000000, modified=0, pid=1974, just started
1974:C 01 Jul 11:12:09.351 # Configuration loaded
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 4.0.10 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6980
 |    `-._   `._    /     _.-'    |     PID: 1975
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

1975:S 01 Jul 11:12:09.355 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1975:S 01 Jul 11:12:09.355 # Server initialized
1975:S 01 Jul 11:12:09.355 * Ready to accept connections
1975:S 01 Jul 11:12:09.355 * Connecting to MASTER 10.15.43.15:6979
1975:S 01 Jul 11:12:09.355 * MASTER <-> SLAVE sync started
1975:S 01 Jul 11:12:09.356 * Non blocking connect for SYNC fired the event.
1975:S 01 Jul 11:12:09.356 * Master replied to PING, replication can continue...
1975:S 01 Jul 11:12:09.358 * Partial resynchronization not possible (no cached master)
1975:S 01 Jul 11:12:09.359 * Full resync from master: 88f6e9e624a3f8af365809c46e8a3ee2f16945b7:57196
1975:S 01 Jul 11:12:09.365 * MASTER <-> SLAVE sync: receiving 214 bytes from master
1975:S 01 Jul 11:12:09.366 * MASTER <-> SLAVE sync: Flushing old data
1975:S 01 Jul 11:12:09.366 * MASTER <-> SLAVE sync: Loading DB in memory
1975:S 01 Jul 11:12:09.366 * MASTER <-> SLAVE sync: Finished with success

Verifying replication

[root@Slave bin]# ./redis-cli -h 10.15.43.15 -p 6979 -a 51cto info replication
Warning: Using a password with '-a' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:2
slave0:ip=10.15.43.16,port=6979,state=online,offset=187508,lag=0
slave1:ip=10.15.43.16,port=6980,state=online,offset=187508,lag=1
master_replid:88f6e9e624a3f8af365809c46e8a3ee2f16945b7
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:187508
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:187508
[root@Slave bin]# ./redis-cli -h 10.15.43.16 -p 6979 -a 51cto info replication
Warning: Using a password with '-a' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.15.43.15
master_port:6979
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:187550
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:88f6e9e624a3f8af365809c46e8a3ee2f16945b7
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:187550
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:56987
repl_backlog_histlen:130564
[root@Slave bin]# ./redis-cli -h 10.15.43.16 -p 6980 -a 51cto info replication
Warning: Using a password with '-a' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.15.43.15
master_port:6979
master_link_status:up
master_last_io_seconds_ago:3
master_sync_in_progress:0
slave_repl_offset:187564
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:88f6e9e624a3f8af365809c46e8a3ee2f16945b7
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:187564
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:57197
repl_backlog_histlen:130368
[root@Slave bin]# ./redis-cli -h 10.15.43.15 -p 6979 -a 51cto
Warning: Using a password with '-a' option on the command line interface may not be safe.
10.15.43.15:6979> set justin "51cto id:ityunwei2017"
OK
10.15.43.15:6979> get justin
"51cto id:ityunwei2017"
10.15.43.15:6979> quit
[root@Slave bin]# ./redis-cli -h 10.15.43.16 -p 6979 -a 51cto
Warning: Using a password with '-a' option on the command line interface may not be safe.
10.15.43.16:6979> get justin
"51cto id:ityunwei2017"
10.15.43.16:6979> del justin
(error) READONLY You can't write against a read only slave.
10.15.43.16:6979> quit
[root@Slave bin]# ./redis-cli -h 10.15.43.16 -p 6980 -a 51cto
Warning: Using a password with '-a' option on the command line interface may not be safe.
10.15.43.16:6980> get justin
"51cto id:ityunwei2017"
10.15.43.16:6980> del justin
(error) READONLY You can't write against a read only slave.
10.15.43.16:6980> quit
[root@Slave bin]#

A key written on the master can be read on both slaves, so replication is working; the slaves are read-only and reject write operations.
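
The same check can be scripted from the INFO replication output shown above. The following is a minimal sketch, not part of the original setup: the function name slave_link_ok is hypothetical, and it relies only on the role and master_link_status fields that appear in the transcripts.

```shell
# Sketch: read `INFO replication` text on stdin and succeed only if the
# instance reports role:slave and master_link_status:up.
# slave_link_ok is a hypothetical helper name, not a Redis tool.
slave_link_ok() {
    # INFO replies use CRLF line endings, so strip \r before matching.
    tr -d '\r' | awk -F: '
        $1 == "role"               { role = $2 }
        $1 == "master_link_status" { link = $2 }
        END { exit !(role == "slave" && link == "up") }
    '
}

# Example against a slave from this article (needs the running instance):
#   ./redis-cli -h 10.15.43.16 -p 6979 -a 51cto info replication | slave_link_ok \
#       && echo "replication OK"
```

A master's output has no master_link_status field, so the function fails for it by design.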

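Sentinel (master failover) configuration

Redis officially provides a tool called Sentinel, which ships with the Redis source distribution.

A Sentinel system performs three tasks:
1. Monitoring: it continuously checks whether the master and the slaves are running normally;
2. Notification: when a monitored Redis server has a problem, Sentinel can notify the administrator or other applications through an API;
3. Automatic failover: when a master stops working properly, Sentinel starts an automatic failover. It promotes one of the failed master's slaves to be the new master and makes the other slaves replicate from it. When a client tries to connect to the failed master, the cluster returns the address of the new master, so the new master takes the place of the failed one.

Sentinel configuration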

[root@Slave bin]# cp /opt/soft/redis-4.0.10/sentinel.conf ../etc/
[root@Slave bin]# mkdir /opt/redis/tmp
[root@Slave bin]# grep -v "#" ../etc/sentinel.conf |sed '/^[[:space:]]*$/d'
port 26979
dir ../tmp
sentinel monitor master15 10.15.43.15 6979 1
sentinel auth-pass master15 51cto
sentinel down-after-milliseconds master15 30000
sentinel parallel-syncs master15 1
sentinel failover-timeout master15 180000
logfile ../logs/sentinel.log
[root@Slave bin]#

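master15 is the name given to the master being monitored and can be chosen freely; it may contain only upper- and lower-case letters, digits, and the three characters ".-_". The next two arguments are the master's IP address and port. The final 1 is the quorum, the minimum number of Sentinels that must agree that the master is down. Only one Sentinel is used here for the sake of the experiment; in a real environment, 2n+1 Sentinels are recommended.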

[root@Slave bin]# nohup ./redis-sentinel ../etc/sentinel.conf --sentinel &
[1] 3193
[root@Slave bin]# nohup: ignoring input and appending output to ‘nohup.out’
[root@Slave bin]# ./redis-cli -p 26979 info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=master15,status=ok,address=10.15.43.15:6979,slaves=2,sentinels=1
[root@Slave bin]# tail -500f ../logs/sentinel.log 
3193:X 02 Jul 14:04:46.745 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
3193:X 02 Jul 14:04:46.745 # Redis version=4.0.10, bits=64, commit=00000000, modified=0, pid=3193, just started
3193:X 02 Jul 14:04:46.745 # Configuration loaded
3193:X 02 Jul 14:04:46.747 * Running mode=sentinel, port=26979.
3193:X 02 Jul 14:04:46.747 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
3193:X 02 Jul 14:04:46.749 # Sentinel ID is 2d38409d6a2c8ffadd899180eb874afb17486c2c
3193:X 02 Jul 14:04:46.749 # +monitor master master15 10.15.43.15 6979 quorum 1
3193:X 02 Jul 14:04:46.750 * +slave slave 10.15.43.16:6979 10.15.43.16 6979 @ master15 10.15.43.15 6979
3193:X 02 Jul 14:04:46.751 * +slave slave 10.15.43.16:6980 10.15.43.16 6980 @ master15 10.15.43.15 6979

Here "+slave" means that a new slave was discovered; Sentinel has found both slaves, 10.15.43.16:6979 and 10.15.43.16:6980.

At this point Sentinel has rewritten sentinel.conf:

[root@Slave bin]# grep -v "#" ../etc/sentinel.conf |sed '/^[[:space:]]*$/d'
port 26979
dir "/opt/redis/tmp"
sentinel myid 2d38409d6a2c8ffadd899180eb874afb17486c2c
sentinel monitor master15 10.15.43.15 6979 1
sentinel auth-pass master15 51cto
sentinel config-epoch master15 0
sentinel leader-epoch master15 0
logfile "../logs/sentinel.log"
sentinel known-slave master15 10.15.43.16 6980
sentinel known-slave master15 10.15.43.16 6979
sentinel current-epoch 0
[root@Slave bin]#

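Sentinel is now monitoring these three Redis instances. Shut down the master (kill the process or use the shutdown command); after the configured time (down-after-milliseconds, 30 seconds by default), Sentinel outputs the following: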

[root@Slave bin]# tail -500f ../logs/sentinel.log 
3193:X 02 Jul 14:20:20.439 # +sdown master master15 10.15.43.15 6979
3193:X 02 Jul 14:20:20.439 # +odown master master15 10.15.43.15 6979 #quorum 1/1
3193:X 02 Jul 14:20:20.439 # +new-epoch 1
3193:X 02 Jul 14:20:20.439 # +try-failover master master15 10.15.43.15 6979
3193:X 02 Jul 14:20:20.441 # +vote-for-leader 2d38409d6a2c8ffadd899180eb874afb17486c2c 1
3193:X 02 Jul 14:20:20.441 # +elected-leader master master15 10.15.43.15 6979
3193:X 02 Jul 14:20:20.441 # +failover-state-select-slave master master15 10.15.43.15 6979
3193:X 02 Jul 14:20:20.531 # +selected-slave slave 10.15.43.16:6979 10.15.43.16 6979 @ master15 10.15.43.15 6979
3193:X 02 Jul 14:20:20.532 * +failover-state-send-slaveof-noone slave 10.15.43.16:6979 10.15.43.16 6979 @ master15 10.15.43.15 6979
3193:X 02 Jul 14:20:20.615 * +failover-state-wait-promotion slave 10.15.43.16:6979 10.15.43.16 6979 @ master15 10.15.43.15 6979
3193:X 02 Jul 14:20:20.622 # +promoted-slave slave 10.15.43.16:6979 10.15.43.16 6979 @ master15 10.15.43.15 6979
3193:X 02 Jul 14:20:20.622 # +failover-state-reconf-slaves master master15 10.15.43.15 6979
3193:X 02 Jul 14:20:20.692 * +slave-reconf-sent slave 10.15.43.16:6980 10.15.43.16 6980 @ master15 10.15.43.15 6979
3193:X 02 Jul 14:20:21.283 * +slave-reconf-inprog slave 10.15.43.16:6980 10.15.43.16 6980 @ master15 10.15.43.15 6979
3193:X 02 Jul 14:20:22.331 * +slave-reconf-done slave 10.15.43.16:6980 10.15.43.16 6980 @ master15 10.15.43.15 6979
3193:X 02 Jul 14:20:22.394 # +failover-end master master15 10.15.43.15 6979
3193:X 02 Jul 14:20:22.394 # +switch-master master15 10.15.43.15 6979 10.15.43.16 6979
3193:X 02 Jul 14:20:22.394 * +slave slave 10.15.43.16:6980 10.15.43.16 6980 @ master15 10.15.43.16 6979
3193:X 02 Jul 14:20:22.394 * +slave slave 10.15.43.15:6979 10.15.43.15 6979 @ master15 10.15.43.16 6979
3193:X 02 Jul 14:20:52.440 # +sdown slave 10.15.43.15:6979 10.15.43.15 6979 @ master15 10.15.43.16 6979

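+sdown: this Sentinel subjectively considers the master down. Subjectively Down (SDOWN) is the offline judgement made by a single Sentinel instance about a server.
+odown: Sentinel objectively considers the master down. Objectively Down (ODOWN) is the judgement reached when multiple Sentinel instances have each marked the same server SDOWN and have exchanged that view through the SENTINEL is-master-down-by-addr command. (A Sentinel sends SENTINEL is-master-down-by-addr to another Sentinel to ask whether it also considers the given server down.)
+try-failover: Sentinel starts the failover.
+failover-end: Sentinel has completed the failover; the steps in between include electing the leader Sentinel and selecting the slave to promote.
+switch-master: the master address has changed; the master moved from 10.15.43.15:6979 to 10.15.43.16:6979.
+slave: lists the two slaves of the new master, 10.15.43.16:6980 and 10.15.43.15:6979.
+sdown (last line): the original master, now recorded as a slave, was killed, so Sentinel subjectively considers it down.

Sentinel does not completely discard the information about an instance that has stopped, because the instance may come back later, at which point Sentinel lets it rejoin. So when an instance stops, Sentinel updates its record of that instance so that it can resume serving under the current topology when it returns. In this example the master 10.15.43.15:6979 stopped and the slave 10.15.43.16:6979 was promoted to master. When 10.15.43.15:6979 comes back it will run as a slave of 10.15.43.16:6979, so Sentinel has already recorded 10.15.43.15:6979 as a slave of 10.15.43.16:6979.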

[root@Slave bin]# ./redis-cli -p 26979 info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=master15,status=ok,address=10.15.43.16:6979,slaves=2,sentinels=1
[root@Slave bin]# ./redis-cli -p 6979 -a 51cto info replication
Warning: Using a password with '-a' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.15.43.16,port=6980,state=online,offset=371290,lag=1
master_replid:9f381205761bc87469d017aa8cdf927154861d4a
master_replid2:88f6e9e624a3f8af365809c46e8a3ee2f16945b7
master_repl_offset:371290
second_repl_offset:262772
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:56987
repl_backlog_histlen:314304
[root@Slave bin]# ./redis-cli -p 6980 -a 51cto info replication
Warning: Using a password with '-a' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.15.43.16
master_port:6979
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:371852
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:9f381205761bc87469d017aa8cdf927154861d4a
master_replid2:88f6e9e624a3f8af365809c46e8a3ee2f16945b7
master_repl_offset:371852
second_repl_offset:262772
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:57197
repl_backlog_histlen:314656
[root@Slave bin]#

Now restart the 10.15.43.15:6979 instance and check the Sentinel log:

[root@Slave bin]# tail -500f ../logs/sentinel.log 
3193:X 02 Jul 14:55:28.440 # -sdown slave 10.15.43.15:6979 10.15.43.15 6979 @ master15 10.15.43.16 6979
3193:X 02 Jul 14:55:38.398 * +convert-to-slave slave 10.15.43.15:6979 10.15.43.15 6979 @ master15 10.15.43.16 6979

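"-sdown" means that the instance 10.15.43.15:6979 is serving again (the opposite of +sdown).
"+convert-to-slave" means that the instance 10.15.43.15:6979 has been set as a slave of 10.15.43.16:6979.

Check the replication information of 10.15.43.15: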

[root@Slave bin]# ./redis-cli -h 10.15.43.15 -p 6979 -a 51cto info replication
Warning: Using a password with '-a' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.15.43.16
master_port:6979
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1
master_link_down_since_seconds:1530514926
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:312b860d03bc08ac37976005df1d8c6fc1378038
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
[root@Slave bin]#

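10.15.43.15:6979 is now a slave, but its connection to the master fails because the current master requires a password, so masterauth "51cto" has to be added to its configuration file. For automatic failover to work when a password is set, the master's configuration file also needs masterauth "51cto", since the master may later be demoted to a slave. After making the change, restart the service: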

[root@Slave bin]# ./redis-cli -h 10.15.43.15 -p 6979 -a 51cto info replication
Warning: Using a password with '-a' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.15.43.16
master_port:6979
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:454348
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:9f381205761bc87469d017aa8cdf927154861d4a
master_replid2:88f6e9e624a3f8af365809c46e8a3ee2f16945b7
master_repl_offset:454348
second_repl_offset:57197
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:57197
repl_backlog_histlen:397152
[root@Slave bin]# ./redis-cli -p 6979 -a 51cto info replication
Warning: Using a password with '-a' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:2
slave0:ip=10.15.43.16,port=6980,state=online,offset=457432,lag=1
slave1:ip=10.15.43.15,port=6979,state=online,offset=457432,lag=0
master_replid:9f381205761bc87469d017aa8cdf927154861d4a
master_replid2:88f6e9e624a3f8af365809c46e8a3ee2f16945b7
master_repl_offset:457432
second_repl_offset:262772
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:56987
repl_backlog_histlen:400446
[root@Slave bin]#
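
Because any instance can end up as master or slave after a failover, it can help to check that every configuration file carries both requirepass and a matching masterauth. The following is a minimal sketch under that assumption; auth_pair_ok is a hypothetical helper name, and it only inspects the two directives used in this article.

```shell
# Sketch: succeed only if a redis.conf sets requirepass and masterauth
# to the same value. auth_pair_ok is a hypothetical helper, not a Redis tool.
auth_pair_ok() {
    conf=$1
    # Take the last occurrence of each directive and drop optional quotes.
    pass=$(awk '$1 == "requirepass" { v = $2 } END { print v }' "$conf" | tr -d '"')
    auth=$(awk '$1 == "masterauth"  { v = $2 } END { print v }' "$conf" | tr -d '"')
    [ -n "$pass" ] && [ "$pass" = "$auth" ]
}

# Example (paths as used in this article):
#   auth_pair_ok /opt/redis/etc/redis.conf || echo "masterauth missing or different"
```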

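Under normal conditions the lag value in the output above alternates between 0 and 1 second; a value above 1 second suggests a problem with the connection between master and slave. If the master 10.15.43.16:6979 now goes down, Sentinel logs the following: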

[root@Slave bin]# tail -500f ../logs/sentinel.log
3193:X 02 Jul 15:09:35.979 # +sdown master master15 10.15.43.16 6979
3193:X 02 Jul 15:09:35.980 # +odown master master15 10.15.43.16 6979 #quorum 1/1
3193:X 02 Jul 15:09:35.980 # +new-epoch 2
3193:X 02 Jul 15:09:35.980 # +try-failover master master15 10.15.43.16 6979
3193:X 02 Jul 15:09:35.983 # +vote-for-leader 2d38409d6a2c8ffadd899180eb874afb17486c2c 2
3193:X 02 Jul 15:09:35.983 # +elected-leader master master15 10.15.43.16 6979
3193:X 02 Jul 15:09:35.983 # +failover-state-select-slave master master15 10.15.43.16 6979
3193:X 02 Jul 15:09:36.075 # +selected-slave slave 10.15.43.15:6979 10.15.43.15 6979 @ master15 10.15.43.16 6979
3193:X 02 Jul 15:09:36.075 * +failover-state-send-slaveof-noone slave 10.15.43.15:6979 10.15.43.15 6979 @ master15 10.15.43.16 6979
3193:X 02 Jul 15:09:36.131 * +failover-state-wait-promotion slave 10.15.43.15:6979 10.15.43.15 6979 @ master15 10.15.43.16 6979
3193:X 02 Jul 15:09:36.858 # +promoted-slave slave 10.15.43.15:6979 10.15.43.15 6979 @ master15 10.15.43.16 6979
3193:X 02 Jul 15:09:36.858 # +failover-state-reconf-slaves master master15 10.15.43.16 6979
3193:X 02 Jul 15:09:36.916 * +slave-reconf-sent slave 10.15.43.16:6980 10.15.43.16 6980 @ master15 10.15.43.16 6979
3193:X 02 Jul 15:09:37.879 * +slave-reconf-inprog slave 10.15.43.16:6980 10.15.43.16 6980 @ master15 10.15.43.16 6979
3193:X 02 Jul 15:09:37.879 * +slave-reconf-done slave 10.15.43.16:6980 10.15.43.16 6980 @ master15 10.15.43.16 6979
3193:X 02 Jul 15:09:37.961 # +failover-end master master15 10.15.43.16 6979
3193:X 02 Jul 15:09:37.962 # +switch-master master15 10.15.43.16 6979 10.15.43.15 6979
3193:X 02 Jul 15:09:37.962 * +slave slave 10.15.43.16:6980 10.15.43.16 6980 @ master15 10.15.43.15 6979
3193:X 02 Jul 15:09:37.962 * +slave slave 10.15.43.16:6979 10.15.43.16 6979 @ master15 10.15.43.15 6979
3193:X 02 Jul 15:10:07.982 # +sdown slave 10.15.43.16:6979 10.15.43.16 6979 @ master15 10.15.43.15 6979

The master role automatically switches to 10.15.43.15:6979:

[root@Slave bin]# ./redis-cli -p 26979 info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=master15,status=ok,address=10.15.43.15:6979,slaves=2,sentinels=1
[root@Slave bin]# ./redis-cli -p 6980 -a 51cto info replication
Warning: Using a password with '-a' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.15.43.15
master_port:6979
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:472682
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:d92482767f5f83abc3aacb8057081c01a192c122
master_replid2:9f381205761bc87469d017aa8cdf927154861d4a
master_repl_offset:472682
second_repl_offset:463436
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:57197
repl_backlog_histlen:415486
[root@Slave bin]# ./redis-cli -h 10.15.43.15 -p 6979 -a 51cto info replication
Warning: Using a password with '-a' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.15.43.16,port=6980,state=online,offset=473381,lag=1
master_replid:d92482767f5f83abc3aacb8057081c01a192c122
master_replid2:9f381205761bc87469d017aa8cdf927154861d4a
master_repl_offset:473518
second_repl_offset:463436
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:57197
repl_backlog_histlen:416322
[root@Slave bin]#

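When the Sentinel process starts, it reads its configuration file and finds the masters to monitor from the "sentinel monitor master-name ip redis-port quorum" entries. master-name is the name of the master; because the address and port of the current master change after a failover, Sentinel provides a command for looking up the current master's IP address and port by name.
One Sentinel node can monitor several Redis master-slave systems at once; it only needs one "sentinel monitor" entry per system, each with its own master name. Sentinel's state is persisted in its configuration file. Whenever a new configuration is received or created, it is written to disk together with its configuration version, so the Sentinel process can be stopped and restarted safely.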
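
The lookup by name mentioned above is done with the SENTINEL get-master-addr-by-name command. The sketch below assumes that redis-cli, when its output is piped, prints the two reply elements (IP and port) on separate lines; master_addr is a hypothetical helper name.

```shell
# Sketch: turn the two-line reply of SENTINEL get-master-addr-by-name
# (IP on the first line, port on the second) into a single host:port string.
# master_addr is a hypothetical helper name.
master_addr() {
    tr -d '\r' | paste -s -d : -
}

# Example against the Sentinel from this article (needs the running Sentinel):
#   ./redis-cli -p 26979 sentinel get-master-addr-by-name master15 | master_addr
```

A client or script can call this before connecting, so that it follows the master across failovers instead of hard-coding 10.15.43.15:6979.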

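After starting, Sentinel opens two connections to each master it monitors:

1. One subscribes to the master's "__sentinel__:hello" channel to learn about other Sentinel nodes monitoring the same database.
2. One periodically sends commands such as INFO to obtain information about the master itself. A connection in subscribe mode cannot execute other commands, so Sentinel uses a second connection for these.

Once connected to the master, Sentinel performs three periodic operations:

1. Every 10 seconds it sends INFO to the master and the slaves.
INFO gives Sentinel information about the current database (run ID, replication information, and so on), which enables automatic discovery of new nodes. This is why only the master needs to be specified when configuring Sentinel: it uses INFO to find every slave replicating from that master.
After starting, Sentinel sends INFO to the master to obtain the slave list, then opens the same two connections to each slave, serving the same purposes as those to the master. From then on it sends INFO every 10 seconds to all known masters and slaves to refresh its information and act on it, for example by connecting to and monitoring newly added slaves, or by recording role changes caused by a failover.
2. Every 2 seconds it publishes its own information to the "__sentinel__:hello" channel of the master and the slaves, sharing it with other Sentinels monitoring the same database. In other words, Sentinel both subscribes to this channel and publishes on it, so that other Sentinels learn about it.
The message content is:
<sentinel address>,<sentinel port>,<sentinel run ID>,<sentinel configuration version>,<master name>,<master address>,<master port>,<master configuration version>
Every Sentinel subscribes to the "__sentinel__:hello" channel of each database it monitors, so when another Sentinel receives the message it checks whether the sender is newly discovered. If so, it adds the sender to its list of known Sentinels and opens a connection to it. (Unlike with databases, Sentinels open only one connection to each other, used for sending PING; no second connection for subscribing is needed, because subscribing to the databases' channel is enough to discover other Sentinels.)
The receiver also checks the master configuration version in the message; if it is higher than the version it has recorded, it updates its data for that master.
3. Every 1 second it sends PING to the master, the slaves, and the other Sentinel nodes.
The PING interval depends on the "down-after-milliseconds" option and is at most 1 second: when "down-after-milliseconds" is less than 1 second, Sentinel sends PING once per "down-after-milliseconds" interval; when it is greater than 1 second, Sentinel sends PING once per second.
If a pinged node has not replied within "down-after-milliseconds", Sentinel considers it subjectively down, meaning that the node is offline from the point of view of this Sentinel process. If the node is a master, Sentinel goes on to decide whether a failover is needed: it sends "SENTINEL is-master-down-by-addr" to the other Sentinel nodes to ask whether they also consider the master subjectively down. If the number that agree reaches the configured quorum, Sentinel considers the master objectively down and a leader Sentinel is elected to carry out the failover. For example, sentinel monitor master15 10.15.43.15 6979 3 means that the current Sentinel considers the master objectively down only when at least 3 Sentinel nodes (including itself) consider it subjectively down.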
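
The PING interval rule described above amounts to taking the smaller of down-after-milliseconds and 1000 ms. A minimal sketch of that arithmetic follows; ping_interval_ms is a hypothetical name used only for illustration, not a Sentinel command.

```shell
# Sketch: PING interval in milliseconds for a given down-after-milliseconds,
# following the rule in the text: min(down-after-milliseconds, 1000).
# ping_interval_ms is a hypothetical helper name.
ping_interval_ms() {
    if [ "$1" -lt 1000 ]; then
        echo "$1"
    else
        echo 1000
    fi
}

# With the value used in this article's sentinel.conf:
#   ping_interval_ms 30000    # prints 1000
```
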
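When a Sentinel finds that the master is objectively down, a failover is needed. It must be carried out by a leader Sentinel, which ensures that only one Sentinel performs the failover at a time. The leader is elected with the Raft algorithm:
1. The Sentinel that found the master objectively down (Sentinel A) sends a command to every other Sentinel asking to be chosen as leader.
2. If the target Sentinel has not yet chosen anyone else, it agrees to make Sentinel A the leader.
3. If Sentinel A finds that a majority of the Sentinels, and at least quorum of them, agreed to choose it, Sentinel A becomes the leader.
4. If several Sentinels stand for election at the same time, it is possible that none is elected. Each candidate then waits a random time and sends its request again, starting a new round, until the election succeeds.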
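
The winning condition in step 3 can be written as a small check. This is a sketch under the assumption that "majority" means N/2 + 1 of all known Sentinels and that the vote count must also be at least quorum; wins_election is a hypothetical name.

```shell
# Sketch: does a candidate with $1 votes win, given $2 Sentinels in total
# and a configured quorum of $3? Requires votes >= N/2 + 1 and votes >= quorum.
# wins_election is a hypothetical helper name.
wins_election() {
    majority=$(( $2 / 2 + 1 ))
    [ "$1" -ge "$majority" ] && [ "$1" -ge "$3" ]
}

# The single-Sentinel setup in this article: 1 vote, 1 Sentinel, quorum 1.
#   wins_election 1 1 1 && echo "elected"
```

This matches the logs above, where the only Sentinel votes for itself and is elected leader with quorum 1.
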
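Once elected, the leader Sentinel performs the failover of the master:
a. The leader picks one of the stopped master's slaves to be the new master:
  1. Among all online slaves, choose the one with the highest priority, that is, the lowest non-zero value of the "slave-priority" option;
  2. If several slaves share the highest priority, the one with the larger replication offset (the most complete copy) is preferred;
  3. If the slaves are still tied, choose the one with the smaller run ID.
b. The leader sends "slaveof no one" to the chosen slave to promote it to master, then sends slaveof to the other slaves to make them slaves of the new master.
c. The leader updates its internal records, marking the stopped old master as a slave of the new master, so that it automatically serves as a slave when it comes back.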
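
The three selection rules in step a can be sketched as a sort. This is an illustration of the ordering only, not Sentinel's code: pick_slave is a hypothetical name, and the input format (one slave per line as "address priority offset runid", separated by single spaces) is an assumption made for the example.

```shell
# Sketch: choose the slave to promote from lines of the form
#   <address> <slave-priority> <replication offset> <run ID>
# Order: lowest non-zero priority, then largest offset, then smallest run ID.
# pick_slave and the input format are hypothetical.
pick_slave() {
    awk '$2 != 0' |                                     # priority 0 is never promoted
        LC_ALL=C sort -t ' ' -k2,2n -k3,3nr -k4,4 |     # priority asc, offset desc, run ID asc
        head -n 1 |
        cut -d ' ' -f 1
}

# Example with made-up run IDs:
#   printf '%s\n' '10.15.43.16:6979 100 187550 bbb' \
#                 '10.15.43.16:6980 100 187564 aaa' | pick_slave
```
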
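If a master-slave system has few Sentinels, their judgement of the whole system becomes less reliable. With few nodes, it is advisable to deploy one Sentinel per node (master or slave) and to set quorum to N/2 + 1, where N is the number of Sentinel nodes. With many nodes, bear in mind that every Sentinel connects to every node in the system, so one Sentinel per node produces many connections. In particular, when client-side sharding is used and multiple Sentinels monitor multiple masters, a large number of redundant connections results, because Redis does not support connection multiplexing.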
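When choosing the slave to promote, Sentinel evaluates the following:
Time disconnected from the master
Slave priority
Replication offset (indicating how much of the master's data the slave holds)
Run ID
A slave that has been disconnected from the master for more than ten times the configured timeout (down-after-milliseconds), plus the time the master has been unavailable as seen by the Sentinel doing the failover, is considered unsuitable as the new master and is skipped.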
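Why must a Sentinel be approved by a majority of Sentinels before it actually performs a failover?

When a Sentinel is authorized, it obtains a new configuration version number for the failed master; once the failover finishes, this number versions the new configuration. Because a majority of Sentinels know that this version number has been taken by the Sentinel performing the failover, no other Sentinel can use it. This means that every failover carries a unique version number. The importance of this will become clear below.

In addition, the Sentinel cluster follows a rule: if Sentinel A votes for Sentinel B to perform the failover, A waits for a period before trying to fail over the same master itself; the wait is set through the failover-timeout option. It follows that Sentinels never fail over the same master concurrently: if the first Sentinel's failover fails, another one retries after a certain time, and so on.

Redis Sentinel guarantees liveness: if a majority of Sentinels can communicate with each other, one of them will eventually be authorized to perform the failover.
Redis Sentinel also guarantees safety: every Sentinel that attempts to fail over the same master gets a unique version number.

The Sentinels in a cluster also communicate with each other, through a gossip protocol.

Once a Sentinel has successfully failed over a master, it broadcasts the master's latest configuration to the other Sentinels, which update their configuration for that master.

For a failover to succeed, the Sentinel must be able to send SLAVEOF NO ONE to the slave chosen as master and then observe the new master's state through the INFO command.

After a slave has been elected master and has been sent SLAVEOF NO ONE, the failover is considered successful even if the other slaves have not yet reconfigured themselves for the new master, and all Sentinels then publish the new configuration.

The way new configurations propagate through the cluster is the reason a Sentinel must be granted a version number before it performs a failover.

Even when no failover is in progress, Sentinels keep applying the current configuration to the monitored masters. If a node that the latest configuration identifies as a slave claims to be a master, it is reconfigured as a slave of the current master. If slaves are connected to the wrong master, they are corrected and connected to the right one.

Every Sentinel continuously propagates the master's configuration version using Pub/Sub; the Pub/Sub channel used for this is __sentinel__:hello.

Because every configuration has a version number, the one with the highest version number wins.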

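Deploying Redis master and slaves across data centers

Suppose there are three hosts, each running one Redis instance and one Sentinel. redis2 and redis3 are in data center A, and redis1, the current master, is in data center B. When the network between A and B is cut, sentinel3 and sentinel2 start a failover and elect redis2 as master. sentinel1 still holds the old configuration, because it is isolated from sentinel3 and sentinel2. When the network recovers, sentinel1 updates its configuration and turns redis1 into a slave of redis2. During the partition, however, clients could still write to redis1, and the data written to redis1 in that period is lost once the network recovers.

This can be prevented by changing the Redis configuration so that redis1 refuses client writes during the partition. The min-slaves-to-write and min-slaves-max-lag options stop a master from executing write commands under unsafe conditions. With
min-slaves-to-write 2
min-slaves-max-lag 10
the master refuses write commands when fewer than 2 slaves are connected, or when fewer than 2 of them have a lag of 10 seconds or less. The lag here is the lag value reported by INFO replication.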
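
The condition can be checked by hand against the master's INFO replication output. The sketch below counts online slaves whose lag is at most the limit and compares the count with the minimum. It assumes the slaveN line format shown in the transcripts above and treats a lag equal to the limit as acceptable; writes_allowed is a hypothetical name.

```shell
# Sketch: given min-slaves-to-write ($1) and min-slaves-max-lag ($2), read the
# master's `INFO replication` text on stdin and succeed if enough online
# slaves have lag <= the limit. writes_allowed is a hypothetical helper name.
writes_allowed() {
    tr -d '\r' | awk -v need="$1" -v maxlag="$2" '
        /^slave[0-9]+:/ && /state=online/ {
            n = split($0, kv, ",")
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^lag=/) {
                    sub(/^lag=/, "", kv[i])
                    if (kv[i] + 0 <= maxlag + 0) good++
                }
        }
        END { exit !(good + 0 >= need + 0) }
    '
}

# Example against the master from this article (needs the running instance):
#   ./redis-cli -h 10.15.43.15 -p 6979 -a 51cto info replication | writes_allowed 2 10 \
#       || echo "master would refuse writes"
```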
