Heartbeat+DRBD+MySQL High-Availability Solution

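1. Overview

This solution uses Heartbeat, a two-node hot-standby tool, to keep the database service stable and continuously available, and relies on DRBD to keep the data consistent between the two nodes. By default only one MySQL server is active. When the primary MySQL server fails, the system automatically fails over to the standby, which continues to serve requests. Once the primary has been repaired, service is switched back to it (with the auto_failback off setting used later in this post, that switch back is a manual step).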

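2. Pros and cons

Pros: high safety, high stability, high availability, and automatic failover when a fault occurs.

Cons: only one server serves traffic at a time, the cost is relatively high, it is hard to scale, and split brain can occur.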

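3. Software used

About Heartbeat

Official site: http://linux-ha.org/wiki/Main_Page

Heartbeat quickly moves resources (the VIP address and application services) from a failed server to a healthy one. It is similar to keepalived: it provides failover, but it does not health-check the backend services.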

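About DRBD

Official site: http://www.drbd.org/

DRBD (Distributed Replicated Block Device) is a software-based, shared-nothing storage replication solution that mirrors the content of block devices between servers. It mirrors a block device between two servers over the network, either synchronously (a write completes only after both servers have written it) or asynchronously (a write completes once the local server has written it), which makes it roughly a network RAID 1. Because it works on block devices (disks, LVM logical volumes) below the filesystem, replication is faster than copying files with cp. The official MySQL manual has documented DRBD as one of its recommended high-availability options.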

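4. Topology

5. Use cases:

Suitable when database traffic is moderate, is not expected to grow quickly in the short term, and the availability requirement for the database is very high.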

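6. Test environment (as shown below; the firewall and SELinux are disabled on both hosts; in production, open the required ports instead)

Hostname        IP              OS                   DRBD disk    Heartbeat version
db-server-01    192.168.0.10    CentOS 6.2 64-bit    /dev/sda5    3.0.4
db-server-02    192.168.0.20    CentOS 6.2 64-bit    /dev/sda5    3.0.4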

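7. Software installation and environment setup

(1) Install the DRBD build dependencies (on both machines). Reboot afterwards, because this upgrades the kernel; without a reboot the running kernel will not match the installed kernel version. If you know a way to avoid the reboot, please leave me a comment ^_^: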

yum install -y kernel kernel-devel kernel-headers  flex 

(2) Download and build the software (same steps on both machines)

wget http://oss.linbit.com/drbd/8.4/drbd-8.4.2.tar.gz

tar xf drbd-8.4.2.tar.gz 
cd drbd-8.4.2
./configure --prefix=/usr/local/drbd --with-km
make KDIR=/usr/src/kernels/2.6.32-431.11.2.el6.x86_64/   # if the drbd module fails to load, the running kernel most likely does not match the newly installed one
make install
mkdir -p /usr/local/drbd/var/run/drbd
cp /usr/local/drbd/etc/rc.d/init.d/drbd /etc/rc.d/init.d
chmod 755 /etc/init.d/drbd
cd drbd
make clean
make KDIR=/usr/src/kernels/2.6.32-431.11.2.el6.x86_64/
cp drbd.ko /lib/modules/`uname -r`/kernel/lib/
modprobe drbd

Check that the drbd module is loaded:

[[email protected] ~]# lsmod | grep drbd
drbd                  314246  0 
libcrc32c               1246  1 drbd
[[email protected] ~]# 
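
The kernel mismatch mentioned above can be caught before building. The following is a minimal sketch, assuming the kernel-devel trees live under /usr/src/kernels as in the make command above; the helper name kernel_tree_for is made up for this example.

```shell
# Print the kernel source tree for a kernel release, or fail if none exists.
# $1 = kernel release (defaults to the running kernel)
# $2 = directory holding the kernel-devel trees (assumed /usr/src/kernels)
kernel_tree_for() (
    release="${1:-$(uname -r)}"
    root="${2:-/usr/src/kernels}"
    if [ -d "${root}/${release}" ]; then
        echo "${root}/${release}"
    else
        echo "no kernel tree for ${release} under ${root}" >&2
        exit 1
    fi
)

# Example: build against whatever kernel is running right now.
# KDIR="$(kernel_tree_for)" && make KDIR="${KDIR}"
```

If the function fails after the yum install, the machine is most likely still running the old kernel and needs the reboot described in step (1).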

(3) DRBD configuration (partition /dev/sda with fdisk before configuring)

[[email protected] ~]# df -HT
Filesystem    Type     Size   Used  Avail Use% Mounted on
/dev/sda2     ext4      19G   2.6G    16G  15% /
tmpfs        tmpfs     121M      0   121M   0% /dev/shm
/dev/sda1     ext4     204M    52M   141M  27% /boot
/dev/sda5     ext4      34G   185M    32G   1% /data
[[email protected] ~]# 

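Both of my machines were already partitioned, and since these are virtual machines on my laptop I did not bother adding a disk. I simply unmounted /data/ and reformatted /dev/sda5 on both machines. If you have an empty disk, it still needs to be partitioned; a 1 TB disk, for example, can just be given a single partition.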

[[email protected] ~]# umount /data/         
[[email protected] ~]# mkfs.ext4 /dev/sda5
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
2048000 inodes, 8185344 blocks
409267 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
250 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 28 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[[email protected] ~]# 

[[email protected] ~]# fdisk -l

Disk /dev/sda: 53.7 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000eb0ff

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          26      204800   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              26        2321    18432000   83  Linux
/dev/sda3            2321        2451     1048576   82  Linux swap / Solaris
/dev/sda4            2451        6528    32742400    5  Extended
/dev/sda5            2451        6528    32741376   83  Linux
[[email protected] ~]# 

In my case one entry in /etc/fstab also has to be commented out:

#UUID=33958004-e8a7-4135-844f-707a5537e86a /data                   ext4    defaults        1 2

Otherwise the machine reports at boot that it cannot mount /data and fails to start.

Edit /etc/hosts on both servers in the same way:

192.168.0.10    db-server-01
192.168.0.20    db-server-02

The only DRBD file that needs editing is /usr/local/drbd/etc/drbd.d/global_common.conf. After the changes it looks like this (identical on both servers):

[[email protected] ~]# cat /usr/local/drbd/etc/drbd.d/global_common.conf
global { usage-count yes; }
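common { syncer { rate 30M; } }       # resync rate; size it to your bandwidth
resource r0 {                         # define a resource named "r0"
        protocol C;                   # DRBD protocol C (synchronous): a write is acknowledged only after the peer has received and written the data
        startup {
        }
        disk {
                on-io-error detach;
        }
        net {
        }
        on db-server-01 {            # one section per node, named after the node's hostname
                device /dev/drbd0;   # the DRBD device /dev/drbd0, backed by the physical partition /dev/sda5
                disk /dev/sda5;
                address 192.168.0.10:7888;  # listen address and port
                meta-disk internal;
        }
        on db-server-02 {
                device /dev/drbd0;
                disk /dev/sda5;
                address 192.168.0.20:7888;
                meta-disk internal;     # "internal" stores the DRBD metadata on the same device as the data (/dev/sda5)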
        }
}
[[email protected] ~]# 
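
DRBD matches each "on <host>" section against the node's own host name (uname -n), so a typo there is an easy way to break the setup. The sketch below lists the section names in a config file and checks that a given host has one; the function names are made up for this example, and it assumes the "on <host> {" layout shown above.

```shell
# Print the host names used in "on <host> {" sections of a DRBD config file.
drbd_on_hosts() {
    awk '$1 == "on" && $3 == "{" { print $2 }' "$1"
}

# Succeed only if the given host name (default: uname -n) has a section.
# $1 = config file, $2 = host name to look for
drbd_has_stanza_for() (
    conf="$1"
    host="${2:-$(uname -n)}"
    drbd_on_hosts "$conf" | grep -qxF "$host"
)

# Example:
# drbd_has_stanza_for /usr/local/drbd/etc/drbd.d/global_common.conf \
#     || echo "no 'on' section for $(uname -n)" >&2
```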

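(4) Managing and maintaining DRBD:

Creating the DRBD resource

With DRBD configured, the resource defined in the configuration has to be created with the following commands (same steps on both servers):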

[[email protected] ~]# dd if=/dev/zero of=/dev/sda5 bs=1M count=100  # without this, creating the resource fails with an error
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 3.34339 s, 31.4 MB/s
[[email protected] ~]# 

[[email protected] ~]# drbdadm create-md r0                             
Writing meta data...
initializing activity log
NOT initializing bitmap
New drbd meta data block successfully created.
success
[[email protected] ~]# 

(5) Start DRBD and check its status (start it on each of the two servers)

[[email protected] ~]# /etc/init.d/drbd start               
Starting DRBD resources: [
     create res: r0
   prepare disk: r0
    adjust disk: r0
     adjust net: r0
]
.....
[[email protected] ~]# 

[[email protected] ~]# /etc/init.d/drbd start
Starting DRBD resources: [
     create res: r0
   prepare disk: r0
    adjust disk: r0
     adjust net: r0
]
.
[[email protected] ~]# 

Check the DRBD status:

[[email protected] ~]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by [email protected], 2014-04-18 21:15:57
m:res  cs         ro                   ds                         p  mounted  fstype
0:r0   Connected  Secondary/Secondary  Inconsistent/Inconsistent  C
[[email protected] ~]# 

Neither node is primary yet. Make the current node (192.168.0.10) the primary, then create the filesystem and mount it:

drbdadm -- --overwrite-data-of-peer primary all
mkfs.ext4 /dev/drbd0
mkdir /data
mount /dev/drbd0 /data/

Create the mount point /data on the other server as well:

[[email protected] ~]# mkdir /data

Check the DRBD status again (the initial sync is still running):

[[email protected] ~]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by [email protected], 2014-04-18 21:15:57
m:res  cs          ro                 ds                     p  mounted  fstype
...    sync'ed:    13.7%              (27596/31972)M
0:r0   SyncSource  Primary/Secondary  UpToDate/Inconsistent  C  /data    ext4
[[email protected] ~]# 
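
The same state can be read from /proc/drbd, which is also what the drbddisk script later in this post falls back to. The following is a rough sketch for pulling out one field and waiting for the initial sync to finish; it assumes the DRBD 8.x layout of /proc/drbd (a line such as " 0: cs:... ro:... ds:..."), and the function names are made up for this example.

```shell
# Print one field (cs, ro or ds) for a DRBD minor from /proc/drbd-style text.
# $1 = minor number, $2 = field name, $3 = file to read (default /proc/drbd)
drbd_field() {
    awk -v minor="$1" -v key="$2" '
        $1 == (minor ":") {
            for (i = 2; i <= NF; i++)
                if (index($i, key ":") == 1) {
                    print substr($i, length(key) + 2)
                    exit
                }
        }' "${3:-/proc/drbd}"
}

# Succeed once both sides report UpToDate for the given minor.
drbd_in_sync() {
    [ "$(drbd_field "$1" ds "${2:-/proc/drbd}")" = "UpToDate/UpToDate" ]
}

# Example: block until the initial sync of minor 0 has finished.
# until drbd_in_sync 0; do sleep 5; done
```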

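(6) Install MySQL. To keep things simple I install the precompiled binary tarball here (install it on both servers in the same way, except that the second server does not need to initialize the data directory)

Note: the mysql user must have the same uid and gid on both servers. Otherwise, after a switchover the MySQL data directory has the wrong owner and MySQL fails to start.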
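
One way to verify the uid/gid requirement is to compare the passwd entries on both nodes. This is a small sketch: passwd_ids is a made-up helper that reads passwd-format lines from stdin, and the ssh call in the usage comment assumes root can reach db-server-02 over ssh, which this post does not set up.

```shell
# Print "uid:gid" for a user from passwd-format input (name:pw:uid:gid:...).
# $1 = user name; input is read from stdin.
passwd_ids() {
    awk -F: -v user="$1" '$1 == user { print $3 ":" $4; exit }'
}

# Example (ssh access to the peer is an assumption):
# local_ids="$(getent passwd mysql | passwd_ids mysql)"
# peer_ids="$(ssh db-server-02 getent passwd mysql | passwd_ids mysql)"
# [ "$local_ids" = "$peer_ids" ] || echo "mysql uid/gid differ" >&2
```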

[[email protected] ~]# wget http://cdn.mysql.com/Downloads/MySQL-5.5/mysql-5.5.37-linux2.6-x86_64.tar.gz

[[email protected] ~]# tar xf mysql-5.5.37-linux2.6-x86_64.tar.gz -C /usr/local/
[[email protected] ~]# cd /usr/local/
[[email protected] local]# ln -s mysql-5.5.37-linux2.6-x86_64/ mysql
[[email protected] local]# groupadd mysql
[[email protected] local]# useradd -r -g mysql mysql
[[email protected] local]# cd mysql
[[email protected] mysql]# chown -R mysql .
[[email protected] mysql]# chgrp -R mysql .
[[email protected] mysql]# mkdir /data/mysql
[[email protected] mysql]# chown -R mysql.mysql /data/mysql/
[[email protected] mysql]# /usr/local/mysql/scripts/mysql_install_db --user=mysql --datadir=/data/mysql/ --basedir=/usr/local/mysql

[[email protected] mysql]# chown -R root .
[[email protected] mysql]# cp support-files/my-medium.cnf /etc/my.cnf
[[email protected] mysql]# cp support-files/mysql.server /etc/init.d/mysqld
[[email protected] mysql]# chmod 755 /etc/init.d/mysqld 
[[email protected] mysql]# egrep 'datadir|basedir' /etc/my.cnf       # add these two settings to the MySQL config file on both servers
datadir=/data/mysql
basedir=/usr/local/mysql                                
[[email protected] mysql]# 

(7) Switch the DRBD primary and secondary by hand and check whether the other server sees the data (automatic switching needs heartbeat, covered later):

[[email protected] ~]# ll /data/
total 20
drwx------ 2 root  root  16384 Apr 18 22:16 lost+found
drwxr-xr-x 5 mysql mysql  4096 Apr 18 23:01 mysql
[[email protected] ~]# 
[[email protected] ~]# ll /data/
total 0
[[email protected] ~]# 

[[email protected] ~]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by [email protected], 2014-04-18 21:15:57
m:res  cs         ro                 ds                 p  mounted  fstype
0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /data    ext4
[[email protected] ~]# 

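This server is the primary, so the data is mounted here; the other server shows no data under /data. Now switch over by hand.

To turn the primary into a secondary, unmount the filesystem first and then run the demote command: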

[[email protected] ~]# umount /data/
[[email protected] ~]# drbdadm secondary all

To turn the secondary into the primary, run the promote command first and then mount the filesystem:

[[email protected] ~]# drbdadm  primary all
[[email protected] ~]# mount /dev/drbd0 /data/
[[email protected] ~]# ll /data/
total 20
drwx------ 2 root  root  16384 Apr 18 22:16 lost+found
drwxr-xr-x 5 mysql mysql  4096 Apr 18 23:01 mysql
[[email protected] ~]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by [email protected], 2014-04-18 21:22:55
m:res  cs         ro                 ds                 p  mounted  fstype
0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /data    ext4
[[email protected] ~]# 

The node has been promoted to primary, and the initialized MySQL data is there as well.
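
The order of the manual steps matters: unmount before demoting, promote before mounting. The sketch below wraps them in two functions with a dry-run switch so the sequence can be reviewed before anything is executed. The run helper and the DRY_RUN variable are inventions of this example, and it names the resource r0 explicitly where the commands above use all.

```shell
# Echo commands instead of running them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the current primary: release the filesystem, then demote.
drbd_release() {
    run umount /data && run drbdadm secondary r0
}

# On the node taking over: promote first, then mount.
drbd_takeover() {
    run drbdadm primary r0 && run mount /dev/drbd0 /data
}

# Example: print the steps, then run them for real with DRY_RUN=0.
# drbd_release
# DRY_RUN=0 drbd_release
```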

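Recovering from a DRBD split brain

After a split brain, the data on the two DRBD disks has diverged. On the node you decide should become the secondary, switch it to secondary and discard its copy of the resource data: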

drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

On the node that is to remain primary, reconnect to the secondary (this can be skipped if the node's connection state is already WFConnection) with:

drbdadm connect r0
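
Whether the surviving primary needs the connect step depends on its connection state. The helper below is only an illustrative sketch that maps a state name to the next step: WFConnection comes from the text above, while treating StandAlone as "needs drbdadm connect" is my reading of the DRBD documentation rather than something tested in this post.

```shell
# Suggest the next step for a node from its DRBD connection state (cs).
# $1 = connection state, e.g. the output of: drbdadm cstate r0
split_brain_hint() {
    case "$1" in
        StandAlone)
            echo "not connected: run drbdadm connect on this node" ;;
        WFConnection)
            echo "already waiting for the peer: no connect needed" ;;
        Connected|SyncSource|SyncTarget)
            echo "connected: nothing to do" ;;
        *)
            echo "unexpected state $1: inspect manually" ;;
    esac
}

# Example:
# split_brain_hint "$(drbdadm cstate r0)"
```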

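(8) Install Heartbeat (on both servers)

The EPEL repository has to be added, because the default CentOS repositories do not ship this package. You can of course build it from source instead.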

rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum install heartbeat -y

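Create the DRBD resource script drbddisk (on both servers):

Note:

This is a big pitfall: a default yum install of Heartbeat does not create the drbddisk script in /etc/ha.d/resource.d/. I guess the version is too new; I don't remember this happening a couple of years ago. The file cannot be found anywhere else on the system after installation either. I ran into this because the virtual IP could not be pinged after starting Heartbeat. Looking through /var/log/ha-log I found the line ERROR: Cannot locate resource script drbddisk, and then noticed that /etc/ha.d/resource.d/ had no drbddisk script at all. I finally found the code via Google, created the script, and the test passed: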

[[email protected] ~]# chmod 755 /etc/ha.d/resource.d/drbddisk 
[[email protected] ~]# cat /etc/ha.d/resource.d/drbddisk 
#!/bin/bash
#
# This script is intended to be used as resource script by heartbeat
#
# Copyright 2003-2008 LINBIT Information Technologies
# Philipp Reisner, Lars Ellenberg
#
###

DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"

if [ -f $DEFAULTFILE ]; then
 . $DEFAULTFILE
fi

if [ "$#" -eq 2 ]; then
 RES="$1"
 CMD="$2"
else
 RES="all"
 CMD="$1"
fi

## EXIT CODES
# since this is a "legacy heartbeat R1 resource agent" script,
# exit codes actually do not matter that much as long as we conform to
#  http://wiki.linux-ha.org/HeartbeatResourceAgent
# but it does not hurt to conform to lsb init-script exit codes,
# where we can.
#  http://refspecs.linux-foundation.org/LSB_3.1.0/
#LSB-Core-generic/LSB-Core-generic/iniscrptact.html
####

drbd_set_role_from_proc_drbd()
{
    local out
    if ! test -e /proc/drbd; then
        ROLE="Unconfigured"
        return
    fi

    dev=$( $DRBDADM sh-dev $RES )
    minor=${dev#/dev/drbd}
    if [[ $minor = *[!0-9]* ]] ; then
        # sh-minor is only supported since drbd 8.3.1
        minor=$( $DRBDADM sh-minor $RES )
    fi
    if [[ -z $minor ]] || [[ $minor = *[!0-9]* ]] ; then
        ROLE=Unknown
        return
    fi

    if out=$(sed -ne "/^ *$minor: cs:/ { s/:/ /g; p; q; }" /proc/drbd); then
        set -- $out
        ROLE=${5%/**}
        : ${ROLE:=Unconfigured} # if it does not show up
    else
        ROLE=Unknown
    fi
}

case "$CMD" in
    start)
        # try several times, in case heartbeat deadtime
        # was smaller than drbd ping time
        try=6
        while true; do
            $DRBDADM primary $RES && break
            let "--try" || exit 1 # LSB generic error
            sleep 1
        done
        ;;
    stop)
        # heartbeat (haresources mode) will retry failed stop
        # for a number of times in addition to this internal retry.
        try=3
        while true; do
            $DRBDADM secondary $RES && break
            # We used to lie here, and pretend success for anything != 11,
            # to avoid the reboot on failed stop recovery for "simple
            # config errors" and such. But that is incorrect.
            # Don't lie to your cluster manager.
            # And don't do config errors...
            let --try || exit 1 # LSB generic error
            sleep 1
        done
        ;;
    status)
        if [ "$RES" = "all" ]; then
            echo "A resource name is required for status inquiries."
            exit 10
        fi
        ST=$( $DRBDADM role $RES )
        ROLE=${ST%/**}
        case $ROLE in
            Primary|Secondary|Unconfigured)
                # expected
                ;;
            *)
                # unexpected. whatever...
                # If we are unsure about the state of a resource, we need to
                # report it as possibly running, so heartbeat can, after failed
                # stop, do a recovery by reboot.
                # drbdsetup may fail for obscure reasons, e.g. if /var/lock/ is
                # suddenly readonly.  So we retry by parsing /proc/drbd.
                drbd_set_role_from_proc_drbd
        esac
        case $ROLE in
            Primary)
                echo "running (Primary)"
                exit 0 # LSB status "service is OK"
                ;;
            Secondary|Unconfigured)
                echo "stopped ($ROLE)"
                exit 3 # LSB status "service is not running"
                ;;
            *)
                # NOTE the "running" in below message.
                # this is a "heartbeat" resource script,
                # the exit code is _ignored_.
                echo "cannot determine status, may be running ($ROLE)"
                exit 4 #  LSB status "service status is unknown"
                ;;
        esac
        ;;
    *)
        echo "Usage: drbddisk [resource] {start|stop|status}"
        exit 1
        ;;
esac

exit 0
[[email protected] ~]# 

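(9) Heartbeat configuration

Heartbeat is configured through three files: authkeys, ha.cf and haresources. Let's look at each in turn.

authkeys (identical on both servers)

This file configures how heartbeat messages are authenticated. Three methods are supported: crc, md5 and sha1, in increasing order of security and resource cost. If heartbeat runs on a secure network such as a private network, crc is sufficient. The authkeys file is identical on the master and the backup. Mine looks like this: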

[[email protected] ~]# cat /etc/ha.d/authkeys 
auth 1
1 crc
[[email protected] ~]# chmod 600 /etc/ha.d/authkeys

Note: the permissions of this file must be 600.
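
If the heartbeat link is not on a trusted network, sha1 with a shared random key is the stronger of the three methods listed above. The following is a minimal sketch that writes such a file with mode 600; make_authkeys is a made-up helper name, and generating the key from /dev/urandom is just one reasonable choice.

```shell
# Write a heartbeat authkeys file using sha1 and a random shared key.
# $1 = output path (defaults to the path used in this post)
make_authkeys() (
    out="${1:-/etc/ha.d/authkeys}"
    # 16 random bytes rendered as 32 hex characters
    key="$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')"
    umask 077
    printf 'auth 1\n1 sha1 %s\n' "$key" > "$out"
    chmod 600 "$out"
)

# Example: generate once, then copy the same file to the other node.
# make_authkeys /etc/ha.d/authkeys
```

The key has to be identical on both nodes, so generate the file on one server and copy it to the other rather than running the function twice.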

ha.cf (slightly different on the two machines). On the primary (192.168.0.10):

[[email protected] ~]# cat /etc/ha.d/ha.cf 
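logfile /var/log/ha-log
# Heartbeat log file name and location
logfacility local0
keepalive 2
# Send a heartbeat every 2 seconds
deadtime 15
# Declare the peer dead after 15 seconds without a heartbeat
ucast eth1 192.168.0.20
# Use unicast; the address is the peer's IP
auto_failback off
# Whether to switch back automatically once the failed primary recovers (off here; switching back by hand when needed is preferable)
node db-server-01
node db-server-02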
[[email protected] ~]# 

On the secondary (192.168.0.20):

[[email protected] ~]# cat /etc/ha.d/ha.cf 
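logfile /var/log/ha-log
# Heartbeat log file name and location
logfacility local0
keepalive 2
# Send a heartbeat every 2 seconds
deadtime 15
# Declare the peer dead after 15 seconds without a heartbeat
ucast eth1 192.168.0.10
# Use unicast; the address is the peer's IP
auto_failback off
# Whether to switch back automatically once the failed primary recovers (decide per your needs; otherwise leave automatic failback off)
node db-server-01
node db-server-02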
[[email protected] ~]# 
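
The two ha.cf files differ only in the ucast peer address, so they can be produced from one template. This is a sketch with a made-up function name; the values are the ones used in this post.

```shell
# Print an ha.cf for one node; only the ucast peer address differs per node.
# $1 = peer IP address, $2 = heartbeat interface (eth1 in this post)
make_ha_cf() {
    cat <<EOF
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 15
ucast ${2:-eth1} $1
auto_failback off
node db-server-01
node db-server-02
EOF
}

# Example:
# make_ha_cf 192.168.0.20 > /etc/ha.d/ha.cf    # on db-server-01
# make_ha_cf 192.168.0.10 > /etc/ha.d/ha.cf    # on db-server-02
```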

haresources (identical on both machines):

[[email protected] ~]# cat /etc/ha.d/haresources 
db-server-01 IPaddr::192.168.0.88/24/eth1 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext4  mysqld 
[[email protected] ~]# 

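Note: the scripts named in this file, such as IPaddr and Filesystem, live in /etc/ha.d/resource.d/. Service start scripts (for example mysqld) can also be placed in that directory; adding a script's name to /etc/ha.d/haresources makes heartbeat start that script when heartbeat itself starts. In this setup mysqld is run from /etc/init.d/, as the heartbeat log further down shows.

IPaddr::192.168.0.88/24/eth1: the IPaddr script configures the floating VIP

drbddisk::r0: the drbddisk script promotes the DRBD resource r0 to primary on start and demotes it to secondary on stop

Filesystem::/dev/drbd0::/data::ext4: the Filesystem script mounts and unmounts the disk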
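
Heartbeat starts the resources on a haresources line from left to right and stops them from right to left, which is why mysqld comes last: it needs the VIP, the promoted DRBD device and the mounted filesystem to be in place first. The sketch below simply prints both orders for a given line; the function names are made up for this example.

```shell
# List the resources of one haresources line in start order.
# The first field is the preferred node and is skipped.
haresources_start_order() {
    echo "$1" | awk '{ for (i = 2; i <= NF; i++) print $i }'
}

# Stop order is the reverse of start order.
haresources_stop_order() {
    echo "$1" | awk '{ for (i = NF; i >= 2; i--) print $i }'
}

# Example:
# haresources_stop_order "$(cat /etc/ha.d/haresources)"
```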

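(10) Managing heartbeat

With heartbeat configured, mysql must be removed from the services that start at boot. When heartbeat starts on the primary it mounts the DRBD filesystem and starts mysql; on a switchover it stops mysql and unmounts the filesystem on the primary, and the secondary then mounts the filesystem and starts mysql. So do the following (on both servers):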

[[email protected] ~]# chkconfig mysqld off
[[email protected] ~]# chkconfig heartbeat off
[[email protected] ~]# chkconfig drbd off     

[[email protected] ~]# cat /etc/rc.local 
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local
modprobe drbd              # the module must be loaded first, which is why the start commands are placed here
/etc/init.d/drbd start
/etc/init.d/heartbeat start

[[email protected] ~]# 

That completes the heartbeat+drbd+mysql high-availability setup. Next comes testing.

High-availability tests

(1) Start the mysql service on the first server (192.168.0.10)

[[email protected] ~]# /etc/init.d/mysqld start
Starting MySQL.The server quit without updating PID file (/[FAILED]ql/db-server-01.pid).
[[email protected] ~]# ll /data/
total 0
[[email protected] ~]# 

What happened? /data/ is empty. That is because we switched this node to Secondary earlier.

[[email protected] ~]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by [email protected], 2014-04-18 21:15:57
m:res  cs         ro                 ds                 p  mounted  fstype
0:r0   Connected  Secondary/Primary  UpToDate/UpToDate  C
[[email protected] ~]# 

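We have to switch back by hand before mysql can start: unmount and demote on the node that is currently primary, then promote and mount on this one.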

[[email protected] ~]# umount /data/
[[email protected] ~]# drbdadm secondary all
[[email protected] ~]# 

[[email protected] ~]# drbdadm  primary all
[[email protected] ~]# mount /dev/drbd0 /data/
[[email protected] ~]# ll /data/
total 20
drwx------ 2 root  root  16384 Apr 18 22:16 lost+found
drwxr-xr-x 5 mysql mysql  4096 Apr 18 23:01 mysql
[[email protected].0.10 ~]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by [email protected], 2014-04-18 21:15:57
m:res  cs         ro                 ds                 p  mounted  fstype
0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /data    ext4
[[email protected] ~]# 

The roles have been switched back, so mysql can be started now.

[[email protected] ~]# /etc/init.d/mysqld start             
Starting MySQL.......                                      [  OK  ]
[[email protected] ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.37-log MySQL Community Server (GPL)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> 

(2) Start heartbeat on both servers

[[email protected] ~]# /etc/init.d/heartbeat start
Starting High-Availability services: INFO:  Resource is stopped
Done.

[[email protected] ~]# 
[[email protected] ~]# /etc/init.d/heartbeat start
Starting High-Availability services: INFO:  Resource is stopped
Done.

[[email protected] ~]# 
[[email protected] ~]# ip addr | grep eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.0.10/24 brd 192.168.0.255 scope global eth1
    inet 192.168.0.88/24 brd 192.168.0.255 scope global secondary eth1
[[email protected] ~]# 

The virtual IP 192.168.0.88 is now present, so it worked. The heartbeat log shows what happened:

[[email protected] ~]# tail -n 20 /var/log/ha-log  
harc(default)[5598]:    2014/04/19_00:25:21 info: Running /etc/ha.d//rc.d/status status
Apr 19 00:25:22 db-server-01 heartbeat: [5591]: info: Comm_now_up(): updating status to active
Apr 19 00:25:22 db-server-01 heartbeat: [5591]: info: Local status now set to: 'active'
Apr 19 00:25:22 db-server-01 heartbeat: [5591]: info: Status update for node db-server-02: status active
harc(default)[5618]:    2014/04/19_00:25:22 info: Running /etc/ha.d//rc.d/status status
Apr 19 00:25:33 db-server-01 heartbeat: [5591]: info: remote resource transition completed.
Apr 19 00:25:33 db-server-01 heartbeat: [5591]: info: remote resource transition completed.
Apr 19 00:25:33 db-server-01 heartbeat: [5591]: info: Initial resource acquisition complete (T_RESOURCES(us))
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.88)[5671]:   2014/04/19_00:25:33 INFO:  Resource is stopped
Apr 19 00:25:33 db-server-01 heartbeat: [5635]: info: Local Resource acquisition completed.
harc(default)[5752]:    2014/04/19_00:25:33 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp(default)[5752]: 2014/04/19_00:25:33 received ip-request-resp IPaddr::192.168.0.88/24/eth1 OK yes
ResourceManager(default)[5775]: 2014/04/19_00:25:33 info: Acquiring resource group: db-server-01 IPaddr::192.168.0.88/24/eth1 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext4 mysqld
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.88)[5803]:   2014/04/19_00:25:33 INFO:  Resource is stopped
ResourceManager(default)[5775]: 2014/04/19_00:25:33 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.88/24/eth1 start
IPaddr(IPaddr_192.168.0.88)[5926]:      2014/04/19_00:25:34 INFO: Adding inet address 192.168.0.88/24 with broadcast address 192.168.0.255 to device eth1
IPaddr(IPaddr_192.168.0.88)[5926]:      2014/04/19_00:25:34 INFO: Bringing device eth1 up
IPaddr(IPaddr_192.168.0.88)[5926]:      2014/04/19_00:25:34 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.0.88 eth1 192.168.0.88 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.88)[5900]:   2014/04/19_00:25:34 INFO:  Success
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[6030]:     2014/04/19_00:25:34 INFO:  Running OK
[[email protected] ~]# 

Now for the exciting part: testing automatic switchover. First look at the state of both servers:

[[email protected] ~]# df -HT
Filesystem    Type     Size   Used  Avail Use% Mounted on
/dev/sda2     ext4      19G   3.5G    15G  20% /
tmpfs        tmpfs     121M      0   121M   0% /dev/shm
/dev/sda1     ext4     204M    52M   141M  27% /boot
/dev/drbd0    ext4      33G   216M    32G   1% /data
[[email protected] ~]# 

[[email protected] ~]# df -HT
Filesystem    Type     Size   Used  Avail Use% Mounted on
/dev/sda2     ext4      19G   4.9G    13G  28% /
tmpfs        tmpfs     121M      0   121M   0% /dev/shm
/dev/sda1     ext4     204M    52M   141M  27% /boot
[[email protected] ~]# 

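The volume is mounted on the first server.

Test methods:

1. Stop mysqld on the master and see whether a switchover happens (heartbeat does not check service availability, so this needs an additional script).
2. Stop heartbeat on the master and see whether it switches over correctly.
3. Take down the master's network, or shut the master down outright, and see whether it switches over correctly.
4. Start heartbeat on the master again and see whether the resources switch back correctly.
5. Reboot the master and see whether the whole switchover process goes smoothly.
Note: checking a switchover here means checking whether mysql was stopped, whether the filesystem was unmounted, and so on.

Here I only stop heartbeat on the master (192.168.0.10) to test automatic switchover. Apart from item 1, which cannot work, all of the others do switch over: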

[[email protected] ~]# /etc/init.d/heartbeat stop
Stopping High-Availability services: Done.

[[email protected] ~]# df -HT
Filesystem    Type     Size   Used  Avail Use% Mounted on
/dev/sda2     ext4      19G   3.5G    15G  20% /
tmpfs        tmpfs     121M      0   121M   0% /dev/shm
/dev/sda1     ext4     204M    52M   141M  27% /boot
[[email protected]8.0.10 ~]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by [email protected], 2014-04-18 21:15:57
m:res  cs         ro                 ds                 p  mounted  fstype
0:r0   Connected  Secondary/Primary  UpToDate/UpToDate  C
[[email protected] ~]# 

The switchover has happened. Now look at the other machine:

[[email protected] ~]# df -HT
Filesystem    Type     Size   Used  Avail Use% Mounted on
/dev/sda2     ext4      19G   4.9G    13G  28% /
tmpfs        tmpfs     121M      0   121M   0% /dev/shm
/dev/sda1     ext4     204M    52M   141M  27% /boot
/dev/drbd0    ext4      33G   216M    32G   1% /data
[[email protected] ~]# netstat -nltp | grep 3306 | grep -v grep
tcp        0      0 0.0.0.0:3306                0.0.0.0:*                   LISTEN      5542/mysqld         
[[email protected] ~]# 

It has taken over, and mysql was started automatically; it was not running on this node before.

[[email protected] ~]# ip addr | grep eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.0.20/24 brd 192.168.0.255 scope global eth1
    inet 192.168.0.88/24 brd 192.168.0.255 scope global secondary eth1
[[email protected] ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.37-log MySQL Community Server (GPL)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> 

Everything works. The log shows exactly what happened:

[[email protected] ~]# tail -n 10 /var/log/ha-log 
ResourceManager(default)[4768]: 2014/04/19_00:36:42 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext4 start
Filesystem(Filesystem_/dev/drbd0)[5131]:        2014/04/19_00:36:42 INFO: Running start for /dev/drbd0 on /data
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[5122]:     2014/04/19_00:36:42 INFO:  Success
ResourceManager(default)[4768]: 2014/04/19_00:36:43 info: Running /etc/init.d/mysqld  start
mach_down(default)[4741]:       2014/04/19_00:36:46 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[4741]:       2014/04/19_00:36:46 info: mach_down takeover complete for node db-server-01.
Apr 19 00:36:46 db-server-02 heartbeat: [4637]: info: mach_down takeover complete.
Apr 19 00:36:58 db-server-02 heartbeat: [4637]: WARN: node db-server-01: is dead
Apr 19 00:36:58 db-server-02 heartbeat: [4637]: info: Dead node db-server-01 gave up resources.
Apr 19 00:36:58 db-server-02 heartbeat: [4637]: info: Link db-server-01:eth1 dead.
[[email protected] ~]# 

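When only the mysqld service dies, no automatic switchover happens, so a script is needed. Here is a simple one that triggers the switchover when mysqld becomes unavailable and sends a mail when it does. Run it on the primary, that is, on the server where mysqld is running.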

[[email protected] ~]# cat mysqlmon.sh 
#!/bin/bash
# Poll mysqld every 10 seconds; after 3 consecutive failed connections,
# stop heartbeat so that the resources fail over to the backup node.
trap 'echo "PROGRAM INTERRUPTED"; exit 1' INT
username=root
password=123456
n=0
log='/var/log/mysqlmon.log'

# Timestamp used in every log line
now() { date +"%Y-%m-%d  %H:%M:%S"; }

while true
do
    if /usr/local/mysql/bin/mysql -u"${username}" -p"${password}" -e "use test" >/dev/null 2>&1
    then
        echo "$(now)  mysqld is alive!" >> "${log}"
        n=0
    else
        echo "$(now)  mysqld  cannot be  connected!" >> "${log}"
        n=$((n + 1))
        if [ "$n" -eq 3 ]
        then
            /etc/init.d/heartbeat stop
            echo "$(now)  mysqld  switched to backup!" >> "${log}"
            echo "$(now)  mysqld  switched to backup" | mutt -s "mysqld switched to backup" [email protected]
            break
        fi
    fi
    sleep 10
done

[[email protected] ~]# 
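
The failure-counting part of the script can also be pulled out into a small helper, which makes it easy to try with a harmless probe before pointing it at mysqld. This is a simplified sketch rather than a drop-in replacement: it runs a single round of attempts instead of looping forever, and probe_with_retries is a made-up name.

```shell
# Run a probe command up to $1 times; return 0 as soon as it succeeds,
# or 1 once every attempt has failed.
# $1 = maximum attempts, $2 = seconds to wait between attempts,
# remaining arguments = the probe command and its arguments
probe_with_retries() (
    max="$1"
    delay="$2"
    shift 2
    n=0
    while [ "$n" -lt "$max" ]; do
        if "$@" >/dev/null 2>&1; then
            exit 0
        fi
        n=$((n + 1))
        if [ "$n" -lt "$max" ]; then
            sleep "$delay"
        fi
    done
    exit 1
)

# Example: three attempts, 10 seconds apart, then give up and fail over.
# probe_with_retries 3 10 /usr/local/mysql/bin/mysql -uroot -p123456 -e "use test" \
#     || /etc/init.d/heartbeat stop
```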

Run it in the background:

[[email protected] ~]# nohup bash ./mysqlmon.sh &

Stop the mysqld service and check whether the switchover happens and the mail is sent:

[[email protected] ~]# /etc/init.d/mysqld stop
Shutting down MySQL.                                       [  OK  ]
[[email protected] ~]# 

[[email protected] ~]# df -HT
Filesystem    Type     Size   Used  Avail Use% Mounted on
/dev/sda2     ext4      19G   4.9G    13G  28% /
tmpfs        tmpfs     121M      0   121M   0% /dev/shm
/dev/sda1     ext4     204M    52M   141M  27% /boot
/dev/drbd0    ext4      33G   216M    32G   1% /data
[[email protected] ~]# netstat -nltp | grep 3306
tcp        0      0 0.0.0.0:3306                0.0.0.0:*                   LISTEN      13771/mysqld        
[[email protected] ~]# 

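Summary:

The setup is not especially complicated, but I did hit quite a few pitfalls, such as the yum-installed heartbeat lacking the drbddisk script. The strengths of this solution are high safety, stability and availability, with automatic failover when something breaks. The weaknesses are just as clear: only one server serves traffic, the cost is relatively high, it is hard to scale, and split brain can occur. When the mysql service dies or becomes unavailable, no automatic switchover takes place; that requires crm mode or an extra script (for example a shell script that stops heartbeat on the master once it detects that mysql there is unavailable, which triggers a switch to the backup). Monitoring is also very important; nagios or zabbix can be used for it.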

 

References:

http://wiki.weithenn.org/cgi-bin/wiki.pl?HA-DRBD_Heartbeat_%E5%BB%BA%E7%BD%AE_MySQL_%E9%AB%98%E5%8F%AF%E7%94%A8%E6%80%A7

 

Reposted from https://www.cnblogs.com/gomysql/p/3674030.html