CentOS 6.5: high-availability cluster with heartbeat V3 + pacemaker

Date: 2020-07-24

Tags: centos6.5 centos heartbeatv3+pacemaker heartbeatv pacemaker high-availability-cluster    Category: CentOS


1. Cluster environment

node1: 192.168.220.111

node2: 192.168.220.112

2. Preparation

Set up SSH mutual trust (passwordless login) between the nodes:

# node1
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.220.112
# node2
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.220.111

Set each node's hostname so that it matches the output of uname -n, and make the names resolvable through /etc/hosts:

# node1
hostname node1.wyb.com
sed -i 's/localhost.localdomain/node1.wyb.com/g' /etc/sysconfig/network
echo '192.168.220.111 node1.wyb.com   node1' >> /etc/hosts
echo '192.168.220.112 node2.wyb.com   node2' >> /etc/hosts
# node2
hostname node2.wyb.com
sed -i 's/localhost.localdomain/node2.wyb.com/g' /etc/sysconfig/network
echo '192.168.220.111 node1.wyb.com   node1' >> /etc/hosts
echo '192.168.220.112 node2.wyb.com   node2' >> /etc/hosts
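Running the echo ... >> /etc/hosts lines a second time appends duplicate entries. The following is a minimal idempotent sketch, assuming bash; add_host_entry and check_hostname are hypothetical helper names, and the target file is a parameter so the helper can be tried on a scratch copy before it touches /etc/hosts. The hostname check exists because heartbeat matches the node lines in ha.cf against uname -n.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of heartbeat) for idempotent name setup.

# add_host_entry FILE IP FQDN SHORT
# Appends "IP FQDN   SHORT" to FILE unless a non-comment line already
# lists FQDN as one of its host names (fields 2..NF).
add_host_entry() {
    local file=$1 ip=$2 fqdn=$3 short=$4
    if awk -v name="$fqdn" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        return 0    # already present, nothing to do
    fi
    printf '%s %s   %s\n' "$ip" "$fqdn" "$short" >> "$file"
}

# check_hostname EXPECTED
# Fails (and explains why) when uname -n differs from EXPECTED.
check_hostname() {
    local expected=$1 actual
    actual=$(uname -n)
    if [ "$actual" != "$expected" ]; then
        echo "uname -n is '$actual', expected '$expected'" >&2
        return 1
    fi
}
```

Usage on node1 would be, for example, add_host_entry /etc/hosts 192.168.220.111 node1.wyb.com node1 followed by check_hostname node1.wyb.com.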

Time synchronization:

# node1 node2
ntpdate asia.pool.ntp.org
echo '*/3 * * * * /usr/sbin/ntpdate asia.pool.ntp.org &> /dev/null' >> /var/spool/cron/root

3. Installation

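Starting with version 3, heartbeat was split into several sub-projects (independent components). The current components are heartbeat, cluster-glue and resource-agents. Their main roles:

heartbeat: the cluster messaging layer; it keeps track of every node in the cluster and handles the communication between nodes.

cluster-glue: includes the LRM (Local Resource Manager) and STONITH; it is a middle layer that connects heartbeat with the CRM (Cluster Resource Manager).

resource-agents: the collection of resource scripts that the LRM calls to start, stop and monitor each resource.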

Set up the yum repository (EPEL):

rpm -ivh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm

Install heartbeat and pacemaker:

yum install heartbeat heartbeat-libs pacemaker pacemaker-libs resource-agents \
    cluster-glue cluster-glue-libs

4. Configuration

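heartbeat uses 3 configuration files:

        Key file authkeys: authenticates (signs) the messages exchanged between cluster nodes; its permissions must be 600;
        Service configuration file ha.cf: the main heartbeat configuration;
        Resource configuration file haresources: defines the resources when heartbeat manages them itself;

The configuration directory /etc/ha.d contains none of these files by default. You can create them by hand or copy the templates shipped with the package. Because pacemaker manages the resources here, haresources does not need to be copied; if resources are managed by the CRM while a haresources file exists in the configuration directory, the log will note that haresources is not used.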

cp -p /usr/share/doc/heartbeat-3.0.4/{authkeys,ha.cf} /etc/ha.d/

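Configure the key file (reading /dev/urandom avoids blocking while the kernel gathers entropy for /dev/random, and awk drops the trailing "-" that md5sum prints after the hash):

(echo -ne "auth 1\n1 md5 "; dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | awk '{print $1}') >> /etc/ha.d/authkeys
chmod 600 /etc/ha.d/authkeys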
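Since authkeys must have mode 600 and the auth N line has to select a key index that is actually defined, a small pre-flight check can catch mistakes before heartbeat is started. This is a sketch, assuming bash and GNU stat (as shipped with CentOS); check_authkeys is a hypothetical name, and it only accepts the crc, md5 and sha1 methods.

```shell
#!/usr/bin/env bash
# check_authkeys [FILE]
# Hypothetical pre-flight check for a heartbeat authkeys file:
#   1. permissions must be exactly 600
#   2. the "auth N" line must select an index that is defined
#   3. that index must use crc (no key) or md5/sha1 with a non-empty key
check_authkeys() {
    local file=${1:-/etc/ha.d/authkeys} mode idx
    [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }

    mode=$(stat -c '%a' "$file")    # GNU coreutils syntax
    if [ "$mode" != "600" ]; then
        echo "$file has mode $mode, expected 600" >&2
        return 1
    fi

    # Commented template lines such as "#auth 1" never match $1 == "auth".
    idx=$(awk '$1 == "auth" { print $2; exit }' "$file")
    [ -n "$idx" ] || { echo "no 'auth N' line in $file" >&2; return 1; }

    if ! awk -v idx="$idx" '
        $1 == idx && ($2 == "crc" || (($2 == "md5" || $2 == "sha1") && $3 != "")) { ok = 1 }
        END { exit !ok }' "$file"; then
        echo "key $idx is not defined correctly in $file" >&2
        return 1
    fi
}
```

Running check_authkeys on each node (the file is copied to node2 later) should print nothing and return 0.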

Configure the main configuration file ha.cf:

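#Nodes do not join the cluster automatically; only nodes listed in "node" directives are members
autojoin    none

#File for heartbeat debug logs; ignored if use_logd is enabled
#debugfile   /var/log/ha-debug

#File for all non-debug messages; ignored if use_logd is enabled
logfile    /var/log/ha-log

#Log through syslog with this facility
#logfacility   local0

#Interval between two heartbeat packets (seconds)
keepalive 1

#How long without heartbeats before a node is declared dead
deadtime   30

#How late a heartbeat may be before a warning is logged; this only writes a warning and does not fail over
warntime  10

#Dead time used right after heartbeat starts, before a node is declared dead; after boot, the network may take a while to come up
initdead  120

#UDP port for bcast/ucast communication; it applies only if udpport appears before the bcast/ucast directives, otherwise the default port is used
udpport   694

#Network interface(s) used to send UDP broadcast heartbeats; several interfaces may be listed
bcast eth0

#Interface for multicast heartbeats (arguments: interface, multicast group, port, TTL, loop)
#mcast   eth0 239.0.0.1 694 1 0

#Interface and peer address for UDP unicast heartbeats; the address is the other node's IP (on 10.1.1.3 it would be 10.1.1.2)
#ucast  eth0 10.1.1.3

#Whether services move back to the primary node after it recovers
auto_failback on

#Cluster members; each name must match the output of uname -n. Several nodes may be listed on one line or "node" may be repeated, but every node in the cluster must be listed
node  node1.wyb.com
node  node2.wyb.com

#Ping node used as a quorum (tie-breaker) device; the gateway is a common choice
ping 192.168.220.2

#Start ipfail as user hacluster and restart it whenever it exits; ipfail reacts to lost connectivity to the ping node (it belongs to haresources-style clusters; with pacemaker, connectivity is usually monitored by a ping resource instead)
respawn hacluster /usr/lib64/heartbeat/ipfail

#Enable the Pacemaker cluster manager. For historical reasons this option is off by default, but it should be set to respawn; with respawn, heartbeat automatically adds respawn entries for the Pacemaker daemons (cib, stonithd, attrd, crmd, etc., as seen in the Problem 1 log)
pacemaker  respawn

#The default configuration file contains many more options below; they are not needed here and are omitted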
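The timing directives above only make sense in a certain order: warntime should sit between keepalive and deadtime, and the comments in the heartbeat ha.cf template advise that initdead be at least twice deadtime. Each node name must also match uname -n on that host. The following is a rough sketch that checks these rules on a given ha.cf, assuming bash and that each timing directive appears once with a value in whole seconds (heartbeat also accepts values such as 500ms, which this sketch does not parse); hacf_get and check_hacf are hypothetical names.

```shell
#!/usr/bin/env bash
# hacf_get FILE KEY: print the first value of directive KEY.
# Commented lines ("#deadtime ...") never match because $1 starts with "#".
hacf_get() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# check_hacf [FILE] [HOSTNAME]
# Hypothetical sanity check for ha.cf timing values and node names.
check_hacf() {
    local file=${1:-/etc/ha.d/ha.cf} host=${2:-$(uname -n)}
    local keepalive warntime deadtime initdead v rc=0

    keepalive=$(hacf_get "$file" keepalive)
    warntime=$(hacf_get "$file" warntime)
    deadtime=$(hacf_get "$file" deadtime)
    initdead=$(hacf_get "$file" initdead)

    # All four values must be present and be plain integers (seconds).
    for v in keepalive warntime deadtime initdead; do
        if ! [[ ${!v} =~ ^[0-9]+$ ]]; then
            echo "$v is missing or not in whole seconds: '${!v}'" >&2
            return 1
        fi
    done

    if (( warntime <= keepalive || warntime >= deadtime )); then
        echo "warntime ($warntime) should be between keepalive ($keepalive) and deadtime ($deadtime)" >&2
        rc=1
    fi
    if (( initdead < 2 * deadtime )); then
        echo "initdead ($initdead) should be at least 2 * deadtime ($deadtime)" >&2
        rc=1
    fi

    # A "node" line may list several names; this host must appear in one.
    if ! awk -v h="$host" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == h) f = 1 }
        END { exit !f }' "$file"; then
        echo "this host ($host) is not listed in any node directive" >&2
        rc=1
    fi
    return $rc
}
```

With the values used in this article (keepalive 1, warntime 10, deadtime 30, initdead 120), check_hacf /etc/ha.d/ha.cf run on node1.wyb.com should pass.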

Copy the configuration files to node2:

scp -p /etc/ha.d/{authkeys,ha.cf} node2:/etc/ha.d/

5. Install crmsh

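Since pacemaker 1.1.8, crmsh has been developed as a separate project and is no longer shipped with pacemaker, so installing pacemaker alone does not provide the crm command-line resource management shell.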

# node1 node2
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/x86_64/crmsh-2.1-1.6.x86_64.rpm
yum -y --nogpgcheck localinstall crmsh-2.1-1.6.x86_64.rpm

6. Problems encountered

Problem 1:

[root@node1 ~]# service heartbeat start
Starting High-Availability services:  Heartbeat failure [rc=6]. Failed.

heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Client child command [/usr/lib/heartbeat/ipfail] is not executable
heartbeat[12176]: 2015/09/11_13:30:47 info: Pacemaker support: respawn
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Client child command [/usr/lib64/heartbeat/cib] is not executable
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Directive respawn  hacluster /usr/lib64/heartbeat/cib failed
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Client child command [/usr/lib64/heartbeat/stonithd] is not executable
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Directive respawn root /usr/lib64/heartbeat/stonithd failed
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Client child command [/usr/lib64/heartbeat/attrd] is not executable
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Directive respawn  hacluster /usr/lib64/heartbeat/attrd failed
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Client child command [/usr/lib64/heartbeat/crmd] is not executable
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Directive respawn  hacluster /usr/lib64/heartbeat/crmd failed
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Heartbeat not started: configuration error.
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Configuration error, heartbeat not started.

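Solution: the log shows heartbeat looking for the Pacemaker daemons under /usr/lib64/heartbeat, while the pacemaker package installs them under /usr/libexec/pacemaker, so symlink them into place: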

ln -sv /usr/libexec/pacemaker/* /usr/lib64/heartbeat/
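A slightly more careful variant of the same ln fix is sketched below, assuming bash. It links only executables whose names are not already present in the heartbeat directory, so re-running it (for example after a package update) does not clobber existing files. link_pcmk_daemons is a hypothetical name; the directory defaults are the two paths used above.

```shell
#!/usr/bin/env bash
# link_pcmk_daemons [SRC_DIR] [DST_DIR]
# Hypothetical helper: symlink each executable in SRC_DIR into DST_DIR,
# skipping names that DST_DIR already contains (files or symlinks).
link_pcmk_daemons() {
    local src=${1:-/usr/libexec/pacemaker} dst=${2:-/usr/lib64/heartbeat}
    local f name
    [ -d "$src" ] && [ -d "$dst" ] || { echo "missing directory" >&2; return 1; }
    for f in "$src"/*; do
        # Skip non-executables, and the literal pattern if the glob matched nothing.
        [ -f "$f" ] && [ -x "$f" ] || continue
        name=${f##*/}
        if [ -e "$dst/$name" ] || [ -L "$dst/$name" ]; then
            continue    # keep whatever is already there
        fi
        ln -sv "$f" "$dst/$name"
    done
}
```

Note that this does not address the ipfail error in the same log, which refers to /usr/lib/heartbeat rather than /usr/lib64/heartbeat.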

Problem 2: the Pacemaker daemons fail to start

Sep 11 13:44:04 [12376] node1.wyb.com       crmd:     info: crm_ipc_connect:    Could not establish cib_shm connection: Connection refused (111)
Sep 11 13:44:05 [12376] node1.wyb.com       crmd:     info: crm_ipc_connect:    Could not establish cib_shm connection: Connection refused (111)
Sep 11 13:44:05 [12376] node1.wyb.com       crmd:     info: do_cib_control:     Could not connect to the CIB service: Transport endpoint is not connected
Sep 11 13:44:05 [12376] node1.wyb.com       crmd:  warning: do_cib_control:     Couldn't complete CIB registration 15 times... pause and retry
Sep 11 13:44:07 [12376] node1.wyb.com       crmd:     info: crm_timer_popped:   Wait Timer (I_NULL) just popped (2000ms)

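Solution: not solved yet. I don't know whether it is a software bug or something else; installing other versions downloaded from http://rpm.pbone.net produced the same error, and I could not find a solution to a similar problem online.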

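Problem 3: when heartbeat's built-in haresources is used instead of pacemaker for resource management, the two nodes cannot exchange heartbeat messages, so the resources are started on both nodes.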

Sep 18 18:43:38 node1.wyb.com heartbeat: [11374]: info: Configuration validated. Starting heartbeat 3.0.4
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: heartbeat: version 3.0.4
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: Heartbeat generation: 1442572552
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: glib: ping heartbeat started.
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: Local status now set to: 'up'
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: Link 192.168.220.2:192.168.220.2 up.
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: Status update for node 192.168.220.2: status ping
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: Link node1.wyb.com:eth0 up.
Sep 18 18:44:08 node1.wyb.com heartbeat: [11375]: WARN: node node2.wyb.com: is dead

Solution: not solved yet. iptables and SELinux are both disabled and the two nodes can ping each other, so I am out of ideas.
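One way to read the log above: the ping node and node1's own eth0 link come up, but no "Link node2.wyb.com:eth0 up" line ever appears, and 30 seconds after startup node2 is declared dead, so no heartbeat from node2 was received on node1. The following is a small triage sketch, assuming bash and awk, that lists which node:interface links were reported up and which nodes were declared dead in a heartbeat log; summarize_links is a hypothetical name, and its patterns are based only on the log lines shown here.

```shell
#!/usr/bin/env bash
# summarize_links [LOGFILE]
# Hypothetical helper: reads a heartbeat log (default: the logfile set in
# ha.cf) and prints the links reported up and the nodes declared dead.
summarize_links() {
    local log=${1:-/var/log/ha-log}
    awk '
        /info: Link .* up\./ {
            # "Link node1.wyb.com:eth0 up." -> record "node1.wyb.com:eth0"
            for (i = 1; i < NF; i++) if ($i == "Link") up[$(i + 1)] = 1
        }
        /WARN: node .*: is dead/ {
            # "WARN: node node2.wyb.com: is dead" -> record "node2.wyb.com"
            for (i = 1; i < NF; i++)
                if ($i == "node") { n = $(i + 1); sub(/:$/, "", n); dead[n] = 1 }
        }
        END {
            for (l in up) print "up:   " l
            for (n in dead) print "dead: " n
        }' "$log" | sort
}
```

If a peer's link never shows up, capturing traffic on port 694 on both nodes (for example tcpdump -ni eth0 udp port 694) would show whether the broadcast packets leave one node and reach the other.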

7. Summary

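In short, I did not get heartbeat + pacemaker high availability working. I ran into all sorts of strange problems, spent nearly a week on it and reinstalled many times, and I have run out of ideas. Given the time constraints, and since heartbeat is now in maintenance mode and no longer updated while corosync is becoming the mainstream choice, I will leave this to revisit when I have time.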
