1. Cluster environment
node1: 192.168.220.111
node2: 192.168.220.112
2. Preparation
Set up SSH trust between the nodes (key-based root login without a passphrase):
# node1
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.220.112
# node2
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.220.111
Set each host name so that it matches the output of uname -n, and make the names resolvable through /etc/hosts:
# node1
hostname node1.wyb.com
sed -i 's/localhost.localdomain/node1.wyb.com/g' /etc/sysconfig/network
echo '192.168.220.111 node1.wyb.com node1' >> /etc/hosts
echo '192.168.220.112 node2.wyb.com node2' >> /etc/hosts
# node2
hostname node2.wyb.com
sed -i 's/localhost.localdomain/node2.wyb.com/g' /etc/sysconfig/network
echo '192.168.220.111 node1.wyb.com node1' >> /etc/hosts
echo '192.168.220.112 node2.wyb.com node2' >> /etc/hosts
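If the echo ... >> /etc/hosts lines above are run a second time, they add a second copy of each entry. Below is a minimal sketch of an idempotent alternative. It assumes bash and a POSIX awk. add_host_entry is a hypothetical helper, and the target file is a parameter so you can try it on a scratch copy before touching /etc/hosts.

```bash
#!/usr/bin/env bash
# add_host_entry FILE IP FQDN SHORT (hypothetical helper)
# Appends "IP FQDN SHORT" to FILE unless a line whose first field is IP already exists.
add_host_entry() {
  local file=$1 ip=$2 fqdn=$3 short=$4
  # awk exits 0 when some line's first field equals IP, 1 otherwise
  if [ -f "$file" ] && awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$file"; then
    return 0    # already mapped: leave the existing line as it is
  fi
  printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$file"
}
```

Run the following on each node: for i in 1 2; do add_host_entry /etc/hosts "192.168.220.11$i" "node$i.wyb.com" "node$i"; done. The check is keyed on the IP address only, so an existing but different mapping for that IP is left alone rather than overwritten.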
Time synchronization:
# run on both node1 and node2
ntpdate asia.pool.ntp.org
echo '*/3 * * * * /usr/sbin/ntpdate asia.pool.ntp.org > /dev/null 2>&1' >> /var/spool/cron/root
3. Installation
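Starting with version 3, heartbeat was split into several sub-projects (independent components). The current components are heartbeat, cluster-glue and resource-agents. Their main functions:
heartbeat: the cluster's messaging layer; it maintains information about all nodes in the cluster and handles communication between them.
cluster-glue: contains the LRM (local resource manager) and STONITH; it is a middle layer that connects heartbeat to the CRM (cluster resource manager).
resource-agents: the resource scripts, called by the LRM to start, stop and monitor each resource.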
Configure the yum repository (EPEL):
rpm -ivh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
Install heartbeat and pacemaker:
yum install heartbeat heartbeat-libs pacemaker pacemaker-libs resource-agents \
    cluster-glue cluster-glue-libs
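4. Configuration
heartbeat uses three configuration files:
authkeys: the key file used to authenticate the messages exchanged between cluster nodes; its permissions must be 600;
ha.cf: the main configuration file of the heartbeat service;
haresources: the resource configuration file.
None of these files exist in the configuration directory by default. You can create them by hand or copy the templates shipped with the package. Because Pacemaker manages the resources here, haresources is not needed. If resources are managed by the CRM and a haresources file is present in the configuration directory, the log reports that haresources is not used.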
cp -p /usr/share/doc/heartbeat-3.0.4/{authkeys,ha.cf} /etc/ha.d/
Generate the key file:
(echo -ne "auth 1\n1 md5 "; dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | awk '{print $1}') >> /etc/ha.d/authkeys
chmod 600 /etc/ha.d/authkeys
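The one-liner above appends (>>) to the copied template, so running it a second time leaves duplicate auth entries in the file. Below is a minimal sketch that writes the file from scratch instead. It assumes bash with GNU coreutils (head -c, md5sum, stat -c). make_authkeys is a hypothetical helper, and the target directory is a parameter.

```bash
#!/usr/bin/env bash
# make_authkeys DIR (hypothetical helper)
# Writes DIR/authkeys from scratch: one md5 key derived from random bytes, mode 600.
make_authkeys() {
  local dir=$1 key
  # Hash 512 bytes from /dev/urandom (which does not block) and keep only the hex digest
  key=$(head -c 512 /dev/urandom | md5sum | awk '{print $1}')
  # umask 077 inside a subshell so the file is never created group/world-readable
  ( umask 077 && printf 'auth 1\n1 md5 %s\n' "$key" > "$dir/authkeys" )
  chmod 600 "$dir/authkeys"
}
```

Run make_authkeys /etc/ha.d on node1 only, then copy the file to node2. Both nodes must use the same key, so generating it separately on each node would not work.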
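Edit the main configuration file ha.cf:
# Nodes do not join the cluster automatically; only the nodes listed with "node" are members
autojoin none
# Debug log file; ignored if use_logd is enabled
#debugfile /var/log/ha-debug
# Log file for all non-debug messages; ignored if use_logd is enabled
logfile /var/log/ha-log
# Log through syslog using this facility
#logfacility local0
# Interval between two heartbeat packets (seconds)
keepalive 1
# Time without heartbeats after which a node is declared dead
deadtime 30
# Time after which a late heartbeat is reported; heartbeat only logs a warning and does not fail over
warntime 10
# After heartbeat starts, how long to wait before declaring nodes dead, because the network may need time to come up after boot
initdead 120
# UDP port for bcast/ucast; it only applies if udpport appears before the bcast/ucast directives, otherwise the default port is used
udpport 694
# Network interface(s) used to send UDP broadcast heartbeats; several interfaces may be listed
bcast eth0
# Multicast heartbeats: interface, group, port, TTL, loop
#mcast eth0 239.0.0.1 694 1 0
# UDP unicast heartbeats: interface and peer IP (on host 10.1.1.3 the peer would be 10.1.1.2)
#ucast eth0 10.1.1.3
# Whether resources move back to the primary node once its services have recovered
auto_failback on
# Cluster members; each name must match the output of uname -n. Several nodes may share one line or the directive may be repeated, but every node in the cluster must be listed
node node1.wyb.com
node node2.wyb.com
# Ping node used as a quorum device; it can point to the gateway
ping 192.168.220.2
# Start ipfail as user hacluster together with heartbeat and restart it if it exits; ipfail watches connectivity to the ping node(s)
respawn hacluster /usr/lib64/heartbeat/ipfail
# Enable the Pacemaker cluster manager. For historical reasons this option defaults to off, but it should be set to respawn; heartbeat then starts the Pacemaker daemons (cib, stonithd, attrd, crmd, ...) itself
pacemaker respawn
# The default file contains many more options after this point; they are not needed here and are omitted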
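Several of the comments above state rules that are easy to break when editing ha.cf by hand: node names must match uname -n, and udpport must come before bcast/ucast. Below is a minimal sketch of a pre-start check. It assumes bash and a POSIX awk, and check_hacf is a hypothetical helper that covers only these rules, not the full ha.cf syntax. The keepalive < warntime < deadtime ordering is an assumption drawn from what the three options mean as described above.

```bash
#!/usr/bin/env bash
# check_hacf FILE NODENAME (hypothetical helper)
# Checks only:
#   1. NODENAME (normally the output of uname -n) is listed in a "node" directive
#   2. udpport appears before any bcast/ucast directive
#   3. keepalive < warntime < deadtime (plain second values assumed, no "ms" suffix)
# Prints "OK" and returns 0 on success; prints the problems and returns 1 otherwise.
check_hacf() {
  awk -v me="$2" '
    /^[ \t]*#/ || NF == 0 { next }                   # skip comments and blank lines
    $1 == "node"      { for (i = 2; i <= NF; i++) nodes[$i] = 1 }
    $1 == "keepalive" { ka = $2 + 0 }
    $1 == "warntime"  { wt = $2 + 0 }
    $1 == "deadtime"  { dt = $2 + 0 }
    $1 == "bcast" || $1 == "ucast" { link_seen = 1 }
    $1 == "udpport" && link_seen   { late_port = 1 }
    END {
      bad = 0
      if (!(me in nodes)) { print "node " me " is not listed"; bad = 1 }
      if (late_port)      { print "udpport comes after bcast/ucast"; bad = 1 }
      if (!(ka < wt && wt < dt)) { print "expected keepalive < warntime < deadtime"; bad = 1 }
      if (!bad) print "OK"
      exit bad
    }' "$1"
}
```

On each node, run check_hacf /etc/ha.d/ha.cf "$(uname -n)" before starting heartbeat.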
Copy the configuration files to node2:
scp -p /etc/ha.d/{authkeys,ha.cf} node2:/etc/ha.d/
5. Install crmsh
Since pacemaker 1.1.8, crmsh has been a separate project and is no longer shipped with pacemaker, so a fresh pacemaker installation does not include the crm command-line resource manager.
# run on both node1 and node2
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/x86_64/crmsh-2.1-1.6.x86_64.rpm
yum -y --nogpgcheck localinstall crmsh-2.1-1.6.x86_64.rpm
6. Problems encountered
Problem 1: heartbeat fails to start
[root@node1 ~]# service heartbeat start
Starting High-Availability services: Heartbeat failure [rc=6]. Failed.
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Client child command [/usr/lib/heartbeat/ipfail] is not executable
heartbeat[12176]: 2015/09/11_13:30:47 info: Pacemaker support: respawn
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Client child command [/usr/lib64/heartbeat/cib] is not executable
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Directive respawn hacluster /usr/lib64/heartbeat/cib failed
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Client child command [/usr/lib64/heartbeat/stonithd] is not executable
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Directive respawn root /usr/lib64/heartbeat/stonithd failed
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Client child command [/usr/lib64/heartbeat/attrd] is not executable
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Directive respawn hacluster /usr/lib64/heartbeat/attrd failed
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Client child command [/usr/lib64/heartbeat/crmd] is not executable
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Directive respawn hacluster /usr/lib64/heartbeat/crmd failed
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Heartbeat not started: configuration error.
heartbeat[12176]: 2015/09/11_13:30:47 ERROR: Configuration error, heartbeat not started.
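When the log is long, it helps to pull out just the paths heartbeat could not execute. Below is a minimal sketch that does this. It assumes bash with GNU grep (for grep -o), and list_missing_daemons is a hypothetical helper keyed on the exact "Client child command [...] is not executable" wording shown above.

```bash
#!/usr/bin/env bash
# list_missing_daemons (hypothetical helper)
# Reads heartbeat output on stdin and prints, once each and sorted, every path
# reported as "Client child command [PATH] is not executable".
list_missing_daemons() {
  grep -o 'Client child command \[[^]]*\] is not executable' |
    sed 's/^Client child command \[\(.*\)\] is not executable$/\1/' |
    LC_ALL=C sort -u
}
```

To use it, run list_missing_daemons < /var/log/ha-log (the logfile set in ha.cf), then inspect each reported path with ls -l to see whether it is missing or merely lacks the execute bit.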
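Solution: heartbeat looks for the Pacemaker daemons under /usr/lib64/heartbeat, but this pacemaker package installs them under /usr/libexec/pacemaker, so link them into the expected directory: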
ln -sv /usr/libexec/pacemaker/* /usr/lib64/heartbeat/
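The ln -sv command above reports "File exists" for every name already present in /usr/lib64/heartbeat, for example when it is run a second time. Below is a minimal sketch of a rerunnable variant. It assumes bash, and link_missing_daemons is a hypothetical helper. The directories are parameters; pass absolute paths, since each link stores the SRC path exactly as given.

```bash
#!/usr/bin/env bash
# link_missing_daemons SRC DST (hypothetical helper)
# For each executable regular file in SRC, create a same-named symlink in DST,
# skipping names DST already has, so the command can be rerun safely.
link_missing_daemons() {
  local src=$1 dst=$2 f name
  for f in "$src"/*; do
    [ -f "$f" ] && [ -x "$f" ] || continue       # also skips an unmatched glob
    name=${f##*/}
    if [ -e "$dst/$name" ] || [ -L "$dst/$name" ]; then
      continue                                   # keep whatever is already there
    fi
    ln -s "$f" "$dst/$name" && echo "linked $dst/$name -> $f"
  done
}
```

Applied to this case: link_missing_daemons /usr/libexec/pacemaker /usr/lib64/heartbeat. It links only executables, so non-executable files in the source directory are not linked.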
Problem 2: the Pacemaker daemons cannot start (crmd cannot connect to the CIB)
Sep 11 13:44:04 [12376] node1.wyb.com crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
Sep 11 13:44:05 [12376] node1.wyb.com crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
Sep 11 13:44:05 [12376] node1.wyb.com crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
Sep 11 13:44:05 [12376] node1.wyb.com crmd: warning: do_cib_control: Couldn't complete CIB registration 15 times... pause and retry
Sep 11 13:44:07 [12376] node1.wyb.com crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Solution: still unsolved. I cannot tell whether this is a software bug or something else. Installing other package versions downloaded from http://rpm.pbone.net gave the same result, and I could not find a similar case or a fix online.
Problem 3: when heartbeat's built-in haresources is used instead of Pacemaker for resource management, the two nodes cannot exchange heartbeats, so the resources are started on both nodes.
Sep 18 18:43:38 node1.wyb.com heartbeat: [11374]: info: Configuration validated. Starting heartbeat 3.0.4
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: heartbeat: version 3.0.4
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: Heartbeat generation: 1442572552
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: glib: ping heartbeat started.
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: Local status now set to: 'up'
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: Link 192.168.220.2:192.168.220.2 up.
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: Status update for node 192.168.220.2: status ping
Sep 18 18:43:38 node1.wyb.com heartbeat: [11375]: info: Link node1.wyb.com:eth0 up.
Sep 18 18:44:08 node1.wyb.com heartbeat: [11375]: WARN: node node2.wyb.com: is dead
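A long heartbeat log is easier to read when reduced to link and membership events. Below is a minimal sketch that does this. It assumes bash and sed, and summarize_hb_links is a hypothetical helper that recognizes only the two message formats seen in the log above.

```bash
#!/usr/bin/env bash
# summarize_hb_links (hypothetical helper)
# Reads heartbeat log lines on stdin and prints one line per event:
#   "up   NAME"  for "info: Link NAME up."
#   "dead NODE"  for "WARN: node NODE: is dead"
summarize_hb_links() {
  sed -n \
    -e 's/.*info: Link \([^ ]*\) up\..*/up   \1/p' \
    -e 's/.*WARN: node \([^:]*\): is dead.*/dead \1/p'
}
```

Applied to the log above, it reports two links up: one for the ping node 192.168.220.2 and one for node1's own eth0. It then reports node2.wyb.com as dead. No link for node2 ever comes up, meaning node1 received no heartbeat packets from node2 between startup (18:43:38) and the dead declaration (18:44:08).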
Solution: still unsolved. iptables and SELinux are both disabled and the two nodes can ping each other, so I am out of ideas.
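7. Summary
In short, I did not get heartbeat + pacemaker high availability working. I ran into all sorts of odd problems, spent nearly a week on it, reinstalled many times, and have run out of options. Given the time constraints, and because heartbeat is now in maintenance mode with no further development while corosync is becoming the mainstream choice, I will leave this to revisit when I have time.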