Ubuntu 14.04
IPs of the two cluster servers: 10.11.8.192 and 10.11.8.193
NFS server IP: 10.11.8.43
1. Time synchronization
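The article gives no commands for this step; a minimal sketch using ntpdate, assuming 10.11.8.1 is a reachable NTP server (a hypothetical address — substitute your own):
ntpdate 10.11.8.1    # run on both nodes
echo '*/30 * * * * root /usr/sbin/ntpdate 10.11.8.1 &> /dev/null' >> /etc/crontab    # optional periodic re-sync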
2. Hostname and hosts file configuration (required on both hosts)
vim /etc/sysctl.d/10-kernel-hardening.conf    # on node1, set the hostname by adding: kernel.hostname = node1
vim /etc/sysctl.d/10-kernel-hardening.conf    # on node2, set the hostname by adding: kernel.hostname = node2
To avoid problems with DNS resolution, use the hosts file instead, and make sure each hostname matches the output of 'uname -n'.
vim /etc/hosts    # add these records on both hosts
10.11.8.192 node1
10.11.8.193 node2
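To apply the sysctl hostname setting without a reboot and verify it matches the hosts entries, something like the following should work on each node (shown for node1):
sysctl -p /etc/sysctl.d/10-kernel-hardening.conf    # load kernel.hostname from the file
uname -n                                            # should print node1
ping -c 1 node2                                     # resolves via /etc/hosts to 10.11.8.193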
3. Configure mutual SSH key-based trust between the two hosts
Node1:
root@node1:~# ssh-keygen -t rsa
root@node1:~# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2    # answer yes to accept the key, then enter node2's password
Node2:
root@node2:~# ssh-keygen -t rsa
root@node2:~# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
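To confirm the two-way trust works, each node should be able to run a command on the other without a password prompt:
root@node1:~# ssh node2 'uname -n'    # prints node2, no password asked
root@node2:~# ssh node1 'uname -n'    # prints node1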
Install corosync and pacemaker:
cluster-glue cluster-glue-dev heartbeat resource-agents corosync
heartbeat-dev pacemaker corosync-lib libesmtp pacemaker-dev
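On Ubuntu 14.04, installing corosync and pacemaker from the standard repositories pulls in most of the list above as dependencies; a minimal sketch (some names in the list look RHEL-flavored, so exact package names may vary):
apt-get update
apt-get install -y corosync pacemaker    # run on both nodes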
1. Edit /etc/corosync/corosync.conf
# Please read the openais.conf.5 manual page
totem {
        version: 2

        # How long before declaring a token lost (ms)
        token: 3000

        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 10

        # How long to wait for join messages in the membership protocol (ms)
        join: 60

        # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
        consensus: 3600

        # Turn off the virtual synchrony filter
        vsftype: none

        # Number of messages that may be sent by one processor on receipt of the token
        max_messages: 20

        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes

        # Disable encryption (set to 'on' to enable authentication)
        secauth: off

        # How many threads to use for encryption/decryption
        threads: 0

        # Optionally assign a fixed node id (integer)
        # nodeid: 1234

        # This specifies the mode of redundant ring, which may be none, active, or passive.
        rrp_mode: none

        interface {
                # The following values need to be set based on your environment
                ringnumber: 0
                bindnetaddr: 10.11.8.0    # network address of the hosts
                mcastaddr: 226.93.2.1     # multicast address; any unoccupied address will do
                                          # (224.0.2.0-238.255.255.255 are user-assignable transient multicast addresses, valid network-wide)
                mcastport: 5405           # multicast port
        }
}

amf {
        mode: disabled
}

quorum {
        # Quorum for the Pacemaker Cluster Resource Manager
        provider: corosync_votequorum
        expected_votes: 1
}

aisexec {
        user: root
        group: root
}

logging {
        fileline: off
        to_stderr: no                      # log to standard error
        to_logfile: yes                    # log to a file
        logfile: /var/log/corosync.log     # log file location
        to_syslog: no                      # log to syslog
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}

# Add the pacemaker service configuration
service {
        ver: 1
        name: pacemaker
}
PS: The official documentation has been updated; pacemaker must now be started separately.
In the past the Corosync process would launch pacemaker, this
is no longer the case. Pacemaker must be launched after Corosync has
successfully started.
Source: http://clusterlabs.org/wiki/Initial_Configuration#Corosync
/etc/init.d/corosync start
/etc/init.d/pacemaker start
2. Generate the authentication key file used for inter-node communication:
root@node1:~# corosync-keygen -l
Option -l: read random data from /dev/urandom.
Without -l, corosync-keygen reads from /dev/random; if there is not enough entropy available, it will block.
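If corosync-keygen does block, generating disk activity in a second terminal is a commonly used way to feed the entropy pool (a workaround, not from the original article):
# in another terminal while corosync-keygen waits:
find / -type f > /dev/null 2>&1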
3. Copy corosync.conf and authkey to node2:
root@node1:~# scp -p /etc/corosync/corosync.conf /etc/corosync/authkey node2:/etc/corosync/
4. Edit the /etc/default/corosync file on each of the two nodes
# vim /etc/default/corosync
START=yes
If this is not changed, the start command runs without error and prints nothing, but the process never actually starts.
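Rather than editing the file by hand on each node, the change can be scripted from node1 (a sketch relying on the SSH trust set up earlier):
root@node1:~# sed -i 's/^START=no/START=yes/' /etc/default/corosync
root@node1:~# ssh node2 "sed -i 's/^START=no/START=yes/' /etc/default/corosync"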
Start corosync and pacemaker:
root@node1:~# /etc/init.d/corosync start
root@node1:~# /etc/init.d/pacemaker start
root@node1:~# tail -f /var/log/corosync.log    # watch the log file
root@node1:~# netstat -tunlp                   # check listening ports
udp        0      0 10.11.8.192:5404        0.0.0.0:*    1431/corosync
udp        0      0 10.11.8.192:5405        0.0.0.0:*    1431/corosync
udp        0      0 226.93.2.1:5405         0.0.0.0:*    1431/corosync
Once node1 has started properly, node2 can be started:
root@node1:~# ssh node2 -- /etc/init.d/corosync start
root@node1:~# ssh node2 -- /etc/init.d/pacemaker start
Check the cluster node status:
# crm status
Last updated: Wed May 18 08:49:46 2016
Last change: Mon May 16 06:12:56 2016 via crm_attribute on node1
Stack: corosync
Current DC: node1 (168495296) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
0 Resources configured

Online: [ node1 node2 ]

# ps auxf    # check the cluster processes
root      1472  0.0  1.1 107512  9040 pts/0   S    08:32   0:00 pacemakerd
haclust+  1474  0.0  2.0 110260 15636 ?       Ss   08:32   0:00  \_ /usr/lib/pacemaker/cib
root      1475  0.0  1.2 107264  9668 ?       Ss   08:32   0:00  \_ /usr/lib/pacemaker/stonithd
root      1476  0.0  0.9  81824  6992 ?       Ss   08:32   0:00  \_ /usr/lib/pacemaker/lrmd
haclust+  1477  0.0  0.8  97688  6800 ?       Ss   08:32   0:00  \_ /usr/lib/pacemaker/attrd
haclust+  1478  0.0  2.9 110264 22136 ?       Ss   08:32   0:00  \_ /usr/lib/pacemaker/pengine
haclust+  1479  0.0  1.8 166560 14000 ?       Ss   08:32   0:00  \_ /usr/lib/pacemaker/crmd
1. Disable stonith
stonith is enabled by default, but this cluster has no stonith devices, so disable it:
# crm configure property stonith-enabled=false
PS: crm can be used in two ways: one-shot command mode (as above) and interactive mode, illustrated below.
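The same stonith property could have been set from the interactive shell (illustration only):
# crm
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# commit
crm(live)configure# quit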
View the current configuration with:
# crm configure show
node node1.magedu.com
node node2.magedu.com
property $id="cib-bootstrap-options" \
        dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false"
The output shows that stonith has been disabled.
2. Ignore the cluster status check when quorum cannot be satisfied
After one node goes offline, the cluster status becomes "WITHOUT quorum": quorum is lost, and the cluster no longer meets the conditions for normal operation. For a cluster with only two nodes this behavior is unreasonable, so configure the cluster to ignore loss of quorum:
# crm configure property no-quorum-policy=ignore
1. View resources
corosync supports resource agent classes such as heartbeat, LSB, and OCF; LSB and OCF are the most commonly used, while the stonith class is reserved for configuring stonith devices.
List the resource agent classes supported by the current cluster with:
# crm ra classes
heartbeat
lsb
ocf / heartbeat pacemaker
stonith
To list all the resource agents in a given class, use commands like:
# crm ra list lsb
# crm ra list ocf heartbeat
# crm ra info ocf:heartbeat:IPaddr
2. Add resources
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=10.11.8.200 nic=eth1 cidr_netmask=23
crm(live)configure# primitive filesystem ocf:heartbeat:Filesystem params device=10.11.8.43:/www/html directory=/var/www/html fstype=nfs
crm(live)configure# primitive httpd lsb:apache2
crm(live)configure# commit    # changes take effect only after commit
crm(live)# status
Last updated: Wed May 18 09:12:56 2016
Last change: Wed May 18 09:12:52 2016 via cibadmin on node1
Stack: corosync
Current DC: node1 (168495296) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
3 Resources configured

Online: [ node1 node2 ]

 webip      (ocf::heartbeat:IPaddr):        Started node1
 filesystem (ocf::heartbeat:Filesystem):    Started node2
 httpd      (lsb:apache2):                  Started node1
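To double-check that the resources really are where crm reports them (NIC and paths taken from the configuration above):
root@node1:~# ip addr show eth1 | grep 10.11.8.200    # the VIP is configured on node1
root@node2:~# mount | grep /var/www/html              # the NFS export is mounted on node2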
3. Resource constraints
The cluster's three resources are not running on the same node, which is not what we want. Group them so they run together:
crm(live)configure# group webservice webip filesystem httpd
crm(live)# status
Last updated: Wed May 18 09:22:48 2016
Last change: Wed May 18 09:20:48 2016 via crm_attribute on node2
Stack: corosync
Current DC: node1 (168495296) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
3 Resources configured

Online: [ node1 node2 ]

 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):        Started node1
     filesystem (ocf::heartbeat:Filesystem):    Started node1
     httpd      (lsb:apache2):                  Started node1
There are three types of resource constraints:
1) Resource Location: defines on which nodes a resource may, may not, or preferably run;
2) Resource Colocation: defines which resources may or may not run together on the same node;
3) Resource Order: defines the order in which resources are started on a node;
When defining constraints, you also need to specify scores. Scores of all kinds are an essential part of how the cluster works: the whole process, from migrating resources to deciding which resources to stop in a degraded cluster, is implemented by manipulating scores in some way. Scores are computed per resource, and any node with a negative score for a resource cannot run that resource. After computing the scores, the cluster picks the node with the highest score. INFINITY is currently defined as 1,000,000. Adding to and subtracting from infinity follows three basic rules:
1) any value + INFINITY = INFINITY
2) any value - INFINITY = -INFINITY
3) INFINITY - INFINITY = -INFINITY
When defining a resource constraint, you can also assign it a score, which represents the weight of that constraint. Constraints with higher scores are applied before those with lower scores. By creating several location constraints with different scores for a given resource, you can control the order of the nodes it fails over to, as the sketch below shows.
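For example, two location constraints with descending scores make a resource prefer node1 and fail over to node2 before any other node (hypothetical constraint names, simple-form location syntax):
# crm configure location web-loc-node1 WebSite 200: node1
# crm configure location web-loc-node2 WebSite 100: node2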
Therefore, the earlier problem of WebIP and WebSite possibly running on different nodes can be solved with the following command:
# crm configure colocation website-with-ip INFINITY: WebSite WebIP
Next, we also need to make sure that WebIP is started before WebSite on a node, which can be done with:
# crm configure order httpd-after-ip mandatory: WebIP WebSite
In addition, since an HA cluster does not itself require every node to have identical or similar performance, we may sometimes want the service to run on a more capable node whenever it is available. This can be achieved with a location constraint:
# crm configure location prefer-node1 WebSite rule 200: node1
This command constrains WebSite to node1, with a score of 200;
Finally, test failover by putting node1 into standby and checking that all resources move to node2:
crm(live)node# standby
crm(live)node# cd
crm(live)# status
Last updated: Wed May 18 09:25:24 2016
Last change: Wed May 18 09:25:20 2016 via crm_attribute on node1
Stack: corosync
Current DC: node1 (168495296) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
3 Resources configured

Node node1 (168495296): standby
Online: [ node2 ]

 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):        Started node2
     filesystem (ocf::heartbeat:Filesystem):    Started node2
     httpd      (lsb:apache2):                  Started node2
The service remains accessible after the failover.
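A quick client-side check against the virtual IP; the page content shown is hypothetical:
$ curl http://10.11.8.200/
<h1>test page from the NFS share</h1>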