Contents
1. Introduction and environment overview
2. Deploying the high-availability environment
3. Using the crmsh interface
4. Resource constraints
5. Case study
6. Summary

1. Introduction and environment overview
The previous post covered some theoretical background on high-availability technology. This post walks through installing and deploying the corosync + pacemaker HA stack and demonstrates failover with a real case. corosync provides the cluster's messaging layer, carrying heartbeat and cluster-transaction information; pacemaker works at the resource-allocation layer and acts as the cluster resource manager; and crmsh is the command-line interface used to configure resources. Before getting to the main topic, here is a quick look at common open-source HA stacks and the system environment used for this setup.
Common open-source HA stacks:
heartbeat v1 + haresources
heartbeat v2 + crm
heartbeat v3 + cluster-glue + pacemaker
corosync + cluster-glue + pacemaker
cman + rgmanager
keepalived + script
System environment used for this test:
[root@nod1 tomcat]# cat /etc/issue
CentOS release 6.4 (Final)
Kernel \r on an \m

[root@nod1 tomcat]# uname -r
2.6.32-358.el6.x86_64
Both nodes run the same operating system.
2. Deploying the high-availability environment
[root@nod1 ~]# yum -y install pacemaker corosync    # install pacemaker and corosync with yum (a working yum repository is required); note: install on both nodes
[root@nod1 ~]# rpm -ql corosync
/etc/corosync
/etc/corosync/corosync.conf.example    # template for the main configuration file
/etc/corosync/corosync.conf.example.udpu
/etc/corosync/service.d
/etc/corosync/uidgid.d
/etc/dbus-1/system.d/corosync-signals.conf
/etc/rc.d/init.d/corosync
/etc/rc.d/init.d/corosync-notifyd
/etc/sysconfig/corosync-notifyd
/usr/bin/corosync-blackbox
/usr/libexec/lcrso
/usr/libexec/lcrso/coroparse.lcrso
/usr/libexec/lcrso/objdb.lcrso
/usr/libexec/lcrso/quorum_testquorum.lcrso
/usr/libexec/lcrso/quorum_votequorum.lcrso
/usr/libexec/lcrso/service_cfg.lcrso
/usr/libexec/lcrso/service_confdb.lcrso
/usr/libexec/lcrso/service_cpg.lcrso
/usr/libexec/lcrso/service_evs.lcrso
/usr/libexec/lcrso/service_pload.lcrso
/usr/libexec/lcrso/vsf_quorum.lcrso
/usr/libexec/lcrso/vsf_ykd.lcrso
/usr/sbin/corosync
/usr/sbin/corosync-cfgtool
/usr/sbin/corosync-cpgtool
/usr/sbin/corosync-fplay
/usr/sbin/corosync-keygen    # generates the authkey for corosync; it draws on the kernel entropy pool, so if there is not enough entropy the command will appear to hang until enough randomness (e.g. from keyboard input) has accumulated to produce the authkey file
/usr/sbin/corosync-notifyd
/usr/sbin/corosync-objctl
/usr/sbin/corosync-pload
/usr/sbin/corosync-quorumtool
/usr/share/doc/corosync-1.4.7
/usr/share/doc/corosync-1.4.7/LICENSE
/usr/share/doc/corosync-1.4.7/SECURITY
/usr/share/man/man5/corosync.conf.5.gz
/usr/share/man/man8/confdb_keys.8.gz
/usr/share/man/man8/corosync-blackbox.8.gz
/usr/share/man/man8/corosync-cfgtool.8.gz
/usr/share/man/man8/corosync-cpgtool.8.gz
/usr/share/man/man8/corosync-fplay.8.gz
/usr/share/man/man8/corosync-keygen.8.gz
/usr/share/man/man8/corosync-notifyd.8.gz
/usr/share/man/man8/corosync-objctl.8.gz
/usr/share/man/man8/corosync-pload.8.gz
/usr/share/man/man8/corosync-quorumtool.8.gz
/usr/share/man/man8/corosync.8.gz
/usr/share/man/man8/corosync_overview.8.gz
/usr/share/snmp/mibs/COROSYNC-MIB.txt
/var/lib/corosync
/var/log/cluster
Generate the authentication key shared by the cluster nodes:
[root@nod1 ~]# corosync-keygen    # generate the authentication key
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 80).
# when the entropy pool runs low the command hangs here; you can open another terminal and continue with the rest of the configuration in the meantime
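If the machine simply cannot gather enough entropy (common on idle virtual machines), one crude workaround sometimes used on test systems is to temporarily back /dev/random with the non-blocking /dev/urandom while the key is generated. This is only a sketch for lab environments; it weakens the randomness of the key, so do not do it on production clusters:

[root@nod1 ~]# mv /dev/random /dev/random.orig    # keep the real device aside
[root@nod1 ~]# ln -s /dev/urandom /dev/random     # let corosync-keygen read from the non-blocking pool
[root@nod1 ~]# corosync-keygen
[root@nod1 ~]# rm -f /dev/random                  # restore the original device afterwards
[root@nod1 ~]# mv /dev/random.orig /dev/random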
Provide the corosync configuration file, created from the shipped template:
[root@nod1 ~]# cd /etc/corosync
[root@nod1 corosync]# cp corosync.conf.example corosync.conf
[root@nod1 corosync]# ls
corosync.conf  corosync.conf.example  corosync.conf.example.udpu  service.d  uidgid.d
[root@nod1 corosync]# vim corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank    # stay compatible with whitetank, i.e. releases older than corosync 0.8

totem {    # defines how the corosync instances in the cluster communicate with each other
    version: 2

    # secauth: Enable mutual node authentication. If you choose to
    # enable this ("on"), then do remember to create a shared
    # secret with "corosync-keygen".
    #secauth: off
    secauth: on    # authenticate the nodes against each other using the authkey

    threads: 0    # number of worker threads; 0 disables threading, the default is fine

    # interface: define at least one interface to communicate
    # over. If you define more than one interface stanza, you must
    # also set rrp_mode.
    interface {    # defines which interface carries the heartbeat and cluster-transaction traffic
        # Rings must be consecutively numbered, starting at 0.
        ringnumber: 0    # ring number of this interface; with a single ring keep the default 0

        # This is normally the *network* address of the
        # interface to bind to. This ensures that you can use
        # identical instances of this configuration file
        # across all your cluster nodes, without having to
        # modify this option.
        bindnetaddr: 192.168.0.0    # network address to bind to

        # However, if you have multiple physical network
        # interfaces configured for the same subnet, then the
        # network address alone is not sufficient to identify
        # the interface Corosync should bind to. In that case,
        # configure the *host* address of the interface
        # instead:
        # bindnetaddr: 192.168.1.1

        # When selecting a multicast address, consider RFC
        # 2365 (which, among other things, specifies that
        # 239.255.x.x addresses are left to the discretion of
        # the network administrator). Do not reuse multicast
        # addresses across multiple Corosync clusters sharing
        # the same network.
        mcastaddr: 239.255.21.111    # multicast address to listen on; do not keep the default

        # Corosync uses the port you specify here for UDP
        # messaging, and also the immediately preceding
        # port. Thus if you set this to 5405, Corosync sends
        # messages over UDP ports 5405 and 5404.
        mcastport: 5405    # port used for corosync messaging; the default is fine

        # Time-to-live for cluster communication packets. The
        # number of hops (routers) that this ring will allow
        # itself to pass. Note that multicast routing must be
        # specifically enabled on most network routers.
        ttl: 1    # packet TTL; keep the default
    }
}

logging {
    # Log the source file and line where messages are being
    # generated. When in doubt, leave off. Potentially useful for
    # debugging.
    fileline: off

    # Log to standard error. When in doubt, set to no. Useful when
    # running in the foreground (when invoking "corosync -f")
    to_stderr: no

    # Log to a log file. When set to "no", the "logfile" option
    # must not be set.
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log

    # Log to the system log daemon. When in doubt, set to yes.
    to_syslog: no    # do not also send the log to syslog

    # Log debug messages (very verbose). When in doubt, leave off.
    debug: off

    # Log messages with time stamps. When in doubt, set to on
    # (unless you are only logging to syslog, where double
    # timestamps can be annoying).
    timestamp: on    # timestamp every log line; costs a little extra CPU

    logger_subsys {
        subsys: AMF
        debug: off
    }
}

# the following sections are newly added:
service {
    ver: 0
    name: pacemaker    # start pacemaker as a corosync plugin
}

aisexec {    # user and group used for the OpenAIS APIs; root is the default, so this section may be omitted
    user: root
    group: root
}
Once corosync-keygen has finished successfully, the authkey file appears under /etc/corosync/:
[root@nod1 corosync]# ls
authkey  corosync.conf  corosync.conf.example  corosync.conf.example.udpu  service.d  uidgid.d
[root@nod1 corosync]# scp authkey corosync.conf nod2.test.com:/etc/corosync/    # copy the key and the configuration file to the other node
[root@nod1 corosync]# service corosync start    # start the service; do not forget to start corosync on the other node as well
Verify that corosync started correctly; in a cluster this should be checked on every node.
Check that the cluster engine started and the configuration file was loaded:
[root@nod1 corosync]# grep -e "Corosync Cluster Engine" /var/log/cluster/corosync.log    # check whether the corosync cluster engine started
Jul 19 21:45:48 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
[root@nod1 corosync]# grep -e "configuration file" /var/log/cluster/corosync.log    # check whether the configuration file was loaded successfully
Jul 19 21:45:48 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Check that the TOTEM interface defined above came up:
[root@nod1 corosync]# grep "TOTEM" /var/log/cluster/corosync.log
Jul 19 21:45:48 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul 19 21:45:48 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 19 21:45:48 corosync [TOTEM ] The network interface [192.168.0.201] is now up.
Jul 19 21:45:48 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check for errors during startup:
[root@nod1 corosync]# grep "ERROR" /var/log/cluster/corosync.log
Jul 19 21:45:48 corosync [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Jul 19 21:45:48 corosync [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
# these errors can be ignored; they simply warn that running pacemaker as a corosync plugin is deprecated and will not be supported in later releases
Check that pacemaker started correctly:
[root@nod1 corosync]# grep "pcmk_startup" /var/log/cluster/corosync.log
Jul 19 21:45:48 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Jul 19 21:45:48 corosync [pcmk ] Logging: Initialized pcmk_startup
Jul 19 21:45:48 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Jul 19 21:45:48 corosync [pcmk ] info: pcmk_startup: Service: 9
Jul 19 21:45:48 corosync [pcmk ] info: pcmk_startup: Local hostname: nod1.test.com
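Besides grepping the log, the ring and membership status can also be queried directly with the tools shipped in the corosync package (listed in the rpm -ql output above); a quick optional check, whose exact output will differ on your machines:

[root@nod1 corosync]# corosync-cfgtool -s          # show the status of the local totem ring(s)
[root@nod1 corosync]# corosync-objctl | grep member    # dump the object database and filter for the current members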
3. Using the crmsh interface
Pacemaker has two configuration front ends, crmsh and pcs; crmsh is used here.
crmsh depends on the pssh package, so both packages must be installed on each cluster node; they can be downloaded from http://crmsh.github.io/
[root@nod1 ~]# ls
crmsh-2.1-1.6.x86_64.rpm  pssh-2.3.1-2.el6.x86_64.rpm
[root@nod1 ~]# yum install crmsh-2.1-1.6.x86_64.rpm pssh-2.3.1-2.el6.x86_64.rpm
The crm command provided by crmsh works in two modes: in command mode a single command is executed and its result is written to the shell's standard output; in interactive mode you work inside the crm shell. Plenty of examples of both follow.
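For instance, the same status query can be issued either way; a minimal illustration:

[root@nod1 ~]# crm status    # command mode: the subcommand is given on the command line and the result is printed directly
[root@nod1 ~]# crm           # interactive mode: enter the crm shell first
crm(live)# status
crm(live)# quit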
Using the crm command:
[root@nod1 ~]# crm    # running crm with no arguments enters interactive mode
crm(live)#
crm(live)# help       # show the help to see which subcommands crm supports
Commonly used crmsh subcommands:
status: show the cluster status
configure: configure the cluster
node: manage node state
ra: work with resource agents
resource: manage resources, e.g. stop a resource or clean up its recorded state (such as failure messages)
First, look at the cluster status:
[root@nod1 ~]# crm
crm(live)# status
Last updated: Tue Jul 21 21:21:35 2015
Last change: Sun Jul 19 23:01:34 2015
Stack: classic openais (with plugin)    # pacemaker is driven by corosync/openais through the plugin mechanism
Current DC: nod1.test.com - partition with quorum
    # DC (Designated Coordinator) is the node that coordinates cluster transactions; here nod1.test.com is the DC,
    # and "partition with quorum" means the current partition holds the required number of votes
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes    # two nodes are configured and two votes are expected
0 Resources configured                  # no cluster resources have been configured yet
Online: [ nod1.test.com nod2.test.com ]    # both nodes are online
Look at the cluster's default configuration:
[root@nod1 ~]# crm
crm(live)# configure
crm(live)configure# show    # "show" prints the current cluster configuration; "show xml" prints it in XML format
node nod1.test.com
node nod2.test.com
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=true \
    no-quorum-policy=stop \
    last-lrm-refresh=1436887216
crm(live)configure# verify    # "verify" checks the configuration for errors
 error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
 error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
 error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
# these errors mean that no STONITH device has been defined, which a corosync+pacemaker cluster does not allow by default;
# the check can be disabled instead, as shown below
Use the property subcommand to set cluster-wide properties:
[root@nod1 ~]# crm
crm(live)configure# property    # crmsh supports tab completion; type "property" and press Tab twice to list the configurable parameters
batch-limit=                 maintenance-mode=         remove-after-stop=
cluster-delay=               migration-limit=          shutdown-escalation=
cluster-recheck-interval=    no-quorum-policy=         start-failure-is-fatal=
crmd-transition-delay=       node-action-limit=        startup-fencing=
dc-deadtime=                 node-health-green=        stonith-action=
default-action-timeout=      node-health-red=          stonith-enabled=
default-resource-stickiness= node-health-strategy=     stonith-timeout=
election-timeout=            node-health-yellow=       stop-all-resources=
enable-acl=                  pe-error-series-max=      stop-orphan-actions=
enable-startup-probes=       pe-input-series-max=      stop-orphan-resources=
is-managed-default=          pe-warn-series-max=       symmetric-cluster=
load-threshold=              placement-strategy=
crm(live)configure# property stonith-enabled=false    # disable STONITH support; otherwise a STONITH device would have to be defined before the cluster can be used
crm(live)configure# show
node nod1.test.com
node nod2.test.com
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false \    # now set to false
    no-quorum-policy=stop \
    last-lrm-refresh=1436887216
crm(live)configure# verify    # verification no longer reports errors
crm(live)configure# commit    # commit the configuration
Configuring cluster resources
Detailed information about a resource type is obtained through the ra (resource agent) subcommand. For example, to define a virtual IP resource:
[root@nod1 ~]# crm
crm(live)# ra
crm(live)ra# classes    # list the resource agent classes
lsb
ocf / heartbeat pacemaker
service
stonith
crm(live)ra# list ocf    # list the resource agents in the ocf class; IPaddr, the agent that manages an IP address, is among them
CTDB              ClusterMon        Delay             Dummy             Filesystem        HealthCPU
HealthSMART       IPaddr            IPaddr2           IPsrcaddr         LVM               MailTo
Route             SendArp           Squid             Stateful          SysInfo           SystemHealth
VirtualDomain     Xinetd            apache            conntrackd        controld          db2
dhcpd             ethmonitor        exportfs          iSCSILogicalUnit  mysql             named
nfsnotify         nfsserver         pgsql             ping              pingd             postfix
remote            rsyncd            symlink           tomcat
crm(live)ra# meta ocf:IPaddr    # "meta" prints the detailed description of a resource agent, i.e. its usage/help text
Primitive resources are defined with the primitive command:
[root@nod1 ~]# crm
crm(live)# configure
crm(live)configure# primitive webip ocf:IPaddr params ip=192.168.0.100
crm(live)configure# verify
crm(live)configure# commit    # as soon as the commit succeeds the resource becomes active
crm(live)configure# cd ..
crm(live)# status
Last updated: Tue Jul 21 22:14:43 2015
Last change: Tue Jul 21 22:12:44 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 webip(ocf::heartbeat:IPaddr):Started nod1.test.com    # the resource we defined is now running on nod1.test.com
[root@nod1 ~]# ip add show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 00:0c:29:07:89:fe brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.201/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.100/24 brd 192.168.0.255 scope global secondary eth0
    inet6 fe80::20c:29ff:fe07:89fe/64 scope link
       valid_lft forever preferred_lft forever
# the IP we defined is now configured on eth0
Define the nginx service resource:
[root@nod1 ~]# crm
crm(live)# configure
crm(live)configure# primitive nginx lsb:nginx    # the nginx service is handled by a resource agent of the lsb class; the first "nginx" after primitive is the name given to the cluster resource
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# status
Last updated: Tue Jul 21 22:25:00 2015
Last change: Tue Jul 21 22:24:58 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 webip(ocf::heartbeat:IPaddr):Started nod1.test.com
 nginx(lsb:nginx):Started nod2.test.com
# the nginx resource started on nod2.test.com, which confirms that by default the cluster spreads resources across the nodes;
# in practice, however, we want webip and nginx to run on the same node
To keep several resources on the same node, either put them in a group or define a colocation constraint:
[root@nod1 ~]# crm
crm(live)# configure
crm(live)configure# group webservice webip nginx
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# status
Last updated: Tue Jul 21 22:30:19 2015
Last change: Tue Jul 21 22:30:17 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod1.test.com
     nginx(lsb:nginx):Started nod1.test.com
# both resources are now running together on nod1.test.com
Next, verify that the resources can move to the other node:
[root@nod1 ~]# crm node standby    # put the current node into standby
[root@nod1 ~]# crm status
Last updated: Tue Jul 21 22:37:14 2015
Last change: Tue Jul 21 22:37:09 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Node nod1.test.com: standby
Online: [ nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod2.test.com
     nginx(lsb:nginx):Started nod2.test.com
# the resources in the webservice group have moved to nod2.test.com
Bring nod1.test.com back online and see whether the resources move back:
[root@nod1 ~]# crm node online    # bring the current node back online
You have new mail in /var/spool/mail/root
[root@nod1 ~]# crm status
Last updated: Tue Jul 21 22:38:37 2015
Last change: Tue Jul 21 22:38:33 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod2.test.com
     nginx(lsb:nginx):Started nod2.test.com
# the webservice group did not move back to nod1.test.com because no node preference has been defined for the group
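If the group should always move back to nod1.test.com whenever that node is available, a node preference could be given to it; a minimal sketch (the constraint name and the score of 100 are arbitrary choices, not part of this walkthrough):

crm(live)configure# location webservice_prefer_nod1 webservice 100: nod1.test.com    # the group prefers nod1.test.com with a score of 100
crm(live)configure# verify
crm(live)configure# commit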
If the corosync service on nod2.test.com is stopped now, will the resources of the webservice group move to nod1.test.com? Let's test:
[root@nod2 ~]# service corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:. [  OK  ]
You have new mail in /var/spool/mail/root

Check the current cluster status on nod1.test.com:

[root@nod1 ~]# crm status
Last updated: Tue Jul 21 22:43:27 2015
Last change: Tue Jul 21 22:38:33 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com ]
OFFLINE: [ nod2.test.com ]
The output shows that the resources did not move over. Why not? Look closely at "Current DC: nod1.test.com - partition WITHOUT quorum": the current partition does not hold a quorum, so the node will not run resources and nothing is moved. There is more than one way to solve this: add a ping node, add a quorum disk, use an odd number of cluster nodes, or simply tell the cluster to ignore the loss of quorum. The fourth option is the simplest and is done as follows:
[root@nod2 ~]# service corosync start    # first start corosync on nod2.test.com again
Starting Corosync Cluster Engine (corosync): [  OK  ]

[root@nod1 ~]# crm
crm(live)# status
Last updated: Tue Jul 21 22:50:08 2015
Last change: Tue Jul 21 22:38:33 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod1.test.com
     nginx(lsb:nginx):Started nod1.test.com
crm(live)configure# property no    # press Tab twice to list the parameters starting with "no"
no-quorum-policy=     node-health-green=    node-health-strategy=
node-action-limit=    node-health-red=      node-health-yellow=
crm(live)configure# property no-quorum-policy=    # type "no-quorum-policy=" and press Tab twice to show its help text
no-quorum-policy (enum, [stop]): What to do when the cluster does not have quorum
    What to do when the cluster does not have quorum
    Allowed values: stop, freeze, ignore, suicide
crm(live)configure# property no-quorum-policy=ignore    # set it to "ignore"
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show    # show the current configuration
node nod1.test.com \
    attributes standby=off
node nod2.test.com
primitive nginx lsb:nginx
primitive webip IPaddr \
    params ip=192.168.0.100
group webservice webip nginx
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    last-lrm-refresh=1436887216
crm(live)# status
Last updated: Tue Jul 21 22:54:00 2015
Last change: Tue Jul 21 22:51:10 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod1.test.com
     nginx(lsb:nginx):Started nod1.test.com
# the resources are currently running on nod1.test.com
Stop corosync on nod1.test.com and check whether the resources move to nod2.test.com:
[root@nod1 ~]# service corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:. [  OK  ]

[root@nod2 ~]# crm    # open the crm management interface on nod2.test.com
crm(live)# status
Last updated: Tue Jul 21 22:56:52 2015
Last change: Tue Jul 21 22:52:25 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod2.test.com ]
OFFLINE: [ nod1.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod2.test.com
     nginx(lsb:nginx):Started nod2.test.com
# the resources moved to nod2.test.com successfully; in a two-node HA setup, therefore, set "no-quorum-policy=ignore"
# so that a node holding no more than half of the votes still keeps running the resources
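For reference, the first option mentioned above, a ping node, would look roughly like the sketch below: each node pings an external address (the gateway 192.168.0.1 is only an assumed example) and a location rule keeps the group away from nodes that lose connectivity. This is just a sketch and is not applied in this two-node walkthrough:

crm(live)configure# primitive p_ping ocf:pacemaker:ping params host_list="192.168.0.1" multiplier=100 op monitor interval=30s
crm(live)configure# clone cl_ping p_ping    # run the ping resource on every node
crm(live)configure# location l_ping webservice rule -inf: not_defined pingd or pingd lte 0    # keep the group off nodes that cannot reach the ping target
crm(live)configure# commit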
What if the nginx process on nod2.test.com is killed: will the cluster move the resources to nod1.test.com? Let's test:
[root@nod1 ~]# service corosync start    # first start corosync on nod1.test.com again
Starting Corosync Cluster Engine (corosync): [  OK  ]
[root@nod1 ~]# crm status
Last updated: Wed Jul 22 22:22:56 2015
Last change: Wed Jul 22 22:19:55 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod2.test.com
     nginx(lsb:nginx):Started nod2.test.com

Now switch to nod2.test.com and kill the nginx process:

[root@nod2 ~]# pgrep nginx
1798
1799
[root@nod2 ~]# killall nginx    # kill the nginx processes
[root@nod2 ~]# pgrep nginx      # verify nginx is gone; no output means the process no longer exists
[root@nod2 ~]# crm status
Last updated: Wed Jul 22 22:26:09 2015
Last change: Wed Jul 22 22:19:55 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod2.test.com
     nginx(lsb:nginx):Started nod2.test.com
The status above shows that the resources stayed on nod2.test.com, which is unacceptable in production. The cluster needs to monitor the resources we define: if a resource disappears it should first try to restart it, and if that fails it should move the resource to another node. The following shows how to define resource monitoring.
[root@nod2 ~]# service nginx start    # start the nginx that was killed above
Starting nginx: [  OK  ]
Monitoring is declared together with the resource itself in the primitive command, so first delete the previously defined resources and then recreate them:
[root@nod1 ~]# crm
crm(live)# resource
crm(live)resource# show
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started
     nginx(lsb:nginx):Started
# the resource subcommand shows the state of the configured resources; both are currently started
crm(live)resource# stop webservice    # stop every resource in the webservice group; a resource must be in the Stopped state before it can be deleted
crm(live)resource# show
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Stopped
     nginx(lsb:nginx):Stopped
crm(live)resource# cd ..
crm(live)# configure
crm(live)configure# edit    # "edit" opens the resource configuration in vi, as shown below
node nod1.test.com \
    attributes standby=on
node nod2.test.com
primitive nginx lsb:nginx        # resource we defined, to be removed
primitive webip IPaddr \         # resource we defined, to be removed
    params ip=192.168.0.100
group webservice webip nginx \   # resource we defined, to be removed
    meta target-role=Stopped
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    last-lrm-refresh=1436887216
#vim:set syntax=pcmk
Delete our resource definitions in the editor that opens, then save and quit; what remains looks like this:
node nod1.test.com \
    attributes standby=on
node nod2.test.com
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    last-lrm-refresh=1436887216
#vim:set syntax=pcmk
crm(live)configure# verify    # check the syntax
crm(live)configure# commit    # commit the configuration
crm(live)resource# cd         # back to the top level
crm(live)# status             # check the cluster status
Last updated: Wed Jul 22 21:33:07 2015
Last change: Wed Jul 22 21:31:45 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ nod1.test.com nod2.test.com ]
The status output confirms that our resources are gone; now redefine them with monitoring:
crm(live)configure# primitive webip ocf:IPaddr params ip=192.168.0.100 op monitor timeout=20s interval=60s
crm(live)configure# primitive webserver lsb:nginx op monitor timeout=20s interval=60s
crm(live)configure# group webservice webip webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Wed Jul 22 22:29:59 2015
Last change: Wed Jul 22 22:28:01 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod1.test.com
     webserver(lsb:nginx):Started nod1.test.com
The monitored resources are now in place; the meaning of the monitor parameters used above can be looked up with a command such as "crm(live)ra# meta ocf:IPaddr". Now kill nginx on nod1.test.com again and watch what happens:
[root@nod1 ~]# pgrep nginx
3056
3063
[root@nod1 ~]# killall nginx
[root@nod1 ~]# pgrep nginx
[root@nod1 ~]# pgrep nginx
[root@nod1 ~]# pgrep nginx    # after a few dozen seconds nginx has been started again
3337
3338
Look at the cluster status again:
[root@nod1 ~]# crm status
Last updated: Wed Jul 22 22:33:29 2015
Last change: Wed Jul 22 22:28:01 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod1.test.com
     webserver(lsb:nginx):Started nod1.test.com
Failed actions:
    webserver_monitor_60000 on nod1.test.com 'not running' (7): call=23, status=complete, last-rc-change='Wed Jul 22 22:32:02 2015', queued=0ms, exec=0ms
# the failed action reports that the webserver resource was found not running
What happens if nginx is killed and then cannot be started at all? To test this, kill nginx and immediately break its configuration file by appending a junk line, so that the syntax check fails and nginx can no longer start:
[root@nod1 ~]# killall nginx
[root@nod1 ~]# echo "test" >> /etc/nginx/nginx.conf
[root@nod1 ~]# nginx -t
nginx: [emerg] unexpected end of file, expecting ";" or "}" in /etc/nginx/nginx.conf:44
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@nod1 ~]# crm status
Last updated: Wed Jul 22 22:37:42 2015
Last change: Wed Jul 22 22:28:01 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod2.test.com    # the resources have been moved to nod2.test.com
     webserver(lsb:nginx):Started nod2.test.com
Failed actions:
    webserver_start_0 on nod1.test.com 'unknown error' (1): call=30, status=complete, last-rc-change='Wed Jul 22 22:37:02 2015', queued=0ms, exec=70ms
# an 'unknown error' is also reported for the failed start attempt on nod1.test.com
The two tests above show that the cluster monitors its resources: when a resource fails it first tries to restart it, and moves it to another node only if the restart does not succeed. When the test is finished, remember to restore the nginx configuration on nod1.test.com.
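Assuming the appended "test" line was the only change made during the test, the cleanup could be as simple as the following (adjust to whatever you actually appended):

[root@nod1 ~]# sed -i '$d' /etc/nginx/nginx.conf    # drop the last line, i.e. the junk appended for the test
[root@nod1 ~]# nginx -t                             # confirm the configuration passes the syntax check again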
4. Resource constraints
Resource constraints express where we want a resource to run, or which resources should stay together, without using a group.
Continuing the experiment above, suppose webip and webserver should always stay together, but without the webservice group; proceed as follows:
[root@nod1 ~]# crm
crm(live)# status
Last updated: Wed Jul 22 22:46:26 2015
Last change: Wed Jul 22 22:28:01 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod2.test.com
     webserver(lsb:nginx):Started nod2.test.com
Failed actions:
    webserver_start_0 on nod1.test.com 'unknown error' (1): call=30, status=complete, last-rc-change='Wed Jul 22 22:37:02 2015', queued=0ms, exec=70ms
First clean up the failure records shown above:
[root@nod1 ~]# crm
crm(live)# resource
crm(live)resource# cleanup webserver    # clean up the recorded state/failures of the resource
Cleaning up webserver on nod1.test.com
Cleaning up webserver on nod2.test.com
Waiting for 2 replies from the CRMd.. OK
crm(live)resource# cd
crm(live)# status
Last updated: Wed Jul 22 22:47:53 2015
Last change: Wed Jul 22 22:47:47 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod2.test.com
     webserver(lsb:nginx):Started nod2.test.com
Next, delete the webservice group:
[root@nod1 ~]# crm
crm(live)# resource
crm(live)resource# status
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started
     webserver(lsb:nginx):Started
crm(live)configure# delete webservice    # delete the group
crm(live)configure# verify
crm(live)configure# commit
crm(live)# status
Last updated: Wed Jul 22 23:00:13 2015
Last change: Wed Jul 22 23:00:09 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 webip(ocf::heartbeat:IPaddr):Started nod1.test.com    # with the group gone, the cluster spreads the two resources across the nodes again
 webserver(lsb:nginx):Started nod2.test.com            # webserver is running on nod2.test.com
4.1 Defining a colocation constraint
A colocation constraint defines whether two resources run together:
[root@nod1 ~]# crm
crm(live)# configure
crm(live)configure# help colocation    # view the help for colocation
crm(live)configure# colocation webserver_with_webip inf: webserver webip    # the score for webserver being together with webip is positive infinity, i.e. the two resources must run on the same node
crm(live)configure# show xml    # inspect the constraint we defined
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# status
Last updated: Wed Jul 22 23:09:11 2015
Last change: Wed Jul 22 23:09:08 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 webip(ocf::heartbeat:IPaddr):Started nod1.test.com    # both resources are running on nod1.test.com again
 webserver(lsb:nginx):Started nod1.test.com
4.2 Defining an order constraint
An order constraint makes resources start in a given sequence; stopping happens in the reverse order:
[root@nod1 ~]# crm
crm(live)configure# help order    # view the help
crm(live)configure# order webip_before_webserver mandatory: webip webserver    # webip must start before webserver; see the help for details
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show xml    # inspect the resulting definition
4.3 Defining a location constraint
A location constraint expresses which node a resource prefers to run on.
[root@nod1 ~]# crm
crm(live)# status
Last updated: Wed Jul 22 23:20:08 2015
Last change: Wed Jul 22 23:15:39 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 webip(ocf::heartbeat:IPaddr):Started nod1.test.com    # at this point the resources are running on nod1.test.com
 webserver(lsb:nginx):Started nod1.test.com
Define a location constraint so that the resource prefers nod2.test.com:
[root@nod1 ~]# crm
crm(live)# configure
crm(live)configure# help location    # view the help
crm(live)configure# location webip_on_nod2 webip inf: nod2.test.com    # webip's preference for nod2.test.com is positive infinity
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Wed Jul 22 23:23:21 2015
Last change: Wed Jul 22 23:22:50 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 webip(ocf::heartbeat:IPaddr):Started nod2.test.com
 webserver(lsb:nginx):Started nod2.test.com
Both webip and webserver moved to nod2.test.com, although no location constraint was defined for webserver. Why? Because the colocation constraint defined earlier ties the two resources together with a score of inf (positive infinity): wherever webip runs, webserver follows.
A location constraint can also be written in a rule-based form:
[root@nod1 ~]# crm
crm(live)configure# delete webip_on_nod2    # first delete the location constraint defined above
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# location webip_on_nod1 webip rule inf: #uname eq nod1.test.com    # webip's preference for the host whose name is nod1.test.com is positive infinity
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Wed Jul 22 23:33:38 2015
Last change: Wed Jul 22 23:33:18 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 webip(ocf::heartbeat:IPaddr):Started nod1.test.com
 webserver(lsb:nginx):Started nod1.test.com
# both resources have moved back to nod1.test.com
Now define one more location constraint:
crm(live)configure# location webserver_not_on_nod1 webserver rule -inf: #uname eq nod1.test.com    # the score for webserver running on nod1 is negative infinity, i.e. webserver must never run on nod1.test.com
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Wed Jul 22 23:41:25 2015
Last change: Wed Jul 22 23:41:19 2015
Stack: classic openais (with plugin)
Current DC: nod2.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 webip(ocf::heartbeat:IPaddr):Started nod2.test.com
 webserver(lsb:nginx):Started nod2.test.com
webip and webserver both moved from nod1.test.com to nod2.test.com. Why? webserver is forbidden on nod1 with a score of negative infinity, while webip prefers nod1.test.com with positive infinity, and the colocation constraint ties the two together. What does "inf + (-inf)" evaluate to? The answer is -inf, so the resources will never run on nod1.test.com.
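If you want to see the placement scores the policy engine actually computes, pacemaker ships a tool for that; a quick optional check against the live cluster:

[root@nod1 ~]# crm_simulate -sL    # -L reads the live CIB, -s prints the allocation scores per resource and node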
5. Case study
A typical HA cluster contains three kinds of resources: a virtual IP, a service, and shared storage. Let's add shared storage and go through the HA setup again with all three; because a new resource joins, the constraints change as well, so first delete the IP and service resources defined above. How to delete cluster resources was covered earlier and is not repeated in detail; a brief recap follows below.
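As a recap, the cleanup amounts to stopping the resources and deleting their definitions and constraints; a sketch using the names from the previous section:

crm(live)resource# stop webip
crm(live)resource# stop webserver
crm(live)# configure
crm(live)configure# delete webip_on_nod1 webserver_not_on_nod1 webserver_with_webip webip_before_webserver    # drop the constraints
crm(live)configure# delete webip webserver    # drop the resources themselves
crm(live)configure# commit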
With the resources removed, the cluster is clean again:
[root@nod1 ~]# crm
crm(live)# status
Last updated: Fri Jul 24 20:58:49 2015
Last change: Fri Jul 24 20:58:32 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ nod1.test.com nod2.test.com ]
Next, prepare the shared storage; here the node nod0.test.com exports an NFS share:
[root@nod0 ~]# yum -y install nfs-utils
[root@nod0 ~]# vim /etc/exports
/web/htdocs 192.168.0.0/24(rw)
[root@nod0 ~]# mkdir -pv /web/htdocs
[root@nod0 ~]# vim /web/htdocs/index.html
[root@nod0 ~]# service rpcbind start
Starting rpcbind:        [  OK  ]
[root@nod0 ~]# service nfs start
Starting NFS services:   [  OK  ]
Starting NFS mountd:     [  OK  ]
Starting NFS daemon:     [  OK  ]
Starting RPC idmapd:     [  OK  ]
[root@nod0 ~]# vim /etc/exports    # adjust the export to add no_root_squash
/web/htdocs 192.168.0.0/24(rw,no_root_squash)
[root@nod0 ~]# mkdir -pv /web/htdocs/
[root@nod0 ~]# echo "<h>NFS node</h>" > /web/htdocs/index.html    # this is the test page served from the share
[root@nod2 ~]# mount -t nfs 192.168.0.200:/web/htdocs /usr/share/nginx/html/    # the first NFS mount is slow, so mount it once by hand beforehand
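Before handing the mount over to the cluster, it is worth confirming from each cluster node that the export is visible; the output of this quick check should look roughly like the following:

[root@nod1 ~]# showmount -e 192.168.0.200    # list the exports offered by the NFS server
Export list for 192.168.0.200:
/web/htdocs 192.168.0.0/24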
Then start nginx on nod2.test.com and check that the test page is reachable through nod2.test.com's own IP, 192.168.0.202:
[root@nod2 ~]# service nginx start
Starting nginx: [  OK  ]
Once the test passes, stop nginx and unmount the shared storage:
[root@nod2 ~]# umount /usr/share/nginx/html/
You have new mail in /var/spool/mail/root
[root@nod2 ~]# service nginx stop
Stopping nginx: [  OK  ]
Now define the cluster resources:
[root@nod1 ~]# crm
crm(live)# configure
crm(live)configure# primitive webip ocf:IPaddr params ip=192.168.0.100 op monitor timeout=10s interval=30s
crm(live)configure# primitive webserver lsb:nginx op monitor timeout=10s interval=30s
crm(live)configure# primitive webstore ocf:Filesystem params device="192.168.0.200:/web/htdocs" directory="/usr/share/nginx/html" fstype="nfs" op monitor timeout=30s interval=60s
crm(live)configure# verify
WARNING: webip: specified timeout 10s for monitor is smaller than the advised 20s
WARNING: webserver: specified timeout 10s for monitor is smaller than the advised 15
WARNING: webstore: default timeout 20s for start is smaller than the advised 60    # the NFS filesystem resource needs an explicit start timeout; the default 20s is below the advised 60s
WARNING: webstore: default timeout 20s for stop is smaller than the advised 60     # likewise for the stop timeout
WARNING: webstore: specified timeout 30s for monitor is smaller than the advised 40
# verify printed the warnings above: the timeout values chosen for the operations are too small, so adjust them as suggested
crm(live)configure# cd ..
There are changes pending. Do you want to commit them (y/n)? n    # do not commit here; alternatively the "edit" command could be used to fix the definitions directly
crm(live)# configure    # go back into configure mode and redefine the resources
crm(live)configure# primitive webip ocf:IPaddr params ip=192.168.0.222 op monitor timeout=20s interval=30s
crm(live)configure# verify
crm(live)configure# primitive webserver lsb:nginx op monitor timeout=15s interval=30s
crm(live)configure# verify
crm(live)configure# primitive webstore ocf:Filesystem params device="192.168.0.200:/web/htdocs" directory="/usr/share/nginx/html" fstype="nfs" op monitor timeout=30s interval=60s op start timeout=60s op stop timeout=60s
crm(live)configure# verify
WARNING: webstore: specified timeout 30s for monitor is smaller than the advised 40    # one timeout is still too small
crm(live)configure# edit    # fix it directly in the editor; after the change the configuration looks like this
node nod1.test.com \
    attributes standby=off
node nod2.test.com \
    attributes standby=off
primitive webip IPaddr \
    params ip=192.168.0.222 \
    op monitor timeout=20s interval=30s
primitive webserver lsb:nginx \
    op monitor timeout=15s interval=30s
primitive webstore Filesystem \
    params device="192.168.0.200:/web/htdocs" directory="/usr/share/nginx/html" fstype=nfs \
    op monitor timeout=40s interval=60s \
    op start timeout=60s interval=0 \
    op stop timeout=60s interval=0
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    last-lrm-refresh=1437576541
#vim:set syntax=pcmk
# remember to save and quit
crm(live)configure# verify    # verification is now clean
crm(live)configure# commit    # commit the configuration
Next, define the constraints among the three resources. Think about what a cluster with a VIP, a service and shared storage actually needs. First, during normal operation all three resources should run on one node, and within that the VIP must stay with the service (nginx) and the service must stay with the shared storage; this can be expressed with colocation constraints or with a group. Second, there is a start order: the VIP should come up before the service, and the shared storage must be mounted before the service starts. Now define these:
crm(live)configure# group webservice webip webstore webserver    # define a group containing the three resources
crm(live)configure# order webip_before_webstore_before_webserver inf: webip webstore webserver    # a mandatory (inf) order constraint: webip starts first, then webstore, then webserver; stopping happens in the reverse order
crm(live)configure# verify
crm(live)configure# show xml    # inspect the resulting XML
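For comparison, the same "keep everything together" requirement could be expressed without the group, using only colocation constraints; a sketch that is not applied in this walkthrough:

crm(live)configure# colocation webserver_with_webstore inf: webserver webstore    # nginx must run where the NFS share is mounted
crm(live)configure# colocation webstore_with_webip inf: webstore webip            # the share follows the VIP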
If these three resources have no particular preference for either node, the configuration can simply be committed. This is especially common today, when HA stacks are deployed on virtualized platforms such as Xen, KVM or OpenStack and resource-to-node preferences matter far less.
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# status
Last updated: Fri Jul 24 22:25:06 2015
Last change: Fri Jul 24 22:25:02 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod1.test.com
     webstore(ocf::heartbeat:Filesystem):Started nod1.test.com
     webserver(lsb:nginx):Started nod1.test.com
# the output shows all three resources running on nod1.test.com
Now test the HTTP service by browsing to the VIP we defined; the NFS test page should be returned (screenshot omitted).
Now test whether the resources fail over correctly: put nod1.test.com into standby and check whether they move to nod2.test.com:
[root@nod1 ~]# crm node standby
[root@nod1 ~]# crm status
Last updated: Fri Jul 24 22:28:03 2015
Last change: Fri Jul 24 22:27:53 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured
Node nod1.test.com: standby
Online: [ nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod2.test.com
     webstore(ocf::heartbeat:Filesystem):Started nod2.test.com
     webserver(lsb:nginx):Started nod2.test.com
# the output shows all resources have moved to nod2.test.com
Refresh the page in the browser: it is still served correctly (screenshot omitted).
Failover works, so next check that the resource monitoring takes effect: try stopping nginx or unmounting the shared storage, and when the next monitor interval elapses the cluster should restart the service or remount the storage:
[root@nod2 ~]# service nginx stop
Stopping nginx: [  OK  ]
A short while later the cluster has noticed the failure:
[root@nod1 ~]# crm status
Last updated: Fri Jul 24 22:42:05 2015
Last change: Fri Jul 24 22:36:00 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod2.test.com
     webstore(ocf::heartbeat:Filesystem):Started nod2.test.com
     webserver(lsb:nginx):Started nod2.test.com
Failed actions:
    webserver_monitor_30000 on nod2.test.com 'not running' (7): call=41, status=complete, last-rc-change='Fri Jul 24 22:37:25 2015', queued=0ms, exec=0ms
Now test whether the shared storage is monitored and recovered as well:
[root@nod2 ~]# umount /usr/share/nginx/html/
Visiting the site now returns nginx's default page instead of the NFS test page (screenshot omitted).
Once the monitor interval elapses, the cluster detects the problem and tries to recover it:
[root@nod1 ~]# crm status
Last updated: Fri Jul 24 22:44:03 2015
Last change: Fri Jul 24 22:36:00 2015
Stack: classic openais (with plugin)
Current DC: nod1.test.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ nod1.test.com nod2.test.com ]
 Resource Group: webservice
     webip(ocf::heartbeat:IPaddr):Started nod2.test.com
     webstore(ocf::heartbeat:Filesystem):Started nod2.test.com
     webserver(lsb:nginx):Started nod2.test.com
Failed actions:
    webserver_monitor_30000 on nod2.test.com 'not running' (7): call=41, status=complete, last-rc-change='Fri Jul 24 22:37:25 2015', queued=0ms, exec=0ms
    webstore_monitor_60000 on nod2.test.com 'not running' (7): call=39, status=complete, last-rc-change='Fri Jul 24 22:42:55 2015', queued=0ms, exec=0ms
The web page is served correctly again (screenshot omitted).
This completes the demonstration of high availability with corosync + pacemaker + crmsh.
6. Summary
For a Linux operations engineer, mastering high-availability architectures is an essential skill. The theory felt hard to grasp when I first started studying HA, but after working through the experiments above I have a much better understanding of the architecture, and of the concepts covered in the previous post as well.
When building a two-node HA cluster with corosync + pacemaker, remember to set the global properties that disable STONITH and ignore the loss of quorum when a node holds no more than half of the votes:
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# property stonith-enabled=false