1. Cluster Introduction
The following architecture diagram of a corosync-based high-availability cluster is taken from SUSE's official documentation:
As the diagram shows, SUSE divides the cluster architecture into four layers. The lowest layer, the Messaging/Infrastructure Layer, carries the heartbeat messages exchanged between nodes — the heartbeat layer. The second layer, the Membership Layer, is the cluster's decision layer: it determines which nodes are members of the cluster, propagates that information to all nodes, and decides what to do when the members hold no more than half of the quorum votes. In plain terms it is the voting system, and it also provides the view used to build membership. The third layer, the Resource Allocation Layer, contains the CRM and the CIB. The CRM is the core component of this layer; the Local Resource Manager, Transition Engine and Policy Engine are all implemented on top of it, and the CRM on each node also maintains that node's CIB. One node's CRM is elected as the DC (Designated Coordinator). The DC maintains the master CIB, so all CIB modifications are made by the DC and then synchronized to the other nodes; a cluster has exactly one DC. The CIB is an in-memory, XML-formatted store of all cluster configuration entries (cluster state, nodes, resources, constraints); it can be modified through a GUI or with the crmsh command line. The Transition Engine and Policy Engine exist only on the DC: when the cluster state changes, the PE uses the constraints and node stickiness from the configuration to compute which node each resource should move to, writes the result into the CIB, and the TE then carries out the transitions the PE has decided on. The DC sends the resulting state changes to the CRM on each node, and each node's CRM hands the required changes to its LRM. The LRM performs start, stop and monitor operations; after executing an instruction from the CRM it reports the resulting status back. That brings us to the fourth layer, the Resources Layer, i.e. the resource-agent layer, which takes the instructions passed down by the LRM, finds the corresponding script and executes it. Resource agents come in LSB and OCF formats; compared with LSB scripts, OCF agents accept more parameters and also provide a monitor operation.
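For reference, the CIB described above can be dumped from a running cluster with pacemaker's cibadmin tool; the skeleton below is only an abbreviated sketch of its structure, not output captured from this setup:

[root@node1 ~]# cibadmin --query | less
<cib admin_epoch="0" epoch="..." num_updates="...">
  <configuration>
    <crm_config/>     <!-- cluster properties -->
    <nodes/>          <!-- cluster nodes -->
    <resources/>      <!-- resource definitions -->
    <constraints/>    <!-- location/colocation/order constraints -->
  </configuration>
  <status/>           <!-- runtime state maintained via the DC -->
</cib>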
HA cluster working models:
A/P: two-node cluster, active/passive; works in an active/standby model
There is usually only one HA service, but there may be multiple HA resources.
A/A: two-node cluster, active/active; works in a dual-active model
N-M: N nodes, M services; usually N > M
N-N: N nodes, N services
Common HA cluster software stacks:
heartbeat v2(v1 crm)
heartbeat v2(v2 crm)
corosync + pacemaker
cman + rgmanager
2. corosync + pacemaker
corosync + pacemaker architecture diagram (taken from Ma Ge's training documentation)
The Pacemaker stack consists of many components, the principal ones being corosync and pacemaker. corosync delivers heartbeat messages at the bottom layer but does not provide a resource manager, so pacemaker, running on top of corosync, provides resource management and — with the help of the Resource Agents and Cluster Glue — makes services and resources that are not highly available by themselves highly available.
The arrows in the diagram describe the dependencies when building from source: install corosync and Cluster Glue first, then the Resource Agents, then pacemaker, then the distributed lock manager (DLM), and finally cLVM2, GFS2 and OCFS2 (optional). cLVM2, GFS2 and OCFS2 provide cluster file systems.
corosync
OpenAIS: Application Interface Standard, an open application-interface standard
OpenAIS provides a cluster model including a cluster framework, cluster membership management, messaging and cluster monitoring, but no cluster resource management. Because it is open source, each branch contains a different set of components; the main branches are picacho, whitetank and wilson. corosync was created by splitting the core cluster functionality out of the wilson branch of OpenAIS — corosync is really a cluster management engine and was originally just a sub-component of OpenAIS. From that point on, OpenAIS split into two projects: corosync and wilson.
pacemaker
(Taken from the official pacemaker site: http://clusterlabs.org/wiki/Main_Page)
The figure above describes the four working models of corosync + pacemaker.
Preparations before configuration
The same preparations as for any HA cluster deployment ----> see here
Implementing a highly available web service ----> see here
Compiling and installing heartbeat v3 ----> see here
MySQL high availability ----> see here
ipvs + ldirectord for highly available ipvs ----> see here
3. Deploying corosync + pacemaker
Environment:
OS version: CentOS 6.7 x86_64
corosync.x86_64 0:1.4.7-5.el6
pacemaker.x86_64 0:1.1.14-8.el6_8.2
node1:192.168.0.15
node2:192.168.0.16
web server VIP:192.168.0.25
node1:
[root@node1 ~]# yum install corosync pacemaker
[root@node1 ~]# cd /etc/corosync/
[root@node1 corosync]# cp corosync.conf.example corosync.conf
[root@node1 corosync]# vim corosync.conf
## totem defines how the cluster nodes communicate with each other; totem is a protocol
## used by corosync between the nodes, and the protocol is versioned
totem {
    ## protocol version
    version: 2
    ## secure authentication on|off; generate the key with the corosync-keygen command
    secauth: on
    ## number of threads used for authentication; 0 means non-threaded operation
    threads: 0
    ## interface settings
    interface {
        ## ring number; with multiple NICs on one host this avoids heartbeat loops
        ringnumber: 0
        ## network address to bind to for multicast (this host is on 192.168.0.0/24)
        bindnetaddr: 192.168.0.0
        ## multicast address, can be set to 239.255.x.x
        mcastaddr: 239.165.17.17
        ## port the multicast address listens on
        mcastport: 5405
        ## time to live of 1
        ttl: 1
    }
}
## logging settings
logging {
    ## whether to record file and line
    fileline: off
    ## whether to send logs to standard error (the screen)
    to_stderr: no
    ## whether to write logs to a log file
    to_logfile: yes
    ## log file path
    logfile: /var/log/cluster/corosync.log
    ## whether to send logs to syslog
    to_syslog: no
    ## debugging
    debug: off
    ## whether to add timestamps
    timestamp: on
    ## whether to include log messages from the AMF subsystem
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
pacemaker can be combined with corosync in two ways: as a corosync plugin, or as a standalone service. On CentOS 6 systems pacemaker is usually run as a corosync plugin, so that corosync starts the pacemaker service; therefore the following pacemaker section has to be added to the corosync configuration file.
service {
    ver: 0
    name: pacemaker
    ## whether pacemaker should start its own mgmtd process (optional)
    use_mgmtd: yes
}
## the user and group the ais executive runs as (optional)
aisexec {
    user: root
    group: root
}
Verify that the NIC supports multicast; if MULTICAST is not listed in the flags, enable it manually.
[root@node1 corosync]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:de:67:fa brd ff:ff:ff:ff:ff:ff
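If the MULTICAST flag were missing, it could be enabled manually along the lines below (not needed in this setup; eth0 is taken from the output above):

[root@node1 corosync]# ip link set dev eth0 multicast on
[root@node1 corosync]# ip link show eth0 | grep -o MULTICAST
MULTICAST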
Create the authentication key file. If /dev/random holds fewer than 1024 bits of entropy, press keys on the keyboard until it reaches 1024.
[root@node1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
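On an idle virtual machine gathering 1024 bits this way can take a long time. One possible workaround (my addition, not part of the original procedure) is to feed the kernel entropy pool with rngd from the rng-tools package before running the key generator again:

[root@node1 corosync]# yum install rng-tools
[root@node1 corosync]# rngd -r /dev/urandom      ## use /dev/urandom as the entropy source
[root@node1 corosync]# corosync-keygen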
node2:
[root@node2 ~]# yum install corosync pacemaker
Copy node1's configuration files to node2:
[root@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/
authkey                  100%  128     0.1KB/s   00:00
corosync.conf            100% 2757     2.7KB/s   00:00
Start the services:
[root@node1 corosync]# service corosync start; ssh node2 'service corosync start'
Check whether the corosync engine started properly:
[root@node1 corosync]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Nov 21 12:36:25 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Nov 21 12:36:25 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'
Check whether the initial membership notifications were sent correctly:
[root@node1 corosync]# grep TOTEM /var/log/cluster/corosync.log
Nov 21 12:36:26 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Nov 21 12:36:26 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Check whether any errors were produced during startup:
[root@node1 corosync]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
## The errors below say that the pacemaker plugin for corosync will soon no longer be supported
## and that cman is the recommended cluster infrastructure; they can be safely ignored here.
Nov 21 12:36:26 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Nov 21 12:36:26 corosync [pcmk  ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
Nov 21 12:36:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child process mgmtd exited (pid=1859, rc=100)
Check whether pacemaker started properly:
[root@node1 corosync]# grep pcmk_startup /var/log/cluster/corosync.log
Nov 21 12:36:26 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
Nov 21 12:36:26 corosync [pcmk  ] Logging: Initialized pcmk_startup
Nov 21 12:36:26 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Nov 21 12:36:26 corosync [pcmk  ] info: pcmk_startup: Service: 9
Nov 21 12:36:26 corosync [pcmk  ] info: pcmk_startup: Local hostname: node1
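As an extra sanity check (my addition, not in the original write-up), the pacemaker child processes spawned by the plugin can be listed; the exact set may differ slightly between versions:

[root@node1 corosync]# ps auxf | grep -E "cib|crmd|pengine|lrmd|attrd|stonithd" | grep -v grep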
4. Installing crmsh
Official download: https://github.com/ClusterLabs/crmsh
Distribution rpm packages: http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/
[root@node1 ~]# vim /etc/yum.repos.d/CentOS-Base.repo
[network_ha-clustering_Stable]
name=Stable High Availability/Clustering packages (CentOS_CentOS-6)
type=rpm-md
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
gpgcheck=1
gpgkey=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6//repodata/repomd.xml.key
enabled=1
# With the repo configured you can install directly with yum; note that both crmsh and pssh must be installed
[root@node1 ~]# yum install crmsh
[root@node1 ~]# yum install pssh*
# Check the crm status
[root@node1 corosync]# crm status
Last updated: Mon Nov 21 14:50:27 2016
Last change: Mon Nov 21 14:50:26 2016 by hacluster via crmd on node1
Stack: classic openais (with plugin)
Current DC: node1 (version 1.1.14-8.el6_8.2-70404b0) - partition with quorum
2 nodes and 0 resources configured, 2 expected votes
Online: [ node1 node2 ]
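Before changing any configuration it is also worth validating the live CIB with crm_verify (a standard pacemaker tool; with stonith still enabled and no stonith resources defined it will typically report errors, which the next sections resolve):

[root@node1 corosync]# crm_verify -L -V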
5. Overview of crmsh commands
1) View the configuration
crm(live)# configure
crm(live)configure# show
node node1
node node2
property cib-bootstrap-options: \
    dc-version=1.1.14-8.el6_8.2-70404b0 \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2
2) Since there is no stonith device, disable stonith; stonith-enabled is a boolean, true or false
crm(live)configure# property stonith-enabled=false
3) Commit the configuration
crm(live)configure# commit
4) The node subcommands
crm(live)configure# cd ..    # go back up one level
crm(live)# node
crm(live)node# help
attribute      Manage attributes
clearstate     Clear node state                  # clear the node's state information
delete         Delete node                       # delete a node
fence          Fence node
maintenance    Put node into maintenance mode
online         Set node online                   # bring the node back online (standby -> online)
ready          Put node into ready mode
server         Show node hostname or server address
show           Show node                         # show all current nodes
standby        Put node into standby             # put the node into standby
status         Show nodes' status as XML
status-attr    Manage status attributes
utilization    Manage utilization attributes
cd             Navigate the level structure
help           Show help (help topics for list of topics)
ls             List levels and commands          # list the levels and subcommands available here
quit           Exit the interactive shell
up             Go back to previous level
5) The resource subcommands
crm(live)# resource
crm(live)resource# help
ban            Ban a resource from a node          # forbid a resource from running on a node
cleanup        Cleanup resource status             # clean up resource status
constraints    Show constraints affecting a resource
demote         Demote a master-slave resource      # demote a master/slave resource
failcount      Manage failcounts                   # manage failure counts
locate         Show the location of resources      # show where a resource is running
maintenance    Enable/disable per-resource maintenance mode
manage         Put a resource into managed mode
meta           Manage a meta attribute
move           Move a resource to another node
operations     Show active resource operations
param          Manage a parameter of a resource
promote        Promote a master-slave resource
refresh        Refresh CIB from the LRM status
reprobe        Probe for resources not started by the CRM
restart        Restart resources
scores         Display resource scores
secret         Manage sensitive parameters
start          Start resources
status         Show status of resources
stop           Stop resources
trace          Start RA tracing
unmanage       Put a resource into unmanaged mode
untrace        Stop RA tracing
utilization    Manage a utilization attribute
6) The resource agent (ra) subcommands
crm(live)# ra
crm(live)ra# help
classes        List classes and providers            # list resource agent classes
info           Show meta data for a RA               # show a resource agent's documentation
list           List RA for a class (and provider)    # list the resource agents within a class
providers      Show providers for a RA and a class
validate       Validate parameters for RA
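For example (my own illustration of the ra level, not from the original), the IPaddr agent used later can be inspected like this:

crm(live)ra# classes
crm(live)ra# list ocf heartbeat
crm(live)ra# info ocf:heartbeat:IPaddr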
7) The configure subcommands
node             define a cluster node                 # define a cluster node
primitive        define a resource                     # define a resource
monitor          add monitor operation to a primitive  # add a monitor operation to a primitive (e.g. timeout, action on start failure)
group            define a group                        # define a group (one or more resources scheduled together as a single unit)
clone            define a clone                        # define a clone (multiple copies can run on several nodes of the cluster)
ms               define a master-slave resource        # define a master/slave resource (only one node runs the master; the others stand by as slaves)
rsc_template     define a resource template            # define a resource template
location         a location preference                 # location constraint: which node a resource prefers (with equal scores, the higher preference wins)
colocation       colocate resources                    # colocation constraint: how strongly resources must run together
order            order resources                       # order constraint: the start order of resources on the same node
rsc_ticket       resources ticket dependency
property         set a cluster property                # set a cluster property
rsc_defaults     set resource defaults                 # set resource defaults (e.g. stickiness)
fencing_topology node fencing order                    # node fencing order
role             define role access rights             # define a role's access rights
user             define user access rights             # define a user's access rights
op_defaults      set resource operations defaults      # set default operation options
schema           set or display current CIB RNG schema
show             display CIB objects                   # display CIB objects
edit             edit CIB objects                      # edit CIB objects (in a vim-style editor)
filter           filter CIB objects                    # filter CIB objects
delete           delete CIB objects                    # delete CIB objects
default-timeouts set timeouts for operations to minimums from the meta-data
rename           rename a CIB object                   # rename a CIB object
modgroup         modify group                          # modify a resource group
refresh          refresh from CIB                      # re-read the CIB
erase            erase the CIB                         # wipe the CIB
ptest            show cluster actions if changes were committed
rsctest          test resources as currently configured
cib              CIB shadow management
cibstatus        CIB status management and editing
template         edit and import a configuration from a template
commit           commit the changes to the CIB         # write the pending changes into the CIB
verify           verify the CIB with crm_verify        # syntax-check the CIB
upgrade          upgrade the CIB to version 1.0
save             save the CIB to a file                # export the current CIB to a file (saved in the directory you were in before entering crm)
load             import the CIB from a file            # load the CIB from a file
6. Configuring pacemaker with crmsh
Configure the two-node corosync/pacemaker cluster and set two global properties:
stonith-enabled=false
no-quorum-policy=ignore
crm(live)configure# property stonith-enabled=false
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit
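The same properties can also be set non-interactively from the shell, since crm accepts its subcommands as arguments (shown only as an alternative, not part of the original steps):

[root@node1 ~]# crm configure property stonith-enabled=false
[root@node1 ~]# crm configure property no-quorum-policy=ignore
[root@node1 ~]# crm configure show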
Configuring the highly available web service
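Before defining the resources, httpd must already be installed on both nodes, stopped, and excluded from boot startup so that only the cluster controls it; a minimal sketch for each node follows (the test-page content is my assumption, in the spirit of the preparation posts linked above):

[root@node1 ~]# yum install httpd
[root@node1 ~]# echo "<h1>node1</h1>" > /var/www/html/index.html
[root@node1 ~]# service httpd stop
[root@node1 ~]# chkconfig httpd off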
crm(live)# cd configure
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.0.25 nic=eth0 cidr_netmask=24
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# primitive webserver lsb:httpd
crm(live)configure# verify
crm(live)configure# commit
Check where the resources are running:
[root@node2 ~]# crm status
Last updated: Mon Nov 21 17:11:54 2016
Last change: Mon Nov 21 17:11:08 2016 by root via cibadmin on node1
Stack: classic openais (with plugin)
Current DC: node2 (version 1.1.14-8.el6_8.2-70404b0) - partition with quorum
2 nodes and 2 resources configured, 2 expected votes
Online: [ node1 node2 ]
Full list of resources:
 webip      (ocf::heartbeat:IPaddr):    Started node1
 webserver  (lsb:httpd):                Started node2
By default an HA cluster spreads resources evenly across the nodes, so a group or constraints are needed to keep these resources on the same node:
crm(live)configure# group webservice webip webserver
crm(live)configure# verify
crm(live)configure# commit
Check again:
[root@node2 ~]# crm status
Last updated: Mon Nov 21 17:16:23 2016
Last change: Mon Nov 21 17:16:08 2016 by root via cibadmin on node1
Stack: classic openais (with plugin)
Current DC: node2 (version 1.1.14-8.el6_8.2-70404b0) - partition with quorum
2 nodes and 2 resources configured, 2 expected votes
Online: [ node1 node2 ]
Full list of resources:
 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):    Started node1
     webserver  (lsb:httpd):                Started node1
Testing:
Define the resource relationship with a colocation constraint:
crm(live)configure# delete webservice
crm(live)configure# commit
crm(live)configure# colocation webserver_with_webip inf: webserver webip
crm(live)configure# verify
crm(live)configure# commit
As you can see, the IP and the server are together again:
[root@node2 ~]# crm status
Last updated: Mon Nov 21 17:35:02 2016
Last change: Mon Nov 21 17:34:43 2016 by root via cibadmin on node1
Stack: classic openais (with plugin)
Current DC: node2 (version 1.1.14-8.el6_8.2-70404b0) - partition with quorum
2 nodes and 2 resources configured, 2 expected votes
Online: [ node1 node2 ]
Full list of resources:
 webip      (ocf::heartbeat:IPaddr):    Started node1
 webserver  (lsb:httpd):                Started node1
Define the start order with an order constraint:
crm(live)configure# order webip_before_webserver Mandatory: webip webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node node1
node node2
primitive webip IPaddr \
    params ip=192.168.0.25 nic=eth0 cidr_netmask=24
primitive webserver lsb:httpd
order webip_before_webserver Mandatory: webip webserver
colocation webserver_with_webip inf: webserver webip
property cib-bootstrap-options: \
    dc-version=1.1.14-8.el6_8.2-70404b0 \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false \
    no-quorum-policy=ignore
Define the resources' node preference with a location constraint:
crm(live)configure# location webip_on_node2 webip rule 50: #uname eq node2
crm(live)configure# show
node node1
node node2
primitive webip IPaddr \
    params ip=192.168.0.25 nic=eth0 cidr_netmask=24
primitive webserver lsb:httpd
order webip_before_webserver Mandatory: webip webserver
location webip_on_node2 webip \
    rule 50: #uname eq node2
colocation webserver_with_webip inf: webserver webip
property cib-bootstrap-options: \
    dc-version=1.1.14-8.el6_8.2-70404b0 \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false \
    no-quorum-policy=ignore
crm(live)configure# verify
crm(live)configure# commit
You can see that the resources have moved to node2:
[root@node2 ~]# crm status
Last updated: Mon Nov 21 17:49:08 2016
Last change: Mon Nov 21 17:47:58 2016 by root via cibadmin on node1
Stack: classic openais (with plugin)
Current DC: node2 (version 1.1.14-8.el6_8.2-70404b0) - partition with quorum
2 nodes and 2 resources configured, 2 expected votes
Online: [ node1 node2 ]
Full list of resources:
 webip      (ocf::heartbeat:IPaddr):    Started node2
 webserver  (lsb:httpd):                Started node2
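A simple failover test (my addition; the original only exercises the constraints) is to put the active node into standby, watch the resources move, and then bring it back:

[root@node1 ~]# crm node standby node2     ## the resources should fail over to node1
[root@node1 ~]# crm status
[root@node1 ~]# crm node online node2      ## with the location score of 50 and no stickiness yet, they should move back to node2
[root@node1 ~]# crm status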
In addition, the stickiness of resources toward the node they are currently running on can be defined in the global configuration:
crm(live)configure# property default-resource-stickiness=50
crm(live)configure# verify
crm(live)configure# commit
Note: by default, resource stickiness outweighs the resource's location preference.
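Whether stickiness or a location constraint wins is ultimately decided by the allocation scores; they can be inspected (an extra step added here for illustration) with crm or with pacemaker's crm_simulate:

[root@node1 ~]# crm resource scores        ## per-node allocation scores for each resource
[root@node1 ~]# crm_simulate -sL           ## the same information read from the live CIB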
7. Configuring resource monitoring
A corosync + pacemaker cluster provides high availability for nodes by default, but it does not monitor the state of the resources running on them. We therefore configure the cluster to monitor its resources, so that when a resource unexpectedly fails and can no longer serve, the cluster restores its availability.
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.0.25 nic=eth0 cidr_netmask=24 op monitor interval=10s timeout=20s
crm(live)configure# verify
crm(live)configure# commit
Note: the timeout must not be less than 20s, otherwise you get a warning:
crm(live)configure# verify
WARNING: webip: specified timeout 10s for monitor is smaller than the advised 20s
crm(live)configure# primitive webserver lsb:httpd op monitor interval=10s timeout=20s
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# group webservice webip webserver
crm(live)configure# verify
crm(live)configure# commit
The resources are now running on node1:
[root@node2 ~]# crm status
Last updated: Mon Nov 21 18:09:43 2016
Last change: Mon Nov 21 18:09:23 2016 by root via cibadmin on node1
Stack: classic openais (with plugin)
Current DC: node2 (version 1.1.14-8.el6_8.2-70404b0) - partition with quorum
2 nodes and 2 resources configured, 2 expected votes
Online: [ node1 node2 ]
Full list of resources:
 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):    Started node1
     webserver  (lsb:httpd):                Started node1
Now kill httpd manually on node1 to test the monitoring:
[root@node1 ~]# killall httpd
[root@node1 ~]# ps aux | grep httpd
root       5567  0.0  0.1 103304   888 pts/1    S+   18:13   0:00 grep httpd
[root@node1 ~]# ps aux | grep httpd
root       5637  0.0  0.7 175304  3760 ?        Ss   18:13   0:00 /usr/sbin/httpd
apache     5639  0.0  0.4 175304  2432 ?        S    18:13   0:00 /usr/sbin/httpd
apache     5640  0.0  0.4 175304  2432 ?        S    18:13   0:00 /usr/sbin/httpd
apache     5641  0.0  0.5 175304  2448 ?        S    18:13   0:00 /usr/sbin/httpd
apache     5642  0.0  0.4 175304  2432 ?        S    18:13   0:00 /usr/sbin/httpd
apache     5643  0.0  0.4 175304  2432 ?        S    18:13   0:00 /usr/sbin/httpd
apache     5644  0.0  0.4 175304  2432 ?        S    18:13   0:00 /usr/sbin/httpd
apache     5645  0.0  0.4 175304  2432 ?        S    18:13   0:00 /usr/sbin/httpd
apache     5646  0.0  0.4 175304  2432 ?        S    18:13   0:00 /usr/sbin/httpd
root       5654  0.0  0.1 103304   884 pts/1    S+   18:13   0:00 grep httpd
As you can see, httpd has been started again.
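After such a failure the cluster records a fail count for the resource; it can be inspected and cleared as follows (an extra illustration, not part of the original test):

[root@node1 ~]# crm resource failcount webserver show node1    ## show the recorded failures of webserver on node1
[root@node1 ~]# crm resource cleanup webserver                 ## clear the failure state once the cause is fixed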
This article was written while following Ma Ge's training videos and is meant only as my own notes for later review. Parts of the introduction were produced with the help of Google Translate; my English is not good — apologies!