corosync-2.4.0-4.el6.x86_64.rpm The Corosync Cluster Engine and Application Programming Interfaces
corosynclib-2.4.0-4.el6.x86_64.rpm The Corosync Cluster Engine Libraries
crmsh-2.2.0-7.1.x86_64.rpm
crmsh-scripts-2.2.0-7.1.x86_64.rpm
dlm-4.0.6-1.el6.x86_64.rpm
dlm-lib-4.0.6-1.el6.x86_64.rpm
libqb-1.0-1.el6.x86_64.rpm
perl-TimeDate-1.16-13.el6.noarch.rpm
python-dateutil-1.4.1-7.el6.noarch.rpm
python-parallax-1.0.1-28.1.noarch.rpm
resource-agents-3.9.5-46.el6.x86_64.rpm
pacemaker-1.1.15-11.x86_64.rpm
pacemaker-cli-1.1.15-11.x86_64.rpm
pacemaker-cluster-libs-1.1.15-11.x86_64.rpm
pacemaker-cts-1.1.15-11.x86_64.rpm
pacemaker-libs-1.1.15-11.x86_64.rpm
pacemaker-remote-1.1.15-11.x86_64.rpm
python-lxml-2.2.3-1.1.el6.x86_64.rpm
python-six-1.9.0-2.el6.noarch.rpm
HA Cluster:
Messaging and Infrastructure Layer | Heartbeat Layer — carries cluster messaging and transaction traffic
Membership Layer — cluster membership
CCM — voting system
Resource Allocation Layer — resource allocation
CRM
DC: LRM, PE, TE, CIB
Other nodes: LRM, CIB
Resource Layer — resource agents
RA
Distributed Lock Manager
The kernel dlm requires a user daemon to control membership.
The dlm lives in the kernel, while the cluster infrastructure (corosync and group management) lives in user space. The in-kernel dlm needs to adjust and recover for certain cluster events; it is dlm_controld's job to receive those events and reconfigure the kernel dlm as needed.
dlm_controld controls and configures the dlm through sysfs and configfs files, which serve as dlm-internal interfaces.
The cman init script usually starts the dlm_controld daemon.
syslog, network and corosync must already be running before DLM starts.
The DLM_CONTROLD_OPTS startup options in /etc/init.d/dlm are read from /etc/sysconfig/dlm and passed to dlm_controld. dlm members are kept under /sys/kernel/config/dlm/cluster/spaces/; whenever the dlm is opened it first reads from there, creating the entries itself if they do not exist.
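These paths can be checked directly (a minimal sketch; the lockspace names listed depend on your cluster):
# grep DLM_CONTROLD_OPTS /etc/init.d/dlm /etc/sysconfig/dlm   ## where the init script takes its options from
# ls /sys/kernel/config/dlm/cluster/spaces/                   ## lockspaces currently registered with the kernel dlm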
When dlm_controld starts, it reads the cluster configuration — the cluster name, the communication mode, and so on — and obtains the members' node_id and communication IPs from the cluster communication ring_id (in rrp mode, two communication IPs are recorded).
When a host joins, it is first checked against memb; if present, the host is checked for fencing and for having been fenced (a fence operation requires quorum, otherwise execution cannot continue); a host that has been fenced must reboot before it can join again.
/etc/rc.d/init.d/dlm_controld configures the dlm daemon according to cluster events.
When the service starts, corosync is initialized.
1. /usr/sbin/dlm_controld
Command-line options override the settings in cluster.conf. /etc/cluster/cluster.conf is not read or written directly; other cluster components load its contents into memory, and the values are accessed through the libccs library.
Configuration options for dlm and dlm_controld go into the <dlm /> section of cluster.conf, under the top-level <cluster> section.
2. /usr/sbin/dlm_stonith
3. /usr/sbin/dlm_tool
[root@vClass-QIVXM init.d]# dlm_tool -h
Usage:
dlm_tool [command] [options] [name]
Commands:
ls, status, dump, dump_config, fence_ack
log_plock, plocks
join, leave, lockdebug
Options:
  -n               Show all node information in ls
  -e 0|1           Exclusive create off/on in join, default 0
  -f 0|1           FS (filesystem) flag off/on in join, default 0
  -m <mode>        Permission mode for lockspace device (octal), default 0600
  -s               Summary following lockdebug output (experimental)
  -v               Verbose lockdebug output
  -w               Wide lockdebug output
  -h               Print help, then exit
  -V               Print program version information, then exit
4. Related commands
/sbin/restorecon — restores SELinux file attributes, i.e. the security context of files
policycoreutils-2.0.83-19.1.el6.x86_64.rpm SELinux policy core utilities
/etc/sysconfig/dlm
/var/log/messages
dlm_controld[108162]: 163350 corosync cfg init error
Pacemaker, i.e. the Cluster Resource Manager (CRM), manages the whole HA stack; clients manage and monitor the entire cluster through pacemaker.
Pacemaker works at the resource allocation layer, provides the resource-manager functionality, and uses crmsh as the command-line interface for configuring resources.
The CRM supports two resource types, ocf and lsb:
ocf-format start scripts live under /usr/lib/ocf/resource.d/.
lsb scripts normally live under /etc/rc.d/init.d/.
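To see what is actually available on a node, both directories can simply be listed (a sketch; heartbeat is one of the providers mentioned in the ra section further down):
# ls /usr/lib/ocf/resource.d/            ## OCF providers, e.g. heartbeat, pacemaker
# ls /usr/lib/ocf/resource.d/heartbeat   ## OCF resource agents from the heartbeat provider
# ls /etc/rc.d/init.d/                   ## LSB init scripts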
stonithd: fencing daemon
lrmd: local resource management daemon; provides a common interface to the supported resource types and calls the resource agents (scripts) directly.
CIB: Cluster Information Base. Contains the definitions of all cluster options, nodes, resources, their mutual relationships and current status. Updates are synchronized to all cluster nodes.
CRMD: cluster resource management daemon. Chiefly a message broker between the PEngine and the LRM; it also elects a leader (DC) that coordinates cluster activity, including starting and stopping resources.
pengine: Policy Engine. Computes the next state of the cluster from the current state and the configuration, producing a transition graph that contains a list of actions and their dependencies.
CCM: Consensus Cluster Membership, the heartbeat membership layer.
attrd: attribute daemon
The CIB represents the configuration and current state of all cluster resources in XML. Its contents are automatically synchronized across the whole cluster; the PEngine uses them to compute the ideal state of the cluster and generate an instruction list, which is delivered to the DC (Designated Coordinator). Pacemaker elects one node of the cluster as the DC, the master decision-making node; if the elected DC goes down, a new DC is quickly established on another node. The DC passes the policies produced by the PEngine to the LRMD (local resource management daemon) on the other nodes, or to their CRMD, over the cluster messaging infrastructure. When a node in the cluster goes down, the PEngine recomputes the ideal policy. In some cases it may be necessary to power off nodes in order to protect shared data or complete resource recovery; for this, Pacemaker ships with stonithd. STONITH can "shoot the other node in the head", and is usually implemented with a power switch. Pacemaker configures STONITH devices as resources stored in the CIB, so that failed resources or nodes are easier to detect.
The default voting rule recommends an odd number of cluster nodes, no fewer than 3. With only 2 nodes, once one of them breaks the default voting rule is no longer satisfied: cluster resources do not migrate and the cluster as a whole stays unusable. no-quorum-policy="ignore" works around this two-node problem, but do not use it in production. In other words, production still needs at least 3 nodes.
Pacemaker decides whether to start or stop a node's services based on the health information passed up by the messaging layer.
Resource constraints express the relationships between resources:
When defining constraints, you also specify scores. Scores of all kinds are an integral part of how the cluster works: the whole process, from migrating resources to deciding which resources to stop in a degraded cluster, is implemented by manipulating scores in some way. Scores are computed per resource, and any node whose score for a resource is negative cannot run that resource. Constraints with higher scores are applied before those with lower scores. By creating additional location constraints with different scores for a given resource, you can specify the order of target nodes the resource will fail over to.
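As an illustration (hypothetical resource myvip and nodes node1/node2, not from the original text), two location constraints with different scores order the failover targets:
# crm configure location loc-vip-node1 myvip 200: node1   ## preferred node; higher score is applied first
# crm configure location loc-vip-node2 myvip 100: node2   ## fallback node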
/etc/rc.d/init.d/pacemaker
/usr/sbin/cibadmin — direct access to the cluster configuration
/usr/sbin/crm_diff
/usr/sbin/crm_error
/usr/sbin/crm_failcount — manages the counters that record each resource's failure count
/usr/sbin/crm_mon — displays a summary of cluster status
/usr/sbin/crm_report
/usr/sbin/crm_resource
/usr/sbin/crm_shadow
/usr/sbin/crm_simulate
/usr/sbin/crm_standby
/usr/sbin/crm_ticket
/usr/sbin/crm_verify
/usr/sbin/crmadmin
/usr/sbin/iso8601
/usr/sbin/attrd_updater
/usr/sbin/crm_attribute — query, modify and delete node attributes and cluster options
/usr/sbin/crm_master
/usr/sbin/crm_node
/usr/sbin/fence_legacy
/usr/sbin/fence_pcmk
/usr/sbin/pacemakerd
/usr/sbin/stonith_admin
cibadmin
Provides direct access to the cluster configuration.
crm_mon
Monitors and displays cluster node status in real time. Requires the pacemaker service to be healthy; otherwise it waits indefinitely.
Stack: corosync
Current DC: vClass-CPLjR (version 1.1.15-11-e174ec8) - partition with quorum
Last updated: Mon Jul 24 16:52:50 2017 Last change: Mon Jul 24 16:48:53 2017 by hacluster via crmd on vClass-2lgAr
2 nodes and 0 resources configured
Online: [ vClass-2lgAr vClass-CPLjR ]
No active resources
If the state is unhealthy, "split brain" can appear: running crm_mon on each node shows a different Current DC on every node — each node names itself. One possible cause of this problem is an enabled firewall.
crm_failcount — manages the counters that record each resource's failure count.
It can query the failure count of any resource on a given node. The tool can also reset the failure count, allowing a resource to run again on a node where it has failed repeatedly.
When a resource keeps failing on its current node, it is forced to fail over to another node. A resource carries a resource-stickiness attribute that determines how strongly it prefers to run on a given node, and a migration-threshold attribute that determines the threshold at which the resource should fail over to another node.
A failcount attribute can be added to a resource; its value is incremented whenever resource monitoring detects a failure. Multiplying the value of failcount by the value of migration-threshold gives the resource's failover score. If that number exceeds the resource's stickiness setting, the resource is moved to another node and does not run again on the original node until its failure count is reset.
crm_failcount - A convenience wrapper for crm_attribute
Set, update or remove the failcount for the specified resource on the named node
Usage: crm_failcount -r resource_name command [options]
Options:
  --help                    This text
  --version                 Version information
  -V, --verbose             Increase debug output
  -q, --quiet               Print only the value on stdout
  -r, --resource-id=value   The resource to update.
Commands:
  -G, --query               Query the current value of the attribute/option
  -v, --update=value        Update the value of the attribute/option
  -D, --delete              Delete the attribute/option
Additional Options:
  -N, --node=value          Set an attribute for the named node (instead of the current one).
  -l, --lifetime=value      Until when should the setting take affect. Valid values: reboot, forever
  -i, --id=value            (Advanced) The ID used to identify the attribute
[root@vClass-CPLjR ~]# crm_failcount -r myvip
scope=status  name=fail-count-myvip  value=0
[root@vClass-CPLjR ~]# crm_failcount -r myvip -G -Q
0
Reset the failure count of resource my_rsc on node node1:
# crm_failcount -D -U node1 -r my_rsc
Query the current failure count of resource my_rsc on node node1:
# crm_failcount -G -U node1 -r my_rsc
crm_attribute — query, modify and delete node attributes and cluster options.
Query the value of the location attribute of host myhost in the nodes section of the CIB:
crm_attribute -G -t nodes -U myhost -n location
Query the value of the cluster-delay attribute in the crm_config section of the CIB:
crm_attribute -G -t crm_config -n cluster-delay
Query the value of the cluster-delay attribute in the crm_config section of the CIB, printing only the value:
crm_attribute -G -Q -t crm_config -n cluster-delay
Delete the location attribute of host myhost from the nodes section of the CIB:
crm_attribute -D -t nodes -U myhost -n location
Add a new attribute named location with the value office to the set subsection of the nodes section of the CIB (applied to host myhost):
crm_attribute -t nodes -U myhost -s set -n location -v office
Change the location attribute of host myhost in the nodes section:
crm_attribute -t nodes -U myhost -n location -v backoffice
/etc/sysconfig/pacemaker
Acts as the communication layer and provides membership management services: it implements the cluster's messaging layer, carrying heartbeat and cluster transaction traffic. It comprises: cfg, cmap, CPG (closed process group), the quorum algorithm, sam, the totem protocol, and the Extended Virtual Synchrony algorithm.
/etc/rc.d/init.d/corosync
Command: corosync
Ports: 5404(udp) 5405(udp)
Config: /etc/sysconfig/corosync, /etc/corosync/corosync.conf
Logs: /var/log/messages, /var/log/cluster/cluster.log
/etc/rc.d/init.d/corosync-notifyd — Corosync DBus and SNMP notifier
Command: corosync-notifyd
Ports:
Config: /etc/sysconfig/corosync-notifyd
Logs:
/usr/bin/corosync-blackbox
/usr/bin/corosync-xmlproc
/usr/bin/cpg_test_agent
/usr/bin/sam_test_agent
/usr/bin/votequorum_test_agent
/usr/sbin/corosync
/usr/sbin/corosync-cfgtool
/usr/sbin/corosync-cmapctl
/usr/sbin/corosync-cpgtool
/usr/sbin/corosync-keygen — generates the authkey for corosync. The command draws on the kernel entropy pool to produce the authentication file; if there is not enough randomness, it blocks, and the user has to keep hitting the keyboard until enough random data has accumulated to generate the authkey file (see the sketch after this list)
/usr/sbin/corosync-notifyd
/usr/sbin/corosync-quorumtool
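A minimal generate-and-distribute sketch for corosync-keygen (node2 is a placeholder host name):
# corosync-keygen                                  ## writes /etc/corosync/authkey; may block until the entropy pool fills
# scp /etc/corosync/authkey node2:/etc/corosync/   ## the key must be identical on all nodes (see note 5 below)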
corosync-cmapctl — view the state of the corosync service
[root@vClass-2lgAr ~]# corosync-cmapctl -h
usage:  corosync-cmapctl [-b] [-dghsTtp] [params...]
    -b show binary values
Set key:
    corosync-cmapctl -s key_name type value
    where type is one of ([i|u][8|16|32|64] | flt | dbl | str | bin)
    for bin, value is file name (or - for stdin)
Load settings from a file:
    corosync-cmapctl -p filename
    the format of the file is:
    [^[^]]<key_name>[ <type> <value>]
    Keys prefixed with single caret ('^') are deleted (see -d).
    Keys (actually prefixes) prefixed with double caret ('^^') are deleted by prefix (see -D).
    <type> and <value> are optional (not checked) in above cases.
    Other keys are set (see -s) so both <type> and <value> are required.
Delete key:
    corosync-cmapctl -d key_name...
Delete multiple keys with prefix:
    corosync-cmapctl -D key_prefix...
Get key:
    corosync-cmapctl [-b] -g key_name...
Display all keys:
    corosync-cmapctl [-b]
Display keys with prefix key_name:
    corosync-cmapctl [-b] key_name...
Track changes on keys with key_name:
    corosync-cmapctl [-b] -t key_name
Track changes on keys with key prefix:
    corosync-cmapctl [-b] -T key_prefix
[root@vClass-2lgAr ~]# corosync-cmapctl
aisexec.group (str) = root
aisexec.user (str) = root
config.totemconfig_reload_in_progress (u8) = 0
internal_configuration.service.0.name (str) = corosync_cmap
internal_configuration.service.0.ver (u32) = 0
internal_configuration.service.1.name (str) = corosync_cfg
internal_configuration.service.1.ver (u32) = 0
internal_configuration.service.2.name (str) = corosync_cpg
internal_configuration.service.2.ver (u32) = 0
internal_configuration.service.3.name (str) = corosync_quorum
internal_configuration.service.3.ver (u32) = 0
internal_configuration.service.4.name (str) = corosync_pload
internal_configuration.service.4.ver (u32) = 0
internal_configuration.service.5.name (str) = corosync_votequorum
internal_configuration.service.5.ver (u32) = 0
logging.debug (str) = off
logging.fileline (str) = off
logging.logfile (str) = /var/log/cluster/corosync.log
logging.logger_subsys.QUORUM.debug (str) = off
logging.logger_subsys.QUORUM.subsys (str) = QUORUM
logging.timestamp (str) = on
logging.to_logfile (str) = yes
logging.to_stderr (str) = no
logging.to_syslog (str) = no
quorum.expected_votes (u32) = 3
quorum.last_man_standing (u8) = 1
quorum.last_man_standing_window (u32) = 10000
quorum.provider (str) = corosync_votequorum
quorum.wait_for_all (u8) = 1
runtime.blackbox.dump_flight_data (str) = no
runtime.blackbox.dump_state (str) = no
runtime.config.totem.consensus (u32) = 12000
runtime.config.totem.downcheck (u32) = 1000
runtime.config.totem.fail_recv_const (u32) = 2500
runtime.config.totem.heartbeat_failures_allowed (u32) = 0
runtime.config.totem.hold (u32) = 1894
runtime.config.totem.join (u32) = 50
runtime.config.totem.max_messages (u32) = 17
runtime.config.totem.max_network_delay (u32) = 50
runtime.config.totem.merge (u32) = 200
runtime.config.totem.miss_count_const (u32) = 5
runtime.config.totem.rrp_autorecovery_check_timeout (u32) = 1000
runtime.config.totem.rrp_problem_count_mcast_threshold (u32) = 100
runtime.config.totem.rrp_problem_count_threshold (u32) = 10
runtime.config.totem.rrp_problem_count_timeout (u32) = 2000
runtime.config.totem.rrp_token_expired_timeout (u32) = 2380
runtime.config.totem.send_join (u32) = 800
runtime.config.totem.seqno_unchanged_const (u32) = 30
runtime.config.totem.token (u32) = 10000
runtime.config.totem.token_retransmit (u32) = 2380
runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
runtime.config.totem.window_size (u32) = 300
runtime.connections.active (u64) = 9
runtime.connections.attrd:121768:0x55fb1ec2ac70.client_pid (u32) = 121768
runtime.connections.attrd:121768:0x55fb1ec2ac70.dispatched (u64) = 4
runtime.connections.attrd:121768:0x55fb1ec2ac70.flow_control (u32) = 0
runtime.connections.attrd:121768:0x55fb1ec2ac70.flow_control_count (u64) = 0
runtime.connections.attrd:121768:0x55fb1ec2ac70.invalid_request (u64) = 0
runtime.connections.attrd:121768:0x55fb1ec2ac70.name (str) = attrd
runtime.connections.attrd:121768:0x55fb1ec2ac70.overload (u64) = 0
runtime.connections.attrd:121768:0x55fb1ec2ac70.queue_size (u32) = 0
runtime.connections.attrd:121768:0x55fb1ec2ac70.recv_retries (u64) = 0
runtime.connections.attrd:121768:0x55fb1ec2ac70.requests (u64) = 4
runtime.connections.attrd:121768:0x55fb1ec2ac70.responses (u64) = 2
runtime.connections.attrd:121768:0x55fb1ec2ac70.send_retries (u64) = 0
runtime.connections.attrd:121768:0x55fb1ec2ac70.service_id (u32) = 2
runtime.connections.cib:121766:0x55fb1ee2fce0.client_pid (u32) = 121766
runtime.connections.cib:121766:0x55fb1ee2fce0.dispatched (u64) = 22
runtime.connections.cib:121766:0x55fb1ee2fce0.flow_control (u32) = 0
runtime.connections.cib:121766:0x55fb1ee2fce0.flow_control_count (u64) = 0
runtime.connections.cib:121766:0x55fb1ee2fce0.invalid_request (u64) = 0
runtime.connections.cib:121766:0x55fb1ee2fce0.name (str) = cib
runtime.connections.cib:121766:0x55fb1ee2fce0.overload (u64) = 0
runtime.connections.cib:121766:0x55fb1ee2fce0.queue_size (u32) = 0
runtime.connections.cib:121766:0x55fb1ee2fce0.recv_retries (u64) = 0
runtime.connections.cib:121766:0x55fb1ee2fce0.requests (u64) = 7
runtime.connections.cib:121766:0x55fb1ee2fce0.responses (u64) = 2
runtime.connections.cib:121766:0x55fb1ee2fce0.send_retries (u64) = 0
runtime.connections.cib:121766:0x55fb1ee2fce0.service_id (u32) = 2
runtime.connections.closed (u64) = 37
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.client_pid (u32) = 53176
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.dispatched (u64) = 0
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.flow_control (u32) = 0
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.flow_control_count (u64) = 0
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.invalid_request (u64) = 0
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.name (str) = corosync-cmapct
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.overload (u64) = 0
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.queue_size (u32) = 0
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.recv_retries (u64) = 0
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.requests (u64) = 0
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.responses (u64) = 0
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.send_retries (u64) = 0
runtime.connections.corosync-cmapct:53176:0x55fb1ee3ff60.service_id (u32) = 0
runtime.connections.crmd:121769:0x55fb1ee35410.client_pid (u32) = 121769
runtime.connections.crmd:121769:0x55fb1ee35410.dispatched (u64) = 13
runtime.connections.crmd:121769:0x55fb1ee35410.flow_control (u32) = 0
runtime.connections.crmd:121769:0x55fb1ee35410.flow_control_count (u64) = 0
runtime.connections.crmd:121769:0x55fb1ee35410.invalid_request (u64) = 0
runtime.connections.crmd:121769:0x55fb1ee35410.name (str) = crmd
runtime.connections.crmd:121769:0x55fb1ee35410.overload (u64) = 0
runtime.connections.crmd:121769:0x55fb1ee35410.queue_size (u32) = 0
runtime.connections.crmd:121769:0x55fb1ee35410.recv_retries (u64) = 0
runtime.connections.crmd:121769:0x55fb1ee35410.requests (u64) = 7
runtime.connections.crmd:121769:0x55fb1ee35410.responses (u64) = 2
runtime.connections.crmd:121769:0x55fb1ee35410.send_retries (u64) = 0
runtime.connections.crmd:121769:0x55fb1ee35410.service_id (u32) = 2
runtime.connections.crmd:121769:0x55fb1ee35ab0.client_pid (u32) = 121769
runtime.connections.crmd:121769:0x55fb1ee35ab0.dispatched (u64) = 1
runtime.connections.crmd:121769:0x55fb1ee35ab0.flow_control (u32) = 0
runtime.connections.crmd:121769:0x55fb1ee35ab0.flow_control_count (u64) = 0
runtime.connections.crmd:121769:0x55fb1ee35ab0.invalid_request (u64) = 0
runtime.connections.crmd:121769:0x55fb1ee35ab0.name (str) = crmd
runtime.connections.crmd:121769:0x55fb1ee35ab0.overload (u64) = 0
runtime.connections.crmd:121769:0x55fb1ee35ab0.queue_size (u32) = 0
runtime.connections.crmd:121769:0x55fb1ee35ab0.recv_retries (u64) = 0
runtime.connections.crmd:121769:0x55fb1ee35ab0.requests (u64) = 3
runtime.connections.crmd:121769:0x55fb1ee35ab0.responses (u64) = 3
runtime.connections.crmd:121769:0x55fb1ee35ab0.send_retries (u64) = 0
runtime.connections.crmd:121769:0x55fb1ee35ab0.service_id (u32) = 3
runtime.connections.pacemakerd:121764:0x55fb1ec27530.client_pid (u32) = 121764
runtime.connections.pacemakerd:121764:0x55fb1ec27530.dispatched (u64) = 8
runtime.connections.pacemakerd:121764:0x55fb1ec27530.flow_control (u32) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec27530.flow_control_count (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec27530.invalid_request (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec27530.name (str) = pacemakerd
runtime.connections.pacemakerd:121764:0x55fb1ec27530.overload (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec27530.queue_size (u32) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec27530.recv_retries (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec27530.requests (u64) = 8
runtime.connections.pacemakerd:121764:0x55fb1ec27530.responses (u64) = 2
runtime.connections.pacemakerd:121764:0x55fb1ec27530.send_retries (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec27530.service_id (u32) = 2
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.client_pid (u32) = 121764
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.dispatched (u64) = 1
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.flow_control (u32) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.flow_control_count (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.invalid_request (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.name (str) = pacemakerd
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.overload (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.queue_size (u32) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.recv_retries (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.requests (u64) = 3
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.responses (u64) = 3
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.send_retries (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2b7d0.service_id (u32) = 3
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.client_pid (u32) = 121764
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.dispatched (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.flow_control (u32) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.flow_control_count (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.invalid_request (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.name (str) = pacemakerd
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.overload (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.queue_size (u32) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.recv_retries (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.requests (u64) = 1
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.responses (u64) = 1
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.send_retries (u64) = 0
runtime.connections.pacemakerd:121764:0x55fb1ec2c070.service_id (u32) = 1
runtime.connections.stonithd:121767:0x55fb1ec29000.client_pid (u32) = 121767
runtime.connections.stonithd:121767:0x55fb1ec29000.dispatched (u64) = 7
runtime.connections.stonithd:121767:0x55fb1ec29000.flow_control (u32) = 0
runtime.connections.stonithd:121767:0x55fb1ec29000.flow_control_count (u64) = 0
runtime.connections.stonithd:121767:0x55fb1ec29000.invalid_request (u64) = 0
runtime.connections.stonithd:121767:0x55fb1ec29000.name (str) = stonithd
runtime.connections.stonithd:121767:0x55fb1ec29000.overload (u64) = 0
runtime.connections.stonithd:121767:0x55fb1ec29000.queue_size (u32) = 0
runtime.connections.stonithd:121767:0x55fb1ec29000.recv_retries (u64) = 0
runtime.connections.stonithd:121767:0x55fb1ec29000.requests (u64) = 6
runtime.connections.stonithd:121767:0x55fb1ec29000.responses (u64) = 2
runtime.connections.stonithd:121767:0x55fb1ec29000.send_retries (u64) = 0
runtime.connections.stonithd:121767:0x55fb1ec29000.service_id (u32) = 2
runtime.services.cfg.0.rx (u64) = 0
runtime.services.cfg.0.tx (u64) = 0
runtime.services.cfg.1.rx (u64) = 0
runtime.services.cfg.1.tx (u64) = 0
runtime.services.cfg.2.rx (u64) = 0
runtime.services.cfg.2.tx (u64) = 0
runtime.services.cfg.3.rx (u64) = 0
runtime.services.cfg.3.tx (u64) = 0
runtime.services.cfg.service_id (u16) = 1
runtime.services.cmap.0.rx (u64) = 3
runtime.services.cmap.0.tx (u64) = 2
runtime.services.cmap.service_id (u16) = 0
runtime.services.cpg.0.rx (u64) = 5
runtime.services.cpg.0.tx (u64) = 5
runtime.services.cpg.1.rx (u64) = 0
runtime.services.cpg.1.tx (u64) = 0
runtime.services.cpg.2.rx (u64) = 1
runtime.services.cpg.2.tx (u64) = 0
runtime.services.cpg.3.rx (u64) = 55
runtime.services.cpg.3.tx (u64) = 22
runtime.services.cpg.4.rx (u64) = 0
runtime.services.cpg.4.tx (u64) = 0
runtime.services.cpg.5.rx (u64) = 3
runtime.services.cpg.5.tx (u64) = 2
runtime.services.cpg.6.rx (u64) = 0
runtime.services.cpg.6.tx (u64) = 0
runtime.services.cpg.service_id (u16) = 2
runtime.services.pload.0.rx (u64) = 0
runtime.services.pload.0.tx (u64) = 0
runtime.services.pload.1.rx (u64) = 0
runtime.services.pload.1.tx (u64) = 0
runtime.services.pload.service_id (u16) = 4
runtime.services.quorum.service_id (u16) = 3
runtime.services.votequorum.0.rx (u64) = 7
runtime.services.votequorum.0.tx (u64) = 4
runtime.services.votequorum.1.rx (u64) = 0
runtime.services.votequorum.1.tx (u64) = 0
runtime.services.votequorum.2.rx (u64) = 0
runtime.services.votequorum.2.tx (u64) = 0
runtime.services.votequorum.3.rx (u64) = 0
runtime.services.votequorum.3.tx (u64) = 0
runtime.services.votequorum.service_id (u16) = 5
runtime.totem.pg.mrp.rrp.0.faulty (u8) = 0
runtime.totem.pg.mrp.srp.avg_backlog_calc (u32) = 0
runtime.totem.pg.mrp.srp.avg_token_workload (u32) = 0
runtime.totem.pg.mrp.srp.commit_entered (u64) = 2
runtime.totem.pg.mrp.srp.commit_token_lost (u64) = 0
runtime.totem.pg.mrp.srp.consensus_timeouts (u64) = 0
runtime.totem.pg.mrp.srp.continuous_gather (u32) = 0
runtime.totem.pg.mrp.srp.continuous_sendmsg_failures (u32) = 0
runtime.totem.pg.mrp.srp.firewall_enabled_or_nic_failure (u8) = 0
runtime.totem.pg.mrp.srp.gather_entered (u64) = 2
runtime.totem.pg.mrp.srp.gather_token_lost (u64) = 0
runtime.totem.pg.mrp.srp.mcast_retx (u64) = 0
runtime.totem.pg.mrp.srp.mcast_rx (u64) = 67
runtime.totem.pg.mrp.srp.mcast_tx (u64) = 32
runtime.totem.pg.mrp.srp.memb_commit_token_rx (u64) = 4
runtime.totem.pg.mrp.srp.memb_commit_token_tx (u64) = 4
runtime.totem.pg.mrp.srp.memb_join_rx (u64) = 6
runtime.totem.pg.mrp.srp.memb_join_tx (u64) = 3
runtime.totem.pg.mrp.srp.memb_merge_detect_rx (u64) = 813
runtime.totem.pg.mrp.srp.memb_merge_detect_tx (u64) = 0
runtime.totem.pg.mrp.srp.members.168900605.config_version (u64) = 2
runtime.totem.pg.mrp.srp.members.168900605.ip (str) = r(0) ip(10.17.55.253)
runtime.totem.pg.mrp.srp.members.168900605.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.168900605.status (str) = joined
runtime.totem.pg.mrp.srp.members.168900606.config_version (u64) = 2
runtime.totem.pg.mrp.srp.members.168900606.ip (str) = r(0) ip(10.17.55.254)
runtime.totem.pg.mrp.srp.members.168900606.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.168900606.status (str) = joined
runtime.totem.pg.mrp.srp.mtt_rx_token (u32) = 1895
runtime.totem.pg.mrp.srp.operational_entered (u64) = 2
runtime.totem.pg.mrp.srp.operational_token_lost (u64) = 0
runtime.totem.pg.mrp.srp.orf_token_rx (u64) = 1255
runtime.totem.pg.mrp.srp.orf_token_tx (u64) = 1
runtime.totem.pg.mrp.srp.recovery_entered (u64) = 2
runtime.totem.pg.mrp.srp.recovery_token_lost (u64) = 0
runtime.totem.pg.mrp.srp.rx_msg_dropped (u64) = 0
runtime.totem.pg.mrp.srp.token_hold_cancel_rx (u64) = 9
runtime.totem.pg.mrp.srp.token_hold_cancel_tx (u64) = 5
runtime.totem.pg.msg_queue_avail (u32) = 0
runtime.totem.pg.msg_reserved (u32) = 1
runtime.votequorum.ev_barrier (u32) = 3
runtime.votequorum.this_node_id (u32) = 168900606
runtime.votequorum.two_node (u8) = 0
runtime.votequorum.wait_for_all_status (u8) = 1
service.name (str) = pacemaker
service.ver (str) = 0
totem.cluster_name (str) = fMOQ0nGciUIFxoRq
totem.config_version (u64) = 2
totem.crypto_cipher (str) = none
totem.crypto_hash (str) = none
totem.interface.0.bindnetaddr (str) = 10.17.55.0
totem.interface.0.mcastaddr (str) = 239.255.1.1
totem.interface.0.mcastport (u16) = 5405
totem.interface.0.ttl (u8) = 1
totem.send_join (u32) = 800
totem.token (u32) = 10000
totem.version (u32) = 2
totem.window_size (u32) = 300
uidgid.gid.189 (u8) = 1
/etc/corosync/corosync.conf.example — template for the main configuration file of the corosync executable
/etc/corosync/corosync.conf.example.udpu
/etc/corosync/corosync.xml.example
/etc/corosync/uidgid.d
/etc/logrotate.d/corosync
/etc/sysconfig/corosync
/etc/sysconfig/corosync-notifyd
Copy the template /etc/corosync/corosync.conf.example to create the configuration file /etc/corosync/corosync.conf, then modify it as follows:
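The copy step itself:
# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf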
# Please read the corosync.conf.5 manual page
compatibility: whitetank    # compatible with whitetank, i.e. corosync before 0.8 (openais-0.80.z); the default is whitetank
totem {    # top level: defines how corosync instances communicate (the totem protocol); 7 settings: one required, 5 optional, 1 for IPv6
    version: 2    # version of the corosync configuration file, fixed at 2

    # secauth: Enable mutual node authentication. If you choose to
    # enable this (on), then do remember to create a shared
    # secret with corosync-keygen.
    # secauth: off
    secauth: on    # authenticate nodes against authkey and enable encryption; the default is on. Security authentication is very CPU-intensive when aisexec is used
    threads: 0    # number of threads to start; 0 disables the threading mechanism (the default is fine). Choose according to the number of CPUs and cores

    token: 10000    # token timeout in milliseconds; default 1000
    token_retransmits_before_loss_const: 10    # default: 4
    vsftype: none    # default: ykd. Virtual synchrony filter type. Supported: YKD dynamic linear voting
    rrp_mode: active    # redundant ring mode: active, passive or none. With only one interface it is automatically set to none.

    # network interfaces; if more than one is defined, rrp_mode must be set
    # interface: define at least one interface to communicate
    # over. If you define more than one interface stanza, you must
    # also set rrp_mode.
    interface {    # the interface that carries heartbeat and cluster transaction traffic
        # Rings must be consecutively numbered, starting at 0.
        ringnumber: 0    # redundant ring number; must be numbered from 0. With several NICs, each NIC can be given its own ring
        # This is normally the *network* address of the
        # interface to bind to. This ensures that you can use
        # identical instances of this configuration file
        # across all your cluster nodes, without having to
        # modify this option.
        bindnetaddr: 192.168.0.0    # network address to bind to — the heartbeat subnet

        # However, if you have multiple physical network
        # interfaces configured for the same subnet, then the
        # network address alone is not sufficient to identify
        # the interface Corosync should bind to. In that case,
        # configure the *host* address of the interface
        # instead:
        #bindnetaddr: 192.168.1.1

        # When selecting a multicast address, consider RFC
        # 2365 (which, among other things, specifies that
        # 239.255.x.x addresses are left to the discretion of
        # the network administrator). Do not reuse multicast
        # addresses across multiple Corosync clusters sharing
        # the same network.
        mcastaddr: 239.255.21.111    # multicast address to listen on; do not use the default — the heartbeat multicast address

        # Corosync uses the port you specify here for UDP
        # messaging, and also the immediately preceding
        # port. Thus if you set this to 5405, Corosync sends
        # messages over UDP ports 5405 and 5404.
        mcastport: 5405    # port corosync uses to exchange messages; the default is fine — the heartbeat multicast port

        # Time-to-live for cluster communication packets. The
        # number of hops (routers) that this ring will allow
        # itself to pass. Note that multicast routing must be
        # specifically enabled on most network routers.
        ttl: 1    # packet time-to-live; keep the default of 1. On routed networks it can be raised (range 1-255). Only meaningful with multicast.

        # broadcast: yes    # broadcast mode; do not use the mcastaddr parameter with it. If broadcast is set to yes, mcastaddr must not be set.
        # transport: udp    # transport method. To rule out multicast entirely, use the udpu unicast transport, which requires the member list in nodelist. The default is udp; udpu and iba are also possible.
    }
    #interface {
    #    ringnumber: 1
    #    bindnetaddr: 10.0.42.0
    #    mcastaddr: 239.255.42.2
    #    mcastport: 5405
    #}
}
logging {    # top level: logging options
    # Log the source file and line where messages are being
    # generated. When in doubt, leave off. Potentially useful for
    # debugging.
    fileline: off    # print the originating file and line

    # Log to standard error. When in doubt, set to no. Useful when
    # running in the foreground (when invoking corosync -f)
    to_stderr: no    # send to standard error

    # Log to a log file. When set to no, the log file option
    # must not be set.
    to_logfile: yes    # log to a file
    logfile: /var/log/cluster/corosync.log

    # Log to the system log daemon. When in doubt, set to yes.
    to_syslog: no    # do not send logs to syslog

    # Log debug messages (very verbose). When in doubt, leave off.
    debug: off

    # Log messages with time stamps. When in doubt, set to on
    # (unless you are only logging to syslog, where double
    # time stamps can be annoying).
    timestamp: on    # timestamp log entries; costs noticeable CPU

    logger_subsys {
        subsys: AMF
        debug: off
    }
}
event {    # top level: event service configuration
}
amf {
mode: disabled
}
quorum {
provider: corosync_votequorum    # enable votequorum
expected_votes: 7    # 7 means seven nodes, with quorum at 4. If nodelist is set, expected_votes has no effect
wait_for_all: 1    # 1 means that at cluster startup quorum is held back until all nodes are online and have joined the cluster; new in Corosync 2.0
last_man_standing: 1    # 1 enables the LMS feature; it is off (0) by default
# With this enabled, when the cluster sits at the quorum edge (e.g. expected_votes=7 but online nodes=4) for longer than last_man_standing_window,
# quorum is recalculated, down to online nodes=2. To allow online nodes=1, auto_tie_breaker must also be enabled; not recommended in production.
last_man_standing_window: 10000    # in milliseconds; time before quorum is recalculated after one or more hosts are lost from the cluster
}
# Add the following:
service {
    ver: 0    # ver: 1 means corosync does not start pacemaker automatically; to have corosync start pacemaker, set ver to 0
    name: pacemaker    # start pacemaker as a plugin, i.e. corosync launches pacemaker at startup
    # use_mgmtd: yes
}
aisexec {    # user and group for running openais; defaults to root, so this may be omitted
    user: root
    group: root
}
nodelist {    # every node must have at least a ring0_addr field; other possible options are ring{X}_addr and nodeid, where {X} is the ring number; ring{X}_addr gives the node's IP, and nodeid is only required when IPv4 and IPv6 are used together
node {
ring0_addr: 192.168.42.1
ring1_addr: 10.0.42.1
nodeid: 1    # optional for IPv4, required for IPv6; a 32-bit value bound to ring 0; the value 0 is reserved and must not be used
}
node {
ring0_addr: 192.168.42.2
ring1_addr: 10.0.42.2
nodeid: 2
}
}
1. The product of token and token_retransmits_before_loss_const determines the cluster's failover time. token is in milliseconds. If a node fails to respond within $(token*token_retransmits_before_loss_const), it is declared dead.
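For example, with the sample values above (token: 10000, token_retransmits_before_loss_const: 10), a node that stays silent for roughly 10000 ms × 10 = 100 s would be declared dead by this rule.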
2. If the secauth option is enabled, inter-node communication is encrypted with a 128-bit key stored in /etc/corosync/authkey, which can be generated with corosync-keygen.
3. A corosync configuration with a redundant network (using more than one network interface) must use RRP mode. Note the recommended interface configuration below:
3.1. Every interface must have a unique ringnumber, starting at 0.
3.2. bindnetaddr is the network address of the IP subnet you want to bind to.
3.3. Multicast addresses (mcastaddr) must not be reused across cluster boundaries: no two independent clusters may ever use the same multicast group address. Multicast addresses must follow RFC 2365, "Administratively Scoped IP Multicast".
3.4. For firewall configuration, corosync needs only UDP, using mcastport (receive) and mcastport - 1 (send).
4. The pacemaker service can be declared in corosync.conf, or in /etc/corosync/service.d/pacemaker.
Note: with Corosync (version 2) on Ubuntu 14.04, the command starting pacemaker in the service stanza must be commented out. Also mind the startup order of Corosync and Pacemaker, which has to be set manually:
# update-rc.d pacemaker start 20 2 3 4 5 . stop 00 0 1 6 .
/etc/corosync/uidgid.d/pacemaker must contain:
uidgid {
uid: hacluster
gid: haclient
}
5. Note that corosync.conf and authkey must be kept in sync across all nodes.
6. In the service section, ver: 1 means corosync does not start pacemaker automatically; to have corosync start pacemaker, set ver to 0. Since CentOS 7 has no /etc/rc.d/init.d/pacemaker script (on CentOS 7 the pacemaker service can be managed with systemctl), my configuration omits this section. /etc/rc.d/init.d/pacemaker can be created by hand, in the same way as the /etc/rc.d/init.d/haproxy script in the next article, on haproxy.
votequorum configuration
The votequorum library is part of the Corosync project. votequorum is used to avoid split brain, and additionally to:
1. query the quorum status;
2. get a list of the nodes known to the quorum service;
3. receive notifications of quorum state changes;
4. change the number of votes assigned to a node;
5. change the number of expected votes for a cluster to be quorate;
6. connect an additional quorum device to allow small clusters to remain quorate during node outages.
The votequorum library was created to replace and supersede qdisk (the quorum disk).
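The current quorum state can be checked with corosync-quorumtool (listed among the binaries above):
# corosync-quorumtool -s    ## shows votes, expected votes and whether the cluster is quorate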
/var/log/cluster/
/var/log/messages
Can't read file /etc/corosync/corosync.conf reason = (No such file or directory)
Note:
1. The corosync configuration file corosync.conf must exist at startup.
2.
Configuring and managing the cluster — the High Availability cluster command-line interface
The crm shell is a command-line interface for High-Availability
cluster management on GNU/Linux systems. It simplifies the
configuration, management and troubleshooting of Pacemaker-based
clusters, by providing a powerful and intuitive set of features.
Resource stickiness expresses whether a resource prefers to stay on its current node: a positive integer means it prefers to stay, a negative value means it will leave; inf means positive infinity and -inf negative infinity.
In a two-node cluster the number of votes is even. When the heartbeat fails (split brain), neither node can reach the required vote count, and the default quorum policy shuts cluster services down. To avoid this, either raise the vote count to an odd number (e.g. by adding a ping node, as mentioned earlier) or change the default quorum policy to ignore:
crm(live)configure# property no-quorum-policy=ignore
When a failure occurs, resources migrate to a healthy node; but once the failed node recovers, the resources may move back to the original node. In some situations this is not the best policy, because migration means downtime — and for complex applications such as an Oracle database, considerable downtime. To avoid it, use the resource stickiness policy introduced in section 1.3.1 of this article, as needed.
crm(live)configure# rsc_defaults resource-stickiness=100 ## set resource stickiness to 100
When defining resource constraints you can also give each constraint a score. The score represents the value assigned to the constraint. Constraints with higher scores are applied before those with lower scores. By creating additional location constraints with different scores for a given resource, you can specify the order of target nodes the resource will fail over to.
None
/etc/bash_completion.d/crm.sh
/usr/sbin/crm
Help overview for crmsh

Available topics:
    Overview                 Help overview for crmsh
    Topics                   Available topics
    Description              Program description
    CommandLine              Command line options
    Introduction             Introduction
    Interface                User interface
    Completion               Tab completion
    Shorthand                Shorthand syntax
    Features                 Features
    Shadows                  Shadow CIB usage
    Checks                   Configuration semantic checks
    Templates                Configuration templates
    Testing                  Resource testing
    Security                 Access Control Lists (ACL)
    Resourcesets             Syntax: Resource sets
    AttributeListReferences  Syntax: Attribute list references
    AttributeReferences      Syntax: Attribute references
    RuleExpressions          Syntax: Rule expressions
    Reference                Command reference

Available commands:
    cd               Navigate the level structure
    help             Show help (help topics for list of topics)
    ls               List levels and commands
    quit             Exit the interactive shell
    report           Create cluster status report
    status           Cluster status
    up               Go back to previous level

    assist/          Configuration assistant
        template         Create template for primitives
        weak-bond        Create a weak bond between resources

    cib/             CIB shadow management    ## CIB management level
        cibstatus        CIB status management and editing
        commit           Copy a shadow CIB to the cluster
        delete           Delete a shadow CIB
        diff             Diff between the shadow CIB and the live CIB
        import           Import a CIB or PE input file to a shadow
        list             List all shadow CIBs
        new              Create a new shadow CIB
        reset            Copy live cib to a shadow CIB
        use              Change working CIB

    cibstatus/       CIB status management and editing
        load             Load the CIB status section
        node             Change node status
        op               Edit outcome of a resource operation
        origin           Display origin of the CIB status section
        quorum           Set the quorum
        run              Run policy engine
        save             Save the CIB status section
        show             Show CIB status section
        simulate         Simulate cluster transition
        ticket           Manage tickets

    cluster/         Cluster setup and management
        add              Add a new node to the cluster
        copy             Copy file to other cluster nodes
        diff             Diff file across cluster
        health           Cluster health check
        init             Initializes a new HA cluster
        remove           Remove a node from the cluster
        run              Execute an arbitrary command on all nodes
        start            Start cluster services
        status           Cluster status check
        stop             Stop cluster services
        wait_for_startup Wait for cluster to start

    configure/       CIB configuration    ## CRM configuration: resource stickiness, resource types, constraints and more; logically split into four parts: nodes, resources, constraints, and (cluster) properties and attributes
        acl_target       Define target access rights (ACL)
        cib              CIB shadow management
        cibstatus        CIB status management and editing
        clone            Define a clone (resource)
        colocation       Colocate resources (constraints)
        commit           Commit the changes to the CIB    ## commit the configuration
        default-timeouts Set timeouts for operations to minimums from the meta-data
        delete           Delete CIB objects
        edit             Edit CIB objects
        erase            Erase the CIB
        fencing_topology Node fencing order    ## define fencing order (stonith resource priorities)
        filter           Filter CIB objects
        graph            Generate a directed graph
        group            Define a group (resources)
        load             Import the CIB from a file
        location         A location preference (constraints)
        modgroup         Modify group
        monitor          Add monitor operation to a primitive (resources)
        ms               Define a master-slave resource (resources)
        node             Define a cluster node
        op_defaults      Set resource operations defaults (attribute)
        order            Order resources (constraints)
        primitive        Define a resource (resources)
        property         Set a cluster property (attributes)
        ptest            Show cluster actions if changes were committed
        refresh          Refresh from CIB
        rename           Rename a CIB object
        role             Define role access rights (ACL, access control lists)
        rsc_defaults     Set resource defaults (attributes)
        rsc_template     Define a resource template    ## simplifies large configurations: primitives inherit all properties set in the template
        rsc_ticket       Resources ticket dependency
        rsctest          Test resources as currently configured
        save             Save the CIB to a file
        schema           Set or display current CIB RNG schema
        set              Set an attribute value
        show             Display CIB objects
        show_property    Show property value
        tag              Define resource tags
        template         Edit and import a configuration from a template
        upgrade          Upgrade the CIB
        user             Define user access rights (ACL, access control lists)
        validate-all     Help for command validate-all
        validate_all     Call agent validate-all for resource
        verify           Verify the CIB with crm_verify    ## check the syntax of the current configuration
        xml              Raw xml

    corosync/        Corosync management
        add-node         Add a corosync node
        del-node         Remove a corosync node
        diff             Diffs the corosync configuration
        edit             Edit the corosync configuration
        get              Get a corosync configuration value
        log              Show the corosync log file
        pull             Pulls the corosync configuration
        push             Push the corosync configuration
        reload           Reload the corosync configuration
        set              Set a corosync configuration value
        show             Display the corosync configuration
        status           Display the corosync status

    history/         Cluster history
        detail           Set the level of detail shown
        diff             Cluster states/transitions difference
        events           Show events in log
        exclude          Exclude log messages
        graph            Generate a directed graph from the PE file
        info             Cluster information summary
        latest           Show latest news from the cluster
        limit            Limit timeframe to be examined
        log              Log content
        node             Node events
        peinputs         List or get PE input files
        refresh          Refresh live report
        resource         Resource events
        session          Manage history sessions
        setnodes         Set the list of cluster nodes
        show             Show status or configuration of the PE input file
        source           Set source to be examined
        transition       Show transition
        wdiff            Cluster states/transitions difference

    maintenance/     Maintenance mode commands    ## maintenance-mode control: toggles maintenance mode for the whole cluster or a single resource agent
        action           Invoke a resource action
        off              Disable maintenance mode
        on               Enable maintenance mode

    node/            Node management    ## node management
        attribute        Manage attributes    ## set, show and delete node attribute values
        clearstate       Clear node state    ## resets the node's recorded state
        delete           Delete node    ## deletes a node; the node must not be active (offline is not the same as non-active); stopping the corosync service makes a node non-active
        fence            Fence node    ## powers a node off; depends on a stonith resource — without stonith the command has no effect
        maintenance      Put node into maintenance mode    ## resources running on the node leave crm's control; ready brings it back
        online           Set node online    ## brings a node online; without a node argument, acts on the local node
        ready            Put node into ready mode    ## moves a node from maintenance back to ready; its resources return to crm's control
        show             Show node    ## shows nodes: name, id, state
        standby          Put node into standby    ## takes a node offline; without a node argument, acts on the local node
        status           Show nodes' status as XML    ## shows node information as XML
        status-attr      Manage status attributes
        utilization      Manage utilization attributes

    options/         User preferences
        add-quotes       Add quotes around parameters containing spaces
        check-frequency  When to perform semantic check
        check-mode       How to treat semantic errors
        colorscheme      Set colors for output
        editor           Set preferred editor program
        manage-children  How to handle children resource attributes
        output           Set output type
        pager            Set preferred pager program
        reset            Reset user preferences to factory defaults
        save             Save the user preferences to the rc file
        set              Set the value of a given option
        show             Show current user preference
        skill-level      Set skill level
        sort-elements    Sort CIB elements
        user             Set the cluster user
        wait             Synchronous operation

    ra/              Resource Agents (RA) lists and documentation    ## resource agents, addressed as class:provider:agent
        classes          List classes and providers    ## shows the RA classes and providers: lsb ocf service stonith
        info             Show meta data for a RA    ## shows an RA's metadata and parameter information; type info and press Tab to list every RA on the system
        list             List RA for a class (and provider)    ## lists the system's RAs by class (and provider); type list and press Tab to show the supported classes
        providers        Show providers for a RA and a class    ## shows an RA's providers, e.g. heartbeat pacemaker rabbitmq, located under /usr/lib/ocf/resource.d
        validate         Validate parameters for RA

    resource/        Resource management    ## resource management level
        ban              Ban a resource from a node
        cleanup          Cleanup resource status
        constraints      Show constraints affecting a resource
        demote           Demote a master-slave resource
        failcount        Manage failcounts    ## set, show and delete a resource's failure count (per node)
        maintenance      Enable/disable per-resource maintenance mode
        manage           Put a resource into managed mode
        meta             Manage a meta attribute
        migrate          Migrate a resource to another node    ## migrate a resource to another node
        operations       Show active resource operations
        param            Manage a parameter of a resource
        promote          Promote a master-slave resource
        refresh          Refresh CIB from the LRM status
        reprobe          Probe for resources not started by the CRM
        restart          Restart resources
        scores           Display resource scores
        secret           Manage sensitive parameters
        start            Start resources    ## start a resource
        status           Show status of resources    ## show resource status: started, ...
        stop             Stop resources    ## stop a resource
        trace            Start RA tracing
        unmanage         Put a resource into unmanaged mode
        unmigrate        Unmigrate a resource to another node
        untrace          Stop RA tracing
        utilization      Manage a utilization attribute

    script/          Cluster script management
        json             JSON API for cluster scripts
        list             List available scripts
        run              Run the script
        show             Describe the script
        verify           Verify the script

    site/            GEO clustering site support
        ticket           Manage site tickets

    template/        Edit and import a configuration from a template
        apply            Process and apply the current configuration to the current CIB
        delete           Delete a configuration
        edit             Edit a configuration
        list             List configurations/templates
        load             Load a configuration
        new              Create a new configuration from templates
        show             Show the processed configuration
primitive — defines a primitive resource, the most basic resource type.
Usage:
primitive <rsc> {[<class>:[<provider>:]]<type>|@<template>}
    [description=<description>]
    [[params] attr_list]
    [meta attr_list]
    [utilization attr_list]
    [operations id_spec]
    [op op_type [<attribute>=<value>...] ...]
attr_list :: [$id=<id>] [<score>:] [rule...] <attr>=<val> [<attr>=<val>...]] | $id-ref=<id>
id_spec :: $id=<id> | $id-ref=<id>
op_type :: start | stop | monitor
Meta attributes
Meta attributes are options you can add to a resource. They tell the CRM how to treat the specific resource. Options can be defined for each resource you add; the cluster uses them to decide how the resource should behave. Resource options can be set with the crm_resource --meta command or via the GUI.
Primitive resource options:

Option | Description
---|---
priority | If not all resources can be active, the cluster stops lower-priority resources in order to keep higher-priority resources active.
target-role | In what state should the cluster try to keep this resource? Allowed values: Stopped and Started.
is-managed | Is the cluster allowed to start and stop the resource? Allowed values: true and false.
resource-stickiness | How strongly does the resource prefer to stay where it is? Defaults to the value of default-resource-stickiness.
migration-threshold | How many failures of this resource on a node before that node becomes ineligible to host it? Default: none.
multiple-active | What should the cluster do if the resource is found active on more than one node? Allowed values: block (mark the resource unmanaged), stop_only and stop_start.
failure-timeout | How many seconds to wait before acting as if the failure had not occurred (and allowing the resource back to the node where it failed)? Default: never.
Instance attributes
Instance attributes are parameters of a specific resource class; they determine how the resource class behaves and which service instance it controls. For more information, see Section 17.5, Instance Attributes.
clone — defines a clone resource. A clone is a resource that can be active on multiple hosts. Any resource can be cloned, provided its resource agent supports it.
crm(live)configure# help clone
Define a clone
The clone command creates a resource clone. It may contain a single primitive resource or one group of resources.
Usage:
clone <name> <rsc> [description=<description>] [meta <attr_list>] [params <attr_list>]
attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>
Example:
clone cl_fence apc_1 \
    meta clone-node-max=1 globally-unique=false
group — a resource group. A group contains a set of resources that need to be placed together, started in sequence and stopped in reverse order.
ms — a master resource. A master resource is a special kind of clone that can run in multiple modes; it must contain exactly one group or one regular resource.
show_property — shows cluster properties. Without an argument, it lists all cluster properties.
crm(live)configure# show_property
stop-orphan-resources  election-timeout  dc-deadtime  node-health-green  placement-strategy  node-action-limit
symmetric-cluster  stonith-timeout  maintenance-mode  enable-acl  default-action-timeout  batch-limit
node-health-yellow  pe-warn-series-max  start-failure-is-fatal  enable-startup-probes  shutdown-escalation  stop-orphan-actions
stop-all-resources  default-resource-stickiness  no-quorum-policy  cluster-recheck-interval  dc-version  cluster-infrastructure
startup-fencing  concurrent-fencing  crmd-integration-timeout  stonith-enabled  stonith-watchdog-timeout  pe-input-series-max
crmd-finalization-timeout  stonith-action  have-watchdog  pe-error-series-max  migration-limit  is-managed-default
load-threshold  node-health-red  node-health-strategy  remove-after-stop  cluster-delay  crmd-transition-delay
crm(live)configure# show_property enable-acl
false
pe-warn-series-max, pe-input-series-max and pe-error-series-max define the log depth.
cluster-recheck-interval is how often nodes are rechecked.
no-quorum-policy="ignore" works around the two-node problem; the default is "stop". With two nodes or fewer, the quorum requirement must be ignored — i.e. the property set to ignore — for the cluster to function normally.
stonith-enabled: STONITH refers to a physical device that can power nodes off on command. A test environment has no such device, and if the option is not disabled, crm commands keep printing errors about it. STONITH ("shoot the other node in the head") is simply fencing.
migration-threshold: a property that determines the failure threshold at which a resource fails over to another node.
maintenance-mode: in maintenance mode all resources are unmanaged; crm_mon shows *** Resource management is DISABLED *** and the start/stop resource commands have no effect. The per-resource maintenance command controls individual resources.
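A sketch of toggling maintenance at both scopes (rsc1 is a placeholder resource name):
# crm configure property maintenance-mode=true   ## whole cluster: all resources become unmanaged
# crm resource maintenance rsc1 on               ## single resource (see the resource maintenance command below)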
Require HAProxy and the VIP to run on the same node:
#crm configure colocation haproxy-with-vip INFINITY: haproxy myvip
Require the VIP to be taken over before HAProxy starts:
# crm configure order haproxy-after-vip mandatory: myvip haproxy
Because cluster resources must be able to bind to the VIP, the kernel parameters on every node need changing:
# echo 'net.ipv4.ip_nonlocal_bind = 1' >> /etc/sysctl.conf
# sysctl -p
init
crm(live)cluster# help init
Initializes a new HA cluster
Installs and configures a basic HA cluster on a set of nodes.
Usage:
init node1 node2 node3
init --dry-run node1 node2 node3

crm(live)cluster# init ddd
INFO: Initialize a new cluster
INFO: Nodes: ddd
ERROR: [ddd]: Start: Exited with error code 255, Error output: command-line: line 0: Bad configuration option: ControlPersist
ERROR: [ddd]: Clean: Exited with error code 255, Error output: command-line: line 0: Bad configuration option: ControlPersist
ERROR: cluster.init: Failed to connect to one or more of these hosts via SSH: ddd

crm(live)configure# show
node 168900606: vClass-2lgAr
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.15-11-e174ec8 \
    cluster-infrastructure=corosync \
    cluster-name=fMOQ0nGciUIFxoRq
crm(live)configure# property stonith-enabled=true
crm(live)configure# show
node 168900606: vClass-2lgAr
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.15-11-e174ec8 \
    cluster-infrastructure=corosync \
    cluster-name=fMOQ0nGciUIFxoRq \
    stonith-enabled=true
crm(live)configure#

crm(live)configure# property stonith-action=reboot
crm(live)configure# property stonith-timeout=120s
crm(live)configure# property no-quorum-policy=stop
crm(live)configure# rsc_defaults resource-stickiness=1024
crm(live)configure# property symmetric-cluster=false
crm(live)configure# property crmd-transition-delay=5s
crm(live)configure# property start-failure-is-fatal="FALSE"
crm(live)configure# rsc_defaults migration-threshold=2
crm(live)configure# show
node 168900606: vClass-2lgAr
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.15-11-e174ec8 \
    cluster-infrastructure=corosync \
    cluster-name=fMOQ0nGciUIFxoRq \
    stonith-enabled=true \
    stonith-action=reboot \
    stonith-timeout=120s \
    no-quorum-policy=stop \
    symmetric-cluster=false \
    crmd-transition-delay=5s \
    start-failure-is-fatal=FALSE
rsc_defaults rsc-options: \
    resource-stickiness=1024 \
    migration-threshold=2
info
crm(live)ra# help info
Show meta data for a RA
Show the meta-data of a resource agent type. This is where users can find information on how to use a resource agent. It is also possible to get information from some programs: pengine, crmd, cib, and stonithd. Just specify the program name instead of an RA.
Usage:
info [<class>:[<provider>:]]<type>
info <type> <class> [<provider>] (obsolete)
Example:
info apache
info ocf:pacemaker:Dummy
info stonith:ipmilan
info pengine

crm(live)ra# info lsb:netconsole
lsb:netconsole
netconsole
Operations' defaults (advisory minimum):
    start         timeout=15
    stop          timeout=15
    status        timeout=15
    restart       timeout=15
    force-reload  timeout=15
    monitor       timeout=15 interval=15
failcount
Manage failcounts
Show/edit/delete the failcount of a resource.
Usage:
failcount <rsc> set <node> <value>
failcount <rsc> delete <node>
failcount <rsc> show <node>
Example:
failcount fs_0 delete node2
ban / unban — keep a resource off a node
crm(live)resource# help ban    ## if the banned node is where the resource is running, a migration is triggered
Ban a resource from a node    ## creates an rsc_location constraint, which crm_resource --clear removes; the resource's constraints command shows its location constraints
Ban a resource from running on a certain node. If no node is given as argument, the resource is banned from the current location. See migrate for details on other arguments.
Usage:
ban <rsc> [<node>] [<lifetime>] [force]

crm(live)resource# help unban    ## same as unmigrate/unmove: removes the constraint created by migrate/ban/move
Unmigrate a resource to another node
(Redirected from unban to unmigrate)
Remove the constraint generated by the previous migrate command.
Usage:
unmigrate <rsc>

manage/unmanage — put a resource into or out of managed mode.
Put a resource into unmanaged mode    ## the resource leaves cluster management; start and stop still work, but migrate/move do not; once its node stops, the resource stays stopped
Unmanage a resource using the is-managed attribute. If there are multiple meta attributes sets, the attribute is set in all of them. If the resource is a clone, all is-managed attributes are removed from the children resources. For details on group management see options manage-children.
Usage:
unmanage <rsc>

maintenance — controls whether a resource is in maintenance mode.
crm(live)resource# help maintenance    ## a resource in maintenance mode is not managed by crm: start, stop and migrate/move are not applied; once set, the resource carries a matching attribute maintenance=true/false
Enable/disable per-resource maintenance mode
Enables or disables the per-resource maintenance mode. When this mode is enabled, no monitor operations will be triggered for the resource.
Usage:
maintenance <resource> [on|off|true|false]
Example:
maintenance rsc1
maintenance rsc2 off
crm(live)resource# help constraints
Show constraints affecting a resource
Display the location and colocation constraints affecting the
resource.
Usage:
constraints <rsc>
migrate(move)/unmigrate (unmove)
crm(live)resource# help migrate    ## migrates a resource to another node by creating a constraint (optionally with a lifetime); shown by the resource's constraints command
Migrate a resource to another node
Migrate a resource to a different node. If node is left out, the resource is migrated by creating a constraint which prevents it from running on the current node. Additionally, you may specify a lifetime for the constraint---once it expires, the location constraint will no longer be active.
Usage:
migrate <rsc> [<node>] [<lifetime>] [force]

crm(live)resource# help unmigrate    ## removes the location constraint produced by migrate
Unmigrate a resource to another node
Remove the constraint generated by the previous migrate command.
Usage:
unmigrate <rsc>
crm(live)resource# help scores
Display resource scores
Display the allocation scores for all resources.
Usage:
scores
crm(live)resource# help operations
Show active resource operations
Show active operations, optionally filtered by resource and node.
Usage:
operations [<rsc>] [<node>]
crm(live)resource# help meta    ## an RA's parameters and supported operations can be viewed with ra info
Manage a meta attribute
Show/edit/delete a meta attribute of a resource. Currently, all meta attributes of a resource may be managed with other commands such as resource stop.
Usage:
meta <rsc> set <attr> <value>
meta <rsc> delete <attr>
meta <rsc> show <attr>
Example:
meta ip_0 set target-role stopped

param — set, delete or show a parameter of a resource
crm(live)resource# help param
Manage a parameter of a resource
Show/edit/delete a parameter of a resource.
Usage:
param <rsc> set <param> <value>
param <rsc> delete <param>
param <rsc> show <param>
Example:
param ip_0 show ip
/etc/crm/crm.conf
Note:
Starting with version 2.2.0 (October 2014), crmsh uses systemd services, i.e. the corosync and pacemaker services are managed with the systemctl command line.
PCS (Pacemaker/Corosync configuration system) commands
fence-agents
1、創建羣集:
建立集羣
啓動集羣
設置資源默認粘性(防止資源回切)
設置資源超時時間
二個節點時,忽略節點quorum功能
沒有 Fencing設備時,禁用STONITH 組件功能
在 stonith-enabled="false" 的狀況下,分布式鎖管理器 (DLM) 等資源以及依賴DLM 的全部服務(例如 cLVM二、GFS2 和 OCFS2)都將沒法啓動。
驗證羣集配置信息
2、創建羣集資源
一、查看可用資源
二、配置虛擬IP
3、調整羣集資源
一、配置資源約束
[shell]# pcs resource group add WebSrvs ClusterIP ## create a resource group; resources in a group run on the same node
[shell]# pcs resource group remove WebSrvs ClusterIP ## remove the given resource from the group
[shell]# pcs resource master WebDataClone WebData ## configure a multi-state resource, e.g. DRBD master/slave
[shell]# pcs constraint colocation add WebServer ClusterIP INFINITY ## configure a colocation constraint
[shell]# pcs constraint colocation remove WebServer ## remove a resource from a colocation constraint
[shell]# pcs constraint order ClusterIP then WebServer ## configure resource start order
[shell]# pcs constraint order remove ClusterIP ## remove a resource from an ordering constraint
[shell]# pcs constraint ## show resource constraints; also pcs constraint --full
2) Configure resource location
[shell]# pcs constraint location WebServer prefers node11 ## make the resource prefer a node; node=50 specifies the score to add
[shell]# pcs constraint location WebServer avoids node11 ## make the resource avoid a node; node=50 specifies the score to subtract
[shell]# pcs constraint location remove location-WebServer ## remove a location constraint by its ID, obtainable with pcs config
[shell]# pcs constraint location WebServer prefers node11=INFINITY ## move the resource by hand: give the node a score of INFINITY
[shell]# crm_simulate -sL ## inspect node resource scores
3) Modify resource configuration
[shell]# pcs resource update WebFS ## update the resource's configuration
[shell]# pcs resource delete WebFS ## delete the given resource
4) Manage cluster resources
[shell]# pcs resource disable ClusterIP ## disable a resource
[shell]# pcs resource enable ClusterIP ## enable a resource
[shell]# pcs resource failcount show ClusterIP ## show the failure count of the given resource
[shell]# pcs resource failcount reset ClusterIP ## reset the failure count of the given resource
[shell]# pcs resource cleanup ClusterIP ## clear the given resource's state and failure count
This covers: creating resources, configuring constraints, specifying failover and failback nodes, configuring resource monitoring, starting and deleting resources, configuring resource groups and clone resources, and migrating resources manually.
Creating the VIP resource:
Once corosync and pacemaker are in order, the VIP resource can be created. My VIP is "10.0.0.10":
# crm configure primitive myvip ocf:heartbeat:IPaddr2 params ip="10.0.0.10" cidr_netmask="24" op monitor interval="30s"
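A quick follow-up check (sketch):
# crm_mon -1    ## one-shot status; myvip should show as Started on exactly one node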
Common open-source HA stacks:
heartbeat v1 + haresources
heartbeat v2 + crm
heartbeat v3 + cluster-glue + pacemaker
corosync + cluster-glue + pacemaker
cman + rgmanager
keepalived + script
The OCF specification strictly defines the exit codes that operations must return. The cluster always checks the return code against the expected result; if they do not match, the operation is considered failed and recovery is initiated. There are three types of failure recovery:

Recovery Type | Description | Action Taken by the Cluster
---|---|---
Soft | A transient error occurred. | Restart the resource or move it to a new location.
Hard | A non-transient error occurred. The error may be specific to the current node. | Move the resource elsewhere and prevent it from being retried on the current node.
Fatal | A non-transient error occurred that is common to all cluster nodes, meaning a bad configuration was specified. | Stop the resource and prevent it from starting on any cluster node.
假定將某個操做視爲已失敗,下表歸納了不一樣的 OCF 返回代碼以及收到相應的錯誤代碼時羣集將啓動的恢復類型。
OCF 返回代碼 |
OCF 別名 |
描述 |
恢復類型 |
---|---|---|---|
0 |
OCF_SUCCESS |
成功。命令成功完成。這是全部啓動、中止、升級和降級命令的所需結果。 |
軟 |
1 |
OCF_ERR_GENERIC |
通用 |
軟 |
2 |
OCF_ERR_ARGS |
此計算機上的資源配置無效(例如,它引用了節點上找不到的某個位置/工具)。 |
硬 |
3 |
OCF_ERR_UNIMPLEMENTED |
請求的操做未實現。 |
硬 |
4 |
OCF_ERR_PERM |
資源代理不具有完成該任務的足夠特權。 |
硬 |
5 |
OCF_ERR_INSTALLED |
資源所需的工具未安裝在此計算機上。 |
硬 |
6 |
OCF_ERR_CONFIGURED |
資源的配置無效(例如,缺乏必需的參數)。 |
致命 |
7 |
OCF_NOT_RUNNING |
資源未運行。羣集將不會嘗試中止爲任何操做返回此代碼的資源。 此 OCF 返回代碼可能須要或不須要資源恢復,這取決於所需的資源狀態。若是出現意外,則執行軟恢復。 |
不適用 |
8 |
OCF_RUNNING_MASTER |
資源正在主節點中運行。 |
軟 |
9 |
OCF_FAILED_MASTER |
資源在主節點中,但已失敗。資源將再次被降級、中止再重啓動(而後也可能升級)。 |
軟 |
其餘 |
不適用 |
自定義錯誤代碼。 |
軟 |
STONITH ("Shoot The Other Node In The Head" or "Shoot The Offending Node In The Head") powers off another node. A STONITH device is a power switch the cluster uses to reset a node it considers failed. Resetting a node that shows no heartbeat is the only reliable way to ensure that an existing-but-failed node is not corrupting data.
Resource Agent: the collection of scripts that start, stop and monitor services; the LRM calls these scripts to start, stop and monitor each resource.
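As a minimal sketch of that contract (not a complete agent; mydaemon is a placeholder), an LSB/OCF-style script maps the actions the cluster invokes to the exit codes from the table above:

#!/bin/sh
# minimal resource-agent sketch: the cluster calls this with start/stop/monitor
case "$1" in
  start)   /usr/sbin/mydaemon && exit 0 ;;                    # 0 = OCF_SUCCESS
  stop)    killall mydaemon 2>/dev/null; exit 0 ;;            # stopping an already-stopped service is still success
  monitor) pidof mydaemon >/dev/null && exit 0 || exit 7 ;;   # 7 = OCF_NOT_RUNNING
  *)       exit 3 ;;                                          # 3 = OCF_ERR_UNIMPLEMENTED
esac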
1. Starting the corosync-notifyd service fails, with the following in /var/log/messages
err daemon vClass-6WUNV notifyd[68616]: [error] Not compiled with DBus support enabled, exiting.
2. Passwordless ssh trust between nodes
[root@node1 ~]# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
[root@node1 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node2.test.com
If a password is still required after the trust has been set up, change StrictModes in /etc/ssh/sshd_config to no (the default is yes):
#LoginGraceTime 2m
#PermitRootLogin yes
StrictModes no
#MaxAuthTries 6
#MaxSessions 10
After changing the configuration file, restart the sshd service:
service sshd restart
3. An ssh version that is too old causes the following problem
crm(live)cluster# health
INFO: Check the health of the cluster
INFO: Nodes: vClass-CPLjR, vClass-2lgAr
ERROR: [vClass-CPLjR]: Start: Exited with error code 255, Error output: command-line: line 0: Bad configuration option: ControlPersist
ERROR: [vClass-CPLjR]: Clean: Exited with error code 255, Error output: command-line: line 0: Bad configuration option: ControlPersist
ERROR: cluster.health: Failed to connect to one or more of these hosts via SSH: vClass-CPLjR
4. Back to the first question: what if several NICs are used for heartbeats?
Heartbeat IPs are configured in the interface subsections of the totem section in corosync.conf. Configure one interface per heartbeat NIC and increment ringnumber each time (the first is 0). Note that the totem section must also set rrp_mode: active or passive, otherwise corosync fails to start. active gives lower latency but worse performance; for passive I have not seen an English explanation... By default, with a single heartbeat ring, rrp_mode is none.
TIPs: rrp is the Redundant Ring Protocol; the one we meet most often is vrrp in keepalived.
5. Resources not started by default
Under configure, run edit and append to the meta section of the resource definition:
meta target-role="Started"
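Using the myvip primitive created earlier, the edited definition would then look roughly like this (a sketch, not the verbatim CIB):
primitive myvip ocf:heartbeat:IPaddr2 \
    params ip=10.0.0.10 cidr_netmask=24 \
    op monitor interval=30s \
    meta target-role="Started"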
corosync website: http://corosync.github.io/corosync/
pacemaker website: http://clusterlabs.org/
crmsh website: https://github.com/ClusterLabs/crmsh/tree/master
http://flymanhi.blog.51cto.com/1011558/1435851/
https://linux.die.net/man/8/dlm_controld
PCS command steps for configuring a corosync & pacemaker cluster: https://wenku.baidu.com/view/b2a3199bc281e53a5902ff7c.html
corosync + pacemaker installation and configuration experiment: https://wenku.baidu.com/view/e69f537904a1b0717fd5ddff.html?re=view
High availability with corosync + pacemaker + crmsh: http://www.2cto.com/net/201507/425844.html
STONITH https://en.wikipedia.org/wiki/STONITH