HA_Cluster: Web High Availability with corosync + pacemaker

Environment:

Ubuntu 14.04
Cluster node IPs: 10.11.8.192 and 10.11.8.193
NFS server IP: 10.11.8.43

Prerequisites:

1. Time synchronization
2. Hostname and hosts file configuration (needed on both nodes)

vim /etc/sysctl.d/10-kernel-hardening.conf  # on node1, set the hostname by adding:
kernel.hostname = node1
vim /etc/sysctl.d/10-kernel-hardening.conf  # on node2, set the hostname by adding:
kernel.hostname = node2
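
To apply the new hostname without a reboot, the sysctl file can be loaded directly and the result checked (a sketch shown for node1; node2 is analogous):

root@node1:~# sysctl -p /etc/sysctl.d/10-kernel-hardening.conf
root@node1:~# uname -n   # should now print node1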

To avoid problems with DNS resolution, use the hosts file and make sure each hostname matches the output of 'uname -n'.

vim /etc/hosts  # add these entries on both nodes
10.11.8.192 node1
10.11.8.193 node2
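
A quick sanity check that both names resolve through the hosts file:

root@node1:~# getent hosts node1 node2   # should return the two entries added above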

3. Configure SSH key-based mutual trust between the two nodes

Node1:

root@node1:~# ssh-keygen -t rsa
root@node1:~# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2  # answer yes to accept the key, then enter node2's password

Node2:

root@node2:~# ssh-keygen -t rsa
root@node2:~# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
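
A quick check that the key exchange worked in both directions (neither login should prompt for a password):

root@node1:~# ssh node2 'uname -n'   # expect: node2
root@node2:~# ssh node1 'uname -n'   # expect: node1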

Install corosync and pacemaker:

cluster-glue cluster-glue-dev heartbeat resource-agents corosync
heartbeat-dev pacemaker corosync-lib libesmtp pacemaker-dev
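
On Ubuntu 14.04 these can usually be pulled in with apt-get on both nodes; the exact package names may differ slightly between releases, so treat this as a sketch:

root@node1:~# apt-get update
root@node1:~# apt-get install -y cluster-glue cluster-glue-dev heartbeat resource-agents corosync \
    heartbeat-dev pacemaker corosync-lib libesmtp pacemaker-dev

Run the same commands on node2 as well.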

Configure corosync (run the following commands on node1):

1. Edit /etc/corosync/corosync.conf

# Please read the openais.conf.5 manual page

totem {
    version: 2

    # How long before declaring a token lost (ms)
    token: 3000

    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 10

    # How long to wait for join messages in the membership protocol (ms)
    join: 60

    # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
    consensus: 3600

    # Turn off the virtual synchrony filter
    vsftype: none

    # Number of messages that may be sent by one processor on receipt of the token
    max_messages: 20

    # Limit generated nodeids to 31-bits (positive signed integers)
    clear_node_high_bit: yes

    # Disable encryption
     secauth: off  # authentication/encryption disabled; set to on to enable it

    # How many threads to use for encryption/decryption
     threads: 0

    # Optionally assign a fixed node id (integer)
    # nodeid: 1234

    # This specifies the mode of redundant ring, which may be none, active, or passive.
     rrp_mode: none

     interface {
        # The following values need to be set based on your environment 
        ringnumber: 0
        bindnetaddr: 10.11.8.0  # network address of the subnet the nodes are on
        mcastaddr: 226.93.2.1  # multicast address; any address not already in use will do (224.0.2.0-238.255.255.255 are user-assignable transient multicast addresses, valid network-wide)
        mcastport: 5405  # multicast port
    }
}

amf {
    mode: disabled
}

quorum {
    # Quorum for the Pacemaker Cluster Resource Manager
    provider: corosync_votequorum
    expected_votes: 1
}

aisexec {
        user:   root
        group:  root
}

logging {
        fileline: off
        to_stderr: no  # do not log to stderr
        to_logfile: yes  # log to a file
        logfile: /var/log/corosync.log  # log file location
        to_syslog: no  # do not log to syslog
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}
# Add the pacemaker service stanza
service {
    ver: 1
    name: pacemaker
}

PS: The official documentation has changed; pacemaker now has to be started separately.

"In the past the Corosync process would launch pacemaker, this is no longer the case. Pacemaker must be launched after Corosync has successfully started."
Source: http://clusterlabs.org/wiki/Initial_Configuration#Corosync

/etc/init.d/corosync start
/etc/init.d/pacemaker start
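
One note on the interface block above: bindnetaddr must be the network address of the nodes' subnet, not a node's own IP. A quick way to double-check it (assuming the cluster traffic goes over eth1, the same NIC later used for the floating IP):

root@node1:~# ip -4 addr show eth1    # the node's own address and prefix, e.g. 10.11.8.192/23
root@node1:~# ip route show dev eth1  # the connected route gives the network address, e.g. 10.11.8.0/23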

2. Generate the authentication key used for inter-node communication:

root@node1:~# corosync-keygen -l

The -l option reads random data from /dev/urandom.
Without -l, corosync-keygen reads from /dev/random and will appear to hang if there is not enough entropy.
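
If corosync-keygen is run without -l and seems stuck, it is simply waiting for more entropy; checking the pool and generating some keyboard or disk activity on the console (or installing an entropy daemon such as haveged) usually unblocks it:

root@node1:~# cat /proc/sys/kernel/random/entropy_avail   # current kernel entropy estimate, in bits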

3. Copy corosync.conf and authkey to node2:

root@node1:~# scp -p /etc/corosync/corosync.conf /etc/corosync/authkey node2:/etc/corosync/
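
corosync expects authkey to be owned by root and readable only by root (mode 0400); scp -p preserves the mode, but it does no harm to verify on node2:

root@node1:~# ssh node2 'ls -l /etc/corosync/authkey'   # should show -r-------- root root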

4. Edit /etc/default/corosync on both nodes

# vim /etc/default/corosync
START=yes

If this is not changed, the start command exits normally with no output, but the corosync process is never actually started.
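
A non-interactive way to make the same change on both nodes (assuming the file ships with START=no):

root@node1:~# sed -i 's/^START=.*/START=yes/' /etc/default/corosync
root@node1:~# ssh node2 "sed -i 's/^START=.*/START=yes/' /etc/default/corosync"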

Start corosync and pacemaker:

root@node1:~# /etc/init.d/corosync start
root@node1:~# /etc/init.d/pacemaker start
root@node1:~# tail -f /var/log/corosync.log  # watch the log file
root@node1:~# netstat -tunlp  # check the listening ports
udp        0      0 10.11.8.192:5404        0.0.0.0:*                           1431/corosync   
udp        0      0 10.11.8.192:5405        0.0.0.0:*                           1431/corosync   
udp        0      0 226.93.2.1:5405         0.0.0.0:*                           1431/corosync
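
Besides the UDP listeners, the membership ring status can also be checked with corosync's own tool:

root@node1:~# corosync-cfgtool -s   # shows the local node ID and the status of each ring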

Once node1 has come up cleanly, start node2:

root@node1:~# ssh node2 -- /etc/init.d/corosync start
root@node1:~# ssh node2 -- /etc/init.d/pacemaker start

Check the cluster node status:

root@node1:~# crm status
Last updated: Wed May 18 08:49:46 2016
Last change: Mon May 16 06:12:56 2016 via crm_attribute on node1
Stack: corosync
Current DC: node1 (168495296) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
0 Resources configured


Online: [ node1 node2 ]

# ps auxf  # check the cluster processes
root      1472  0.0  1.1 107512  9040 pts/0    S    08:32   0:00 pacemakerd
haclust+  1474  0.0  2.0 110260 15636 ?        Ss   08:32   0:00  \_ /usr/lib/pacemaker/cib
root      1475  0.0  1.2 107264  9668 ?        Ss   08:32   0:00  \_ /usr/lib/pacemaker/stonithd
root      1476  0.0  0.9  81824  6992 ?        Ss   08:32   0:00  \_ /usr/lib/pacemaker/lrmd
haclust+  1477  0.0  0.8  97688  6800 ?        Ss   08:32   0:00  \_ /usr/lib/pacemaker/attrd
haclust+  1478  0.0  2.9 110264 22136 ?        Ss   08:32   0:00  \_ /usr/lib/pacemaker/pengine
haclust+  1479  0.0  1.8 166560 14000 ?        Ss   08:32   0:00  \_ /usr/lib/pacemaker/crmd

Configure cluster properties:

1. Disable stonith

stonith is enabled by default, but this cluster has no stonith device, so disable it:

# crm configure property stonith-enabled=false

PS: crm can be used either in one-shot command mode or in an interactive shell.

View the current configuration with the following command:

# crm configure show
node node1.magedu.com
node node2.magedu.com
property $id="cib-bootstrap-options" \
  dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
  cluster-infrastructure="openais" \
  expected-quorum-votes="2" \
  stonith-enabled="false"

This shows that stonith has been disabled.

2. Ignore the quorum check when quorum cannot be satisfied

After one node goes offline, the cluster status becomes "WITHOUT quorum"; having lost quorum, the cluster no longer meets the conditions for normal operation. For a two-node cluster this behaviour makes no sense, so tell the cluster to ignore the loss of quorum:

# crm configure property no-quorum-policy=ignore
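
A quick way to confirm that both properties took effect (a sketch):

# crm configure show | grep -E 'stonith-enabled|no-quorum-policy'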

Add resources to the cluster:

1. View resource agents

corosync/pacemaker supports several resource agent classes, such as heartbeat, LSB and OCF; the most commonly used are LSB and OCF, and the stonith class is reserved for configuring stonith devices.

The classes supported by the current cluster can be listed with the following command:

# crm ra classes 
heartbeat
lsb
ocf / heartbeat pacemaker
stonith

To list all resource agents in a given class, or to inspect a single agent, use commands like the following:

# crm ra list lsb
# crm ra list ocf heartbeat
# crm ra info ocf:heartbeat:IPaddr

2. Add resources

crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=10.11.8.200 nic=eth1 cidr_netmask=23
crm(live)configure# primitive filesystem ocf:heartbeat:Filesystem params device=10.11.8.43:/www/html directory=/var/www/html fstype=nfs
crm(live)configure# primitive httpd lsb:apache2
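
If the primitives are defined in the interactive configure shell as above, the changes usually have to be checked and committed before they take effect:

crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd    # back to the top level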

crm(live)# status 
Last updated: Wed May 18 09:12:56 2016
Last change: Wed May 18 09:12:52 2016 via cibadmin on node1
Stack: corosync
Current DC: node1 (168495296) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
3 Resources configured


Online: [ node1 node2 ]

 webip (ocf::heartbeat:IPaddr): Started node1 
 filesystem (ocf::heartbeat:Filesystem): Started node2 
 httpd (lsb:apache2): Started node1

3. Resource constraints

The three resources are not all running on the same node, which is not what we want, so put them into a group:

crm(live)configure# group webservice webip filesystem httpd
crm(live)# status 
Last updated: Wed May 18 09:22:48 2016
Last change: Wed May 18 09:20:48 2016 via crm_attribute on node2
Stack: corosync
Current DC: node1 (168495296) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
3 Resources configured


Online: [ node1 node2 ]

 Resource Group: webservice
 webip (ocf::heartbeat:IPaddr): Started node1 
 filesystem (ocf::heartbeat:Filesystem): Started node1 
 httpd (lsb:apache2): Started node1

There are three kinds of resource constraints:
1) Resource Location: defines on which nodes a resource may, may not, or should preferably run;
2) Resource Collocation: defines whether two resources may or may not run together on the same node;
3) Resource Order: defines the order in which resources are started on a node.

When defining a constraint you also assign it a score. Scores of all kinds are a central part of how the cluster works: everything from migrating a resource to deciding which resources to stop in a degraded cluster is done by manipulating scores in some way. Scores are calculated per resource, and a resource cannot run on any node where its score is negative; once the scores are calculated, the cluster places the resource on the node with the highest score. INFINITY is currently defined as 1,000,000, and arithmetic with it follows three basic rules:
1) any value + INFINITY = INFINITY
2) any value - INFINITY = -INFINITY
3) INFINITY - INFINITY = -INFINITY

When defining a resource constraint, the constraint itself can also be given a score. Constraints with higher scores are applied before those with lower scores. By creating additional location constraints with different scores for a given resource, you can control the order of the nodes that resource will fail over to.

So, for the problem described above of an IP resource and a web server resource (called WebIP and WebSite in the following examples) possibly running on different nodes, a colocation constraint can be used:

# crm configure colocation website-with-ip INFINITY: WebSite WebIP

Next, we also have to make sure that WebIP is started before WebSite on a node, which can be done with an order constraint:

# crm configure order httpd-after-ip mandatory: WebIP WebSite

In addition, since an HA cluster itself does not require every node to have the same or similar performance, we may prefer the service to run on a more powerful node during normal operation; this can be achieved with a location constraint:

# crm configure location prefer-node1 WebSite 200: node1

This command constrains WebSite to node1 with a score of 200.

Test: put node1 into standby and confirm the resources fail over to node2.

crm(live)node# standby 
crm(live)node# cd
crm(live)# status 
Last updated: Wed May 18 09:25:24 2016
Last change: Wed May 18 09:25:20 2016 via crm_attribute on node1
Stack: corosync
Current DC: node1 (168495296) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
3 Resources configured


Node node1 (168495296): standby
Online: [ node2 ]

 Resource Group: webservice
 webip    (ocf::heartbeat:IPaddr):    Started node2 
 filesystem    (ocf::heartbeat:Filesystem):    Started node2 
 httpd    (lsb:apache2):    Started node2

[Screenshot: the site is still accessible normally]
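
To end the test and bring node1 back into the cluster, set it online again (whether the group moves back to node1 depends on the configured location constraints and resource stickiness):

root@node1:~# crm node online node1
root@node1:~# crm status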
