OpenStack-Mitaka HA: Overview
OpenStack-Mitaka HA: Environment Initialization
OpenStack-Mitaka HA: MariaDB Galera Cluster Deployment
OpenStack-Mitaka HA: RabbitMQ Server Cluster Deployment
OpenStack-Mitaka HA: Memcached
OpenStack-Mitaka HA: Pacemaker + Corosync + pcs High-Availability Cluster
OpenStack-Mitaka HA: Identity Service (Keystone)
OpenStack-Mitaka HA: Image Service (Glance)
OpenStack-Mitaka HA: Compute Service (Nova)
OpenStack-Mitaka HA: Networking Service (Neutron)
OpenStack-Mitaka HA: Dashboard
OpenStack-Mitaka HA: Launching an Instance
OpenStack-Mitaka HA: Testing
Pacemaker: works at the resource allocation layer and provides the resource manager functionality.
Corosync: provides the cluster messaging layer, carrying heartbeat and cluster membership/transaction information.
Pacemaker + Corosync together provide the high-availability cluster architecture.
Run the following on all three nodes:
# yum install pcs -y
# systemctl start pcsd ; systemctl enable pcsd
# echo 'hacluster' | passwd --stdin hacluster
# yum install haproxy rsyslog -y
# echo 'net.ipv4.ip_nonlocal_bind = 1' >> /etc/sysctl.conf      # allow services to bind to the VIP even when it is not configured locally
# echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf            # enable kernel IP forwarding
# sysctl -p
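A quick optional check (not part of the original steps) to confirm the two kernel parameters took effect after sysctl -p:

# sysctl net.ipv4.ip_nonlocal_bind net.ipv4.ip_forward
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1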
On any node, create the user that HAProxy will use to monitor MariaDB:
MariaDB [(none)]> CREATE USER 'haproxy'@'%' ;
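Optionally, verify that the monitoring user exists; HAProxy's mysql-check only opens a connection as this user, so it needs neither a password nor any privileges:

MariaDB [(none)]> SELECT User, Host FROM mysql.user WHERE User = 'haproxy';
+---------+------+
| User    | Host |
+---------+------+
| haproxy | %    |
+---------+------+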
Configure HAProxy as the load balancer:
[root@controller1 ~]# egrep -v "^#|^$" /etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 4000
listen galera_cluster
    mode tcp
    bind 192.168.0.10:3306
    balance source
    option mysql-check user haproxy
    server controller1 192.168.0.11:3306 check inter 2000 rise 3 fall 3 backup
    server controller2 192.168.0.12:3306 check inter 2000 rise 3 fall 3
    server controller3 192.168.0.13:3306 check inter 2000 rise 3 fall 3 backup
listen memcache_cluster
    mode tcp
    bind 192.168.0.10:11211
    balance source
    option tcplog
    server controller1 192.168.0.11:11211 check inter 2000 rise 3 fall 3
    server controller2 192.168.0.12:11211 check inter 2000 rise 3 fall 3
    server controller3 192.168.0.13:11211 check inter 2000 rise 3 fall 3
Notes:
(1) Make sure the HAProxy configuration is correct; it is recommended to first adjust the IPs and ports, then start HAProxy to confirm it works.
(2) MariaDB Galera and RabbitMQ listen on 0.0.0.0 by default; change them to listen on each node's local address 192.168.0.x (see the sketch after this list).
(3) Copy the verified HAProxy configuration to the other nodes; there is no need to start the haproxy service manually.
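A minimal sketch of the adjustments from notes (1) to (3). The exact configuration file locations depend on how MariaDB Galera and RabbitMQ were installed in the earlier parts of this series, so treat /etc/my.cnf.d/server.cnf and the classic /etc/rabbitmq/rabbitmq.config format as assumptions:

# Check the HAProxy configuration for syntax errors before starting it
# haproxy -c -f /etc/haproxy/haproxy.cfg

# MariaDB Galera: listen on the node's own address instead of 0.0.0.0
# (assumed location: /etc/my.cnf.d/server.cnf, [mysqld] section, per-node address)
bind-address = 192.168.0.11

# RabbitMQ: restrict the AMQP listener to the node's own address
# (assumed classic /etc/rabbitmq/rabbitmq.config format)
[{rabbit, [{tcp_listeners, [{"192.168.0.11", 5672}]}]}].

# Copy the verified haproxy.cfg to the other controller nodes
# scp /etc/haproxy/haproxy.cfg controller2:/etc/haproxy/haproxy.cfg
# scp /etc/haproxy/haproxy.cfg controller3:/etc/haproxy/haproxy.cfg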
Configure logging for HAProxy (run on all controller nodes):
# vim /etc/rsyslog.conf
…
$ModLoad imudp
$UDPServerRun 514
…
local2.*                /var/log/haproxy/haproxy.log
…
# mkdir -pv /var/log/haproxy/
mkdir: created directory ‘/var/log/haproxy/’
# systemctl restart rsyslog
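A quick optional check that rsyslog is routing the local2 facility to the new log file:

# logger -p local2.info "haproxy rsyslog test"
# tail -n 1 /var/log/haproxy/haproxy.log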
Start HAProxy and verify:
# systemctl start haproxy
[root@controller1 ~]# netstat -ntplu | grep ha
tcp        0      0 192.168.0.10:3306       0.0.0.0:*               LISTEN      15467/haproxy
tcp        0      0 192.168.0.10:11211      0.0.0.0:*               LISTEN      15467/haproxy
udp        0      0 0.0.0.0:43268           0.0.0.0:*                           15466/haproxy

Verification succeeded; stop HAProxy again:
# systemctl stop haproxy
Run on the controller1 node:
[root@controller1 ~]# pcs cluster auth controller1 controller2 controller3 -u hacluster -p hacluster --force
controller3: Authorized
controller2: Authorized
controller1: Authorized
Create the cluster:
[root@controller1 ~]# pcs cluster setup --name openstack-cluster controller1 controller2 controller3 --force
Destroying cluster on nodes: controller1, controller2, controller3...
controller3: Stopping Cluster (pacemaker)...
controller2: Stopping Cluster (pacemaker)...
controller1: Stopping Cluster (pacemaker)...
controller3: Successfully destroyed cluster
controller1: Successfully destroyed cluster
controller2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'controller1', 'controller2', 'controller3'
controller3: successful distribution of the file 'pacemaker_remote authkey'
controller1: successful distribution of the file 'pacemaker_remote authkey'
controller2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
controller1: Succeeded
controller2: Succeeded
controller3: Succeeded
Synchronizing pcsd certificates on nodes controller1, controller2, controller3...
controller3: Success
controller2: Success
controller1: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller3: Success
controller2: Success
controller1: Success
Start all nodes of the cluster:
[root@controller1 ~]# pcs cluster start --all
controller2: Starting Cluster...
controller1: Starting Cluster...
controller3: Starting Cluster...
[root@controller1 ~]# pcs cluster enable --all
controller1: Cluster Enabled
controller2: Cluster Enabled
controller3: Cluster Enabled
Check the cluster status:
[root@controller1 ~]# pcs status
Cluster name: openstack-cluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: controller3 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Thu Nov 30 19:30:43 2017
Last change: Thu Nov 30 19:30:17 2017 by hacluster via crmd on controller3

3 nodes configured
0 resources configured

Online: [ controller1 controller2 controller3 ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

[root@controller1 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: controller3 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
 Last updated: Thu Nov 30 19:30:52 2017
 Last change: Thu Nov 30 19:30:17 2017 by hacluster via crmd on controller3
 3 nodes configured
 0 resources configured

PCSD Status:
  controller2: Online
  controller3: Online
  controller1: Online
All three nodes are online.
The default quorum (voting) rule recommends an odd number of cluster nodes, and no fewer than 3. With only 2 nodes, if one of them fails the default quorum rule is no longer satisfied, cluster resources will not fail over, and the cluster as a whole becomes unusable. Setting no-quorum-policy="ignore" works around this two-node problem, but it should not be used in production. In other words, a production environment still needs at least 3 nodes.
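To see the current vote count and the quorum requirement directly, corosync-quorumtool can be used (optional check; output trimmed, roughly as it appears on corosync 2.x):

# corosync-quorumtool -s
...
Votequorum information
----------------------
Expected votes:   3
Total votes:      3
Quorum:           2
Flags:            Quorate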
pe-warn-series-max, pe-input-series-max and pe-error-series-max control the log depth (how many Policy Engine warning/input/error files are kept).
cluster-recheck-interval is the interval at which the cluster state is re-checked.
[root@controller1 ~]# pcs property set pe-warn-series-max=1000 pe-input-series-max=1000 pe-error-series-max=1000 cluster-recheck-interval=5min
Disable STONITH:
STONITH is a physical fencing device that can power a node off on command. This environment has no such device, and if the option is not disabled, every pcs command keeps printing errors about it.
[root@controller1 ~]# pcs property set stonith-enabled=false
With only two nodes, ignore the quorum policy:
[root@controller1 ~]# pcs property set no-quorum-policy=ignore
Validate the cluster configuration:
[root@controller1 ~]# crm_verify -L -V
Configure a virtual IP for the cluster:
[root@controller1 ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
    ip="192.168.0.10" cidr_netmask=32 nic=eno16777736 op monitor interval=30s
At this point Pacemaker + Corosync exist to serve HAProxy, so add the haproxy resource to the Pacemaker cluster:
[root@controller1 ~]# pcs resource create lb-haproxy systemd:haproxy --clone
Note: this creates a clone resource; a cloned resource is started on all nodes, so haproxy will be started automatically on all three nodes here.
Check the Pacemaker resources:
[root@controller1 ~]# pcs resource
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started controller1     # the VIP resource, started on a single node
 Clone Set: lb-haproxy-clone [lb-haproxy]                               # the haproxy clone resource
     Started: [ controller1 controller2 controller3 ]
Note: the resources must be bound (colocated) here; otherwise haproxy keeps running on every node and access becomes inconsistent.
Bind the two resources to the same node:
[root@controller1 ~]# pcs constraint colocation add lb-haproxy-clone ClusterIP INFINITY
Binding succeeded:
[root@controller1 ~]# pcs resource
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started controller3
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller1 ]
     Stopped: [ controller2 controller3 ]
Configure the start order of the resources: the VIP starts first and haproxy starts after it, because haproxy listens on the VIP:
[root@controller1 ~]# pcs constraint order ClusterIP then lb-haproxy-clone
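To double-check the ordering and colocation rules that were just added, pcs can list all constraints; the output should look roughly like this:

[root@controller1 ~]# pcs constraint
Location Constraints:
Ordering Constraints:
  start ClusterIP then start lb-haproxy-clone (kind:Mandatory)
Colocation Constraints:
  lb-haproxy-clone with ClusterIP (score:INFINITY)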
Manually pin the resources to a preferred default node; because the two resources are colocated, moving one makes the other follow automatically.
[root@controller1 ~]# pcs constraint location ClusterIP prefers controller1
[root@controller1 ~]# pcs resource
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started controller1
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller1 ]
     Stopped: [ controller2 controller3 ]
[root@controller1 ~]# pcs resource defaults resource-stickiness=100    # set resource stickiness so resources do not fail back automatically and destabilize the cluster

The VIP is now bound to the controller1 node:
[root@controller1 ~]# ip a | grep global
    inet 192.168.0.11/24 brd 192.168.0.255 scope global eno16777736
    inet 192.168.0.10/32 brd 192.168.0.255 scope global eno16777736
    inet 192.168.118.11/24 brd 192.168.118.255 scope global eno33554992
Try connecting to the database through the VIP.
Controller1:
[root@controller1 haproxy]# mysql -ugalera -pgalera -h 192.168.0.10
Controller2:
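The same connection test from controller2, using the same galera credentials as on controller1 (a minimal sketch):

[root@controller2 ~]# mysql -ugalera -pgalera -h 192.168.0.10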
The high-availability configuration is working.
Test whether high availability behaves correctly.
Run poweroff -f directly on the controller1 node:
[root@controller1 ~]# poweroff -f
The VIP fails over to the controller2 node almost immediately.
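This can be confirmed on controller2, assuming the same interface name as on controller1:

[root@controller2 ~]# ip a | grep 192.168.0.10
    inet 192.168.0.10/32 brd 192.168.0.255 scope global eno16777736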
Try accessing the database again.
It works without any issue; the test is successful.
Check the cluster status:
[root@controller2 ~]# pcs status
Cluster name: openstack-cluster
Stack: corosync
Current DC: controller3 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Thu Nov 30 23:57:28 2017
Last change: Thu Nov 30 23:54:11 2017 by root via crm_attribute on controller1

3 nodes configured
4 resources configured

Online: [ controller2 controller3 ]
OFFLINE: [ controller1 ]            # controller1 is now offline

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started controller2
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller2 ]
     Stopped: [ controller1 controller3 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled