Linux High Availability: RHCS

Date: 2019-12-07


-----Outline

Introduction

Terminology

Environment

Implementation

Command-line management tools

-------------

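1. Introduction

RHCS stands for Red Hat Cluster Suite. Red Hat Cluster Suite (RHCS) is an integrated set of software components that can be deployed in different configurations to meet your needs for high availability, load balancing, scalability, file sharing, and cost savings. For applications that require maximum uptime, a Red Hat Enterprise Linux cluster with Red Hat Cluster Suite is the best choice. Red Hat Cluster Suite is designed specifically for Red Hat Enterprise Linux and provides the following two types of cluster:

1. Application/service failover: create a server cluster of n nodes to provide failover for critical applications and services.

2. IP load balancing: balance the IP network requests received across a group of servers.

With Red Hat Cluster Suite, applications can be deployed in a high-availability configuration so that they are always running, which gives enterprises the ability to scale out their Linux deployments. For widely used open source applications such as the Network File System (NFS), Samba, and Apache, Red Hat Cluster Suite provides a complete, ready-to-use failover solution.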

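2. Terminology

Distributed cluster manager (CMAN)

Cluster Manager, or CMAN, is a distributed cluster management tool. It runs on every node of the cluster and provides cluster management services for RHCS.

CMAN manages cluster membership, messaging, and notification. It monitors the running state of every node to track the membership relationships between nodes. When a node in the cluster fails, the membership changes, and CMAN promptly notifies the underlying layers of the change so that they can adjust accordingly.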

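Lock management (DLM)

Distributed Lock Manager, or DLM, is a low-level building block of RHCS that provides a common locking mechanism for the cluster. In an RHCS cluster, DLM runs on every node. GFS uses the lock manager to synchronize access to file system metadata, and CLVM uses it to synchronize updates to LVM volumes and volume groups.

DLM does not need a dedicated lock management server. It uses a peer-to-peer locking model, which greatly improves performance, and it avoids the bottleneck of a full recovery when a single node fails. DLM requests are handled locally without a network request, so they take effect immediately. Finally, through a layered mechanism, DLM supports parallel locking across multiple lock spaces.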

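Configuration file management (CCS)

Cluster Configuration System, or CCS, manages the cluster configuration file and keeps it synchronized between nodes. CCS runs on every node of the cluster and watches the state of a single configuration file, /etc/cluster/cluster.conf, on each node. Whenever this file changes, CCS propagates the change to every node in the cluster so that all nodes always hold the same configuration. For example, if an administrator updates the cluster configuration file on node A, CCS detects the change on node A and immediately propagates it to the other nodes. The RHCS configuration file, cluster.conf, is an XML file that contains the cluster name, cluster node information, cluster resource and service information, fence devices, and so on. These items are configured through the web interface in the implementation section below.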
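
To make the description of cluster.conf more concrete, here is a minimal sketch of what such a file might look like for the three-node web cluster built later in this article. The cluster name `webcluster`, the failover domain name `webdomain`, the node priorities, and the scratch output path are assumptions made for illustration. The node names, the floating IP 192.168.1.150, and the service name `webservice` come from the command output shown later. The fence device section is left empty, and the file is written to a temporary location rather than to /etc/cluster/cluster.conf.

```shell
#!/bin/sh
# Sketch only: write an illustrative cluster.conf to a scratch file.
# "webcluster", "webdomain" and the priorities are hypothetical values.
write_sample_conf() {
    out="$1"
    cat > "$out" <<'EOF'
<?xml version="1.0"?>
<cluster config_version="1" name="webcluster">
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
    <clusternode name="node3" nodeid="3"/>
  </clusternodes>
  <fencedevices/>
  <rm>
    <failoverdomains>
      <failoverdomain name="webdomain" ordered="1" restricted="1" nofailback="1">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
        <failoverdomainnode name="node3" priority="3"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="192.168.1.150" monitor_link="on"/>
      <script name="httpd" file="/etc/init.d/httpd"/>
    </resources>
    <service name="webservice" domain="webdomain" autostart="1" recovery="relocate">
      <ip ref="192.168.1.150"/>
      <script ref="httpd"/>
    </service>
  </rm>
</cluster>
EOF
}

sample_conf="${TMPDIR:-/tmp}/cluster.conf.sample.$$"
write_sample_conf "$sample_conf"
# Count the node entries as a quick sanity check of the generated file.
grep -c '<clusternode ' "$sample_conf"
```

In practice luci generates and distributes this file, so a hand-written copy like this is mainly useful for reading an existing configuration.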

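Fence devices (FENCE)

FENCE devices are an essential part of an RHCS cluster. They prevent the "split-brain" condition that unforeseen failures can cause. A fence device works through the hardware management interface of the server or storage itself, or through an external power management device, to send hardware management commands directly to the server or storage: reboot it, power it off, or disconnect it from the network.

FENCE works as follows. When a host hangs or goes down unexpectedly, the standby node first calls the FENCE device, which reboots the failed host or isolates it from the network. Once the FENCE operation succeeds, the result is returned to the standby node, which then begins taking over the failed host's services and resources. In this way the resources held by the failed node are released, and each resource and service is guaranteed to run on only one node at a time.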
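
The ordering described above (fence first, take over only after fencing is confirmed) can be sketched as a toy shell function. Both `fence_node` and `start_service` are hypothetical stand-ins: a real cluster calls a fence agent and rgmanager, not these helpers. The `FENCE_OK` variable exists only to simulate success or failure.

```shell
#!/bin/sh
# Sketch of the fence-then-take-over ordering. Both helpers are stand-ins.
fence_node() {
    # Pretend to power-cycle "$1"; succeed only if FENCE_OK is set to 1.
    [ "${FENCE_OK:-0}" = "1" ]
}

start_service() {
    echo "service started on standby"
}

take_over_from() {
    failed_node="$1"
    if fence_node "$failed_node"; then
        # Fencing confirmed: the failed node can no longer touch shared resources.
        start_service
    else
        # Without confirmation, starting the service risks running it twice.
        echo "fencing $failed_node not confirmed; refusing to take over" >&2
        return 1
    fi
}

FENCE_OK=1
take_over_from node1
```

The point of the sketch is the refusal branch: if fencing cannot be confirmed, the standby does not start the service.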

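RHCS FENCE devices fall into two categories, internal and external. Common internal FENCE devices include the IBM RSA II card, the HP iLO card, and IPMI devices. External fence devices include UPS units, SAN switches, and network switches.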

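High-availability service manager

The high-availability service manager supervises, starts, and stops the cluster's applications, services, and resources. When a service fails on one node, the high-availability service manager process can move the service from the failed node to another healthy node, and this relocation is automatic and transparent. RHCS manages cluster services through rgmanager, which runs on every cluster node; the corresponding process on the server is clurgmgrd.

In an RHCS cluster, a high-availability service comprises cluster services and cluster resources. A cluster service is simply an application service, such as apache or mysql. Cluster resources come in many kinds, for example an IP address, a script, or an ext3/GFS file system.

A high-availability service is tied to a failover domain, which is the set of cluster nodes that run a particular service. Each node in a failover domain can be given a priority, and the priorities determine the order in which nodes are tried when a service has to move after a node failure. If no priorities are assigned, the service may move to any node. Creating a failover domain therefore both sets the order in which a service moves between nodes and restricts the service to switching only among the nodes in that domain.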
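
The priority rule can be illustrated with a small sketch that picks the failover target from a list of domain members. This is a simplification, not rgmanager's actual algorithm: the input format (`name priority state`) is invented for the example, and it assumes that a lower priority number means a more preferred node.

```shell
#!/bin/sh
# Sketch: choose the failover target from an ordered failover domain.
# Input on stdin: one "name priority state" line per domain member.
# Assumption: a lower priority number means a more preferred node.
pick_failover_target() {
    awk '$3 == "online" && (best == "" || $2 + 0 < min) { best = $1; min = $2 + 0 }
         END { if (best != "") print best; else exit 1 }'
}

# node1 has failed, so the most preferred online member is chosen.
printf '%s\n' 'node1 1 offline' 'node2 2 online' 'node3 3 online' | pick_failover_target
```

The function exits with a non-zero status when no member is online, which mirrors the case where the service has nowhere to go.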

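Cluster configuration and management tools

RHCS provides several cluster configuration and management tools. Commonly used GUI-based tools include system-config-cluster and Conga, and command-line management tools are also provided.

system-config-cluster is a graphical tool for creating clusters and configuring cluster nodes. It consists of two parts, cluster node configuration and cluster management, used respectively to create the cluster node configuration file and to maintain the running state of the nodes. It is generally used with earlier RHCS releases.

Conga is a web-based cluster configuration tool. Unlike system-config-cluster, Conga configures and manages cluster nodes through a web interface. It consists of two parts, luci and ricci. luci is installed on a separate machine and is used to configure and manage the cluster; ricci is installed on every cluster node, and luci communicates with each node in the cluster through ricci.

RHCS also provides a number of powerful command-line management tools. Commonly used ones include clustat, cman_tool, ccs_tool, fence_tool, and clusvcadm.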

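Red Hat GFS

GFS is the storage solution that RHCS provides for cluster systems. It allows multiple cluster nodes to share storage at the block level; because every node shares a single storage space, data access stays consistent. More precisely, GFS is a cluster file system provided by RHCS: multiple nodes can mount the same file system partition at the same time without corrupting its data, which a single-node file system such as ext3 or ext2 cannot do.

To let multiple nodes read and write one file system at the same time, GFS uses the lock manager to coordinate I/O. When a process writes to a file, the file is locked and no other process may read or write it until the writer finishes normally and releases the lock. Only after the lock is released can other processes operate on the file. In addition, when a node modifies data on a GFS file system, the change becomes visible on the other nodes immediately through the RHCS low-level communication mechanism.

When building an RHCS cluster, GFS usually serves as shared storage, runs on every node, and can be configured and managed with the RHCS management tools. The relationship between RHCS and GFS is a point that beginners often confuse. Running RHCS does not require GFS; GFS is needed only when shared storage is required. Building a GFS cluster file system, however, requires the underlying support of RHCS, so every node that uses a GFS file system must have the RHCS components installed.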

3. Environment

System	Role	IP address	Package
CentOS 6.5 x86_64	Cluster management node	192.168.1.110	luci
CentOS 6.5 x86_64	Web node (node1)	192.168.1.103	ricci
CentOS 6.5 x86_64	Web node (node2)	192.168.1.109	ricci
CentOS 6.5 x86_64	Web node (node3)	192.168.1.108	ricci

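4. Implementation

Prerequisites

Time synchronization

Hostname resolution

SSH mutual trust

Note: if the yum configuration includes an EPEL repository, disable it. Red Hat officially recognizes only the versions of this suite that it ships itself; with any other version, the services may fail to start.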
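
As a rough sketch of the hostname-resolution prerequisite, the function below prints /etc/hosts lines for the four machines in the environment table. The names essun.node1.com and essun.node4.com appear in the command output in this article; the names for node2 and node3 and all of the short aliases are assumptions that follow the same pattern. The yum command in the trailing comment assumes the repository id is `epel`.

```shell
#!/bin/sh
# Sketch: print /etc/hosts lines for the machines in the environment table.
# The node2/node3 names and the short aliases are assumed, not confirmed.
hosts_entries() {
    cat <<'EOF'
192.168.1.110 essun.node4.com node4
192.168.1.103 essun.node1.com node1
192.168.1.109 essun.node2.com node2
192.168.1.108 essun.node3.com node3
EOF
}

# Review the output, then append it to /etc/hosts on every machine.
hosts_entries

# Installing with EPEL disabled for a single transaction
# (assumes the repository id is "epel"):
#   yum --disablerepo=epel install -y ricci
```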

Cluster management node

 
[root@essun ~]# yum install -y luci
[root@essun ~]# service luci start
Adding following auto-detected host IDs (IP addresses/domain names), corresponding to `essun.node4.com' address, to the configuration of self-managed certificate `/var/lib/luci/etc/cacert.config' (you can change them by editing `/var/lib/luci/etc/cacert.config', removing the generated certificate `/var/lib/luci/certs/host.pem' and restarting luci):
    (none suitable found, you can still do it manually as mentioned above)
Generating a 2048 bit RSA private key
writing new private key to '/var/lib/luci/certs/host.pem'
Starting saslauthd:                                        [  OK  ]
Start luci...                                              [  OK  ]
Point your web browser to https://essun.node4.com:8084 (or equivalent) to access luci
[root@essun ~]# ss -tnpl |grep 8084
LISTEN     0      5                         *:8084                     *:*      users:(("python",2920,5))
[root@essun ~]# 
  

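Each cluster node needs ricci installed, and the ricci user on each node needs a password so that the cluster management service can manage the node. Each node also serves a test page. (One node is shown here as an example; the other two are set up the same way.)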

 
[root@essun .ssh]# yum install ricci -y
[root@essun ~]# service ricci start
Starting oddjobd:                                          [  OK  ]
generating SSL certificates...  done
Generating NSS database...  done
Starting ricci:                                            [  OK  ]
[root@essun ~]# ss -tnlp |grep ricci
LISTEN     0      5                        :::11111                   :::*      users:(("ricci",2241,3))
# ricci listens on port 11111 by default
[root@essun .ssh]# echo "ricci" |passwd --stdin ricci
Changing password for user ricci.
passwd: all authentication tokens updated successfully.
# the password is set to "ricci" here
[root@essun html]# echo "<h1>`hostname`</h1>" >index.html
[root@essun html]# service httpd start
Starting httpd:                                            [  OK  ] 
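
Since the other two nodes are set up the same way, the steps can be repeated over SSH. The sketch below only prints the commands, so it is safe to try anywhere. It assumes passwordless root SSH (the "SSH mutual trust" prerequisite) and that the nodes resolve as `node2` and `node3`; the remote commands themselves are the ones from the transcript above.

```shell
#!/bin/sh
# Sketch: print (not run) the per-node ricci setup commands from the transcript.
# Assumes passwordless root SSH to hosts named node2 and node3.
ricci_setup_commands() {
    for node in "$@"; do
        echo "ssh root@$node 'yum install -y ricci && service ricci start && echo ricci | passwd --stdin ricci'"
    done
}

ricci_setup_commands node2 node3
```

Review the printed lines and run them by hand once they look right.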
  

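Now open the web interface to begin configuration

Log in with a system user name and password (note: the URL must begin with https://)

After a successful login, a warning message is shown if you logged in as root

You can now manage the cluster from Manage Clusters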

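Create a cluster

After the cluster is created

Tab descriptions

Nodes: node information

Fence Devices: fence (isolation) devices

Failover Domains: failover domains

Resources: resource definitions

Service Groups: service groups

Configure: cluster configuration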

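Define a failover domain

Set the priority of the nodes in the failover domain, and whether resources switch back when a node comes back online

State after adding

Available resource types

Add an IP address

IP Address: the IP address

Netmask Bits: number of bits in the netmask

Monitor Link: whether to monitor the link

Disable Updates to Static Route: whether to disable updates to static routes

Number of Seconds to Sleep After Removing an IP Address: how long, in seconds, to sleep after the IP address is removed

The record created after submitting

Add this resource to a group (resources can also be defined directly in the service group)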

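Service Name: name of the service

Automatically Start This Service: whether to start this service automatically

Failover Domain: the failover domain

Recovery Policy: what to do when the service fails

Relocate: move the service to another node

Restart: restart the service on the current node

Restart-Disable: restart the service on the current node; if the restart fails, disable the service instead of moving it to another node

Disable: disable the service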

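More resources can be added

Add an already defined resource to the group (the IP address defined earlier)

Add a web service

Note: there is no httpd service among the default resource types used here, so httpd is invoked through a script

When the definition is complete, submit it. To discard this resource, click Remove in the upper-right corner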
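
A Script resource drives an init-style script by calling it with `start`, `stop`, and `status`. /etc/init.d/httpd already supports these, but a custom script needs to honor the same contract. The following is a minimal sketch of that contract only; the function name `demo_web` and the state file that stands in for a real daemon are hypothetical.

```shell
#!/bin/sh
# Sketch of the start/stop/status contract a Script resource relies on.
# The state file stands in for a real daemon; its path is hypothetical.
STATE_FILE="${TMPDIR:-/tmp}/demo-web.state.$$"

demo_web() {
    case "$1" in
        start)  : > "$STATE_FILE" ;;            # "start the daemon"
        stop)   rm -f "$STATE_FILE" ;;          # "stop the daemon"
        status) [ -f "$STATE_FILE" ] ;;         # 0 = running, non-zero = stopped
        *)      echo "usage: demo_web {start|stop|status}" >&2; return 2 ;;
    esac
}

demo_web start
demo_web status && echo "running"
demo_web stop
```

The exit status of `status` is what matters: the cluster uses it to decide whether the service is healthy.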

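The group resources after submitting

Right after submitting, the resource status cannot be detected, so the group must be restarted once

Restart the resource group (Restart)

The status now shows the service running on node1

An access test confirms that it is running on node1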

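Simulate a failover

Take node1 offline from the Nodes page

Check whether the resources moved: look at the service group, or test through the web page

The service has switched to node3; load the web page to see the effect

The resources have indeed switched. After node1 comes back online, the resources do not switch back to node1, because No Failback was set when the failover domain was defined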

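5. Command-line management tools

1. clustat

clustat displays the cluster status. It shows membership information, the quorum view, and the state of all high-availability services, and it marks the node on which the clustat command was run (Local).

Command options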

[root@essun html]# clustat --help
clustat: invalid option -- '-'
usage: clustat <options>
    -i <interval>      Refresh every <interval> seconds.  May not be used
                       with -x.
    -I                 Display local node ID and exit
    -m <member>        Display status of <member> and exit
    -s <service>       Display status of <service> and exit
    -v                 Display version and exit
    -x                 Dump information as XML
    -Q                 Return 0 if quorate, 1 if not (no output)
    -f                 Enable fast clustat reports
    -l                 Use long format for services
# Show node status information
[root@essun html]# clustat -l
Cluster Status for Cluster Node @ Wed May  7 11:32:55 2014
Member Status: Quorate
 Member Name                                        ID   Status
 ------ ----                                        ---- ------
 node1                                                  1 Online, Local, rgmanager
 node2                                                  2 Online
 node3                                                  3 Online
Service Information
------- -----------
Service Name      : service:webservice
  Current State   : started (112)
  Flags           : none (0)
  Owner           : node1
  Last Owner      : none
  Last Transition : Wed May  7 10:06:48 2014
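
For scripting, the long-format output above is easy to pick apart. The sketch below extracts the current owner of a service from `clustat -l` style text read on standard input, so it can be tried on saved output without a running cluster. The function name is made up for this example, and the parsing assumes the `Owner : <node>` layout shown above.

```shell
#!/bin/sh
# Sketch: extract the current owner of a service from `clustat -l` style output.
# Reads stdin; the "Last Owner" line is skipped because it does not start with "Owner".
service_owner() {
    awk -F': ' '/^ *Owner +:/ { print $2; exit }'
}

# On a cluster node this would typically be fed from clustat itself, e.g.:
#   clustat -l | service_owner
printf '%s\n' \
    '  Current State   : started (112)' \
    '  Owner           : node1' \
    '  Last Owner      : none' | service_owner
```

According to the help text above, `clustat -Q` returns 0 when the cluster is quorate, which makes it a convenient guard before acting on the owner.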

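2. clusvcadm

You can manage HA services with the clusvcadm command. It lets you perform the following operations:

Enable and start a service

Disable a service

Stop a service

Freeze a service

Unfreeze a service

Migrate a service (virtual machine services only)

Relocate a service

Restart a service

Command options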

 
[root@essun html]# clusvcadm
usage: clusvcadm [command]
Resource Group Control Commands:
  -v                     Display version and exit
  -d <group>             Disable <group>.  This stops a group
                         until an administrator enables it again,
                         the cluster loses and regains quorum, or
                         an administrator-defined event script
                         explicitly enables it again.
  -e <group>             Enable <group>
  -e <group> -F          Enable <group> according to failover
                         domain rules (deprecated; always the
                         case when using central processing)
  -e <group> -m <member> Enable <group> on <member>
  -r <group> -m <member> Relocate <group> [to <member>]
                         Stops a group and starts it on another
                         cluster member.
  -M <group> -m <member> Migrate <group> to <member>
                         (e.g. for live migration of VMs)
  -q                     Quiet operation
  -R <group>             Restart a group in place.
  -s <group>             Stop <group>.  This temporarily stops
                         a group.  After the next group or
                         or cluster member transition, the group
                         will be restarted (if possible).
  -Z <group>             Freeze resource group.  This prevents
                         transitions and status checks, and is
                         useful if an administrator needs to
                         administer part of a service without
                         stopping the whole service.
  -U <group>             Unfreeze (thaw) resource group.  Restores
                         a group to normal operation.
  -c <group>             Convalesce (repair, fix) resource group.
                         Attempts to start failed, non-critical
                         resources within a resource group.
Resource Group Locking (for cluster Shutdown / Debugging):
  -l                     Lock local resource group managers.
                         This prevents resource groups from
                         starting.
  -S                     Show lock state
  -u                     Unlock resource group managers.
                         This allows resource groups to start.
# Relocate the service
[root@essun html]# clusvcadm -r webservice -m node1
Trying to relocate service:webservice to node1...Success
service:webservice is now running on node1
[root@essun html]# curl http://192.168.1.150
<h1>essun.node1.com</h1> 
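
The freeze and unfreeze options listed in the help text suit short maintenance windows. The sketch below wraps them in a dry-run helper that prints the clusvcadm command lines instead of running them. The `run` and `maintenance_window` helpers and the `DRY_RUN` variable are inventions for this example; only the `-Z` and `-U` flags come from the help output above.

```shell
#!/bin/sh
# Sketch: build clusvcadm command lines for a maintenance window.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

maintenance_window() {
    svc="$1"
    run clusvcadm -Z "$svc"    # freeze: no status checks or transitions
    # ... administer part of the service here ...
    run clusvcadm -U "$svc"    # unfreeze: back to normal operation
}

maintenance_window webservice
```

Setting `DRY_RUN=0` on a cluster node would execute the same two commands for real.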
   

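3. cman_tool

cman_tool is a set of tools for managing the CMAN cluster management subsystem. It can be used to join a node to a cluster, kill another cluster node, or change the cluster's expected votes value.

Note: commands issued through cman_tool can affect all nodes in your cluster.

Command options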

 
[root@essun html]# cman_tool -h
Usage:
cman_tool <join|leave|kill|expected|votes|version|wait|status|nodes|services|debug> [options]
Options:
  -h               Print this help, then exit
  -V               Print program version information, then exit
  -d               Enable debug output
join
  Cluster & node information is taken from configuration modules.
  These switches are provided to allow those values to be overridden.
  Use them with extreme care.
  -m <addr>        Multicast address to use
  -v <votes>       Number of votes this node has
  -e <votes>       Number of expected votes for the cluster
  -p <port>        UDP port number for cman communications
  -n <nodename>    The name of this node (defaults to hostname)
  -c <clustername> The name of the cluster to join
  -N <id>          Node id
  -C <module>      Config file reader (default: xmlconfig)
  -w               Wait until node has joined a cluster
  -q               Wait until the cluster is quorate
  -t               Maximum time (in seconds) to wait
  -k <file>        Private key file for Corosync communications
  -P               Don't set corosync to realtime priority
  -X               Use internal cman defaults for configuration
  -A               Don't load openais services
  -D<fail|warn|none> What to do about the config. Default (without -D) is to
                   validate the config. with -D no validation will be done.
                   -Dwarn will print errors but allow the operation to continue.
  -z           Disable stderr debugging output.
wait               Wait until the node is a member of a cluster
  -q               Wait until the cluster is quorate
  -t               Maximum time (in seconds) to wait
leave
  -w               If cluster is in transition, wait and keep trying
  -t               Maximum time (in seconds) to wait
  remove           Tell other nodes to ajust quorum downwards if necessary
  force            Leave even if cluster subsystems are active
kill
  -n <nodename>    The name of the node to kill (can specify multiple times)
expected
  -e <votes>       New number of expected votes for the cluster
votes
  -v <votes>       New number of votes for this node
status             Show local record of cluster status
nodes              Show local record of cluster nodes
  -a                 Also show node address(es)
  -n <nodename>      Only show information for specific node
  -F <format>        Specify output format (see man page)
services           Show local record of cluster services
version
  -r               Reload cluster.conf and update config version.
  -D <fail,warn,none> What to do about the config. Default (without -D) is to
                   validate the config. with -D no validation will be done. -Dwarn will print errors
                   but allow the operation to continue
  -S               Don't run ccs_sync to distribute cluster.conf (if appropriate)
# Show the cluster status as seen from this node
[root@essun html]# cman_tool status
Version: 6.2.0
Config Version: 6
Cluster Name: Cluster Node
Cluster Id: 26887
Cluster Member: Yes
Cluster Generation: 36
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 9
Flags:
Ports Bound: 0 11 177
Node name: node1
Node ID: 1
Multicast addresses: 239.192.105.112
Node addresses: 192.168.1.103 
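
The relationship between "Expected votes: 3" and "Quorum: 2" in the output above is a simple majority. The sketch below computes it and also pulls a named field out of `cman_tool status` style text. Both function names are made up for this example, and the formula assumes a plain majority with no quorum disk and no special two-node mode.

```shell
#!/bin/sh
# Sketch: majority quorum from expected votes (Expected votes: 3 gives Quorum: 2).
# Assumes plain majority voting with no quorum disk or two-node mode.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# Pull one field out of `cman_tool status` style output read from stdin.
status_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}

expected=$(printf '%s\n' 'Nodes: 3' 'Expected votes: 3' 'Total votes: 3' | status_field 'Expected votes')
quorum_for "$expected"
```

With three one-vote nodes the cluster stays quorate after losing one node, but not after losing two.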
   
