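1. Prerequisites for configuring an RHCS cluster:
Time synchronization
Name resolution; here this is done by editing /etc/hosts
A working yum repository; the CentOS 6 default is enough
Firewall disabled (or the ports the cluster uses for communication opened), and SELinux turned off
NetworkManager service stopped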
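2. The main packages RHCS needs are cman and rgmanager
cman: the cluster infrastructure layer; on CentOS 6 it depends on corosync
rgmanager: the cluster resource manager, similar in function to pacemaker
luci: provides a web interface for managing an RHCS cluster; luci manages the cluster mainly by communicating with ricci.
ricci: the agent installed on each cluster node that receives management requests from luci.
The relationship between luci and ricci is much like that between ambari-server and ambari-agent.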
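3. Environment:
luci : 192.168.6.31 cent1.test.com
ricci: 192.168.6.32 cent2.test.com
ricci: 192.168.6.33 cent3.test.com
ricci: 192.168.6.34 cent4.test.com
The hostnames are already set, but the other steps, such as time synchronization and distributing /etc/hosts, have not been done yet, so for convenience I wrote a playbook to do the initialization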
---
- hosts: hdpservers
  remote_user: root
  tasks:
    # sync the clock against cent1 every five minutes
    - name: add synctime cron
      cron: name='sync time' minute='*/5' job='/usr/sbin/ntpdate 192.168.6.31'
    # stop iptables and NetworkManager and keep them off after reboot
    - name: shutdown iptables
      service: name={{ item.name }} state={{ item.state }} enabled={{ item.enabled }}
      with_items:
        - { name: iptables, state: stopped, enabled: no }
        - { name: NetworkManager, state: stopped, enabled: no }
      tags: stop service
    # push the control node's SELinux config and hosts file to every node
    - name: copy selinux conf file
      copy: src={{ item.src }} dest={{ item.dest }} owner={{ item.owner }} group={{ item.group }} mode={{ item.mode }}
      with_items:
        - { src: '/etc/selinux/config', dest: /etc/selinux/config, owner: root, group: root, mode: '0644' }
        - { src: '/etc/hosts', dest: /etc/hosts, owner: root, group: root, mode: '0644' }
    # switch SELinux to permissive for the running system
    - name: cmd off selinux
      shell: setenforce 0
Run the playbook to initialize the nodes
[root@cent1 yaml]# ansible-playbook base.yml
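The playbook copies /etc/hosts from the control node to every node, so that file has to list all four machines before the playbook runs. A minimal sketch that prints suitable entries, assuming the addresses from the environment section; the short aliases (cent1 to cent4) and the function name hosts_entries are assumptions added for illustration, not part of the original setup:

```shell
#!/bin/sh
# Print /etc/hosts entries for the four machines in this walkthrough.
# Assumption: node N has the address 192.168.6.(30+N) and the name centN.test.com.
# The short alias in the third column is an illustrative addition.
hosts_entries() {
    i=1
    while [ "$i" -le 4 ]; do
        printf '192.168.6.%d cent%d.test.com cent%d\n' "$((30 + i))" "$i" "$i"
        i=$((i + 1))
    done
}

# Print the entries; review them before appending to /etc/hosts by hand.
hosts_entries
```

The output is only printed, never written, so it can be checked before it is added to /etc/hosts on cent1.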
4. Install luci on cent1. luci is a Python program and depends on quite a few Python packages
[root@cent1 ~]# yum install luci
Start luci
[root@cent3 ~]# /etc/init.d/luci start
Adding following auto-detected host IDs (IP addresses/domain names), corresponding to `cent3' address, to the configuration of self-managed certificate `/var/lib/luci/etc/cacert.config' (you can change them by editing `/var/lib/luci/etc/cacert.config', removing the generated certificate `/var/lib/luci/certs/host.pem' and restarting luci):
        (none suitable found, you can still do it manually as mentioned above)
Generating a 2048 bit RSA private key
writing new private key to '/var/lib/luci/certs/host.pem'
Starting saslauthd: [  OK  ]
Start luci... [  OK  ]
Point your web browser to https://cent1.hfln.com:8084 (or equivalent) to access luci
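You can now log in to luci from a browser; note that the URL is https
The username and password are those of a system account on this host
Login succeeded. Now on to configuring the RHCS cluster; luci is only the tool for managing the cluster, and the cluster itself has not been installed yet.
5. Install ricci on cent2, cent3 and cent4. ricci also depends on many packages, so here ansible installs it on all three nodes at once; passwordless SSH key login from cent1 to the other nodes is already set up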
[root@cent1 ~]# ansible rhcs -m yum -a "name=ricci"
After installing ricci, set a password for the ricci user on each node. The ricci user is the account the ricci process runs as, and this password will be needed shortly. I took the quick-and-dirty route here; the password can also be set with the ccs command
[root@cent1 ~]# ansible rhcs -m shell -a "echo '123456' | passwd --stdin ricci"
Start ricci
[root@cent1 ~]# ansible rhcs -m service -a "name=ricci state=started enabled=yes"
[root@cent2 ~]# ss -tunlp | grep ricci
tcp LISTEN 0 5 :::11111 :::* users:(("ricci",3237,3))
ricci listens on port 11111. Operations like these can of course also be written into a playbook
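To confirm from a script that ricci is listening on the expected port, the ss output above can be parsed. A small sketch, assuming the column layout shown in that output (local address in the fifth column); the function name listening_port is hypothetical and the sample line is copied from the cent2 output:

```shell
#!/bin/sh
# Print the local port of every listening socket owned by a given process name.
# Reads `ss -tunlp` style lines on stdin; $1 is the process name.
# Assumption: the fifth column is "address:port", as in the output above.
listening_port() {
    awk -v proc="$1" 'index($0, "\"" proc "\"") {
        n = split($5, parts, ":")   # the port is whatever follows the last colon
        print parts[n]
    }'
}

# Sample line taken from the walkthrough.
sample='tcp LISTEN 0 5 :::11111 :::* users:(("ricci",3237,3))'
printf '%s\n' "$sample" | listening_port ricci
```

On a live node the same function would be fed from the real command: ss -tunlp | listening_port ricci.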
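6. The cluster can now be configured in the web interface: creating, adding or deleting a cluster, and managing nodes, resources, fence devices, service groups, failover domains and so on. The whole cluster lifecycle can be handled here.
This walkthrough sets up a highly available web service
Manage Clusters --> Create creates a cluster
The form is fairly simple;
After clicking Create Cluster, luci starts installing the cluster software on the nodes.
On any node you can see the ricci worker processes: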
[root@cent2 ~]# ps aux | grep ricci
ricci 3453 0.1 0.4 213664 4400 ? S<s 17:18 0:00 ricci -u ricci
ricci 3489 0.0 0.1 54912 1908 ? S<s 17:22 0:00 /usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1500004777
root 3490 0.2 0.5 48552 5136 ? S 17:22 0:00 ricci-modrpm
root 3567 0.0 0.0 103252 880 pts/0 S+ 17:24 0:00 grep ricci
The /var/lib/ricci/queue/ directory holds the task files that luci sends to ricci; they are in XML format
[root@cent2 ~]# file /var/lib/ricci/queue/1500004777
/var/lib/ricci/queue/1500004777: XML document text
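7. Installation succeeded
Click any node to look at its details
If the services listed at the bottom of the page are not running, try starting them manually; that usually works.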
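8. Add resources
There is no fence device in this setup, so fencing is ignored here. Add two global resources, then add a service, and then start the service
Resources --> Add: add a resource
Add a virtual IP. The netmask must be entered as a prefix length (24 in this setup), not in dotted form such as 255.255.255.0; otherwise the IP cannot be added and rgmanager logs:
rgmanager Starting stopped service service:web1
rgmanager start on ip "192.168.6.100/255.255.255.0" returned 1 (generic error)
rgmanager #68: Failed to start service:web1; return value: 1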
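Because the IP resource rejects a dotted netmask, it can help to convert the mask to a prefix length before filling in the form. A minimal sketch in plain sh; the function name mask_to_prefix is hypothetical, and the function only checks each octet on its own, so it does not reject non-contiguous masks such as 255.0.255.0:

```shell
#!/bin/sh
# Convert a dotted-quad netmask (e.g. 255.255.255.0) to a prefix length (24).
# Each octet is validated individually; contiguity across octets is NOT checked.
mask_to_prefix() {
    prefix=0
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the mask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        case "$octet" in
            255) prefix=$((prefix + 8)) ;;
            254) prefix=$((prefix + 7)) ;;
            252) prefix=$((prefix + 6)) ;;
            248) prefix=$((prefix + 5)) ;;
            240) prefix=$((prefix + 4)) ;;
            224) prefix=$((prefix + 3)) ;;
            192) prefix=$((prefix + 2)) ;;
            128) prefix=$((prefix + 1)) ;;
            0) ;;
            *) echo "invalid netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$prefix"
}

# The mask that failed in the log above corresponds to /24.
echo "192.168.6.100/$(mask_to_prefix 255.255.255.0)"
```

For the address used in this walkthrough the result is 192.168.6.100/24, the form that appears in the rgmanager log once the service starts successfully.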
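Then add a script resource
9. Add a Service
The resources added above are global: if the cluster runs several services, all of them can use these resources. A private resource can also be added inside Service Groups.
Now add a Service:
Service Groups --> Add: add a Service,
Add Resource attaches the two resources just created;
Now check from the command line on a cluster node; any node in the cluster will do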
[root@cent3 ~]# clustat
Cluster Status for ha1 @ Sun Jan 8 17:47:40 2017
Member Status: Quorate

 Member Name          ID   Status
 ------ ----          ---- ------
 cent2.test.com          1 Online, rgmanager
 cent3.test.com          2 Online, Local, rgmanager
 cent4.test.com          3 Online, rgmanager

 Service Name         Owner (Last)         State
 ------- ----         ----- ------         -----
 service:web1         cent2.test.com       started
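When scripting around the cluster, the owner and state of a service can be pulled out of the clustat output with awk. A small sketch, assuming the three-column service table shown above; the function name service_owner is hypothetical and the sample rows are copied from this walkthrough:

```shell
#!/bin/sh
# Print "<owner> <state>" for one service from clustat output on stdin.
# $1 is the service name exactly as clustat prints it, e.g. service:web1.
service_owner() {
    awk -v svc="$1" '$1 == svc { print $2, $3 }'
}

# Sample service table from the walkthrough; on a node use:
#   clustat | service_owner service:web1
service_owner service:web1 <<'EOF'
 Service Name         Owner (Last)         State
 ------- ----         ----- ------         -----
 service:web1         cent2.test.com       started
EOF
```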
On cent2 both the IP and the httpd service are up
[root@cent2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:91:b3:11 brd ff:ff:ff:ff:ff:ff
    inet 192.168.6.32/24 brd 192.168.6.255 scope global eth0
    inet 192.168.6.100/24 scope global secondary eth0
    inet6 fe80::20c:29ff:fe91:b311/64 scope link
       valid_lft forever preferred_lft forever
[root@cent2 ~]# netstat -tunlp | grep 80
tcp 0 0 :::80 :::* LISTEN 34901/httpd
10. Test failover:
The health checks that RHCS runs against a service can be followed in the /var/log/cluster/rgmanager.log log
Jan 08 18:56:59 rgmanager [ip] Checking 192.168.6.100/24, Level 10
Jan 08 18:56:59 rgmanager [ip] 192.168.6.100/24 present on eth0
Jan 08 18:56:59 rgmanager [ip] Link for eth0: Detected
Jan 08 18:56:59 rgmanager [ip] Link detected on eth0
Jan 08 18:56:59 rgmanager [ip] Local ping to 192.168.6.100 succeeded
This shows that rgmanager checks that 192.168.6.100 is present on the interface and pings it; this is how an IP resource is checked
Jan 08 18:55:49 rgmanager [script] Executing /etc/rc.d/init.d/httpd status
A script resource, as shown above, is checked simply by running the script with the status argument.
After I ran /etc/init.d/httpd stop, the following appeared in the log:
Jan 08 18:56:59 rgmanager [script] Executing /etc/rc.d/init.d/httpd status
Jan 08 18:56:59 rgmanager [script] script:http1: status of /etc/rc.d/init.d/httpd failed (returned 3) # the check failed here
Jan 08 18:56:59 rgmanager status on script "http1" returned 1 (generic error)
Jan 08 18:56:59 rgmanager Stopping service service:web1
Jan 08 18:56:59 rgmanager [script] Executing /etc/rc.d/init.d/httpd stop
Jan 08 18:56:59 rgmanager [ip] Removing IPv4 address 192.168.6.100/24 from eth0 # the steps above stopped the web1 service on this node
Jan 08 18:57:09 rgmanager Service service:web1 is recovering
Jan 08 18:57:14 rgmanager Service service:web1 is now running on member 2 # web1 was recovered on member 2, which is cent3.test.com
Cluster status after the failover:
[root@cent3 ~]# clustat
Cluster Status for ha1 @ Sun Jan 8 20:25:26 2017
Member Status: Quorate

 Member Name          ID   Status
 ------ ----          ---- ------
 cent2.test.com          1 Online, rgmanager
 cent3.test.com          2 Online, Local, rgmanager
 cent4.test.com          3 Online, rgmanager

 Service Name         Owner (Last)         State
 ------- ----         ----- ------         -----
 service:web1         cent3.test.com       started
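If a script resource does not meet your needs, you can try the apache resource instead. And if you find the script resource's check too simplistic, you can add logic to the script itself to get the behavior you want.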
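One way to make the script check stricter is a thin wrapper that delegates start and stop to the stock init script and extends status with an application-level probe. The sketch below is illustrative only: the function name web_resource, the variables INIT_SCRIPT and HEALTH_CMD, the curl probe and the URL http://127.0.0.1/ are all assumptions, and curl would have to be installed on the nodes. It relies only on what the log shows, namely that rgmanager runs the script with start, stop and status and treats a non-zero status as a failure:

```shell
#!/bin/sh
# Hypothetical wrapper to use as an rgmanager script resource in place of
# /etc/rc.d/init.d/httpd. Both defaults below are assumptions for illustration.
INIT_SCRIPT=${INIT_SCRIPT:-/etc/rc.d/init.d/httpd}
HEALTH_CMD=${HEALTH_CMD:-"curl -fs -o /dev/null http://127.0.0.1/"}

web_resource() {
    case "$1" in
        start|stop)
            # Delegate unchanged to the real init script.
            "$INIT_SCRIPT" "$1"
            ;;
        status)
            # Stock check first; pass its exit code through on failure.
            "$INIT_SCRIPT" status || return $?
            # Then require that the probe command succeeds as well.
            $HEALTH_CMD || return 1
            ;;
        *)
            echo "Usage: $0 {start|stop|status}" >&2
            return 2
            ;;
    esac
}

# When installed as a script, dispatch on the first argument.
if [ $# -gt 0 ]; then
    web_resource "$1"
    exit $?
fi
```

With this in place a hung httpd that still has a live process would fail the status check, which the plain init script would not catch.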
11. Shut down nodes and watch how the Service moves:
After cent3 was shut down, the service moved to cent4
[root@cent2 ~]# clustat
Cluster Status for ha1 @ Sun Jan 8 20:35:42 2017
Member Status: Quorate

 Member Name          ID   Status
 ------ ----          ---- ------
 cent2.test.com          1 Online, Local, rgmanager
 cent3.test.com          2 Offline
 cent4.test.com          3 Online, rgmanager

 Service Name         Owner (Last)         State
 ------- ----         ----- ------         -----
 service:web1         cent4.test.com       started
Then cent4 was shut down, and the Service moved to cent2
[root@cent2 ~]# clustat
Cluster Status for ha1 @ Sun Jan 8 20:36:27 2017
Member Status: Quorate

 Member Name          ID   Status
 ------ ----          ---- ------
 cent2.test.com          1 Online, Local, rgmanager
 cent3.test.com          2 Offline
 cent4.test.com          3 Online

 Service Name         Owner (Last)         State
 ------- ----         ----- ------         -----
 service:web1         cent2.test.com       started
cent4.test.com still shows Online here because it was in the middle of shutting down and had not actually powered off yet.
A few seconds later the following message popped up:
[root@cent2 ~]#
Message from syslogd@cent2 at Jan 8 20:36:42 ...
 rgmanager[5685]: #1: Quorum Dissolved
The log shows:
Jan 08 20:35:01 rgmanager Member 2 shutting down
Jan 08 20:36:18 rgmanager Member 3 shutting down
Jan 08 20:36:18 rgmanager Starting stopped service service:web1
Jan 08 20:36:18 rgmanager [ip] Link for eth0: Detected
Jan 08 20:36:19 rgmanager [ip] Adding IPv4 address 192.168.6.100/24 to eth0
Jan 08 20:36:19 rgmanager [ip] Pinging addr 192.168.6.100 from dev eth0
Jan 08 20:36:21 rgmanager [ip] Sending gratuitous ARP: 192.168.6.100 00:0c:29:91:b3:11 brd ff:ff:ff:ff:ff:ff
Jan 08 20:36:22 rgmanager [script] Executing /etc/rc.d/init.d/httpd start
Jan 08 20:36:22 rgmanager Service service:web1 started
Jan 08 20:36:42 rgmanager #1: Quorum Dissolved
Message from syslogd@cent2 at Jan 8 20:36:42 ...
 rgmanager[5685]: #1: Quorum Dissolved
Jan 08 20:36:42 rgmanager [script] Executing /etc/rc.d/init.d/httpd stop
Jan 08 20:36:42 rgmanager [ip] Removing IPv4 address 192.168.6.100/24 from eth0
The service was stopped because the cluster no longer has quorum
[root@cent2 ~]# clustat
Service states unavailable: Operation requires quorum
Cluster Status for ha1 @ Sun Jan 8 20:37:00 2017
Member Status: Inquorate

 Member Name          ID   Status
 ------ ----          ---- ------
 cent2.test.com          1 Online, Local
 cent3.test.com          2 Offline
 cent4.test.com          3 Offline
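The arithmetic behind this result can be sketched in a few lines, assuming each of the three nodes carries one vote and no quorum disk is configured (neither is stated explicitly above). Quorum is then more than half of the expected votes, that is, expected / 2 + 1 with integer division. The function names are hypothetical:

```shell
#!/bin/sh
# Votes needed for quorum: integer division, more than half of expected votes.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# $1 = expected votes, $2 = votes currently online
quorum_state() {
    if [ "$2" -ge "$(quorum_needed "$1")" ]; then
        echo Quorate
    else
        echo Inquorate
    fi
}

# Three one-vote nodes, as assumed for this cluster.
for online in 3 2 1; do
    echo "$online of 3 votes online: $(quorum_state 3 "$online")"
done
```

With three votes the threshold is two, which matches what was observed: the cluster stayed Quorate with cent3 down and became Inquorate once cent2 was the only member left.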