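When configuring OGG in a RAC environment, two conditions must be met for OGG to fail over automatically to a surviving node when a RAC node fails:
1. The OGG checkpoint, trail, and BR (Bounded Recovery) files are placed on a shared cluster file system that every RAC node can access.
2. Cluster software monitors the OGG processes and, when a failure occurs, automatically restarts OGG on a surviving node (failover).
Oracle Grid Infrastructure Standalone Agents (XAG), combined with a cluster file system supported by Oracle, can provide automatic failover for OGG. This article walks through the configuration steps.
To use XAG for automatic failover, the versions of the software involved must meet its requirements.
As for the cluster file system, the official Oracle documentation recommends ACFS, DBFS, or OCFS. I think other cluster file systems, such as the Veritas cluster file system, should also work.
The example in this article uses ACFS.
Source database: 11.2.0.4 RAC (ASM)
Target database: 12.1.0.2 RAC (ASM)
GoldenGate: 12.2.0.1.1
Operating system: Oracle Enterprise Linux 6.5 (64-bit) on both source and target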
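XAG must be downloaded separately from the Oracle website and installed. The download location is: http://www.oracle.com/technetwork/database/database-technologies/clusterware/downloads/index.html
The current version is 7, and the file is xagpack_7b.zip.
Unzip the file, then run xagsetup.sh as the GI installation user (usually "grid") to install it: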
[grid@rac1 xag]$ ./xagsetup.sh --install --directory /u01/app/grid/xaghome --all_nodes
Installing Oracle Grid Infrastructure Agents on: rac1
Installing Oracle Grid Infrastructure Agents on: rac2
Done.
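Install XAG on the target side as well, in the same way as on the source.
To use ACFS with 11.2.0.4 on OEL, a PSU must be applied to bring the installation to 11.2.0.4.4 or later. The patching procedure is omitted here.
The COMPATIBLE.ASM and COMPATIBLE.ADVM attributes of the disk group used for ACFS must be set to at least 11.2.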
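As a rough sketch, the attribute requirement could be checked from a script. The helper name `compat_ok` is hypothetical and the disk group name DATA is taken from this example; `asmcmd lsattr` and `asmcmd setattr`, shown only in comments, are the ASMCMD commands for listing and changing disk group attributes.

```shell
# compat_ok VALUE: succeed when a compatibility value such as 11.2.0.0.0
# is at least 11.2. Hypothetical helper, not part of any Oracle tool.
compat_ok() {
    _major=${1%%.*}
    case "$1" in
        *.*) _rest=${1#*.}; _minor=${_rest%%.*} ;;
        *)   _minor=0 ;;
    esac
    [ "$_major" -gt 11 ] || { [ "$_major" -eq 11 ] && [ "$_minor" -ge 2 ]; }
}

# As the grid user, the current values can be listed with:
#   asmcmd lsattr -l -G DATA
# and raised with, for example (compatibility values cannot be lowered again):
#   asmcmd setattr -G DATA compatible.advm 11.2.0.0.0
```

For instance, `compat_ok 11.2.0.0.0 && echo ok` prints `ok`, while `compat_ok 11.1.0.0.0` returns a non-zero status.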
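Create the ACFS volume with ASMCMD or ASMCA:
Create a general-purpose ACFS file system.
At this point the ACFS is not yet managed by CRS. You can view its information with the ASMCMD volinfo command or with /sbin/acfsutil registry: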
ASMCMD> volinfo -a
Diskgroup Name: DATA
Volume Name: VOLOGG1
Volume Device: /dev/asm/vologg1-426
State: ENABLED
Size (MB): 3072
Resize Unit (MB): 32
Redundancy: UNPROT
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /u01/app/grid/acfsmounts/data_vol1
[root@rac1 ~]# /sbin/acfsutil registry
Mount Object:
Device: /dev/asm/vologg1-426
Mount Point: /u01/app/grid/acfsmounts/data_vol1
Disk Group: DATA
Volume: VOLOGG1
Options: none
Nodes: all
First, delete the entry for the ACFS we just created from the general ACFS registry:
[root@rac1 ~]# /sbin/acfsutil registry -d /u01/app/grid/acfsmounts/data_vol1
acfsutil registry: successfully removed ACFS mount point /u01/app/grid/acfsmounts/data_vol1 from Oracle Registry
Then register it as a CRS resource with the SRVCTL tool:
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl add filesystem -d /dev/asm/vologg1-426 -v VOLOGG1 -g DATA -m /u01/app/grid/acfsmounts/data_vol1 -u grid
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.data.vologg1.acfs
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
Start the resource manually (this mounts the ACFS):
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl start filesystem -d /dev/asm/vologg1-426
[root@rac1 ~]#
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.data.vologg1.acfs
ONLINE ONLINE rac1 mounted on /u01/app/grid/acfsmounts/data_vol1
ONLINE ONLINE rac2 mounted on /u01/app/grid/acfsmounts/data_vol1
[root@rac1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_rac1-lv_root 45G 32G 12G 74% /
tmpfs 2.0G 437M 1.6G 23% /dev/shm
/dev/sda1 477M 55M 397M 13% /boot
/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1
[root@rac2 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_rac1-lv_root 45G 25G 19G 58% /
tmpfs 2.0G 440M 1.6G 23% /dev/shm
/dev/sda1 477M 55M 397M 13% /boot
/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1
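The main difference between creating ACFS in 12c and in 11g is that the choice between a general-purpose file system and a database home file system is gone. After creation, a script is generated that registers the file system with CRS.
Run the generated script to complete registration and mounting: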
[root@oel65vm11 scripts]# ./acfs_script.sh
ACFS file system /u01/app/grid/acfsmounts/ogg_vol1 is mounted on nodes oel65vm11,oel65vm12
Check the resource information:
[root@oel65vm11 bin]# ./crsctl status resource -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.VOLOGG2.advm
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.DATA.dg
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.asm
ONLINE ONLINE oel65vm11 Started,STABLE
ONLINE ONLINE oel65vm12 Started,STABLE
ora.data.vologg2.acfs
ONLINE ONLINE oel65vm11 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE
ONLINE ONLINE oel65vm12 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE
ora.net1.network
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.ons
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
Note: SELinux must be turned off on all nodes; otherwise ACFS reports errors about lacking write permission.
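A minimal sketch of checking this across nodes, assuming passwordless ssh between them and treating only the `Disabled` mode reported by `getenforce` as "off". The function name `selinux_not_off` is hypothetical.

```shell
# selinux_not_off: read "node mode" pairs on stdin and print every node
# whose SELinux mode is anything other than Disabled.
selinux_not_off() {
    while read -r node mode; do
        [ "$mode" = "Disabled" ] || echo "$node"
    done
}

# Possible usage on this example's source cluster (assumes ssh access):
#   for n in rac1 rac2; do echo "$n $(ssh "$n" getenforce)"; done | selinux_not_off
# To disable SELinux permanently, set SELINUX=disabled in /etc/selinux/config
# and reboot the node.
```

An empty result means every listed node reports SELinux as disabled.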
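This version of OGG supports both 11g and 12c databases. In the graphical installer, the user can choose the OGG build that matches the database version.
Install OGG on the ACFS created earlier:
Source installation location: /u01/app/grid/acfsmounts/data_vol1/ogg12
Target installation location: /u01/app/grid/acfsmounts/ogg_vol1/ogg12
Choose to start the Manager process automatically.
• Switch the source database to archivelog mode (procedure omitted).
• Add the required logging on the source database and modify parameters: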
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
Database altered.
SQL> ALTER DATABASE FORCE LOGGING;
Database altered.
SQL> SELECT supplemental_log_data_min, force_logging FROM v$database;
SUPPLEME FORCE_LOGGING
-------- ---------------------------------------
YES YES
SQL> ALTER SYSTEM SWITCH LOGFILE;
System altered.
SQL> alter system set ENABLE_GOLDENGATE_REPLICATION=true;
System altered.
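• Create the OGG database user on both source and target and grant it privileges. In my example the user is GGADM.
The privileges the OGG user needs are described in the online document "Installing and Configuring Oracle GoldenGate for Oracle Database 12c (12.2.0.1)", section 4.1.4.1 Oracle 11.2.0.4 or Later Database Privileges. For convenience in this test, we grant the user the DBA role, plus the grant made through the dedicated system package:
SQL> BEGIN
  dbms_goldengate_auth.grant_admin_privilege
  (
    grantee => 'GGADM',
    privilege_type => 'CAPTURE',
    grant_select_privileges => TRUE
  );
END;
/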
PL/SQL procedure successfully completed.
• Log in to the database:
GGSCI (rac1.hthorizontest.com) 1> dblogin userid ggadm password ggadm
Successfully logged into database.
• Register the integrated Extract
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 2> register extract ext1 database;
2016-04-07 23:44:38 INFO OGG-02003 Extract EXT1 successfully registered with database at SCN 1291634.
• Add the Extract process
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 3> ADD EXTRACT ext1 INTEGRATED TRANLOG, BEGIN NOW
EXTRACT (Integrated) added.
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 4> ADD EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et, EXTRACT ext1
EXTTRAIL added.
• Add the data pump process
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 5> ADD EXTRACT pump1 EXTTRAILSOURCE /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et
EXTRACT added.
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 6> EDIT PARAMS EXT1
Add the following:
EXTRACT ext1
USERID ggadm, PASSWORD ggadm
TRANLOGOPTIONS INTEGRATED PARAMS (MAX_SGA_SIZE 100)
EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et
TABLE test.*;
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 7> EDIT PARAMS PUMP1
Add the following:
EXTRACT pump1
USERID ggadm, PASSWORD ggadm
RMTHOST 192.168.0.11, MGRPORT 7809
RMTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt
TABLE TEST.*;
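Then start all processes.
On 11.2.0.4, when integrated capture mode is used, starting the Extract process reports that patch 17030189 must be installed, mainly because integrated capture requires changes to data dictionary tables.
However, once a PSU has been applied, this patch sometimes conflicts with other patches. In that case you can run prvtlmpg.plb manually to resolve the problem.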
(EXTRACT Abending With OGG-02912 (Doc ID 2091679.1))
GGSCI (oel65vm11.hthorizon.com) 8> dblogin userid ggadm password ggadm
Successfully logged into database.
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 9> ADD CHECKPOINTTABLE ggadm.checkpointtab
Successfully created checkpoint table ggadm.checkpointtab
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 10> ADD REPLICAT rep1, EXTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt checkpointtable ggadm.checkpointtab
REPLICAT added.
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 11> EDIT PARAMS REP1
Add the following:
REPLICAT rep1
USERID ggadm, PASSWORD ggadm
ASSUMETARGETDEFS
DISCARDFILE /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt, PURGE
MAP TEST.* TARGET TEST.*;
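Then start the process and test whether OGG data replication works.
So that the OGG Manager process can start the replication processes automatically, add the following to the Manager parameter file: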
AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60
AUTOSTART ER *
Restart the Manager process for the change to take effect.
Make this change on both the source and the target.
• Add the application VIP (as root)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.36 -vipname=xag.gg_1-vip.vip -user=oracle
• Allow the grid user to start the resource (as root)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x
• Start the VIP (as grid)
[root@rac1 ~]# su - grid
[grid@rac1 ~]$ /u01/app/11.2.0/grid/bin/crsctl start resource xag.gg_1-vip.vip
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'rac1'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'rac1' succeeded
• Check the status
[grid@rac1 ~]$ crsctl status resource xag.gg_1-vip.vip
NAME=xag.gg_1-vip.vip
TYPE=app.appvip_net1.type
TARGET=ONLINE
STATE=ONLINE on rac1
• Create the CRS resource for OGG (as root)
[root@rac1 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_1 --gg_home /u01/app/grid/acfsmounts/data_vol1/ogg12 --instance_type source --nodes rac1,rac2 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg1.acfs --databases ora.tdb.db --oracle_home /u01/app/oracle/product/11.2.0/dbhome_1 --monitor_extracts ext1,pump1
[root@rac1 ~]# cd /u01/app/grid/xaghome/bin
[root@rac1 bin]# ./agctl status goldengate gg_1
Goldengate instance 'gg_1' is not running
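• Authorize grid to start the resource
When the command above completes, a CRS resource for OGG is created automatically. The grid user must be granted permission to manage it: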
[root@rac1 bin]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1.goldengate -u user:grid:r-x
On the target side, the procedure is similar to the source.
• Create the VIP resource:
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.26 -vipname=xag.gg_1-vip.vip -user=oracle
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl start resource xag.gg_1-vip.vip
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm12'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl relocate resource xag.gg_1-vip.vip -n oel65vm11
CRS-2673: Attempting to stop 'xag.gg_1-vip.vip' on 'oel65vm12'
CRS-2677: Stop of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm11'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm11' succeeded
• Create the CRS resource for OGG
[root@oel65vm11 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_2 --gg_home /u01/app/grid/acfsmounts/ogg_vol1/ogg12 --instance_type target --nodes oel65vm11,oel65vm12 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg2.acfs --databases ora.racdb.db --oracle_home /u01/app/oracle/product/12.1.0/dbhome_1 --monitor_replicats rep1
• Grant permission
[root@oel65vm11 bin]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_2.goldengate -u user:grid:r-x
On the source, change the remote (target) host address in the PUMP process parameters to the target-side VIP we just created:
RMTHOST 192.168.0.26, MGRPORT 7809
Restart the PUMP process.
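A small sketch of making that edit from the shell instead of through EDIT PARAMS. The function name `set_rmthost` is hypothetical, and it assumes the parameter file writes the keyword as uppercase RMTHOST at the start of a line, as in this article. It prints the rewritten file to standard output rather than changing it in place, so the result can be reviewed first.

```shell
# set_rmthost FILE HOST: print FILE with the host on the RMTHOST line
# replaced by HOST, leaving MGRPORT and every other line unchanged.
set_rmthost() {
    sed "s/^\([[:space:]]*RMTHOST[[:space:]][[:space:]]*\)[^,[:space:]][^,[:space:]]*/\1$2/" "$1"
}

# Possible usage from the source OGG home (the dirprm/pump1.prm path is an
# assumption based on the default location of parameter files):
#   set_rmthost dirprm/pump1.prm 192.168.0.26 > /tmp/pump1.prm.new
```

After reviewing the output, copy it over the original parameter file and restart the pump.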
Enter the ggsci command line and stop all processes on both the source and the target.
• Start the target resource
[grid@oel65vm11 ~]$ cd $ORACLE_BASE
[grid@oel65vm11 grid]$ cd xaghome/bin
[grid@oel65vm11 bin]$ ./agctl start goldengate gg_2 --node oel65vm11
[grid@oel65vm11 bin]$ crsctl status resource xag.gg_2.goldengate
NAME=xag.gg_2.goldengate
TYPE=xag.goldengate.type
TARGET=ONLINE
STATE=ONLINE on oel65vm11
• Start the source resource
[grid@rac1 bin]$ cd $ORACLE_BASE
[grid@rac1 grid]$ cd xaghome/bin
[grid@rac1 bin]$ ./agctl start goldengate gg_1 --node rac1
[grid@rac1 bin]$ crsctl status resource xag.gg_1.goldengate
NAME=xag.gg_1.goldengate
TYPE=xag.goldengate.type
TARGET=ONLINE
STATE=ONLINE on rac1
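After starting the resources, enter the GGSCI command line and check the process status. If all processes have started automatically, the configuration is working.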
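The status check can also be scripted. The sketch below assumes the usual column layout of the GGSCI `info all` report (program, status, group) and that commands can be piped to ggsci on standard input. The function name `not_running` is hypothetical.

```shell
# not_running: read "info all" output on stdin and print the name of every
# Manager, Extract, or Replicat whose status is not RUNNING.
not_running() {
    awk '$1 ~ /^(MANAGER|EXTRACT|REPLICAT)$/ && $2 != "RUNNING" {
        print ($3 == "" ? $1 : $3)
    }'
}

# Possible usage from the OGG home directory:
#   echo "info all" | ./ggsci | not_running
```

No output means every listed process reported RUNNING.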
Test a switchover on the source side with the following command:
[grid@rac1 bin]$ ./agctl relocate goldengate gg_1 --node rac2
[grid@rac1 bin]$ crsctl status resource -t
......
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
......
xag.gg_1-vip.vip
1 ONLINE ONLINE rac2
xag.gg_1.goldengate
1 ONLINE ONLINE rac2
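To confirm a relocation from a script rather than by reading the table, the attribute-style output of `crsctl status resource <name>` (the NAME=/TYPE=/TARGET=/STATE= form shown earlier) can be parsed. This is only a sketch, and the function name `resource_node` is hypothetical.

```shell
# resource_node: read "crsctl status resource NAME" output on stdin and
# print the node on which the resource is ONLINE (nothing if it is not).
resource_node() {
    sed -n 's/^STATE=ONLINE on \(.*\)$/\1/p'
}

# Possible usage: check that the VIP and the OGG resource sit on the same node.
#   vip=$(crsctl status resource xag.gg_1-vip.vip | resource_node)
#   ogg=$(crsctl status resource xag.gg_1.goldengate | resource_node)
#   [ -n "$ogg" ] && [ "$vip" = "$ogg" ] && echo "gg_1 and its VIP are on $ogg"
```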
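Next, a power-loss test: we shut down the target host oel65vm11 by cutting its power.
On host oel65vm12, you can see that the RAC VIP has failed over to this node, and that the OGG VIP and the gg_2 resource have also failed over to this node automatically: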
[grid@oel65vm12 ~]$ crsctl status resource -t
......
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
......
ora.oel65vm11.vip
1 ONLINE INTERMEDIATE oel65vm12 FAILED OVER,STABLE
ora.oel65vm12.vip
1 ONLINE ONLINE oel65vm12 STABLE
ora.racdb.db
1 ONLINE OFFLINE STABLE
2 ONLINE ONLINE oel65vm12 Open,STABLE
ora.scan1.vip
1 ONLINE ONLINE oel65vm12 STABLE
xag.gg_1-vip.vip
1 ONLINE ONLINE oel65vm12 STABLE
xag.gg_2.goldengate
1 ONLINE ONLINE oel65vm12 STABLE
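The above is only the simplest possible example and does not cover more complex situations, such as a monitoring jagent deployed alongside OGG, or downstream replication. Real production environments are often far more complex than this example.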