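Environment: 3 virtual machines, RHEL 7.3 + Oracle RAC 11.2.0.4
Symptom: RAC is running normally and the ASM disk groups use NORMAL redundancy, but one failgroup has failed entirely and another failgroup is misconfigured.
Note: this article is not about any commercial engineered system on the market. It is only a lab environment I built to study this kind of distributed storage architecture; to keep things distinct, I'll call it xData for now ^_^.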
SQL> select group_number, name, total_mb, free_mb, USABLE_FILE_MB, offline_disks, state, type from v$asm_diskgroup;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB USABLE_FILE_MB OFFLINE_DISKS STATE                  TYPE
------------ ------------------------------ ---------- ---------- -------------- ------------- ---------------------- ----------
           1 CRS                                  2000       1170            585             0 MOUNTED                NORMAL
           2 DATA                                40960      35652           7586             0 MOUNTED                NORMAL

SQL> select group_number, disk_number, name, path, failgroup, mode_status, voting_file from v$asm_disk order by 1, 2;

GROUP_NUMBER DISK_NUMBER NAME                           PATH                    FAILGROUP            MODE_STATUS    VO
------------ ----------- ------------------------------ ----------------------- -------------------- -------------- --
           0           0                                /dev/CELL01-data2                            ONLINE         N
           0           1                                /dev/CELL01-data1                            ONLINE         N
           0           2                                /dev/CELL01-crs1                             ONLINE         Y
           1           1 CRS_0001                       /dev/CELL02-crs2        CRS_0001             ONLINE         Y
           1           2 CRS_0002                       /dev/CELL03-crs3        CRS_0002             ONLINE         Y
           2           0 DATA_0000                      /dev/CELL03-data1       DATA_0000            ONLINE         N
           2           1 DATA_0001                      /dev/CELL03-data2       DATA_0001            ONLINE         N
           2           2 DATA_0002                      /dev/CELL02-data1       CELL02               ONLINE         N
           2           3 DATA_0003                      /dev/CELL02-data2       CELL02               ONLINE         N

9 rows selected.
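As the output shows, not only have all of the CELL01 node's disks been dropped from their disk groups, but the failgroup of the CELL03 node's data disks is also currently misconfigured!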
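The failgroup check above can also be scripted. The following is only a minimal sketch, assuming this lab's device naming convention (/dev/CELLnn-...) and assuming each data disk's failgroup should be named after its cell. check_failgroups is a hypothetical helper, not an Oracle tool; it reads "NAME PATH FAILGROUP" rows on stdin and prints the disks whose failgroup does not match the cell in the path.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oracle): reads "NAME PATH FAILGROUP" rows
# on stdin and reports disks whose failgroup differs from the CELLnn prefix
# embedded in the device path. Assumes paths look like /dev/CELLnn-xxx.
check_failgroups() {
    awk '
    NF >= 3 {
        cell = $2
        sub(/^\/dev\//, "", cell)   # strip the /dev/ prefix
        sub(/-.*$/, "", cell)       # keep only the CELLnn part
        if ($3 != cell)
            printf "%s: failgroup %s, expected %s\n", $1, $3, cell
    }'
}
```

Feed it only the DATA rows: in this article the CRS disks keep their default per-disk failgroups (CRS_0001, CRS_0002), which is fine because each of them sits on a different cell.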
alter diskgroup CRS add disk '/dev/CELL01-crs1';
alter diskgroup DATA ADD FAILGROUP CELL01 disk '/dev/CELL01-data1', '/dev/CELL01-data2' rebalance power 5;
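Adding the disks directly like this will very likely fail with an error such as the following, because these disks have been used before: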
SQL> alter diskgroup CRS add disk '/dev/CELL01-crs1';
alter diskgroup CRS add disk '/dev/CELL01-crs1'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/CELL01-crs1' belongs to diskgroup "CRS"
This can be resolved either by zeroing the disk header with dd or by adding the disk with the FORCE option. Here I chose the dd approach:
[root@db01 ~]# dd if=/dev/zero of=/dev/CELL01-crs1 bs=8k count=1000
1000+0 records in
1000+0 records out
8192000 bytes (8.2 MB) copied, 0.0691801 s, 118 MB/s
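Because dd is destructive, and a mistyped path under /dev would silently create a regular file there, a small guard around the command can help. This is a minimal sketch rather than a hardened tool: wipe_disk_header is a hypothetical name, the default of 1000 blocks of 8k is taken from the command above, and the only check is that the target already exists.

```shell
#!/bin/sh
# Hypothetical wrapper around the dd command used above.
# Usage: wipe_disk_header <device> [count]
# Zeroes the first <count> blocks of 8k (default 1000, as in the article).
# WARNING: this destroys the ASM header on the target device.
wipe_disk_header() {
    dev=$1
    count=${2:-1000}
    if [ ! -e "$dev" ]; then
        echo "no such device: $dev" >&2
        return 1
    fi
    # conv=notrunc only matters when the target is a regular file
    # (for example a test image); it has no effect on a block device.
    dd if=/dev/zero of="$dev" bs=8k count="$count" conv=notrunc 2>/dev/null
}
```

For example, wipe_disk_header /dev/CELL01-crs1 would repeat the step above. A stricter variant could test for a block device with [ -b "$dev" ] instead of [ -e "$dev" ].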
After zeroing the disk header with dd, retrying the add succeeds:
SQL> alter diskgroup CRS add disk '/dev/CELL01-crs1';

Diskgroup altered.
In the same way, add CELL01's data disks back into the DATA disk group, with the failgroup named CELL01:
SQL> alter diskgroup DATA ADD FAILGROUP CELL01 disk '/dev/CELL01-data1', '/dev/CELL01-data2' rebalance power 5;

Diskgroup altered.
The v$asm_operation view shows the progress of the rebalance. Once the query below no longer returns any rows, the rebalance is complete:
SQL> select * from v$asm_operation;

GROUP_NUMBER OPERATION  STATE         POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
------------ ---------- -------- ---------- ---------- ---------- ---------- ---------- ----------- --------------------
           2 REBAL      RUN               5          5        366        529        348           0

SQL> select * from v$asm_operation;

no rows selected
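Rather than re-running the query by hand, the wait can be turned into a polling loop. The sketch below is an assumption-laden illustration: wait_for_rebalance and count_asm_ops are hypothetical names, and count_asm_ops assumes it runs as the Grid Infrastructure owner with ORACLE_HOME and ORACLE_SID pointing at the local ASM instance.

```shell
#!/bin/sh
# Hypothetical helper: prints the number of rows in v$asm_operation.
# Assumes the ASM environment is already set for the current user.
count_asm_ops() {
    sqlplus -S "/ as sysasm" <<'EOF' | tr -d '[:space:]'
set heading off feedback off pagesize 0
select count(*) from v$asm_operation;
EOF
}

# wait_for_rebalance <interval_seconds> <command...>
# Runs <command...> repeatedly until it prints 0, sleeping between polls.
# Returns 1 if the command itself fails.
wait_for_rebalance() {
    interval=$1
    shift
    while :; do
        rows=$("$@") || return 1
        [ "$rows" -eq 0 ] && return 0
        sleep "$interval"
    done
}
```

A typical call would be wait_for_rebalance 30 count_asm_ops. Passing the counting command as an argument keeps the loop independent of sqlplus, so it can be tried out with a stub first.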
The failgroup of CELL03's data disks is still misconfigured, so first drop those two disks from the DATA disk group:
SQL> alter diskgroup DATA drop disk DATA_0000, DATA_0001;

Diskgroup altered.
Query v$asm_operation to follow the rebalance. Once it has finished, add the disks back to the disk group, this time specifying the correct failgroup (CELL03):
SQL> alter diskgroup DATA ADD FAILGROUP CELL03 disk '/dev/CELL03-data1', '/dev/CELL03-data2' rebalance power 5;

Diskgroup altered.
Watch the rebalance progress once more. The final queries show that everything is back to normal:
SQL> col path for a50
SQL> select group_number, disk_number, name, path, failgroup, mode_status, voting_file from v$asm_disk order by 1, 2;

GROUP_NUMBER DISK_NUMBER NAME                           PATH                    FAILGROUP            MODE_STATUS    VO
------------ ----------- ------------------------------ ----------------------- -------------------- -------------- --
           1           0 CRS_0000                       /dev/CELL01-crs1        CRS_0000             ONLINE         Y
           1           1 CRS_0001                       /dev/CELL02-crs2        CRS_0001             ONLINE         Y
           1           2 CRS_0002                       /dev/CELL03-crs3        CRS_0002             ONLINE         Y
           2           0 DATA_0000                      /dev/CELL03-data1       CELL03               ONLINE         N
           2           1 DATA_0001                      /dev/CELL03-data2       CELL03               ONLINE         N
           2           2 DATA_0002                      /dev/CELL02-data1       CELL02               ONLINE         N
           2           3 DATA_0003                      /dev/CELL02-data2       CELL02               ONLINE         N
           2           4 DATA_0004                      /dev/CELL01-data1       CELL01               ONLINE         N
           2           5 DATA_0005                      /dev/CELL01-data2       CELL01               ONLINE         N

9 rows selected.

SQL> select group_number, name, total_mb, free_mb, USABLE_FILE_MB, offline_disks, state, type from v$asm_diskgroup;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB USABLE_FILE_MB OFFLINE_DISKS STATE                  TYPE
------------ ------------------------------ ---------- ---------- -------------- ------------- ---------------------- ----------
           1 CRS                                  3000       2033            516             0 MOUNTED                NORMAL
           2 DATA                                61440      56012          17766             0 MOUNTED                NORMAL
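As a quick sanity check on the final layout, the disks in each failgroup can be counted. This is a minimal sketch with a hypothetical helper name, count_per_failgroup; it assumes "NAME PATH FAILGROUP" rows on stdin, for example the DATA rows from the v$asm_disk output above.

```shell
#!/bin/sh
# Hypothetical helper: reads "NAME PATH FAILGROUP" rows on stdin and prints
# one "FAILGROUP COUNT" line per failgroup, sorted by failgroup name.
count_per_failgroup() {
    awk 'NF >= 3 { n[$3]++ } END { for (fg in n) print fg, n[fg] }' | sort
}
```

For the DATA disk group in this lab, the expected result is three failgroups (CELL01, CELL02, CELL03) with two disks each.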
Note: I normally set the disk group compatibility attributes to 11.2. If there is a specific need, disk_repair_time (default 3.6h) can also be adjusted.
SQL> col COMPATIBILITY for a30
SQL> col DATABASE_COMPATIBILITY for a30
SQL> select NAME, COMPATIBILITY, DATABASE_COMPATIBILITY from v$asm_diskgroup;

NAME                           COMPATIBILITY                  DATABASE_COMPATIBILITY
------------------------------ ------------------------------ ------------------------------
CRS                            11.2.0.0.0                     11.2.0.0.0
DATA                           11.2.0.0.0                     11.2.0.0.0

-- Set the disk_repair_time attribute of the DATA disk group to 4.5h
-- (roughly: how long an offline disk is kept before it is dropped)
SQL> ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '4.5h';

Diskgroup altered.