Handling ASM failgroup problems on a test appliance

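Environment: 3 virtual machines, RHEL 7.3 + Oracle RAC 11.2.0.4
Symptoms: RAC is running normally and the ASM disk groups use Normal redundancy, but one failgroup has failed entirely and another failgroup is misconfigured.
Note: this article is not about any commercial appliance product on the market. It is only a lab environment that I built myself to learn this kind of distributed storage architecture; to tell it apart, I'll call it xData for now ^_^.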

1. Confirming the symptoms

SQL> select group_number, name, total_mb, free_mb, USABLE_FILE_MB, offline_disks, state, type from v$asm_diskgroup;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB USABLE_FILE_MB OFFLINE_DISKS STATE                  TYPE
------------ ------------------------------ ---------- ---------- -------------- ------------- ---------------------- ----------
           1 CRS                                  2000       1170            585             0 MOUNTED                NORMAL
           2 DATA                                40960      35652           7586             0 MOUNTED                NORMAL

SQL>  select group_number, disk_number, name, path, failgroup, mode_status, voting_file  from v$asm_disk order by 1, 2;

GROUP_NUMBER DISK_NUMBER NAME                           PATH                    FAILGROUP            MODE_STATUS    VO
------------ ----------- ------------------------------ ----------------------- -------------------- -------------- --
           0           0                                /dev/CELL01-data2                            ONLINE         N
           0           1                                /dev/CELL01-data1                            ONLINE         N
           0           2                                /dev/CELL01-crs1                             ONLINE         Y
           1           1 CRS_0001                       /dev/CELL02-crs2        CRS_0001             ONLINE         Y
           1           2 CRS_0002                       /dev/CELL03-crs3        CRS_0002             ONLINE         Y
           2           0 DATA_0000                      /dev/CELL03-data1       DATA_0000            ONLINE         N
           2           1 DATA_0001                      /dev/CELL03-data2       DATA_0001            ONLINE         N
           2           2 DATA_0002                      /dev/CELL02-data1       CELL02               ONLINE         N
           2           3 DATA_0003                      /dev/CELL02-data2       CELL02               ONLINE         N

9 rows selected.

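As the output shows, not only have all of the CELL01 node's disks been dropped (they now appear under group number 0), but the failgroup of the CELL03 node's data disks is also wrong: DATA_0000 and DATA_0001 each sit in a failgroup named after the disk itself instead of a shared CELL03 failgroup.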
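
The same check can be done mechanically. The following is only a sketch: it assumes this lab's naming convention, where a device called /dev/CELLnn-... belongs in failgroup CELLnn, and it expects NAME, PATH and FAILGROUP columns for the DATA disks on standard input. The function name check_failgroups is hypothetical.

```shell
#!/usr/bin/env bash

# check_failgroups: read "NAME PATH FAILGROUP" lines on stdin and report
# every disk whose failgroup differs from the CELLnn prefix of its device name.
# The /dev/CELLnn-* -> CELLnn convention is an assumption taken from this lab.
check_failgroups() {
    awk '
    NF < 3 { next }                      # skip blank or incomplete lines
    {
        name = $1; path = $2; fg = $3
        n = split(path, parts, "/")      # last path component, e.g. CELL03-data1
        expected = parts[n]
        sub(/-.*/, "", expected)         # keep only the part before the first "-"
        if (fg != expected) {
            printf "%s %s has failgroup %s, expected %s\n", name, path, fg, expected
        }
    }'
}
```

Fed the four DATA rows from the listing above, it reports DATA_0000 and DATA_0001 and stays silent about the two CELL02 disks.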

2. Re-adding the CELL01 disks

Because more than the default 3.6h has passed, the offline disks have already been dropped, so the only option is to add the CELL01 disks again.

alter diskgroup CRS add disk '/dev/CELL01-crs1';
alter diskgroup DATA ADD FAILGROUP CELL01 disk '/dev/CELL01-data1', '/dev/CELL01-data2' rebalance power 5;

Adding the disks directly like this will very likely fail with an error like the one below, because the disks were used before:

SQL> alter diskgroup CRS add disk '/dev/CELL01-crs1';
alter diskgroup CRS add disk '/dev/CELL01-crs1'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/CELL01-crs1' belongs to diskgroup "CRS"

This can be solved either by overwriting the disk header with dd or by adding the disk with the FORCE option. Here I chose to dd the disk header:

[root@db01 ~]# dd if=/dev/zero of=/dev/CELL01-crs1 bs=8k count=1000
1000+0 records in
1000+0 records out
8192000 bytes (8.2 MB) copied, 0.0691801 s, 118 MB/s
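
Zeroing the wrong device cannot be undone, so it may be worth wrapping the dd call in a small guard. This is a minimal sketch, not part of any Oracle tooling: the function name wipe_asm_header and its checks are hypothetical. It uses the same block size and count as the command above; conv=notrunc only matters when the target is a regular file, for example during a dry run.

```shell
#!/usr/bin/env bash

# wipe_asm_header TARGET [COUNT]
# Overwrite the first COUNT blocks of 8 KiB (default 1000, as in the article)
# of TARGET with zeros. Refuses to run if TARGET is missing or not writable.
wipe_asm_header() {
    local target="$1"
    local count="${2:-1000}"

    if [ -z "$target" ] || [ ! -e "$target" ]; then
        echo "refusing: target '$target' does not exist" >&2
        return 1
    fi
    if [ ! -w "$target" ]; then
        echo "refusing: target '$target' is not writable" >&2
        return 1
    fi

    # conv=notrunc keeps the rest of a regular file intact; for a block
    # device it makes no difference.
    dd if=/dev/zero of="$target" bs=8k count="$count" conv=notrunc 2>/dev/null
}
```

A call such as wipe_asm_header /dev/CELL01-crs1 then does the same work as the dd command above, but a typo in the path produces an error instead of a new file under /dev.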

After the header has been wiped, the add succeeds on the next attempt:

SQL> alter diskgroup CRS add disk '/dev/CELL01-crs1';

Diskgroup altered.

In the same way, add the CELL01 data disks back to the DATA disk group, with CELL01 as the failgroup name:

SQL> alter diskgroup DATA ADD FAILGROUP CELL01 disk '/dev/CELL01-data1', '/dev/CELL01-data2' rebalance power 5;

Diskgroup altered.

The v$asm_operation view shows the progress of the rebalance. Once the query below no longer returns any rows, the rebalance is complete:

SQL> select * from v$asm_operation;

GROUP_NUMBER OPERATION  STATE         POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
------------ ---------- -------- ---------- ---------- ---------- ---------- ---------- ----------- --------------------
           2 REBAL      RUN               5          5        366        529        348           0
SQL> select * from v$asm_operation;

no rows selected
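
Instead of re-running the query by hand, the wait can be scripted. The sketch below is one possible approach under stated assumptions: asm_operation_count assumes it runs as the Grid Infrastructure owner with ORACLE_HOME and ORACLE_SID set for the local ASM instance, and both function names are hypothetical. The polling loop takes the counting command as a parameter so that it can be tried out without a database.

```shell
#!/usr/bin/env bash

# Print the number of rows currently in v$asm_operation.
# Assumption: the environment points at the local ASM instance and
# OS authentication as SYSASM works for the current user.
asm_operation_count() {
    sqlplus -s "/ as sysasm" <<'EOF' | tr -d '[:space:]'
set heading off feedback off pagesize 0
select count(*) from v$asm_operation;
EOF
}

# wait_for_rebalance COUNT_CMD [INTERVAL_SECONDS]
# Call COUNT_CMD repeatedly until it prints 0, sleeping INTERVAL_SECONDS
# (default 30) between calls. Returns 1 if COUNT_CMD prints a non-number,
# for example an ORA- error message.
wait_for_rebalance() {
    local count_cmd="$1"
    local interval="${2:-30}"
    local remaining

    while true; do
        remaining="$("$count_cmd")"
        case "$remaining" in
            ''|*[!0-9]*)
                echo "unexpected output from $count_cmd: $remaining" >&2
                return 1
                ;;
        esac
        if [ "$remaining" -eq 0 ]; then
            return 0
        fi
        sleep "$interval"
    done
}
```

On an ASM node this would be called as wait_for_rebalance asm_operation_count 30 before moving on to the next disk group change.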

3. Fixing the failgroup configuration

The failgroup of the CELL03 data disks is still wrong. First drop the two disks from the disk group:

SQL> alter diskgroup DATA drop disk DATA_0000, DATA_0001;

Diskgroup altered.

Query v$asm_operation to follow the rebalance. Once it has finished, add the disks back to the disk group, this time naming the correct failgroup (CELL03):

SQL> alter diskgroup DATA ADD FAILGROUP CELL03 disk '/dev/CELL03-data1', '/dev/CELL03-data2' rebalance power 5;

Diskgroup altered.

Watch the rebalance progress again. Once it has finished, a final check shows that everything is back to normal:

SQL> col path for a50
SQL> select group_number, disk_number, name, path, failgroup, mode_status, voting_file  from v$asm_disk order by 1, 2;

GROUP_NUMBER DISK_NUMBER NAME                           PATH                    FAILGROUP            MODE_STATUS    VO
------------ ----------- ------------------------------ ----------------------- -------------------- -------------- --
           1           0 CRS_0000                       /dev/CELL01-crs1        CRS_0000             ONLINE         Y
           1           1 CRS_0001                       /dev/CELL02-crs2        CRS_0001             ONLINE         Y
           1           2 CRS_0002                       /dev/CELL03-crs3        CRS_0002             ONLINE         Y
           2           0 DATA_0000                      /dev/CELL03-data1       CELL03               ONLINE         N
           2           1 DATA_0001                      /dev/CELL03-data2       CELL03               ONLINE         N
           2           2 DATA_0002                      /dev/CELL02-data1       CELL02               ONLINE         N
           2           3 DATA_0003                      /dev/CELL02-data2       CELL02               ONLINE         N
           2           4 DATA_0004                      /dev/CELL01-data1       CELL01               ONLINE         N
           2           5 DATA_0005                      /dev/CELL01-data2       CELL01               ONLINE         N

9 rows selected.

SQL> select group_number, name, total_mb, free_mb, USABLE_FILE_MB, offline_disks, state, type from v$asm_diskgroup;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB USABLE_FILE_MB OFFLINE_DISKS STATE                  TYPE
------------ ------------------------------ ---------- ---------- -------------- ------------- ---------------------- ----------
           1 CRS                                  3000       2033            516             0 MOUNTED                NORMAL
           2 DATA                                61440      56012          17766             0 MOUNTED                NORMAL
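
The jump in USABLE_FILE_MB follows from how ASM derives it for a Normal redundancy disk group: (FREE_MB - REQUIRED_MIRROR_FREE_MB) / 2. The queries in this article do not select REQUIRED_MIRROR_FREE_MB, so the values used below are assumptions: 20480 MB for DATA (the size of one failgroup of two 10240 MB disks) and 0 for the two-disk CRS group before the repair. With those assumptions the arithmetic matches the reported figures; the function name is hypothetical.

```shell
#!/usr/bin/env bash

# usable_file_mb FREE_MB REQUIRED_MIRROR_FREE_MB
# Usable space for a NORMAL redundancy disk group, in whole megabytes:
# (FREE_MB - REQUIRED_MIRROR_FREE_MB) / 2, using integer division.
usable_file_mb() {
    local free_mb="$1"
    local required_mirror_free_mb="$2"
    echo $(( (free_mb - required_mirror_free_mb) / 2 ))
}

# REQUIRED_MIRROR_FREE_MB values are assumed, not taken from the article.
usable_file_mb 1170 0        # CRS before the repair
usable_file_mb 35652 20480   # DATA before the repair
usable_file_mb 56012 20480   # DATA after the repair
```

The three calls print 585, 7586 and 17766, the same values as the USABLE_FILE_MB column in the two listings.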

Note: I usually set the disk group compatibility attributes to 11.2. If there is a special requirement, disk_repair_time (default 3.6h) can be changed as well.

SQL> col COMPATIBILITY for a30
SQL> col DATABASE_COMPATIBILITY for a30
SQL> select NAME, COMPATIBILITY, DATABASE_COMPATIBILITY from v$asm_diskgroup;

NAME                           COMPATIBILITY                  DATABASE_COMPATIBILITY
------------------------------ ------------------------------ ------------------------------
CRS                            11.2.0.0.0                     11.2.0.0.0
DATA                           11.2.0.0.0                     11.2.0.0.0

-- Set the disk_repair_time attribute of the DATA disk group (roughly, how long an offline disk is kept before it is dropped) to 4.5h
SQL> ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '4.5h';
Diskgroup altered.