故障環境:
AIX5.3 ORACLE10.2.0.3 RAC HA
故障現象:
一、啟動RAC單節點異常
$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....0A.lsnr application ONLINE OFFLINE
ora.p670a.gsd application ONLINE OFFLINE
ora.p670a.ons application ONLINE OFFLINE
ora.p670a.vip application ONLINE ONLINE p670b
ora....0B.lsnr application ONLINE ONLINE p670b
ora.p670b.gsd application ONLINE ONLINE p670b
ora.p670b.ons application ONLINE ONLINE p670b
ora.p670b.vip application ONLINE ONLINE p670b
ora.zhjport.db application ONLINE OFFLINE
ora....port.cs application ONLINE OFFLINE
ora....rt1.srv application ONLINE OFFLINE
ora....rt2.srv application ONLINE OFFLINE
ora....t1.inst application ONLINE OFFLINE
ora....t2.inst application ONLINE OFFLINE
二、觀察實例的ALERT日志,發現存在ORA-1113錯誤
ALTER DATABASE MOUNT
Wed Dec 6 00:29:55 2010
This instance was first to mount
Setting recovery target incarnation to 2
Wed Dec 6 00:29:59 2010
Successful mount of redo thread 2, with mount id 3375273859
Wed Dec 6 00:29:59 2010
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Completed: ALTER DATABASE MOUNT
Wed Dec 6 00:29:59 2010
ALTER DATABASE OPEN
This instance was first to open
ORA-1113 signalled during: ALTER DATABASE OPEN...
Wed Dec 6 00:30:00 2010
Shutting down instance (abort)
License high water mark = 1
Instance terminated by USER, pid = 1482880
根據ORA-1113初步判斷是OPEN過程中發現,數據文件錯誤
三、為了精確定位,通過SQLPLUS命令行OPEN庫
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 73 needs media recovery
ORA-01110: data file 73: '/home/oracle/database/DEV_DATA'
很顯然,數據文件DEV_DATA存在問題
處理過程:
1、確認/home/oracle/database下物理文件存在;
2、確認權限 無問題;
3、確認屬主無問題;
4、無奈只有OFFLINE該數據文件;
SQL> alter database datafile '/home/oracle/database/DEV_DATA' offline drop;
Database altered.
SQL> alter database open;
Database altered.
SQL>
四、確認
1、通過$ crs_stat -t確認狀態正常;
$ crs_stat -t
Name Type Target State Host
---- --------------------------------------------------------
ora....0A.lsnr application ONLINE OFFLINE
ora.p670a.gsd application ONLINE OFFLINE
ora.p670a.ons application ONLINE OFFLINE
ora.p670a.vip application ONLINE ONLINE p670b
ora....0B.lsnr application ONLINE ONLINE p670b
ora.p670b.gsd application ONLINE ONLINE p670b
ora.p670b.ons application ONLINE ONLINE p670b
ora.p670b.vip application ONLINE ONLINE p670b
ora.zhjport.db application ONLINE ONLINE p670b
ora....port.cs application ONLINE ONLINE p670b
ora....rt1.srv application ONLINE OFFLINE
ora....rt2.srv application ONLINE ONLINE p670b
ora....t1.inst application ONLINE OFFLINE
ora....t2.inst application ONLINE ONLINE p670b
2、通過PLSQL客戶端訪問正常;
五、後續分析
問題的根源是用戶把數據文件建在了本地,而非 ORADATAVG上,其他人破壞了本地的文件。
為什麼offline參數中加了drop,因為數據庫是非歸檔模式,如果是歸檔模式,則處理如下:
1.先mount數據庫
startup mount
2.脫機壞的數據文件
alter database datafile '/home/oracle/database/DEV_DATA' offline;
如果數據庫不是歸檔模式,則以上操作會報錯
3.打開 數據庫
alter database open;
4.恢復數據文件
recover datafile '/home/oracle/database/DEV_DATA';
5.聯機被脫機的數據文件
alter database datafile '/home/oracle/database/DEV_DATA' online;