Same database, down again; this time it's 2663

June 3, 2020

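It is the same faulty database that hit 2662 last time, and it is down again. This old database runs on Windows and has been in service for years; I now suspect a genuine hardware problem. If it can be repaired this time, the historical data must be exported right away, and after that the box can be scrapped!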

It would no longer open:
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2138585 change 16671892799912 time 05/21/2020 02:43:27
ORA-00312: online log 4 thread 1: 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'

The alert log shows:

Fri May 29 15:48:22 2020
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 9 processes
Started redo scan
Incomplete read from log member 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'. Trying next member.
Incomplete read from log member 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'. Trying next member.
Fri May 29 15:48:35 2020
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_ora_1280.trc:
ORA-48132: requested file lock is busy, [INCIDENT] [E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\lck\AM_1762783_4031814035.lck]
ORA-00353: log corruption near block 2138585 change 16671892799912 time 05/21/2020 02:43:27
ORA-00312: online log 4 thread 1: 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'
Fri May 29 15:48:36 2020
Dumping diagnostic data in directory=[cdmp_20200529154836], requested by (instance=1, osid=1280), summary=[abnormal process termination].
Aborting crash recovery due to error 354
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_ora_1280.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2138585 change 16671892799912 time 05/21/2020 02:43:27
ORA-00312: online log 4 thread 1: 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_ora_1280.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2138585 change 16671892799912 time 05/21/2020 02:43:27
ORA-00312: online log 4 thread 1: 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'
ORA-354 signalled during: alter database open...
Fri May 29 15:49:06 2020
Incomplete read from log member 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'. Trying next member.
Incomplete read from log member 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'. Trying next member.
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_m000_6176.trc:
ORA-48132: requested file lock is busy, [INCIDENT] [E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\lck\AM_1762783_4031814035.lck]
ORA-00353: log corruption near block 2140160 change 16671892799912 time 05/21/2020 02:43:27
ORA-00312: online log 4 thread 1: 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'
Fri May 29 15:49:12 2020
Dumping diagnostic data in directory=[cdmp_20200529154912], requested by (instance=1, osid=6176 (M000)), summary=[abnormal process termination].

To rule out the ORA-48132 error, I restarted the wydb database service and tried to open the database again. The output:

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2138585 change 16671892799912 time 05/21/2020 02:43:27
ORA-00312: online log 4 thread 1: 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'
------->alertlog
Fri May 29 16:14:06 2020
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 9 processes
Started redo scan
Incomplete read from log member 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'. Trying next member.
Incomplete read from log member 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'. Trying next member.
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_ora_6000.trc  (incident=260464):
ORA-00353: log corruption near block 2138585 change 16671892799912 time 05/21/2020 02:43:27
ORA-00312: online log 4 thread 1: 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'
Incident details in: E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\incident\incdir_260464\wydb_ora_6000_i260464.trc
Fri May 29 16:14:18 2020
Dumping diagnostic data in directory=[cdmp_20200529161418], requested by (instance=1, osid=6000), summary=[incident=260464].
Fri May 29 16:14:19 2020
Aborting crash recovery due to error 354
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_ora_6000.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2138585 change 16671892799912 time 05/21/2020 02:43:27
ORA-00312: online log 4 thread 1: 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_ora_6000.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2138585 change 16671892799912 time 05/21/2020 02:43:27
ORA-00312: online log 4 thread 1: 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'
ORA-354 signalled during: alter database open...
Fri May 29 16:14:44 2020
Sweep [inc][260464]: completed
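
The change number in ORA-00353 is a plain decimal SCN, whereas the ORA-00600 arguments further down are given as a wrap and a base. As a rough sketch (Python, not part of the original session; `split_scn` is a made-up helper name), the decimal value can be split under the assumption that the base occupies the low 32 bits and the wrap sits above it:

```python
from typing import Tuple

SCN_BASE_BITS = 32  # assumption: low 32 bits are the SCN base, the rest is the wrap


def split_scn(scn: int) -> Tuple[int, int]:
    """Split a decimal SCN into (wrap, base)."""
    return scn >> SCN_BASE_BITS, scn & 0xFFFFFFFF


# Change number reported by ORA-00353 in the alert log above.
wrap, base = split_scn(16671892799912)
print(f"{wrap}.{base}")  # 3881.3124724136
```

Written this way, the corrupt redo is in wrap 3881, the same wrap that shows up in the internal errors later on.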

I decided to open with resetlogs, as follows:

alter system set "_allow_resetlogs_corruption"=TRUE scope=spfile;
select file#, checkpoint_change# scn from v$datafile;
16,671,885,689,959
shutdown abort
startup mount
recover database until cancel;
alter database open resetlogs;
It took a very long time... and then failed:

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2663], [3881], [3117615201],
[3881], [3124723478], [], [], [], [], [], [], []
Process ID: 3520
Session ID: 2 Serial number: 3

The alert log shows:

Fri May 29 16:49:44 2020
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 16671885689959
Resetting resetlogs activation ID 1283227263 (0x4c7c7e7f)
Fri May 29 19:04:24 2020
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
Fri May 29 21:55:57 2020
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
Sat May 30 00:38:31 2020
Setting recovery target incarnation to 3
Sat May 30 00:38:33 2020
Assigning activation ID 1284121787 (0x4c8a24bb)
Thread 1 opened at log sequence 1
  Current log# 4 seq# 1 mem# 0: E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG
Successful open of redo thread 1
Sat May 30 00:38:35 2020
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat May 30 00:38:36 2020
SMON: enabling cache recovery
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_ora_3520.trc  (incident=268387):
ORA-00600: internal error code, arguments: [2663], [3881], [3117615201], [3881], [3124723478], [], [], [], [], [], [], []
Incident details in: E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\incident\incdir_268387\wydb_ora_3520_i268387.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_ora_3520.trc:
ORA-00600: internal error code, arguments: [2663], [3881], [3117615201], [3881], [3124723478], [], [], [], [], [], [], []
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_ora_3520.trc:
ORA-00600: internal error code, arguments: [2663], [3881], [3117615201], [3881], [3124723478], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 3520): terminating the instance due to error 600
Instance terminated by USER, pid = 3520
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (3520) as a result of ORA-1092
Sat May 30 00:38:46 2020
ORA-1092 : opitsk aborting process

So this time the error is 2663, which is similar to 2662, and I handled it with a similar procedure:

-- check the SCN values in mount state
SQL> select CHECKPOINT_CHANGE# from v$database;
    16671885689963
SQL> select file#,CHECKPOINT_CHANGE# from v$datafile;
         1     16671885689963
         2     16671885689963
         4     16671885689963
         5     16671885689963
         6     16671885689963
         7     16671885689963
         8     16671885689963
         9     16671885689963
        10     16671885689963
-- compute the SCN value to advance to
select 3881*power(2,32)+3124724000 scn from dual;
16671892799776
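Slightly larger than 16671885689963, so it should be suitable (the base 3124724000 is the 3124723478 from the 2663 arguments, rounded up).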
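
The arithmetic can be double-checked outside the database. This is only a sketch in Python with a hypothetical helper name, and it assumes the 2663 arguments follow the same layout usually described for 2662 (current SCN wrap and base, then dependent SCN wrap and base):

```python
def scn_from_wrap_base(wrap: int, base: int) -> int:
    """Combine an SCN wrap and base into one decimal SCN (base = low 32 bits)."""
    return wrap * 2**32 + base


checkpoint_scn = 16671885689963                       # from v$datafile above
dependent_scn = scn_from_wrap_base(3881, 3124723478)  # last two non-empty 2663 arguments
target_scn = scn_from_wrap_base(3881, 3124724000)     # the rounded-up value used above

# The target has to sit above both the checkpoint SCN and the dependent SCN.
assert target_scn > dependent_scn > checkpoint_scn
print(target_scn, target_scn - dependent_scn)  # 16671892799776 522
```

The headroom over the dependent SCN is only a few hundred changes, which may be part of why the next open attempt trips over a higher SCN.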
-- in nomount state, recreate the controlfile
CREATE CONTROLFILE REUSE DATABASE "WYDB" RESETLOGS NOARCHIVELOG
-- SET STANDBY TO MAXIMIZE PERFORMANCE
    MAXLOGFILES 10
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 1
    MAXLOGHISTORY 226
LOGFILE
  GROUP 4 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG'  SIZE 2000M, 
  GROUP 5 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO05.LOG'  SIZE 2000M,
  GROUP 6 'E:\APP\ADMINROOT\ORADATA\WYDB\REDO06.LOG'  SIZE 2000M
-- STANDBY LOGFILE
DATAFILE
  'E:\APP\ADMINROOT\ORADATA\WYDB\SYSTEM01.DBF',
  'E:\APP\ADMINROOT\ORADATA\WYDB\SYSAUX01.DBF',
  'E:\APP\ADMINROOT\ORADATA\WYDB\UNDO01.DBF',
  'E:\APP\ADMINROOT\ORADATA\WYDB\USERS01.DBF',
  'E:\APP\ADMINROOT\ORADATA\WYDB\PM_LTE_TBS.DBF',
  'E:\APP\ADMINROOT\ORADATA\WYDB\PM_GSM_TBS.DBF',
  'E:\APP\ADMINROOT\ORADATA\WYDB\SYSAUX02.DBF',
  'E:\APP\ADMINROOT\ORADATA\WYDB\SYSTEM02.DBF',
  'E:\APP\ADMINROOT\ORADATA\WYDB\AOT_TAB.DBF'
  ;
-- modify the SCN
oradebug setmypid
oradebug DUMPvar SGA kcsgscn
kcslf kcsgscn_ [149876FA0, 149876FD0) = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 49876C30 00000001
oradebug poke 0x149876FA0 8 16671892799776
BEFORE: [149876FA0, 149876FA8) = 00000000 00000000
AFTER:  [149876FA0, 149876FA8) = BA3F8120 00000F29
oradebug DUMPvar SGA kcsgscn
kcslf kcsgscn_ [149876FA0, 149876FD0) = BA3F8120 00000F29 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 49876C30 00000001
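
The BEFORE/AFTER lines print the SCN as two 32-bit hex words, low word first. A small Python sketch (the helper name is made up, and this is only a sanity check, not an Oracle tool) confirms that the poked bytes correspond to the decimal value passed to oradebug poke:

```python
def scn_from_dump_words(low_word: str, high_word: str) -> int:
    """Rebuild a decimal SCN from the two 32-bit hex words shown by oradebug."""
    return (int(high_word, 16) << 32) | int(low_word, 16)


# Words taken from the AFTER line above: BA3F8120 00000F29
scn = scn_from_dump_words("BA3F8120", "00000F29")
print(scn)                   # 16671892799776
print(int("00000F29", 16))   # 3881, the SCN wrap
```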
alter database open resetlogs;
-- started at 16:41; this time it was quicker and failed after 5 minutes
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [3881], [3124724009], [3881], [3164245129], [41943344], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [3881], [3124724008], [3881], [3164245129], [41943344], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [3881], [3124724006], [3881], [3164245129], [41943344], [], [], [], [], [], []
Process ID: 3224
Session ID: 1378 Serial number: 3
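
To see how far the SCN still has to move, the 2662 arguments can be pulled apart. The sketch below is Python with hypothetical names; it assumes the commonly described layout (current SCN wrap/base, dependent SCN wrap/base, then the data block address the dependent SCN came from) and the usual 10-bit file / 22-bit block split of a DBA:

```python
import re
from typing import List


def parse_ora600_args(line: str) -> List[int]:
    """Return the non-empty numeric bracketed arguments of an ORA-00600 line."""
    return [int(arg) for arg in re.findall(r"\[(\d+)\]", line)]


line = ("ORA-00600: internal error code, arguments: [2662], [3881], "
        "[3124724009], [3881], [3164245129], [41943344], [], [], [], [], [], []")
code, cur_wrap, cur_base, dep_wrap, dep_base, dba = parse_ora600_args(line)

current_scn = (cur_wrap << 32) + cur_base
dependent_scn = (dep_wrap << 32) + dep_base
print(dependent_scn - current_scn)   # 39521120 changes short

# Assumed DBA layout: high 10 bits = relative file number, low 22 bits = block number.
print(dba >> 22, dba & 0x3FFFFF)     # 10 304
```

Under those assumptions the block that is ahead of the database lives in file 10, and the next target base has to exceed 3164245129.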

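It happened to be June 1, Children's Day, and I managed to overwrite the alert log file by mistake in Windows Notepad. No great loss; it was this 2662 error anyway.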

-- start up to nomount state and compute the SCN value to advance to
select 3881*power(2,32)+3164247000 scn from dual;
16671932322776
-- recreate the controlfile, same content as last time
oradebug setmypid
oradebug DUMPvar SGA kcsgscn
oradebug poke 0x149876FA0 8 16671932322776
alter database open resetlogs;

-- this time the open hung; the log is as follows

Mon Jun 01 09:55:16 2020
alter database open resetlogs
Mon Jun 01 09:56:44 2020
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Mon Jun 01 09:57:05 2020
RESETLOGS after incomplete recovery UNTIL CHANGE 16671892799777
Mon Jun 01 09:58:47 2020
Clearing online redo logfile 4 E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG
Mon Jun 01 09:58:57 2020
Clearing online log 4 of thread 1 sequence number 0
Mon Jun 01 10:30:37 2020
Clearing online redo logfile 4 complete
Online log E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG: Thread 1 Group 4 was previously cleared
Online log E:\APP\ADMINROOT\ORADATA\WYDB\REDO05.LOG: Thread 1 Group 5 was previously cleared
Online log E:\APP\ADMINROOT\ORADATA\WYDB\REDO06.LOG: Thread 1 Group 6 was previously cleared
Mon Jun 01 10:30:59 2020
Setting recovery target incarnation to 2
Mon Jun 01 10:31:33 2020
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Mon Jun 01 10:32:03 2020
Assigning activation ID 1284286079 (0x4c8ca67f)
Thread 1 opened at log sequence 1
  Current log# 4 seq# 1 mem# 0: E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG
Successful open of redo thread 1
Mon Jun 01 10:32:12 2020
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Jun 01 10:32:16 2020
SMON: enabling cache recovery
Mon Jun 01 10:32:49 2020
[2964] Successfully onlined Undo Tablespace 5.
Undo initialization finished serial:0 start:1364258804 end:1364267618 diff:8814 (88 seconds)
Dictionary check beginning
Mon Jun 01 10:33:14 2020
Tablespace 'TEMP3' #10 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Mon Jun 01 10:34:32 2020
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
Mon Jun 01 10:34:33 2020
SMON: enabling tx recovery
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:

         ALTER TABLESPACE  ADD TEMPFILE

         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP3
*********************************************************************
Updating character set in controlfile to ZHS16GBK
SMON: Restarting fast_start parallel rollback
Mon Jun 01 10:34:36 2020
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_p000_4428.trc  (incident=300395):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\incident\incdir_300395\wydb_p000_4428_i300395.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Jun 01 10:34:47 2020
No Resource Manager plan active
Mon Jun 01 10:35:11 2020
**********************************************************
WARNING: Files may exists in db_recovery_file_dest
that are not known to the database. Use the RMAN command
CATALOG RECOVERY AREA to re-catalog any such files.
If files cannot be cataloged, then manually delete them
using OS command.
One of the following events caused this:
1. A backup controlfile was restored.
2. A standby controlfile was restored.
3. The controlfile was re-created.
4. db_recovery_file_dest had previously been enabled and
   then disabled.
**********************************************************
replication_dependency_tracking turned off (no async multimaster replication found)
Mon Jun 01 10:35:23 2020
Dumping diagnostic data in directory=[cdmp_20200601103523], requested by (instance=1, osid=4428 (P000)), summary=[incident=300395].
Mon Jun 01 10:35:38 2020
Block recovery from logseq 1, block 3 to scn 16671932322800
Recovery of Online Redo Log: Thread 1 Group 4 Seq 1 Reading mem 0
  Mem# 0: E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG
Mon Jun 01 10:35:48 2020
Starting background process QMNC
Mon Jun 01 10:35:48 2020
QMNC started with pid=41, OS id=2316 
Block recovery completed at rba 1.20.16, scn 3881.3164247025
Block recovery from logseq 1, block 53 to scn 16671932322858
Mon Jun 01 10:35:59 2020
Recovery of Online Redo Log: Thread 1 Group 4 Seq 1 Reading mem 0
  Mem# 0: E:\APP\ADMINROOT\ORADATA\WYDB\REDO04.LOG
Mon Jun 01 10:36:29 2020
Block recovery completed at rba 1.54.16, scn 3881.3164247083
Mon Jun 01 10:36:38 2020
Sweep [inc][300395]: completed
Sweep [inc2][300395]: completed
Mon Jun 01 10:36:38 2020
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Mon Jun 01 10:36:38 2020
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_smon_6580.trc  (incident=300339):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\incident\incdir_300339\wydb_smon_6580_i300339.trc
Mon Jun 01 10:36:39 2020
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_mmon_5520.trc  (incident=300355):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Incident details in: E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\incident\incdir_300355\wydb_mmon_5520_i300355.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Jun 01 10:36:40 2020
Dumping diagnostic data in directory=[cdmp_20200601103640], requested by (instance=1, osid=6580 (SMON)), summary=[incident=300339].
Mon Jun 01 10:36:51 2020
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
SMON: Parallel transaction recovery slave got internal error
SMON: Downgrading transaction recovery to serial
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_smon_6580.trc  (incident=300340):
ORA-00600: internal error code, arguments: [4137], [1.9.1621727], [0], [0], [], [], [], [], [], [], [], []
Incident details in: E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\incident\incdir_300340\wydb_smon_6580_i300340.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Jun 01 10:36:53 2020
Dumping diagnostic data in directory=[cdmp_20200601103653], requested by (instance=1, osid=6580 (SMON)), summary=[incident=300340].
Mon Jun 01 10:38:54 2020
ORACLE Instance wydb (pid = 13) - Error 600 encountered while recovering transaction (1, 9).
Errors in file E:\APP\ADMINROOT\diag\rdbms\wydb\wydb\trace\wydb_smon_6580.trc:
ORA-00600: internal error code, arguments: [4137], [1.9.1621727], [0], [0], [], [], [], [], [], [], [], []
Mon Jun 01 10:38:56 2020
Dumping diagnostic data in directory=[cdmp_20200601103856], requested by (instance=1, osid=6580 (SMON)), summary=[abnormal process termination].
Mon Jun 01 10:44:45 2020
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x2905AE00] [PC:0xFABC18, kgegpa()+38]
Dump file e:\app\adminroot\diag\rdbms\wydb\wydb\trace\alert_wydb.log
Mon Jun 01 10:45:21 2020
ORACLE V11.2.0.4.0 - 64bit Production vsnsta=0
vsnsql=16 vsnxtr=3
Windows NT Version V6.1 Service Pack 1 
CPU                 : 10 - type 8664, 10 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:83157M/131071M, Ph+PgF:214478M/262141M 
VM name             : VMWare Version (6)

Mon Jun 01 10:45:21 2020
Errors in file 
ORA-07445: caught exception [ACCESS_VIOLATION] at [kgegpa()+38] [0x0000000000FABC18]
Mon Jun 01 10:49:47 2020
Dumping diagnostic data in directory=[cdmp_20200601104947], requested by (instance=1, osid=5520 (MMON)), summary=[incident=300355].
Mon Jun 01 10:55:23 2020
Process 0x0000000B10D4D800 appears to be hung while dumping
Current time = 136562830, process death time = 136502581 interval = 60000
Attempting to kill process 0x0000000B10D4D800 with OS pid = 2964
Mon Jun 01 10:55:45 2020
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x5B19A4D4] [PC:0xFABC18, kgegpa()+38]
Mon Jun 01 10:55:45 2020
Errors in file E:\app\adminroot\diag\rdbms\wydb\wydb\cdump\wydbcore.log
ORA-07445: caught exception [ACCESS_VIOLATION] at [kgegpa()+38] [0x0000000000FABC18]
Mon Jun 01 11:03:01 2020
Warning: skgmdetach - Unable to register unmap, error 4210
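
With this many internal errors in one open attempt, a quick tally helps to see which ones dominate. A minimal sketch in Python (the function name and the regular expression are my own, not an Oracle utility), fed with lines copied from the log above:

```python
import re
from collections import Counter

# The first bracketed number after "ORA-00600:" on a line is the error's first argument.
ORA600_FIRST_ARG = re.compile(r"ORA-00600: .*?\[(\d+)\]")


def tally_ora600(alert_text: str) -> Counter:
    """Count ORA-00600 lines in alert log text, keyed by their first argument."""
    return Counter(m.group(1) for m in ORA600_FIRST_ARG.finditer(alert_text))


# ORA-00600 lines copied from the alert log excerpt above.
sample = "\n".join([
    "ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []",
    "ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []",
    "ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []",
    "ORA-00600: internal error code, arguments: [4137], [1.9.1621727], [0], [0], [], [], [], [], [], [], [], []",
    "ORA-00600: internal error code, arguments: [4137], [1.9.1621727], [0], [0], [], [], [], [], [], [], [], []",
])
print(tally_ora600(sample).most_common())
```

For a real file, the text could be read with `open(path, encoding=...)` first; the encoding of a Windows alert log with Chinese messages is something to check rather than assume.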

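-- to be continued tomorrow
It hung and I could no longer log in to manage it, so I decided to reboot the host and, while at it, see whether the host gets any faster after a reboot.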
