11G DataGuard 日志同步错误处理案例-20180821

2018年8月22日 7115点热度 0人点赞 0条评论

接同事电话，ADG备库同步出了问题。
在备库查看，大量的redolog没有apply。
在备库查看GAP，竟然没有查到。

查看备库日志，当前只有1号、3号的日志不停的传了过来，唯独缺了2号的，如下：

Tue Aug 21 09:28:00 2018
RFS[16]: Selected log 22 for thread 1 sequence 91327 dbid 1638344406 branch 954674776
Tue Aug 21 09:28:06 2018
Archived Log entry 76689 added for thread 1 sequence 91326 ID 0x61a75fd3 dest 2:
Tue Aug 21 09:32:55 2018
RFS[16]: Selected log 21 for thread 1 sequence 91328 dbid 1638344406 branch 954674776
Tue Aug 21 09:32:58 2018
Archived Log entry 76690 added for thread 1 sequence 91327 ID 0x61a75fd3 dest 2:
Tue Aug 21 09:36:41 2018
RFS[16]: Selected log 22 for thread 1 sequence 91329 dbid 1638344406 branch 954674776
Tue Aug 21 09:36:42 2018
RFS[17]: Selected log 41 for thread 3 sequence 73888 dbid 1638344406 branch 954674776
Tue Aug 21 09:36:45 2018
Archived Log entry 76691 added for thread 3 sequence 73887 ID 0x61a75fd3 dest 2:
Tue Aug 21 09:36:46 2018
Archived Log entry 76692 added for thread 1 sequence 91328 ID 0x61a75fd3 dest 2:
Tue Aug 21 09:40:10 2018
RFS[16]: Selected log 21 for thread 1 sequence 91330 dbid 1638344406 branch 954674776
Tue Aug 21 09:40:21 2018
Archived Log entry 76693 added for thread 1 sequence 91329 ID 0x61a75fd3 dest 2:
Tue Aug 21 09:41:29 2018
RFS[16]: Selected log 22 for thread 1 sequence 91331 dbid 1638344406 branch 954674776
Tue Aug 21 09:41:33 2018
Archived Log entry 76694 added for thread 1 sequence 91330 ID 0x61a75fd3 dest 2:
Tue Aug 21 09:42:33 2018
RFS[16]: Selected log 21 for thread 1 sequence 91332 dbid 1638344406 branch 954674776
Tue Aug 21 09:42:35 2018
RFS[17]: Selected log 42 for thread 3 sequence 73889 dbid 1638344406 branch 954674776

看来是2号节点那边出了问题。

查看2号节点日志，果然是无法登陆adg节点，如下：

Tue Aug 21 08:41:30 2018
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_1 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_1 network reconnect abandoned
Error 16198 for archive log file 9 to 'jkdbadg'
ORA-16198: LGWR received timedout error from KSR
LGWR: Error 16198 disconnecting from destination LOG_ARCHIVE_DEST_1 standby host 'jkdbadg'
Tue Aug 21 08:41:51 2018
LGWR: Failed to archive log 9 thread 2 sequence 73164 (16198)

，，，

Tue Aug 21 08:47:13 2018
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.
      returning error ORA-16191
------------------------------------------------------------
PING[ARC2]: Heartbeat failed to connect to standby 'jkdbadg'. Error is 16191.

让同事协助重启2号节点，依然有问题，从1号节点拷贝来密码文件，命名为2号的密码文件，再次重启，查看备库的GAP，这次GAP信息对了，如下：

SYS@jkdb1> SELECT * FROM v$archive_gap;

        THREAD#   LOW_SEQUENCE#  HIGH_SEQUENCE#
--------------- --------------- ---------------
              2           73164           73213

SYS@jkdb1> SELECT thread#, MAX(sequence#) FROM v$archived_log WHERE applied = 'YES' GROUP BY thread#;

        THREAD#  MAX(SEQUENCE#)
--------------- ---------------
              1           91317
              2           73163
              3           73875

可见，2号节点的73164-73213共计50个redolog没有传过来，2号的redo只在备库应用到73163，并堵塞了1号、3号的日志应用。

只能从2号主库去找这些日志了，不幸的是，2号上的日志只保留到了73124开始往后的，如下：

select name from v$archived_log where name like '+FRA/jkdb/archivelog/2018_08_21/thread_2%' order by sequence#;
NAME
----------------------------------------------------------------------------------------------------
+FRA/jkdb/archivelog/2018_08_21/thread_2_seq_73214.4526.984744477
+FRA/jkdb/archivelog/2018_08_21/thread_2_seq_73215.4486.984744967
+FRA/jkdb/archivelog/2018_08_21/thread_2_seq_73216.2786.984745033
+FRA/jkdb/archivelog/2018_08_21/thread_2_seq_73217.1013.984745537
+FRA/jkdb/archivelog/2018_08_21/thread_2_seq_73218.3876.984745721
，，，

可见，前面的日志已经在备份后删除了，只能从备份恢复出来，再搞到备库。

此处先暂停2号节点的自动备份与删除任务，恢复日志备份，如下：

restore archivelog from logseq 73164 until logseq 73213 thread 2;
单通道恢复太慢！改为多通道，效率提升明显，如下：

run {
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
allocate channel c4 type disk;

restore archivelog from logseq 73170 until logseq 73213 thread 2;

release channel c1;
release channel c2;
release channel c3;
release channel c4;
}

耗时几个小时!
此时查看可见恢复了这些日志：
select name from v$archived_log where name like '+FRA/jkdb/archivelog/2018_08_21/thread_2%' order by sequence#;
此处显示结果略。

将这些短缺的redolog拷贝到备库，如下：

run {
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
allocate channel c4 type disk;
copy archivelog '+FRA/jkdb/archivelog/2018_08_21/thread_2_seq_73164.3169.984774029' to '/nfs/bak/archivelog/2018_08_21/thread_2_seq_73164.3169.984774029';
copy archivelog '+FRA/jkdb/archivelog/2018_08_21/thread_2_seq_73165.2819.984774871' to '/nfs/bak/archivelog/2018_08_21/thread_2_seq_73165.2819.984774871';
copy archivelog '+FRA/jkdb/archivelog/2018_08_21/thread_2_seq_73166.3733.984776431' to '/nfs/bak/archivelog/2018_08_21/thread_2_seq_73166.3733.984776431';
,,,
copy archivelog '+FRA/jkdb/archivelog/2018_08_21/thread_2_seq_73211.3202.984780775' to '/nfs/bak/archivelog/2018_08_21/thread_2_seq_73211.3202.984780775';
copy archivelog '+FRA/jkdb/archivelog/2018_08_21/thread_2_seq_73212.3220.984782767' to '/nfs/bak/archivelog/2018_08_21/thread_2_seq_73212.3220.984782767';
copy archivelog '+FRA/jkdb/archivelog/2018_08_21/thread_2_seq_73213.3700.984778461' to '/nfs/bak/archivelog/2018_08_21/thread_2_seq_73213.3700.984778461';

release channel c1;
release channel c2;
release channel c3;
release channel c4;
}

又耗费几个小时，汗。

然后再备库注册这些日志，如下：

alter database register logfile '/nfs/bak/archivelog/2018_08_21/thread_2_seq_73164.3169.984774029';
alter database register logfile '/nfs/bak/archivelog/2018_08_21/thread_2_seq_73165.2819.984774871';
alter database register logfile '/nfs/bak/archivelog/2018_08_21/thread_2_seq_73166.3733.984776431';
,,,
alter database register logfile '/nfs/bak/archivelog/2018_08_21/thread_2_seq_73211.3202.984780775';
alter database register logfile '/nfs/bak/archivelog/2018_08_21/thread_2_seq_73212.3220.984782767';
alter database register logfile '/nfs/bak/archivelog/2018_08_21/thread_2_seq_73213.3700.984778461';

需要注意的是，注册时提示已经注册了。但是备库并没有主动开始应用这些日志。

在备库关闭同步，再打开同步，让它自己慢慢apply吧！预计至少几个小时吧，因为截至此时，已经积攒了600多个日志待apply。