■2021-01-14 15:07
Storage problems began: both nodes of the primary database (node1, node2) reported 'I/O error' on the shared storage.
-------node1 alert
2021-01-14T15:07:24.122930+08:00
WARNING: Write Failed. group:1 disk:3 AU:81 offset:2022912 size:4096
path:/dev/asmdisks/su001_lun04
incarnation:0xf0f0720e asynchronous result:'I/O error'
subsys:System krq:0x7fd35c49c928 bufp:0x2060dcb800 osderr1:0x69b5 osderr2:0x0
IO elapsed time: 0 usec Time waited on I/O: 0 usec
--- (lines omitted)
2021-01-14T15:23:38.253848+08:00
NOTE: updating disk modes to 0x7 from 0x5 for disk 3 (DATA3_0003) in group 3 (DATA3): lflags 0x0
NOTE: updating disk modes to 0x7 from 0x5 for disk 5 (DATA3_0005) in group 3 (DATA3): lflags 0x0
NOTE: updating disk modes to 0x7 from 0x5 for disk 7 (DATA3_0007) in group 3 (DATA3): lflags 0x0
NOTE: updating disk modes to 0x7 from 0x5 for disk 33 (DATA3_0033) in group 3 (DATA3): lflags 0x0
NOTE: updating disk modes to 0x7 from 0x5 for disk 46 (DATA3_0046) in group 3 (DATA3): lflags 0x0
NOTE: disk 3 (DATA3_0003) in group 3 (DATA3) is online for reads
NOTE: disk 5 (DATA3_0005) in group 3 (DATA3) is online for reads
NOTE: disk 7 (DATA3_0007) in group 3 (DATA3) is online for reads
NOTE: disk 33 (DATA3_0033) in group 3 (DATA3) is online for reads
NOTE: disk 46 (DATA3_0046) in group 3 (DATA3) is online for reads
--- (lines omitted)
2021-01-14T15:31:09.600620+08:00
NOTE: ASMB process initiating disk discovery for grp 1 (reqid:4751423187424713884)
NOTE: Found /dev/asmdisks/su001_lun02 for disk DATA1_0001
SUCCESS: disk DATA1_0001 (1.4042289684) replaced in diskgroup DATA1 path: /dev/asmdisks/su001_lun02
SUCCESS: completed reconfiguration of group 1 (DATA1)
NOTE: updating disk modes to 0x5 from 0x1 for disk 1 (DATA1_0001) in group 1 (DATA1): lflags 0x0
NOTE: disk 1 (DATA1_0001) in group 1 (DATA1) is online for writes
2021-01-14T15:31:25.524942+08:00
NOTE: updating disk modes to 0x7 from 0x5 for disk 1 (DATA1_0001) in group 1 (DATA1): lflags 0x0
NOTE: disk 1 (DATA1_0001) in group 1 (DATA1) is online for reads
-------node2 alert
2021-01-14T15:07:23.320330+08:00
PDBAPP(3):WARNING: Read Failed. group:1 disk:5 AU:138139 offset:3162112 size:8192
PDBAPP(3):path:/dev/asmdisks/su001_lun06
PDBAPP(3): incarnation:0xf0f0510c synchronous result:'I/O error'
PDBAPP(3): subsys:System krq:0x7f8f2bd86ec0 bufp:0x1f5a5de000 osderr1:0x69c0 osderr2:0x0
PDBAPP(3): IO elapsed time: 0 usec Time waited on I/O: 0 usec
--- (lines omitted)
2021-01-14T15:37:12.581586+08:00
IO elapsed time: 0 usec Time waited on I/O: 0 usec
--- (lines omitted)
2021-01-14T15:37:45.068876+08:00
WARNING: Read Failed. group:1 disk:20 AU:7 offset:16384 size:16384
path:/dev/asmdisks/su003_lun01
incarnation:0xf0f0511e synchronous result:'I/O error'
subsys:System krq:0x7ff4b2ec9b10 bufp:0x7ff4b2c43000 osderr1:0x69c0 osderr2:0x0 (mirrored read)
IO elapsed time: 0 usec Time waited on I/O: 0 usec
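To see which LUNs failed and which ASM disks later came back online, the alert logs can be scanned for the two message shapes shown above. The following is a minimal Python sketch, assuming the alert log text is available as a string and that the messages keep exactly the format in this excerpt; `summarize`, `IO_FAIL`, `ONLINE` and `SAMPLE` are hypothetical names for illustration, not an Oracle tool.

```python
import re
from collections import Counter

# ASM I/O failure warnings, e.g.
#   WARNING: Write Failed. group:1 disk:3 AU:81 ... path:/dev/asmdisks/su001_lun04
# The lazy ".*?" (with DOTALL) skips AU/offset/size and any "PDBAPP(3):" prefix
# before "path:". It assumes every failure message carries a path field.
IO_FAIL = re.compile(
    r"WARNING: (?P<op>Read|Write) Failed\.\s+group:(?P<group>\d+)\s+disk:(?P<disk>\d+)"
    r".*?path:(?P<path>\S+)",
    re.DOTALL,
)

# Recovery messages, e.g.
#   NOTE: disk 1 (DATA1_0001) in group 1 (DATA1) is online for writes
ONLINE = re.compile(
    r"NOTE: disk (?P<disk>\d+) \((?P<name>\w+)\) in group (?P<group>\d+) "
    r"\((?P<dg>\w+)\) is online for (?P<mode>reads|writes)"
)


def summarize(log_text):
    """Return (failure count per (operation, path), [(diskgroup, disk name, mode)])."""
    failures = Counter((m["op"], m["path"]) for m in IO_FAIL.finditer(log_text))
    online = [(m["dg"], m["name"], m["mode"]) for m in ONLINE.finditer(log_text)]
    return failures, online


# A few lines copied from the node1 and node2 excerpts above.
SAMPLE = """\
2021-01-14T15:07:24.122930+08:00
WARNING: Write Failed. group:1 disk:3 AU:81 offset:2022912 size:4096
path:/dev/asmdisks/su001_lun04
PDBAPP(3):WARNING: Read Failed. group:1 disk:5 AU:138139 offset:3162112 size:8192
PDBAPP(3):path:/dev/asmdisks/su001_lun06
NOTE: disk 1 (DATA1_0001) in group 1 (DATA1) is online for writes
NOTE: disk 1 (DATA1_0001) in group 1 (DATA1) is online for reads
"""

if __name__ == "__main__":
    failures, online = summarize(SAMPLE)
    for (op, path), count in sorted(failures.items()):
        print(f"{op:5} failures on {path}: {count}")
    for dg, name, mode in online:
        print(f"{dg}/{name} back online for {mode}")
```

Running it over the full alert logs of both nodes would give a per-LUN picture of the offline/online cycle described in the conclusion, which is easier to compare across node1 and node2 than the raw logs.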
■2021-01-14 15:37
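Monitoring on the standby began reporting errors: the primary had been unreachable for too long, which triggered a failover to the standby.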
-------1 adg
2021-01-14T15:37:30.328+08:00
Fast-Start Failover cannot proceed because: "primary last contacted this standby within FastStartFailoverThreshold seconds"
primary last contacted to instance 1 17 seconds ago
2021-01-14T15:37:33.343+08:00
Fast-Start Failover cannot proceed because: "primary last contacted this standby within FastStartFailoverThreshold seconds"
primary last contacted to instance 1 20 seconds ago
2021-01-14T15:37:36.356+08:00
Fast-Start Failover cannot proceed because: "primary last contacted this standby within FastStartFailoverThreshold seconds"
primary last contacted to instance 1 23 seconds ago
2021-01-14T15:37:39.370+08:00
Fast-Start Failover cannot proceed because: "primary last contacted this standby within FastStartFailoverThreshold seconds"
primary last contacted to instance 1 26 seconds ago
2021-01-14T15:37:42.383+08:00
Fast-Start Failover cannot proceed because: "primary last contacted this standby within FastStartFailoverThreshold seconds"
primary last contacted to instance 1 29 seconds ago
2021-01-14T15:37:46.854+08:00
FAILOVER TO wydbadg
Starting FAILOVER
-------2 adg alert
2021-01-14T15:37:45.397868+08:00
Attempting Fast-Start Failover because the threshold of 30 seconds has elapsed.
2021-01-14T15:37:46.514371+08:00
Errors in file /u01/app/oracle/diag/rdbms/wydbadg/wydb1/trace/wydb1_ora_120118.trc:
2021-01-14T15:37:46.847916+08:00
Attempting Fast-Start Failover because the threshold of 30 seconds has elapsed.
2021-01-14T15:37:46.890811+08:00
Beginning Fast-Start Failover to database wydbadg.
2021-01-14T15:37:46.915192+08:00
ALTER DATABASE FAILOVER TO wydbadg
-------3 node1 alert
2021-01-14T15:38:30.185383+08:00
Starting background process FSFP
2021-01-14T15:38:30.214348+08:00
FSFP started with pid=87, OS id=205120
2021-01-14T15:38:31.184950+08:00
Primary has heard from neither observer nor target standby within FastStartFailoverThreshold seconds. It is likely an automatic failover has already occurred.
Primary is shutting down.
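The decision visible in these logs boils down to two threshold checks, one on the standby side and one on the primary side. The following is a simplified Python model, not Oracle's implementation: it assumes FastStartFailoverThreshold is 30 seconds (as the adg alert log states), reduces the observer's view of the primary to a single boolean, infers the last primary contact as 15:37:13 from the first broker message ("17 seconds ago" at 15:37:30), and assumes the broker re-checks roughly every 3 seconds. All function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: matches "the threshold of 30 seconds" in the adg alert log.
FSFO_THRESHOLD = timedelta(seconds=30)


def standby_may_fail_over(since_primary, observer_sees_primary, threshold=FSFO_THRESHOLD):
    """Simplified standby-side check: fail over only if the observer has lost the
    primary AND the standby has not heard from the primary for at least the threshold."""
    return (not observer_sees_primary) and since_primary >= threshold


def primary_must_shut_down(since_observer, since_standby, threshold=FSFO_THRESHOLD):
    """Primary-side check as worded in the node1 alert log: shut down once it has
    heard from neither the observer nor the target standby within the threshold."""
    return since_observer >= threshold and since_standby >= threshold


# Inferred: 15:37:30 minus the "17 seconds ago" in the first broker message.
LAST_PRIMARY_CONTACT = datetime(2021, 1, 14, 15, 37, 13)
# Assumed check times: the broker messages above arrive about every 3 seconds.
CHECK_TIMES = [datetime(2021, 1, 14, 15, 37, s) for s in (30, 33, 36, 39, 42, 45)]


def replay(last_contact, check_times):
    """Return (time, whole seconds since last primary contact, failover allowed?)."""
    rows = []
    for t in check_times:
        since = t - last_contact
        allowed = standby_may_fail_over(since, observer_sees_primary=False)
        rows.append((t.strftime("%H:%M:%S"), int(since.total_seconds()), allowed))
    return rows


if __name__ == "__main__":
    for when, seconds, allowed in replay(LAST_PRIMARY_CONTACT, CHECK_TIMES):
        print(f"{when}  primary silent {seconds:2d}s  failover allowed: {allowed}")
```

In this model the standby-side condition first holds at the 15:37:45 check, consistent with the adg alert log entry at 15:37:45.397. The primary-side rule reflects why node1 shut itself down at 15:38:31 instead of continuing to run alongside the newly promoted wydbadg.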
■Analysis and conclusion
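At the time, the vendor's engineers were running a heartbeat-failure simulation test, and some of their operations may have caused the storage problem (the logs show the shared disks on both nodes going offline and then coming back online). As a result, both nodes of the primary database ran into trouble, which ultimately triggered an automatic failover from the primary to the standby.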