大量事务并发回滚彻底堵塞数据库(4)

2021年6月12日 1100点热度 0人点赞 0条评论

(接上文)
https://liking.site/2021/06/09/大量事务并发回滚彻底堵塞数据库1/
https://liking.site/2021/06/10/大量事务并发回滚彻底堵塞数据库2/
https://liking.site/2021/06/11/大量事务并发回滚彻底堵塞数据库3/
根据oratop的top等待事件排名，包括累积排名和实时排名，这个"wait for a undo record"已经成为了目标等待事件，是否它阻塞了正常的前滚事务呢？
根据MOS的搜索结果，当这个事件成为top事件的时候，原因似乎只有一个，如下所述。

CAUSE:

In parallel transaction recovery this wait-event is used in a number of places while waiting for some work.

In fast-start parallel rollback, the background process SMON acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes.
Fast start parallel rollback is mainly useful when a system has transactions that run a long time before a commit, especially parallel Inserts, Updates, Deletes operations. When SMON discovers that the amount of recovery work is above a certain threshold, it automatically begins parallel rollback by dispersing the work among several parallel processes.

There are cases where parallel transaction recovery is not as fast as serial transaction recovery because the PQ slaves may interfere with each others work by contending for the same resource. With such a transaction rollback performance may be worse in parallel when compared to a serial rollback.

Because of this contention and the perceived slowness and 'hang' like symptoms (the database may seem to hang, SMON and parallel query slaves may be seen to take all the available CPU), DBA intervention may be taken. If a large transaction is terminated, the the cleanup may also take time.

谷歌翻译如下：

在并行事务恢复中，此等待事件在等待某些工作时在许多地方使用。

在快速启动并行回滚中，后台进程 SMON 充当协调器并使用多个服务器进程并行回滚一组事务。
当系统具有在提交前运行很长时间的事务时，快速启动并行回滚主要有用，尤其是并行插入、更新、删除操作。当 SMON 发现恢复工作量超过某个阈值时，它会通过将工作分散到多个并行进程中来自动开始并行回滚。

在某些情况下，并行事务恢复不如串行事务恢复快，因为 PQ 从设备可能会通过竞争相同的资源来干扰彼此的工作。与串行回滚相比，这种事务回滚的并行性能可能更差。

由于这种争用以及感知到的缓慢和类似“挂起”的症状（数据库可能看起来挂起，可能会看到 SMON 和并行查询从属节点占用了所有可用的 CPU），DBA 可能会进行干预。如果大型事务被终止，清理也可能需要时间。

此时的解决方案只有一个，如下。

SOLUTION:

You can disable parallel rollback by setting the following parameter

fast_start_parallel_rollback = false

If this is a one-time issue, reported after rollback of a huge transaction (s), you can remove this parameter after the transaction recovery.
BEWARE: that setting this parameter dynamically can cause problems on a busy instance with a lot of active transaction work and it is safer to set this with an instance restart so as not to change the rollback strategy on active transactions.

该参数用于指定回滚的并行度，有三个值
FALSE: Parallel rollback is disabled
LOW: Limits the maximum degree of parallelism to 2 CPU_COUNT
HIGH : Limits the maximum degree of parallelism to 4 CPU_COUNT
当修改该参数时rollback会停止并重新启动

此时已经是22:20左右，执行如下。

SYS@jkdb2> show parameter fast_start_parallel_rollback
NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
fast_start_parallel_rollback         string      LOW
发现该参数为low,把他关闭，设置为FALSE

SYS@jkdb2> alter system set fast_start_parallel_rollback = FALSE scope=both sid='*';
System altered.
此时oratop监控发现数据库迅速恢复为正常状态！

大量事务并发回滚彻底堵塞数据库(4)

文章评论