oracle systemstate dump

2017年11月28日 739点热度 0人点赞 0条评论

当数据库出现严重的性能问题或挂起hang住的时候,这时候我们需要去找出问题的根源,需要查明相关进程在等什么,谁是资源持有者,即谁阻塞了别人?此时收集systemstate dump非常有用,专业的oracle支持也是需要收集这个信息来分析问题。

但往往此时,数据库服务端sqlplus连接可能会很慢,甚至无法连接,从10g开始,oracle提供了一个参数-prelim,可以在sqlplus无法连接的情况下,连接数据库。

There are two ways to connect to sqlplus using a preliminary connection.
sqlplus -prelim / as sysdba
sqlplus /nolog
set _prelim on
connect / as sysdba

下面是在metalink上的介绍如何在单机或RAC环境下做systemstate或hanganalyze

Collection commands for Hanganalyze and Systemstate: Non-RAC

Sometimes, database may actually just be very slow and not actually hanging. It is therefore recommended,  where possible to get 2 hanganalyze and 2 systemstate dumps in order to determine whether processes are moving at all or whether they are "frozen".

Hanganalyze
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
-- Wait one minute before getting the second hanganalyze
oradebug hanganalyze 3
oradebug tracefile_name
exit

Systemstate
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
oradebug dump systemstate 266
oradebug tracefile_name
exit

Collection commands for Hanganalyze and Systemstate: RAC

There are 2 bugs affecting RAC that without the relevant patches being applied on your system, make using level 266 or 267 very costly. Therefore without these fixes in place it highly unadvisable to use these level

For information on these patches see:
Document 11800959.8 Bug 11800959 - A SYSTEMSTATE dump with level >= 10 in RAC dumps huge BUSY GLOBAL CACHE ELEMENTS - can hang/crash instances 
Document 11827088.8 Bug 11827088 - Latch 'gc element' contention, LMHB terminates the instance 
Note:  both bugs are fixed in 11.2.0.3.

Collection commands for Hanganalyze and Systemstate: RAC with fixes for bug 11800959 and bug 11827088

For 11g:
sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug  unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 266
oradebug -g all dump systemstate 266
exit
Collection commands for Hanganalyze and Systemstate: RAC without fixes for Bug 11800959 and Bug 11827088
sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
oradebug -g all dump systemstate 258
exit

For 10g, run oradebug setmypid instead of oradebug setorapname reco:
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
oradebug -g all dump systemstate 258
exit
In RAC environment, a dump will be created for all RAC instances in the DIAG trace file for each instance.

实际操作时,在alertlog可以看到相关的导出信息。

systemstate dump有多个级别:
2: dump (不包括lock element)
10: dump
11: dump + global cache of RAC
256: short stack (函数堆栈)
258: 256+2 -->short stack +dump(不包括lock element)
266: 256+10 -->short stack+ dump
267: 256+11 -->short stack+ dump + global cache of RAC

level 11和 267会 dump global cache, 会生成较大的trace 文件,一般情况下不推荐。一般情况下,如果进程不是太多,推荐用266,因为这样可以dump出来进程的函数堆栈,可以用来分析进程在执行什么操作。但是生成short stack比较耗时,如果进程非常多,比如2000个进程,那么可能耗时30分钟以上。这种情况下,可以生成level 10 或者 level 258, level 258 比 level 10会多收集short short stack, 但比level 10少收集一些lock element data.
使用systemstate dump生成的trace文件可能会非常大,一般会几百兆甚至更大,如何有效解读相关信息并诊断问题是一个挑战。

liking

这个人很懒,什么都没留下

文章评论