一个重要的生产系统近期出现负载严重不均衡的情况,在业务忙时,一个节点CPU负载99%,而同时另一个节点CPU却很空闲。两个节点的活动会话如下。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
INST_ID STATUS COUNT(*) ------- -------- --------------- 1 ACTIVE 7 1 ACTIVE 6 1 ACTIVE 4 1 ACTIVE 4 1 ACTIVE 3 1 ACTIVE 2 1 ACTIVE 1 1 ACTIVE 1 2 ACTIVE 1 1 ACTIVE 1 1 ACTIVE 1 1 ACTIVE 1 1 ACTIVE 1 1 ACTIVE 1 2 ACTIVE 1 |
可见很不正常,绝大多数会话都连到了1号节点!
再看非活动会话情况,如下。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 |
INST_ID STATUS COUNT(*) ------- -------- --------------- 2 SNIPED 3 2 SNIPED 2 2 KILLED 922 2 KILLED 559 2 KILLED 406 2 KILLED 254 2 KILLED 4 2 KILLED 4 2 KILLED 2 2 KILLED 1 2 KILLED 1 2 KILLED 1 1 INACTIVE 202 1 INACTIVE 162 1 INACTIVE 147 1 INACTIVE 132 1 INACTIVE 60 1 INACTIVE 44 2 INACTIVE 41 2 INACTIVE 41 1 INACTIVE 36 1 INACTIVE 33 2 INACTIVE 28 1 INACTIVE 25 1 INACTIVE 24 1 INACTIVE 21 2 INACTIVE 20 1 INACTIVE 16 2 INACTIVE 13 1 INACTIVE 12 1 INACTIVE 10 2 INACTIVE 8 2 INACTIVE 8 1 INACTIVE 8 1 INACTIVE 8 1 INACTIVE 8 1 INACTIVE 6 2 INACTIVE 4 1 INACTIVE 3 1 INACTIVE 3 1 INACTIVE 3 2 INACTIVE 3 1 INACTIVE 2 1 INACTIVE 2 2 INACTIVE 2 2 INACTIVE 2 1 INACTIVE 2 1 INACTIVE 2 1 INACTIVE 2 1 INACTIVE 2 1 INACTIVE 2 1 INACTIVE 2 1 INACTIVE 2 2 INACTIVE 1 2 INACTIVE 1 1 INACTIVE 1 1 INACTIVE 1 1 INACTIVE 1 2 INACTIVE 1 1 INACTIVE 1 1 INACTIVE 1 2 INACTIVE 1 2 INACTIVE 1 |
可见大量的非活动会话,大部分都连接到了2号节点,据此再次印证了RAC集群的负载均衡机制只是简单的round-robin轮询模式,只是简单根据会话总数判断下一个连接的节点选择。本系统客户端都采用server端的load-banlance机制,因此应该确保两个节点在活动状态下,再启动weblogic等长连接应用,确保长连接应用平均分配到2个节点,这样才可以确保后续的大量短连接应用能够平均分布到两个节点!
附会话查询语句如下。
1 2 3 4 5 6 |
SELECT s.inst_id, s.username, s.machine, s.osuser, s.program, s.status, COUNT(*) FROM gv$session s WHERE 1 = 1 AND s.username NOT IN ('TESTUPDATE') GROUP BY s.inst_id, s.username, s.machine, s.osuser, s.program, s.status ORDER BY 6 DESC, 7 DESC; |
针对killed、snipped会话,应该kill session,如果kill不掉,需要kill 进程,使用如下语句。
1 2 3 4 5 6 7 8 9 10 |
COL todo FOR A80 SELECT s.username, s.status, s.machine, s.program, paddr, 'alter system kill session ''' || s.sid || ',' || s.serial# || ''' immediate;' todo FROM gv$session s WHERE s.username IN ('TESTUPDATE'); SELECT 'kill -9 ' || spid FROM gv$process WHERE addr = '0000001101AD75C8'; |
历史只会眷顾坚定者、奋进者、搏击者,而不会等待犹豫者、懈怠者、畏难者