Hang Manager (HM)
has existed since 10.2.0.1. The main goal of Hang Manager is to reliably
detect, and if hang resolution is enabled, resolve hangs in a timely manner.
Over various releases, Hang Manager has been enhanced along with the wait event
infrastructure on which it relies. However, it is only in 11.2.0.2 that Hang
Manager actually resolves any hangs by terminating sessions and/or processes.
This is the default operation in 11.2.0.2. Hang Manager will not terminate an
instance unless the resolution scope, which is controlled by the initialization
parameter _HANG_RESOLUTION_SCOPE, is set to INSTANCE. By default, this
parameter is set to PROCESS.
Hang Manager is only
active when RAC is enabled, that is, CLUSTER_DATABASE = TRUE. Also, the
parameters _HANG_DETECTION_ENABLED and _HANG_RESOLUTION_SCOPE control hang
detection and resolution respectively. Currently, Hang Manager does not operate
on a non-RAC enabled instance.
DIA0 process is responsible for hang detection and deadlock resolution.
NOT TO BE CONFUSED WITH
DIAG process:
diagnosability process, performs diagnostic dumps and executes oradebug
commands
When DIA0 process kills off a process for hang management,
ORA 32701 is thrown, and this indicates that "Hang Management" is in use.
Hang can occur due
to variety of reasons:
- application related
- resource related
- internal blockage in Oracle kernel
- other reason
'Resolvable Hangs' int the incident trace file indicate one or more hangs that were found and identify the
final blocking session and instance on which they occurred. If the current hang resolution state is 'PROCESS',
any hangs requiring
session or process termination are automatically
resolved by DIA0 process
>11.2.0.2, Hang Management will try to
resolve the hang by terminating the ultimate blocker
If 32701 occured, it
is important to determine the cause of the hang.
In the example below, the DIA0
terminated session sid:3768 with serial # 19475 (ospid:73677)
for terminating
blocker process (ospid: 73677 sid: 3768 ser#: 19475) of
hang with
-----------------------------------------------------------------------------------------
ORA-32701: Possible hangs up to hang ID=16 detected
Incident details in: /ade/b/3518352270/oracle/log/diag/rdbms/mainrac/mainrac1/incident/incdir_81/mainrac1_dia0_20207_i81.trc
DIA0 requesting termination of session sid:44 with serial # 1 (ospid:5964) on instance 2
due to a GLOBAL, HIGH confidence hang with ID=16.
Hang Resolution Reason: Although hangs of this root type are typically
self-resolving, the previously ignored hang was resolved.
DIA0 could not resolve a GLOBAL, HIGH confidence hang with ID=16.
-----------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
Errors in file /u01/app/oracle/diag/rdbms/dm02cdb1/orcl2/trace/orcl2_dia0_87406.trc (incident=31458) (PDBNAME=CDB$ROOT):
ORA-32701: Possible hangs up to hang ID=86 detected
Incident details in: /u01/app/oracle/diag/rdbms/dm02cdb1/orcl2/incident/incdir_31458/orcl2_dia0_87406_i31458.trc
Thu Dec 31 21:59:43 2015
Sweep [inc][31458]: completed
Sweep [inc2][31458]: completed
Thu Dec 31 21:59:44 2015
DIA0 terminating blocker (ospid: 73677 sid: 3768 ser#: 19475) of hang with ID = 86
requested by master DIA0 process on instance 2
Hang Resolution Reason: Previously ignored hang - hang did not self-resolve
as was expected. Automatic hang resolution will be performed.
by terminating session sid:3768 with serial # 19475 (ospid:73677)
DIA0 successfully terminated session sid:3768 with serial # 19475 (ospid:73677) with status 0.
DIA0 successfully resolved a GLOBAL, HIGH confidence hang with ID=86.
-----------------------------------------------------------------------------------------
*** 2015-12-31 21:59:40.213
Resolvable Hangs in the System
Root Chain Total Hang
Hang Hang Inst Root #hung #hung Hang Hang Resolution
ID Type Status Num Sess Sess Sess Conf Span Action
----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------
86 HANG RSLNPEND 2 3768 3 3 HIGH GLOBAL Terminate Process
Hang Resolution Reason: Previously ignored hang - hang did not self-resolve
as was expected. Automatic hang resolution will be performed.
Inst Sess Ser Proc Wait Wait
Num ID Num OSPID Name Time(s) Event
----- ------ ----- --------- ----- ------- -----
PDBID PDBNm
----- ---------------
1 4528 13156 28552 FG 704 library cache lock
11 MODERNIZATION_DEV_12
1 2770 52684 22975 FG 50 library cache pin
11 MODERNIZATION_DEV_12
2 3768 19475 73677 FG 1183 not in wait
11 MODERNIZATION_DEV_12
Dumping process info of pid[65.73677] (sid:3768, ser#:19475)
requested by master DIA0 process on instance 2.
-----------------------------------------------------------------------------------------
*** 2015-12-31 21:59:40.213
[TOC00004]
Process diagnostic dump for oracle@dm02dbadm02.ttiinc.com, OS id=73677,
pid: 65, proc_ser: 187, sid: 3768, sess_ser: 19475
-------------------------------------------------------------------------------
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 22:59:39 ]
NOTE: scheduling delay has not been sampled for 0.476107 secs
0.000000 secs from [ 22:59:35 - 22:59:40 ], 5 sec avg
0.000000 secs from [ 22:58:40 - 22:59:40 ], 1 min avg
0.000000 secs from [ 22:54:40 - 22:59:40 ], 5 min avg
*** 2015-12-31 21:59:41.679
loadavg : 2.31 2.47 2.23
System user time: 0.03 sys time: 0.00 context switch: 17233
Memory (Avail / Total) = 75526.99M / 257965.57M
Swap (Avail / Total) = 23089.19M / 24576.00M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 R oracle 73677 1 99 80 0 - 19898980 ? 21:39 ? 00:19:45 oracleorcl2 (LOCAL=NO)