Friday, January 29, 2016

Oracle Hang Management

Hang Manager (HM) has existed since 10.2.0.1. Its main goal is to reliably detect hangs and, if hang resolution is enabled, to resolve them in a timely manner. Over various releases, Hang Manager has been enhanced along with the wait event infrastructure on which it relies. However, it is only in 11.2.0.2 that Hang Manager actually resolves hangs by terminating sessions and/or processes. This is the default operation in 11.2.0.2. Hang Manager will not terminate an instance unless the resolution scope, which is controlled by the initialization parameter _HANG_RESOLUTION_SCOPE, is set to INSTANCE. By default, this parameter is set to PROCESS.

Hang Manager is only active when RAC is enabled, that is, CLUSTER_DATABASE = TRUE; it currently does not operate on a non-RAC instance. The parameters _HANG_DETECTION_ENABLED and _HANG_RESOLUTION_SCOPE control hang detection and hang resolution, respectively.

The DIA0 process is responsible for hang detection and deadlock resolution.

NOT TO BE CONFUSED WITH
the DIAG process: the diagnosability process, which performs diagnostic dumps and executes oradebug commands.

When the DIA0 process kills a process for hang management,
ORA-32701 is raised, which indicates that "Hang Management" is in use.
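
A quick way to check whether Hang Management has fired is to scan the alert log for ORA-32701. The following is a minimal sketch, not an Oracle-supplied tool: the function name find_hang_incidents is hypothetical, and the regular expressions assume the message wording seen in the alert log excerpts in this post, which may differ in other releases.

```python
import re

# Patterns assume the alert log wording quoted in this post.
ORA_32701 = re.compile(r"ORA-32701: Possible hangs up to hang ID=(\d+) detected")
INCIDENT = re.compile(r"Incident details in: (\S+)")


def find_hang_incidents(alert_log_text):
    """Return a list of (hang_id, incident_trace_path) tuples.

    incident_trace_path is None when no "Incident details in:" line
    follows the ORA-32701 line before the next ORA-32701 or end of text.
    """
    results = []
    pending_id = None
    for line in alert_log_text.splitlines():
        match = ORA_32701.search(line)
        if match:
            # Flush a previous ORA-32701 that never got an incident path.
            if pending_id is not None:
                results.append((pending_id, None))
            pending_id = int(match.group(1))
            continue
        match = INCIDENT.search(line)
        if match and pending_id is not None:
            results.append((pending_id, match.group(1)))
            pending_id = None
    if pending_id is not None:
        results.append((pending_id, None))
    return results
```

The incident trace path returned for each hang ID is the file to open first, since it contains the 'Resolvable Hangs' section for that hang.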

Hangs can occur for a variety of reasons:
 - application related
 - resource related
 - internal blockage in the Oracle kernel
 - other reasons

The 'Resolvable Hangs' section in the incident trace file lists one or more hangs that were found and identifies the final blocking session and the instance on which each occurred. If the current hang resolution scope is 'PROCESS',
any hangs requiring session or process termination are automatically resolved by the DIA0 process.
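
The summary row of the 'Resolvable Hangs' section can be pulled apart with a few lines of Python. This is only a sketch under stated assumptions: the column order is taken from the trace excerpt shown later in this post, the names ResolvableHang and parse_resolvable_hang_row are hypothetical, and the layout may differ between Oracle versions.

```python
from typing import NamedTuple, Optional


class ResolvableHang(NamedTuple):
    hang_id: int
    hang_type: str
    status: str
    root_inst: int        # instance number of the root (final) blocker
    root_sess: int        # session ID of the root blocker
    chain_hung_sess: int  # hung sessions in this chain
    total_hung_sess: int  # hung sessions in total
    confidence: str
    span: str
    action: str


def parse_resolvable_hang_row(line: str) -> Optional[ResolvableHang]:
    """Parse one data row of the 'Resolvable Hangs' table, or return None."""
    # The first nine columns are single tokens; the resolution action
    # (for example "Terminate Process") may contain spaces.
    parts = line.split(None, 9)
    if len(parts) < 10 or not parts[0].isdigit():
        return None
    try:
        return ResolvableHang(
            int(parts[0]), parts[1], parts[2],
            int(parts[3]), int(parts[4]), int(parts[5]), int(parts[6]),
            parts[7], parts[8], parts[9].strip(),
        )
    except ValueError:
        # Some other table row that happened to start with a number.
        return None
```

Header lines, separator lines, and the per-session wait rows return None, so the function can be applied to every line of the trace file.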

From 11.2.0.2 onward, Hang Management will try to resolve the hang by terminating the ultimate blocker.

If ORA-32701 occurred, it is important to determine the cause of the hang.
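
As a starting point for that investigation, the DIA0 messages in the alert log can be grouped per hang ID to show which process was terminated and whether the hang was actually resolved. The sketch below is illustrative only: summarize_hangs is a hypothetical helper, and its patterns cover just the message forms quoted in this post.

```python
import re

# Patterns assume the DIA0 message wording quoted in this post.
BLOCKER = re.compile(
    r"DIA0 terminating blocker \(ospid: (\d+) sid: (\d+) ser#: (\d+)\) "
    r"of hang with ID = (\d+)"
)
OUTCOME = re.compile(
    r"DIA0 (successfully resolved|could not resolve) a (\w+), (\w+) "
    r"confidence hang with ID=(\d+)"
)


def summarize_hangs(alert_log_text):
    """Return {hang_id: details} built from DIA0 alert log messages."""
    hangs = {}
    for line in alert_log_text.splitlines():
        match = BLOCKER.search(line)
        if match:
            ospid, sid, serial, hang_id = map(int, match.groups())
            hangs.setdefault(hang_id, {})["blocker"] = {
                "ospid": ospid, "sid": sid, "serial": serial,
            }
            continue
        match = OUTCOME.search(line)
        if match:
            verdict, span, confidence, hang_id = match.groups()
            entry = hangs.setdefault(int(hang_id), {})
            entry["resolved"] = (verdict == "successfully resolved")
            entry["span"] = span
            entry["confidence"] = confidence
    return hangs
```

Hangs whose entry has resolved set to False are the ones DIA0 could not clear and deserve attention first.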

In the second example below, DIA0 terminated session sid:3768 with serial # 19475 (ospid:73677),
the blocker process (ospid: 73677 sid: 3768 ser#: 19475) of the hang with ID = 86.


-----------------------------------------------------------------------------------------
ORA-32701: Possible hangs up to hang ID=16 detected
Incident details in: /ade/b/3518352270/oracle/log/diag/rdbms/mainrac/mainrac1/incident/incdir_81/mainrac1_dia0_20207_i81.trc
DIA0 requesting termination of session sid:44 with serial # 1 (ospid:5964) on instance 2
  due to a GLOBAL, HIGH confidence hang with ID=16.
  Hang Resolution Reason: Although hangs of this root type are typically
  self-resolving, the previously ignored hang was resolved.
DIA0 could not resolve a GLOBAL, HIGH confidence hang with ID=16.
-----------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
Errors in file /u01/app/oracle/diag/rdbms/dm02cdb1/orcl2/trace/orcl2_dia0_87406.trc  (incident=31458) (PDBNAME=CDB$ROOT):
ORA-32701: Possible hangs up to hang ID=86 detected
Incident details in: /u01/app/oracle/diag/rdbms/dm02cdb1/orcl2/incident/incdir_31458/orcl2_dia0_87406_i31458.trc
Thu Dec 31 21:59:43 2015
Sweep [inc][31458]: completed
Sweep [inc2][31458]: completed
Thu Dec 31 21:59:44 2015
DIA0 terminating blocker (ospid: 73677 sid: 3768 ser#: 19475) of hang with ID = 86
    requested by master DIA0 process on instance 2
    Hang Resolution Reason: Previously ignored hang - hang did not self-resolve
   as was expected.  Automatic hang resolution will be performed.
    by terminating session sid:3768 with serial # 19475 (ospid:73677)
DIA0 successfully terminated session sid:3768 with serial # 19475 (ospid:73677) with status 0.
DIA0 successfully resolved a GLOBAL, HIGH confidence hang with ID=86.

-----------------------------------------------------------------------------------------
*** 2015-12-31 21:59:40.213
Resolvable Hangs in the System
                     Root       Chain Total               Hang              
  Hang Hang          Inst Root  #hung #hung  Hang   Hang  Resolution        
    ID Type Status   Num  Sess   Sess  Sess  Conf   Span  Action            
 ----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------
    86 HANG RSLNPEND    2  3768     3     3   HIGH GLOBAL Terminate Process 
 Hang Resolution Reason: Previously ignored hang - hang did not self-resolve
    as was expected.  Automatic hang resolution will be performed.

  Inst  Sess   Ser             Proc  Wait    Wait
   Num    ID    Num      OSPID  Name Time(s) Event
  ----- ------ ----- --------- ----- ------- -----
        PDBID PDBNm          
        ----- ---------------
      1   4528 13156     28552    FG     704 library cache lock
           11 MODERNIZATION_DEV_12
      1   2770 52684     22975    FG      50 library cache pin
           11 MODERNIZATION_DEV_12
      2   3768 19475     73677    FG    1183 not in wait
           11 MODERNIZATION_DEV_12

Dumping process info of pid[65.73677] (sid:3768, ser#:19475)
    requested by master DIA0 process on instance 2.

-----------------------------------------------------------------------------------------

*** 2015-12-31 21:59:40.213
[TOC00004]
Process diagnostic dump for oracle@dm02dbadm02.ttiinc.com, OS id=73677,
pid: 65, proc_ser: 187, sid: 3768, sess_ser: 19475
-------------------------------------------------------------------------------
os thread scheduling delay history: (sampling every 1.000000 secs)
  0.000000 secs at [ 22:59:39 ]
    NOTE: scheduling delay has not been sampled for 0.476107 secs
  0.000000 secs from [ 22:59:35 - 22:59:40 ], 5 sec avg
  0.000000 secs from [ 22:58:40 - 22:59:40 ], 1 min avg
  0.000000 secs from [ 22:54:40 - 22:59:40 ], 5 min avg

*** 2015-12-31 21:59:41.679
loadavg : 2.31 2.47 2.23
System user time: 0.03 sys time: 0.00 context switch: 17233
Memory (Avail / Total) = 75526.99M / 257965.57M
Swap (Avail / Total) = 23089.19M /  24576.00M
F S UID         PID   PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 R oracle    73677      1 99  80   0 - 19898980 ?   21:39 ?        00:19:45 oracleorcl2 (LOCAL=NO)




