Thursday, April 17, 2014

Oracle wait events

There are various events that might induce waiting time for a database operation.
These events are known are wait events.

These wait events are classified into various wait classes based on the type of operation they affect.

The wait events can be observed using the following views:

                v$SESSION_EVENT
                v$SESSION_WAIT
                v$SYSTEM_EVENT
                v$EVENT_NAME

Following are descriptions of some important wait events:

db sequential file reads

  This wait event occurs when Oracle process wants to read a block that is not currently in the Buffer cache, and it is waiting to read the block into SGA from disk.
  A sequential read is generally a single-block read. Sequential read reads data into contiguous memory in SGA.
 
  Causes:
   
    - use of unselective index
    - fragmented index
    - High I/O on particular disk
    - bad application design
 
  Action:
   
    Faster data block identification, faster disk to memory transfer, caching of data in memory will decrease the occurrence and intensity of this wait event.
   
    Attention needs to be paid to following aspects:
   
    - check that right index is being used to access the table
    - check the column order of index with the WHERE clause in SQL
    - use partitioning to reduce the number of blocks visited
    - statistics should be up to date
    - relocate HOT datafiles
    - consider using KEEP pool to cache frequently accessed indexes/table
    - check the execution plans to ascertain the access paths, join methods   
 
db file scattered reads

  This wait happens when a session is waiting for multi-block IO to complete.
  A scattered read reads multiple blocks and scatters them into different buffers in SGA.
  Typically happens during Full Table Scans and Index Fast Full Scans.
  The number of consecutive blocks read at one time is determined by parameter,    
   DB_FILE_MULTIBLOCK_READ_COUNT.
 
  Causes:
   
    - Full Table Scans
    - Index Fast Full Scans
 
  Actions:
 
    - If an application has been running fine for some time, and there has been no code 
      change, and this wait event starts to show up, check if one or more indexes have
      been dropped
    - optimize multiblock read count by setting appropriate 
       DB_FILE_MULTIBLOCK_READ_COUNT
    - Consider caching frequently accessed indexes/tables in KEEP pool
    - use partitioning to reduce the number of blocks visited
    - optimize SQL with the goal to reduce the number of physical and logical reads
    - check if an Index range scan or unique scan can be done instead of FTS or IFFS
    - make sure statistics are up to date

buffer busy waits
 
  This wait happens when a sessions wants to access a block in Buffer cache but cant because the block is busy.
  A buffer must be PINNED before it can be read/modified. Only one process can pin a buffer at one time.
  This event represents read/read, read/write and write/write contention.
  The reason code is present in the P3 column of v$SESSION_WAIT.
 
  Causes:
 
    - another session is reading the block into the buffer
    - another session holds the block in an incompatible mode to our request
   
    It can get intensified if
    - the block size is large, thus block contains a high number of rows
    - lot of sessions are asking for the same blocks
 
  Actions:
 
    Main method to reduce the buffer busy waits is to reduce the total IO on system
   
    Data Blocks
     
      - eliminate hot blocks from application (probably by providing redundancy)
      - check for repeatedly scanned, unselective indexes
      - try rebuilding object with higher PCTFREE so as to reduce the 
         number of rows per block
      - use automatic segment space management
   
    Segment Header
   
      - increase the number of FREELISTs
      - use automatic segment space management
   
    Undo Blocks
   
      - increase number of rollback segments
      - use automatic undo management
   
   
log file parallel write

  This wait is the LGWR wait that occurs when the OS writes contents of the redo buffer to the online log files.
  LGWR does a group write for multiple commits/rollbacks, hence, parallel write.
 
  Causes:
 
    - large amount of redo being generated
    - slow IO subsystem
   
  Actions:
 
    - reduce amount of redo being generated
    - don't leave tablespaces in HOT backup mode longer than necessary
    - use faster disks for redo log files
    - ensure that disks holding the archived logs and those holding online redo logs are 
       separate to reduce LGWR, ARCn contention
    - consider using NOLOGGING option
 
log file sync

  This wait is logged against the oracle foreground process that is waiting for COMMIT/ROLLBACK to complete
 
  Actions:
 
    - reduce number of commits by batching transactions, so that there are fewer 
       COMMIT operations

free buffer waits

  This wait happens when we are waiting for free buffer in the buffer cache but there are none as there are too many dirty buffers
 
  Causes:
   
    - either buffer cache is too small or DBWn is too slow in flushing bocks to disk
    - DBWn is unable to keep up to the write requests
    - Checkpoints are happening too fast. This in turn could be due to
       
        - high database activity
        - small Online Redo Log file size
       
    - Large sorts and FTS's are filling the buffer cache faster than 
      DBWn is able to write to disk
    - if number of blocks that need to be written to disk is greater than 
      number of blocks that DBWn can write in one batch
 
  Actions:
 
    - reduce checkpoint frequency, increase size of redo logfiles
    - check the V$DB_CACHE_ADVICE to optimally size the DB buffer
    - ensure hotspots don't exists by spreading datafiles over disks and disk controllers
    - pre-sorting, reorganization of data can help
   
enqueue waits

  This wait events indicates a wait for a lock that in held by other session/s in a mode incompatible to the requested mode.
  There are over 60 types of enqueue waits. Can be viewed through V$ENQUEUE_STAT.
  
Major among them are:
 
    TX - transaction lock
   
      This indicates contention for row level lock. It is related to buffer busy waits
     
      Causes:
     
        - when a transaction tries to update/delete rows that are currently locked by 
           another session
        - usually an application issue (bad application design)
     
      Actions:
       
        - Since this is a blocking session wait, the action is to find the blocking session/s 
          and take appropriate action
       
    TM - DML enqueue lock
   
      This is related to transaction management.
     
      Causes:
     
        - unindexed foreign key columns
     
      Actions:
       
        - index the foreign key columns

LATCH:

  Latches are lightweight mechanisms that are used to serialize access to memory structures.
  It is not possible to queue for latch - a process either gets the latch or does not.
  Various types of latches can be seen through v$LATCH view.
 
  Latches apply only to memory structures in SGA. They do not apply to database objects.
  The SGA has many latches and they exist to protect various memory structures from potential corruption by concurrent access. 
  Memory structures protected by latches include things like buffer cache, java pool and library cache.
 
  A latch is requested by process in 2 modes:
   
 Immediate mode - A process asks for latch, if it fails to acquire, the control is returned to the process
Willing to wait mode - A process asks for a latch,
                                                       if it fails to acquire,
                                                          it keeps requesting it for specified number of 
                                                          times (spinning),
                                                       if it still fails to acquire,
                                                          it sleeps, then wakes up after certain specified 
                                                          time and begins the whole process again.
 
cache buffer chain latch

  This latch is acquired when searching for data blocks.
  Buffer cache is a chain of blocks and each chain is protected by a latch while it is being scanned.
 
  Causes:
   
    - when multiple sessions repeatedly access one or more blocks that are protected by
      the same child 'cache buffer chain' latch
    - sql statements with high buffer gets (logical reads) per execution
 
  Actions:
 
    - reduce logical IO rates by tuning and minimizing IO requirements of involved SQL
    - reducing number of rows per block
    - reducing block size
 
cache buffer LRU chain latch

   This latch is acquired to introduce a new block into the buffer cache based on LRU block replacement policy.
  
   Causes:
   
     - intense buffer cache activity caused by inefficient SQL
     - SQL scanning large unselective indexes, performing FTS
  
   Actions:
  
     - implement multiple buffer pools
     - increase buffer cache size
  
  
Direct path reads and direct path writes wait events are related to operations that are performed in PGA like sorting, group by operation, hash join.

direct path reads

  Direct path reads are used by oracle when reading directly into PGA bypassing SGA.
  Direct path reads are common in DSS or during heavy batch activity.
 
  Causes:
 
    - sorting operations, direct path reads usually occur in connection with 
       temporary segments
    - sql statements that require sort (order by, group by, union, distinct, rollup)
    - join methods that require sort (hash join, sort merge join)
 
  Actions:
   
    - check the sql to see if the IO can be reduced
    - set PGA_AGGREGATE_TARGET to appropriate value as advised by 
       V$PGA_TARGET_ADVICE
    - wherever possible, use Hash Join instead of Sort Merge join, and Nested Loops 
       instead of Hash Join
    - make sure optimizer selects right driving table 
 
direct path writes

  These are waits that are associated with direct write operations that write data into user's PGA to data files or to temporary segments
 
  Causes:
 
    - direct load operations ( like CTAS)
    - parallel DML operations
    - Sort IO (when a sort doesn't fit into PGA)
   
  Actions:
 
    - ensure OS asynchronous IO is configured correctly
    - ensure no disks are IO bound
   
latch free waits

  This wait indicates that the process is waiting for a latch that is currently busy.
 
  When a process is spinning, the spinning is recorded as CPU time, and when it goes to sleep, it is recorded as 'latch free wait' event.
   
library cache latch

  This latch prevents the cached sql statements and object definitions staored in the library cache within the shared pool.
  This latch must be acquired to add a statement to the library cache.
 
  This latch is to ensure that application is reusing as much SQL statement representation as possible. 
 
  Causes:
   
    - application is making heavy use of literal sql
 
  Actions:
   
    - consider increasing shared pool size, based on the shared pool advisor 
       (v$SHARED_POOL_ADVICE)
    - use bind variables wherever possible
   
shared pool latch

  This latch is used to protect critical operations when allocating and freeing memory in shared pool.
 
  Causes:
 
    - the shared pool latch waits and the library cache latch waits are generally due to intense hard parsing
 
  Actions:
 
    - avoid hard parses where possible
    - eliminate literal SQL, use bind variables
    - consider increasing shared pool size, based on the shared pool advisor 
       (v$SHARED_POOL_ADVICE)
    - try setting CURSOR_SHARING to FORCE if hard parses persist
   
row cache objects latch

  This latch comes into play when processes are attempting to access the cached data dictionary values.
 
  Generally, this latch is not a source of contention, but if there is significant wait on this latch, SHARED_POOL_SIZE needs to be corrected.

  Tuning library cache tunes the row cache indirectly.

No comments:

Post a Comment