Thursday, February 13, 2020

MongoDB 4.2 Hybrid Index Build

Before 4.2 -
  Foreground Index Build - Fastest, but holds an exclusive lock on the entire database for the duration of the index build. No reads or writes are permitted.
  Background Index Build - Slower. Builds the index incrementally: it periodically locks the database but yields to incoming read/write operations. If the index is larger than available RAM, a background build can take much longer than a foreground build.
  Another downside is that the index structure resulting from a background build is less efficient than the one resulting from a foreground build.

Hybrid Index Build - Best of both worlds: the performance of a foreground index build with the non-locking behavior of a background index build.
The resulting index structure matches that of a foreground build.
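
Since 4.2 uses the hybrid process for every index build, the old background flag no longer selects a build type (4.2 ignores it if specified). As a minimal sketch, the helper below assembles a createIndexes command document without that flag. The helper name, the "users" collection and the "email" field are made up for illustration. With a driver such as PyMongo, a document like this could be passed to db.command(...).

```python
def build_create_indexes_command(collection, keys, unique=False):
    """Return a createIndexes command document for a single index.

    keys is a list of (field, direction) pairs, e.g. [("email", 1)].
    """
    # Default-style index name: each field and direction joined by "_".
    name = "_".join(f"{field}_{direction}" for field, direction in keys)
    spec = {"key": dict(keys), "name": name}
    if unique:
        spec["unique"] = True
    # No "background" option: in 4.2 every build uses the hybrid process.
    return {"createIndexes": collection, "indexes": [spec]}


# Hypothetical collection and field names.
cmd = build_create_indexes_command("users", [("email", 1)], unique=True)
```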

Under the hood

Every collection in WiredTiger (WT) is stored as a table. All collection files and index files in the dbPath are backed by WT table objects, for example:

collection-4--7758868473387840549.wt
collection-6--6388518888314681728.wt
index-1--6388518888314681728.wt
index-1--7758868473387840549.wt
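
The naming pattern in the listing above can be used to group the files in a dbPath. The following is a rough sketch that assumes only the "collection-..." and "index-..." prefixes shown here. The function name is made up, and anything that does not match the pattern (WiredTiger's own metadata file, internal tables) is simply placed under "other".

```python
import re

# Matches names such as "collection-4--7758868473387840549.wt".
WT_NAME = re.compile(r"^(collection|index)-(\d+)-(-?\d+)\.wt$")


def classify_wt_files(filenames):
    """Group dbPath file names into collection, index and other tables."""
    groups = {"collection": [], "index": [], "other": []}
    for name in filenames:
        match = WT_NAME.match(name)
        kind = match.group(1) if match else "other"
        groups[kind].append(name)
    return groups


files = [
    "collection-4--7758868473387840549.wt",
    "collection-6--6388518888314681728.wt",
    "index-1--6388518888314681728.wt",
    "index-1--7758868473387840549.wt",
    "WiredTiger.wt",
]
groups = classify_wt_files(files)
```

In practice the list of names could come from os.listdir on the dbPath.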

Aside from the clearly identifiable collection and index table files, MongoDB uses some internal tables: tables that hold index keys during an index build, and a temporary WT table that stages writes before they are inserted into the target collection or index table. The hybrid build uses them as follows:

1. Take an exclusive lock on the collection and create 2 temporary tables for the index build.
   - These are visible in the dbPath for the duration of the index build.
2. Release the exclusive lock and take a weaker lock on the collection.
3. Start the collection scan.
   - During the scan, index keys are generated and fed to an external sorter, as in a foreground index build.
   - Meanwhile, index keys from concurrent inserts are side-written to a temporary table.
     Documents are written to the collection as normal; only their index keys go to the temp table.
4. After the collection scan completes, the keys are indexed in sorted order.
   - The temp table is drained of its index keys, and the index is completed with the ordered keys from the temp table.
5. If the index being created is a unique index, duplicate key violations are checked.
   - The second temp table keeps track of duplicate keys.
   - Constraint violations are checked only at the end of the index build, and an error is returned if any are found.
6. The temp tables are removed and the locks are released.
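
The steps above can be sketched as a toy, single-threaded model. This is not MongoDB's implementation: locks are left out, plain Python lists stand in for the external sorter and the two temp tables, and the function name, the "age" field and the sample documents are made up for illustration.

```python
def hybrid_index_build(collection, concurrent_inserts, field, unique=False):
    """Toy model of a hybrid build; returns sorted (key, _id) index entries."""
    side_writes = []  # temp table 1: keys from writes arriving during the scan
    duplicates = []   # temp table 2: duplicate keys found for a unique index

    # Step 3: scan the collection and feed the keys to a sorter.
    sorter = [(doc[field], doc["_id"]) for doc in collection]

    # Step 3, concurrently: documents go to the collection as normal,
    # and only their index keys go to the side-writes table.
    for doc in concurrent_inserts:
        collection.append(doc)
        side_writes.append((doc[field], doc["_id"]))

    # Step 4: index the scanned keys in sorted order, then drain the temp table.
    index = sorted(sorter)
    while side_writes:
        index.append(side_writes.pop(0))
    index.sort()

    # Step 5: record duplicates, and report them only at the very end.
    if unique:
        for (prev_key, _), (key, _) in zip(index, index[1:]):
            if prev_key == key and key not in duplicates:
                duplicates.append(key)
        if duplicates:
            raise ValueError(f"duplicate keys: {duplicates}")

    # Step 6: the temp tables go away when the function returns.
    return index


docs = [{"_id": 1, "age": 30}, {"_id": 2, "age": 25}]
late = [{"_id": 3, "age": 27}]  # arrives while the scan is running
index = hybrid_index_build(docs, late, "age")
```

In this model the late insert ends up both in the collection and in the finished index, and a unique build with a repeated key fails only after all keys have been processed.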
 
