Thursday, February 13, 2020

MongoDB 4.2 Hybrid Index Build

Earlier (before 4.2) -
  Foreground Index Build - Most performant, but locks the entire database for the duration of the index build. No reads or writes are permitted.
  Background Index Build - Slower. Incremental approach: periodically locks the database, but yields to incoming read/write operations. If the index is larger than available RAM, a background build can take much longer than a foreground build.
  Another downside is that the index structure resulting from a background build is less efficient than the one resulting from a foreground build.

Hybrid Index Build - Best of both worlds: the performance of a foreground index build with the non-locking property of a background index build.
The resulting index structure is not degraded the way it is with a background build.

Under the hood

Every collection in WiredTiger is stored as a table. All collection files and index files in the dbPath are backed by WT table objects.

collection-4--7758868473387840549.wt
collection-6--6388518888314681728.wt
index-1--6388518888314681728.wt
index-1--7758868473387840549.wt

Aside from the clearly identifiable collection and index table files, MongoDB uses some internal tables to write index keys during an index build, plus a temporary WT table that stages writes before they are inserted into the target collection or index table.

1. Take exclusive lock on the collection and create 2 temporary tables for index creation.
    These are visible in the dbPath for the duration of the index build.
2. Remove the exclusive lock and apply a weaker lock on the collection.
3. Start collection scan.
  - During the collection scan, all index keys are generated and fed into an external sorter, similar to a foreground index build.
  - During this time, index keys from concurrent inserts are side-written into a temporary table.
    Documents are written to the collection as normal. Only index keys are written to the temp table.
4. After completing collection scan, keys are indexed in order.
   - Temp table is drained of the index keys, and index is created with ordered index keys from temp table.
5. If the index being created is a unique index, duplicate key violations are checked.
   - The second temp table is used to keep track of duplicate keys.
   - Constraint violations are checked only at the end of the index build process, and an error is returned if any are found.
6. The temp tables are removed and locks are released.
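
The steps above can be sketched as a toy in-memory model. This is only an illustration of the ordering (scan and sort, side-write concurrent inserts, drain, then check uniqueness at the end), not MongoDB's actual implementation. The function name hybridIndexBuild and the sample documents are hypothetical.

```javascript
// Toy model of the hybrid build steps. NOT MongoDB's implementation.
function hybridIndexBuild(existingDocs, concurrentInserts, field, unique) {
  const toEntry = (doc) => ({ key: doc[field], id: doc._id });
  const byKey = (a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0);

  // Step 3: the collection scan generates index keys and sorts them.
  const sorted = existingDocs.map(toEntry).sort(byKey);

  // Step 3 (concurrent): keys from inserts arriving during the scan are
  // staged in a side table instead of going straight into the index.
  const sideTable = concurrentInserts.map(toEntry);

  // Step 4: load the sorted keys, then drain the side table into the index.
  const index = sorted.concat(sideTable).sort(byKey);

  // Step 5: for a unique index, violations are reported only at the end.
  if (unique) {
    const duplicates = [];
    for (let i = 1; i < index.length; i++) {
      if (index[i].key === index[i - 1].key) duplicates.push(index[i].key);
    }
    if (duplicates.length > 0) {
      throw new Error("duplicate keys: " + duplicates.join(", "));
    }
  }
  return index.map((entry) => entry.key);
}

const existing = [{ _id: 1, sku: "b" }, { _id: 2, sku: "d" }];
const arrivedDuringScan = [{ _id: 3, sku: "a" }, { _id: 4, sku: "c" }];
const keys = hybridIndexBuild(existing, arrivedDuringScan, "sku", true);
console.log(keys);
```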
 

Sunday, February 2, 2020

GeoJSON

A format for encoding geographic data structures.

location : {
  "type" : "Point",
  "coordinates" : [12.2, 13.1]
}
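
GeoJSON type names are case-sensitive ("Point"), and coordinates list longitude first, then latitude. A minimal sketch of a helper that builds such a value and checks the valid ranges; makePoint and the sample document are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: builds a GeoJSON Point after validating the ranges.
function makePoint(longitude, latitude) {
  // The negated form also rejects NaN and non-numeric input.
  if (!(longitude >= -180 && longitude <= 180)) {
    throw new RangeError("longitude must be between -180 and 180");
  }
  if (!(latitude >= -90 && latitude <= 90)) {
    throw new RangeError("latitude must be between -90 and 90");
  }
  // GeoJSON order is [longitude, latitude].
  return { type: "Point", coordinates: [longitude, latitude] };
}

const place = { name: "sample place", location: makePoint(12.2, 13.1) };
console.log(JSON.stringify(place));
```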

GeoJSON supports the following geometries -
 - Point
 - MultiPoint
 - Polygon
 - MultiPolygon
 - LineString
 - MultiLineString
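
GeoJSON also defines these object types -
 - GeometryCollection (a geometry made up of other geometries)
 - Feature (a geometry with attached properties)
 - FeatureCollection (a list of Features)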

Geospatial Indexes 

2dsphere - supports queries that calculate geometries on an earth-like sphere
     db.collection.createIndex( { <location field> : "2dsphere" } )
2d - supports queries that calculate geometries on a 2D plane
     db.collection.createIndex( { <location field> : "2d" } )

These indexes cannot cover a query.
These indexes cannot be used as a shard key.

The following geospatial operations are supported on sharded collections -
  - $geoNear aggregation stage
  - $near and $nearSphere query operators (starting in MongoDB 4.0)

geoHaystack - optimised to return results over a small area. Improves performance of queries that use flat geometry. For queries using spherical geometry, it is better to use 2dsphere indexes.
A geoHaystack index requires the first field to be the location field.
It creates buckets of documents from the same geographic area to improve query performance.
Each bucket contains all documents within a specified proximity to a given longitude and latitude.
It is sparse by default.

https://docs.mongodb.com/manual/tutorial/query-a-geohaystack-index/#geospatial-indexes-haystack-queries


Query Selectors

$geoIntersects - selects geometries that intersect with the given geometry
    Only the 2dsphere index supports this. Does not require a geospatial index.

$geoWithin - selects geometries within a bounding geometry. Both 2dsphere and 2d indexes support this
     $geometry, $centerSphere
   
     $centerSphere takes its radius in radians.
     3963.2 miles - radius of earth. Divide the distance in miles by this to get the radius in radians
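
The miles-to-radians conversion can be sketched as a small query builder. The helper names, the "location" field, and the places collection in the comment are assumptions made for illustration.

```javascript
// Equatorial radius of the earth in miles, as noted above.
const EARTH_RADIUS_MILES = 3963.2;

// $centerSphere expects the radius in radians: distance / earth radius.
function milesToRadians(miles) {
  return miles / EARTH_RADIUS_MILES;
}

// Hypothetical helper: builds a $geoWithin / $centerSphere filter document.
function withinMiles(field, longitude, latitude, miles) {
  return {
    [field]: {
      $geoWithin: {
        $centerSphere: [[longitude, latitude], milesToRadians(miles)]
      }
    }
  };
}

const centerSphereQuery = withinMiles("location", -73.93, 40.82, 5);
// In the mongo shell this could be passed as: db.places.find(centerSphereQuery)
console.log(JSON.stringify(centerSphereQuery));
```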

$near - returns geospatial objects in proximity to a point. Both 2dsphere and 2d indexes support this

$nearSphere - returns geospatial objects in proximity to a point on a sphere
Both 2dsphere and 2d indexes support this
      $geometry, $minDistance, $maxDistance
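
A minimal sketch of a $nearSphere filter that uses $geometry with $minDistance and $maxDistance (meters for a GeoJSON point). The helper name, the field name, and the places collection in the comment are hypothetical.

```javascript
// Hypothetical helper: builds a $nearSphere filter around a GeoJSON point.
// Distances are in meters because the point is given as GeoJSON.
function nearSphereQuery(field, longitude, latitude, minMeters, maxMeters) {
  return {
    [field]: {
      $nearSphere: {
        $geometry: { type: "Point", coordinates: [longitude, latitude] },
        $minDistance: minMeters,
        $maxDistance: maxMeters
      }
    }
  };
}

const nearQuery = nearSphereQuery("location", -73.9667, 40.78, 1000, 5000);
// In the mongo shell this could be passed as: db.places.find(nearQuery)
console.log(JSON.stringify(nearQuery));
```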


$geoNear - Aggregation Framework
   - near .. GeoJSON point
   - spherical .. must be true if using a 2dsphere index
   - maxDistance, minDistance .. meters (when near is a GeoJSON point)
   - distanceField
   - distanceMultiplier

Multiply miles by 1609.34 to convert to meters
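
A sketch of a $geoNear stage that takes its limit in miles and reports distances in miles, using the 1609.34 factor in both directions. The helper name, the "distanceMiles" output field, and the places collection in the comment are assumptions for illustration.

```javascript
const METERS_PER_MILE = 1609.34;

// Hypothetical helper: builds a $geoNear stage around a GeoJSON point.
function geoNearStage(longitude, latitude, maxMiles) {
  return {
    $geoNear: {
      near: { type: "Point", coordinates: [longitude, latitude] },
      spherical: true,
      // maxDistance is in meters for a GeoJSON point.
      maxDistance: maxMiles * METERS_PER_MILE,
      // Computed distances are written to this field (name is made up).
      distanceField: "distanceMiles",
      // Distances come back in meters; scale them to miles.
      distanceMultiplier: 1 / METERS_PER_MILE
    }
  };
}

const pipeline = [geoNearStage(-73.99279, 40.719296, 2)];
// In the mongo shell this could be passed as: db.places.aggregate(pipeline)
console.log(JSON.stringify(pipeline));
```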

Geometry Specifiers -

$box - specifies a rectangular box using legacy coord pairs for $geoWithin. 2d index

$center - specifies a circle using legacy coord pairs for $geoWithin. 2d index

$centerSphere - specifies a circle using either legacy coord pairs or a GeoJSON object with $geoWithin.
2d or 2dsphere index.
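
A minimal sketch of how the legacy-coordinate specifiers $box and $center slot into $geoWithin. Both use planar units of the coordinate system. The helper names and the "loc" field are hypothetical.

```javascript
// Hypothetical helpers for legacy coordinate pairs (planar geometry).

// $box takes the bottom-left and upper-right corners of the rectangle.
function withinBox(field, bottomLeft, upperRight) {
  return { [field]: { $geoWithin: { $box: [bottomLeft, upperRight] } } };
}

// $center takes the circle's center and a radius in coordinate units.
function withinCircle(field, center, radius) {
  return { [field]: { $geoWithin: { $center: [center, radius] } } };
}

const boxQuery = withinBox("loc", [0, 0], [100, 100]);
const circleQuery = withinCircle("loc", [-74, 40.74], 10);
// In the mongo shell: db.places.find(boxQuery) or db.places.find(circleQuery)
console.log(JSON.stringify(boxQuery));
console.log(JSON.stringify(circleQuery));
```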

$geometry - specifies a geometry in GeoJSON format for use with geospatial query operators.
Uses EPSG:4326 as the default Coordinate Reference System (CRS).

$geometry: {
   type: "<GeoJSON object type>",
   coordinates: [ <coordinates> ]
}

s.find({
  "o": { $eq: "99e01530-4538-4e26-b834-0072011af7ed" },
  "t": { $gte: 0, $lte: 176829836394000 },
  "l": {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        // one closed ring: the first and last positions are identical
        coordinates: [
          [
            [59.907542457804084, 86.79600024595857],
            [59.907542457804084, 80.64036540314555],
            [66.908002791926265, 80.64036540314555],
            [66.908002791926265, 86.79600024595857],
            [59.907542457804084, 86.79600024595857]
          ]
        ]
      }
    }
  }
})
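
The polygon in the query above is a rectangle written out as a closed ring. A sketch of a helper that produces such a ring from the corner values; boxToPolygon is a hypothetical name, and the shortened coordinates are sample values.

```javascript
// Hypothetical helper: turns a bounding box into a GeoJSON Polygon.
// A polygon ring must be closed, so the first position is repeated last.
function boxToPolygon(minLng, minLat, maxLng, maxLat) {
  return {
    type: "Polygon",
    coordinates: [
      [
        [minLng, maxLat],
        [minLng, minLat],
        [maxLng, minLat],
        [maxLng, maxLat],
        [minLng, maxLat]
      ]
    ]
  };
}

const polygon = boxToPolygon(59.9075, 80.6404, 66.908, 86.796);
const polygonFilter = { l: { $geoWithin: { $geometry: polygon } } };
console.log(JSON.stringify(polygonFilter));
```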


$maxDistance - specifies max distance to limit the results of $near and $nearSphere (meters for a GeoJSON point, radians for legacy coord pairs).
2d and 2dsphere indexes.

$minDistance - specifies min distance in meters to limit the results of $near and $nearSphere.
2dSphere indexes only

$polygon - specifies polygon in legacy coord pairs for $geoWithin. uses planar geometry.
2d Index. Can even query without 2d Index.

$uniqueDocs - deprecated. Modifies $geoWithin and $near queries so that a document matching the query multiple times is returned only once.




https://docs.mongodb.com/manual/reference/operator/query-geospatial/