Without Haste: MongoDB Notes

MongoDB is a document database (i.e. a NoSQL database).

It is advertised as cross-platform, providing high performance, high availability, and easy scalability.

Schemaless

MongoDB is a schemaless database. Every document can have a different structure, if you want.

Relationships

MongoDB does not enforce relationships between Collections or Documents; there are no foreign key constraints.
If you want data to be used together, save it as one document.

Instead of relying on foreign keys, just duplicate the data wherever it is needed.
MongoDB says disk space is cheap.

A database contains multiple Collections.

Each database on a server has its own section of the file system.

A collection contains multiple Documents.

A collection is analogous to a SQL Table.

Dynamic Schema

MongoDB Collections do not enforce a schema by default, meaning that documents in the same collection do not have to have the same fields.
Fields with the same Key do not need to contain the same data types.

Design

What to divide into multiple Collections, and what to keep in one Collection?

Put multiple types in one Collection:
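- A find query cannot join Collections. A join needs an aggregation pipeline with $lookup (MongoDB 3.2 or later).
- Aggregations run against one Collection; pulling in other Collections needs an explicit $lookup or $unionWith stage.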

One type per Collection:
- To deserialize, you must know what type you are dealing with.
- If mixing types in a Collection, make sure they can be easily differentiated. Maybe by a "type" field.
- Indexes on the Collection will be updated whether or not the new/edited records contain the field being indexed.
- If the Indexed fields are all shared fields, that's an indication these types should be in the same Collection.
- If only part of a Collection is frequently updated, it should probably be its own Collection. Otherwise you'll get write conflicts.
- Ex: An "auction item" has an array of "bids" under it. Now it's hard for people to all save their bids because every bid writes to the same document and the writes block each other.
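The "type" field suggestion above can be sketched in plain JavaScript, with no database involved. The field name "type" and the two document shapes ("customer", "order") are made up for illustration.

```javascript
// Hypothetical documents of two different types stored in one Collection.
const mixedDocs = [
    { _id: "1", type: "customer", fullName: "Steve" },
    { _id: "2", type: "order", customerId: "1", total: 25 },
];

// Choose how to read each document based on its discriminator field.
function describe(doc) {
    switch (doc.type) {
        case "customer":
            return "customer " + doc.fullName;
        case "order":
            return "order for customer " + doc.customerId;
        default:
            throw new Error("unknown type: " + doc.type);
    }
}

const descriptions = mixedDocs.map(describe);
console.log(descriptions);
```

Without the "type" field, the code would have to guess the type from which fields happen to be present.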

Many suggestions online say not to nest data more than 1 level deep. But that probably depends on how variable the structure is and how precise a programmer you are.
- Ex: Don't have an array of objects which also contain arrays. At least, not when you'll want to search by those values.

Collection-Level Operations

count the documents
you can enter any "find" query into the method


db.getCollection("name").countDocuments({"field":"value"});

the raw integer result is printed at the end of the console output

insert multiple documents into collection


var allDocs=
[
    { "_id":"1" },
    { "_id":"2", "parentDocId":"1" },
];
db.collection_name.insertMany(allDocs); //insert() still works in older shells but is deprecated

remove all documents from collection


db.collection_name.deleteMany({}); //remove({}) still works in older shells but is deprecated

A document contains multiple Key/Value Pairs called Fields.

A document is analogous to a SQL Row/Record.

JSON

Documents are displayed, edited, etc. as JSON objects.

Ex:


{
    _id: ObjectId("7df78ad8902c"), //shortened here; a real ObjectId is 24 hex characters
    title: 'Test',
    comments: [
        {
            user: 'Steve',
            comment: 'Test comment'
        }
    ]
}

Default Key

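MongoDB provides a default key Field called "_id".
You can specify the _id when inserting a document, or allow MongoDB to generate it.
The default _id is an ObjectId made up of a 4-byte timestamp, a 5-byte random value unique to the machine and process, and a 3-byte incrementing counter. (Older versions used a machine id and process id in place of the random value.)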
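Because the timestamp is stored in the _id itself, the creation time can be read back out. This is a rough sketch in plain JavaScript, assuming the current ObjectId layout (4-byte timestamp, 5-byte random value, 3-byte counter; older versions split the middle bytes into a machine id and process id). The function name parseObjectId is made up; in the shell, ObjectId values already offer getTimestamp().

```javascript
// Split a 24-character ObjectId hex string into its three parts.
function parseObjectId(hex) {
    if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
        throw new Error("expected 24 hex characters");
    }
    // First 4 bytes: seconds since the Unix epoch.
    const seconds = parseInt(hex.slice(0, 8), 16);
    return {
        timestamp: new Date(seconds * 1000),
        random: hex.slice(8, 18),                 // next 5 bytes, kept as hex
        counter: parseInt(hex.slice(18, 24), 16), // last 3 bytes
    };
}

const parsed = parseObjectId("507f1f77bcf86cd799439011");
console.log(parsed.timestamp.toISOString(), parsed.random, parsed.counter);
```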

Mongo's query language is called MQL.

Comments


//comment out a line of MQL

Find

Find returns a cursor over the matching documents. The shell prints the first batch of them.

Where field equals X:


db.getCollection('MyCollection').find({"myField": "X"})
db.getCollection('MyCollection').find({"myObject.myField": "X"})
db.getCollection('MyCollection').find({"myField": UUID("12345678-1234-1234-1234-123456789012")})

Where field exists:


db.getCollection('MyCollection').find({"myObject.myField":{$exists: true}})

If any link in the path to the field does not exist, then the field does not exist.
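The path rule can be modeled in plain JavaScript. This is only a sketch of the idea, not how MongoDB implements $exists; it ignores arrays, and the function name pathExists is made up.

```javascript
// Walk a dotted path one key at a time.
function pathExists(doc, path) {
    let current = doc;
    for (const key of path.split(".")) {
        const isObject = current !== null && typeof current === "object";
        if (!isObject || !(key in current)) {
            return false; // a missing link means the field does not exist
        }
        current = current[key];
    }
    return true;
}

console.log(pathExists({ myObject: { myField: "X" } }, "myObject.myField"));
console.log(pathExists({ myObject: {} }, "myObject.myField"));
console.log(pathExists({}, "myObject.myField"));
```

Note that a field holding null still exists; only a missing key counts as not existing.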

And


db.getCollection('MyCollection').find({$and: [{"myFieldA": "A"}, {"myFieldB": "B"}]})

Array is at least 1-element long


db.getCollection('MyCollection').find({'myArray.0': {$exists: true}})

(indexing starts at 0)

Array is exactly 2-elements long


db.getCollection('MyCollection').find({'myArray': {$size: 2}})

Sort

Sort by age descending:


db.collection.find().sort( { age: -1 } )

Limit

Return just the first X records.


db.collection.find().sort( { age: -1 } ).limit(50)

If sort gives you an "exceeded memory limit" error, add a limit to the number of results.

Distinct


db.getCollection('customers').distinct('firstName')

Returns all the distinct values of the field "firstName" from collection "customers".

String Operations

String minus last two characters:


myField: { $substrCP: [ "$originField", 0, { $subtract: [ { $strLenCP: "$originField" }, 2 ] } ] }

Last two characters of string:


myField: { $substrCP: [ "$originField", { $subtract: [ { $strLenCP: "$originField" }, 2 ] }, 2 ] }
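To check what the two expressions above should produce, here is a plain JavaScript model. It counts code points the way $strLenCP does, so a multi-byte character counts as one. The function names are made up, and the model is more forgiving than the pipeline expressions may be for strings shorter than two characters.

```javascript
// Array.from splits a string into code points rather than bytes.
function withoutLastTwo(text) {
    const points = Array.from(text);
    return points.slice(0, Math.max(points.length - 2, 0)).join("");
}

function lastTwo(text) {
    return Array.from(text).slice(-2).join("");
}

console.log(withoutLastTwo("hello"));
console.log(lastTwo("hello"));
```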

String contains


db.getCollection('customers').find({fullName: { $regex: '.*Steve.*' } })
db.getCollection('customers').find({fullName: { $regex: /.*Steve.*/ } })

Capitalization matters. For a case-insensitive match, add the "i" option: /steve/i or { $regex: 'steve', $options: 'i' }.

Aggregate

Count results (returns integer):


db.getCollection('MyCollection').find({"myField": "X"}).count()

Aggregation Pipeline

Most stages (array elements) in an aggregation pipeline can be mixed up in any order, repeated, etc.
Ex: You can have three "match" stages, then a "replaceRoot", then another "match".

Just like find:


db.getCollection('Customers').aggregate([
    { $match: { _id: UUID("customer's uuid") } }
])

Find, then raise a nested document to be the new root of each result:


//given this customer format
{
    _id: UUID("uuid"),
    age: 35,
    address: {
        street: "street",
        city: "city",
        state: "state"
    }
}


db.getCollection('Customers').aggregate([
    { $match: { _id: UUID("customer's uuid") } },
    { $replaceRoot: { newRoot: "$address" } } //pulls all of address up to be the root
])


//results in
{
    street: "street",
    city: "city",
    state: "state"
}


db.getCollection('Customers').aggregate([
    { $match: { _id: UUID("customer's uuid") } },
    { $replaceRoot: { newRoot: { age: "$age", city: "$address.city" } } } //flattens different levels together
])


//results in
{
    age: 35,
    city: "city"
}

Group by:
Start with the group by id, then add as many aggregations as you want.


db.getCollection('Customers').aggregate([
    { $group: { _id: "$idField", arrayA: { $addToSet: { "fieldA":"$aValue", "fieldB":"$bValue" } } } }
])

A group with a multi-part key


db.getCollection('Customers').aggregate([
    { $group: { _id: { a: "$a", b: "$b" } } }
])

"addToSet" creates an array of unique values


db.getCollection('Customers').aggregate([
    { $group: { _id: "$idField", arrayA: { $addToSet: "$aField" }, arrayB: { $addToSet: "$bField" } } }
])

Don't look for a DISTINCT stage in aggregation pipelines; there isn't one. Group by is the only option.
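What group by with "addToSet" does can be modeled in plain JavaScript with a Map of Sets. This is only a sketch with made-up input documents; it handles simple values only, and MongoDB does not promise any particular order for the groups or for the values inside each set.

```javascript
// Hypothetical input documents.
const customers = [
    { idField: "x", aField: 1 },
    { idField: "x", aField: 1 },
    { idField: "x", aField: 2 },
    { idField: "y", aField: 3 },
];

// Group by idField and collect the unique aField values of each group.
function groupAddToSet(docs) {
    const groups = new Map();
    for (const doc of docs) {
        if (!groups.has(doc.idField)) {
            groups.set(doc.idField, new Set());
        }
        groups.get(doc.idField).add(doc.aField);
    }
    return Array.from(groups, ([id, values]) => ({ _id: id, arrayA: Array.from(values) }));
}

const grouped = groupAddToSet(customers);
console.log(JSON.stringify(grouped));
```

Grouping with no extra aggregations at all gives just the distinct keys, which is the stand-in for DISTINCT.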

AddFields:
Add a new field to the documents


{ $addFields: { <newField>: <expression>, ... } }

Project to filter an array:


db.getCollection('MyCollection').aggregate([
{
    "$project" : {
        "field_to_keep_as_is": 1,
        "my_filtered_array" : {
            "$filter" : {
                "input" : "$my_array",
                "as" : "my_array", /*defaults to "this"*/
                "cond" : {
                    /*$eq: ["$$my_array.some_field", "some_value" ]*/ /*if you want to check for a value*/
                    $not: ["$$my_array.field_that_might_not_exist"] /*$exists doesn't work in here*/
                }
            }
        }
    }
}
]);

Simplify an array


db.getCollection('poc_agencies').aggregate([
    { $project: {  
            "array_of_strings": {
                $reduce: {
                    input: "$input_array_of_objects",
                    initialValue: [],
                    in: { $concatArrays: [ "$$value", ["$$this.keep_just_this_field"] ] }
                }
            }
        } 
    }
]);

- $$value refers to the current accumulated value
- $$this refers to the next array element being operated on
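The same shape exists in plain JavaScript as Array.prototype.reduce, which may make $$value and $$this easier to picture. The input documents here are made up.

```javascript
// Hypothetical array of objects.
const inputArrayOfObjects = [
    { keep_just_this_field: "a", other: 1 },
    { keep_just_this_field: "b", other: 2 },
];

// value plays the role of $$value, element plays the role of $$this.
const arrayOfStrings = inputArrayOfObjects.reduce(
    (value, element) => value.concat([element.keep_just_this_field]),
    [] // same role as initialValue
);

console.log(arrayOfStrings);
```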

Unwind, to break an array into individual objects


db.getCollection('MyCollection').aggregate([
{
    $unwind: "$my_array"
}
]);
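A plain JavaScript model of unwind, as a sketch only. It assumes the default behavior, where documents with a missing, null, or empty array produce no output. The function name unwind and the sample documents are made up.

```javascript
// Emit one copy of the document per array element.
function unwind(docs, field) {
    return docs.flatMap((doc) => {
        const values = doc[field];
        if (values === undefined || values === null) {
            return []; // missing or null: the document is dropped
        }
        const list = Array.isArray(values) ? values : [values];
        // An empty array maps to nothing, so that document is dropped too.
        return list.map((value) => ({ ...doc, [field]: value }));
    });
}

const unwound = unwind(
    [
        { _id: 1, my_array: ["a", "b"] },
        { _id: 2, my_array: [] },
    ],
    "my_array"
);
console.log(JSON.stringify(unwound));
```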

Flatten a recursive lookup


db.getCollection('agencies').aggregate([
    { $graphLookup: {
            from: 'agencies', //name of the collection to search
            startWith: '$_id', //expression to start with, probably the same as connectFromField but with a $ symbol
            connectFromField: '_id', //this field in parent
            connectToField: 'parentAgencyId', //connects to this field in child
            as: 'descendantAgencyIds', //put all the results into an array named this
            maxDepth: 100 //stop recursive lookup at this depth, to avoid infinite loops
        } 
    }
])
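The recursion can be pictured with a plain JavaScript model. This is a sketch under assumptions, not the server's algorithm: the sample agencies are made up, and maxDepth 0 is taken to mean "direct children only".

```javascript
// Hypothetical agencies; parentAgencyId points at the parent's _id.
const agencies = [
    { _id: "root", parentAgencyId: null },
    { _id: "child", parentAgencyId: "root" },
    { _id: "grandchild", parentAgencyId: "child" },
    { _id: "other", parentAgencyId: null },
];

// Collect every descendant of startId, one level per loop pass.
function findDescendants(docs, startId, maxDepth) {
    const found = [];
    const seen = new Set(); // guards against cycles in the data
    let frontier = [startId];
    for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
        const next = [];
        for (const doc of docs) {
            if (frontier.includes(doc.parentAgencyId) && !seen.has(doc._id)) {
                seen.add(doc._id);
                found.push(doc);
                next.push(doc._id);
            }
        }
        frontier = next;
    }
    return found;
}

const descendantIds = findDescendants(agencies, "root", 100).map((doc) => doc._id);
console.log(descendantIds);
```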

Overwrite an entire collection with the results of this pipeline


db.getCollection('input_collection_name').aggregate([
    { $out : { db: "database_name", coll: "output_collection_name" } }
])

$out must be the last stage in the pipeline


db.collection.updateOne()
db.collection.updateMany()
db.collection.update()


db.runCommand(
   {
      update: <collection>,
      updates: [
         {
           q: <query>,
           u: <document or pipeline>,
           upsert: <boolean>,
           multi: <boolean>,
           collation: <document>,
           arrayFilters: <array>,
           hint: <document|string>
         },
         ...
      ],
      ordered: <boolean>,
      writeConcern: { <write concern> },
      bypassDocumentValidation: <boolean>,
      comment: <any>
   }
)

Comment

Optional.

A user-provided comment to attach to this command. Once set, this comment appears alongside records of this command in the following locations:
- mongod log messages
- database profiler output
- currentOp output

Ordered

Optional. Defaults to true.

True: when an update statement fails, return without performing the remaining update statements.
False: when an update fails, continue with the remaining update statements, if any.
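The difference can be modeled in plain JavaScript. This is a sketch only; each "statement" is a made-up function that either succeeds or throws, and no database is involved.

```javascript
// Run a batch of hypothetical statements, ordered or unordered.
function runBatch(statements, ordered) {
    const results = [];
    for (const statement of statements) {
        try {
            statement();
            results.push("ok");
        } catch (error) {
            results.push("failed");
            if (ordered) {
                break; // ordered: skip the remaining statements
            }
        }
    }
    return results;
}

const succeed = () => {};
const fail = () => { throw new Error("update failed"); };
const batch = [succeed, fail, succeed];

const orderedResults = runBatch(batch, true);
const unorderedResults = runBatch(batch, false);
console.log(orderedResults, unorderedResults);
```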

Updates

An array of one or more update statements to perform on the named collection.


{
    q: <query>,
    u: <document or pipeline>,
    upsert: <boolean>,
    multi: <boolean>,
    collation: <document>,
    arrayFilters: <array>,
    hint: <document|string>
}

Q: Query

U: Modifiers

Any of:
- document containing update operator expressions
- a replacement document
- an aggregation pipeline (MongoDB 4.2 or later)

Upsert

True: if no documents match the query, perform an insert.

Multi

Defaults to false.

True: update all documents that meet the query criteria.
False: update only 1 document.
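How upsert and multi interact can be modeled in plain JavaScript over an in-memory array. This is a rough sketch, not the server's logic: the query is limited to simple equality, "set" stands in for a $set modifier, and all names here are made up.

```javascript
// Apply one hypothetical update statement to an array of documents.
function applyUpdate(docs, statement) {
    const { q, set, upsert = false, multi = false } = statement;
    const matches = docs.filter((doc) =>
        Object.keys(q).every((key) => doc[key] === q[key])
    );
    if (matches.length === 0) {
        if (upsert) {
            docs.push({ ...q, ...set }); // nothing matched: insert instead
        }
        return docs;
    }
    const targets = multi ? matches : matches.slice(0, 1);
    for (const doc of targets) {
        Object.assign(doc, set);
    }
    return docs;
}

const items = [
    { item: "lamp", status: "open" },
    { item: "lamp", status: "open" },
];

applyUpdate(items, { q: { item: "lamp" }, set: { status: "closed" } });
const afterSingle = items.map((doc) => doc.status).join(",");

applyUpdate(items, { q: { item: "lamp" }, set: { status: "sold" }, multi: true });
const afterMulti = items.map((doc) => doc.status).join(",");

applyUpdate(items, { q: { item: "desk" }, set: { status: "open" }, upsert: true });
console.log(afterSingle, afterMulti, items.length);
```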

Collation

Optional. Language-specific rules for string comparisons, such as rules for letter case and accent marks.