MongoDB Data Modelling and Performance

MongoDB’s real strength is not just its flexible documents but how you use them.

The way you model, organize, and query data directly affects:

performance,
scalability,
maintainability,
and how efficiently MongoDB can use indexes.

A good schema design can make queries extremely fast. A poor design can lead to slow queries, excessive memory usage, and complicated application logic.

Thinking About Data Modelling in MongoDB

In relational databases, schema design usually starts with normalization:

splitting data into multiple tables,
reducing duplication,
and connecting records using joins.

MongoDB approaches this differently.

Instead of optimizing for normalization, MongoDB encourages you to model data based on:

how the application reads data,
how frequently data changes,
and which data is accessed together.

A common MongoDB principle is:

Store data together if it is usually accessed together.

Embedding vs Referencing

One of the most important design decisions in MongoDB is whether related data should be:

embedded inside the same document,
or stored separately and referenced using IDs.

	Embedding	Referencing
What is it?	Store related data inside the same document	Store related data in separate collections and connect them using `ObjectId` references
Best for	Tightly coupled data	Independent or large datasets
Benefits	Faster reads, fewer queries	Better scalability and independent updates
Drawbacks	Documents can grow large (each document is capped at 16 MB)	Requires additional queries or `$lookup`
Example	Blog post with embedded comments	Orders referencing users

Example: Embedding

{
  "title": "MongoDB Basics",
  "comments": [
    {
      "user": "Alice",
      "text": "Great article!"
    }
  ]
}

This works well because comments are closely related to the blog post. MongoDB can fetch everything in a single read.

Example: Referencing

{
  "title": "MongoDB Basics",
  "authorId": ObjectId("...")
}

The author information lives in another collection. This is useful when related data is large, reused in multiple places, or updated independently.
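To make the cost of referencing concrete, here is a minimal plain-JavaScript sketch of what a `$lookup` equality join does conceptually: for each input document, it collects the foreign documents whose field matches and stores them in a new array field. The in-memory `posts` and `authors` arrays and the `lookup` helper are hypothetical illustrations, not MongoDB's implementation; the real stage runs inside the server.

```javascript
// Hypothetical in-memory "collections" (plain arrays, not MongoDB).
const authors = [
  { _id: "a1", name: "Alice" },
  { _id: "a2", name: "Bob" },
];

const posts = [
  { _id: "p1", title: "MongoDB Basics", authorId: "a1" },
  { _id: "p2", title: "Indexing Tips", authorId: "a2" },
  { _id: "p3", title: "Draft", authorId: "a9" }, // no matching author
];

// Conceptual equivalent of { $lookup: { from, localField, foreignField, as } }:
// each input document gains an array of matching foreign documents
// (an empty array when nothing matches, like a left outer join).
function lookup(docs, foreignDocs, localField, foreignField, as) {
  return docs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const joined = lookup(posts, authors, "authorId", "_id", "author");
console.log(joined[0].author); // [ { _id: 'a1', name: 'Alice' } ]
console.log(joined[2].author); // []
```

The point is the extra pass: embedded comments come back with the post in one read, while referenced data needs this second step over another collection, either as a `$lookup` stage or as a second query from the application.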

Indexing in MongoDB

Indexes help MongoDB find documents efficiently without scanning the entire collection. Without indexes, MongoDB performs a collection scan, checking every document one by one. Indexes are implemented internally using a B-tree data structure.

Example

Suppose you frequently search users by email:

db.users.find({ email: "alice@example.com" })

// Creating an index
db.users.createIndex({ email: 1 })

Now MongoDB can locate matching users quickly instead of scanning the entire collection.
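The difference between a collection scan and an index lookup can be sketched in plain JavaScript by counting how many documents each approach examines. This is a simplified model under stated assumptions: the `users` array stands in for a collection, and the index is a `Map` from email to document positions. A hash map is used here only for brevity; MongoDB's real indexes are B-trees, which also keep keys ordered. The helper names are hypothetical.

```javascript
// Hypothetical collection: an array of user documents.
const users = [];
for (let i = 0; i < 10000; i++) {
  users.push({ _id: i, email: `user${i}@example.com` });
}
users.push({ _id: 10000, email: "alice@example.com" });

// Collection scan: check every document one by one.
function collectionScan(docs, email) {
  let docsExamined = 0;
  const results = [];
  for (const doc of docs) {
    docsExamined++;
    if (doc.email === email) results.push(doc);
  }
  return { results, docsExamined };
}

// Build a simple index: email -> list of positions in the array.
function buildEmailIndex(docs) {
  const index = new Map();
  docs.forEach((doc, pos) => {
    if (!index.has(doc.email)) index.set(doc.email, []);
    index.get(doc.email).push(pos);
  });
  return index;
}

// Index lookup: only the matching documents are fetched.
function indexLookup(docs, index, email) {
  const positions = index.get(email) || [];
  const results = positions.map((pos) => docs[pos]);
  return { results, docsExamined: positions.length };
}

const emailIndex = buildEmailIndex(users);
console.log(collectionScan(users, "alice@example.com").docsExamined); // 10001
console.log(indexLookup(users, emailIndex, "alice@example.com").docsExamined); // 1
```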

When MongoDB Uses Indexes

MongoDB can use an index when a query filters or sorts on the indexed fields.

Indexes can improve filtering, sorting, range queries, uniqueness checks (through unique indexes), and prefix-based string searches (for example, anchored regular expressions such as /^ali/).

Range Query Example

db.users.find({
  age: { $gt: 18 }
})

An index on age helps MongoDB efficiently locate matching documents.
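Range queries depend on the index keeping keys in sorted order, which a B-tree provides. As a rough sketch (plain JavaScript, not MongoDB internals), a sorted array of `[key, position]` entries can stand in for the index: a binary search finds the first key greater than 18, and the query then walks forward through consecutive entries without touching non-matching documents. The `buildSortedIndex`, `firstGreaterThan`, and `rangeGreaterThan` names are hypothetical.

```javascript
// Hypothetical collection of users with ages.
const users = [
  { name: "Ann", age: 15 },
  { name: "Ben", age: 42 },
  { name: "Cara", age: 18 },
  { name: "Dev", age: 19 },
  { name: "Eli", age: 30 },
];

// Sorted "index": [age, position] pairs ordered by age.
function buildSortedIndex(docs, field) {
  return docs
    .map((doc, pos) => [doc[field], pos])
    .sort((a, b) => a[0] - b[0]);
}

// Binary search for the first entry whose key is strictly greater than value.
function firstGreaterThan(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] <= value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Similar in spirit to db.users.find({ age: { $gt: value } }):
// every entry from the start position onward matches, because keys are sorted.
function rangeGreaterThan(docs, index, value) {
  const start = firstGreaterThan(index, value);
  return index.slice(start).map(([, pos]) => docs[pos]);
}

const ageIndex = buildSortedIndex(users, "age");
console.log(rangeGreaterThan(users, ageIndex, 18).map((u) => u.name));
// [ 'Dev', 'Eli', 'Ben' ]
```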

Compound Indexes

Compound indexes contain multiple fields.

Example:

db.users.createIndex({ age: 1, name: 1 })

This index supports queries on age alone and queries on both age and name.

However, it does not efficiently support queries on name alone, because of the prefix rule.

Prefix Rule - MongoDB can use a compound index efficiently only when the query includes the index's left-most field, and it can narrow the search with each following field only while the preceding fields are also present. The prefixes of { age: 1, name: 1 } are { age } and { age, name }; a query on name alone skips age and matches neither.
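The prefix rule can be expressed as a small helper. This plain-JavaScript sketch uses a hypothetical `usablePrefix` function that returns the leading index fields a query constrains; an empty result means the index cannot narrow the search. It deliberately ignores refinements the real query planner applies. Sorting a few compound keys then shows why the rule exists: entries are ordered by `age` first, so entries for the same `name` end up scattered.

```javascript
// Returns the leading fields of a compound index that the query constrains.
// Matching stops at the first index field the query does not mention.
function usablePrefix(indexFields, queryFields) {
  const prefix = [];
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break;
    prefix.push(field);
  }
  return prefix;
}

const index = ["age", "name"]; // models { age: 1, name: 1 }

console.log(usablePrefix(index, ["age"]));         // [ 'age' ]
console.log(usablePrefix(index, ["age", "name"])); // [ 'age', 'name' ]
console.log(usablePrefix(index, ["name"]));        // []

// Why "name" alone does not help: compound keys sort by age first,
// so entries with the same name are not adjacent.
const keys = [
  [30, "Alice"],
  [25, "Bob"],
  [25, "Alice"],
  [40, "Bob"],
];
keys.sort((a, b) => a[0] - b[0] || a[1].localeCompare(b[1]));
console.log(keys.map((k) => k.join(":")));
// [ '25:Alice', '25:Bob', '30:Alice', '40:Bob' ]
```

The order of fields inside the query does not matter here (a query on name and age still uses both index fields); what matters is whether the index's leading fields are covered.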

Index Trade-offs

Indexes improve read performance, but they also introduce costs.

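Slower writes: every insert, update, or delete must also update each affected index.
More storage: indexes consume additional disk space and memory.
Diminishing returns: too many indexes, or compound indexes with a poorly chosen field order, add overhead without speeding up the queries you actually run.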

A good indexing strategy focuses on real query patterns, not indexing every field blindly.
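The write cost can be sketched by counting index maintenance work. In this hypothetical plain-JavaScript model, a collection keeps one sorted key list per indexed field, and every insert must place the new key into each of them, so the work per insert grows with the number of indexes. `TinyCollection` is an illustrative name, not a MongoDB class, and the linear search for the insertion point is a simplification of how a B-tree finds its slot.

```javascript
// Hypothetical collection that maintains one sorted key array per index.
class TinyCollection {
  constructor(indexedFields) {
    this.docs = [];
    this.indexes = new Map(indexedFields.map((f) => [f, []]));
    this.indexWrites = 0; // counts index entries written
  }

  insert(doc) {
    const pos = this.docs.push(doc) - 1;
    // Every index must receive a new entry for this document.
    for (const [field, entries] of this.indexes) {
      const key = doc[field];
      // Find the insertion point that keeps the entries sorted by key.
      let i = entries.findIndex(([k]) => k > key);
      if (i === -1) i = entries.length;
      entries.splice(i, 0, [key, pos]);
      this.indexWrites++;
    }
  }
}

const lean = new TinyCollection(["email"]);
const heavy = new TinyCollection(["email", "age", "name", "city"]);
for (const c of [lean, heavy]) {
  c.insert({ email: "a@example.com", age: 30, name: "Ann", city: "Oslo" });
  c.insert({ email: "b@example.com", age: 25, name: "Ben", city: "Rome" });
}
console.log(lean.indexWrites, heavy.indexWrites); // 2 8
```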

Query Planner

MongoDB has a built-in query planner that decides how queries should execute.

When a query runs, MongoDB:

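checks which available indexes could serve the query,
generates candidate plans from them,
runs the candidates briefly in a trial to compare how much work each needs,
and selects the most efficient one, caching it for later queries with the same shape.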
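The trial-and-select idea can be sketched as a race between candidate plans. In this simplified plain-JavaScript model (loosely inspired by MongoDB's approach, not a copy of its algorithm), each candidate is a generator that performs one unit of work per step and yields either a matching document or `null`. The planner advances all candidates round-robin and picks the first one to reach a target number of results. The plan names, `trialTarget`, and data are hypothetical.

```javascript
// Hypothetical data: 1,000 users with ages 0-99; only ages above 90 match.
const users = Array.from({ length: 1000 }, (_, i) => ({ _id: i, age: i % 100 }));
const isMatch = (u) => u.age > 90;

// Candidate 1: collection scan, one document per unit of work.
function* collScan() {
  for (const u of users) yield isMatch(u) ? u : null;
}

// Candidate 2: a modeled index scan over a copy sorted by age.
// The seek to the first matching key is not counted as work in this model.
const byAge = [...users].sort((a, b) => a.age - b.age);
function* ixScanAge() {
  const start = byAge.findIndex(isMatch);
  if (start === -1) return;
  for (let i = start; i < byAge.length; i++) yield byAge[i];
}

// Advance all candidates round-robin until one returns `trialTarget` results.
function choosePlan(candidates, trialTarget) {
  const state = candidates.map(([name, makePlan]) => ({
    name,
    it: makePlan(),
    results: 0,
    works: 0,
    done: false,
  }));
  while (state.some((s) => !s.done)) {
    for (const s of state) {
      if (s.done) continue;
      const step = s.it.next();
      if (step.done) {
        s.done = true;
        continue;
      }
      s.works++;
      if (step.value !== null) s.results++;
      if (s.results >= trialTarget) return s.name; // winning plan
    }
  }
  // No candidate reached the target: prefer most results, then least work.
  state.sort((a, b) => b.results - a.results || a.works - b.works);
  return state[0].name;
}

const winner = choosePlan(
  [
    ["COLLSCAN", collScan],
    ["IXSCAN { age: 1 }", ixScanAge],
  ],
  20
);
console.log(winner); // IXSCAN { age: 1 }
```

After 20 steps the modeled index scan has produced 20 matches, while the collection scan has only looked at ages 0 to 19 and found none, so the index plan wins and the scan becomes a rejected plan.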

Winning Plan - The execution plan selected by MongoDB.

Rejected Plans - Alternative plans that were considered but not chosen.

Using explain()

You can inspect query execution using:

db.collection.find(...).explain("executionStats")

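The executionStats mode shows which plan was selected, how many index keys and documents were examined (totalKeysExamined and totalDocsExamined), how many documents were returned (nReturned), and whether an index was used (an IXSCAN stage rather than a COLLSCAN).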

This is one of the most important tools for diagnosing slow MongoDB queries.
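When reading `explain("executionStats")` output, a useful habit is to compare `totalDocsExamined` with `nReturned` and to look for a `COLLSCAN` stage. The helper below is a hypothetical plain-JavaScript sketch that does this on an explain result object, for example one printed by mongosh and copied into a script. The `sample` object is hand-written and trimmed to the fields the helper reads; it is not real server output, and real output contains many more fields whose exact nesting can vary between server versions.

```javascript
// Collect stage names from a plan tree (stages nest via inputStage / inputStages).
function stageNames(stage) {
  if (!stage) return [];
  const children = stage.inputStages || (stage.inputStage ? [stage.inputStage] : []);
  return [stage.stage, ...children.flatMap(stageNames)];
}

// Summarize the executionStats section of an explain() result.
function summarizeExplain(explain) {
  const stats = explain.executionStats;
  const stages = stageNames(stats.executionStages);
  return {
    stages,
    usedIndex: stages.includes("IXSCAN"),
    collectionScan: stages.includes("COLLSCAN"),
    // Close to 1 is ideal: few documents read per document returned.
    docsExaminedPerReturned: stats.totalDocsExamined / Math.max(stats.nReturned, 1),
  };
}

// Hand-written, trimmed sample shaped like executionStats output (not real output).
const sample = {
  executionStats: {
    nReturned: 1,
    totalKeysExamined: 1,
    totalDocsExamined: 1,
    executionStages: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "email_1" },
    },
  },
};

console.log(summarizeExplain(sample));
```

A high `docsExaminedPerReturned` together with `collectionScan: true` is the typical signature of a query that needs an index.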

Aggregation Framework

MongoDB’s Aggregation Framework is used for:

analytics,
transformations,
grouping,
joins,
and complex data processing.

It works as a pipeline, where documents pass through multiple stages.

Common Aggregation Stages

Stage	Purpose
`$match`	Filter documents
`$group`	Group documents and calculate aggregates
`$project`	Reshape output fields
`$sort`	Sort results
`$lookup`	Join another collection
`$unwind`	Split array elements into separate documents
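The pipeline idea, where each stage receives the previous stage's output, can be modeled in plain JavaScript as an array of functions applied in order. This sketch implements simplified, hypothetical stand-ins for `$match`, `$unwind`, and `$sort` on in-memory arrays; the real stages support far richer expressions and run inside the server.

```javascript
// Simplified, hypothetical stage builders; each returns a function docs => docs.
// match: equality-only filter, like { $match: { field: value } }.
const match = (criteria) => (docs) =>
  docs.filter((d) => Object.entries(criteria).every(([k, v]) => d[k] === v));

// unwind: one output document per array element, like { $unwind: "$field" }.
// (Documents whose field is not an array are dropped in this sketch.)
const unwind = (field) => (docs) =>
  docs.flatMap((d) =>
    Array.isArray(d[field]) ? d[field].map((el) => ({ ...d, [field]: el })) : []
  );

// sort: single-field sort, like { $sort: { field: 1 } } (dir is 1 or -1).
const sort = (field, dir = 1) => (docs) =>
  [...docs].sort((a, b) => (a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0));

// Run the stages in order, feeding each stage the previous stage's output.
const aggregate = (docs, pipeline) => pipeline.reduce((acc, stage) => stage(acc), docs);

const posts = [
  { title: "MongoDB Basics", published: true, tags: ["db", "nosql"] },
  { title: "Draft", published: false, tags: ["wip"] },
  { title: "Indexing", published: true, tags: ["db", "performance"] },
];

const result = aggregate(posts, [match({ published: true }), unwind("tags"), sort("tags")]);
console.log(result.map((d) => `${d.tags} <- ${d.title}`));
```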

Aggregation Example

db.orders.aggregate([
  {
    $group: {
      _id: "$status",
      total: { $sum: 1 }
    }
  }
])

This groups orders by status and counts them.
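To see what `$group` with `{ $sum: 1 }` computes, here is the same count written in plain JavaScript over a hypothetical in-memory `orders` array. It only illustrates the semantics: each distinct `status` value becomes a group key (`_id`), and the accumulator adds 1 per document. The `groupCountBy` helper is a hypothetical name.

```javascript
const orders = [
  { _id: 1, status: "shipped" },
  { _id: 2, status: "pending" },
  { _id: 3, status: "shipped" },
  { _id: 4, status: "cancelled" },
];

// Mirrors { $group: { _id: "$status", total: { $sum: 1 } } }.
// (MongoDB would put documents missing the field into a null group;
// here their key would be undefined.)
function groupCountBy(docs, field) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[field], (totals.get(doc[field]) || 0) + 1);
  }
  // Shape the output like $group results: { _id, total }.
  return [...totals].map(([key, total]) => ({ _id: key, total }));
}

console.log(groupCountBy(orders, "status"));
```

Note that `$group` does not guarantee the order of its output documents; add a `$sort` stage if order matters.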

Note - MongoDB also optimizes aggregation pipelines internally. For example, MongoDB may automatically move $match stages earlier to reduce the amount of data processed.
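Why moving `$match` earlier helps can be shown by counting how many documents reach a more expensive stage. In this plain-JavaScript sketch (hypothetical data and helpers; it illustrates the effect, not MongoDB's optimizer), the same filter on a top-level field is applied after and before an `$unwind`-like step. Both orders return the same documents, but filtering first means the unwind step processes far fewer inputs. The reordering is only safe here because the filter does not depend on the unwound field.

```javascript
// Hypothetical posts: 1,000 documents, each with 5 comments; 10 are published.
const posts = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  published: i % 100 === 0,
  comments: ["c1", "c2", "c3", "c4", "c5"],
}));

let unwindInputs = 0; // documents fed into the unwind step

const match = (docs) => docs.filter((d) => d.published);
const unwind = (docs) => {
  unwindInputs += docs.length;
  return docs.flatMap((d) => d.comments.map((c) => ({ ...d, comments: c })));
};

unwindInputs = 0;
const late = match(unwind(posts)); // unwind first, then match
const lateInputs = unwindInputs;

unwindInputs = 0;
const early = unwind(match(posts)); // match first, then unwind
const earlyInputs = unwindInputs;

console.log(late.length, early.length); // 50 50
console.log(lateInputs, earlyInputs); // 1000 10
```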

Conclusion

MongoDB performance depends less on raw hardware and more on how you design your schema, choose indexes, and structure queries. By embedding data that is read together, referencing data that grows or changes independently, indexing for real query patterns, and using tools like explain() and aggregation pipelines, you can build applications that scale smoothly while staying efficient.