Data Modeling & Schema Design

Data modeling in MongoDB is different from SQL. Instead of normalizing into tables and joining with foreign keys, you design documents around your application’s query patterns. The goal: store data the way you query it.

Embedding vs Referencing

The fundamental decision in MongoDB schema design is whether to embed related data inside a document or to reference it from another collection.

Embedded Documents

Store related data inside the parent document:

// Embedded approach — address inside user
{
  _id: ObjectId("..."),
  name: "Alice",
  email: "alice@example.com",
  address: {
    street: "123 Main St",
    city: "London",
    country: "UK",
    zip: "EC1A 1BB"
  }
}

Use embedding when:

Data is contained within the parent (address belongs to one user)
You always query the embedded data with the parent
Embedded data changes rarely
Embedded data is small and bounded in size (a single document cannot exceed 16 MB)

Referencing (Normalization)

Store a reference (ObjectId) to another document:

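// User document
// (ObjectId values below are illustrative; a real ObjectId is 24 hex characters)
{
  _id: ObjectId("65a4f0c2e1b2a3c4d5e6f701"),
  name: "Alice",
  email: "alice@example.com"
}

// Address document (separate collection)
{
  _id: ObjectId("65a4f0c2e1b2a3c4d5e6f702"),
  userId: ObjectId("65a4f0c2e1b2a3c4d5e6f701"),  // Reference to the user's _id
  street: "123 Main St",
  city: "London",
  country: "UK",
  zip: "EC1A 1BB"
}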

Use referencing when:

Data is shared across multiple parents (an address has multiple users)
Embedded data grows unboundedly (comments on a popular post)
You query the related data independently (all users in a city)
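You need to update the related data on its own, without touching the parent (data that must change together atomically favors embedding, because single-document writes are atomic)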
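
The trade-off between the two approaches shows up in how many lookups a read needs. The following is a minimal in-memory sketch in plain JavaScript, with arrays standing in for collections; the helper names and string IDs are hypothetical and not part of any MongoDB API. Against a real database the lookups would be queries such as `findOne`, or a single aggregation with `$lookup`.

```javascript
// In-memory stand-ins for collections (illustrative data only).
const usersEmbedded = [
  { _id: "user1", name: "Alice", address: { city: "London", zip: "EC1A 1BB" } },
];
const usersReferenced = [{ _id: "user1", name: "Alice" }];
const addresses = [
  { _id: "addr1", userId: "user1", city: "London", zip: "EC1A 1BB" },
];

// Embedded: one lookup returns the user together with the address.
function getUserEmbedded(userId) {
  return usersEmbedded.find((u) => u._id === userId) ?? null;
}

// Referenced: a second lookup on the addresses "collection" is needed.
function getUserReferenced(userId) {
  const user = usersReferenced.find((u) => u._id === userId);
  if (!user) return null;
  const address = addresses.find((a) => a.userId === userId) ?? null;
  return { ...user, address };
}

const embeddedCity = getUserEmbedded("user1").address.city;
const referencedCity = getUserReferenced("user1").address.city;
console.log(embeddedCity, referencedCity);
```

Both calls return the same shape; the referenced version simply pays for one extra lookup, which is the cost to weigh against the benefits listed above.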

Decision Matrix

Scenario	Embed	Reference
User has profile (name, email, avatar)	✅ One-to-one	❌
User has addresses (1-3)	✅ Small, bounded	❌
User has orders (unlimited)	❌ Unbounded growth	✅
Product has category	❌ Same category shared	✅
Blog post has comments	❌ Could grow to 1000s	✅
Blog post has tags (5-10)	✅ Small, bounded	❌
Order has line items	✅ Queried together	❌
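
The reasoning behind the matrix can be written down as a small rule of thumb. This is only a sketch: `suggestStrategy` and its option names are hypothetical, and real decisions also depend on the query patterns of the application.

```javascript
// Hypothetical helper: any one of the three "reference" signals from the
// lists above is enough to suggest referencing; otherwise suggest embedding.
function suggestStrategy({
  sharedAcrossParents = false,
  unboundedGrowth = false,
  queriedIndependently = false,
} = {}) {
  const reasons = [];
  if (sharedAcrossParents) reasons.push("shared across parents");
  if (unboundedGrowth) reasons.push("unbounded growth");
  if (queriedIndependently) reasons.push("queried independently");
  return reasons.length > 0
    ? { strategy: "reference", reasons }
    : { strategy: "embed", reasons: ["contained, small, read with parent"] };
}

const ordersAdvice = suggestStrategy({ unboundedGrowth: true }); // user has orders
const tagsAdvice = suggestStrategy({}); // blog post has a few tags
console.log(ordersAdvice.strategy, tagsAdvice.strategy);
```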

Relationship Patterns

One-to-One

// Embedded (preferred)
{
  _id: "user1",
  name: "Alice",
  profile: { bio: "Developer", avatar: "alice.jpg", theme: "dark" }
}

// Or referenced (if profile is large or accessed independently)
{
  _id: "user1",
  name: "Alice",
  profileId: "profile1"  // Reference to profiles collection
}

One-to-Many

One-to-few (embed):

// User with addresses (typically 1-3)
{
  _id: "user1",
  name: "Alice",
  addresses: [
    { label: "Home", street: "123 Main", city: "London" },
    { label: "Work", street: "456 High", city: "London" },
  ]
}

One-to-many (reference from child to parent):

// Product with reviews (potentially thousands)
// Store reference in the "many" side
{
  _id: "review1",
  productId: "prod1",   // Reference to product
  userId: "user1",
  rating: 5,
  text: "Great product!"
}

One-to-squillions (reference from child to parent; keep at most a small, capped ID array in the parent):

// Server with log entries (millions)
// Never store every log ID in the parent: the array would grow without bound.
// If a quick preview is needed, keep only a small capped array of recent IDs.
{
  _id: "server1",
  name: "web-01",
  recentLogIds: ["log1", "log2", "log3"]  // Last 3 log IDs only
}

// Or keep no array at all and query the logs by their serverId field
// log collection entries have { serverId: "server1", ... }
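
A capped `recentLogIds` array has to be trimmed on every write. The sketch below shows the idea in memory; `recordLog` and the `message` field are hypothetical. In MongoDB itself, a `$push` update with the `$each` and `$slice` modifiers can achieve the same trimming in one operation.

```javascript
// In-memory stand-ins for the servers and logs collections.
const server = { _id: "server1", name: "web-01", recentLogIds: [] };
const logs = [];

// Hypothetical helper: store the log entry in its own collection and keep
// only the newest `limit` IDs on the server document.
function recordLog(serverDoc, logCollection, entry, limit = 3) {
  const log = {
    _id: `log${logCollection.length + 1}`,
    serverId: serverDoc._id, // reference from child to parent
    ...entry,
  };
  logCollection.push(log);
  serverDoc.recentLogIds.push(log._id);
  // Drop the oldest IDs so the array never grows past `limit`.
  serverDoc.recentLogIds = serverDoc.recentLogIds.slice(-limit);
  return log;
}

for (let i = 1; i <= 5; i++) {
  recordLog(server, logs, { message: `request ${i}` });
}

// All logs for a server are found through the serverId field on each log.
const logsForServer = logs.filter((l) => l.serverId === "server1");
console.log(server.recentLogIds, logsForServer.length);
```

The parent stays the same size no matter how many log entries exist, while the full history remains reachable through `serverId`.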

Many-to-Many

Two-way referencing:

// Student
{
  _id: "student1",
  name: "Alice",
  courseIds: ["course1", "course2"]  // References to courses
}

// Course
{
  _id: "course1",
  title: "MongoDB 101",
  studentIds: ["student1", "student2"]  // References to students
}
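
Two-way referencing means every enrollment touches two documents. A minimal in-memory sketch follows; `enroll` is a hypothetical helper. In MongoDB these would be two separate updates (for example with `$addToSet`), so keeping both sides consistent typically requires a multi-document transaction or a repair strategy.

```javascript
// In-memory stand-ins for the students and courses collections.
const students = [{ _id: "student1", name: "Alice", courseIds: [] }];
const courses = [{ _id: "course1", title: "MongoDB 101", studentIds: [] }];

// Hypothetical helper: enrolling writes to one document on each side.
function enroll(studentId, courseId) {
  const student = students.find((s) => s._id === studentId);
  const course = courses.find((c) => c._id === courseId);
  if (!student || !course) throw new Error("unknown student or course");
  // Guard against duplicates so repeated calls leave the arrays unchanged.
  if (!student.courseIds.includes(courseId)) student.courseIds.push(courseId);
  if (!course.studentIds.includes(studentId)) course.studentIds.push(studentId);
}

enroll("student1", "course1");
enroll("student1", "course1"); // second call changes nothing
console.log(students[0].courseIds, courses[0].studentIds);
```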

When to use arrays of references vs a join collection:

Use arrays of references when the relationship stays small on both sides (as a rough rule of thumb, fewer than about 500 IDs per document).

Use a join (through) collection when the relationship is large or carries its own metadata:

// Enrollment collection (join collection)
{
  _id: "enrollment1",
  studentId: "student1",
  courseId: "course1",
  enrolledAt: ISODate("2024-01-15"),
  grade: "A",
  status: "active"
}

Schema Design Patterns

Polymorphic Pattern

Different documents in the same collection with varied schemas:

// products collection (price values are illustrative)
[
  { _id: 1, type: "book", title: "MongoDB Guide", price: 35, pages: 400, author: "John" },
  { _id: 2, type: "electronics", name: "Laptop", price: 999, specs: { cpu: "i7", ram: 16 }, warranty: 24 },
  { _id: 3, type: "clothing", name: "T-Shirt", price: 15, sizes: ["S", "M", "L"], material: "Cotton" },
]

// Query by a field that every document type shares
db.products.find({ price: { $lt: 100 } });
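
Application code usually branches on the `type` field to read the type-specific fields. The sketch below runs in memory; `describe` is a hypothetical helper and the `price` values are assumptions made for the example.

```javascript
// Illustrative documents; every type carries the common `type` and `price`.
const products = [
  { _id: 1, type: "book", title: "MongoDB Guide", price: 35, pages: 400 },
  { _id: 2, type: "electronics", name: "Laptop", price: 999, warranty: 24 },
  { _id: 3, type: "clothing", name: "T-Shirt", price: 15, sizes: ["S", "M", "L"] },
];

// Hypothetical helper: pick the fields that exist for each document type.
function describe(product) {
  switch (product.type) {
    case "book":
      return `${product.title} (${product.pages} pages)`;
    case "electronics":
      return `${product.name} (warranty: ${product.warranty})`;
    case "clothing":
      return `${product.name} (sizes ${product.sizes.join("/")})`;
    default:
      return `Unknown product type: ${product.type}`;
  }
}

// Filtering on a common field works across all types.
const affordable = products.filter((p) => p.price < 100).map(describe);
console.log(affordable);
```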

Bucket Pattern

Group related data into time-based buckets to limit array growth:

// Instead of storing each reading as a document:
// { sensorId: 1, ts: ISODate("..."), temp: 22.5 }
// { sensorId: 1, ts: ISODate("..."), temp: 22.7 }
// ...

// Bucket by hour:
{
  sensorId: 1,
  hour: ISODate("2024-01-15T10:00:00Z"),
  readings: [
    { ts: ISODate("..."), temp: 22.5 },
    { ts: ISODate("..."), temp: 22.7 },
    // ... up to 60 readings per hour
  ],
  readingCount: 42,
  avgTemp: 22.6,
}
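
Writing into a bucket means finding (or creating) the bucket for the reading's hour and updating its summary fields. The sketch below does this in memory; `addReading`, `hourOf`, and the running `sumTemp` field are assumptions added for the example. In MongoDB the same effect is usually achieved with one `updateOne` call using `upsert: true` together with `$push` and `$inc`.

```javascript
// In-memory stand-in for the bucket collection.
const buckets = [];

// Truncate a timestamp to the start of its UTC hour.
function hourOf(ts) {
  const d = new Date(ts);
  d.setUTCMinutes(0, 0, 0);
  return d;
}

// Hypothetical helper: append a reading to its hourly bucket, creating the
// bucket on first use, and keep the precomputed summary fields current.
function addReading(sensorId, ts, temp) {
  const hour = hourOf(ts);
  let bucket = buckets.find(
    (b) => b.sensorId === sensorId && b.hour.getTime() === hour.getTime()
  );
  if (!bucket) {
    bucket = { sensorId, hour, readings: [], readingCount: 0, sumTemp: 0, avgTemp: null };
    buckets.push(bucket);
  }
  bucket.readings.push({ ts: new Date(ts), temp });
  bucket.readingCount += 1;
  bucket.sumTemp += temp; // running sum makes the average cheap to update
  bucket.avgTemp = bucket.sumTemp / bucket.readingCount;
  return bucket;
}

addReading(1, "2024-01-15T10:05:00Z", 22);
addReading(1, "2024-01-15T10:35:00Z", 23);
addReading(1, "2024-01-15T11:02:00Z", 21);
console.log(buckets.length, buckets[0].readingCount, buckets[0].avgTemp);
```

The first two readings land in the 10:00 bucket and the third opens a new 11:00 bucket, so the `readings` array is bounded by the number of readings per hour.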

Outlier Pattern

Handle edge cases where a few items exceed normal bounds:

// Most products have < 10 reviews — embed them
// Popular products might have 10,000+ reviews — reference them

{
  _id: "product1",
  name: "Normal Product",
  reviews: [  // Embedded for small products
    { userId: "u1", text: "Great!", rating: 5 },
    { userId: "u2", text: "Nice", rating: 4 },
  ],
  reviewCount: 2,
}

{
  _id: "product2",
  name: "Bestseller",
  reviews: "REF:reviews_collection",  // Flag to look in separate collection
  reviewCount: 10427,
  reviewIds: ["rev1", "rev2", ...],   // Last 10 review IDs for quick display
}
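
On the read side, the application checks whether a product is an outlier before deciding where to load reviews from. A minimal in-memory sketch, assuming the string flag shown above; `getReviews` and the sample review data are hypothetical.

```javascript
// In-memory stand-in for the separate reviews collection.
const reviewsCollection = [
  { _id: "rev1", productId: "product2", userId: "u7", text: "Love it", rating: 5 },
  { _id: "rev2", productId: "product2", userId: "u8", text: "Solid", rating: 4 },
];

const normalProduct = {
  _id: "product1",
  reviews: [{ userId: "u1", text: "Great!", rating: 5 }],
};
const bestseller = { _id: "product2", reviews: "REF:reviews_collection" };

// Hypothetical helper: embedded reviews are an array; anything else is
// treated as the outlier flag and sends the read to the other collection.
function getReviews(product) {
  if (Array.isArray(product.reviews)) return product.reviews;
  return reviewsCollection.filter((r) => r.productId === product._id);
}

console.log(getReviews(normalProduct).length, getReviews(bestseller).length);
```

A dedicated boolean flag field would be an alternative design that keeps the `reviews` field an array in every document, at the cost of one extra field.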

Subset Pattern

Store frequently accessed fields in the main document and move rarely used fields to a separate collection:

// Frequently displayed fields in the main document
{
  _id: "product1",
  name: "Laptop",
  price: 999,
  rating: 4.5,
  imageUrl: "/images/laptop.jpg",
  // Full details (rarely accessed) in a separate collection
  detailId: "detail1"
}

// Full detail document
{
  _id: "detail1",
  productId: "product1",
  specs: { cpu: "i7", ram: "16GB", storage: "512GB SSD" },
  description: "Long product description with HTML...",
  reviews: [...],
  relatedProducts: [...]
}
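
When migrating an existing collection to this pattern, each full document is split in two. The sketch below is one possible way to do that; `splitProduct`, the default `summaryFields` list, and the generated `detailId` format are assumptions for illustration.

```javascript
// Hypothetical helper: split one full product into a small summary document
// and a detail document that points back to it.
function splitProduct(full, summaryFields = ["name", "price", "rating", "imageUrl"]) {
  const summary = { _id: full._id, detailId: `detail-${full._id}` };
  const detail = { _id: summary.detailId, productId: full._id };
  for (const [key, value] of Object.entries(full)) {
    if (key === "_id") continue;
    if (summaryFields.includes(key)) summary[key] = value;
    else detail[key] = value;
  }
  return { summary, detail };
}

const { summary, detail } = splitProduct({
  _id: "product1",
  name: "Laptop",
  price: 999,
  rating: 4.5,
  imageUrl: "/images/laptop.jpg",
  specs: { cpu: "i7", ram: "16GB" },
  description: "Long product description...",
});
console.log(Object.keys(summary));
console.log(Object.keys(detail));
```

List pages then read only the summary documents, and the detail document is fetched by `detailId` when a single product page is opened.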

Real-World Schema Examples

E-commerce

// User
{
  _id: ObjectId,
  name: String,
  email: String,
  shippingAddresses: [Address],       // Embedded (1-3)
  paymentMethods: [
    { type: "card", last4: "4242", token: "pm_..." }  // Embedded tokens
  ],
  cart: {                             // Embedded (current state)
    items: [{ productId, qty, price }],
    updatedAt: Date,
  },
  createdAt: Date,
}

// Product
{
  _id: ObjectId,
  name: String,
  description: String,
  price: Number,
  categoryId: ObjectId,              // Reference to category
  tags: [String],                    // Embedded (small array)
  variants: [{                       // Embedded (e.g., color, size)
    sku: String,
    color: String,
    size: String,
    stock: Number,
  }],
  ratings: {                         // Computed summary
    average: Number,
    count: Number,
  },
  createdAt: Date,
}

// Order
{
  _id: ObjectId,
  userId: ObjectId,                  // Reference
  items: [{                          // Embedded (snapshot of purchase)
    productId: ObjectId,
    name: String,
    price: Number,
    qty: Number,
  }],
  shipping: {
    address: Address,                // Snapshot
    method: String,
    trackingNumber: String,
  },
  total: Number,
  status: String,                    // "pending", "shipped", "delivered"
  createdAt: Date,
  updatedAt: Date,
}

// Category (shared, referenced)
{
  _id: ObjectId,
  name: String,
  slug: String,
  parentId: ObjectId | null,        // Self-reference for hierarchy
  description: String,
}
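
The "snapshot" comments in the Order schema mean that name and price are copied into the order at purchase time rather than looked up later. A minimal in-memory sketch of that step; `buildOrder` and the sample catalog are hypothetical.

```javascript
// In-memory stand-in for the products collection, keyed by _id.
const productsById = new Map([
  ["p1", { _id: "p1", name: "Laptop", price: 999 }],
  ["p2", { _id: "p2", name: "Mouse", price: 25 }],
]);

// Hypothetical helper: copy name and price into the order at purchase time.
function buildOrder(userId, cartItems) {
  const items = cartItems.map(({ productId, qty }) => {
    const product = productsById.get(productId);
    if (!product) throw new Error(`unknown product ${productId}`);
    return { productId, name: product.name, price: product.price, qty };
  });
  const total = items.reduce((sum, item) => sum + item.price * item.qty, 0);
  return { userId, items, total, status: "pending", createdAt: new Date() };
}

const order = buildOrder("user1", [
  { productId: "p1", qty: 1 },
  { productId: "p2", qty: 2 },
]);

// A later price change affects the catalog but not the stored order.
productsById.get("p1").price = 899;
console.log(order.total, order.items[0].price);
```

Because the order holds its own copy, historical orders keep showing what the customer actually paid.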

Blog Platform

// User
{
  _id: ObjectId,
  username: String,
  email: String,
  bio: String,
  avatar: String,
  stats: {                          // Computed, updated periodically
    postCount: Number,
    followerCount: Number,
    totalViews: Number,
  },
  createdAt: Date,
}

// Post
{
  _id: ObjectId,
  authorId: ObjectId,               // Reference
  title: String,
  slug: String,
  content: String,
  excerpt: String,
  tags: [String],                   // Embedded
  status: String,                   // "draft", "published"
  stats: {
    views: Number,
    likes: Number,
    commentCount: Number,
  },
  publishedAt: Date | null,
  createdAt: Date,
  updatedAt: Date,
}

// Comment (separate collection because unbounded)
{
  _id: ObjectId,
  postId: ObjectId,                 // Reference
  authorId: ObjectId,               // Reference
  text: String,
  parentId: ObjectId | null,        // For nested replies
  likes: Number,
  createdAt: Date,
}
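
With `parentId` on each comment, nested replies are stored flat and assembled into a tree by the application. The sketch below shows one way to do that in memory; `buildCommentTree` and the sample comments are hypothetical.

```javascript
// Flat comments as they would come back from the comments collection.
const comments = [
  { _id: "c1", postId: "post1", parentId: null, text: "Nice post" },
  { _id: "c2", postId: "post1", parentId: "c1", text: "Agreed" },
  { _id: "c3", postId: "post1", parentId: "c1", text: "Thanks both" },
  { _id: "c4", postId: "post1", parentId: null, text: "Question..." },
];

// Hypothetical helper: index comments by _id, then attach each one to its
// parent's `replies` array.
function buildCommentTree(flat) {
  const byId = new Map(flat.map((c) => [c._id, { ...c, replies: [] }]));
  const roots = [];
  for (const node of byId.values()) {
    const parent = node.parentId ? byId.get(node.parentId) : null;
    if (parent) parent.replies.push(node);
    else roots.push(node); // top-level comment, or parent not in this batch
  }
  return roots;
}

const tree = buildCommentTree(comments);
console.log(tree.length, tree[0].replies.map((r) => r._id));
```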

Model Design Checklist

Before finalizing a schema, answer:

Query patterns: Which queries will you run most often?
Growth: Can any embedded array grow without bound?
Data consistency: Which fields must change together atomically? (Single-document writes are atomic, so these belong in one document.)
Sharing: Is the data referenced by multiple parents?
Access patterns: Is the data always fetched together with its parent?
Write frequency: How often does each field change?

Quick Reference

// Embed when: contained, small, always queried together
// Reference when: shared, unbounded, queried independently

// One-to-few → embed array
{ user: "Alice", addresses: [{ city: "London" }, { city: "NYC" }] }

// One-to-many → reference from child
// child doc: { parentId: ObjectId, ... }

// Many-to-many → two arrays or join collection
// doc1: { doc2Ids: [...] }
// doc2: { doc1Ids: [...] }
// or: { doc1Id, doc2Id, metadata }

Practice Exercises

Model an e-commerce system: Design schemas for a full e-commerce platform: users, products, categories, orders, reviews, and shopping cart. Justify each embedding/referencing decision.
Blog with comments: Compare two designs: (a) embedding comments in posts vs (b) storing comments in a separate collection. Write queries for “get post with last 10 comments” for both designs. Compare performance for 100 comments vs 100,000 comments.
Many-to-many with metadata: Design a schema for students enrolling in courses. Include enrollment date, grade, and status. Write a query to find “all courses Alice is enrolled in with her grade”.

Refactor an over-embedded schema: Below is a poorly designed schema that nests everything in one document. Identify the problems and redesign it:

{
  name: "Shop",
  products: [{ name, price, category, reviews: [{ user, text, rating }] }],
  employees: [{ name, role, salary, address: { street, city } }],
  suppliers: [{ name, contact, address: { street, city } }],
}

What happens when the shop has 10,000 products? When addresses change?