Data Modeling & Schema Design
Data modeling in MongoDB is different from SQL. Instead of normalizing into tables and joining with foreign keys, you design documents around your application’s query patterns. The goal: store data the way you query it.
Embedding vs Referencing
The fundamental decision in MongoDB schema design.
Embedded Documents
Store related data inside the parent document:
// Embedded approach: address inside user
{
  _id: ObjectId("..."),
  name: "Alice",
  email: "alice@example.com",
  address: {
    street: "123 Main St",
    city: "London",
    country: "UK",
    zip: "EC1A 1BB"
  }
}

Use embedding when:
- Data is contained within the parent (address belongs to one user)
- You always query the embedded data with the parent
- Embedded data changes rarely
- Embedded data has a small and bounded size
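To make the second point concrete: an embedded address travels with its parent, so one read returns both, and nested fields are addressed with dot notation such as `address.city`. The sketch below runs in plain JavaScript without a database. The sample users and the helpers `getPath` and `matchesFilter` are hypothetical, and they mimic only equality matching on dotted paths, not MongoDB's full query semantics.

```javascript
// Hypothetical in-memory stand-in for a users collection (no database needed).
const users = [
  {
    _id: "user1",
    name: "Alice",
    address: { street: "123 Main St", city: "London", country: "UK" },
  },
  {
    _id: "user2",
    name: "Bob",
    address: { street: "9 Rue Cler", city: "Paris", country: "FR" },
  },
];

// Resolve a dotted path such as "address.city" against a document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Equality-only matcher: every dotted path in the filter must equal its value.
function matchesFilter(doc, filter) {
  return Object.entries(filter).every(
    ([path, expected]) => getPath(doc, path) === expected
  );
}

// Same shape as the filter you would pass to db.users.find(...).
const londonFilter = { "address.city": "London" };
const londoners = users.filter((user) => matchesFilter(user, londonFilter));

// Each match already carries its full address; no second lookup is needed.
console.log(londoners.map((user) => `${user.name}, ${user.address.street}`));
```

Against a live database the same filter object would be passed to `db.users.find(londonFilter)`.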
Referencing (Normalization)
Store a reference (ObjectId) to another document:
// User document
{
  _id: ObjectId("user1"), // placeholder ID; real ObjectIds are 24 hex characters
  name: "Alice",
  email: "alice@example.com"
}

// Address document (separate collection)
{
  _id: ObjectId("addr1"),
  userId: ObjectId("user1"), // Reference
  street: "123 Main St",
  city: "London",
  country: "UK",
  zip: "EC1A 1BB"
}

Use referencing when:
- Data is shared across multiple parents (an address has multiple users)
- Embedded data grows unboundedly (comments on a popular post)
- You query the related data independently (all users in a city)
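- The related data is updated on its own, independently of the parent (single-document writes are atomic, so data that must change together atomically is better embedded)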
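When the address lives in its own collection, reading a user together with their addresses takes either a second query or a `$lookup` stage. The sketch below builds such a pipeline as a plain object and pairs it with an in-memory helper that shows what the join produces. The collection name `addresses`, the output field name, the sample data, and the helper `joinAddresses` are assumptions made for illustration.

```javascript
// Build an aggregation pipeline that joins one user with their addresses.
// The collection name "addresses" and the output field are assumptions.
function userWithAddressesPipeline(userId) {
  return [
    { $match: { _id: userId } },
    {
      $lookup: {
        from: "addresses",      // collection holding the referenced documents
        localField: "_id",      // field on the user document
        foreignField: "userId", // reference stored on each address
        as: "addresses",        // name of the output array field
      },
    },
  ];
}

// Hypothetical in-memory equivalent of the join, for illustration only.
function joinAddresses(user, addresses) {
  return {
    ...user,
    addresses: addresses.filter((address) => address.userId === user._id),
  };
}

// Hypothetical sample data using string IDs instead of ObjectIds.
const sampleUser = { _id: "user1", name: "Alice" };
const sampleAddresses = [
  { _id: "addr1", userId: "user1", city: "London" },
  { _id: "addr2", userId: "user2", city: "Leeds" },
];

const lookupPipeline = userWithAddressesPipeline("user1");
const joinedUser = joinAddresses(sampleUser, sampleAddresses);

console.log(JSON.stringify(lookupPipeline, null, 2));
console.log(JSON.stringify(joinedUser, null, 2));
```

With a live connection the pipeline would be passed to `db.users.aggregate(lookupPipeline)`. The extra stage is the cost of referencing: the embedded design returns the same data with a plain `find`.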
Decision Matrix
| Scenario | Embed | Reference |
|---|---|---|
| User has profile (name, email, avatar) | ✅ One-to-one | ❌ |
| User has addresses (1-3) | ✅ Small, bounded | ❌ |
| User has orders (unlimited) | ❌ Unbounded growth | ✅ |
| Product has category | ❌ Same category shared | ✅ |
| Blog post has comments | ❌ Could grow to 1000s | ✅ |
| Blog post has tags (5-10) | ✅ Small, bounded | ❌ |
| Order has line items | ✅ Queried together | ❌ |
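The matrix can be read as a rule of thumb: reference when the related data is shared, unbounded, or queried on its own, and embed otherwise. The sketch below encodes that reading. The function `chooseStrategy` and its trait names are hypothetical, and a real decision would also weigh the query patterns discussed above.

```javascript
// Rule-of-thumb encoding of the matrix; trait names are hypothetical.
function chooseStrategy({
  shared = false,               // referenced by more than one parent
  unbounded = false,            // can grow without a practical limit
  queriedIndependently = false, // read without loading the parent
} = {}) {
  if (shared || unbounded || queriedIndependently) {
    return "reference";
  }
  return "embed";
}

// Scenarios taken from the matrix, with the traits that drive each choice.
const scenarios = [
  { name: "User has addresses (1-3)", traits: {} },
  { name: "User has orders (unlimited)", traits: { unbounded: true } },
  { name: "Product has category", traits: { shared: true } },
  { name: "Blog post has comments", traits: { unbounded: true } },
  { name: "Blog post has tags (5-10)", traits: {} },
  { name: "Order has line items", traits: {} },
];

for (const { name, traits } of scenarios) {
  console.log(`${name}: ${chooseStrategy(traits)}`);
}
```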
Relationship Patterns
One-to-One
// Embedded (preferred)
{
  _id: "user1",
  name: "Alice",
  profile: { bio: "Developer", avatar: "alice.jpg", theme: "dark" }
}

// Or referenced (if profile is large or accessed independently)
{
  _id: "user1",
  name: "Alice",
  profileId: "profile1" // Reference to profiles collection
}

One-to-Many
One-to-few (embed):
// User with addresses (typically 1-3)
{
  _id: "user1",
  name: "Alice",
  addresses: [
    { label: "Home", street: "123 Main", city: "London" },
    { label: "Work", street: "456 High", city: "London" },
  ]
}

One-to-many (reference from child to parent):
// Product with reviews (potentially thousands)
// Store reference in the "many" side
{
  _id: "review1",
  productId: "prod1", // Reference to product
  userId: "user1",
  rating: 5,
  text: "Great product!"
}

One-to-squillions (reference from parent to child):
// Server with log entries (millions)
// Store references array in parent (with IDs only)
{
  _id: "server1",
  name: "web-01",
  recentLogIds: ["log1", "log2", "log3"] // Last 3 log IDs only
}

// Or don't reference at all: query by serverId field on log
// log collection entries have { serverId: "server1", ... }

Many-to-Many
Two-way referencing:
// Student
{
  _id: "student1",
  name: "Alice",
  courseIds: ["course1", "course2"] // References to courses
}

// Course
{
  _id: "course1",
  title: "MongoDB 101",
  studentIds: ["student1", "student2"] // References to students
}

When to use array of references vs join table:
Use arrays of references when the relationship is small on both sides (as a rough guide, fewer than 500 references on each side).
Use a join/through collection when the relationship is large or has metadata:
// Enrollment collection (through table)
{
  _id: "enrollment1",
  studentId: "student1",
  courseId: "course1",
  enrolledAt: ISODate("2024-01-15"),
  grade: "A",
  status: "active"
}

Schema Design Patterns
Polymorphic Pattern
Different documents in the same collection with varied schemas:
// products collection
[
  { _id: 1, type: "book", title: "MongoDB Guide", price: 35, pages: 400, author: "John" },
  { _id: 2, type: "electronics", name: "Laptop", price: 1200, specs: { cpu: "i7", ram: 16 }, warranty: 24 },
  { _id: 3, type: "clothing", name: "T-Shirt", price: 15, sizes: ["S", "M", "L"], material: "Cotton" },
]

// Query by common fields (every document carries type and price)
db.products.find({ price: { $lt: 100 } });

Bucket Pattern
Group related data into time-based buckets to limit array growth:
// Instead of storing each reading as a document:
// { sensorId: 1, ts: ISODate("..."), temp: 22.5 }
// { sensorId: 1, ts: ISODate("..."), temp: 22.7 }
// ...

// Bucket by hour:
{
  sensorId: 1,
  hour: ISODate("2024-01-15T10:00:00Z"),
  readings: [
    { ts: ISODate("..."), temp: 22.5 },
    { ts: ISODate("..."), temp: 22.7 },
    // ... up to 60 readings per hour
  ],
  readingCount: 42,
  avgTemp: 22.6,
}

Outlier Pattern
Handle edge cases where a few items exceed normal bounds:
// Most products have < 10 reviews: embed them
// Popular products might have 10,000+ reviews: reference them

{
  _id: "product1",
  name: "Normal Product",
  reviews: [ // Embedded for small products
    { userId: "u1", text: "Great!", rating: 5 },
    { userId: "u2", text: "Nice", rating: 4 },
  ],
  reviewCount: 2,
}

{
  _id: "product2",
  name: "Bestseller",
  reviews: "REF:reviews_collection", // Flag to look in separate collection
  reviewCount: 10427,
  reviewIds: ["rev1", "rev2", ...], // Last 10 review IDs for quick display
}

Subset Pattern
Store frequently accessed fields in the main document and move rarely used fields to a separate collection:
// Frequently displayed fields in the main document
{
  _id: "product1",
  name: "Laptop",
  price: 999,
  rating: 4.5,
  imageUrl: "/images/laptop.jpg",
  // Full details (rarely accessed) in a separate collection
  detailId: "detail1"
}

// Full detail document
{
  _id: "detail1",
  productId: "product1",
  specs: { cpu: "i7", ram: "16GB", storage: "512GB SSD" },
  description: "Long product description with HTML...",
  reviews: [...],
  relatedProducts: [...]
}

Real-World Schema Examples
E-commerce
// User
{
  _id: ObjectId,
  name: String,
  email: String,
  shippingAddresses: [Address], // Embedded (1-3)
  paymentMethods: [
    { type: "card", last4: "4242", token: "pm_..." } // Embedded tokens
  ],
  cart: { // Embedded (current state)
    items: [{ productId, qty, price }],
    updatedAt: Date,
  },
  createdAt: Date,
}

// Product
{
  _id: ObjectId,
  name: String,
  description: String,
  price: Number,
  categoryId: ObjectId, // Reference to category
  tags: [String], // Embedded (small array)
  variants: [{ // Embedded (e.g., color, size)
    sku: String,
    color: String,
    size: String,
    stock: Number,
  }],
  ratings: { // Computed summary
    average: Number,
    count: Number,
  },
  createdAt: Date,
}

// Order
{
  _id: ObjectId,
  userId: ObjectId, // Reference
  items: [{ // Embedded (snapshot of purchase)
    productId: ObjectId,
    name: String,
    price: Number,
    qty: Number,
  }],
  shipping: {
    address: Address, // Snapshot
    method: String,
    trackingNumber: String,
  },
  total: Number,
  status: String, // "pending", "shipped", "delivered"
  createdAt: Date,
  updatedAt: Date,
}

// Category (shared, referenced)
{
  _id: ObjectId,
  name: String,
  slug: String,
  parentId: ObjectId | null, // Self-reference for hierarchy
  description: String,
}

Blog Platform
// User
{
  _id: ObjectId,
  username: String,
  email: String,
  bio: String,
  avatar: String,
  stats: { // Computed, updated periodically
    postCount: Number,
    followerCount: Number,
    totalViews: Number,
  },
  createdAt: Date,
}

// Post
{
  _id: ObjectId,
  authorId: ObjectId, // Reference
  title: String,
  slug: String,
  content: String,
  excerpt: String,
  tags: [String], // Embedded
  status: String, // "draft", "published"
  stats: {
    views: Number,
    likes: Number,
    commentCount: Number,
  },
  publishedAt: Date | null,
  createdAt: Date,
  updatedAt: Date,
}

// Comment (separate collection because unbounded)
{
  _id: ObjectId,
  postId: ObjectId, // Reference
  authorId: ObjectId, // Reference
  text: String,
  parentId: ObjectId | null, // For nested replies
  likes: Number,
  createdAt: Date,
}

Model Design Checklist
Before finalizing a schema, answer:
- Query patterns: Which queries will you run most often?
- Growth: Will embedded arrays grow without bound?
- Data consistency: Which fields must be updated together atomically?
- Sharing: Is the data referenced by multiple parents?
- Access patterns: Is the data always fetched together?
- Write frequency: How often does each field change?
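The growth question can be given a rough number. MongoDB caps a single BSON document at 16 MiB, so an embedded array has a hard ceiling. The sketch below estimates how many items of a given shape would fit. It uses the JSON text length as a stand-in for BSON size, which is only an approximation, and the helpers `estimateBytes` and `maxEmbeddedItems` and the sample comment are hypothetical.

```javascript
// Maximum BSON document size in MongoDB: 16 MiB.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough size estimate: byte length of the JSON text.
// This is a proxy only; actual BSON encoding differs.
function estimateBytes(value) {
  return Buffer.byteLength(JSON.stringify(value), "utf8");
}

// Estimate how many copies of sampleItem fit in one document's array
// once the parent's own fields (baseDoc) are accounted for.
function maxEmbeddedItems(sampleItem, baseDoc = {}) {
  const perItem = estimateBytes(sampleItem) + 1; // +1 for the separating comma
  const budget = MAX_BSON_BYTES - estimateBytes({ ...baseDoc, items: [] });
  return Math.max(0, Math.floor(budget / perItem));
}

// Hypothetical comment shape for a blog post.
const sampleComment = {
  authorId: "user1",
  text: "x".repeat(200), // assume comments average about 200 characters
  likes: 0,
  createdAt: "2024-01-15T10:00:00Z",
};

const basePost = { _id: "post1", title: "MongoDB 101", content: "..." };

console.log(
  `Roughly ${maxEmbeddedItems(sampleComment, basePost)} comments fit in one post document`
);
```

Long before that ceiling is reached, large documents become costly to read and rewrite, so the estimate is an upper bound rather than a target.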
Quick Reference
// Embed when: contained, small, always queried together
// Reference when: shared, unbounded, queried independently

// One-to-few → embed array
{ user: "Alice", addresses: [{ city: "London" }, { city: "NYC" }] }

// One-to-many → reference from child
// child doc: { parentId: ObjectId, ... }

// Many-to-many → two arrays or join collection
// doc1: { doc2Ids: [...] }
// doc2: { doc1Ids: [...] }
// or: { doc1Id, doc2Id, metadata }

Practice Exercises
Model an e-commerce system: Design schemas for a full e-commerce platform: users, products, categories, orders, reviews, and shopping cart. Justify each embedding/referencing decision.
Blog with comments: Compare two designs: (a) embedding comments in posts vs (b) storing comments in a separate collection. Write queries for “get post with last 10 comments” for both designs. Compare performance for 100 comments vs 100,000 comments.
Many-to-many with metadata: Design a schema for students enrolling in courses. Include enrollment date, grade, and status. Write a query to find “all courses Alice is enrolled in with her grade”.
Refactor a flat schema: Below is a poorly designed schema. Identify the problems and redesign it:
{
  name: "Shop",
  products: [{ name, price, category, reviews: [{ user, text, rating }] }],
  employees: [{ name, role, salary, address: { street, city } }],
  suppliers: [{ name, contact, address: { street, city } }],
}

What happens when the shop has 10,000 products? When addresses change?