Module Project: Multi-Tier Storage Architecture
Project Overview
Design a multi-tier storage and database architecture for an e-commerce platform called “ShopCloud.” The platform sells physical and digital goods, has a global customer base, and needs to be highly available, durable, and performant.
Scenario: ShopCloud is launching in three phases. Phase 1 targets North America and Europe. The engineering team practices Infrastructure as Code (Terraform) and wants to automate everything. You must recommend every storage and database service, justify your choices, and document the data flow.
Requirements
| Requirement | Detail |
|---|---|
| Product catalog | 1 million SKUs, growing 10% monthly. Highly read-heavy. |
| User accounts | 500K registered users. Passwords hashed with bcrypt. |
| Orders | 50K orders/day. Must be ACID-compliant. |
| Session data | Ephemeral, high-throughput, low-latency. |
| Product images | 10 TB of images, served globally. Cache-friendly. |
| Analytics pipeline | Raw clickstream data, warehouse for reporting. |
| Disaster recovery | RPO ≤ 5 minutes, RTO ≤ 1 hour. |
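The growth and throughput figures in the table can be turned into rough sizing numbers. The sketch below assumes growth compounds monthly and that orders are spread evenly over the day; both are simplifications, and the function names are illustrative rather than part of the project.

```python
def projected_skus(initial: int, monthly_growth: float, months: int) -> int:
    """Compound the catalog size forward by a fixed monthly growth rate."""
    return round(initial * (1 + monthly_growth) ** months)


def average_rate_per_second(events_per_day: int) -> float:
    """Average event rate, assuming events are spread evenly over 24 hours."""
    return events_per_day / 86_400


skus_after_one_year = projected_skus(1_000_000, 0.10, 12)
orders_per_second = average_rate_per_second(50_000)
```

At 10% monthly growth the catalog roughly triples within a year, while 50K orders/day averages well under one order per second. Peak rates during sales events will be much higher than the average, so capacity planning should use peak estimates rather than this figure.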
Architecture Decision Record
1. Product Catalog → DynamoDB
Why: The catalog is accessed by product ID. No complex joins are needed — just fast key-value lookups. DynamoDB scales automatically to handle traffic spikes during sales events.
Configuration:
- Table with `product_id` (partition key) and `category` (GSI for category browsing)
- On-demand capacity (unpredictable traffic during flash sales)
- DAX cluster for microsecond reads on frequently accessed products
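As a rough illustration of the table configuration above, the following builds the parameter dictionary that could be passed to boto3's `create_table` call. The table name, index name, and string-typed keys are assumptions for this sketch; the project itself does not specify them, and the team would normally express this in Terraform.

```python
def catalog_table_params(table_name: str = "shopcloud-catalog") -> dict:
    """Parameters for a DynamoDB table keyed by product_id with a category GSI.

    Names and attribute types are hypothetical placeholders.
    """
    return {
        "TableName": table_name,
        # On-demand capacity: no provisioned throughput to manage.
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            {"AttributeName": "product_id", "AttributeType": "S"},
            {"AttributeName": "category", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "product_id", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "category-index",
                "KeySchema": [{"AttributeName": "category", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


params = catalog_table_params()
```

With boto3 installed and credentials configured, this would be used as `boto3.client("dynamodb").create_table(**params)`. The DAX cluster is provisioned separately and is not part of the table definition.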
Info
If the product catalog required full-text search or faceted navigation (filter by price range, brand, ratings), we would add a secondary indexing layer like Amazon OpenSearch.
2. User Accounts and Orders → RDS PostgreSQL
Why: Orders involve multiple tables (customers, orders, line items, payments) with complex transactional logic. ACID guarantees are non-negotiable for financial data.
Configuration:
- db.r6g.xlarge (4 vCPU, 32 GiB RAM), Multi-AZ
- Automated backups with 35-day retention
- Read replica in `eu-west-1` for European users
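To make the ACID argument concrete, here is a minimal sketch of an all-or-nothing order insert. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server; the two-table schema and the `place_order` helper are simplified assumptions, not the ShopCloud schema.

```python
import sqlite3


def place_order(conn: sqlite3.Connection, customer_id: int, items: list) -> int:
    """Insert an order and its line items atomically."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,)
        )
        order_id = cur.lastrowid
        for sku, quantity in items:
            conn.execute(
                "INSERT INTO line_items (order_id, sku, quantity) VALUES (?, ?, ?)",
                (order_id, sku, quantity),
            )
    return order_id


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
CREATE TABLE line_items (
    order_id INTEGER NOT NULL REFERENCES orders(id),
    sku TEXT NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0)
);
""")

place_order(conn, 1, [("SKU-1", 2), ("SKU-2", 1)])
try:
    # The second item violates the CHECK constraint, so the whole order is undone.
    place_order(conn, 2, [("SKU-3", 1), ("SKU-4", 0)])
except sqlite3.IntegrityError:
    pass

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
item_count = conn.execute("SELECT COUNT(*) FROM line_items").fetchone()[0]
```

The failed order leaves no partial rows behind: neither its `orders` row nor its first line item survives the rollback. The same pattern applies with a PostgreSQL driver, where the transaction would also cover the payment record.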
3. Session Data → ElastiCache Redis
Why: Sessions are ephemeral, require single-digit-millisecond latency, and benefit from Redis data structures (sorted sets for cart expiration, pub/sub for inventory notifications).
Configuration:
- Redis cluster mode enabled, 3 shards, 1 replica per shard
- No persistence (sessions are disposable; RAM-only for maximum performance)
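With cluster mode enabled, Redis assigns each key to one of 16,384 hash slots using a CRC16 checksum, and each shard owns a range of slots. The sketch below reproduces that mapping to show how session keys spread across three shards. The even, contiguous slot split in `shard_for_slot` is an assumption for illustration; the real slot ranges are whatever the cluster reports.

```python
import binascii

HASH_SLOTS = 16384


def key_slot(key: str) -> int:
    """Hash slot for a key, honoring {hash tags} as Redis Cluster does."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16/XMODEM variant.
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


def shard_for_slot(slot: int, shards: int = 3) -> int:
    """Hypothetical even split of the slot space across shards (0-based)."""
    return min(slot * shards // HASH_SLOTS, shards - 1)


session_slot = key_slot("{user42}:session")
cart_slot = key_slot("{user42}:cart")
```

Because both keys share the hash tag `{user42}`, they land in the same slot and therefore on the same shard, which allows multi-key operations on a user's session and cart.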
4. Product Images → S3 + CloudFront
Why: Images are large, immutable, and accessed globally. S3 is designed for 11 nines (99.999999999%) of durability, lifecycle policies manage cost, and CloudFront caches at edge locations.
Configuration:
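- S3 Standard: images uploaded in the last 30 days
- S3 Standard-IA: images 31-90 days old
- S3 Glacier Deep Archive: images older than 90 days that are no longer displayed (for example, delisted products). Archived objects take hours to restore, so images still served through CloudFront should remain in Standard-IA.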
- CloudFront distribution with origin access control (OAC, the successor to origin access identity)
- Image optimization (WebP conversion via Lambda@Edge)
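The storage tiers above map onto an S3 lifecycle configuration. The dictionary below follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the rule ID and the empty prefix (apply to every object) are assumptions, and the helper simply reads the tier for a given object age off the same rules.

```python
IMAGE_LIFECYCLE = {
    "Rules": [
        {
            "ID": "tier-product-images",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # assumption: applies to the whole bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_for_age(age_days: int, lifecycle: dict = IMAGE_LIFECYCLE) -> str:
    """Storage class an object of the given age would be in under the rules."""
    storage_class = "STANDARD"
    transitions = lifecycle["Rules"][0]["Transitions"]
    for transition in sorted(transitions, key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            storage_class = transition["StorageClass"]
    return storage_class


classes = [storage_class_for_age(age) for age in (10, 45, 120)]
```

S3 evaluates lifecycle rules roughly once a day, so the exact moment an object moves between classes can differ slightly from the day boundaries listed above.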
5. Analytics Pipeline → S3 + Athena
Why: Clickstream data is semi-structured, high-volume, and accessed for ad-hoc analysis. S3 serves as the data lake, and Athena queries it with SQL without provisioning servers.
Configuration:
- S3 bucket with partitioned layout: `clickstream/year=2025/month=01/day=15/`
- Lifecycle policy: transition to S3 Glacier after 90 days, delete after 7 years
- Glue crawler for schema discovery
- QuickSight for dashboards
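The partitioned layout matters because Athena can skip partitions that a query's `WHERE` clause excludes, which reduces the data scanned. A small sketch of how a writer might derive the prefix for a given day follows; the function name and the `clickstream` root default are illustrative.

```python
from datetime import date


def partition_prefix(day: date, root: str = "clickstream") -> str:
    """Hive-style partition prefix for one day of clickstream data."""
    return f"{root}/year={day:%Y}/month={day:%m}/day={day:%d}/"


prefix = partition_prefix(date(2025, 1, 15))
```

Zero-padding the month and day keeps prefixes sortable and matches the layout shown in the configuration list.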
Data Flow Diagram
                        ┌──────────────┐
                        │    Users     │
                        └──────┬───────┘
                               │
                        ┌──────▼───────┐
                        │  CloudFront  │
                        │    (CDN)     │
                        └──────┬───────┘
                               │
                  ┌────────────┼────────────┐
                  │            │            │
            ┌─────▼────┐  ┌────▼────┐ ┌─────▼───────┐
            │    S3    │  │   ALB   │ │   Athena    │
            │ (Images) │  │         │ │ (Analytics) │
            └──────────┘  └────┬────┘ └─────────────┘
                               │
                      ┌────────┼────────┐
                      │        │        │
                ┌─────▼──┐  ┌──▼───┐ ┌──▼─────┐
                │  EC2   │  │Redis │ │ Lambda │
                │ (App)  │  │Sess. │ │ (Auth) │
                └────┬───┘  └──────┘ └────────┘
                     │
         ┌───────────┼───────────┐
         │           │           │
    ┌────▼───┐  ┌────▼────┐ ┌────▼─────┐
    │  RDS   │  │ Dynamo  │ │    S3    │
    │ Orders │  │ Catalog │ │ DataLake │
    └────────┘  └─────────┘ └──────────┘

Disaster Recovery
| Tier | Primary | DR Strategy | RPO | RTO |
|---|---|---|---|---|
| RDS (Orders) | us-east-1 | Cross-region read replica in us-west-2 | < 5 seconds | < 1 hour |
| DynamoDB (Catalog) | us-east-1 | Global table (us-west-2) | < 1 second | < 1 minute |
| ElastiCache (Sessions) | us-east-1 | Rebuild from app on failover | N/A | < 5 minutes |
| S3 (Images + Lake) | us-east-1 | Cross-region replication | < 15 minutes (above the 5-minute RPO target) | < 1 hour |
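One way to review a DR table like this is to compare each tier's stated bounds against the targets in the requirements (RPO ≤ 5 minutes, RTO ≤ 1 hour). The sketch below does that, treating each "less than" figure as the worst case; the tuple layout and function name are assumptions for illustration.

```python
RPO_TARGET_S = 5 * 60   # RPO <= 5 minutes
RTO_TARGET_S = 60 * 60  # RTO <= 1 hour

# (tier, worst-case RPO in seconds or None if not applicable, worst-case RTO in seconds)
TIERS = [
    ("RDS (Orders)", 5, 3600),
    ("DynamoDB (Catalog)", 1, 60),
    ("ElastiCache (Sessions)", None, 300),
    ("S3 (Images + Lake)", 900, 3600),
]


def tiers_missing_targets(tiers: list) -> list:
    """Names of tiers whose worst-case RPO or RTO exceeds the targets."""
    misses = []
    for name, rpo_s, rto_s in tiers:
        rpo_missed = rpo_s is not None and rpo_s > RPO_TARGET_S
        rto_missed = rto_s > RTO_TARGET_S
        if rpo_missed or rto_missed:
            misses.append(name)
    return misses


misses = tiers_missing_targets(TIERS)
```

Under these assumptions only the S3 tier is flagged, because a 15-minute replication bound is looser than the 5-minute RPO. Whether that is acceptable depends on whether images and data lake objects fall under the same RPO as transactional data.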
Caution
Cross-region replication for S3 incurs inter-region transfer costs (roughly $0.02/GB). Replicate only buckets that contain business-critical data; data that can be recreated, such as transient logs, can be regenerated from source systems instead.
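As a quick worked example of the transfer cost, the sketch below prices the initial replication of the 10 TB image bucket at the approximate rate quoted above. It assumes 1 TB = 1,024 GB and ignores request charges and storage in the destination region.

```python
def replication_transfer_cost(size_gb: float, rate_per_gb: float = 0.02) -> float:
    """Inter-region transfer cost in USD for replicating size_gb of data."""
    return size_gb * rate_per_gb


initial_copy_usd = replication_transfer_cost(10 * 1024)
```

The initial copy of 10 TB comes to roughly $205; ongoing cost then depends on how much new or changed data is written each month.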
Cost Estimate (Monthly)
| Service | Configuration | Estimated Cost |
|---|---|---|
| S3 (10 TB) + CloudFront (50 TB egress) | Standard + Glacier lifecycle | $1,200 |
| DynamoDB (1M RCU/WCU on-demand) | With DAX cluster | $900 |
| RDS PostgreSQL (Multi-AZ) | db.r6g.xlarge | $800 |
| ElastiCache Redis (3 shards) | r6g.large | $450 |
| S3 data lake + Athena queries | Partitioned, 10 TB scanned | $200 |
| Total | | ~$3,550 |
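The total can be checked by summing the line items. The figures below are copied from the table; the dictionary keys are shortened labels.

```python
MONTHLY_COSTS_USD = {
    "S3 (10 TB) + CloudFront (50 TB egress)": 1200,
    "DynamoDB on-demand + DAX": 900,
    "RDS PostgreSQL (Multi-AZ)": 800,
    "ElastiCache Redis (3 shards)": 450,
    "S3 data lake + Athena": 200,
}

total_usd = sum(MONTHLY_COSTS_USD.values())
largest_item = max(MONTHLY_COSTS_USD, key=MONTHLY_COSTS_USD.get)
```

Keeping the estimate in a small script like this makes it easy to rerun when a line item changes, for example after pricing the services with the AWS Pricing Calculator.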
Reflection Questions
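- Why not use RDS for the product catalog? RDS would work, and 1 million SKUs fits comfortably in a single PostgreSQL instance, but absorbing flash-sale read spikes means adding replicas or a caching layer by hand. DynamoDB handles partitioning and scaling automatically and is typically cheaper for simple key-value access.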
- Should the analytics pipeline use a data warehouse (Redshift) instead of Athena? At this data volume (~1 TB/month), Athena and S3 are more cost-effective. Redshift becomes compelling at 10+ TB with complex, repeated queries.
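- How would you handle digital goods delivery? Store digital goods in S3 and serve them through pre-signed URLs with a short expiry. Pre-signed URLs are time-limited but not single-use, so enforcing one-time downloads requires tracking redemptions in the application. The order confirmation Lambda generates the URL after payment is validated.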
- What if read traffic doubles overnight (viral marketing campaign)? DynamoDB on-demand and CloudFront absorb spikes automatically. RDS would need read replicas scaled up — automate this with a Lambda that monitors replica lag and adds replicas on threshold breach.
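The replica-scaling idea in the last answer comes down to a small decision rule. The sketch below shows only that rule as a pure function; the 30-second lag threshold and the cap of 5 replicas are hypothetical values, and the code that would read the lag metric and create the replica is omitted.

```python
def desired_replica_count(
    current: int,
    lag_seconds: float,
    lag_threshold: float = 30.0,  # hypothetical threshold
    max_replicas: int = 5,        # hypothetical cap to bound cost
) -> int:
    """Add one replica when lag breaches the threshold, up to the cap."""
    if lag_seconds > lag_threshold and current < max_replicas:
        return current + 1
    return current


after_spike = desired_replica_count(current=1, lag_seconds=45.0)
steady_state = desired_replica_count(current=2, lag_seconds=3.0)
```

Adding one replica per evaluation and capping the total keeps a noisy metric from creating a runaway number of instances. A production version would also need a cooldown period and a rule for scaling back in.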
Deliverable
Create an ADR covering: service selection rationale, configuration details, data flow diagram, DR strategy, and cost estimate. Be prepared to justify each choice against alternatives (e.g., why DynamoDB over MongoDB on EC2, why RDS PostgreSQL over Aurora).