Architecture Reference

This page provides the reference architecture for the capstone project. Use this as a blueprint when designing your own infrastructure. Every component below should be provisioned using Terraform modules with remote state stored in S3 with DynamoDB locking.

Global Architecture

```text
┌─────────────────────────────────────────────────────────┐
│                        Route 53                         │
│                 (Latency-based routing)                 │
└────────────────────┬────────────────────────────────────┘
┌────────────────────▼────────────────────────────────────┐
│                       CloudFront                        │
│               (WAF attached, HTTPS only)                │
└────────────────────┬────────────────────────────────────┘
┌────────────────────▼────────────────────────────────────┐
│               ALB (us-east-1 & eu-west-1)               │
│            (Cross-zone, deletion protection)            │
└────────────────────┬────────────────────────────────────┘
┌────────────────────▼────────────────────────────────────┐
│           ECS Fargate (us-east-1 & eu-west-1)           │
│    ┌──────────┐      ┌──────────┐      ┌──────────┐     │
│    │ Product  │      │  Order   │      │   User   │     │
│    │ Service  │      │ Service  │      │ Service  │     │
│    └────┬─────┘      └────┬─────┘      └────┬─────┘     │
│         │                 │                 │           │
│    ┌────▼─────┐      ┌────▼─────┐      ┌────▼─────┐     │
│    │  Aurora  │      │  Aurora  │      │  Aurora  │     │
│    │ Primary  │      │ Primary  │      │ Primary  │     │
│    └────┬─────┘      └────┬─────┘      └────┬─────┘     │
│         │     Cross-Region Replication      │           │
│         │       us-east-1 → eu-west-1       │           │
│    ┌────▼─────┐      ┌────▼─────┐      ┌────▼─────┐     │
│    │  Aurora  │      │  Aurora  │      │  Aurora  │     │
│    │   Read   │      │   Read   │      │   Read   │     │
│    │ Replica  │      │ Replica  │      │ Replica  │     │
│    └──────────┘      └──────────┘      └──────────┘     │
│                     ┌─────────────┐                     │
│                     │ ElastiCache │                     │
│                     │    Redis    │                     │
│                     └─────────────┘                     │
└─────────────────────────────────────────────────────────┘
```

VPC Design

Each region gets a dedicated VPC with the following CIDR allocation:

us-east-1: 10.1.0.0/16
eu-west-1: 10.2.0.0/16

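Each VPC has three subnet tiers spread across three Availability Zones. The CIDRs below are for us-east-1. eu-west-1 uses the same layout within 10.2.0.0/16 (for example, 10.2.1.0/24 for its first public subnet):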

| Subnet type | CIDR blocks (one /24 per AZ) | Purpose |
|---|---|---|
| Public | 10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24 | ALB, NAT Gateway, bastion host |
| Private App | 10.1.10.0/24, 10.1.11.0/24, 10.1.12.0/24 | ECS Fargate tasks |
| Private Data | 10.1.20.0/24, 10.1.21.0/24, 10.1.22.0/24 | RDS (Aurora), ElastiCache |
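You can generate this layout instead of typing it by hand. The sketch below uses Python's standard `ipaddress` module to derive each region's subnets from its /16. It assumes the pattern in the table: third octet 1-3 for public, 10-12 for private app, and 20-22 for private data, with one /24 per AZ. It also assumes eu-west-1 mirrors those offsets. The function name and the a/b/c AZ suffixes are illustrative.

```python
import ipaddress

# VPC CIDRs from the allocation above
VPC_CIDRS = {
    "us-east-1": "10.1.0.0/16",
    "eu-west-1": "10.2.0.0/16",
}

# Third octet of the first /24 in each tier (from the subnet table)
TIER_BASE = {"public": 1, "private-app": 10, "private-data": 20}

# Three AZs per region; the a/b/c suffixes are illustrative
AZ_SUFFIXES = ["a", "b", "c"]


def plan_subnets(region: str) -> dict[str, dict[str, str]]:
    """Return {tier: {az_name: cidr}} for one region's VPC."""
    vpc = ipaddress.ip_network(VPC_CIDRS[region])
    # Every /24 inside the /16, in order, so index i has third octet i
    slash24s = list(vpc.subnets(new_prefix=24))
    return {
        tier: {
            f"{region}{suffix}": str(slash24s[base + i])
            for i, suffix in enumerate(AZ_SUFFIXES)
        }
        for tier, base in TIER_BASE.items()
    }


for region in VPC_CIDRS:
    for tier, subnets in plan_subnets(region).items():
        print(region, tier, subnets)
```

The same arithmetic maps onto Terraform's `cidrsubnet()` function if you prefer to compute the CIDRs inside the `vpc` module. For example, `cidrsubnet("10.1.0.0/16", 8, 10)` yields `10.1.10.0/24`.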

Connect the two regions with either inter-region VPC peering or Transit Gateway. Transit Gateway is the better choice if you plan to add more regions or on-premises connectivity later.

Tip

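Use Transit Gateway even with only two regions. Because a Transit Gateway is a regional resource, that means one Transit Gateway per region, connected by an inter-region peering attachment. This setup makes adding a third region or an AWS Direct Connect link straightforward later. It also avoids managing a full mesh of VPC peering connections as the number of VPCs grows.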
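The scaling argument is simple arithmetic. A full mesh of VPC peering needs one connection per pair of VPCs, n(n-1)/2. With one Transit Gateway per region, you need one attachment per VPC plus the peering attachments between the regional Transit Gateways. The sketch below assumes the Transit Gateways are themselves fully meshed. It counts connections only and ignores quotas, routing, and cost.

```python
def vpc_peering_connections(vpcs_per_region: int, regions: int) -> int:
    """Full mesh: one peering connection for every pair of VPCs."""
    n = vpcs_per_region * regions
    return n * (n - 1) // 2


def transit_gateway_connections(vpcs_per_region: int, regions: int) -> int:
    """One Transit Gateway per region: one VPC attachment per VPC, plus a
    full mesh of inter-region TGW peering attachments (assumed topology)."""
    vpc_attachments = vpcs_per_region * regions
    tgw_peerings = regions * (regions - 1) // 2
    return vpc_attachments + tgw_peerings


for vpcs, regions in [(1, 2), (1, 3), (4, 3)]:
    print(vpcs, regions,
          vpc_peering_connections(vpcs, regions),
          transit_gateway_connections(vpcs, regions))
```

With one VPC per region, as on this page, peering actually needs fewer connections: 1 versus 3 for two regions. Transit Gateway pays off as VPCs are added. For example, four VPCs in each of three regions need 66 peering connections, but only 15 Transit Gateway attachments.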

ECS Fargate Configuration

Each microservice runs as a separate ECS service with its own task definition. The product service's task definition:

```hcl
resource "aws_ecs_task_definition" "product_service" {
  family                   = "product-service"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc" # required for Fargate
  cpu                      = "512"    # 0.5 vCPU
  memory                   = "1024"   # MiB
  execution_role_arn       = aws_iam_role.ecs_execution.arn # used by ECS to pull the image and write logs
  task_role_arn            = aws_iam_role.ecs_task.arn      # permissions for the application code

  container_definitions = jsonencode([
    {
      name         = "product-service"
      image        = "${aws_ecr_repository.product.repository_url}:latest"
      portMappings = [{ containerPort = 8080, protocol = "tcp" }]
      environment = [
        # Writer endpoint of the Aurora cluster defined in the RDS section
        { name = "DB_HOST", value = aws_rds_cluster.product_primary.endpoint },
        { name = "REDIS_HOST", value = aws_elasticache_replication_group.main.primary_endpoint_address }
      ]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/product-service"
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])
}
```
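Fargate accepts only specific task-level CPU/memory pairings, and ECS rejects other combinations. The sketch below checks a pair against the Linux combinations for 0.25 to 4 vCPU. The table is transcribed from the AWS Fargate documentation at the time of writing, with larger sizes omitted, so re-check it against the current docs. The helper name is hypothetical.

```python
# Valid Fargate (Linux) memory values in MiB for each CPU value in CPU units,
# covering 0.25-4 vCPU only. Re-check against the current AWS docs before relying on it.
FARGATE_MEMORY_OPTIONS = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),    # 1-4 GB in 1 GB steps
    1024: list(range(2048, 8192 + 1, 1024)),   # 2-8 GB in 1 GB steps
    2048: list(range(4096, 16384 + 1, 1024)),  # 4-16 GB in 1 GB steps
    4096: list(range(8192, 30720 + 1, 1024)),  # 8-30 GB in 1 GB steps
}


def is_valid_fargate_size(cpu: str, memory: str) -> bool:
    """Check a task-level cpu/memory pair written as integer strings,
    the way the Terraform task definition passes them (cpu = "512")."""
    options = FARGATE_MEMORY_OPTIONS.get(int(cpu))
    return options is not None and int(memory) in options


print(is_valid_fargate_size("512", "1024"))  # the product service's sizing
print(is_valid_fargate_size("512", "8192"))  # too much memory for 0.5 vCPU
```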

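Service auto scaling uses a target tracking policy. Application Auto Scaling adjusts the service's desired task count between 2 and 20 to keep average CPU utilization across its tasks near 70%: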

```hcl
resource "aws_appautoscaling_target" "product" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.product.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 20
}

resource "aws_appautoscaling_policy" "product_cpu" {
  name               = "product-cpu-auto-scaling"
  service_namespace  = "ecs"
  resource_id        = aws_appautoscaling_target.product.resource_id
  scalable_dimension = "ecs:service:DesiredCount"
  policy_type        = "TargetTrackingScaling"

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 70 # percent average CPU across the service's tasks
  }
}
```
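Target tracking behaves roughly proportionally. When average CPU is above the target, capacity grows by about the ratio of actual to target utilization. Application Auto Scaling does not publish its exact algorithm, and it also applies cooldowns and scales in more conservatively. Treat the function below as a back-of-the-envelope estimate for choosing `min_capacity` and `max_capacity`, not a model of the service. It uses the 70% target and the 2-20 bounds from the policy above and assumes load spreads evenly across tasks.

```python
import math


def estimate_desired_count(current_tasks: int, avg_cpu: float,
                           target: float = 70.0,
                           min_capacity: int = 2, max_capacity: int = 20) -> int:
    """Proportional estimate of the task count that brings average CPU back
    to the target, clamped to the scalable target's bounds."""
    needed = math.ceil(current_tasks * avg_cpu / target)
    return max(min_capacity, min(max_capacity, needed))


print(estimate_desired_count(4, 91.0))   # 4 * 91 / 70 = 5.2, rounded up to 6
print(estimate_desired_count(10, 20.0))  # about 2.86, rounded up to 3
print(estimate_desired_count(18, 95.0))  # about 24.4, capped at max_capacity 20
```

If the estimate regularly hits the cap at expected peak load, `max_capacity` is too low for the traffic you are planning for.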

RDS Multi-AZ with Cross-Region Replication

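Deploy Aurora PostgreSQL with a primary cluster in us-east-1 and a cross-region read replica in eu-west-1. The snippet below defines only the us-east-1 primary cluster and its two instances. The eu-west-1 replica is a separate cluster in that region (for example, the secondary cluster of an Aurora global database) and is not shown: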

```hcl
resource "aws_rds_cluster" "product_primary" {
  cluster_identifier      = "product-db-primary"
  engine                  = "aurora-postgresql"
  engine_mode             = "provisioned"
  database_name           = "products"
  master_username         = "dbadmin"
  master_password         = random_password.db.result
  storage_encrypted       = true
  kms_key_id              = aws_kms_key.rds.arn
  backup_retention_period = 35            # days (the maximum)
  preferred_backup_window = "03:00-04:00" # UTC
  vpc_security_group_ids  = [aws_security_group.data_sg.id]
  db_subnet_group_name    = aws_db_subnet_group.data.name
}

# Two instances: one acts as the writer, the other as a reader Aurora can promote on failover
resource "aws_rds_cluster_instance" "product_primary_instances" {
  count              = 2
  identifier         = "product-db-primary-${count.index}"
  cluster_identifier = aws_rds_cluster.product_primary.id
  instance_class     = "db.r6g.large"
  engine             = aws_rds_cluster.product_primary.engine
  engine_version     = aws_rds_cluster.product_primary.engine_version
}
```
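The eu-west-1 replica is read-only. Services running in eu-west-1 can serve reads locally, but their writes still have to reach the writer in us-east-1, unless you enable a feature such as global database write forwarding where it is supported. The sketch below shows that routing decision. All endpoint hostnames are hypothetical placeholders for the writer and reader endpoints of your own clusters.

```python
from dataclasses import dataclass

# Hypothetical endpoints; substitute the writer and reader endpoints of your clusters.
PRIMARY_REGION = "us-east-1"
WRITER_ENDPOINT = "product-writer.us-east-1.example.internal"
READER_ENDPOINTS = {
    "us-east-1": "product-reader.us-east-1.example.internal",
    "eu-west-1": "product-reader.eu-west-1.example.internal",
}


@dataclass(frozen=True)
class DbTarget:
    host: str
    cross_region: bool  # True when the connection leaves the caller's region


def choose_endpoint(service_region: str, is_write: bool) -> DbTarget:
    """Route writes to the single writer in the primary region and reads
    to the reader endpoint in the caller's own region."""
    if is_write:
        return DbTarget(WRITER_ENDPOINT, cross_region=service_region != PRIMARY_REGION)
    return DbTarget(READER_ENDPOINTS[service_region], cross_region=False)


print(choose_endpoint("eu-west-1", is_write=False))
print(choose_endpoint("eu-west-1", is_write=True))
```

Every cross-region write pays the inter-region round trip. Read-heavy services such as the product catalog therefore benefit most from the replica.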

CI/CD Pipeline

The pipeline uses CodePipeline, with CodeBuild for the build steps and a manual approval gate before production. The build specification for the product service:

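buildspec.yml

```yaml
version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.11
    commands:
      - pip install -r requirements.txt
      - pip install bandit safety pytest
  pre_build:
    commands:
      # Unit tests (pytest assumed as the test runner)
      - python -m pytest
      # --exit-zero reports Bandit findings without failing the build; remove it to make the scan blocking
      - bandit -r src/ --exit-zero
      - safety check --full-report
      # AWS_ACCOUNT and ECR_REPO are expected as project environment variables; CodeBuild sets AWS_REGION
      - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT.dkr.ecr.$AWS_REGION.amazonaws.com
      # Short commit SHA gives every image an immutable, traceable tag
      - IMAGE_TAG=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
  build:
    commands:
      # Docker builds require privileged mode on the CodeBuild project
      - docker build -t product-service .
      - docker tag product-service:latest $ECR_REPO:latest
      - docker tag product-service:latest $ECR_REPO:$IMAGE_TAG
      - docker push $ECR_REPO:latest
      - docker push $ECR_REPO:$IMAGE_TAG
  post_build:
    commands:
      - printf '[{"name":"product-service","imageUri":"%s:%s"}]' $ECR_REPO $IMAGE_TAG > imagedefinitions.json
artifacts:
  files:
    - imagedefinitions.json
```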
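The `post_build` step writes `imagedefinitions.json`, the file that CodePipeline's standard Amazon ECS deploy action reads. It is a JSON array with one `{"name", "imageUri"}` object per container, and `name` must match the container name in the task definition (`product-service` here). Generating the file with a JSON library avoids the quoting pitfalls of `printf`. The helper below is a hypothetical sketch, and the repository URI and tag are placeholders.

```python
import json


def image_definitions(container_name: str, repo_uri: str, tag: str) -> str:
    """Return the imagedefinitions.json body for a single-container service."""
    return json.dumps([{"name": container_name, "imageUri": f"{repo_uri}:{tag}"}])


# Placeholder repository URI and tag; in CodeBuild these would come from
# $ECR_REPO and the image tag chosen in the build.
print(image_definitions(
    "product-service",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/product",
    "abc1234",
))
```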

Pipeline stages:

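  1. Source: GitHub repository (webhook trigger)
  2. Build: CodeBuild runs unit tests and security scans, then builds and pushes the Docker image
  3. Staging Deploy: deploy to the staging ECS service
  4. Integration Test: run smoke tests against staging
  5. Approval: manual approval gate
  6. Production Deploy: deploy to the production ECS service as a canary. The standard Amazon ECS deploy action, which consumes imagedefinitions.json, performs a rolling update under the default ECS deployment controller. Canary traffic shifting therefore typically uses the ECS blue/green action backed by CodeDeploy, which takes taskdef.json, appspec.yaml, and imageDetail.json as inputs instead.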
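A canary deployment shifts a small slice of production traffic to the new version, waits, then shifts the rest. For example, CodeDeploy's predefined `CodeDeployDefault.ECSCanary10Percent5Minutes` configuration moves 10% of traffic first and the remaining 90% five minutes later. The sketch below computes that two-step schedule. It also checks whether a rollback alarm can fire within the bake time. It calls no AWS API, and the alarm check is a simplification that assumes every evaluated period must breach and ignores metric publishing delay.

```python
def canary_schedule(canary_percent: int, bake_minutes: int) -> list[tuple[int, int]]:
    """Return (minute, percent of traffic on the new version) steps for a
    two-step canary: canary_percent at minute 0, 100% after bake_minutes."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 1 and 99")
    return [(0, canary_percent), (bake_minutes, 100)]


def alarm_can_fire_during_bake(bake_minutes: int, period_minutes: int,
                               evaluation_periods: int) -> bool:
    """True if an alarm that needs `evaluation_periods` breaching periods of
    `period_minutes` each can reach ALARM before the remaining traffic shifts."""
    return period_minutes * evaluation_periods <= bake_minutes


print(canary_schedule(10, 5))               # [(0, 10), (5, 100)]
print(alarm_can_fire_during_bake(5, 1, 3))  # 3 minutes of data fits a 5-minute bake
print(alarm_can_fire_during_bake(5, 5, 3))  # 15 minutes does not
```

If the alarm cannot fire within the bake time, the canary step gives no protection. Either lengthen the bake time or use shorter alarm periods.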

Observability Stack

| Component | Service | Purpose |
|---|---|---|
| Metrics | CloudWatch + custom metrics | CPU, memory, request count, error rate, business metrics |
| Logs | CloudWatch Logs + Grafana Loki | Application logs as structured JSON |
| Tracing | X-Ray + OpenTelemetry Collector | Distributed traces, service map |
| Dashboards | Grafana (provisioned) | Unified view across CloudWatch, Prometheus, Loki |
| Alerting | CloudWatch Alarms + Grafana Alerting | SLO-based burn-rate alerts routed to on-call |
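Burn rate is the observed error ratio divided by the error ratio the SLO allows. At a burn rate of 1, you spend exactly the error budget over the SLO window. The sketch below assumes a 99.9% availability SLO over 30 days. It uses the multiwindow rule popularized by Google's SRE Workbook: page when both the 1-hour and 5-minute burn rates exceed 14.4, a rate that spends 2% of a 30-day budget in one hour. The function names and sample numbers are illustrative.

```python
def burn_rate(errors: int, requests: int, slo: float = 0.999) -> float:
    """Observed error ratio divided by the allowed error ratio (1 - SLO)."""
    if requests == 0:
        return 0.0
    return (errors / requests) / (1 - slo)


def budget_spent(rate: float, window_hours: float,
                 slo_window_hours: float = 30 * 24) -> float:
    """Fraction of the whole error budget consumed by burning at `rate`
    for `window_hours`."""
    return rate * window_hours / slo_window_hours


def should_page(long_window_rate: float, short_window_rate: float,
                threshold: float = 14.4) -> bool:
    """Multiwindow rule: both the long (1h) and short (5m) burn rates must
    exceed the threshold, so the page fires fast and clears once errors stop."""
    return long_window_rate > threshold and short_window_rate > threshold


one_hour = burn_rate(errors=150, requests=10_000)  # 1.5% errors, about 15x budget
five_min = burn_rate(errors=20, requests=900)
print(round(one_hour, 2), round(five_min, 2), should_page(one_hour, five_min))
print(budget_spent(14.4, 1))  # about 0.02 of the 30-day budget
```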

Security Controls

| Control | Implementation |
|---|---|
| Encryption at rest | KMS envelope encryption for RDS, EBS, S3, and ElastiCache |
| Encryption in transit | ACM certificates on CloudFront and the ALB, with security policies that enable TLS 1.3; mTLS between services |
| WAF | AWS Managed Rules plus rate-based rules |
| DDoS | Shield Advanced |
| Network segmentation | Security groups per tier; NACL deny lists at the subnet level |
| IAM | Least-privilege roles: task roles for ECS, service roles for RDS |
| Secrets | AWS Secrets Manager with automatic rotation |
| Audit | CloudTrail (management and data events), AWS Config rules |
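Per-tier security groups are easiest to review as an allow-list of (source tier, destination tier, port) tuples. The sketch below encodes the flows this architecture implies. CloudFront reaches the ALB on 443; in Terraform you might restrict this with the AWS-managed CloudFront origin-facing prefix list. The ALB reaches the app tasks on 8080, the container port from the task definition. The app tasks reach Aurora PostgreSQL on 5432 and Redis on 6379, assumed here as the engines' default ports. The sketch then flags any proposed rule outside that policy. The tier names and the policy are illustrative, not an AWS API.

```python
# Allowed flows between tiers: (source, destination, port).
# 8080 comes from the task definition; 5432 and 6379 are assumed default
# PostgreSQL and Redis ports (use whatever your clusters listen on).
ALLOWED_FLOWS = {
    ("cloudfront", "alb", 443),
    ("alb", "app", 8080),
    ("app", "data", 5432),
    ("app", "data", 6379),
}


def violations(proposed_rules):
    """Return the proposed (source, destination, port) rules the policy does not allow."""
    return [rule for rule in proposed_rules if rule not in ALLOWED_FLOWS]


rules = [
    ("cloudfront", "alb", 443),
    ("alb", "app", 8080),
    ("app", "data", 5432),
    ("alb", "data", 5432),  # would let the load balancer tier reach the database
]
print(violations(rules))  # [('alb', 'data', 5432)]
```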

Terraform Module Structure

```text
terraform/
  modules/
    vpc/           # VPC, subnets, route tables, NAT Gateways, Transit Gateway attachments
    ecs-cluster/   # ECS cluster, capacity providers, task definitions, services
    rds-aurora/    # DB cluster, instances, subnet groups, parameter groups
    elasticache/   # Redis replication group, subnet group, parameter group
    alb/           # ALB, target groups, listeners, WAF association
    route53/       # DNS zones, records, health checks
    cloudfront/    # Distribution, origins, WAF association, cache behaviors
    ci-cd/         # CodePipeline, CodeBuild, ECR repositories
    monitoring/    # CloudWatch dashboards, alarms, log groups
    security/      # KMS keys, WAF ACLs, Shield, Secrets Manager, IAM roles
    s3-backend/    # S3 bucket and DynamoDB table for Terraform state
  environments/
    dev/           # Smaller instances, single AZ
    staging/       # Full multi-AZ, no cross-region
    prod/          # Multi-region, all features
```
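The three environments differ mainly in how many regions and AZs they span, and that drives the count of per-AZ resources. The sketch below assumes the three subnet tiers from the VPC section and one NAT Gateway per AZ, the usual high-availability layout. It also reads "full multi-AZ" for staging as three AZs. In practice these values would be inputs to the `vpc` module set per environment. The class and dictionary names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvLayout:
    regions: int
    azs_per_region: int
    subnet_tiers: int = 3  # public, private app, private data

    def subnets(self) -> int:
        return self.regions * self.azs_per_region * self.subnet_tiers

    def nat_gateways(self) -> int:
        # Assumption: one NAT Gateway per AZ, in that AZ's public subnet
        return self.regions * self.azs_per_region


ENVIRONMENTS = {
    "dev": EnvLayout(regions=1, azs_per_region=1),      # single AZ
    "staging": EnvLayout(regions=1, azs_per_region=3),  # full multi-AZ, one region
    "prod": EnvLayout(regions=2, azs_per_region=3),     # us-east-1 + eu-west-1
}

for name, env in ENVIRONMENTS.items():
    print(f"{name}: {env.subnets()} subnets, {env.nat_gateways()} NAT Gateways")
```

One caveat on the dev row: an ALB and an RDS DB subnet group each require subnets in at least two AZs. A "single AZ" dev environment therefore usually still defines subnets in a second AZ, even if it runs compute in only one.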

Incident Response Runbook Template

Each runbook should include:

  1. Detection — What alert fires, what SLI/SLO is breached
  2. Triage — Check Grafana dashboard, X-Ray traces, CloudWatch Logs
  3. Mitigation — Specific steps (e.g., failover to secondary region)
  4. Resolution — Apply fix, verify, update runbook
  5. Post-mortem — Root cause, timeline, action items

Info

Store runbooks alongside your Terraform code in a runbooks/ directory. Keep them in version control so they evolve with the infrastructure.
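Keeping runbooks in version control also lets you check them in CI. The sketch below assumes each runbook is a Markdown file in `runbooks/` with one heading per template section, named exactly Detection, Triage, Mitigation, Resolution, and Post-mortem. It reports any sections that are missing. The file layout, heading convention, and function names are assumptions rather than something this page prescribes.

```python
import re
from pathlib import Path

REQUIRED_SECTIONS = ["Detection", "Triage", "Mitigation", "Resolution", "Post-mortem"]

# A Markdown ATX heading of any level: leading #s, spaces or tabs, then the title
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)


def missing_sections(markdown: str) -> list[str]:
    """Return the required section names that have no matching heading."""
    headings = {m.group(1).strip().lower() for m in HEADING.finditer(markdown)}
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


def check_runbooks(directory: str = "runbooks") -> int:
    """Print problems for each incomplete runbook and return how many there are."""
    failures = 0
    for path in sorted(Path(directory).glob("*.md")):
        missing = missing_sections(path.read_text(encoding="utf-8"))
        if missing:
            failures += 1
            print(f"{path}: missing {', '.join(missing)}")
    return failures
```

A CI step can call `check_runbooks()` and fail the job when it returns a non-zero count.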

Summary

This reference architecture brings together every concept from the course: VPC design, container orchestration, database high availability, global traffic routing, CI/CD, observability, and defense-in-depth security. Use it as the starting point for your capstone project and adapt it to your specific requirements.