Architecture Reference


This page provides the reference architecture for the capstone project. Use this as a blueprint when designing your own infrastructure. Every component below should be provisioned using Terraform modules with remote state stored in S3 with DynamoDB locking.
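The remote-state setup can be expressed as an S3 backend block in each environment. This is a minimal sketch with hypothetical bucket, table, and key names. The DynamoDB lock table needs a string partition key named `LockID`, because that is the attribute Terraform's S3 backend writes its locks to. Create the bucket and table first (the `s3-backend` module in the layout at the end of this page), because a backend cannot provision its own storage.

```hcl
# Backend for one environment (e.g. environments/prod).
# Bucket, table, and key names are hypothetical placeholders.
terraform {
  backend "s3" {
    bucket         = "capstone-terraform-state"
    key            = "prod/terraform.tfstate" # one state file per environment
    region         = "us-east-1"
    dynamodb_table = "capstone-terraform-locks" # partition key: LockID (string)
    encrypt        = true                       # server-side encryption of the state object
  }
}
```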

Global Architecture

┌─────────────────────────────────────────────────────────┐
│                        Route 53                         │
│                 (Latency-based routing)                 │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│                       CloudFront                        │
│               (WAF attached, HTTPS only)                │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│               ALB (us-east-1 & eu-west-1)               │
│            (Cross-zone, deletion protection)            │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│           ECS Fargate (us-east-1 & eu-west-1)           │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │ Product  │  │ Order    │  │ User     │               │
│  │ Service  │  │ Service  │  │ Service  │               │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘               │
│       │             │             │                     │
│  ┌────▼─────┐  ┌────▼─────┐  ┌────▼─────┐               │
│  │ Aurora   │  │ Aurora   │  │ Aurora   │               │
│  │ Primary  │  │ Primary  │  │ Primary  │  us-east-1    │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘               │
│       │             │             │   Cross-Region      │
│       │             │             │   Replication       │
│  ┌────▼─────┐  ┌────▼─────┐  ┌────▼─────┐               │
│  │ Aurora   │  │ Aurora   │  │ Aurora   │               │
│  │ Read     │  │ Read     │  │ Read     │  eu-west-1    │
│  │ Replica  │  │ Replica  │  │ Replica  │               │
│  └──────────┘  └──────────┘  └──────────┘               │
│                     ┌─────────────┐                     │
│                     │ ElastiCache │                     │
│                     │    Redis    │                     │
│                     └─────────────┘                     │
└─────────────────────────────────────────────────────────┘
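When CloudFront sits in front, one way to implement the latency-based routing tier is to keep CloudFront on a single global hostname and give it a latency-routed origin hostname that resolves to the nearer regional ALB. This is a sketch only. `aws_route53_zone.main`, `aws_lb.use1`, `aws_lb.euw1`, and the `origin.` subdomain are hypothetical, and the ALB certificates would need to cover that name.

```hcl
# Hypothetical resources: aws_route53_zone.main, aws_lb.use1, aws_lb.euw1.
locals {
  regional_albs = {
    "us-east-1" = { dns_name = aws_lb.use1.dns_name, zone_id = aws_lb.use1.zone_id }
    "eu-west-1" = { dns_name = aws_lb.euw1.dns_name, zone_id = aws_lb.euw1.zone_id }
  }
}

# One latency record per region, all sharing the same name.
resource "aws_route53_record" "origin" {
  for_each = local.regional_albs

  zone_id        = aws_route53_zone.main.zone_id
  name           = "origin.${aws_route53_zone.main.name}"
  type           = "A"
  set_identifier = each.key

  # Route 53 answers with the region that has the lowest latency for the querying resolver.
  latency_routing_policy {
    region = each.key
  }

  alias {
    name                   = each.value.dns_name
    zone_id                = each.value.zone_id
    evaluate_target_health = true # stop answering with a region whose ALB has no healthy targets
  }
}
```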

VPC Design

Each region gets a dedicated VPC with the following CIDR allocation:

us-east-1:  10.1.0.0/16
eu-west-1:  10.2.0.0/16

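Each VPC has three tiers across three availability zones. The CIDRs below are for us-east-1. eu-west-1 uses the same layout inside 10.2.0.0/16 (10.2.1.0/24, 10.2.10.0/24, 10.2.20.0/24, and so on).

Subnet Type	CIDRs (one /24 per AZ)	Purpose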
Public	10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24	ALB, NAT Gateway, Bastion
Private App	10.1.10.0/24, 10.1.11.0/24, 10.1.12.0/24	ECS Fargate tasks
Private Data	10.1.20.0/24, 10.1.21.0/24, 10.1.22.0/24	RDS, ElastiCache
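The subnet layout in the table can be derived rather than hard-coded. `cidrsubnet(var.vpc_cidr, 8, n)` carves the n-th /24 out of the /16, so offsets 1, 10, and 20 reproduce the public, app, and data ranges in either region. The sketch below covers the core of a VPC module. The `vpc_cidr` variable and resource names are hypothetical, and route tables, the internet gateway, and NAT Gateways are omitted. The subnet group is named to match the `aws_db_subnet_group.data` reference in the Aurora cluster.

```hcl
variable "vpc_cidr" {
  type    = string
  default = "10.1.0.0/16" # use "10.2.0.0/16" in eu-west-1
}

data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  azs = slice(data.aws_availability_zones.available.names, 0, 3)

  # cidrsubnet("10.1.0.0/16", 8, 1) = "10.1.1.0/24", and so on.
  public_cidrs = [for i in range(3) : cidrsubnet(var.vpc_cidr, 8, 1 + i)]
  app_cidrs    = [for i in range(3) : cidrsubnet(var.vpc_cidr, 8, 10 + i)]
  data_cidrs   = [for i in range(3) : cidrsubnet(var.vpc_cidr, 8, 20 + i)]
}

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true
}

resource "aws_subnet" "public" {
  count                   = 3
  vpc_id                  = aws_vpc.main.id
  cidr_block              = local.public_cidrs[count.index]
  availability_zone       = local.azs[count.index]
  map_public_ip_on_launch = true
}

resource "aws_subnet" "private_app" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = local.app_cidrs[count.index]
  availability_zone = local.azs[count.index]
}

resource "aws_subnet" "private_data" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = local.data_cidrs[count.index]
  availability_zone = local.azs[count.index]
}

# Subnet group for Aurora and ElastiCache placement in the data tier.
resource "aws_db_subnet_group" "data" {
  name       = "capstone-data"
  subnet_ids = aws_subnet.private_data[*].id
}
```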

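Connect the two regions with Transit Gateway or VPC peering for cross-region communication. Transit Gateway is a regional service, so a two-region setup uses one Transit Gateway per region, joined by an inter-region peering attachment. Choose Transit Gateway if you plan to add more regions or on-premises connectivity later.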

Tip

Use Transit Gateway even for two regions. It makes adding a third region or a Direct Connect connection straightforward later, and it avoids the full mesh of peering connections that VPC peering requires as the number of VPCs grows.
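A sketch of the two-region Transit Gateway setup follows, with hypothetical provider aliases (`use1`, `euw1`) and resource names. VPC attachments, route-table associations, and VPC route-table entries are omitted. Transit Gateway peering attachments do not support dynamic route propagation, so each side gets a static route to the other region's /16.

```hcl
provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "euw1"
  region = "eu-west-1"
}

# One Transit Gateway per region.
resource "aws_ec2_transit_gateway" "use1" {
  provider    = aws.use1
  description = "capstone-us-east-1"
}

resource "aws_ec2_transit_gateway" "euw1" {
  provider    = aws.euw1
  description = "capstone-eu-west-1"
}

# us-east-1 requests the inter-region peering...
resource "aws_ec2_transit_gateway_peering_attachment" "use1_euw1" {
  provider                = aws.use1
  transit_gateway_id      = aws_ec2_transit_gateway.use1.id
  peer_transit_gateway_id = aws_ec2_transit_gateway.euw1.id
  peer_region             = "eu-west-1"
}

# ...and eu-west-1 accepts it.
resource "aws_ec2_transit_gateway_peering_attachment_accepter" "euw1" {
  provider                      = aws.euw1
  transit_gateway_attachment_id = aws_ec2_transit_gateway_peering_attachment.use1_euw1.id
}

# Static routes over the peering attachment, one per direction.
resource "aws_ec2_transit_gateway_route" "use1_to_euw1" {
  provider                       = aws.use1
  destination_cidr_block         = "10.2.0.0/16"
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_peering_attachment.use1_euw1.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway.use1.association_default_route_table_id
  depends_on                     = [aws_ec2_transit_gateway_peering_attachment_accepter.euw1]
}

resource "aws_ec2_transit_gateway_route" "euw1_to_use1" {
  provider                       = aws.euw1
  destination_cidr_block         = "10.1.0.0/16"
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_peering_attachment_accepter.euw1.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway.euw1.association_default_route_table_id
}
```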

ECS Fargate Configuration

Each microservice runs as a separate ECS service with its own task definition:

resource "aws_ecs_task_definition" "product_service" {
  family                   = "product-service"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "512"
  memory                   = "1024"
  execution_role_arn       = aws_iam_role.ecs_execution.arn
  task_role_arn            = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([
    {
      name  = "product-service"
      image = "${aws_ecr_repository.product.repository_url}:latest"
      portMappings = [{ containerPort = 8080, protocol = "tcp" }]
      environment = [
        { name = "DB_HOST", value = aws_rds_cluster.product_primary.endpoint },
        { name = "REDIS_HOST", value = aws_elasticache_replication_group.main.primary_endpoint_address }
      ]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/product-service"
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])
}
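The service that runs this task definition, and that the auto-scaling target refers to as `aws_ecs_service.product`, might look like the following sketch of a rolling-update service. `aws_ecs_cluster.main` comes from the auto-scaling example. `aws_vpc.main`, `aws_subnet.private_app`, `aws_security_group.app_sg`, and the `/health` path are hypothetical. The target group uses `target_type = "ip"` because Fargate tasks in `awsvpc` mode register by IP address. The target group must also be attached to an ALB listener before the service is created.

```hcl
resource "aws_lb_target_group" "product" {
  name        = "product-service"
  port        = 8080
  protocol    = "HTTP"
  target_type = "ip" # required for awsvpc / Fargate tasks
  vpc_id      = aws_vpc.main.id

  health_check {
    path = "/health" # hypothetical health endpoint
  }
}

resource "aws_ecs_service" "product" {
  name            = "product-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.product_service.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  # Tasks run in the private app tier with no public IPs.
  network_configuration {
    subnets          = aws_subnet.private_app[*].id
    security_groups  = [aws_security_group.app_sg.id] # hypothetical app-tier SG
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.product.arn
    container_name   = "product-service"
    container_port   = 8080
  }

  lifecycle {
    # Application Auto Scaling owns the task count after the first apply.
    ignore_changes = [desired_count]
  }
}
```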

Service auto-scaling uses Application Auto Scaling with a target-tracking policy on the service's average CPU utilization:

resource "aws_appautoscaling_target" "product" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.product.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 20
}

resource "aws_appautoscaling_policy" "product_cpu" {
  name               = "product-cpu-auto-scaling"
  service_namespace  = "ecs"
  resource_id        = aws_appautoscaling_target.product.resource_id
  scalable_dimension = "ecs:service:DesiredCount"
  policy_type        = "TargetTrackingScaling"

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 70
  }
}
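If request volume is a better scaling signal than CPU for a service, you can add a second target-tracking policy based on the ALB's per-target request count. This sketch assumes hypothetical `aws_lb.main` and `aws_lb_target_group.product` resources and an illustrative target value. When one scalable target has several target-tracking policies, Application Auto Scaling scales out if any policy calls for it and scales in only when all of them do.

```hcl
resource "aws_appautoscaling_policy" "product_requests" {
  name               = "product-requests-auto-scaling"
  service_namespace  = "ecs"
  resource_id        = aws_appautoscaling_target.product.resource_id
  scalable_dimension = "ecs:service:DesiredCount"
  policy_type        = "TargetTrackingScaling"

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      # Format: <load balancer arn_suffix>/<target group arn_suffix>
      resource_label = "${aws_lb.main.arn_suffix}/${aws_lb_target_group.product.arn_suffix}"
    }
    target_value       = 1000 # hypothetical requests per task; tune from load tests
    scale_out_cooldown = 60   # seconds
    scale_in_cooldown  = 300  # seconds; scale in more cautiously than out
  }
}
```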

RDS Multi-AZ with Cross-Region Replication

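Deploy Aurora PostgreSQL with a primary cluster in us-east-1 and a cross-region read replica in eu-west-1. For Aurora PostgreSQL, cross-region replicas are provided through Aurora Global Database, as a secondary cluster in the other region. The primary cluster and its two instances: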

resource "aws_rds_cluster" "product_primary" {
  cluster_identifier      = "product-db-primary"
  engine                  = "aurora-postgresql"
  engine_mode             = "provisioned"
  database_name           = "products"
  master_username         = "dbadmin"
  master_password         = random_password.db.result
  storage_encrypted       = true
  kms_key_id              = aws_kms_key.rds.arn
  backup_retention_period = 35
  preferred_backup_window = "03:00-04:00"
  vpc_security_group_ids  = [aws_security_group.data_sg.id]
  db_subnet_group_name    = aws_db_subnet_group.data.name
}

resource "aws_rds_cluster_instance" "product_primary_instances" {
  count              = 2
  identifier         = "product-db-primary-${count.index}"
  cluster_identifier = aws_rds_cluster.product_primary.id
  instance_class     = "db.r6g.large"
  engine             = aws_rds_cluster.product_primary.engine
  engine_version     = aws_rds_cluster.product_primary.engine_version
}
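The eu-west-1 replica can be sketched as an Aurora Global Database that adopts the existing primary cluster. This assumes the default provider targets us-east-1, a provider alias named `euw1` for eu-west-1, and hypothetical eu-west-1 resources (`aws_kms_key.rds_euw1`, `aws_db_subnet_group.data_euw1`, `aws_security_group.data_sg_euw1`). When a global cluster adopts an existing cluster through `source_db_cluster_identifier`, the primary `aws_rds_cluster` also needs `lifecycle { ignore_changes = [global_cluster_identifier] }` so that Terraform does not try to detach it.

```hcl
# Turn the existing us-east-1 cluster into the primary of a global database.
resource "aws_rds_global_cluster" "product" {
  global_cluster_identifier    = "product-db-global"
  source_db_cluster_identifier = aws_rds_cluster.product_primary.arn
  force_destroy                = true
}

# Read-only secondary cluster in eu-west-1; it inherits data and credentials from the primary.
resource "aws_rds_cluster" "product_secondary" {
  provider                  = aws.euw1
  cluster_identifier        = "product-db-secondary"
  engine                    = aws_rds_global_cluster.product.engine
  engine_version            = aws_rds_global_cluster.product.engine_version
  global_cluster_identifier = aws_rds_global_cluster.product.id
  storage_encrypted         = true
  kms_key_id                = aws_kms_key.rds_euw1.arn # KMS keys are regional
  db_subnet_group_name      = aws_db_subnet_group.data_euw1.name
  vpc_security_group_ids    = [aws_security_group.data_sg_euw1.id]
  skip_final_snapshot       = true

  depends_on = [aws_rds_cluster_instance.product_primary_instances]
}

resource "aws_rds_cluster_instance" "product_secondary_instances" {
  provider           = aws.euw1
  count              = 2
  identifier         = "product-db-secondary-${count.index}"
  cluster_identifier = aws_rds_cluster.product_secondary.id
  instance_class     = "db.r6g.large"
  engine             = aws_rds_cluster.product_secondary.engine
  engine_version     = aws_rds_cluster.product_secondary.engine_version
}
```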

CI/CD Pipeline

The pipeline uses CodePipeline with CodeBuild and a manual approval gate. The CodeBuild buildspec for the product service:

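version: 0.2

phases:
  install:
    runtime-versions:
      python: 3.11
    commands:
      - pip install -r requirements.txt
      - pip install bandit safety
  pre_build:
    commands:
      # Fail the build on medium- or high-severity findings (--exit-zero would never fail it)
      - bandit -r src/ -ll
      - safety check --full-report
      # Pass the ECR token on stdin instead of as a command-line argument
      - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT.dkr.ecr.$AWS_REGION.amazonaws.com
      # Tag each image with the short commit SHA so deployments are traceable and easy to roll back
      - COMMIT_HASH=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
      - IMAGE_TAG=${COMMIT_HASH:=latest}
  build:
    commands:
      - docker build -t $ECR_REPO:latest .
      - docker tag $ECR_REPO:latest $ECR_REPO:$IMAGE_TAG
      - docker push $ECR_REPO:latest
      - docker push $ECR_REPO:$IMAGE_TAG
  post_build:
    commands:
      - printf '[{"name":"product-service","imageUri":"%s"}]' "$ECR_REPO:$IMAGE_TAG" > imagedefinitions.json
artifacts:
  files:
    - imagedefinitions.json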

Pipeline stages:

Source — GitHub repository (webhook trigger)
Build — CodeBuild runs tests, security scanning, Docker build/push
Staging Deploy — Deploy to staging ECS service
Integration Test — Run smoke tests against staging
Approval — Manual approval gate
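Production Deploy — Deploy to the production ECS service as a canary (ECS blue/green through CodeDeploy; this action reads an imageDetail.json artifact rather than imagedefinitions.json)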
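A sketch of the production canary deployment group follows. It assumes the production ECS service is created with `deployment_controller { type = "CODE_DEPLOY" }` and two target groups behind one HTTPS listener. `aws_iam_role.codedeploy`, `aws_lb_listener.https`, `aws_lb_target_group.product_blue`, and `aws_lb_target_group.product_green` are hypothetical names. `CodeDeployDefault.ECSCanary10Percent5Minutes` is one of the predefined ECS canary configurations. It shifts 10% of traffic to the new task set first and the remainder five minutes later.

```hcl
resource "aws_codedeploy_app" "product" {
  compute_platform = "ECS"
  name             = "product-service"
}

resource "aws_codedeploy_deployment_group" "product" {
  app_name               = aws_codedeploy_app.product.name
  deployment_group_name  = "product-service-prod"
  deployment_config_name = "CodeDeployDefault.ECSCanary10Percent5Minutes"
  service_role_arn       = aws_iam_role.codedeploy.arn # hypothetical

  # Roll back automatically if the deployment fails.
  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE"]
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }
    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 5
    }
  }

  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  ecs_service {
    cluster_name = aws_ecs_cluster.main.name
    service_name = aws_ecs_service.product.name
  }

  # CodeDeploy shifts listener traffic between the blue and green target groups.
  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.https.arn]
      }
      target_group {
        name = aws_lb_target_group.product_blue.name
      }
      target_group {
        name = aws_lb_target_group.product_green.name
      }
    }
  }
}
```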

Observability Stack

Component	Service	Purpose
Metrics	CloudWatch + Custom Metrics	CPU, memory, request count, error rate, business metrics
Logs	CloudWatch Logs + Grafana Loki	Application logs with structured JSON
Tracing	X-Ray + OpenTelemetry Collector	Distributed traces, service map
Dashboards	Grafana (provisioned)	Unified view across CloudWatch, Prometheus, Loki
Alerting	CloudWatch Alarms + Grafana Alerting	SLO-based burn rate alerts routed to on-call
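To illustrate an SLO-based burn-rate alert, here is a fast-burn CloudWatch alarm on the product service's ALB 5xx ratio. It assumes a 99.9% availability SLO over a 30-day window (both hypothetical) and a 14.4x burn rate over one hour, a common fast-burn setting. Burning at 14.4x for 1 hour of a 720-hour window spends 14.4 / 720 = 2% of the error budget. `aws_lb.main`, `aws_lb_target_group.product`, and `aws_sns_topic.oncall` are hypothetical names. In practice, this alarm would be paired with a short confirmation window and a slower-burn alarm.

```hcl
locals {
  slo_target       = 0.999                # hypothetical 99.9% availability SLO
  error_budget     = 1 - local.slo_target # allowed error ratio, about 0.001
  fast_burn_factor = 14.4                 # 14.4x for 1h spends 2% of a 30-day budget

  product_alb_dimensions = {
    LoadBalancer = aws_lb.main.arn_suffix
    TargetGroup  = aws_lb_target_group.product.arn_suffix
  }
}

resource "aws_cloudwatch_metric_alarm" "product_fast_burn" {
  alarm_name          = "product-service-slo-fast-burn"
  alarm_description   = "5xx ratio is burning the error budget faster than 14.4x over 1 hour"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  threshold           = local.error_budget * local.fast_burn_factor # about 0.0144
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.oncall.arn]

  # Error ratio = 5xx responses / all requests; FILL treats missing 5xx datapoints as zero.
  metric_query {
    id          = "error_ratio"
    expression  = "FILL(errors, 0) / requests"
    label       = "5xx error ratio"
    return_data = true
  }

  metric_query {
    id          = "errors"
    return_data = false
    metric {
      namespace   = "AWS/ApplicationELB"
      metric_name = "HTTPCode_Target_5XX_Count"
      period      = 3600
      stat        = "Sum"
      dimensions  = local.product_alb_dimensions
    }
  }

  metric_query {
    id          = "requests"
    return_data = false
    metric {
      namespace   = "AWS/ApplicationELB"
      metric_name = "RequestCount"
      period      = 3600
      stat        = "Sum"
      dimensions  = local.product_alb_dimensions
    }
  }
}
```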

Security Controls

Control	Implementation
Encryption at rest	KMS envelope encryption for RDS, EBS, S3, ElastiCache
Encryption in transit	TLS with ACM certificates on CloudFront and the ALB (TLS 1.2 minimum, TLS 1.3 where the client supports it), mTLS between services
WAF	AWS Managed Rules + rate-based blocking
DDoS	Shield Advanced
Network segmentation	Security groups per tier, NACL subnet-level deny lists
IAM	Least-privilege roles, task roles for ECS, service roles for RDS
Secrets	AWS Secrets Manager, rotated automatically
Audit	CloudTrail (management + data events), Config rules
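One way to wire the Secrets row to the Aurora primary is to let RDS manage the master password. RDS then stores the password in Secrets Manager and rotates it on a schedule. This sketch assumes the primary cluster sets `manage_master_user_password = true` in place of `master_password`. The `DB_USER` and `DB_PASSWORD` variable names are hypothetical. ECS resolves the `username` and `password` JSON keys when the task starts, so the values never appear in the task definition.

```hcl
locals {
  # ARN of the secret RDS creates when manage_master_user_password = true
  db_secret_arn = aws_rds_cluster.product_primary.master_user_secret[0].secret_arn

  # Entries for the container definition's "secrets" list; the ":<json-key>::"
  # suffix tells ECS to inject a single key from the JSON secret.
  product_db_secrets = [
    { name = "DB_USER", valueFrom = "${local.db_secret_arn}:username::" },
    { name = "DB_PASSWORD", valueFrom = "${local.db_secret_arn}:password::" },
  ]
}

# The task execution role (not the task role) fetches secrets at task start.
resource "aws_iam_role_policy" "ecs_execution_db_secret" {
  name = "read-product-db-secret"
  role = aws_iam_role.ecs_execution.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = [local.db_secret_arn]
    }]
  })
}
```

Pass `local.product_db_secrets` as the `secrets` list of the product-service container definition.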

Terraform Module Structure

terraform/
  modules/
    vpc/              — VPC, subnets, route tables, NAT Gateways, Transit Gateway attachments
    ecs-cluster/      — ECS cluster, capacity providers, task definitions, services
    rds-aurora/       — DB cluster, instances, subnet groups, parameter groups
    elasticache/      — Redis replication group, subnet group, parameter group
    alb/              — ALB, target groups, listeners, WAF association
    route53/          — DNS zones, records, health checks
    cloudfront/       — Distribution, origin, WAF association, behaviors
    ci-cd/            — CodePipeline, CodeBuild, ECR repositories
    monitoring/       — CloudWatch dashboards, alarms, log groups
    security/         — KMS keys, WAF ACLs, Shield, Secrets Manager, IAM roles
    s3-backend/       — S3 bucket for Terraform state, DynamoDB table for state locking
  environments/
    dev/              — Smaller instances, single AZ
    staging/          — Full multi-AZ, no cross-region
    prod/             — Multi-region, all features
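Each environment directory composes the shared modules. The sketch below shows `environments/prod` instantiating the VPC module once per region through provider aliases. The module input names (`vpc_cidr`, `az_count`) are hypothetical and depend on the variables each module actually declares. A dev environment would make a single call with `az_count = 1`.

```hcl
# environments/prod/main.tf
provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "euw1"
  region = "eu-west-1"
}

# Same module, two regions: only the provider and the CIDR differ.
module "vpc_use1" {
  source    = "../../modules/vpc"
  providers = { aws = aws.use1 }
  vpc_cidr  = "10.1.0.0/16"
  az_count  = 3
}

module "vpc_euw1" {
  source    = "../../modules/vpc"
  providers = { aws = aws.euw1 }
  vpc_cidr  = "10.2.0.0/16"
  az_count  = 3
}
```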

Incident Response Runbook Template

Each runbook should include:

Detection — What alert fires, what SLI/SLO is breached
Triage — Check Grafana dashboard, X-Ray traces, CloudWatch Logs
Mitigation — Specific steps (e.g., failover to secondary region)
Resolution — Apply fix, verify, update runbook
Post-mortem — Root cause, timeline, action items

Info

Store runbooks alongside your Terraform code in a runbooks/ directory. Keep them in version control so they evolve with the infrastructure.

Summary

This reference architecture brings together every concept from the course: VPC design, container orchestration, database high availability, global traffic routing, CI/CD, observability, and defense-in-depth security. Use it as the starting point for your capstone project and adapt it to your specific requirements.