
Incident Response Lifecycle


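NIST Special Publication 800-61 Rev 2, Computer Security Incident Handling Guide, is a widely used reference framework for incident response. (NIST replaced it with Rev 3 in April 2024, which aligns IR with the CSF 2.0 functions, but the Rev 2 lifecycle remains the standard teaching model.) It structures IR into four phases that form a continuous improvement cycle: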

    graph LR
    A[Preparation] --> B[Detection & Analysis]
    B --> C[Containment, Eradication & Recovery]
    C --> D[Post-Incident Activity]
    D -->|Lessons Learned| A
  

Understanding this lifecycle, and how it is applied in practice, is essential for anyone involved in security operations. According to IBM's 2024 Cost of a Data Breach Report (researched by the Ponemon Institute), organisations that follow a structured IR lifecycle saw average breach costs about $1.2 million lower than organisations relying on ad-hoc response processes.

Phase 1: Preparation

Preparation is the most critical phase — and the most commonly neglected. If you are building your IR capability during an incident, you have already failed.

Components of Preparation

| Area | Detail | Real-World Example |
|---|---|---|
| IR Plan | Documented, approved, and communicated | NIST-based plan with defined roles, escalation paths, and communication templates |
| IR Team | Trained personnel with defined roles | CSIRT with L1/L2/L3, legal, PR, executive liaison |
| Tools | Pre-deployed and tested | EDR, SIEM, SOAR, forensic workstations, network capture |
| Playbooks | Runbooks for common scenarios | Ransomware playbook, phishing playbook, DDoS playbook |
| Communication | Call tree, secure channels, PR templates | Slack channels, conference bridge, encrypted email, press statement drafts |
| Training | Regular exercises and drills | Quarterly tabletop exercises, annual full-scale simulation |
| Infrastructure | Logging, backups, network segmentation | Centralised logging with sufficient retention, immutable backups |
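The table above can double as a readiness checklist. A minimal sketch, assuming a hypothetical inventory dictionary keyed by area name plus two illustrative fields (`last_tabletop`, `backups_immutable`). The 92-day limit is an assumed stand-in for "quarterly" exercises.

```python
from datetime import date, timedelta

# The seven preparation areas from the table above.
REQUIRED_AREAS = ["IR Plan", "IR Team", "Tools", "Playbooks",
                  "Communication", "Training", "Infrastructure"]

# Assumption: "quarterly" tabletop exercises means none older than ~92 days.
MAX_TABLETOP_AGE = timedelta(days=92)


def preparation_gaps(inventory: dict, today: date) -> list[str]:
    """List readiness gaps in a hypothetical preparation inventory."""
    gaps = [f"missing area: {area}" for area in REQUIRED_AREAS
            if not inventory.get(area)]
    last_tabletop = inventory.get("last_tabletop")
    if last_tabletop is None or today - last_tabletop > MAX_TABLETOP_AGE:
        gaps.append("tabletop exercise overdue")
    if not inventory.get("backups_immutable", False):
        gaps.append("backups are not immutable")
    return gaps
```

Running a check like this on a schedule turns preparation from a one-off project into something that is re-verified before an incident, not during one.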

Preparation Check — The 80/20 Rule

A common mistake is trying to prepare for every possible scenario. As a rule of thumb (the Pareto principle, or 80/20 rule), roughly 80% of incidents fall into about 20% of scenario types, so focus preparation on the most likely scenarios:

| Scenario | Likelihood (Enterprise) | Priority |
|---|---|---|
| Phishing / Email compromise | Very high | P1 |
| Ransomware | High | P1 |
| Insider data theft | Medium | P2 |
| DDoS | Medium | P2 |
| Supply chain compromise | Low | P3 |
| Nation-state APT | Very low (for most orgs) | P3 |
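Because likelihood differs between organisations, it can help to encode the likelihood-to-priority mapping and re-score with your own estimates. This is a minimal sketch; the mapping mirrors the table above, and `prioritise` is a hypothetical helper.

```python
# Likelihood-to-priority mapping taken from the table above.
PRIORITY_BY_LIKELIHOOD = {
    "very high": "P1",
    "high": "P1",
    "medium": "P2",
    "low": "P3",
    "very low": "P3",
}


def prioritise(scenarios: dict[str, str]) -> dict[str, list[str]]:
    """Group scenario names by preparation priority, preserving input order."""
    buckets: dict[str, list[str]] = {"P1": [], "P2": [], "P3": []}
    for name, likelihood in scenarios.items():
        buckets[PRIORITY_BY_LIKELIHOOD[likelihood.lower()]].append(name)
    return buckets


# Enterprise estimates from the table; replace with your own threat model.
SCENARIOS = {
    "Phishing / Email compromise": "Very high",
    "Ransomware": "High",
    "Insider data theft": "Medium",
    "DDoS": "Medium",
    "Supply chain compromise": "Low",
    "Nation-state APT": "Very low",
}
```

Playbooks and exercises would then be built for the P1 bucket first.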

Phase 2: Detection & Analysis

Detection is covered in depth in the Threat Detection module. From an IR perspective, detection triggers the transition from monitoring to response.

The Triage Process

When an alert fires, the IR team follows a structured triage process:

Step 1 — Verify: Is this a true positive? Check the alert details, correlate with other data sources, rule out environmental noise.

Step 2 — Classify: Determine the incident type (malware, phishing, unauthorised access, DDoS, etc.) and severity (critical, high, medium, low).

Step 3 — Scope: What systems, users, and data are affected? Is it a single workstation or a domain-wide compromise?

Step 4 — Impact: What is the potential business impact? Data exfiltration? Ransomware encryption? Regulatory notification?
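The four triage steps can be sketched as one function. This is an illustration under stated assumptions, not a production triage engine: the alert is a plain dictionary, and every field name (`corroborated`, `active_encryption`, `critical_system`, and so on) is hypothetical. The severity rules roughly follow the severity table that comes next in this section.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TriageResult:
    true_positive: bool
    incident_type: Optional[str] = None
    severity: Optional[str] = None
    scope: list = field(default_factory=list)
    impact: list = field(default_factory=list)


def classify_severity(alert: dict) -> str:
    """Rough mapping onto the severity definitions; alert fields are hypothetical."""
    if alert.get("active_encryption") or (
        alert.get("data_exfiltration") and alert.get("critical_system")
    ):
        return "critical"
    if alert.get("confirmed_compromise"):
        return "high"
    if alert.get("type") == "policy_violation":
        return "low"
    return "medium"


def triage(alert: dict) -> TriageResult:
    # Step 1 - Verify: treat the alert as a true positive only if another
    # data source corroborates it.
    if not alert.get("corroborated"):
        return TriageResult(true_positive=False)
    # Step 2 - Classify: incident type from the alert, severity from its facts.
    severity = classify_severity(alert)
    # Step 3 - Scope: every affected host and user, de-duplicated and sorted.
    scope = sorted(set(alert.get("hosts", [])) | set(alert.get("users", [])))
    # Step 4 - Impact: which business-impact flags are set.
    impact = [flag for flag in ("active_encryption", "data_exfiltration",
                                "regulatory_notification") if alert.get(flag)]
    return TriageResult(True, alert.get("type"), severity, scope, impact)
```

An uncorroborated alert stops at Step 1. That matches the intent of the process: steps 2 to 4 are only worth the analyst's time for true positives.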

| Severity | Definition | Response SLA | Example |
|---|---|---|---|
| Critical | Active threat to life/safety, or confirmed breach of critical systems with data exfiltration | Immediate (< 15 min) | Active ransomware encryption across production servers |
| High | Confirmed compromise of sensitive data or systems | < 1 hour | Phishing compromise of a finance executive’s email |
| Medium | Suspicious activity without confirmed data loss | < 4 hours | Multiple failed logons from an unusual geo-location |
| Low | Policy violation, nuisance activity | < 24 hours | Employee installing unauthorised software |
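The response SLAs in the table above are easy to check automatically. A minimal sketch, assuming "response" means the first analyst action after the alert. The function name `sla_breached` is hypothetical.

```python
from datetime import datetime, timedelta

# Response SLAs from the severity table above.
RESPONSE_SLA = {
    "critical": timedelta(minutes=15),
    "high": timedelta(hours=1),
    "medium": timedelta(hours=4),
    "low": timedelta(hours=24),
}


def sla_breached(severity: str, alerted_at: datetime, responded_at: datetime) -> bool:
    """True if the first response came later than the SLA for this severity."""
    return responded_at - alerted_at > RESPONSE_SLA[severity]
```

For example, a critical alert at 02:03 acknowledged at 02:07 is within its 15-minute SLA.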

Analysis — Gathering the Picture

Analysis determines what happened, how it happened, and what the adversary is doing:

    graph TD
    A[Alert Fires] --> B[Collect Initial Data]
    B --> C[Correlate Across Sources]
    C --> D{Evidence of Compromise?}
    D -->|Yes| E[Determine Scope]
    D -->|No| F[Investigate Further or Close]
    E --> G[Identify Attacker Actions]
    G --> H[Determine IoCs]
    H --> I[Update Detection Rules]
    I --> J[Escalate to Containment]
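
The "Correlate Across Sources" step in the flow above can be sketched as a sliding time window per host. This example assumes events arrive as `(timestamp, source, host)` tuples and that activity from at least two distinct data sources (for example EDR and proxy logs) within 30 minutes counts as evidence worth scoping. Both thresholds are illustrative assumptions, not recommendations.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate(events, window=timedelta(minutes=30), min_sources=2):
    """Return hosts where at least `min_sources` distinct data sources report
    activity within `window` of each other. Events are (timestamp, source, host)."""
    by_host = defaultdict(list)
    for ts, source, host in events:
        by_host[host].append((ts, source))
    flagged = set()
    for host, items in by_host.items():
        items.sort()  # time order (ties broken by source name)
        start = 0
        for end in range(len(items)):
            # Shrink the window from the left until it spans at most `window`.
            while items[end][0] - items[start][0] > window:
                start += 1
            sources = {src for _, src in items[start:end + 1]}
            if len(sources) >= min_sources:
                flagged.add(host)
                break
    return flagged
```

Hosts that are flagged move on to "Determine Scope". Hosts that are not flagged fall into "Investigate Further or Close".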
  

Info

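Speed matters most in the first hour. Breakout time, the time an adversary takes to move laterally from the first compromised host, averaged 79 minutes according to CrowdStrike’s 2023 Threat Hunting Report. The clock starts at initial access, not at detection, so detection time and triage time both eat into that window. If their combined time exceeds it, the attacker has likely already moved laterally.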
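The arithmetic behind breakout time is simple. The sketch below assumes the 79-minute average as the budget; your own threat intelligence may justify a shorter one. The helper name `breakout_margin` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: use the 79-minute average breakout time as the budget.
BREAKOUT_TIME = timedelta(minutes=79)


def breakout_margin(initial_access: datetime, triage_complete: datetime,
                    breakout: timedelta = BREAKOUT_TIME) -> timedelta:
    """Time left before average breakout; a negative value means the window has passed."""
    return breakout - (triage_complete - initial_access)
```

If initial access happened at 08:25 and triage finished at 09:30, 14 minutes of margin remain. Any negative result means containment should assume lateral movement has already happened.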

Phase 3: Containment, Eradication & Recovery

This phase is where the IR team actively stops the adversary and removes their presence.

Containment Strategies

| Strategy | Speed | Reversibility | Use Case |
|---|---|---|---|
| Host isolation (disable NIC) | Seconds | Moderate (reboot to restore) | Single compromised host |
| Account disable | Seconds | High (re-enable) | Compromised user/service account |
| Network block (firewall ACL) | Minutes | High (remove rule) | Blocking C2 infrastructure |
| VM isolation (snapshot + disconnect) | Minutes | High (reconnect) | Cloud workload compromise |
| Application quarantine | Minutes | Moderate (remove from quarantine) | SaaS account compromise |
| Full network segmentation | Hours | Low (complex reconfiguration) | Active ransomware spreading |
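When several asset types are compromised at once, the table suggests an ordering: fastest first, then most reversible. A minimal sketch of that ordering follows. The `seconds_to_apply` values are assumed order-of-magnitude stand-ins for "Seconds", "Minutes" and "Hours". Reversibility is scored 1 (low) to 3 (high), and the `use_case` keys are hypothetical labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    seconds_to_apply: int  # rough order of magnitude from the Speed column
    reversibility: int     # 1 = low, 2 = moderate, 3 = high
    use_case: str          # hypothetical label for the Use Case column


STRATEGIES = [
    Strategy("Host isolation (disable NIC)", 1, 2, "host"),
    Strategy("Account disable", 1, 3, "account"),
    Strategy("Network block (firewall ACL)", 60, 3, "c2"),
    Strategy("VM isolation (snapshot + disconnect)", 60, 3, "cloud_workload"),
    Strategy("Application quarantine", 60, 2, "saas_account"),
    Strategy("Full network segmentation", 3600, 1, "ransomware_spread"),
]


def containment_order(use_cases: set) -> list:
    """Strategies that match the affected asset types, fastest first,
    ties broken by higher reversibility."""
    matches = [s for s in STRATEGIES if s.use_case in use_cases]
    return sorted(matches, key=lambda s: (s.seconds_to_apply, -s.reversibility))
```

Sorting on speed before reversibility reflects the "speed over elegance" principle: a fast, partly reversible action now beats a clean one later.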

Eradication

Eradication removes the adversary’s presence from the environment. This includes:

  • Removing malware from affected systems
  • Deleting backdoors, webshells, and persistence mechanisms
  • Resetting compromised credentials (passwords, API keys, certificates)
  • Patching exploited vulnerabilities
  • Removing unauthorised accounts and group memberships

The Golden Rule of Eradication: If you cannot be certain that a system is clean, rebuild it from known-good sources. Reimaging is always safer than attempting to clean a compromised system.
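The eradication steps and the golden rule can be combined into a per-host task list. A sketch under one stated assumption: `can_verify_clean` is a hypothetical flag an analyst sets only when the host can be proven clean after in-place remediation.

```python
# The eradication steps listed above, in order.
ERADICATION_TASKS = [
    "remove malware",
    "delete backdoors, webshells and persistence mechanisms",
    "reset compromised credentials",
    "patch exploited vulnerabilities",
    "remove unauthorised accounts and group memberships",
]


def eradication_plan(host: str, can_verify_clean: bool) -> list[str]:
    """Build a task list for one host, applying the golden rule of eradication."""
    if can_verify_clean:
        return [f"{host}: {task}" for task in ERADICATION_TASKS]
    # Reimaging replaces malware and persistence clean-up, but the image may
    # predate the patch, and credentials and directory accounts live outside
    # the host, so those steps remain.
    return [f"{host}: reimage from known-good source"] + [
        f"{host}: {task}" for task in ERADICATION_TASKS[2:]
    ]
```

Note that even a rebuilt host still needs credential resets and patching. Otherwise the attacker can walk back in through the same door.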

Recovery

Recovery restores normal operations:

  1. Restore clean systems from validated backups (test the backup first!)
  2. Verify systems are patched and hardened before reconnecting
  3. Monitor restored systems closely for signs of re-infection
  4. Gradually restore service access as each system is verified clean
  5. Communicate restoration progress to stakeholders
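One small part of "test the backup first" is an integrity check. The sketch assumes a SHA-256 digest was recorded when the backup was taken and stored separately from it. The check proves the file has not changed since then. It does not prove the backup predates the compromise or that it restores successfully; a test restore is still needed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path: Path, expected_sha256: str) -> bool:
    """Compare a backup's hash with the value recorded at backup time."""
    return sha256_of(path) == expected_sha256.lower()
```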

Phase 4: Post-Incident Activity

The final phase — and the one most frequently skipped — is where the organisation learns from the incident and improves its defences.

Post-Mortem Meeting

The post-mortem (also called a “lessons learned” or “after-action review”) should include:

  • Timeline reconstruction: What happened, when, and in what sequence
  • Root cause analysis: How did the attacker gain initial access?
  • Control effectiveness: Which controls worked? Which failed? Which were missing?
  • Process evaluation: Did the IR plan hold up? Were communication channels effective?
  • Metrics: MTTD, MTTR, containment time, recovery time
  • Improvement actions: Specific, owner-assigned actions with deadlines

Post-Mortem Template

INCIDENT POST-MORTEM
Incident ID: INC-2026-0016
Title: Ransomware — Finance Department
Date: 2026-01-16
Severity: Critical
TIMELINE:
[2026-01-14 08:23] — User received phishing email (clicked link)
[2026-01-14 08:25] — QakBot loader downloaded and executed
[2026-01-14 09:45] — Cobalt Strike beacon established
[2026-01-14 14:12] — Lateral movement via RDP to file server
[2026-01-15 02:00] — Ransomware deployed (LockBit 3.0 variant)
[2026-01-15 02:03] — SIEM alert fired (mass file encryption detected)
[2026-01-15 02:07] — L1 analyst acknowledged alert
[2026-01-15 02:15] — Containment initiated: file server isolated
[2026-01-15 02:30] — EDR containment: all affected hosts isolated
[2026-01-15 03:00] — C2 domains blocked at firewall
[2026-01-15 04:00] — IR team completed scoping
[2026-01-15 06:00] — Clean backup restoration began
[2026-01-15 18:00] — Full recovery completed
ROOT CAUSE:
Phishing email with malicious link delivered to finance user. Email security filter
did not detect the URL because the domain was 2 hours old (newly registered).
WHAT WORKED:
- EDR detected mass file encryption, and the resulting SIEM alert fired within 3 minutes
- Offline backups allowed full restoration
- Call tree activation was completed in 8 minutes
WHAT FAILED:
- Email security did not detect the phishing URL (new domain)
- No endpoint isolation trigger automated from SIEM alert
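- RDP was open to the file server from all workstations (not segmented)
- No detection of the QakBot loader, Cobalt Strike beacon, or RDP lateral movement during ~17.5 hours of attacker activity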
IMPROVEMENT ACTIONS:
1. [P1] Implement automated host isolation in SOAR playbook for ransomware alerts (Owner: IR Lead, Due: 2 weeks)
2. [P1] Add new-domain detection to email security (Owner: Security Eng, Due: 1 week)
3. [P2] Segment RDP access to file server to specific jump hosts (Owner: Network Team, Due: 4 weeks)
4. [P3] Deploy phishing simulation training for finance department (Owner: Security Awareness, Due: 2 weeks)
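The timeline above contains everything needed for the post-mortem metrics. This sketch makes two assumptions. Compromise is taken as the loader execution at 08:25, and containment as the point when all affected hosts were isolated at 02:30.

```python
from datetime import datetime

# Key timestamps copied from the example post-mortem timeline.
TIMELINE = {
    "initial_access": "2026-01-14 08:25",  # QakBot loader executed
    "detected":       "2026-01-15 02:03",  # SIEM alert fired
    "contained":      "2026-01-15 02:30",  # all affected hosts isolated
    "recovered":      "2026-01-15 18:00",  # full recovery completed
}
FMT = "%Y-%m-%d %H:%M"


def hours_between(start: str, end: str) -> float:
    """Elapsed hours between two named timeline events."""
    delta = datetime.strptime(TIMELINE[end], FMT) - datetime.strptime(TIMELINE[start], FMT)
    return delta.total_seconds() / 3600


time_to_detect = hours_between("initial_access", "detected")   # about 17.6 h
time_to_contain = hours_between("detected", "contained")       # 0.45 h (27 min)
time_to_recover = hours_between("contained", "recovered")      # 15.5 h
```

Compared with the targets in the metrics table, containment and recovery were on target. Detection took about 17.6 hours against a target of under one hour, and that is where the improvement actions should focus.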

Metrics and Reporting

Key metrics to track across all incidents:

| Metric | Definition | Target |
|---|---|---|
| MTTD (Mean Time to Detect) | Time from compromise to detection | < 1 hour |
| MTTR (Mean Time to Respond) | Time from detection to containment | < 2 hours |
| Mean containment time | Time from escalation to containment | < 30 minutes |
| Mean recovery time | Time from containment to full recovery | < 24 hours |
| % of incidents with post-mortem | Share of incidents that get a formal review | 100% for critical/high |
| % of post-mortem actions completed | Improvement action closure rate | > 80% within 30 days |
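Two of these metrics lend themselves to simple automated checks. The sketch assumes improvement actions are tracked as `(raised_on, closed_on_or_None)` date pairs. Measured means are passed in hours, and the metric keys (`mttd`, `mttr`, `containment`, `recovery`) are hypothetical. The targets are the ones in the table above.

```python
from datetime import date, timedelta

# Targets from the table above, in hours (a metric misses if it is not below target).
TARGET_HOURS = {"mttd": 1, "mttr": 2, "containment": 0.5, "recovery": 24}


def closure_rate(actions, window_days: int = 30) -> float:
    """Share of post-mortem actions closed within `window_days` of being raised."""
    if not actions:
        return 1.0  # nothing outstanding
    on_time = sum(
        1 for raised, closed in actions
        if closed is not None and closed - raised <= timedelta(days=window_days)
    )
    return on_time / len(actions)


def missed_targets(measured_hours: dict) -> list:
    """Names of metrics whose measured mean is not below its target."""
    return sorted(k for k, v in measured_hours.items() if v >= TARGET_HOURS[k])
```

A closure rate of 0.75 would fall short of the > 80% target and should be escalated in the next review.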

Case Study: Mandiant and the SolarWinds Response

The SolarWinds breach (discovered December 2020) remains one of the most significant incident response cases in history. FireEye (Mandiant) discovered the breach within their own network and traced it to the SolarWinds Orion supply chain compromise.

Response Timeline

| Date | Event | IR Phase |
|---|---|---|
| March 2020 | Trojanised SolarWinds Orion builds containing the SUNBURST backdoor begin shipping to customers | Initial Compromise |
| March-Dec 2020 | Up to 18,000 organisations receive the trojanised update | Propagation |
| Dec 8, 2020 | FireEye publicly discloses that its own network has been breached | Detection |
| Dec 13, 2020 | FireEye (Mandiant) publishes its SUNBURST analysis, tracing the breach to SolarWinds Orion | Analysis |
| Dec 14-31, 2020 | Global IR mobilisation: every affected organisation begins response | Containment |
| Jan-Apr 2021 | US government calls the actor likely Russian (Jan 5) and formally attributes the campaign to the SVR (tracked as APT29) on April 15 | Eradication |
| 2021-2022 | Organisations rebuild SolarWinds servers, rotate all credentials, audit cloud access | Recovery |
| Ongoing | Supply chain security reforms, software bill of materials (SBOM) mandates | Post-Incident |

Key IR Lessons from SolarWinds

  1. Trust nothing, verify everything: The trojanised Orion updates were signed with SolarWinds’ legitimate code-signing certificate, because the backdoor was injected during the build process. Code signing does not equal safety.

  2. Supply chain risk is real: Your security depends on the security of every vendor whose software runs in your environment.

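  3. Over a year of dwell time: The attackers were active in SolarWinds’ environment from around September 2019, more than a year before the breach was discovered. Log retention should cover realistic dwell times, which for stealthy actors can exceed a year.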

  4. Backdoors in trusted software: No amount of perimeter security helps when the software itself is compromised. Detection must focus on behaviour, not just signatures.

  5. Coordinate across organisations: The response required unprecedented collaboration between private sector (FireEye, Microsoft, CrowdStrike) and government (CISA, FBI).
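The log-retention lesson can be checked with simple date arithmetic. The sketch uses approximate dates for illustration. March 1, 2020 stands in for "March 2020", when trojanised updates began shipping, and December 13, 2020 for public disclosure. `retention_covers` is a hypothetical helper.

```python
from datetime import date, timedelta


def retention_covers(retention_days: int, first_intrusion: date, discovered: date) -> bool:
    """True if logs kept for `retention_days` still reach back to the first intrusion."""
    return timedelta(days=retention_days) >= discovered - first_intrusion


# Approximate SUNBURST window for a customer that installed an early update.
FIRST_INTRUSION = date(2020, 3, 1)
DISCOVERED = date(2020, 12, 13)
```

That window is 287 days, so a 90-day retention policy would have lost the logs covering the initial months of attacker activity. A one-year policy would have kept them.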

Danger

The SolarWinds breach fundamentally changed incident response. It demonstrated that even the most sophisticated security teams can be compromised through trusted vendors. Every IR plan must now include supply chain incident scenarios.

Key Takeaways

  • The NIST SP 800-61 IR lifecycle (Preparation → Detection → Containment/Eradication/Recovery → Post-Incident) provides the foundational framework for all incident response activities
  • Preparation is the most critical phase — build the plan, train the team, and test the tools before an incident occurs
  • Triage classifies incidents by severity and determines the appropriate response SLA — the first 60 minutes are critical
  • Containment strategies range from seconds (host isolation) to hours (network segmentation) — choose speed over elegance
  • Post-incident activity (post-mortem) is how the organisation improves — without it, the same incident will happen again
  • The SolarWinds breach demonstrated that supply chain attacks require new response approaches: behaviour-based detection, extended log retention, and cross-organisation coordination