Risk Management
Checking access...
Risk management is the core discipline of cybersecurity. Every decision a security professional makes — what to prioritise, what to accept, what to pay for — is a risk decision. Without risk management, you are guessing. With it, you are making informed trade-offs that the business can understand.
The Risk Formula
Risk = Threat × Vulnerability × ImpactAll three elements must be present for risk to exist:
| Element | Definition | Example | Without This Element |
|---|---|---|---|
| Threat | What could cause harm | A ransomware gang targeting your industry | No risk — no one is trying to attack you |
| Vulnerability | A weakness that could be exploited | Unpatched Apache Struts server | No risk — there is no way in |
| Impact | The consequence if exploited | $50M in lost revenue from 2-week outage | No risk — even if breached, it does not matter |
Real-world implication: If you have a critical vulnerability on an internal server that has no network connectivity and no sensitive data, there is vulnerability but no impact — so the risk is low. Conversely, if you have a well-patched internet-facing server with customer PII, the vulnerability likelihood is low but the impact is high — so the risk is still significant.
Threat, Vulnerability, Risk — The Relationship
These three terms are often confused. Here is how they relate:
THREAT (attacker knocking on doors) ↓ findsVULNERABILITY (an unlocked door) ↓ leading toIMPACT (what is inside gets stolen) ↓ =RISK (how likely this is × how bad it would be)Example: A ransomware gang (threat) scanning the internet for exposed RDP ports finds that your organisation has RDP open to the internet on a domain controller (vulnerability). If they exploit this, they could encrypt your domain and halt operations for weeks (impact). The risk is high because the threat is active, the vulnerability exists, and the impact is severe.
Risk Assessment Methodologies
Qualitative Risk Assessment
Qualitative assessment uses descriptive scales (High/Medium/Low) rather than numerical values. It is faster, more accessible, and widely used for initial assessments.
graph TD
A[Identify Assets] --> B[Identify Threats]
B --> C[Identify Vulnerabilities]
C --> D[Assess Likelihood: H/M/L]
D --> E[Assess Impact: H/M/L]
E --> F[Calculate Risk: Likelihood × Impact]
F --> G[Prioritise: Heat Map]
Risk heat map (5×5 matrix):
| Likelihood \ Impact | Negligible | Minor | Moderate | Major | Catastrophic |
|---|---|---|---|---|---|
| Almost Certain | Medium | High | High | Critical | Critical |
| Likely | Medium | Medium | High | High | Critical |
| Possible | Low | Medium | Medium | High | High |
| Unlikely | Low | Low | Medium | Medium | High |
| Rare | Low | Low | Low | Medium | Medium |
Example qualitative risk register:
| Asset | Threat | Vulnerability | Likelihood | Impact | Risk | Treatment |
|---|---|---|---|---|---|---|
| Customer database | SQL injection via web app | Web app uses string concatenation for queries | Likely | Catastrophic | Critical | Rewrite with parameterised queries (project) |
| Email system | Spear-phishing attack | No MFA on executive accounts | Almost Certain | Major | Critical | Enforce MFA (1 week) |
| Public website | DDoS attack | No DDoS protection | Possible | Minor | Low | Accept risk |
| Office Wi-Fi | Rogue access point | No 802.1X network access control | Unlikely | Moderate | Medium | Implement NAC (Q3 project) |
Quantitative Risk Assessment (FAIR Model)
Quantitative assessment uses dollar values and statistical analysis. The Factor Analysis of Information Risk (FAIR) model is the industry standard.
Key quantitative terms:
| Term | Definition | Formula |
|---|---|---|
| SLE (Single Loss Expectancy) | Cost of a single incident | SLE = AV × EF |
| AV (Asset Value) | Dollar value of the asset | $ |
| EF (Exposure Factor) | Percentage of asset value lost | % |
| ARO (Annual Rate of Occurrence) | How many times per year | 0.0 - 365 |
| ALE (Annualised Loss Expectancy) | Expected annual cost | ALE = SLE × ARO |
Example — Ransomware risk calculation:
Asset: Financial databases + file serversAsset Value (AV): $5,000,000 (replacement + data value + downtime cost)Exposure Factor (EF): 60% (partial data loss + 3-week restoration)Single Loss Expectancy (SLE) = $5,000,000 × 0.60 = $3,000,000Annual Rate of Occurrence (ARO): 0.3 (one incident every 3-4 years, based on industry stats)Annualised Loss Expectancy (ALE) = $3,000,000 × 0.3 = $900,000/yearNow compare control costs:
Control option 1: Offline backups + tested DR plan Cost: $150,000/year (backup infrastructure + testing) New EF: 10% (recover within 48 hours, minimal data loss) New SLE: $500,000 New ALE: $500,000 × 0.3 = $150,000/year Annual savings: $900,000 - $150,000 = $750,000 ROI: $750,000 - $150,000 = $600,000/year net benefit ✅
Control option 2: Everything-in-the-cloud migration Cost: $2,000,000/year (cloud migration + rearchitecture) New ARO: 0.1 (cloud provider has better baseline security) New ALE: $3,000,000 × 0.1 = $300,000/year Annual savings: $900,000 - $300,000 = $600,000 But $2,000,000 > $600,000 ... negative ROI (unless cloud also provides other benefits)FAIR analysis output:
Risk: Ransomware on financial systems Probable loss frequency: 0.3 events/year Probable loss magnitude: $3,000,000 per event Annualised loss expectancy: $900,000 Confidence interval (90%): $450,000 - $1,800,000
Recommended control: Offline immutable backups + DR plan Residual ALE with control: $150,000 Annual benefit of control: $750,000 Control cost: $150,000/year Net benefit: $600,000/yearRisk Treatment Options
Once risk is identified and assessed, you must decide what to do:
| Strategy | Description | When to Use | Example |
|---|---|---|---|
| Avoid | Eliminate the activity that creates risk | When risk outweighs benefit | ”We will not store credit card numbers — use a tokenisation service” |
| Transfer | Shift risk to a third party | When another party can manage it better | ”We bought cyber insurance to cover ransomware losses” |
| Mitigate | Implement controls to reduce likelihood or impact | When cost of control < expected loss | ”We deployed MFA to reduce account takeover risk” |
| Accept | Acknowledge the risk and monitor it | When cost of control > expected loss | ”We accept the risk of the public wiki being defaced — recovery is trivial” |
Residual vs inherent risk:
INHERENT RISK: Risk before any controls are applied ↓CONTROLS: What you do to reduce risk ↓RESIDUAL RISK: Risk that remains after controls ↓Do you accept residual risk? If not → add more controlsExample — Inherent vs residual for a public web application:
| Risk Element | Inherent (no controls) | Control Applied | Residual (after control) |
|---|---|---|---|
| SQL injection | Critical (public app, no input validation) | WAF + parameterised queries | Low |
| DDoS | High (single server, no capacity) | Cloudflare CDN + auto-scaling | Low |
| Credential stuffing | Critical (no rate limiting, no MFA) | Rate limiting + MFA + account lockout | Low |
| XSS | Critical (user-submitted content, no sanitisation) | Content Security Policy + output encoding | Medium |
Third-Party Risk Management
Most breaches involve a third party. Target was breached through an HVAC vendor. SolarWinds was a supply chain attack. Capital One’s cloud provider (AWS) was not breached — but Capital One misconfigured their use of it.
TPRM process:
1. IDENTIFY: Which vendors have access to our data or network? → Vendor registry with data classification and access level
2. ASSESS: What risk does each vendor present? → Questionnaires (SIG), SOC 2 reports, penetration test results
3. TIER: Not all vendors are equal → Tier 1 (critical): SOC 2 Type II + pen test required → Tier 2 (important): SOC 2 Type II or equivalent → Tier 3 (low): Self-assessment only
4. TREAT: What do we require from each vendor? → Contractual security requirements, right-to-audit clause
5. MONITOR: Continuous oversight → Annual reassessment, breach notification agreementsSample vendor tiering:
| Tier | Criteria | Requirements | Cadence |
|---|---|---|---|
| Tier 1 | Processes PII, or has network access | SOC 2 Type II, pentest, BIA, cyber insurance | Annual assessment |
| Tier 2 | Connects to non-sensitive systems | SOC 2 Type II or equivalent | Annual assessment |
| Tier 3 | No access to our environment | Self-assessment questionnaire | Upon onboarding only |
Case Study: Equifax (Complete)
Equifax is the most important risk management case study in cybersecurity history because it involved failure at every level of the risk management process.
Timeline:
| Date | Event |
|---|---|
| March 7, 2017 | Apache releases patch for CVE-2017-5638 (Struts RCE) — CVSS 10.0 |
| March 7 - May 13 | Equifax has the patch available but does not apply it |
| May 13 | Attackers scan the internet for unpatched Struts servers, find Equifax |
| May 13 - July 29 | Attackers maintain access, move laterally, locate unencrypted database |
| July 29 | Equifax security team detects suspicious traffic |
| September 7 | Equifax publicly discloses the breach |
| 2018-2019 | Congressional hearings, CEO retires, $1.4B in settlements |
Risk management failures at every level:
| Failure | What Should Have Happened |
|---|---|
| No asset inventory — They did not know they had a Struts server | Asset discovery should identify all internet-facing systems |
| No patch SLA — CVSS 10.0 vulnerability unpatched for 2+ months | Critical patching SLA of 48 hours or less |
| No vulnerability scanning — They had a scanner but it did not cover this system | Authenticated scanning covering all systems |
| No network segmentation — Attacker reached the database from the web tier | Database should be in a separate subnet with strict firewall rules |
| No database encryption — 147M records stored in plaintext | Encryption at rest (AES-256) would have made stolen data useless |
| No data exfiltration detection — 147M records left over months | DLP + anomaly detection would have alerted on the volume |
| No IR plan — Took 2 months to discover they were breached | EDR + SIEM with 24/7 monitoring |
The root cause was not technical — it was risk management. Equifax had a vulnerability management program on paper. They ran scans. They had a patching process. But the process failed because:
- The scanning did not cover all assets
- The patching SLA was not enforced
- There was no accountability for missed patches
- The risk of not patching was not communicated to leadership
- The board was not informed of cybersecurity risks
Regulatory outcome: Equifax settled with the FTC for $575 million (the largest data breach settlement in history at the time), plus $175 million to states, plus $1 billion for consumer remediation. Total: ~$1.4 billion.
Key regulatory findings from the FTC complaint:
- Equifax failed to maintain an accurate inventory of their IT systems
- Equifax failed to implement adequate patch management
- Equifax failed to monitor network traffic for anomalous activity
- Equifax failed to segment their network to limit access to sensitive data
- Equifax failed to implement adequate access controls
Risk Management Maturity Model
| Level | Name | Characteristics |
|---|---|---|
| 1 | Initial | Ad-hoc, reactive, no formal process |
| 2 | Repeatable | Basic risk register, qualitative assessment, some SLAs |
| 3 | Defined | Standardised risk methodology (NIST, FAIR), regular assessments |
| 4 | Managed | Quantitative FAIR analysis, risk appetite defined, metrics-driven |
| 5 | Optimised | Continuous risk monitoring, automated risk scoring, board-level reporting |
Key Takeaways
- Risk = Threat × Vulnerability × Impact — all three must be present for risk to exist
- Qualitative assessment (H/M/L heat maps) is faster but subjective; Quantitative assessment (FAIR, ALE = SLE × ARO) is more precise but data-intensive — use both
- Four risk treatment options: Avoid (eliminate the activity), Transfer (insurance, contracts), Mitigate (controls), Accept (monitor) — residual risk remains after controls
- Third-party risk management is essential — most breaches involve a vendor or supply chain
- Equifax is the definitive case study: every risk management process failed (asset inventory, scanning, patching, segmentation, encryption, monitoring)
- Risk management maturity progresses from ad-hoc (Level 1) to continuous and board-reported (Level 5) — most enterprises are at Level 2-3