Operational Metrics — Speed vs Quality

8-10 hours · Module 1 · Free
What you already know

Section 1.5 built the triage framework that produces classified incidents with dispositions. This section builds the metrics framework that measures whether those classifications are accurate and whether the SOC is effective — not just fast. The disposition data from every triage closure feeds the quality metrics that make SOC effectiveness measurable for the first time in most organizations.

Two dashboards, one SOC

Scenario

NE's monthly security report to the board shows three metrics: mean time to triage (6 minutes), SLA compliance (96%), and total alerts processed (2,400). The board is satisfied — the numbers are green. After INC-NE-2026-0227-001, Rachel calculates four metrics the board has never seen: mean time to detect (14 days), false positive rate (47%), classification accuracy (unmeasurable — no L2 review data exists), and external discovery rate (60% — users find more compromises than the SOC does). The speed dashboard says the SOC is healthy. The quality dashboard says the SOC is failing its mission. Both are accurate. They measure different things.

The metrics a SOC tracks determine what the SOC optimizes. Track mean time to triage and analysts optimize for speed — faster closures, fewer investigation minutes per alert, higher throughput. Track mean time to detect and the conversation changes entirely: why did this attack exist for 14 days before detection? What detection rule should have fired? What enrichment step would have caught it during triage?

Most SOCs track speed metrics because they're automatic. MTTT and SLA compliance fall out of incident queue data with no manual effort. Quality metrics (MTTD, false positive rate, classification accuracy, external discovery rate) require deliberate investment: systematic disposition classification, L2 review of L1 closures, and tracking of who first reported each confirmed incident. The effort is modest (30 seconds per closure for dispositions, 1 hour per week for L2 review), but the data it produces changes how the SOC understands its own effectiveness.

Estimated time: 45 minutes.

[Figure 1.6: Speed metrics vs quality metrics]

Speed metrics (what most SOCs track): MTTT 6 min (green), SLA compliance 96% (green), alert volume processed 2,400/mo (green), throughput per analyst 800/mo (green). Measures: how fast the SOC processes alerts. Does NOT measure: whether alerts are correct.

Quality metrics (what matters): MTTD 14 days (failing), false positive rate 47% (failing), classification accuracy unmeasured (failing), external discovery rate 60% (failing). Measures: whether the SOC catches attacks. Requires: systematic dispositions + L2 review.

NE before vs after INC-NE-2026-0227-001. Before: MTTT 6 min, SLA 96% (green dashboard) alongside MTTD 14 days, FP 47%, external discovery 60% (failing). After 6 months: MTTD 4.2 h, FP 18%, external discovery 15%. Same team, same tools, different measurement.

Figure 1.6 — Speed metrics and quality metrics measure different things. NE's speed dashboard was green while quality metrics showed a 14-day MTTD and 60% external discovery rate. The same team improved quality metrics dramatically in six months — by measuring what matters and acting on the data.

Speed metrics — what they tell you

Speed metrics measure how fast the SOC processes its workload. They're useful for staffing analysis and capacity planning. They're misleading as indicators of SOC effectiveness.

Mean time to triage (MTTT)

MTTT measures the average time from alert creation to first analyst action — assignment, classification, or escalation. It tells you how long alerts wait in the queue before someone looks at them. A high MTTT (30+ minutes for critical alerts) indicates insufficient staffing during that period. A consistently high MTTT indicates a persistent capacity problem.

What MTTT doesn't tell you: whether the analyst's action was correct. An analyst who closes 20 alerts per hour with a 3-minute MTTT might be classifying incorrectly on 8 of them. MTTT looks excellent. Classification quality is 60%. The speed metric conceals the accuracy problem.

SLA compliance

SLA compliance measures what percentage of alerts are triaged within the contracted time window — typically 15 minutes for critical, 30 for high, 60 for medium. It's a contractual metric for managed SOC engagements and an operational metric for internal teams. A 96% SLA compliance rate means 4% of alerts exceeded the time window. That 4% might include the critical alerts that arrived during peak volume when all analysts were occupied.

SLA compliance is a floor, not a target. Meeting the SLA means the SOC isn't falling behind. It doesn't mean the SOC is effective. The SOC can achieve 100% SLA compliance by closing every alert in under 15 minutes — even if half the closures are incorrect.
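A headline SLA figure can hide where the misses fall. The sketch below breaks compliance down by severity using the typical windows quoted above (15, 30, 60 minutes). The `TriagedAlert` record and the sample values are hypothetical, not NE data.

```python
from dataclasses import dataclass

# Typical SLA windows in minutes, as quoted in the text above.
SLA_MINUTES = {"critical": 15, "high": 30, "medium": 60}

@dataclass
class TriagedAlert:
    severity: str             # "critical", "high", or "medium"
    minutes_to_triage: float  # alert creation to first analyst action

def sla_compliance(alerts):
    """Return {severity: percent of alerts triaged within the SLA window}."""
    totals, within = {}, {}
    for a in alerts:
        totals[a.severity] = totals.get(a.severity, 0) + 1
        if a.minutes_to_triage <= SLA_MINUTES[a.severity]:
            within[a.severity] = within.get(a.severity, 0) + 1
    return {sev: round(100.0 * within.get(sev, 0) / n, 1)
            for sev, n in totals.items()}

# Hypothetical sample: one critical alert waited 22 minutes.
alerts = [
    TriagedAlert("critical", 4), TriagedAlert("critical", 22),
    TriagedAlert("high", 12), TriagedAlert("high", 28),
    TriagedAlert("medium", 45), TriagedAlert("medium", 50),
]
by_severity = sla_compliance(alerts)
met = sum(a.minutes_to_triage <= SLA_MINUTES[a.severity] for a in alerts)
overall = round(100.0 * met / len(alerts), 1)
print(overall, by_severity)
```

In this sample the overall figure is 83.3%, but every miss is a critical alert. Even broken down this way, the number says nothing about whether the closures were correct.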

Alert volume and throughput

Total alerts processed and alerts-per-analyst measure workload and capacity. High volume relative to staffing indicates either a noisy detection library (too many false positives), insufficient staffing, or both. Low volume might indicate good detection quality or might indicate that the detection library is too small (few rules, so few alerts — but also few detections).

These metrics are useful inputs for the L3 quality review and the budget conversation. They don't measure SOC effectiveness.

Quality metrics — what they reveal

Quality metrics measure whether the SOC fulfills its mission: detecting threats, investigating effectively, and improving over time. They require deliberate data collection.

Mean time to detect (MTTD)

MTTD measures the average time between when malicious activity begins and when the SOC detects it. This is the metric that most directly measures SOC effectiveness. IBM's 2025 Cost of a Data Breach Report found organizations averaged 158 days to identify breaches. Mandiant's M-Trends 2026 puts the median global dwell time at 14 days. The top 25% of organizations detect incidents within hours.

MTTD is hard to calculate because it requires knowing when the attack started — information that typically comes from the investigation, not from the alert. An alert fires on Day 14 of a compromise. The investigation determines the attacker gained initial access on Day 1. The MTTD for that incident is 14 days.

The Red Canary team makes an important distinction: use median rather than mean. One extreme outlier — an unmonitored asset where a threat sits idle for two months — can skew the mean significantly while the median is more resilient to outliers. NE tracks both, but the median is the primary reporting metric.
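A quick illustration of why the median is the safer headline number. The values are hypothetical hours-to-detect for one month of confirmed incidents. The last one models a threat that sat idle for about two months on an unmonitored asset.

```python
from statistics import mean, median

# Hypothetical hours-to-detect values; 1440 hours is roughly 60 days.
hours_to_detect = [2, 3, 4, 5, 6, 1440]

mean_hours = mean(hours_to_detect)      # pulled far upward by the outlier
median_hours = median(hours_to_detect)  # stays with the typical incident

print(f"mean:   {mean_hours:.1f} h")
print(f"median: {median_hours:.1f} h")
```

Five of the six incidents were caught within six hours, yet the mean reports roughly ten days. The median reports 4.5 hours.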

MTTD depends on detection rule coverage (rules have to exist for the technique), detection rule quality (rules have to fire correctly), and triage quality (the alert has to be investigated, not dismissed). Improving MTTD means improving all three — which is why it's the single best metric for overall SOC effectiveness.

How to measure MTTD practically

Calculating MTTD requires two timestamps: when the malicious activity started, and when the SOC detected it. The detection timestamp is straightforward — it's when the alert fired or when the analyst first confirmed the incident. The activity-start timestamp is harder — it comes from the investigation.

For incidents detected by analytics rules, the investigation should determine the earliest evidence of compromise. An alert fires on Day 3 of a compromise when the attacker creates an inbox rule. The investigation traces the initial access to Day 1 — a compromised sign-in. The MTTD is 3 days, not the 6 minutes the alert took to triage after firing.

For incidents detected externally (user report, partner notification), the MTTD is measured from the earliest evidence of compromise to the external notification. These incidents are the most valuable MTTD data points because they represent complete detection failures — the SOC's rules didn't fire at all, and the entire dwell time was undetected.

NE records both timestamps in the incident ticket for every confirmed True Positive: "First evidence of malicious activity" and "First detection by SOC or external." The MTTD calculation runs monthly from this data. The median is the primary metric (resistant to outliers). The mean is tracked secondarily. Both are trended month over month.
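The monthly calculation can be sketched as follows, assuming the two ticket fields are exported as ISO 8601 strings. The field names `first_evidence` and `first_detection`, the ticket IDs, and the timestamps are hypothetical stand-ins for whatever your ticketing system uses.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical export of confirmed True Positives for one month.
tickets = [
    {"id": "EXAMPLE-1", "first_evidence": "2026-03-01T08:00:00",
     "first_detection": "2026-03-01T12:00:00"},
    {"id": "EXAMPLE-2", "first_evidence": "2026-03-05T09:00:00",
     "first_detection": "2026-03-05T11:00:00"},
    {"id": "EXAMPLE-3", "first_evidence": "2026-03-10T00:00:00",
     "first_detection": "2026-03-13T00:00:00"},
]

def hours_to_detect(ticket):
    """Hours between first evidence of activity and first detection."""
    start = datetime.fromisoformat(ticket["first_evidence"])
    detected = datetime.fromisoformat(ticket["first_detection"])
    if detected < start:
        raise ValueError(f"{ticket['id']}: detection precedes first evidence")
    return (detected - start).total_seconds() / 3600

def monthly_mttd(month_tickets):
    """Median is the primary metric; the mean is tracked secondarily."""
    values = [hours_to_detect(t) for t in month_tickets]
    return {"median_hours": median(values),
            "mean_hours": mean(values),
            "incidents": len(values)}

result = monthly_mttd(tickets)
print(result)
```

The ordering check is worth keeping: a detection timestamp earlier than the first evidence usually means the investigation timeline was entered incorrectly.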

The MTTD decomposition

When MTTD is high, the decomposition tells you which component to fix. Break MTTD into three parts: detection gap time (time from attack start to first alert), triage gap time (time from alert to analyst assignment), and investigation gap time (time from assignment to confirmed detection). If the detection gap is large, the problem is detection rule coverage — the rules didn't fire because no rule existed for the technique. If the triage gap is large, the problem is queue management — the alert fired but sat unreviewed. If the investigation gap is large, the problem is triage quality — the analyst saw the alert but didn't recognize it as malicious.
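The decomposition is simple timestamp arithmetic. This sketch assumes four timestamps are available per incident (attack start, first alert, analyst assignment, confirmed detection). The function names and sample values are illustrative only.

```python
from datetime import datetime

# Which problem each gap points to, per the paragraph above.
FIX_FOR = {
    "detection_gap": "detection rule coverage",
    "triage_gap": "queue management",
    "investigation_gap": "triage quality",
}

def decompose(attack_start, first_alert, assigned, confirmed):
    """Split one incident's time to detect into three gaps, in hours."""
    stamps = [datetime.fromisoformat(s)
              for s in (attack_start, first_alert, assigned, confirmed)]
    if stamps != sorted(stamps):
        raise ValueError("timestamps must be in chronological order")

    def hours(a, b):
        return (b - a).total_seconds() / 3600

    return {
        "detection_gap": hours(stamps[0], stamps[1]),
        "triage_gap": hours(stamps[1], stamps[2]),
        "investigation_gap": hours(stamps[2], stamps[3]),
    }

def dominant_gap(gaps):
    """Return the largest gap and the area it implicates."""
    name = max(gaps, key=gaps.get)
    return name, FIX_FOR[name]

# Hypothetical incident: 13 days before any alert, then fast handling.
gaps = decompose("2026-01-01T00:00:00", "2026-01-14T00:00:00",
                 "2026-01-14T00:06:00", "2026-01-14T01:00:00")
print(gaps, dominant_gap(gaps))
```

Aggregating the three gaps across a month's incidents (again preferring the median) shows which component drives the overall number.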

NE's 14-day MTTD for INC-NE-2026-0227-001 decomposed entirely as detection gap time. The alert that eventually fired was a BEC indicator on Day 14, not the initial AiTM access on Day 1. No rule existed for AiTM token characteristics, so the detection gap was 100% of the dwell time. The fix was detection rule improvement (Modules 2-6), not triage improvement.

False positive rate

False positive rate measures what percentage of alerts are noise — alerts that fired but don't represent genuine threats or threat-relevant activity. The calculation uses the disposition data from Section 1.5: FP closures divided by total closures.

Run this query against your Sentinel workspace right now. It calculates your SOC's false positive rate, true positive rate, and unclassified rate for the last 30 days: the three numbers that tell you whether your SOC is processing real threats or drowning in noise.

KQL — Your SOC's False Positive Rate (Last 30 Days)
// Calculate your SOC's FP rate and classification distribution
// Run this in Sentinel Logs — no modification needed
SecurityIncident
| where TimeGenerated > ago(30d)
// SecurityIncident logs a new row on every incident update;
// keep only the latest row per incident so nothing is double-counted
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where Status == "Closed"
| summarize
    TotalClosed = count(),
    TruePositive = countif(Classification == "TruePositive"),
    BenignPositive = countif(Classification == "BenignPositive"),
    FalsePositive = countif(Classification == "FalsePositive"),
    Undetermined = countif(Classification == "Undetermined"
        or Classification == "")
| extend FP_Rate = round(100.0 * FalsePositive / TotalClosed, 1)
| extend TP_Rate = round(100.0 * TruePositive / TotalClosed, 1)
| extend Unclassified_Rate = round(
    100.0 * Undetermined / TotalClosed, 1)

The output gives you three numbers. FP_Rate is your false positive rate — what percentage of your SOC's work is dismissing noise. Above 40% means the SOC spends more time on noise than on real threats. Unclassified_Rate reveals how many incidents are closed without any classification — every unclassified closure is invisible to quality metrics and could be hiding a missed attack. TP_Rate tells you what percentage of your SOC's work results in confirmed threats.

If your Unclassified_Rate is above 50%, your SOC doesn't have a false positive problem — it has a classification discipline problem. Those unclassified closures are the data gap that makes quality measurement impossible.

Now run this query to identify which rules produce the most false positives — these are your top tuning targets:

KQL — Top 10 Noisiest Rules
// Which rules produce the most FPs? Top tuning targets.
// Groups by incident Title, which usually matches the analytics rule name
// unless the rule customizes its alert title.
SecurityIncident
| where TimeGenerated > ago(30d)
// Keep only the latest row per incident (the table logs every update)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where Status == "Closed"
| where Classification == "FalsePositive"
| summarize FP_Count = count() by Title
| sort by FP_Count desc
| take 10

The top 3 rules in this list are your highest-impact tuning targets. Fixing these three rules will reduce your false positive rate more than any other single action. At NE, the top 3 accounted for 65% of all false positives — fixing them dropped the FP rate from 47% to 22% in one tuning cycle.

A healthy false positive rate is below 25%. Between 25% and 40% indicates a detection library that needs tuning: specific rules are firing on legitimate activity that should be excluded. Above 40% means the SOC is spending more time dismissing noise than investigating real threats, which directly impacts MTTD (less time available for real investigation), analyst morale (repetitive FP dismissal causes burnout), and classification accuracy (alert fatigue degrades attention).

NE's false positive rate was 47% before the incident. The primary sources: the "suspicious PowerShell" rule firing on IT automation (30% of all FPs), the "impossible travel" rule firing on VPN users (20% of FPs), and the "unusual inbox rule" detection firing on users who legitimately manage their email (15% of FPs). Monthly tuning, addressing the top 3 FP sources each month, reduced the rate from 47% to 18% over six months. Each percentage point of FP reduction freed analyst time for genuine investigation.
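To estimate what tuning the top sources is worth before doing the work, a back-of-the-envelope model helps. This sketch assumes the tuned rules stop producing false positives entirely and that all other alert volume stays constant. Both are simplifications, and the function name is illustrative.

```python
def fp_rate_after_tuning(fp_rate_pct, share_of_fps_removed_pct):
    """Project the FP rate after eliminating a share of current FPs.

    Works per 100 closures: removing FPs shrinks the numerator and the
    denominator, so the rate falls by less than the raw share removed.
    """
    fps = fp_rate_pct
    non_fps = 100.0 - fps
    remaining_fps = fps * (1 - share_of_fps_removed_pct / 100.0)
    return round(100.0 * remaining_fps / (remaining_fps + non_fps), 1)

# NE's figures: 47% FP rate, top 3 rules produce 30 + 20 + 15 = 65% of FPs.
projected = fp_rate_after_tuning(47, 65)
print(projected)
```

Under these assumptions the projection is 23.7%, within a couple of points of the 22% NE measured after its first tuning cycle. The model ignores month-to-month shifts in alert volume, so treat it as a planning estimate rather than a prediction.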

The FP rate is not just an operational metric; it's a financial one. Every false positive consumes analyst time that costs real money.
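The cost can be estimated with a few lines of arithmetic. In this sketch the alert volume (2,400 per month) and the FP rates (47% and 18%) come from the NE scenario. The 10 minutes per false positive and the 60 per hour loaded analyst cost are hypothetical inputs to replace with your own.

```python
def annual_fp_cost(alerts_per_month, fp_rate_pct, minutes_per_fp,
                   analyst_hourly_cost):
    """Annual cost of analyst time spent triaging false positives."""
    fp_alerts_per_year = alerts_per_month * 12 * fp_rate_pct / 100.0
    analyst_hours = fp_alerts_per_year * minutes_per_fp / 60.0
    return round(analyst_hours * analyst_hourly_cost, 2)

# Hypothetical inputs: 10 minutes per FP, 60 per hour loaded analyst cost.
before = annual_fp_cost(2400, 47, minutes_per_fp=10, analyst_hourly_cost=60)
after = annual_fp_cost(2400, 18, minutes_per_fp=10, analyst_hourly_cost=60)
print(before, after, before - after)
```

With these inputs the annual cost falls from 135,360 to 51,840. The absolute figures depend entirely on the assumed triage time and hourly cost, so the difference between the two runs is the more useful number for a budget conversation.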

Classification accuracy

Classification accuracy measures whether L1 triage dispositions are correct. The measurement requires L2 review of a sample of L1 closures — typically 10-20% per week, selected randomly plus all Undetermined classifications.

The L2 reviewer opens the closed incident, reads the L1 analyst's disposition and rationale, runs the enrichment queries independently, and determines whether the L1 classification was correct. Disagreements are categorized: "L1 classified FP, L2 determines TP" (missed attack — critical error), "L1 classified TP, L2 determines BTP" (over-escalation — inefficiency), "L1 classified FP, L2 determines BTP" (wrong category but correct outcome — framework refinement needed).
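The review loop has two mechanical parts: choosing the sample and scoring the outcome. The sketch below shows one possible shape for both. It assumes the random fraction applies to the closures that are not Undetermined, and the record layout and function names are hypothetical.

```python
import random
from collections import Counter

# (L1 disposition, L2 disposition) -> category, as described above.
DISAGREEMENTS = {
    ("FP", "TP"): "missed attack (critical error)",
    ("TP", "BTP"): "over-escalation (inefficiency)",
    ("FP", "BTP"): "wrong category, correct outcome",
}

def select_for_review(closures, fraction=0.10, seed=None):
    """Every Undetermined closure plus a random fraction of the rest."""
    rng = random.Random(seed)
    undetermined = [c for c in closures if c["l1"] == "Undetermined"]
    rest = [c for c in closures if c["l1"] != "Undetermined"]
    k = min(len(rest), round(len(rest) * fraction))
    return undetermined + rng.sample(rest, k)

def review_summary(pairs):
    """pairs: list of (l1_disposition, l2_disposition) from L2 review."""
    if not pairs:
        return None, Counter()
    agreed = sum(1 for l1, l2 in pairs if l1 == l2)
    categories = Counter(
        DISAGREEMENTS.get((l1, l2), "other disagreement")
        for l1, l2 in pairs if l1 != l2
    )
    return round(100.0 * agreed / len(pairs), 1), categories

# Hypothetical week: 40 classified closures and 5 Undetermined.
closures = ([{"id": i, "l1": "FP"} for i in range(40)]
            + [{"id": 100 + i, "l1": "Undetermined"} for i in range(5)])
sample = select_for_review(closures, fraction=0.10, seed=7)

# Hypothetical review outcome: 7 agreements, 3 disagreements.
pairs = ([("FP", "FP")] * 4 + [("TP", "TP")] * 2 + [("BTP", "BTP")]
         + [("FP", "TP"), ("TP", "BTP"), ("FP", "BTP")])
accuracy, categories = review_summary(pairs)
print(len(sample), accuracy, dict(categories))
```

Counting disagreements by category matters as much as the headline accuracy: a single "missed attack" entry deserves attention even when overall accuracy looks healthy.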

A healthy classification accuracy rate is above 85%. Below 80% indicates systemic triage problems — the framework needs revision, the analyst needs training, or the detection rules produce alerts that are too ambiguous to classify at L1.

NE couldn't calculate classification accuracy before the incident because no L2 review existed. After implementing weekly 10% review, initial accuracy measured 73%. Within four months — through framework refinement and targeted training on the alert types with lowest accuracy — accuracy improved to 89%.

External discovery rate

External discovery rate measures what percentage of confirmed incidents are discovered by someone other than the SOC — users who report suspicious emails, partners who notify of compromise, customers who detect fraud, or law enforcement that reports credential exposure. M-Trends 2026 data shows 52% internal detection and 48% external discovery globally.

This metric is the most uncomfortable for SOC teams because it directly measures what the SOC missed. An external discovery rate of 60% means users and external parties find more compromises than the SOC does. The SOC's detection library, triage process, and investigation depth all failed for those incidents.

NE's external discovery rate was 60% before the incident (INC-NE-2026-0227-001 was itself an external discovery — the CFO spotted the suspicious wire transfer request). After improving detection coverage and triage quality, the rate dropped to 15% — meaning the SOC now catches 85% of incidents internally.
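Computing the rate only requires that each confirmed incident records who reported it first. In this sketch the `first_reported_by` field and the source labels are hypothetical. Use whatever values your ticketing system records.

```python
# Sources counted as SOC discoveries; everything else is external.
INTERNAL_SOURCES = {"soc_alert", "soc_hunt"}

def external_discovery_rate(confirmed_incidents):
    """Percent of confirmed incidents first reported by a non-SOC source."""
    if not confirmed_incidents:
        return None
    external = sum(1 for i in confirmed_incidents
                   if i["first_reported_by"] not in INTERNAL_SOURCES)
    return round(100.0 * external / len(confirmed_incidents), 1)

# Hypothetical quarter: three of five incidents were reported externally.
incidents = [
    {"id": "EXAMPLE-1", "first_reported_by": "user_report"},
    {"id": "EXAMPLE-2", "first_reported_by": "soc_alert"},
    {"id": "EXAMPLE-3", "first_reported_by": "partner_notification"},
    {"id": "EXAMPLE-4", "first_reported_by": "user_report"},
    {"id": "EXAMPLE-5", "first_reported_by": "soc_hunt"},
]
rate = external_discovery_rate(incidents)
print(rate)
```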

What we see in 90% of SOC reports to leadership

MTTT and SLA compliance. Both green. The CISO reports "our SOC processes alerts within 6 minutes with 96% SLA compliance" and leadership concludes the security operations program is working. Nobody asks: of the 2,400 alerts processed, how many were real attacks? Of the compromises that existed in the environment, how many did the SOC detect? The report measures activity. It doesn't measure effectiveness. The board doesn't know the difference because nobody presented the quality metrics that would show it.

Building the metrics framework

The deliverable from this section is a metrics framework that defines which metrics the SOC tracks, how each is calculated, what data sources feed them, and what targets the SOC aims for.

The minimum viable metrics set

Start with four quality metrics alongside existing speed metrics. Don't remove speed metrics — they're useful for capacity planning. Add quality metrics that reveal effectiveness.

MTTD: calculated from investigation data (time of initial access vs time of detection). Requires investigation timeline reconstruction on every confirmed TP. Target: under 24 hours for most incident types, under 1 hour for critical.

False positive rate: calculated from disposition data. Requires systematic classification on every closure (Section 1.5). Target: under 25%.

External discovery rate: calculated from incident source data. Requires tracking who first reported each confirmed incident. Target: under 20%.

Escalation accuracy: calculated from L2 review of L1 escalations. Requires L2 to tag each escalation as "warranted" or "reclassifiable at L1." Target: 60-80%.
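One way to keep the framework from drifting is to write it down as data: each metric with its source and target, plus a check against current values. The sketch below uses the targets listed above. The structure, key names, and the escalation accuracy value of 70 are hypothetical (the NE scenario gives no escalation accuracy figure).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Metric:
    key: str
    data_source: str
    target: str
    meets_target: Callable[[float], bool]

FRAMEWORK = [
    Metric("mttd_hours", "investigation timeline on every confirmed TP",
           "under 24 hours", lambda v: v < 24),
    Metric("fp_rate_pct", "disposition on every closure",
           "under 25%", lambda v: v < 25),
    Metric("external_discovery_pct", "first reporter of each incident",
           "under 20%", lambda v: v < 20),
    Metric("escalation_accuracy_pct", "L2 tags on L1 escalations",
           "60-80%", lambda v: 60 <= v <= 80),
]

def scorecard(values):
    """Return {metric key: True if the current value meets its target}."""
    return {m.key: m.meets_target(values[m.key]) for m in FRAMEWORK}

# NE's figures from the scenario; escalation accuracy is a placeholder.
ne_before = {"mttd_hours": 14 * 24, "fp_rate_pct": 47,
             "external_discovery_pct": 60, "escalation_accuracy_pct": 70}
ne_after = {"mttd_hours": 4.2, "fp_rate_pct": 18,
            "external_discovery_pct": 15, "escalation_accuracy_pct": 70}
print(scorecard(ne_before))
print(scorecard(ne_after))
```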

Reporting cadence

NE reports metrics at three cadences. Weekly: the SOC lead reviews raw numbers during the L3 quality session — are there trends that need immediate attention? A sudden spike in FP rate for a specific rule indicates either a new environmental pattern or a misconfigured detection rule. A jump in external discovery rate indicates detection gaps. Weekly review catches these signals before they compound.

Monthly: the SOC lead produces a one-page metrics summary for Rachel — tracking month-over-month trends across all quality metrics. The monthly report includes the top 3 FP sources (for tuning prioritization), the escalation accuracy breakdown (for training prioritization), and any incidents that exposed detection gaps (for backlog prioritization).

Quarterly: Rachel produces a board-ready report showing the quality metrics alongside the speed metrics — this is the report that demonstrates SOC effectiveness with data.

The quarterly report is the most important because it's the document that justifies the SOC's continued investment. "MTTD improved from 14 days to 4.2 hours. FP rate reduced from 47% to 18%. External discovery rate dropped from 60% to 15%. Same team, same tools, improved operational discipline" is a narrative that leadership understands and that justifies every hour invested in the operational foundation.

Using metrics to drive specific actions

Each quality metric maps to specific operational actions when it degrades. This mapping is what makes metrics operational rather than decorative.

MTTD increasing: decompose into detection gap, triage gap, and investigation gap. If detection gap is the driver, the detection backlog needs new rules for the techniques causing the gap. If triage gap is the driver, staffing or queue management needs attention. If investigation gap is the driver, the triage framework may need clearer classification criteria.

FP rate increasing: identify the top 3 FP sources by rule name. Schedule tuning for those rules in the monthly L3 review. Most FP rate increases are caused by environmental changes — a new application, a new admin workflow, a new service account — that create legitimate activity matching existing detection rules. The fix is an exclusion or watchlist entry, not a rule rewrite.

External discovery rate increasing: review the incidents discovered externally. What technique was used? What detection rule should have caught it? Add the missing detection to the backlog as high priority. External discoveries are the strongest possible signal for detection gaps because they represent confirmed real-world attacks that the SOC completely missed.

Escalation accuracy dropping: review which alert types are being escalated incorrectly. If L1 is over-escalating a specific alert type, the triage framework needs clearer criteria for that type. If L1 is under-escalating, check whether the instinct trigger culture is healthy — analysts may have stopped using the instinct trigger due to implicit pressure to close.

The metrics are not reports to file. They're diagnostic signals that drive specific actions. A metric without a response plan is decoration. Every metric in the framework has a defined response: if it degrades past the threshold, this specific action is taken. That mapping is what turns a dashboard into a management system.
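The metric-to-action mapping can be written down explicitly so a degraded number always produces a next step. This sketch triggers on month-over-month movement in the bad direction, which is an assumption: a production version would more likely compare against the agreed thresholds. Key names and sample values are hypothetical.

```python
# metric key -> (direction that counts as degradation, response)
PLAYBOOK = {
    "mttd_hours": ("up", "Decompose into detection, triage, and "
                         "investigation gaps; fix the dominant one."),
    "fp_rate_pct": ("up", "Identify the top 3 FP sources by rule name; "
                          "schedule tuning in the monthly L3 review."),
    "external_discovery_pct": ("up", "Review externally discovered "
                                     "incidents; add missing detections to "
                                     "the backlog as high priority."),
    "escalation_accuracy_pct": ("down", "Review which alert types are "
                                        "escalated incorrectly; clarify "
                                        "triage criteria for them."),
}

def actions_for(previous, current):
    """Return the responses triggered by metrics that moved the wrong way."""
    triggered = {}
    for key, (bad_direction, action) in PLAYBOOK.items():
        delta = current[key] - previous[key]
        if (bad_direction == "up" and delta > 0) or \
           (bad_direction == "down" and delta < 0):
            triggered[key] = action
    return triggered

# Hypothetical months: FP rate rose and escalation accuracy fell.
last_month = {"mttd_hours": 4.2, "fp_rate_pct": 18,
              "external_discovery_pct": 15, "escalation_accuracy_pct": 72}
this_month = {"mttd_hours": 4.0, "fp_rate_pct": 24,
              "external_discovery_pct": 15, "escalation_accuracy_pct": 65}
triggered = actions_for(last_month, this_month)
for key, action in triggered.items():
    print(f"{key}: {action}")
```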

SOC Operations Principle

Speed metrics measure how fast the SOC processes alerts. Quality metrics measure whether the SOC catches attacks. Optimizing speed — faster triage, higher throughput, tighter SLAs — produces a green dashboard and a SOC that processes noise efficiently while missing real threats. Quality metrics require deliberate investment: systematic dispositions, L2 review, investigation timeline data. The investment is modest. The insight it produces changes everything about how the SOC understands itself.

Next
Section 1.7 — The SOC Charter. The operating model, tiers, handover, escalation, triage, and metrics are six operational documents. The SOC charter is the seventh — the master document that assembles them all into a single, authoritative reference.