SOC Operating Models

8-10 hours · Module 1 · Free
What you already know

You've seen SOCs operate — maybe an internal team, maybe a managed provider, maybe both. You know the trade-offs intuitively: managed SOCs have coverage hours but lack context, internal teams have context but can't staff 24/7. This section turns that intuition into a structured analysis of each model's capabilities and failure modes, so you can document the model your organization runs — including the gaps you need to mitigate.

Every model has a failure mode

Scenario

A critical alert fires at 22:47 on a Friday. Your managed SOC partner triages it as medium severity per their standard playbook: the alert matched a credential compromise pattern, the playbook says to check for MFA and close if MFA is present, and MFA was present. By Monday, your internal team discovers it was the initial access phase of an adversary-in-the-middle (AiTM) campaign in which the user completed MFA through the attacker's reverse proxy, and the proxy captured the resulting session token. The managed SOC followed the playbook correctly. The playbook had no path for what AiTM looks like.

A SOC operating model defines how security monitoring, detection, and response are staffed and delivered. The model you choose determines your coverage hours, your investigation depth, your institutional knowledge retention, and your cost structure. Most organizations choose their operating model by default rather than by design — the company buys an MSSP because a conference recommended one, or hires three analysts because the CISO came from a company that ran an internal SOC. The choice is made without evaluating the trade-offs, so the failure mode that comes with the choice is invisible until an incident reveals it.

The question is not which model is best. The question is which failure mode you can live with and which you need to mitigate. Every model has trade-offs. The discipline is documenting those trade-offs so you manage them deliberately rather than discovering them during an attack.

The managed security services market hit $39.5 billion in 2025 and is projected to reach $66.8 billion by 2030. Forty-three percent of organizations outsource cybersecurity capabilities to managed providers. The hybrid/co-managed model — internal team for business-hours investigation, managed partner for after-hours L1 triage — is the fastest-growing deployment pattern in 2026. But the growth in outsourcing doesn't mean the problem is solved. It means the handoff gap has moved from "nobody monitors after hours" to "the after-hours team monitors with different knowledge, different runbooks, and different incentives than the daytime team."

Estimated time: 40 minutes.

SOC OPERATING MODELS: FAILURE MODE COMPARISON

INTERNAL SOC
✓ Deep institutional knowledge
✓ Full control over processes
✓ Investigation depth
✗ Coverage gap: nights, weekends
✗ 24/7 = $800K+ (8-10 FTEs)
✗ Retention: 18-24mo L1 tenure
Failure mode: coverage

MANAGED SOC (MSSP/MDR)
✓ 24/7 coverage from day one
✓ Lower cost ($36-60K/yr MSSP)
✓ SLA-driven consistency
✗ Context gap: generic runbooks
✗ Incentive to close, not escalate
✗ Closure gap widens with new TTPs
Failure mode: context

HYBRID / CO-MANAGED
✓ Context + coverage combined
✓ Internal team owns deep work
✓ Manageable cost
✗ Handoff gap between teams
✗ Requires custom MSSP runbooks
✗ Weekly alert review overhead
Failure mode: handoff

THE QUESTION EVERY MODEL MUST ANSWER
What happens at 22:47 on a Friday when a critical alert fires? Who sees it? Who investigates it? Who has the context to know this is AiTM and not a normal credential compromise?

NE INC-NE-2026-0227-001: THE COST OF AN UNDOCUMENTED GAP
MSSP followed playbook → playbook had no AiTM path → 72 hours undetected → 21 days total dwell

Figure 1.1 — The three operating models and their failure modes. Each model solves one problem and creates another. The hybrid model is the most common for mid-sized organizations — and its handoff gap is the most frequently exploited weakness.

The internal SOC

An internal SOC staffs all monitoring, triage, and investigation roles with employees. The team knows the environment — which users have legitimate reasons for unusual behavior, which servers generate expected alert patterns that look like attacks, which business processes create authentication anomalies every Tuesday at 2 AM because that's when the payroll batch runs.

Institutional knowledge — the primary advantage

This institutional knowledge is the internal SOC's defining strength. When an alert fires for "suspicious sign-in from unusual location," the internal analyst knows that three sales reps travel weekly to regions that trigger the geolocation rule. The analyst closes the alert in 30 seconds with high confidence. A managed SOC analyst who doesn't know those sales reps either escalates (consuming L2 time on a false positive) or closes without confidence (gambling that it's benign).

Institutional knowledge also enables investigation depth. When an L2 analyst investigates a potential BEC compromise, they know the CFO's communication patterns, the finance team's approval workflows, and the vendors who legitimately send payment requests. They distinguish "this email is suspicious" from "this email is inconsistent with how this vendor communicates with our finance team" — a distinction that turns a possible incident into a confirmed compromise or a confident closure. That distinction requires knowing the business, not just the telemetry.

Same alert, different triage — the context gap in action

Internal analyst (30 seconds): "Suspicious sign-in from Dallas. That's where Sarah presents at the vendor conference every March. FP — known travel pattern." → Confident closure.

MSSP analyst (5 minutes): "Suspicious sign-in from Dallas. User normally signs in from Manchester. Escalate? Close? The playbook says check MFA — MFA present. Close as Benign Positive." → Correct per playbook. Wrong if this is AiTM.

The coverage problem

The trade-off is coverage. A single analyst works one 8-hour shift, about 40 hours a week. A week has 168 hours, so keeping even one seat staffed around the clock takes 4.2 analysts, and a minimum of five once leave and sick cover are included. A functional 24/7 SOC with L1, L2, and L3 capability needs 8-10 FTEs. At an average SOC analyst salary of $100,000-$103,000 (Glassdoor 2026, with a range of $75,000-$137,000 depending on tier and geography), that's $800,000-$1.03 million annually before tools, training, or management overhead.

For organizations under 1,000 employees, that cost typically exceeds what the security budget can support. This means the internal SOC operates business hours only — and that coverage gap is a structural vulnerability. An alert that fires at 11 PM sits in the queue until 8 AM. That's 9 hours of dwell time before any human looks at it. For a Friday night alert, it's potentially 60 hours. Sophisticated attackers know this. They execute outside business hours specifically because the internal SOC's coverage gap is predictable.
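You can measure this gap in your own workspace instead of estimating it. The query below is a minimal sketch, not a definitive metric. It assumes business hours of 08:00-17:00 Monday to Friday, and it treats FirstModifiedTime as a rough proxy for "first human touch". Timestamps are stored in UTC, so shift the hour boundaries to match your time zone.

```kql
// Sketch: how long do after-hours incidents wait for a first touch?
// Assumptions: business hours are 08:00-17:00 Mon-Fri, evaluated in UTC.
SecurityIncident
| where CreatedTime > ago(30d)
// SecurityIncident logs a row per update; keep the latest row per incident
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where isnotnull(FirstModifiedTime)
| extend Hour = hourofday(CreatedTime),
         Day = toint(dayofweek(CreatedTime) / 1d)   // 0 = Sunday, 6 = Saturday
| extend Arrived = iff((Day in (0, 6)) or Hour < 8 or Hour >= 17,
    "After hours", "Business hours")
| extend WaitHours = datetime_diff('minute', FirstModifiedTime, CreatedTime) / 60.0
| summarize
    Incidents = count(),
    MedianWaitHours = round(percentile(WaitHours, 50), 1),
    MaxWaitHours = round(max(WaitHours), 1)
    by Arrived
```

Automation rules can modify an incident before any analyst does, so read a short wait as an upper bound on responsiveness, not as proof that someone looked.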

The coverage cost in numbers

🕐 Business hours only (3 FTEs): $300K-$400K/yr — 9-hour overnight gap, 60-hour weekend gap

🕐 24/7 internal (8-10 FTEs): $800K-$1.03M/yr — full coverage, full institutional knowledge

🕐 Hybrid: 3 internal + MSSP after-hours: $300K + $36-60K/yr — coverage filled, context gap created

The hybrid model is cheaper than 24/7 internal coverage by roughly $440K-$690K a year, but the context gap must be actively managed or attacks hide in the handoff.
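The comparison above reduces to simple arithmetic. The sketch below assumes the salary and MSSP fee figures quoted in this section and ignores tools, training, and management overhead. Swap in your own numbers before using the result in a budget discussion.

```kql
// Sketch: annual cost of 24/7 internal vs hybrid, from this section's figures
let internalLow = 8 * 100000;          // 8 FTEs at $100K
let internalHigh = 10 * 103000;        // 10 FTEs at $103K
let hybridLow = 3 * 100000 + 36000;    // 3 FTEs plus the low MSSP fee
let hybridHigh = 3 * 100000 + 60000;   // 3 FTEs plus the high MSSP fee
print
    InternalLow = internalLow,
    InternalHigh = internalHigh,
    HybridLow = hybridLow,
    HybridHigh = hybridHigh,
    SmallestSaving = internalLow - hybridHigh,   // 440000
    LargestSaving = internalHigh - hybridLow     // 694000
```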

The retention problem

The second structural challenge is analyst retention. SOC work, particularly L1 triage, is repetitive, high-volume, and high-pressure. The Tines Voice of the SOC Analyst report (2022) found that 71% of SOC analysts report some level of burnout and 64% are likely to switch jobs within a year. The industry average L1 tenure is 18-24 months before the analyst moves to a senior role or a different organization.

Every departure takes institutional knowledge with it. The replacement has technical skills but not the environmental context, and the ramp time to rebuild that context is 3-6 months. If your SOC has three analysts and one leaves, you've lost a third of your institutional knowledge. The remaining two carry extra load during the search and ramp period. One of them burns out faster. The cycle accelerates.

Run this query to see which telemetry sources are actively ingesting into your Sentinel workspace, and how much data each one contributes:

KQL — Which Data Sources Are Active?
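// What telemetry is your SOC actually receiving?
// Usage records ingestion per table; Quantity is reported in MB.
Usage
| where TimeGenerated > ago(1d)
// No IsBillable filter: free data types would otherwise drop out of the list
| summarize DataGB = round(sum(Quantity) / 1024, 2)
    by DataType
| sort by DataGB desc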

If SigninLogs, OfficeActivity, and DeviceProcessEvents don't appear in the output, those data connectors aren't configured or haven't sent data in the last day. Your SOC can't detect what it can't see, and no operating model can compensate for telemetry that never arrives.

The managed SOC — MSSP and MDR

A managed SOC outsources monitoring to a third party. The market has fragmented into distinct models — traditional MSSPs that monitor and alert, MDR providers that detect and respond, and SOC-as-a-Service that replicates full SOC workflows. The MDR market alone surpassed $9.6 billion in 2025. The distinctions matter for the handoff design.

What managed SOCs do well

For alerts that match a runbook, managed SOC delivery is consistent and fast. The SLA defines response time — typically 15 minutes for critical alerts, 30 for high, an hour for medium. The provider monitors 24/7 across their client base. Every alert gets triaged within the SLA.

The market has segmented significantly. Traditional MSSPs provide broad infrastructure monitoring — firewall management, log aggregation, vulnerability scanning, compliance reporting — at $36,000-$60,000 annually. They monitor and alert but your team still owns the response. MDR providers focus specifically on detection and response, with analyst teams that investigate and take containment action, at $96,000-$420,000+ annually depending on scope. SOC-as-a-Service replicates full SOC workflows as a subscription: L1-L3 triage, investigation narratives, escalation, and reporting. Gartner increasingly views SOCaaS as a delivery mechanism for MDR rather than a separate category, and the lines are converging as security services spending grows at 11.1% in 2026. The distinction matters for your operating model because it determines what happens after an alert fires — does the provider tell you about it (MSSP), investigate and contain it (MDR), or run a complete SOC workflow against it (SOCaaS)?

MSSP vs MDR vs SOCaaS — what you actually get

MSSP ($36-60K/yr): Monitors and alerts. Your team investigates and contains. Broad coverage, shallow depth. You own the response.

MDR ($96-420K+/yr): Detects, investigates, and contains. Their analysts take action. Narrow focus, deep response. They own the outcome.

SOCaaS (variable pricing): Full SOC workflow as a service: L1-L3, escalation, reporting. Category converging with MDR.

The question: When an alert fires at 2 AM, does the provider tell you (MSSP), act on it (MDR), or run your entire playbook (SOCaaS)?

The context gap

The limitation is depth. The managed SOC analyst is a generalist who monitors 15-40 clients simultaneously. They know your SIEM and your detection rules. They don't know your environment. When an alert falls outside the runbook (an unusual OAuth consent grant, an authentication sequence that is suspicious for your organization but normal for another client, a PowerShell command that your admin team runs legitimately but that resembles a malware dropper), the analyst has three options: escalate, close as indeterminate, or attempt investigation with incomplete context.

Most managed SOC analysts close ambiguous alerts. This is the rational outcome of a business model where throughput and SLA compliance are the contractual metrics. Escalation volume is tracked, and too many escalations signal that the analyst isn't triaging effectively. The incentive pushes toward closure. A 2025 industry survey found that 40% of all SOC alerts go uninvestigated, and of those that are reviewed, 90% are false positives. In the managed SOC model, the closure incentive compounds this problem: ambiguous alerts that might warrant investigation are closed to maintain throughput.

The closure gap widens

The closure gap is the difference between what the managed SOC closes and what your internal team would investigate. It widens every quarter as new attack techniques enter the threat landscape. AiTM phishing, OAuth consent abuse, device code phishing, and token theft through session cookie replay weren't covered by most managed SOC runbooks two years ago, and many runbooks still don't cover them. The managed SOC catches what the runbook covers. The runbook covers what's been seen before. Novel techniques live in the gap.

What we see in 90% of MSSP engagements

The contract specifies 24/7 monitoring and L1 triage with a 15-minute SLA on critical alerts. Monthly reports show 100% SLA compliance. But when you compare the MSSP's closure patterns to your internal team's, the MSSP closes 93% of alerts without escalation while your internal team closes 78%. That 15-point gap represents alerts your team would investigate but the MSSP closes — and it's where the attacks hide. Nobody compares the patterns because the contract measures speed, not quality.

If you have a managed SOC partner, run this query to compare closure patterns between your internal team and your MSSP — the gap between the two is your handoff risk:

KQL — MSSP vs Internal Closure Pattern Comparison
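// Compare closure patterns: your team vs your MSSP.
// Proxy: anything not closed as TruePositive counts as "closed without escalation".
SecurityIncident
| where TimeGenerated > ago(30d)
// SecurityIncident logs a row per update; keep the latest row per incident
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where Status == "Closed"
// assignedTo holds a display name; match the domain on the owner's UPN instead
| extend ClosedBy = tostring(Owner.userPrincipalName)
| extend Team = case(
    isempty(ClosedBy), "Unassigned",
    ClosedBy endswith "@yourdomain.com", "Internal",
    "MSSP")
| summarize
    Total = count(),
    ClosedNoEsc = countif(Classification != "TruePositive"),
    Escalated = countif(Classification == "TruePositive")
    by Team
| extend CloseRate = round(100.0 * ClosedNoEsc / Total, 1)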

If your MSSP's CloseRate is 10+ points higher than your internal team's, you have a handoff gap. Those extra closures are alerts your team would investigate but the MSSP doesn't — and that's where novel attack techniques hide.

The hybrid model

Most mid-sized organizations run a hybrid: internal team for business-hours investigation and strategic security functions, managed SOC partner for after-hours L1 triage. The hybrid captures both advantages — institutional knowledge during investigation hours, 24/7 coverage for detection. It's the fastest-growing model in 2026 because the economics are compelling and the alternative (internal 24/7) is unaffordable for most.

Where the handoff breaks

The hybrid model's vulnerability is the handoff. When the internal team leaves for the day, the managed SOC takes over with a different knowledge base, different runbooks, different escalation thresholds, and different incentives. The handoff gap is not a single point of failure — it's a collection of small differences that compound.

The internal team escalates 22% of alerts for deeper investigation. The managed SOC escalates 7%. That 15-point difference means the managed SOC escalates only about one alert for every three the internal team would: roughly two of every three alerts that would get a deeper look during business hours are closed after hours. Over a month, that's dozens of alerts with less scrutiny than they'd receive during the day.

The internal team's runbooks include environment-specific decision paths — "if the user is in the Finance group, treat any inbox rule creation as high priority." The managed SOC's runbooks are generic. The Finance group path doesn't exist.

The internal team knows about the Tuesday payroll batch, the Wednesday patching window, the Thursday data export. The managed SOC doesn't. Every legitimate scheduled process that generates alerts creates false positive volume the managed SOC lacks context to dismiss efficiently.
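An environment-specific decision path like the Finance group rule can be handed to the managed SOC as a query instead of as tribal knowledge. The sketch below is one possible shape, not a production rule. It assumes a Sentinel watchlist with the hypothetical alias FinanceUsers whose SearchKey column holds each user's UPN.

```kql
// Sketch: tag inbox rule creation by Finance users as high priority.
// 'FinanceUsers' is a hypothetical watchlist alias; SearchKey is assumed to hold the UPN.
let finance = _GetWatchlist('FinanceUsers')
    | project User = tolower(tostring(SearchKey)), IsFinance = true;
OfficeActivity
| where TimeGenerated > ago(1d)
| where Operation == "New-InboxRule"
| extend User = tolower(UserId)
// Left outer join keeps every rule creation; IsFinance is null for non-Finance users
| join kind=leftouter (finance) on User
| extend Priority = iff(isnotnull(IsFinance), "High", "Standard")
| project TimeGenerated, User, Operation, ClientIP, Priority
```

Keeping the group membership in a watchlist means the internal team can update who counts as Finance without asking the provider to edit a runbook.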

INC-NE-2026-0227-001 — the handoff gap in practice

At Northgate Engineering, the hybrid model's handoff gap was exposed by INC-NE-2026-0227-001. The AiTM credential phishing campaign's initial alerts arrived at 22:47 on a Friday. BlueVoyant triaged using the standard credential compromise playbook: check for MFA, close if present. MFA was present because the user completed the challenge through the attacker's reverse proxy, which captured the resulting session token. The playbook had no decision path for "MFA present but captured via AiTM."

The internal team discovered the compromise three days later during their Monday shift review. By that point, the attacker had held 72 hours of persistent mailbox access, read three months of executive email, created inbox rules to hide evidence, and sent a BEC wire transfer request to the CFO.

The response was not to abandon the hybrid model — the economics still make it the right choice for NE. The response was to close the handoff gap through three specific changes: custom MSSP runbooks for identity-specific attack patterns (AiTM, token theft, device code phishing), an after-hours escalation trigger for any sign-in followed by inbox rule creation within 30 minutes regardless of MFA status, and a weekly review of MSSP-closed alerts to identify patterns the internal team would have investigated differently.
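The second change, the after-hours escalation trigger, can be expressed as a correlation query. This is a minimal sketch under stated assumptions, not NE's actual rule: it covers interactive sign-ins in SigninLogs only, matches users on lowercased UPN, and uses a one-day lookback chosen for illustration.

```kql
// Sketch: successful sign-in followed by inbox rule creation within 30 minutes.
// MFA status is deliberately not checked: under AiTM the MFA claim is genuine.
let maxGap = 30m;
OfficeActivity
| where TimeGenerated > ago(1d)
| where Operation in ("New-InboxRule", "Set-InboxRule")
| project RuleTime = TimeGenerated, User = tolower(UserId), Operation, RuleClientIP = ClientIP
| join kind=inner (
    SigninLogs
    | where TimeGenerated > ago(1d) - maxGap
    | where ResultType == "0"    // "0" means the sign-in succeeded
    | project SignInTime = TimeGenerated, User = tolower(UserPrincipalName), SignInIP = IPAddress
  ) on User
// Keep only rule changes that follow a sign-in within the allowed gap
| where RuleTime >= SignInTime and RuleTime <= SignInTime + maxGap
// Report the most recent qualifying sign-in for each rule change
| summarize arg_max(SignInTime, SignInIP) by User, RuleTime, Operation, RuleClientIP
```

Expect some noise from users who legitimately create rules shortly after signing in. The point of the trigger is that these cases reach a human after hours instead of being closed on MFA status alone.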

Documenting the model decision — the ADR format

The operating model is not a choice you make once and forget. It's a decision with known trade-offs that should be documented, reviewed, and updated as the organization and threat landscape change. NE documents their operating model as an Architecture Decision Record — a structured format that captures the decision, the context, the alternatives considered, and critically, the known gaps.

Why ADR format

The ADR answers five questions: What operating model did we choose? Why this model over the alternatives? What does the model give us? What does the model cost us? What gaps exist that we've accepted or plan to mitigate? The format forces specificity — you can't write "we have good coverage" without defining what "good" means, and you can't claim "acceptable risk" without naming the risk.

The "known gaps" section is the most important part. Every model has gaps. Documenting them converts hidden assumptions into tracked risks. NE's original ADR did not include identity-specific escalation as a known gap. The handoff gap was documented in general terms: "after-hours triage performed by MSSP with generic runbooks." What wasn't documented was the specific consequence — that AiTM attacks would bypass the generic runbooks because MFA presence isn't sufficient evidence of benign authentication. The gap existed from day one. The incident revealed it.

Here's what NE's updated ADR looks like after INC-NE-2026-0227-001 — this is the format your ADR should follow:

NE Operating Model ADR — Excerpt (v2.1, Post-Incident)

Decision: Hybrid model retained — internal business hours + BlueVoyant after-hours L1

Rationale: 24/7 internal requires 8-10 FTEs ($800K+/yr). Current budget supports 3 FTEs + MSSP ($336K-$360K/yr). Hybrid provides 24/7 coverage within budget constraints.

What the model gives us: 24/7 L1 triage coverage. Internal L2/L3 investigation during business hours. Institutional knowledge retained for complex investigations.

What the model costs us: $336K-$360K total (3 × $100K FTE + $36-60K BlueVoyant).

Known gap 1: After-hours triage uses generic runbooks. Identity-specific techniques (AiTM, token theft, device code phishing) require custom MSSP runbooks. Status: Custom runbooks deployed March 2026.

Known gap 2: MSSP closure rate 15 points higher than internal team. Weekly review of MSSP closures required to identify missed investigations. Status: Weekly review implemented, SOC lead owns.

Known gap 3: VIP entity awareness limited for MSSP. VIP watchlist shared monthly. Status: Active, last updated 2026-04-01.

Building your own ADR

When you evaluate your own operating model, start with the Friday-night question: what happens at 22:47 when a critical alert fires? Trace the full path from alert to resolution. Identify every point where context, coverage, or escalation depends on an assumption rather than a documented procedure. Those assumptions are your operating model's gaps.

Ask specifically: Does the after-hours team have the same runbooks as the daytime team? Are there environment-specific decision paths the after-hours team doesn't know about? What alert patterns would your internal team investigate that the after-hours team would close? Are there specific techniques — AiTM, OAuth abuse, living-off-the-land — that the after-hours runbooks don't address?

The ADR is the deliverable from this section. By the end, you have a documented operating model with the model choice, the rationale, the cost, the known gaps, and the mitigation plan for each gap. That document becomes part of the SOC charter you build in Section 1.7.

SOC Operations Principle

Every SOC operating model has a failure mode. The internal SOC fails on coverage. The managed SOC fails on context. The hybrid fails on the handoff between them. The operating model decision is not which model is best — it's which failure mode you can mitigate, and whether you've documented the gaps so you manage them deliberately rather than discovering them during an incident.

Next
Section 1.2 — Analyst Tiers and Role Architecture. You've chosen the operating model. Now you define the roles within it — L1 triage, L2 investigation, L3 deep analysis — and the scope boundaries that determine whether an alert exceeds one tier's capability or gets stuck there.