The AI Capabilities Matrix for Security Operations

3-4 hours · Module 1 · Free

Scenario

Your SOC lead asks where AI will have the biggest impact on team productivity. The team handles 65 alerts per day, investigates 6 incidents per shift, produces 3 incident reports per week, and has a backlog of 22 detection rules that nobody has time to write. The lead wants a prioritized deployment sequence: which function gets AI first, which gets it last, and which should not get it at all.

Alert triage: high effectiveness, low verification overhead

Alert triage is where AI delivers the most immediate time savings with the lowest risk. The analyst pastes alert details into Claude, receives a plain-language summary with ATT&CK technique mapping and enrichment queries, and uses the combined context to classify the alert. A 5-minute manual triage drops to 90 seconds. Across 65 alerts per day, that is roughly 3.8 hours recovered.
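The arithmetic behind the daily figure is easy to check. The sketch below uses only the scenario's numbers (65 alerts, 5 minutes manual, 90 seconds assisted); the function name is illustrative. With those inputs it comes to about 3.8 hours per day.

```python
def daily_hours_saved(alerts_per_day: int, manual_min: float, assisted_min: float) -> float:
    """Hours recovered per day when each alert takes assisted_min instead of manual_min."""
    return alerts_per_day * (manual_min - assisted_min) / 60.0


# Scenario figures from the text: 65 alerts, 5 minutes manual, 90 seconds assisted.
saved = daily_hours_saved(65, 5.0, 1.5)
print(f"{saved:.2f} hours recovered per day")
```

Swapping in your own alert volume and measured triage times gives a baseline to compare against once the team starts using AI-assisted triage.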

The acceleration happens because triage is a structured input-output task. The alert fields are structured data. The summary is a translation from technical fields to natural language. The enrichment queries follow well-established patterns for each alert type. These are exactly the tasks where LLMs excel: structured transformation with verifiable output. The analyst reads the alert data alongside the summary and catches discrepancies within seconds because the source material is right there.

Where triage AI fails is in severity assessment that depends on environmental context. Claude classified the sign-in from section 1.1 as HIGH based on technique alone. Whether user_01 is a Global Admin or a shared mailbox account changes the severity from HIGH to CRITICAL or from HIGH to MEDIUM. Whether the Lagos IP belongs to a corporate VPN exit node changes the classification from suspicious to expected. AI does not know your environment. The severity judgment stays with the analyst. Module 2 provides triage prompt templates that explicitly separate summarization (AI handles) from severity classification (analyst retains).
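To make the split concrete, here is a minimal sketch of how analyst-supplied context could move an AI-suggested severity. It is an illustration, not a replacement for analyst judgment: the role names, the one-step shifts, and the `EnvContext` and `apply_context` names are all assumptions chosen to mirror the examples above.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Hypothetical role adjustments; a real SOC would define its own.
ROLE_SHIFT = {"global_admin": 1, "shared_mailbox": -1}


@dataclass
class EnvContext:
    account_role: str          # supplied by the analyst, not by the model
    ip_is_corporate_vpn: bool  # e.g. the source IP is a known VPN exit node


def apply_context(ai_severity: str, ctx: EnvContext):
    """Return (classification, severity) after applying analyst-supplied context."""
    idx = SEVERITY_ORDER.index(ai_severity) + ROLE_SHIFT.get(ctx.account_role, 0)
    idx = max(0, min(idx, len(SEVERITY_ORDER) - 1))  # clamp to the defined scale
    classification = "expected" if ctx.ip_is_corporate_vpn else "suspicious"
    return classification, SEVERITY_ORDER[idx]


print(apply_context("HIGH", EnvContext("global_admin", False)))
print(apply_context("HIGH", EnvContext("shared_mailbox", False)))
print(apply_context("HIGH", EnvContext("standard", True)))
```

The point of the sketch is that every input to `apply_context` other than the initial severity comes from knowledge of your environment, which is exactly what the model lacks.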

Investigation: highest value, highest verification overhead

Investigation produces the largest time savings and the most dangerous errors. The investigation feedback loop demonstrated in Module 0 compresses 45 to 60 minutes of manual work into 15 minutes. The acceleration comes from three sources: query generation (Claude writes KQL faster than you because it does not need to look up field names), cross-table correlation (Claude identifies which tables contain relevant evidence and generates joins between them), and timeline construction (Claude assembles chronological narratives from multi-table results while the analyst would spend 10 to 15 minutes sorting events manually).

Anti-Pattern

Deploying AI across all functions simultaneously

A SOC lead deploys Claude for triage, investigation, detection engineering, documentation, compliance, and automation in the same week. The team has no verification discipline, no prompt templates, and no measurement framework. Three weeks later, an analyst deploys an AI-generated detection rule that fires 200 times per day because nobody tested it against historical data. The actual failure was deploying AI without the verification infrastructure that makes it safe. Each phase in the deployment sequence builds the discipline the next phase requires.

The risk is proportional to the value. Every investigation query must pass the five-check validation. Every factual claim must trace to evidence. The CloudAppEvents field error from section 1.1 would produce a false negative in an active investigation if the analyst ran the query without verification. Here is what the corrected investigation workflow looks like when the analyst feeds real results back to Claude:

Claude Exchange

Analyst Prompt (iteration 2, after running enrichment 1 from C1.1)

The concurrent session query returned 4 sign-ins for user_01 in the last 48 hours. Two from London IPs (17:42 and 08:15 previous day, ResultType 0). One from Lagos IP (03:14, ResultType 0). One from Lagos IP (03:17, ResultType 0, different AppDisplayName: "Microsoft Office 365 Exchange Online"). The Lagos sign-ins are 3 minutes apart targeting different applications. Generate queries to check for inbox rule creation and mail forwarding, and check AADNonInteractiveUserSignInLogs for token replay from the Lagos IP.

Claude Response

The two Lagos sign-ins targeting different apps within 3 minutes is consistent with automated token replay. The attacker authenticated once via the AiTM proxy, captured the session cookie, then replayed it against Exchange Online for mailbox access. Three queries to confirm:

// 1. Non-interactive sign-ins (token replay)
AADNonInteractiveUserSignInLogs
| where UserPrincipalName == "user_01@contoso.com"
| where IPAddress == "192.0.2.45"
| where TimeGenerated > ago(48h)

// 2. Inbox rules (corrected from C1.1)
OfficeActivity
| where UserId == "user_01@contoso.com"
| where Operation in ("New-InboxRule", "Set-InboxRule",
    "Set-Mailbox", "Set-MailboxJunkEmailConfiguration")

Analyst Evaluation

✓ Corrected. Iteration 2 used OfficeActivity with UserId (correct field and type) instead of CloudAppEvents with AccountObjectId. The feedback loop self-corrected when the analyst provided evidence from iteration 1.

✓ Analysis sound. Reading two sign-ins to different apps within 3 minutes as automated token replay is a sound working hypothesis, which the non-interactive sign-in query is meant to confirm.

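✗ Missing: No dedicated mail-forwarding check. The query lists Set-Mailbox but does not inspect its parameters for ForwardingSmtpAddress or ForwardingAddress, and it omits transport rules (New-TransportRule, Set-TransportRule). Sophisticated BEC operators set mailbox-level forwarding rather than inbox rules to exfiltrate mail continuously. Add this to iteration 3.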

That exchange demonstrates the investigation feedback loop. Iteration 1 (section 1.1) produced a hallucinated field name. The analyst caught it with Check 2, ran the corrected query, and provided the results to Claude in iteration 2. The second iteration produced better output because Claude was analyzing real evidence rather than generating from training data. Each iteration narrows the investigation scope and reduces hallucination risk because the model has more context to work with.
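A field-name check of this kind can be partly mechanized. The sketch below is one possible approach, not the course's validation procedure: it compares the fields used in simple `| where` clauses against a hand-maintained column list. `KNOWN_COLUMNS` and `unknown_fields` are hypothetical names, the column list covers only the fields that appear in the exchange above, and the regex does not handle joins, `extend`, or nested expressions.

```python
import re

# Hand-maintained excerpt of table columns (assumption: the team keeps this
# in sync with its own workspace schema). Only columns used in the exchange
# above are listed.
KNOWN_COLUMNS = {
    "AADNonInteractiveUserSignInLogs": {"UserPrincipalName", "IPAddress", "TimeGenerated"},
    "OfficeActivity": {"UserId", "Operation", "TimeGenerated"},
}

# Captures the first identifier after each "| where".
WHERE_FIELD = re.compile(r"\|\s*where\s+([A-Za-z_]\w*)")


def unknown_fields(query: str) -> list:
    """Return fields used in `| where` clauses that are missing from the known schema."""
    lines = [ln.strip() for ln in query.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("//")]  # drop blanks and comments
    if not lines:
        return []
    table = lines[0]
    known = KNOWN_COLUMNS.get(table)
    if known is None:
        return ["<unknown table: " + table + ">"]
    return [f for f in WHERE_FIELD.findall("\n".join(lines)) if f not in known]


good = """
// Inbox rules
OfficeActivity
| where UserId == "user_01@contoso.com"
| where Operation in ("New-InboxRule", "Set-InboxRule")
"""
bad = good.replace("UserId", "AccountUpn")  # AccountUpn is a made-up field for the demo

print(unknown_fields(good))
print(unknown_fields(bad))
```

A check like this catches only names that are absent from the list. It says nothing about whether the field has the right type or the query has the right logic, so it supplements the analyst's review rather than replacing it.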

Detection engineering: highest ROI, highest production risk

A detection engineer who manually translates a threat advisory into a deployed detection rule spends 4 to 8 hours per rule: reading the advisory, mapping to MITRE ATT&CK, identifying the relevant Sentinel tables, writing the KQL, testing against historical data, documenting the rule specification, and deploying. With AI assistance, the engineer describes the advisory and receives a draft rule with ATT&CK mapping, KQL query, false positive predictions, and a rule specification. The engineer validates, tests, and deploys. Total time: 60 to 90 minutes.

The verification overhead is high but focused. KQL syntax is verifiable by running the query. The ATT&CK mapping is verifiable against the matrix. The false positive predictions require testing against 30 days of historical data. The critical risk is in the detection logic itself. An LLM that generates a rule filtering on ResultType == "0" when the detection hypothesis requires all result types creates a blind spot. The analyst discovers this gap only when the rule fails to fire on a real attack, which might be weeks or months later. Module 3 provides the complete advisory-to-rule workflow with mandatory historical testing before production deployment.
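Historical testing can be summarized with a small gate before deployment. The sketch below assumes you have already run the draft rule over a historical window and exported one hit count per day; `backtest_verdict` and the `max_per_day` noise budget are hypothetical, and the default of 20 is a placeholder rather than a recommended threshold.

```python
from statistics import mean


def backtest_verdict(daily_hits, max_per_day=20):
    """Summarize a rule's hits over a historical window (one count per day).

    max_per_day is a hypothetical noise budget, not a recommended value.
    """
    if not daily_hits:
        raise ValueError("no historical data supplied")
    avg = mean(daily_hits)
    if sum(daily_hits) == 0:
        # Zero hits may be legitimate for a rare technique, or may signal
        # an over-restrictive filter. Either way a human should look.
        verdict = "review: zero hits, confirm the logic is not over-filtered"
    elif avg > max_per_day:
        verdict = "reject: too noisy"
    else:
        verdict = "candidate for deployment"
    return {"days": len(daily_hits), "avg_per_day": avg,
            "peak": max(daily_hits), "verdict": verdict}


print(backtest_verdict([200] * 30)["verdict"])
print(backtest_verdict([0] * 30)["verdict"])
print(backtest_verdict([1, 0, 2] * 10)["verdict"])
```

The zero-hit branch matters as much as the noisy one: a rule that never fires over the test window is where an over-restrictive filter would hide.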

The long-term ROI exceeds every other function because detection rules are durable. A triage summary saves about 3.5 minutes on one alert. A detection rule that fires correctly for 18 months saves thousands of analyst-hours of manual hunting. AI compresses the creation cycle, which means your team can write 5 rules per week instead of 1, and the detection coverage gap that every SOC manager stares at starts closing measurably.

IR documentation, compliance, and automation

IR documentation is where AI eliminates the report-writing bottleneck. An analyst finishes a 4-hour investigation and faces 2 to 3 hours of report writing: the technical report, the executive summary, the regulatory notification assessment, and the stakeholder communications. Claude drafts all four from investigation notes in minutes. You provide the notes once. Claude produces the documents in their appropriate formats. You review, correct factual claims against your investigation evidence, verify regulatory references against the current regulation text, and ship in 30 to 45 minutes. Module 4 provides the complete template library. The risk is moderate: every factual claim must trace to evidence, and every regulatory citation must be verified. Claude might state that GDPR Article 33 notification is required when the data involved does not constitute personal data under the regulation's definition. The structure is reliable. The facts need checking.

Compliance AI assists with gap analysis, control mapping, and policy drafting. An analyst producing a gap analysis against NIST CSF 2.0 uses AI to generate the initial mapping of current controls to framework categories. AI drafts gap descriptions and recommended remediations. The analyst adjusts to match the actual control implementation. For organizations managing multiple frameworks simultaneously, AI cross-maps controls and identifies where a single implementation satisfies requirements across NIST CSF, ISO 27001, SOC 2, and PCI DSS. The verification overhead is high: an AI-generated policy that uses "should" instead of "shall" changes the enforcement posture of the entire document.
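The "should" versus "shall" problem is simple enough to lint for. The following is a minimal sketch that flags lines for human review; `flag_weak_modals` is a hypothetical helper, and whether "should" is acceptable in a given clause remains the policy owner's call.

```python
import re

# Whole-word match, so words like "Shoulder" are not flagged.
WEAK_MODAL = re.compile(r"\bshould\b", re.IGNORECASE)


def flag_weak_modals(policy_text):
    """Return (line_number, line) pairs where 'should' appears, for human review."""
    return [
        (n, line.strip())
        for n, line in enumerate(policy_text.splitlines(), start=1)
        if WEAK_MODAL.search(line)
    ]


draft = """Administrators shall enable MFA on all privileged accounts.
Users should report suspected phishing within one hour.
Shoulder surfing risks shall be covered in awareness training."""

print(flag_weak_modals(draft))
```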

Security automation carries the highest risk because scripts execute in production. A logic error in an investigation query produces a wrong finding that an analyst catches during review. A logic error in an automation script produces wrong actions at machine speed: a containment script that disables the wrong account, an evidence collection script that overwrites volatile data before preserving it, a remediation script that modifies the wrong mailbox rule. Each of these errors can be severe, and some, such as overwritten volatile data, cannot be undone. Every AI-generated script must be tested in a non-production context. Module 5 provides the validation framework.
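One common way to keep a script testable outside production is to default it to a dry run and require an explicit target list. The sketch below is an assumption about how such a guard might look, not the Module 5 framework: `disable_account` is hypothetical and calls no identity-provider API.

```python
def disable_account(upn, approved_targets, dry_run=True):
    """Containment step guarded by an explicit allowlist and a dry-run default.

    Hypothetical sketch: no identity-provider API is called. A real script
    would perform the disable at the final return.
    """
    if upn not in approved_targets:
        # Refuse to act on any account the analyst did not explicitly approve.
        raise PermissionError(f"{upn} is not on the analyst-approved target list")
    if dry_run:
        return f"DRY RUN: would disable {upn}"
    return f"DISABLED: {upn}"


approved = {"user_01@contoso.com"}
print(disable_account("user_01@contoso.com", approved))
```

With this shape, the destructive path runs only when the caller passes `dry_run=False` and the account is on the approved list, so a wrong target fails loudly instead of executing at machine speed.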

Figure 1.2: Deployment follows the risk gradient. Start with triage (daily practice, low risk), end with automation (production execution, high risk).

The deployment sequence is not a suggestion. It is a risk management strategy. Triage builds the prompting habits your team needs every day. Investigation builds the validation discipline that catches hallucinated field names. Detection engineering builds the testing discipline that prevents false-negative rules from reaching production. By the time your team reaches automation and compliance in week 7, the verification instinct is habitual. They catch errors before errors reach production because they have been catching errors for six weeks.
