Building Your AI Operations Foundation
Scenario
Three analysts used Claude for two weeks. Results are inconsistent. One writes structured prompts that produce reliable output. Another writes vague prompts and spends more time correcting output than the investigation would have taken manually. A third has good prompts but keeps them in personal notes. Your SOC lead asks: how do we scale what works, fix what does not, and measure whether AI is actually making the team faster?
Completing the investigation: iteration 3
The investigation feedback loop has run twice. Iteration 1 (C1.1) produced a triage assessment with a hallucinated field name. Iteration 2 (C1.2) corrected the error using OfficeActivity and identified token replay. Both iterations generated follow-up actions. In a real investigation, you continue until the hypothesis is confirmed or refuted. In iteration 3, the analyst feeds the results from iteration 2 back to Claude.
Three iterations. The first produced a triage with errors. The second corrected the queries and identified the attack mechanism. The third produced containment and evidence collection with one scope gap the analyst caught. Each iteration improved because Claude had more real evidence to work with. This is the investigation feedback loop as methodology: describe context, generate queries, validate against schema, execute, feed results back, iterate.
The quality difference between iteration 1 and iteration 3 is not a model improvement. It is a context improvement. The same model with more evidence produces better output. In iteration 1, Claude had only the alert details. In iteration 3, Claude had 48 hours of sign-in history, confirmed token replay evidence, and a documented inbox rule with BEC indicators. The model's analysis became more accurate because the analyst supplied progressively richer evidence, not because the model got smarter between prompts.
The methodology formalizes this into five steps with quality gates. First, context loading: provide the incident type, initial indicators, relevant timeframe, and environmental context. A prompt saying "investigate suspicious sign-in" produces generic queries. A prompt specifying the user, IP, timestamp, MFA method, and available tables produces targeted queries. Second, generation: Claude produces output based on the context. Third, validation: apply the five-check discipline. Fourth, execution: run validated queries, paste results back. Fifth, iteration: Claude analyzes real results, identifies gaps, generates follow-up queries. A typical investigation runs 3 to 5 iterations, each taking 2 to 3 minutes of generation and validation plus query execution time.
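The context-loading gate in the first step can be sketched in code. This is a minimal illustration, not part of the course material: the field names in `REQUIRED_CONTEXT` and the helper functions are hypothetical, chosen to mirror the inputs named above (user, IP, timestamp, MFA method, available tables). The idea is to refuse to build a prompt until every required field is present.

```python
# Hypothetical field names mirroring the inputs named in the text.
REQUIRED_CONTEXT = ("user", "ip", "timestamp", "mfa_method", "available_tables")


def missing_context(context: dict) -> list:
    """Return the required context fields that are absent or empty."""
    return [name for name in REQUIRED_CONTEXT if not context.get(name)]


def build_prompt(incident_type: str, context: dict) -> str:
    """Build an investigation prompt, failing fast if context loading is incomplete."""
    missing = missing_context(context)
    if missing:
        raise ValueError("context loading incomplete, missing: " + ", ".join(missing))

    lines = ["Incident type: " + incident_type]
    for name in REQUIRED_CONTEXT:
        value = context[name]
        # Lists (such as available tables) are flattened into one readable line.
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        lines.append(f"{name}: {value}")
    lines.append("Generate targeted queries using only the tables listed above.")
    return "\n".join(lines)
```

With this gate, a request carrying only "investigate suspicious sign-in" is rejected before generation, while a fully specified request produces a prompt that names the user, IP, timestamp, MFA method, and tables explicitly.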
The division of labor matters. Claude generates and analyzes. The analyst decides and acts. Claude cannot decide whether this AiTM compromise warrants immediate executive notification or whether the inbox rule pattern represents financial fraud preparation that requires legal involvement. Those decisions require organizational knowledge and professional judgment. The feedback loop produces the evidence. The analyst makes the call.
Prompt library architecture
The prompt library solves the consistency problem. Instead of each analyst developing their own patterns, the team maintains a shared library of tested prompts organized by function. The architecture uses your Claude Project as the repository.
Each prompt follows a standard format: purpose, required inputs, template text, and validation checklist. Modules 2 through 6 populate each category with tested templates. The maintenance discipline matters as much as the initial build. When an analyst discovers a prompt that produces better results, they contribute it with a note explaining the improvement. When a prompt fails due to a schema change or model update, the team updates the template. The library is a living artifact that improves with use.
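One way to picture the standard format is as a small record type. The sketch below is an assumption about how a team might represent a library entry, not a format prescribed by the course: the class name `PromptTemplate`, its methods, and the example template text are all hypothetical. It carries the four parts named above and refuses to render when a required input is missing.

```python
from dataclasses import dataclass
from string import Formatter


@dataclass
class PromptTemplate:
    """One library entry: purpose, required inputs, template text, validation checklist."""
    purpose: str
    required_inputs: list
    template_text: str
    validation_checklist: list

    def placeholders(self) -> set:
        """Names of the {placeholders} that appear in the template text."""
        return {name for _, name, _, _ in Formatter().parse(self.template_text) if name}

    def undeclared_placeholders(self) -> set:
        """Placeholders used in the text but not listed as required inputs."""
        return self.placeholders() - set(self.required_inputs)

    def render(self, **inputs: str) -> str:
        """Fill the template, failing fast if a required input is missing."""
        missing = [name for name in self.required_inputs if name not in inputs]
        if missing:
            raise KeyError("missing required inputs: " + ", ".join(missing))
        return self.template_text.format(**inputs)
```

A check such as `undeclared_placeholders()` is useful during maintenance: when a contributor edits the template text, it shows whether the declared inputs still match what the text uses.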
Measurement framework
Without measurement, you rely on subjective impressions that tend toward optimism. An analyst who saves 20 minutes on one investigation and spends 30 minutes fighting a hallucinated table name on the next reports that AI is "really helpful" because the positive experience is more memorable. Measurement converts impressions into data.
Establish a baseline before deploying AI for operational work. For one week, record time-to-resolution for every investigation, the number of queries written per investigation, and total analyst-hours spent on triage, investigation, detection engineering, and documentation. This baseline is your comparison point. Without it, any improvement claim is anecdotal.
Track three metrics weekly after deployment. First, time-to-resolution: how long does each investigation take from alert to closure? Compare the pre-AI baseline against the AI-assisted average. The capabilities matrix in section 1.2 predicted 30 to 45 minutes saved per investigation. Measurement confirms or contradicts that prediction. If investigations are not getting faster, the problem is usually context loading (analysts providing insufficient information) or verification overhead (prompts producing low-quality output that requires extensive correction).
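The baseline comparison is simple arithmetic, sketched here under stated assumptions: the function names are hypothetical, the durations in the usage note are invented for illustration, and the 30 to 45 minute range is the prediction quoted above.

```python
from statistics import mean


def minutes_saved(baseline_minutes, assisted_minutes):
    """Average time-to-resolution before AI minus the AI-assisted average."""
    return mean(baseline_minutes) - mean(assisted_minutes)


def matches_prediction(saved, low=30, high=45):
    """True when the measured saving falls inside the predicted range."""
    return low <= saved <= high
```

For example, a baseline week of 120, 90, and 150 minutes against an assisted week of 80, 70, and 90 minutes gives averages of 120 and 80, a saving of 40 minutes, which falls inside the predicted range. A saving near zero or below would point back to context loading or verification overhead as the likely cause.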
Second, verification overhead: how much time does each analyst spend validating AI output? If verification time exceeds the time saved by generation, AI is adding friction. This metric also identifies analysts who need training. An analyst whose verification time is consistently high may be over-verifying (checking things that do not need checking) or under-prompting (providing insufficient context, which produces lower-quality output).
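The friction test above can be expressed per analyst. The following is a rough sketch with a hypothetical record layout, `(analyst, minutes saved by generation, minutes spent verifying)`; it sums the net saving for each analyst and lists those whose verification time exceeds what generation saved.

```python
from collections import defaultdict


def net_savings_by_analyst(records):
    """Sum (minutes saved by generation - minutes spent verifying) per analyst."""
    totals = defaultdict(float)
    for analyst, saved, verifying in records:
        totals[analyst] += saved - verifying
    return dict(totals)


def analysts_with_friction(records):
    """Analysts whose verification time exceeds the time generation saved."""
    nets = net_savings_by_analyst(records)
    return sorted(analyst for analyst, net in nets.items() if net < 0)
```

The output only says who to look at. Whether a flagged analyst is over-verifying or under-prompting still has to be determined by reviewing their prompts and checks.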
Third, output quality: what percentage of AI-generated outputs are usable without modification, usable with minor corrections, or require substantial rewriting? Track by function. You may find that triage summaries are usable 90% of the time while detection rules require correction 60% of the time. The lowest-scoring functions get the most attention in prompt refinement.
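A tally by function might look like the sketch below. The rating labels, function names, and sample reviews are hypothetical; the three rating buckets correspond to the categories described above.

```python
from collections import Counter, defaultdict

# Hypothetical labels for the three quality buckets described in the text.
RATINGS = ("usable", "minor_corrections", "substantial_rewrite")


def quality_by_function(reviews):
    """Percentage of outputs in each rating bucket, per function.

    reviews: iterable of (function_name, rating) pairs.
    """
    counts = defaultdict(Counter)
    for function, rating in reviews:
        if rating not in RATINGS:
            raise ValueError("unknown rating: " + rating)
        counts[function][rating] += 1

    shares = {}
    for function, counter in counts.items():
        total = sum(counter.values())
        shares[function] = {r: 100 * counter[r] / total for r in RATINGS}
    return shares


def lowest_scoring(shares):
    """Function with the smallest share of outputs usable without modification."""
    return min(shares, key=lambda function: shares[function]["usable"])
```

Feeding in ten triage summaries with nine rated usable, and ten detection rules with four rated usable, reproduces the 90% and 60%-needing-correction split from the example and identifies detection rules as the function to prioritize in prompt refinement.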
Report monthly to your SOC lead and CISO. The data makes the case for continued investment, identifies where training is needed, and provides the evidence base for expanding AI adoption to additional functions.
Operational readiness checklist
Before Module 2, verify each item. Your Claude Project has a system prompt with your organization name, SIEM type, EDR platform, identity provider, query language, and formatting preferences. Test by asking Claude to generate a simple KQL query. If the output uses correct table names for your SIEM, the prompt is working.
The data classification matrix from section 1.5 is documented and communicated to every team member. Each analyst knows which tier applies to their daily data. The matrix should be a knowledge document in your Project and printed for reference.
Your shadow AI detection query is deployed as a scheduled rule or hunting query. You have baseline data showing current AI usage patterns.
The prompt library structure exists with the five categories above (empty for now; Modules 2 through 6 fill them). Each analyst has completed the five-check validation exercise from C0.2 and can demonstrate the discipline on a sample output.
Anti-Pattern
Measuring AI adoption by license count instead of operational impact
A CISO reports 12 licenses purchased, all activated. What the report misses: 3 analysts use AI daily with a 40% time reduction, 5 use it occasionally with no workflow change, and 4 have never opened the tool. License count measures spend. Time-to-resolution, verification overhead, and output quality measure impact. Report impact.
The measurement framework catches this gap because it tracks operational outcomes, not procurement milestones. When your monthly report shows 3 active users out of 12 licenses, the response is targeted training for the 9 who are not using the tool effectively, not a claim that AI adoption is complete because the licenses are activated.
AI Operations Principle
The investigation feedback loop, the prompt library, the measurement framework, and the validation discipline are operational processes that happen to use AI. A team that deploys AI without these processes gets inconsistent results depending on individual skill. A team that deploys these processes with AI gets consistent, measurable, improving results. The processes are the product. AI is the accelerant.
You now have the complete foundation for operational AI adoption. The feedback loop gives you the methodology. The prompt library gives you consistency. The measurement framework gives you evidence. The readiness checklist confirms you are ready. Module 2 applies all of it to your first operational function: AI-assisted alert triage.