
Your First 90 Days

3-4 hours · Module 0 · Free
What you already know

You have the detection gap measurement (Section 0.1), the structural explanation (Section 0.2), the threat landscape (Section 0.3), the operational boundaries (Section 0.4), the approved budget (Section 0.5), the confirmed prerequisites and data sources (Section 0.6), and the skills assessment (Section 0.7). This section converts all of it into an executable 90-day plan with the metrics that prove it is working. Day 1 starts when you finish reading.

Scenario

Rachel has Phil's approval, Priya's time is blocked, and the prerequisites are confirmed. She needs two things before Day 1: the metrics that will prove the program is working at Day 90, and the week-by-week plan that gets Priya from zero campaigns to a quarterly report. Phil will review the program at the 90-day mark. If the report shows measurable progress, the program continues. If it shows vague claims about "improved visibility," Phil will reallocate Priya's hours to the triage backlog. The metrics must be specific, measurable, and tied to the outcomes Rachel promised in the business case.

Seven metrics: four for value, three for health

Hunting metrics split into two categories. Value metrics measure what the program produces for the organization. Health metrics measure whether the program is operating sustainably.

Value metric 1: Detection coverage trend. The ATT&CK coverage ratio from Section 0.1, measured quarterly. Starting point: NE's current 29.5%. Target: measurable improvement each quarter as hunt-derived detection rules are deployed. This is the headline metric in every quarterly report because it directly quantifies how much of the attack surface has moved from unmonitored to monitored.

Value metric 2: Hunt-derived detection rules. The number of analytics rules deployed to production that originated from hunting campaigns. Each rule represents a permanent conversion of a known-unknown to a known-known. Target: one rule per campaign as a minimum. Three campaigns per quarter produces three rules per quarter. At the end of Year 1, twelve hunt-derived rules are operating automatically in Sentinel.

Value metric 3: Hunt discovery rate. The percentage of campaigns that produce a positive finding (evidence of the hypothesized threat or a related threat). Industry benchmarks vary widely, but a 10 to 20% positive finding rate indicates well-targeted hypotheses. A rate above 30% suggests the organization has significant undetected activity. A rate of 0% across twelve campaigns suggests the hypotheses are targeting techniques the environment is not exposed to and the backlog needs recalibration.
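The discovery-rate bands above can be written as a small check. This is a minimal sketch in Python. The function names and return labels are illustrative assumptions, and because the text gives no guidance for rates between the named bands (for example 25%), the sketch reports that gap instead of guessing.

```python
def discovery_rate(positive_campaigns: int, total_campaigns: int) -> float:
    """Percentage of campaigns that produced a positive finding."""
    if total_campaigns <= 0:
        raise ValueError("total_campaigns must be positive")
    return 100.0 * positive_campaigns / total_campaigns


def interpret_discovery_rate(rate_pct: float) -> str:
    """Map a rate to the bands named in the text (labels are hypothetical)."""
    if rate_pct == 0:
        return "no findings: recalibrate the backlog if this persists"
    if 10 <= rate_pct <= 20:
        return "well-targeted hypotheses"
    if rate_pct > 30:
        return "significant undetected activity"
    # Rates the text does not classify (under 10%, or between 20% and 30%).
    return "between bands: no guidance in the text"
```

A single quarter is a small sample: one positive finding in four campaigns is 25%, which falls between the named bands, so the trend across quarters matters more than any one reading.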

Value metric 4: Dwell time by discovery source. Median dwell time for incidents discovered through hunting versus incidents discovered through automated detection versus incidents discovered through external notification. If hunting consistently discovers intrusions with shorter dwell times than external notification, the program is compressing the window between compromise and containment.
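As a rough illustration of the comparison, the sketch below groups incidents by discovery source and takes the median dwell time for each group. The incident records are invented for the example and are not NE data; the source labels and the function name are also assumptions.

```python
from collections import defaultdict
from statistics import median

# Hypothetical incident records: (discovery_source, dwell_time_days).
INCIDENTS = [
    ("hunting", 12), ("hunting", 20), ("hunting", 31),
    ("automated_detection", 3), ("automated_detection", 9),
    ("external_notification", 95), ("external_notification", 140),
]


def median_dwell_by_source(incidents):
    """Return {source: median dwell time in days} for each discovery source."""
    grouped = defaultdict(list)
    for source, days in incidents:
        grouped[source].append(days)
    return {source: median(days) for source, days in grouped.items()}


for source, days in median_dwell_by_source(INCIDENTS).items():
    print(f"{source}: median dwell {days} days")
```

The median is used here, as in the metric definition, because a single long-running intrusion would distort a mean.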

Health metric 5: Cadence adherence. The percentage of scheduled hunting sessions that actually occurred. Four hours blocked per week across 12 weeks is 48 hours. If only 28 hours were executed, cadence adherence is 58%. Below 75% indicates the protected time is not being defended against competing priorities. This metric is the early warning system for program decay: when cadence drops, campaign completion follows, and the quarterly report thins.
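The arithmetic in this metric is simple enough to sketch directly. The function and constant names below are illustrative; the 48 scheduled hours, the 28 executed hours, and the 75% threshold come from the text.

```python
ADHERENCE_FLOOR_PCT = 75  # below this, protected time is not being defended


def cadence_adherence(executed_hours: float, scheduled_hours: float) -> float:
    """Percentage of scheduled hunting hours that actually occurred."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    return 100.0 * executed_hours / scheduled_hours


scheduled = 4 * 12                      # 4 hours per week across 12 weeks
adherence = cadence_adherence(28, scheduled)
print(f"{adherence:.0f}%")              # 58%
print(adherence < ADHERENCE_FLOOR_PCT)  # True: early warning of program decay
```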

Health metric 6: Backlog depth. The number of hypotheses in the prioritized backlog. A healthy program maintains 8 to 12 hypotheses ready for execution. Below 5 indicates the backlog is not being refreshed from IR findings, coverage gap analysis, and environmental changes. Above 20 indicates hypotheses are accumulating without execution, which suggests a cadence or capacity problem.
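One way to turn these bands into a dashboard status is sketched below. The status labels are hypothetical, and the text does not say how to treat depths of 5 to 7 or 13 to 20, so the sketch marks them as "watch" as an assumption, not a rule from the course.

```python
def backlog_status(depth: int) -> str:
    """Classify backlog depth using the bands named in the text."""
    if depth < 0:
        raise ValueError("depth cannot be negative")
    if depth < 5:
        return "stale"         # backlog is not being refreshed
    if 8 <= depth <= 12:
        return "healthy"       # ready-to-execute range
    if depth > 20:
        return "accumulating"  # cadence or capacity problem
    return "watch"             # assumption: 5-7 and 13-20 are unclassified


print(backlog_status(9))  # healthy
```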

Health metric 7: Rule deployment velocity. The number of days between a hunt finding and the corresponding detection rule being deployed in production. Target: under 14 days. If findings take 60 days to become rules, the hunt-to-detection pipeline is bottlenecked, likely at the detection engineering handoff from Section 0.4.
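Deployment velocity is a date difference, which the following sketch computes with the Python standard library. The dates are made up for illustration; the 14-day target is the one stated above.

```python
from datetime import date

TARGET_DAYS = 14  # target from the text: under 14 days


def deployment_velocity_days(finding_date: date, deployed_date: date) -> int:
    """Days between a hunt finding and its detection rule reaching production."""
    days = (deployed_date - finding_date).days
    if days < 0:
        raise ValueError("deployed_date is earlier than finding_date")
    return days


def meets_target(days: int) -> bool:
    return days < TARGET_DAYS


# Hypothetical dates for one finding.
velocity = deployment_velocity_days(date(2025, 1, 6), date(2025, 1, 16))
print(velocity, meets_target(velocity))  # 10 True
```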

Avoiding vanity metrics

Three metrics look useful but measure nothing actionable and actively mislead leadership about program health.

Total hours spent hunting measures effort, not outcome. A program that spends 200 hours and produces two rules is less effective than one that spends 48 hours and produces twelve rules. Reporting hours creates a perverse incentive to extend campaigns beyond their useful life.

Total queries executed measures activity, not investigation quality. An analyst who runs 500 queries in a session without a structured hypothesis is querying, not hunting. Reporting query counts incentivizes broad, unfocused exploration instead of hypothesis-driven investigation.

Total alerts reviewed is a triage metric, not a hunting metric. If it appears in the hunting report, the boundary between hunting and triage has dissolved. This is the most dangerous vanity metric because it signals that hunting hours are being consumed by triage overflow, which is exactly the failure mode that kills programs.

[Figure: 90-day roadmap in four phases. Phase 1: Foundation (from Week 1; prerequisites, data audit, backlog; 12 hrs). Phase 2: First Campaign (from Week 3; Campaign 1, methodology, template; 8 hrs). Phase 3: Campaigns 2-3 (from Week 5; cross-cluster breadth, 2+ rules; 16 hrs). Phase 4: Report (from Week 9 to Day 90; Campaign 4, Q1 report; 12 hrs). Total: 48 hours (4 hrs/week × 12 weeks). Day 90 checkpoint: 4 campaigns, 3+ rules deployed, coverage improvement measured, quarterly report delivered.]

Four phases from zero to a quarterly report. Each phase produces documented output that builds toward the Day 90 leadership review.

The four-phase 90-day roadmap

Phase 1: Foundation (Weeks 1 to 2). Confirm all prerequisites from Section 0.6. Run the data source audit query and resolve any ingestion gaps. If a critical table like AADNonInteractiveUserSignInLogs is not ingested, enable it during Week 1 and allow data to accumulate before campaigns that depend on it.

Establish the hypothesis backlog with 8 to 12 hypotheses prioritized by three factors: coverage gap (which techniques have no detection rule), threat landscape relevance (which techniques are actively used against M365 environments per Section 0.3), and data source availability (which hypotheses can be tested with data that is currently ingested). Document the Hunt Cycle methodology so it is repeatable by any analyst who inherits the program. Set up the documentation template for campaign records. Define the metrics dashboard using the seven metrics above. This phase produces no campaigns. It builds the infrastructure that campaigns require.
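The three prioritization factors can be sketched as a sort key. Everything about the scheme below is an assumption made for illustration: the field names, the 1-to-3 relevance rating, and the choice to rank testable hypotheses first, then uncovered techniques, then threat relevance. The text names the factors but does not prescribe how to weight them, and the example values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    name: str
    has_detection_rule: bool  # coverage gap: False means no rule exists
    threat_relevance: int     # hypothetical rating, 1 (low) to 3 (high)
    data_ingested: bool       # required tables are currently ingested


def priority_key(h: Hypothesis):
    # Ascending sort: testable now, then uncovered, then most relevant.
    return (not h.data_ingested, h.has_detection_rule, -h.threat_relevance)


def prioritize(backlog):
    return sorted(backlog, key=priority_key)


backlog = [
    Hypothesis("Suspicious process execution chains", True, 2, True),
    Hypothesis("OAuth consent phishing", False, 3, False),
    Hypothesis("AiTM token replay", False, 3, True),
]
for h in prioritize(backlog):
    print(h.name)
```

Putting data availability first reflects the advice to start with hypotheses that can be tested today; a team that prefers to let coverage gaps drive ingestion work could reorder the tuple.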

Phase 2: First campaign (Weeks 3 to 4). Execute Campaign 1 from the backlog built in Phase 1. This should be the most conservative hypothesis: a technique with well-documented patterns, available data sources, and existing KQL examples. AiTM token replay detection against SigninLogs and AADNonInteractiveUserSignInLogs is a strong first campaign because the technique is prevalent, the data sources are well understood, and the KQL patterns are documented in Module 4. Document the full campaign record: hypothesis, data sources, KQL, results, analysis, finding, and detection rule. The first campaign teaches the methodology. The finding is secondary.
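The campaign record fields listed above map naturally onto a small data structure. The sketch below is one possible shape, not the course's template: the class name, the "pending" default, and the completeness check are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CampaignRecord:
    hypothesis: str
    data_sources: List[str]
    kql: List[str] = field(default_factory=list)  # queries run, in order
    results: str = ""
    analysis: str = ""
    finding: str = "pending"              # hypothetical placeholder value
    detection_rule: Optional[str] = None  # rule name once deployed

    def missing_fields(self) -> List[str]:
        """Fields still empty, so gaps surface before the quarterly report."""
        checks = {
            "kql": bool(self.kql),
            "results": bool(self.results),
            "analysis": bool(self.analysis),
            "finding": self.finding != "pending",
            "detection_rule": self.detection_rule is not None,
        }
        return [name for name, done in checks.items() if not done]


record = CampaignRecord(
    hypothesis="AiTM token replay is present in sign-in data",
    data_sources=["SigninLogs", "AADNonInteractiveUserSignInLogs"],
)
print(record.missing_fields())
```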

Phase 3: Campaigns 2 and 3 (Weeks 5 to 8). Each campaign should target a different data cluster to build breadth across the environment. If Campaign 1 targeted AiTM in the identity cluster, Campaign 2 might target OAuth consent phishing in the collaboration cluster (CloudAppEvents, AuditLogs) and Campaign 3 might target suspicious process execution chains on the endpoint cluster (DeviceProcessEvents).

By Week 8, the hunter has executed three complete Hunt Cycle iterations, produced three documented campaign records, and deployed at least two detection rules. The methodology is becoming routine. More importantly, the hunter is developing the environmental knowledge from Section 0.7 that only comes from querying the data repeatedly. Each campaign builds a behavioral baseline that makes the next campaign more effective because the hunter recognizes legitimate patterns faster and can focus attention on genuine anomalies.

Phase 4: Stabilization and first report (Weeks 9 to 12). Execute Campaign 4 and, if cadence allows, Campaign 5. During this phase, the methodology should feel natural. The hunter starts the Wednesday morning session, reviews the current campaign's hypothesis, picks up where the last session ended, and makes progress without consulting the methodology documentation.

Compile the first quarterly report during Week 11 using the seven metrics defined above. The report should contain: campaigns completed with one-paragraph summaries, hypotheses tested (positive and negative), detection rules deployed with the technique they cover, coverage improvement measured (starting ratio versus ending ratio), and health metrics (cadence adherence, backlog depth, deployment velocity). Write the report for Phil, not for the SOC. Every technical finding should have a one-sentence business translation. Present the report to leadership at the Day 90 review during Week 12.

The time budget

Four hours per week across 12 weeks is 48 hours. The allocation across phases is not equal. Foundation consumes approximately 12 hours, which front-loads Weeks 1 and 2 at about 6 hours each, above the 4-hour weekly average. Each of the four campaigns consumes approximately 6 to 8 hours across one or two weekly sessions, or 24 to 32 hours in total. The quarterly report consumes approximately 4 hours. That puts the plan between 40 and 48 hours. Budgeting roughly 46 hours leaves about 2 hours of buffer for methodology adjustment.
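The budget arithmetic can be checked in a few lines. The figures are the ones given in this section (12 hours of foundation, four campaigns at 6 to 8 hours each, 4 hours for the report); the variable and function names are illustrative.

```python
WEEKLY_HOURS = 4
WEEKS = 12
FOUNDATION_HOURS = 12
REPORT_HOURS = 4
CAMPAIGNS = 4                 # Campaigns 1 to 4 on the roadmap
CAMPAIGN_HOURS_LOW = 6
CAMPAIGN_HOURS_HIGH = 8


def planned_hours(hours_per_campaign: int) -> int:
    """Total planned hours for a given per-campaign estimate."""
    return FOUNDATION_HOURS + CAMPAIGNS * hours_per_campaign + REPORT_HOURS


available = WEEKLY_HOURS * WEEKS
low = planned_hours(CAMPAIGN_HOURS_LOW)
high = planned_hours(CAMPAIGN_HOURS_HIGH)
print(available, low, high)  # 48 40 48
```

The upper bound equals the full 48 hours, which is why the plan leaves no room for an optional fifth campaign unless earlier campaigns finish at the low end of the estimate.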

This time budget is tight but achievable for a single analyst on protected rotation. It does not leave room for scope creep, extended triage assistance, or campaigns that expand beyond two sessions. The discipline to time-box campaigns is essential. A campaign that runs for six sessions without converging on a finding should be paused, documented as inconclusive, and replaced with the next hypothesis from the backlog. The inconclusive campaign is not a failure. It is data: the hypothesis may need refinement, the data sources may be insufficient, or the technique may not be present. Documenting why the campaign did not converge is as valuable as documenting why it did.

If the four-hour weekly block proves insufficient, the answer is not to extend sessions but to improve efficiency. The Hunt Cycle methodology in Module 1 structures each session so the hunter starts with a clear objective and ends with a documented outcome. Session overhead drops as the methodology becomes habitual. By Campaign 3, a four-hour block typically yields noticeably more investigation output than the same block did during Campaign 1, because the mechanics are no longer consuming cognitive bandwidth.

The Day 90 checkpoint

The Day 90 review is the program's proof of concept. Rachel presents the quarterly report to Phil. The report contains specific numbers, not aspirational statements.

Analyst Decision

Campaigns completed: 4 of 4 planned. Cadence adherence: 85% (41 of 48 scheduled hours executed).

Findings: 1 positive (unauthorized OAuth application with Mail.ReadWrite access consented 47 days ago, escalated to IR), 3 negative (documented, baselines established).

Detection rules deployed: 3 (AiTM token replay, suspicious MFA registration, high-privilege OAuth consent). Coverage improvement: 29.5% to 32.1%.

Backlog status: 9 hypotheses queued for Quarter 2, prioritized by coverage gap analysis and IR findings from the OAuth investigation.

Recommendation: Continue the program. Increase allocation to 6 hours per week in Quarter 2 to support 5 campaigns. Evaluate contractor augmentation if discovery rate exceeds 25%.

If the report shows this level of specificity, the program continues. The numbers prove that hunting produces measurable detection improvement, discovers threats that automated detection misses, and operates within the budget Rachel promised. Phil does not need to understand AiTM token replay. He needs to see that the program found something automated detection missed, deployed three new rules, and improved coverage by 2.6 percentage points in one quarter.
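The headline numbers in the sample report can be recomputed from its raw inputs, which is a useful sanity check before presenting to leadership. This is a minimal sketch with hypothetical function and key names; the inputs are the figures from the sample report above.

```python
def day90_summary(executed_hours, scheduled_hours, positive_findings,
                  campaigns, coverage_start_pct, coverage_end_pct):
    """Derive the report's headline figures from its raw inputs."""
    return {
        "cadence_adherence_pct": round(100 * executed_hours / scheduled_hours),
        "discovery_rate_pct": round(100 * positive_findings / campaigns),
        "coverage_gain_points": round(coverage_end_pct - coverage_start_pct, 1),
    }


summary = day90_summary(
    executed_hours=41, scheduled_hours=48,
    positive_findings=1, campaigns=4,
    coverage_start_pct=29.5, coverage_end_pct=32.1,
)
print(summary)
```

With these inputs the discovery rate is exactly 25%, so it meets but does not exceed the 25% threshold named in the sample recommendation for evaluating contractor augmentation.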

The Day 90 checkpoint is also when Rachel evaluates whether to scale. If Priya's campaigns are producing findings and rules consistently, the program warrants more hours. If the backlog is growing faster than campaigns can consume it, the program warrants a second hunter or contractor augmentation. If the discovery rate is high (above 25%), the environment has significant undetected activity and the program should be the top security investment for the next quarter. The metrics from the quarterly report drive every scaling decision.

Presenting a Day 90 report that says "we improved our security posture" without specific numbers

Phil asks how. You say "we have better visibility." Phil asks what that means for the business. You cannot answer because the metrics were not defined at Day 1 and the campaigns were not documented with the quarterly report in mind. The Day 90 failure started at Day 1 when the metrics framework was skipped. Define the metrics before the first campaign, document every campaign against the template, and the quarterly report assembles itself.

Threat Hunting Principle

A hunting program proves its value through measurable output: detection rules deployed, coverage improved, findings documented, and dwell time compressed. The 90-day plan builds the methodology, executes the first campaigns, and produces the quarterly report that justifies continuation. Define the metrics at Day 1, document every campaign, and the program sustains itself through demonstrated results.

Next

The Module Summary consolidates every concept from Sections 0.1 through 0.8. Module 1 teaches the Hunt Cycle methodology: the six-step framework that transforms a hypothesis into a documented finding and a deployed detection rule. You have the context. Now you learn the method.
