What a SOC actually does
Strip away the dashboards and the acronym soup, and a Security Operations Centre does exactly four things, in order: collect, detect, triage, respond. Every other process - threat hunting, incident management, KPI reporting, purple-team exercises - is a refinement of one of those four.
The mistake most teams make is investing in the middle of that chain (more detections, fancier dashboards) when the win is almost always at the edges: cleaner telemetry going in, and faster human response coming out.
SIEM vs XDR vs SOAR - the modern stack
You'll hear these three thrown around interchangeably. They're not interchangeable. Here's the cleanest way to keep them straight in your head:
| Tool | Job | Examples |
|---|---|---|
| SIEM | Centralise & query logs from anywhere; correlate; alert | Splunk, Sentinel, Elastic, QRadar, Chronicle |
| XDR | Vendor-tied detection across endpoint + identity + email + cloud | CrowdStrike Falcon, Defender XDR, SentinelOne |
| SOAR | Automate the response playbooks (enrich, contain, notify) | Cortex XSOAR, Tines, Splunk SOAR |
In practice the boundaries are blurring. Sentinel ships SOAR-style playbooks. CrowdStrike's Falcon LogScale eats SIEM territory. The right question for 2026 isn't "which one?" but "who owns the detection logic, and where does it actually run?" - because that's where your most expensive engineering effort will live for the next five years.
The data pipeline (it's mostly plumbing)
If a SIEM is a queue, the pipeline is the conveyor belt feeding it. A typical flow looks like this:
# 1) Source
endpoints, firewalls, dns, identity, cloud control plane, web proxies
        │
        ▼
# 2) Forwarder / shipper (the unsung hero)
sysmon, fluent-bit, winlogbeat, vector, splunk UF, agent-based collectors
        │
        ▼
# 3) Normalisation (the part nobody photographs)
parse → field-extract → tag (ECS / OCSF / CIM) → enrich (geoip, ASN, asset tag)
        │
        ▼
# 4) Storage (hot / warm / cold)
time-indexed buckets, retention tiers, frozen-to-cheap-blob archive
        │
        ▼
# 5) Detection & query
KQL / SPL / EQL / Lucene / Sigma → schedule rule → alert object
        │
        ▼
# 6) Case management / SOAR
auto-enrich → assign → contain → close / escalate
Three things matter more than the SIEM brand you pick:
- Logging coverage. You can't detect what you don't see. A great rule on missing data fires zero times.
- Schema discipline. Every parser pushes its own field names until somebody enforces a single schema (ECS, OCSF, or your own). Pay this tax early or you'll pay it bleeding from a hundred broken queries later.
- Time. All times in UTC. All timestamps with sub-second precision. Argue with anyone who suggests otherwise.
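To make the schema and time points concrete, here is a minimal Python sketch of a normalisation step. The `FIELD_MAP` mapping, the raw field names (`src`, `dst`, `user`, `proc`, `ts`) and the choice to park unmapped fields under `labels` are all assumptions for illustration, not any particular shipper's behaviour. The one thing it insists on is that a timestamp without an offset is rejected rather than guessed at.

```python
from datetime import datetime, timezone

# Hypothetical mapping from one vendor's field names to ECS-style names.
FIELD_MAP = {
    "src": "source.ip",
    "dst": "destination.ip",
    "user": "user.name",
    "proc": "process.name",
}


def normalise(raw: dict) -> dict:
    """Rename known fields and emit a UTC timestamp with millisecond precision."""
    event = {FIELD_MAP[k]: v for k, v in raw.items() if k in FIELD_MAP}

    # Keep unmapped fields under one namespaced key instead of dropping them.
    extras = {k: v for k, v in raw.items() if k not in FIELD_MAP and k != "ts"}
    if extras:
        event["labels"] = extras

    ts = datetime.fromisoformat(raw["ts"])
    if ts.tzinfo is None:
        # A naive timestamp is ambiguous; refuse it rather than assume a zone.
        raise ValueError("timestamp has no UTC offset; refusing to guess")
    event["@timestamp"] = ts.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return event
```

Rejecting naive timestamps at the pipeline is a design choice: it pushes the fix back to the source, which is cheaper than discovering a five-hour skew in the middle of an investigation.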
Writing detections that fire on adversaries, not on Tuesday
The hardest skill in detection engineering isn't writing rules - it's writing rules that don't fire when Sue from finance opens a macro. Here's the loop I use:
The detection design loop
- Pick a TTP, not a product. Anchor every rule to a MITRE ATT&CK technique. "T1003.001 - LSASS dumping" is a target. "Suspicious process activity" is not.
- Read three real samples. Find the technique in your own telemetry, in Atomic Red Team, and in a public IR report. You're looking for the parts that are structurally invariant across attackers - those are what your rule should match on.
- Write it minimally. Detect the smallest stable signal, not the whole noisy chain.
- Tune against 30 days of clean data. If it would have fired more than ~5 times in 30 days, it's a hunt query, not an alert.
- Document the false-positive disposition. "What does it look like when this is benign?" If you can't answer that, the rule isn't done.
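The tuning step in that loop is easy to automate. The sketch below assumes you can export a rule's back-test hits as `(timestamp, host)` pairs; that shape, the function name `summarise_backtest` and the constant names are hypothetical. It applies the "~5 fires in 30 days" cut-off and reports the noisiest hosts so you know where to look for a benign explanation.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

ALERT_BUDGET = 5               # "~5 fires in 30 days" from the loop above
WINDOW = timedelta(days=30)


def summarise_backtest(hits, now):
    """Classify a rule as 'alert' or 'hunt' from its back-test hits.

    hits: iterable of (timestamp, host) pairs, timestamps timezone-aware UTC.
    Returns (verdict, top three hosts by hit count).
    """
    recent = [(t, host) for t, host in hits if now - WINDOW <= t <= now]
    per_host = Counter(host for _, host in recent)
    verdict = "alert" if len(recent) <= ALERT_BUDGET else "hunt"
    return verdict, per_host.most_common(3)


now = datetime(2026, 4, 30, tzinfo=timezone.utc)
noisy = [(now - timedelta(days=d), "build-01") for d in range(1, 7)]
print(summarise_backtest(noisy, now))
```

If one host accounts for nearly all the hits, that is usually a filter waiting to be written and documented, not a reason to bin the rule.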
A worked example: detecting LSASS access
Say we want to catch credential theft via direct LSASS memory access. The naive version is "alert on anything touching lsass.exe", which produces a flood from antivirus, the EDR agent itself, and Windows Defender. The good version uses process-access telemetry (Sysmon Event ID 10, ProcessAccess, or your EDR's equivalent) and filters on the access mask:
// Sentinel / KQL - LSASS access with credential-dump rights
// Table, ActionType and column names vary by connector; check them against your own schema
DeviceEvents
| where ActionType == "ProcessAccess"
| where TargetProcessFileName =~ "lsass.exe"
// 0x1010 = PROCESS_QUERY_LIMITED_INFORMATION | PROCESS_VM_READ (the bits Mimikatz wants)
| where InitiatingProcessFileName !in~ ("MsMpEng.exe", "SenseCncProxy.exe", "taskhostw.exe")
| where AdditionalFields has "0x1010" or AdditionalFields has "0x1438" or AdditionalFields has "0x143a"
| project Timestamp, DeviceName, InitiatingProcessFileName, InitiatingProcessCommandLine, AdditionalFields
The same logic in Sigma form ports to Splunk, Elastic, Chronicle and most others - write Sigma when you can:
title: Suspicious LSASS Access (Credential Dump Indicators)
id: 5a2f9d61-...redacted...
status: experimental
description: Detects processes opening lsass.exe with PROCESS_VM_READ | QUERY rights.
references:
    - https://attack.mitre.org/techniques/T1003/001/
author: Arpitd.com
date: 2026/04/30
logsource:
    product: windows
    category: process_access
detection:
    selection:
        TargetImage|endswith: '\lsass.exe'
        GrantedAccess:
            - '0x1010'
            - '0x1438'
            - '0x143a'
    filter_security:
        SourceImage|endswith:
            - '\MsMpEng.exe'
            - '\SenseCncProxy.exe'
            - '\taskhostw.exe'
    condition: selection and not filter_security
falsepositives:
    - Endpoint backup/forensic agents that legitimately read LSASS
level: high
tags:
    - attack.credential_access
    - attack.t1003.001
This is the rule I'd ship. It's narrow enough to fire only on real suspicious access, the false-positive disposition is documented, and it's anchored to a single, well-known technique.
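If the hex masks in that rule look like magic numbers, a few lines of Python will decode them. The bit values below are the standard Windows process access rights; the `decode_access` helper itself is just an illustrative sketch. It shows, for instance, that 0x1010 is PROCESS_VM_READ plus PROCESS_QUERY_LIMITED_INFORMATION.

```python
# Windows process access rights (values as defined in the Windows SDK).
RIGHTS = {
    0x0001: "PROCESS_TERMINATE",
    0x0002: "PROCESS_CREATE_THREAD",
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0040: "PROCESS_DUP_HANDLE",
    0x0400: "PROCESS_QUERY_INFORMATION",
    0x1000: "PROCESS_QUERY_LIMITED_INFORMATION",
}


def decode_access(mask: str) -> list[str]:
    """Turn a GrantedAccess string such as '0x1010' into a list of right names."""
    value = int(mask, 16)
    names = [name for bit, name in sorted(RIGHTS.items()) if value & bit]
    # Bits outside the table are reported as raw hex rather than hidden.
    unknown = value & ~sum(RIGHTS)
    if unknown:
        names.append(hex(unknown))
    return names


for mask in ("0x1010", "0x1438", "0x143a"):
    print(mask, decode_access(mask))
```

Decoding the mask is also a quick way to sanity-check a new value before adding it to the rule: if it doesn't include PROCESS_VM_READ, it can't read LSASS memory and probably doesn't belong in the selection.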
Triage: the 3-question filter
An alert lands in your queue. You have ~30 seconds before the next one arrives. Here's the only triage flow you need to memorise - three questions, in order:
- Is this real? (Is the signal genuine, or is the rule misfiring?) If the answer is "rule misfire," your job isn't to investigate - it's to file a tuning ticket and move on.
- Is this scoped? (One host? One user? A blast radius?) Pivot on the smallest unique identifier - usually a process tree, an authentication chain, or a session ID. Map every other system it touched in the last 24 hours.
- Is this actionable? (Will containment hurt the business more than the threat will?) Don't isolate the CFO's laptop at 9am during board prep unless you can prove a confirmed compromise. Match response to confidence.
Force every alert through those three questions and you'll close 80% of them in under five minutes. The remaining 20% are where the actual SOC work happens.
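The three questions map neatly onto a small decision function. What follows is a sketch, not a playbook: the `Alert` fields, the outcome strings and the 0.8 confidence threshold are all hypothetical, and in real life the inputs come from an analyst's judgement rather than tidy booleans.

```python
from dataclasses import dataclass

# Hypothetical threshold: how sure we must be before disrupting a critical asset.
CONFIRMED = 0.8


@dataclass
class Alert:
    rule_misfire: bool       # Q1: is this real?
    hosts_touched: int       # Q2: is this scoped? (blast radius after pivoting)
    confidence: float        # analyst's 0..1 confidence that this is a compromise
    business_critical: bool  # Q3: would containment hurt the business?


def triage(alert: Alert) -> str:
    """Walk the three questions in order and return the next action."""
    if alert.rule_misfire:
        return "tuning-ticket"          # don't investigate, fix the rule
    if alert.business_critical and alert.confidence < CONFIRMED:
        return "monitor-and-collect"    # match response to confidence
    if alert.hosts_touched > 1:
        return "contain-and-escalate"   # blast radius beyond a single host
    return "contain"


print(triage(Alert(rule_misfire=False, hosts_touched=1, confidence=0.5, business_critical=True)))
```

The order of the checks matters: a misfire short-circuits everything else, and the business-impact check comes before containment so the CFO's laptop is never isolated on a hunch.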
Metrics that survive a board review
Three of these matter. The others are decoration:
- MTTD (Mean Time to Detect). Time from initial activity to first alert. Goal: minutes, not hours. Drives logging investment.
- MTTR (Mean Time to Respond). Time from alert to containment. Goal: minutes for high-severity, hours for medium. Drives playbook investment.
- Alert-to-incident ratio. Of every 100 alerts, how many turned into a real incident? Below 5% and your noise is too high. Above 30% and your detections are probably too narrow.
Vanity metrics to retire: "alerts ingested per day", "rules deployed", "TI feeds consumed". They go up forever and tell you nothing about whether you're catching attackers.
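All three of the metrics worth keeping fall out of a handful of timestamps per incident. The sketch below assumes each incident record carries `started`, `first_alert` and `contained` datetimes, and that each incident counts as one alert for the ratio; the field names and the `soc_metrics` function are illustrative, not a case-management API.

```python
from datetime import datetime
from statistics import mean


def soc_metrics(incidents: list[dict], total_alerts: int) -> dict:
    """Compute MTTD and MTTR in minutes, plus the alert-to-incident ratio in percent."""
    mttd = mean((i["first_alert"] - i["started"]).total_seconds() for i in incidents) / 60
    mttr = mean((i["contained"] - i["first_alert"]).total_seconds() for i in incidents) / 60
    ratio = 100 * len(incidents) / total_alerts

    # Thresholds from the list above: below 5% is noise, above 30% is too narrow.
    if ratio < 5:
        verdict = "too noisy"
    elif ratio > 30:
        verdict = "probably too narrow"
    else:
        verdict = "healthy"
    return {"mttd_min": mttd, "mttr_min": mttr, "ratio_pct": ratio, "verdict": verdict}


incidents = [
    {"started": datetime(2026, 4, 1, 10, 0), "first_alert": datetime(2026, 4, 1, 10, 4),
     "contained": datetime(2026, 4, 1, 10, 24)},
    {"started": datetime(2026, 4, 2, 11, 0), "first_alert": datetime(2026, 4, 2, 11, 10),
     "contained": datetime(2026, 4, 2, 11, 20)},
]
print(soc_metrics(incidents, total_alerts=100))
```

A mean hides outliers, so it is worth reporting the worst case alongside it; one incident that took three days to detect matters more to the board than a tidy average.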
A 30-day starter pack for new analysts
If you're brand-new to SOC work, here's the on-ramp I'd send you on. Each item is a day or two of hands-on time, not theory:
- Stand up a free Splunk or Elastic lab on a single VM (the docker-compose lab, not production). Microsoft Sentinel is cloud-only, so if you want it in the mix, use a trial workspace alongside the VM.
- Ingest your own laptop's Sysmon (Windows) or auditd (Linux) events for a week. Hate-stare at the noise.
- Run Atomic Red Team tests T1003 (creds), T1059 (script execution), T1112 (registry mods). Try to see them in your stack.
- Convert three Atomic findings into Sigma rules. Translate them into KQL/SPL/EQL.
- Read three public IR reports - Mandiant M-Trends, the latest Verizon DBIR, a CISA advisory. Trace the kill chain on paper.
- Go through LOLBAS and LOTS and pick five binaries (or, from LOTS, trusted sites) you've never heard of. Search your telemetry for them.
- Volunteer to take a real ticket in your team's queue.
Skip the certification grind for the first month. The certs help; the muscle memory of doing the work helps more.
Closing thought
SIEM is plumbing. SOC is decision-making. Both are easy to over-engineer and almost impossibly hard to get right at scale, because the adversary is patient and well-funded and the analyst is human and tired. The teams I've seen win do three things relentlessly: kill noisy alerts, close the logging gaps nobody wants to fix, and treat every postmortem as a chance to delete a rule rather than add one.
If you take one thing from this post, take that.