How is a cloud SOC different from a traditional SOC?

Traditional SOC is mostly packet-driven - IDS sensors on network choke points, firewall logs, endpoint EDR. Cloud SOC is mostly log-driven - CloudTrail, Activity Log, Cloud Audit Logs, application logs, identity logs - because the network is largely abstracted away. You can't span-port a managed service. The detections, the tooling, the alert taxonomy, and the response runbooks all shift accordingly. The SOC role still exists; the daily work looks meaningfully different.

Do I need a SIEM if I have GuardDuty / Defender / SCC?

Maybe. Native detection services (GuardDuty on AWS, Defender for Cloud on Azure, Security Command Center on GCP) are excellent at the threats they're tuned for and require near-zero work to enable. But they only see what each cloud sees - and most orgs are multi-cloud, have SaaS, have on-prem, and need to correlate across all of those. A SIEM is the cross-cloud, cross-source aggregation and correlation layer. Small orgs can ride on native detection alone for a while; somewhere around the second cloud or the first compliance audit, a SIEM becomes necessary.

What logs do I actually need to keep?

At minimum: the cloud control-plane audit log (CloudTrail, Activity Log, Cloud Audit Logs) for every account, in every region; identity-provider logs (Entra sign-ins, Okta system log, Cognito); VPC Flow Logs at least at the perimeter VPCs; DNS query logs for workloads; and the management-event logs for every storage and database service. Retain at least 90 days hot in the SIEM, 1 year archived to cheap storage. That covers the floor for any breach investigation.

Is the SOC role going away because of AI?

Tier-1 alert triage is being automated faster than other SOC work - LLM-assisted enrichment, autonomous response on low-risk alerts, summarization of investigation context. What isn't going away: detection engineering (writing the rules), incident response (decisions under uncertainty), threat hunting, post-incident analysis, and the senior IR judgment that decides 'this is real' vs 'this is noise.' The job shape is shifting toward the higher-leverage work, not disappearing.

Cloud SOC & Threat Monitoring

Last updated 2026-05-15 · By Shawn Nunley · Vendor-neutral

The 30-second version: A cloud SOC is the team and tooling responsible for detecting, investigating, and responding to security events across your cloud accounts. Where a traditional SOC watches packets and endpoints, a cloud SOC watches logs - control-plane audit trails (CloudTrail / Activity Log / Cloud Audit Logs), identity-provider logs, VPC flow, DNS queries, and application telemetry - because cloud abstracts the network.

The stack: native cloud detectors (GuardDuty, Defender for Cloud, Security Command Center) for fast wins, a SIEM for cross-source correlation and retention, detection engineering as the practice that writes the rules, SOAR / autonomous response for the repetitive work, and threat intel wired in to enrich alerts. The work is more like data engineering than like staring at an IDS console - and the mental shift is the hardest part of the transition.

What cloud threat monitoring is
Cloud SOC vs traditional SOC
The cloud-native detection model
Log sources you actually need
Native cloud detection
SIEM and the cloud
Detection engineering as a practice
Detection categories that matter
Threat intel in cloud
Incident response specifics
SOC team structure & roles
AWS, Azure, and GCP side-by-side
Maturity stages
Common pitfalls
Further reading
FAQ

What cloud threat monitoring is

Threat monitoring is the continuous collection of security-relevant signals from your environment, the rules and models that find suspicious activity in those signals, the alerts that surface what's worth a human's attention, and the analyst workflow that decides what to do about each one. Done well, it's the difference between "we got breached and learned about it from the FBI" and "we got an alert at 02:14, contained it by 02:31, and the customer never noticed."

Cloud threat monitoring is the same discipline applied to cloud environments - where the threat surface is API-shaped, the telemetry is log-shaped, and the dominant attack patterns chain identity compromise into resource manipulation rather than dropping a beacon on a Windows host.

The SOC (Security Operations Center) is the team that runs it. The name comes from the dedicated rooms full of screens of the early 2000s; the practice today is mostly distributed, mostly remote, mostly working in Slack and Jira with one or two big dashboards on the wall.

Cloud SOC vs traditional SOC

The mental-model shift coming from traditional SOC into cloud is the biggest hurdle for most analysts. It's also the one that takes the longest to internalize, because the muscle memory of network-centric monitoring is deep.

Dimension	Traditional SOC	Cloud SOC
Primary signal	Packets, endpoint EDR telemetry, firewall logs	Control-plane audit logs, identity events, app telemetry
Perimeter shape	Network - choke points, DMZ, firewall rules	Identity - every API call has an actor and a permission check
Sensor placement	Span ports, taps, IDS sensors on the network	Logs come for free from the provider; you only choose what to keep
Asset inventory	Hard problem - devices walk in and out	API-queryable, but counted in tens of thousands of ephemeral resources
Forensics target	Disk image of a host, memory capture, packet pcap	Log timeline reconstructed from CloudTrail + identity logs + app logs
Contain action	Isolate host on the network	Revoke session, rotate credential, detach IAM policy, quarantine resource
Dominant attack pattern	Phish → endpoint → lateral → exfil	Phish or token theft → IAM → resource manipulation → exfil
Required skillset	Packet analysis, EDR queries, malware reversing	Query languages (KQL, SPL, ESQL), cloud API knowledge, IAM understanding

The skills overlap is real - incident response judgment, alert triage discipline, the ability to read a log line and understand what an attacker is doing - but the specific knowledge is different enough that "I ran a SOC for 10 years" doesn't translate to "I can run a cloud SOC" without 6-12 months of cloud-specific learning. The reverse is also true.

The cloud-native detection model

Almost every credible cloud detection eventually reduces to the same shape: this identity did this action on this resource from this context at this time, and that combination is suspicious.

The fields that matter:

Identity - user, role, service account, federated subject, automation key.
Action - the API call, with parameters. iam:CreateAccessKey, s3:GetObject, compute.instances.delete.
Resource - the target. ARN, resource ID, bucket name, role name.
Context - source IP, user agent, region, time, MFA used, source account, session origin.
Result - succeeded, denied, rate-limited.
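
As a rough illustration of the five fields above, the sketch below flattens one CloudTrail-style record into that shape. The input keys (userIdentity, eventSource, eventName, sourceIPAddress, errorCode, and so on) follow the CloudTrail record layout; the NormalizedEvent class, the normalize_cloudtrail function, and the sample record are hypothetical names and values chosen for this example, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NormalizedEvent:
    identity: str            # who acted (ARN or principal ID)
    action: str              # "service:Operation", e.g. "iam:CreateAccessKey"
    resource: Optional[str]  # first target ARN, if the record lists one
    context: dict            # source IP, user agent, region, time, MFA flag
    result: str              # "succeeded" or the provider's error code


def normalize_cloudtrail(record: dict) -> NormalizedEvent:
    """Flatten one CloudTrail-style record into the five detection fields."""
    user = record.get("userIdentity", {})
    # "iam.amazonaws.com" -> "iam"
    service = record.get("eventSource", "").split(".")[0]
    resources = record.get("resources") or []
    session_attrs = user.get("sessionContext", {}).get("attributes", {})
    return NormalizedEvent(
        identity=user.get("arn") or user.get("principalId", "unknown"),
        action=f"{service}:{record.get('eventName', 'Unknown')}",
        resource=resources[0].get("ARN") if resources else None,
        context={
            "source_ip": record.get("sourceIPAddress"),
            "user_agent": record.get("userAgent"),
            "region": record.get("awsRegion"),
            "time": record.get("eventTime"),
            # CloudTrail carries this flag as the string "true" / "false"
            "mfa": session_attrs.get("mfaAuthenticated") == "true",
        },
        # Successful calls carry no errorCode
        result=record.get("errorCode", "succeeded"),
    )


# Hypothetical sample record (account ID and IP are documentation values)
sample = {
    "eventSource": "iam.amazonaws.com",
    "eventName": "CreateAccessKey",
    "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::111122223333:user/alice"},
    "sourceIPAddress": "203.0.113.10",
    "userAgent": "aws-cli/2.15",
    "awsRegion": "us-east-1",
    "eventTime": "2026-01-01T02:14:00Z",
}
event = normalize_cloudtrail(sample)
```

Azure Activity Log and GCP Cloud Audit Logs would need their own small adapters into the same five fields; once every source lands in one shape, detections can be written once.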

Detections then ask combination questions:

Did a previously unseen identity do a sensitive action?
Did an identity do an action from an impossible location relative to its recent baseline?
Did a sensitive action happen without MFA?
Did a service account do a human-shaped action (console login, IAM change)?
Did an identity do a high-volume sequence that looks like enumeration or exfiltration?
Did a just-created role get used immediately?
Was the cloud's logging disabled?
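
A few of these combination questions can be sketched as plain predicates over a normalized event. This is a minimal illustration that assumes each event is a dict with identity, action, result, and context keys; the action sets, the known_identities baseline, and the function names are assumptions for the example, not a complete rule set.

```python
# Hypothetical lists; a real deployment would maintain these per environment.
SENSITIVE_ACTIONS = {"iam:CreateAccessKey", "iam:AttachUserPolicy", "iam:PutUserPolicy"}
LOGGING_TAMPER_ACTIONS = {"cloudtrail:StopLogging", "cloudtrail:DeleteTrail", "cloudtrail:UpdateTrail"}


def sensitive_without_mfa(event: dict) -> bool:
    """A sensitive action that succeeded in a session with no MFA."""
    return (
        event["action"] in SENSITIVE_ACTIONS
        and event["result"] == "succeeded"
        and not event["context"].get("mfa", False)
    )


def unseen_identity_sensitive(event: dict, known_identities: set) -> bool:
    """A sensitive action by an identity absent from the recent baseline."""
    return event["action"] in SENSITIVE_ACTIONS and event["identity"] not in known_identities


def logging_tampered(event: dict) -> bool:
    """Any attempt, successful or denied, to stop or alter audit logging."""
    return event["action"] in LOGGING_TAMPER_ACTIONS


def evaluate(event: dict, known_identities: set) -> list:
    """Return the names of every rule the event trips."""
    fired = []
    if sensitive_without_mfa(event):
        fired.append("sensitive_without_mfa")
    if unseen_identity_sensitive(event, known_identities):
        fired.append("unseen_identity_sensitive")
    if logging_tampered(event):
        fired.append("logging_tampered")
    return fired


baseline = {"arn:aws:iam::111122223333:user/alice"}
suspicious = {
    "identity": "arn:aws:iam::111122223333:user/mallory",
    "action": "iam:CreateAccessKey",
    "result": "succeeded",
    "context": {"mfa": False, "source_ip": "203.0.113.10"},
}
hits = evaluate(suspicious, baseline)
```

In a SIEM the same logic would be a query rather than Python; the point is that each rule is a small conjunction over the same handful of fields.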

The vocabulary maps cleanly onto MITRE ATT&CK Cloud - every detection should map to one or more techniques, both for shared language across the team and for coverage analysis ("we have no detections in the Persistence column").

Log sources you actually need

The full list of logs each cloud can emit is enormous. The list you actually need on day one is smaller. Working priority order:

1. Control-plane audit

CloudTrail (AWS), Activity Log (Azure), Cloud Audit Logs (GCP). Every API call, every account, every region. The single most important log. Org-wide, multi-region, no exceptions.

2. Identity-provider logs

Entra ID sign-in & audit, Okta system log, Cognito events, Cloud Identity. Captures auth events before they reach cloud APIs - phishing, MFA bypass, suspicious sessions live here.

3. Network telemetry

VPC Flow Logs, NSG flow logs, GCP VPC Flow. Less interesting than CloudTrail for most cloud-native attacks, but essential when a workload starts talking to known-bad IPs.

4. DNS query logs

Route 53 Resolver logs, Azure DNS, Cloud DNS. C2 beacons, DGA traffic, data exfiltration via DNS - all visible here and invisible in flow logs.

5. Data-plane logs

S3 access logs / data events, Azure Storage diagnostics, GCS data access logs. Required to investigate "what data was actually read?" - control-plane logs alone cannot tell you which objects were read.

6. Workload & app logs

Lambda / Functions logs, container logs, app stdout / stderr. The application-layer evidence trail; required for app-vulnerability investigations.

7. Configuration history

AWS Config, Azure Resource Graph history, Cloud Asset Inventory. Answers "what did this resource look like before the attacker changed it?" without re-running detections.

8. SaaS & productivity

Workspace / M365 audit logs, GitHub audit log, Slack, etc. Auth events outside the cloud's own log feed; often the first signal of a compromised user.

Retention

Practical floor: 90 days hot in the SIEM (you query directly), 365+ days cold in cheap object storage (you re-hydrate for investigations). CIS / compliance frameworks often require longer cold retention - set policies once at the bucket / storage layer rather than per-source.
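
One way to express "set policies once at the storage layer" on AWS is a single lifecycle rule on the archive bucket. The sketch below only builds the rule as a dict in the shape S3 lifecycle configurations use; the function name, the rule ID, the GLACIER storage class choice, and the 90 / 365-day defaults are assumptions for illustration, and nothing here calls AWS.

```python
def archive_lifecycle(cold_after_days: int = 90, expire_after_days: int = 365) -> dict:
    """Build an S3-style lifecycle configuration for a log archive bucket.

    Objects move to a colder storage class after `cold_after_days` and are
    deleted after `expire_after_days` (both counted from object creation).
    """
    if expire_after_days <= cold_after_days:
        raise ValueError("expiration must come after the cold transition")
    return {
        "Rules": [
            {
                "ID": "security-log-retention",   # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},         # empty prefix = whole bucket
                "Transitions": [
                    {"Days": cold_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


config = archive_lifecycle()
```

A dict like this could be handed to the S3 lifecycle API (for example boto3's put_bucket_lifecycle_configuration); raise expire_after_days when a compliance framework demands longer retention.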

Native cloud detection

Each cloud ships its own threat-detection service. They share more than they differ: all are managed, ML-driven, fed by the cloud's own internal telemetry, and deliver findings to a central console with minimal setup. The pragmatic answer is: turn them on, all of them, on day one. They catch a meaningful percentage of real attacks for very little operational cost.

AWS GuardDuty

Continuous threat detection across CloudTrail, VPC Flow Logs, DNS, S3, EKS audit, Lambda, Runtime Monitoring, and Malware Protection. The UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration finding catches the exact pattern that produced the Capital One breach (see the kill chain). Org-wide via delegated admin to your audit account.

Microsoft Defender for Cloud

CSPM + workload protection, with plans for servers, app services, databases, storage, key vault, DNS, containers, APIs. Recommendations + alerts; integrates with Sentinel. The "everything" plan is expensive at scale; most orgs enable the workload-protection plans on prod subs only.

Google Security Command Center

SCC Premium / Enterprise includes Event Threat Detection (analysis of Cloud Audit Logs, GCP's CloudTrail equivalent), Container Threat Detection, Virtual Machine Threat Detection, and a CNAPP-style posture layer. Standard tier is free and covers basic findings; Premium / Enterprise are paid.

What native detection is good for

Out-of-the-box coverage of the most common cloud-attack patterns.
Catching the things the SIEM ingest may be delayed on - these run inside the cloud and see events in near real time.
Cheap to enable. Setup is hours, not weeks.

What native detection is bad at

Cross-cloud correlation. Defender doesn't see CloudTrail; GuardDuty doesn't see Defender alerts.
Custom detections specific to your environment.
Long retention (the findings UI generally caps at 90 days; raw signals need to flow to your SIEM for longer history).
Aligning to your team's alert taxonomy and workflow.

The right pattern: native detectors enabled everywhere as first-line; SIEM as the aggregation, correlation, and customization layer on top.
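
Part of that aggregation layer is folding each detector's severity scale into one taxonomy. The sketch below assumes GuardDuty reports a numeric score while Defender for Cloud and SCC report text labels; the numeric bands and the normalize_severity helper are assumptions for this example and should be checked against each provider's current documentation.

```python
def guardduty_band(score: float) -> str:
    """Map a numeric GuardDuty-style severity to a label (assumed bands)."""
    if score >= 9.0:
        return "critical"
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    return "low"


# Text labels as emitted by label-based detectors, lowercased.
_LABELS = {
    "informational": "info",
    "low": "low",
    "medium": "medium",
    "high": "high",
    "critical": "critical",
}


def normalize_severity(source: str, raw) -> str:
    """Return one of info/low/medium/high/critical/unknown for any source."""
    if source == "guardduty":
        return guardduty_band(float(raw))
    if source in {"defender", "scc"}:
        return _LABELS.get(str(raw).lower(), "unknown")
    raise ValueError(f"unsupported source: {source}")


examples = [
    normalize_severity("guardduty", 8),
    normalize_severity("defender", "Informational"),
    normalize_severity("scc", "CRITICAL"),
]
```

With severities normalized, routing rules ("page on high and above") can be written once instead of per cloud.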

SIEM and the cloud

SIEM = Security Information & Event Management. The platform where logs land, get parsed, get queried, get correlated, and produce alerts. Once you have more than one cloud (or one cloud and SaaS, or one cloud and on-prem), the SIEM becomes the only place an analyst can ask cross-source questions.

The current landscape (vendor-neutral)

Splunk Enterprise Security - the heavyweight. Mature, expensive, SPL is its own language with a real learning curve. Strong in enterprise environments that have run Splunk for a decade.
Microsoft Sentinel - cloud-native SIEM on Log Analytics. KQL is the query language. Strong in Microsoft-heavy environments; Defender XDR + Sentinel is a credible end-to-end stack.
Elastic Security - Elastic stack + SIEM + EDR. Strong open-source roots; ESQL / EQL query languages. Cost-effective at scale relative to Splunk.
Google Security Operations (Chronicle) - flat-fee SIEM, YARA-L for detection content, deep Google-scale infrastructure underneath. Strong in cloud-native shops.
CrowdStrike Falcon Next-Gen SIEM (formerly Humio / LogScale) - fast ingest, low-overhead retention, integrated with the EDR side.
Datadog Cloud SIEM - built on the observability platform, attractive for orgs that already run Datadog.
Panther, Sumo Logic, IBM QRadar, Exabeam, Hunters, Devo - credible mid-market or specialty options.

How to pick

What's your existing observability stack? If you run Datadog, Datadog Cloud SIEM has zero ingest plumbing to build. If you're Microsoft-heavy, Sentinel.
How much volume do you ingest? Splunk's per-GB pricing punishes high-volume sources; Chronicle's flat-fee pricing inverts the math.
Who writes detection content? If it's a small team, the vendor's content library (Splunk ESCU, Sentinel content hub, Sigma-compatible) matters more than the query language flexibility.
What's the team's skill profile? KQL, SPL, ESQL, and YARA-L each carry real ramp time. The one your team already knows wins ties.

What's not a great differentiator anymore: "ingests CloudTrail." They all do.

Detection engineering as a practice

Detection engineering is the discipline of writing, testing, deploying, and maintaining detection rules - treated as code, with the same review, versioning, and CI/CD as application code. Cloud SOC depends on it more than traditional SOC did, because so many cloud-specific detections are inherently custom (your IAM model, your service accounts, your business hours).
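
A minimal sketch of what "detection as code" can look like: the rule, its ATT&CK mapping, and its test fixtures live in one reviewable unit that CI can run. The Rule class, the rule ID, and the AUTOMATION_IDENTITIES allowlist are hypothetical; real teams usually express the predicate in Sigma or their SIEM's query language rather than Python.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    rule_id: str
    attack_technique: str                 # MITRE ATT&CK technique ID
    predicate: Callable[[dict], bool]     # the detection logic itself
    should_fire: List[dict] = field(default_factory=list)      # positive fixtures
    should_not_fire: List[dict] = field(default_factory=list)  # negative fixtures

    def self_test(self) -> List[str]:
        """Run the fixtures; return a list of failure messages (empty = pass)."""
        failures = []
        for event in self.should_fire:
            if not self.predicate(event):
                failures.append(f"{self.rule_id}: missed {event}")
        for event in self.should_not_fire:
            if self.predicate(event):
                failures.append(f"{self.rule_id}: false positive on {event}")
        return failures


# Hypothetical allowlist of automation that legitimately mints keys.
AUTOMATION_IDENTITIES = {"arn:aws:iam::111122223333:role/key-rotation"}

new_access_key = Rule(
    rule_id="persistence-new-access-key",
    attack_technique="T1098.001",  # Account Manipulation: Additional Cloud Credentials
    predicate=lambda e: e["action"] == "iam:CreateAccessKey"
    and e["identity"] not in AUTOMATION_IDENTITIES,
    should_fire=[
        {"action": "iam:CreateAccessKey", "identity": "arn:aws:iam::111122223333:user/alice"},
    ],
    should_not_fire=[
        {"action": "iam:CreateAccessKey", "identity": "arn:aws:iam::111122223333:role/key-rotation"},
        {"action": "s3:GetObject", "identity": "arn:aws:iam::111122223333:user/alice"},
    ],
)

failures = new_access_key.self_test()
```

A CI job would fail the merge whenever self_test returns anything, which is how a tuning change that silently breaks a rule gets caught before production.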

The detection-engineering loop

Threat model the technique you want to catch. Where in MITRE ATT&CK does it sit? What's the prerequisite for an attacker to do it?
Find or generate evidence in your logs that the technique would leave. Atomic Red Team, Stratus Red Team, and similar tools simulate cloud attacks specifically so you can see what they look like in CloudTrail.
Write the detection - a query in your SIEM's language or a portable rule (Sigma).
Tune for noise. The first iteration always over-fires. Whitelist legitimate sources, add context conditions, adjust thresholds.
Test continuously. Re-run the atomic emulation periodically - when an upstream log schema changes, your rule may silently break.
Measure efficacy. True positive rate, false positive rate, time-to-alert. Detection content that fires once a quarter and is always a true positive is good; detection content that fires 50 times a day and is always benign is worse than no detection.
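
The "measure efficacy" step can start as simple arithmetic over analyst dispositions. The sketch below assumes each closed alert is recorded as a (rule_id, verdict) pair with verdict "tp" or "fp"; the function name and the rule IDs are made up for illustration.

```python
from collections import Counter


def rule_efficacy(dispositions, days_observed: int) -> dict:
    """Per rule: alerts per day and precision (share of alerts that were real)."""
    fired = Counter()
    true_pos = Counter()
    for rule_id, verdict in dispositions:
        fired[rule_id] += 1
        if verdict == "tp":
            true_pos[rule_id] += 1
    return {
        rule_id: {
            "alerts_per_day": fired[rule_id] / days_observed,
            "precision": true_pos[rule_id] / fired[rule_id],
        }
        for rule_id in fired
    }


# 90 days of hypothetical history: one quiet, accurate rule and one noisy one.
history = [("cloudtrail-stopped", "tp")] + [("s3-list-burst", "fp")] * 4500
stats = rule_efficacy(history, days_observed=90)
```

Here the quiet rule scores precision 1.0 at roughly one alert per quarter, while the noisy rule fires 50 times a day at precision 0.0, which is the "worse than no detection" case described above.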

Rule sources

Sigma - vendor-neutral detection format; convert to your SIEM's language. Large open library.
Elastic detection-rules - open, MITRE-mapped, well-maintained.
Microsoft Sentinel content - Microsoft-published KQL detections.
Splunk ESCU - Splunk's open research catalog.
Community contributions from cloud security researchers - Datadog, Permiso, Cyentia, Wiz, Cyera, and others publish detection ideas regularly.

Treat all of these as starting points. Every detection needs tuning to your environment before it earns a place in production.

Detection categories that matter

The classes of detection every cloud SOC should aim to cover, mapped roughly to MITRE ATT&CK Cloud:

Initial Access - impossible-travel sign-ins, sign-ins from anonymizing infrastructure (Tor, known VPN-as-a-service), brute-force / password-spray patterns, MFA fatigue prompts, OAuth consent grants to suspicious apps.
Credential Access - GetSessionToken / AssumeRole from unusual contexts, IMDS credential use from outside the EC2 instance (this is the Capital One detection - see the kill chain), ConsoleLogin by a service account, password resets en masse.
Privilege Escalation - IAM policy changes adding admin permissions, role-trust-policy changes opening cross-account trust, AttachUserPolicy bringing in AdministratorAccess, group-membership additions to privileged groups, iam:PassRole to a high-privilege role.
Persistence - new access keys created, new federated identity providers added, new IAM users in admin accounts, new SSH keys on instances, modified Lambda function execution roles, scheduled jobs created in unusual accounts.
Defense Evasion - CloudTrail disabled or modified, log destinations changed, GuardDuty / Defender / SCC disabled, deletion of S3 access logs, KMS key policy changes that block your audit account.
Discovery - high-volume List* / Describe* API call patterns, bucket enumeration via HeadBucket, IAM enumeration, secret listing.
Lateral Movement - cross-account AssumeRole, new service-to-service connections, new pod-to-pod connections in K8s, new VPC peering or transit-gateway attachments.
Collection & Exfiltration - large-volume GetObject / CopyObject, bucket-replication to external accounts, snapshot sharing with unknown accounts, database export jobs to S3, egress spike from a workload.
Impact - mass-resource deletion, encryption-with-attacker-key (ransomware), public bucket creation, security-group modification opening 0.0.0.0/0, Route 53 hijacks.
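
As one concrete instance from the list above, the Discovery pattern (high-volume List* / Describe* calls) can be sketched as a sliding-window count per identity. The five-minute window, the threshold of 50, and the function names are assumptions to tune per environment; events are assumed to arrive sorted by time as (timestamp, identity, action) tuples.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def is_discovery_call(action: str) -> bool:
    """True for read-only enumeration verbs such as s3:ListBuckets."""
    operation = action.split(":", 1)[-1]
    return operation.startswith(("List", "Describe"))


def enumeration_bursts(events, window=timedelta(minutes=5), threshold=50) -> set:
    """Return identities that made >= threshold discovery calls within window."""
    recent = defaultdict(deque)   # identity -> timestamps still inside the window
    flagged = set()
    for ts, identity, action in events:
        if not is_discovery_call(action):
            continue
        calls = recent[identity]
        calls.append(ts)
        # Drop timestamps that have aged out of the window
        while calls and ts - calls[0] > window:
            calls.popleft()
        if len(calls) >= threshold:
            flagged.add(identity)
    return flagged


start = datetime(2026, 1, 1, 2, 14)
stream = [(start + timedelta(seconds=i), "attacker", "ec2:DescribeInstances") for i in range(60)]
stream += [(start + timedelta(minutes=10 * i), "dev", "s3:ListBuckets") for i in range(3)]
stream.sort(key=lambda e: e[0])
flagged = enumeration_bursts(stream)
```

Legitimate inventory tools enumerate heavily too, so a production version would usually exclude known scanner identities before counting.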

For each category, the SOC should have at least one detection live, at least one quarterly emulation that exercises it, and at least one documented runbook. The first time a SOC enumerates "we have zero coverage in Defense Evasion" is usually the day after they needed it.
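
The "one detection, one emulation, one runbook per category" check is easy to automate once the detection inventory is data. The sketch below assumes an inventory of dicts with category, live, emulated, and runbook keys; that schema and the sample entries are hypothetical.

```python
CATEGORIES = [
    "Initial Access", "Credential Access", "Privilege Escalation",
    "Persistence", "Defense Evasion", "Discovery",
    "Lateral Movement", "Collection & Exfiltration", "Impact",
]


def coverage_gaps(inventory) -> dict:
    """Map each category to the requirements it still lacks.

    A category is fully covered when at least one detection is live,
    at least one is exercised by emulation, and at least one has a runbook.
    """
    report = {}
    for category in CATEGORIES:
        entries = [d for d in inventory if d["category"] == category]
        missing = [
            need for need in ("live", "emulated", "runbook")
            if not any(d.get(need) for d in entries)
        ]
        if missing:
            report[category] = missing
    return report


inventory = [
    {"name": "new-access-key", "category": "Persistence",
     "live": True, "emulated": True, "runbook": True},
    {"name": "cloudtrail-stopped", "category": "Defense Evasion",
     "live": True, "emulated": False, "runbook": False},
]
gaps = coverage_gaps(inventory)
```

Running this in CI against the detection repository turns "we have zero coverage in Defense Evasion" into a failing check rather than a post-incident discovery.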

Threat intel in cloud

Cloud-specific threat intel is younger than network threat intel and the data quality varies. The actionable pieces:

Indicators - known-bad IPs (often anonymizers, residential proxies, Tor exits), domains (C2 infrastructure, exfil destinations), CIDRs (cloud providers known to be abused).
TTPs - attacker techniques specific to cloud. Permiso, Datadog Security Labs, Wiz Research, Mandiant, CrowdStrike all publish good cloud-specific TTPs regularly. MITRE ATT&CK Cloud is the taxonomy.
Attack tooling fingerprints - User-Agent strings from cloud-attack tools (Pacu, ScoutSuite, cloudtail, gcptokenrip), CloudTrail signatures of common attacker workflows.
Industry sharing - FS-ISAC (financial), H-ISAC (health), the cloud-providers' own threat reports, AWS Security Hub findings, Microsoft DART / MSTIC public threat content.

Where to wire it in

Enrich alerts at the SIEM. Every alert that includes an IP should auto-enrich with reputation, ASN, geo, cloud-provider mapping, and known-attacker-tool fingerprints. Native to most modern SIEMs.
Auto-block at the cloud-provider edge on the highest-confidence indicators. Network Firewall, WAF, security groups can ingest threat-intel feeds.
Inform detection content. When a new cloud-attack TTP is published, the detection-engineering team's question is "do we cover this?" and the answer should be in days, not quarters.
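
The alert-enrichment step can be sketched with nothing but the standard library: match the alert's source IP against a set of indicator CIDRs. The indicator table, its labels, and the alert shape below are hypothetical (the addresses are documentation ranges); a real pipeline would pull indicators from a threat-intel feed and add ASN and geo lookups.

```python
import ipaddress


def enrich_ip(alert: dict, indicators: dict) -> dict:
    """Return a copy of the alert annotated with matching intel labels.

    `indicators` maps a CIDR string to a human-readable label.
    """
    ip = ipaddress.ip_address(alert["source_ip"])
    matches = [
        label
        for cidr, label in indicators.items()
        if ip in ipaddress.ip_network(cidr)
    ]
    return {**alert, "intel_matches": matches, "intel_hit": bool(matches)}


# Hypothetical feed contents
INDICATORS = {
    "203.0.113.0/24": "known anonymizer range",
    "198.51.100.7/32": "C2 server from vendor report",
}

alert = {"rule": "sensitive_without_mfa", "source_ip": "203.0.113.10"}
enriched = enrich_ip(alert, INDICATORS)
```

The original alert is left untouched and the enriched copy carries the context, so the analyst sees "known anonymizer range" next to the IP instead of looking it up by hand.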

Incident response specifics

Cloud IR differs from on-prem IR in two practical ways: the evidence is log-based (no disk image), and containment actions are API calls.
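
To make "containment actions are API calls" concrete, here is a sketch of one common AWS containment step: a deny-all policy that applies only to sessions issued before a cutoff, using the aws:TokenIssueTime condition key. The function name is hypothetical and the sketch only builds the policy JSON; attaching it to the compromised role (for example as an inline role policy) is a separate API call not shown here.

```python
import json
from datetime import datetime, timezone


def revoke_older_sessions_policy(cutoff: datetime) -> str:
    """Build an IAM policy denying everything to sessions issued before cutoff."""
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                # Only tokens minted before the cutoff are blocked, so
                # workloads that re-assume the role afterwards keep working.
                "Condition": {"DateLessThan": {"aws:TokenIssueTime": stamp}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


document = revoke_older_sessions_policy(datetime(2026, 1, 1, 2, 31, tzinfo=timezone.utc))
```

The design choice worth noting is that this invalidates stolen temporary credentials without deleting the role, which keeps the blast radius of the containment itself small.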

The cloud-IR playbook shape

Triage the alert. Identity, action, resource, context. Decide: noise, suspicious-not-yet-confirmed, confirmed-compromise.
Pivot from one event to the full session. What else did this identity do? Same source IP? Same session token? CloudTrail with the identity's principal-ARN as a filter, plus the time window, plus the identity-provider session ID.
Determine blast radius. What resources did the compromised identity have access to? What of those were actually accessed? IAM Access Analyzer / Permissions Analyzer + CloudTrail data events.
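Contain. Revoke session tokens (AWS: aws iam delete-access-key for long-lived keys, plus a deny policy conditioned on aws:TokenIssueTime to invalidate existing role sessions, which is what the console's "Revoke active sessions" action attaches; Azure: revoke the user's sign-in sessions in Entra ID; GCP: gcloud iam service-accounts disable). Detach IAM policies. Quarantine compromised workloads (security-group-of-one). Disable user accounts.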
Eradicate. Remove persistence - newly created keys, new IAM users, new SSH keys, modified Lambda functions, new federated IdPs.
Recover. Restore from known-clean snapshots if data integrity is in question. Rotate the secrets the attacker may have seen.
Post-incident. Timeline, root cause, what detection missed it, what would have detected it earlier, what's the runbook change.
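
The pivot step in the playbook above ("what else did this identity do?") reduces to a filter and a sort once events are normalized. The sketch assumes events are dicts with identity, time (a datetime), action, and source_ip keys; the six-hour window and the function name are assumptions, and in practice this would be a SIEM query rather than Python.

```python
from datetime import datetime, timedelta


def session_timeline(events, principal_arn: str, pivot_time: datetime,
                     window: timedelta = timedelta(hours=6)) -> list:
    """Everything one principal did within +/- window of the alerting event."""
    start, end = pivot_time - window, pivot_time + window
    hits = [
        e for e in events
        if e["identity"] == principal_arn and start <= e["time"] <= end
    ]
    return sorted(hits, key=lambda e: e["time"])


ALICE = "arn:aws:iam::111122223333:user/alice"
alert_time = datetime(2026, 1, 1, 2, 14)
events = [
    {"identity": ALICE, "time": alert_time + timedelta(minutes=5),
     "action": "s3:GetObject", "source_ip": "203.0.113.10"},
    {"identity": ALICE, "time": alert_time,
     "action": "iam:CreateAccessKey", "source_ip": "203.0.113.10"},
    {"identity": ALICE, "time": alert_time - timedelta(days=2),
     "action": "s3:ListBuckets", "source_ip": "192.0.2.1"},
    {"identity": "arn:aws:iam::111122223333:user/bob", "time": alert_time,
     "action": "ec2:DescribeInstances", "source_ip": "192.0.2.9"},
]
timeline = session_timeline(events, ALICE, alert_time)
source_ips = {e["source_ip"] for e in timeline}
```

From the resulting timeline, the distinct source IPs and actions feed directly into the blast-radius and containment steps.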

Tools of the cloud-IR trade

SIEM with strong query language - the timeline reconstruction tool. KQL / SPL / ESQL fluency is the difference between a 1-hour and a 1-day investigation.
Cloud-native incident-response services - AWS Security Incident Response, workflow automation in Microsoft Defender for Cloud, and case management in Google Security Command Center (SCC Cases).
Open-source IR tooling - cloudsploit, Prowler, Steampipe for evidence collection; CS Suite, cliam for IAM analysis.
SOAR - Tines, Torq, Cortex XSOAR, Splunk SOAR, Sentinel Logic Apps. Automates the repetitive containment steps.
Runbooks in version control. Markdown in a private repo, indexed in the SIEM / alert system. The runbook is the documentation of "what to do when this alert fires" - when an analyst is paged at 02:14, they should not be improvising.

SOC team structure & roles

The classic "tier 1 / tier 2 / tier 3" structure persists but is shifting. AI-assisted triage is eating tier-1 work faster than other layers, which is pushing organizations toward flatter SOC structures with stronger detection-engineering and IR capabilities at the senior end.

The roles that actually exist in 2026

SOC Analyst (junior / tier 1). Alert triage, basic enrichment, escalation. The role most affected by AI automation today. Career path is into detection engineering, IR, or threat hunting.
SOC Analyst (senior / tier 2-3). Investigation lead. Reads complex CloudTrail / KQL / SPL queries, judges noise vs real, owns containment decisions for the alert. Knows the cloud APIs cold.
Detection Engineer. Writes, tunes, and maintains detection content. Often a software-engineer profile who learned security; reads attacker writeups, writes the corresponding queries, ships them through CI to production.
Threat Hunter. Looks for evidence of compromise that no detection caught yet. Hypothesis-driven querying - "if an attacker compromised an IRSA-bearing pod, what would I see?" - using SIEM, threat intel, and graph tooling.
Incident Responder / DFIR. Runs incidents end-to-end. Owns the timeline, the contain/eradicate/recover steps, the post-incident review. Often the most senior SOC role; tightly partnered with Legal and Communications when things go big.
SOC Manager / Director. Runs the team, owns metrics (MTTR, MTTC, false-positive rate, coverage), reports to security leadership, owns vendor relationships.
Adjacent roles SOC depends on: Security Engineering (builds the platforms), GRC (defines the controls), Cloud Engineering (operates the cloud the SOC monitors).

For more on these roles and how to break into them, see the Cloud Security Careers page.

AWS, Azure, and GCP side-by-side

The native detection + logging story on each cloud, reduced to a one-screen reference:

Building block	AWS	Azure	GCP
Control-plane audit	CloudTrail (org trail)	Activity Log (subscription / mgmt group)	Cloud Audit Logs (org-aggregated)
Identity events	IAM Identity Center, federated IdP	Entra ID sign-in & audit logs	Cloud Identity audit, Cloud Logging
Network flow	VPC Flow Logs	NSG flow logs, VNet flow logs	VPC Flow Logs
DNS	Route 53 Resolver query logs	Azure DNS analytics	Cloud DNS query logs
Native threat detection	GuardDuty (org-wide, all features)	Defender for Cloud (workload plans)	SCC Premium / Enterprise (Event & Container Threat Detection)
Native SIEM	Security Lake + Athena, OpenSearch	Microsoft Sentinel	Google Security Operations (Chronicle)
SOAR / response automation	Security Hub automation rules, Step Functions, EventBridge → Lambda	Sentinel Logic Apps, Automation Rules	SecOps SOAR (formerly Siemplify)
Posture / inventory	Security Hub, AWS Config	Defender for Cloud, Azure Resource Graph	SCC, Cloud Asset Inventory
IR workflow	AWS Security Incident Response (managed)	Sentinel Incidents, Defender XDR	SCC Cases

Cross-cloud reality: most SOCs end up running a single SIEM (Splunk, Sentinel, Chronicle, Elastic, CrowdStrike, Datadog) as the unified plane, with the native cloud detectors enabled below it as sensors. Pure single-cloud-native-SIEM works for organizations genuinely living in one cloud; that's rarer than it sounds.

Maturity stages

SOC capability grows over time. A useful staging model:

Stage 1 - Visibility

Native detection enabled in every account/sub/project. Control-plane logs aggregated to a central destination. 90-day retention live. Alerts route to a defined channel. Coverage measured against MITRE ATT&CK Cloud.

Stage 2 - Triage

SIEM stood up, identity + network + DNS logs flowing. Defined alert taxonomy. Tier-1 analysts (or LLM-assisted automation) triaging within minutes. Documented runbooks for the top 20 alert types. MTTR measured.

Stage 3 - Engineering

Detection-as-code with CI/CD and tests. Adversary emulation (Stratus, Atomic Red Team) running monthly. SOAR automating the repetitive contain/enrich steps. Threat hunting on a regular cadence. Cross-cloud correlation working.

Stage 4 - Resilience

Tabletop exercises quarterly, full red-team / purple-team annually. SLO-driven SOC metrics. Threat intel feeding both detection and proactive hardening. Post-incident reviews drive durable changes in the platform.

Skipping stages is expensive in the same way it is for landing zones - a team trying to stand up SOAR before they have working alert triage just automates noise. The honest sequencing matters.

Common pitfalls

Logs collected but never queried. The 95th-percentile failure mode. Compliance ticks the box; no one is looking at the data. Define what you'll detect before deciding what to log.
Alert overload. Native detectors emit hundreds of findings; many are informational. Ruthless tuning, severity prioritization, and aggregation are essential. Once analysts have learned to ignore an alert channel, that habit is very hard to reverse.
No retention strategy. Hot retention is expensive; cold retention is critical for the breach you find out about 9 months later. Define both tiers up front.
SIEM detection content that never gets tested. Log schemas change; APIs deprecate; rules silently rot. Adversary emulation re-validates regularly.
No documented runbooks. The most experienced analyst will leave. If the response isn't written down, it leaves with them.
Native detection disabled in non-prod. Attackers love non-prod because no one watches. Enable native detection everywhere; tune severity per account.
SIEM that no one knows how to query. The most expensive SIEM seat in the world is useless if the analyst can't write the query. Invest in language training (KQL, SPL, ESQL, YARA-L) like you'd invest in cloud platform certifications.
Treating SOC as separate from the cloud platform team. SOC asks for logs; platform team grumbles and ships them on a 6-month timeline. The orgs that ship cloud security well have SOC and platform on the same team, often the same humans.
Buying SOAR before having alerts worth automating. SOAR is a force multiplier of whatever process you feed it. Bad process automated is just bad process at scale.
"AI will fix our SOC." AI is a real productivity layer for triage and enrichment. It is not a substitute for detection engineering, threat hunting, IR judgment, or post-incident learning. Adopt it as augmentation, not replacement.

FAQ

How big does a cloud SOC need to be?

Smaller than most people assume, if it's well-tooled. A 2026 cloud SOC running native detection + a modern SIEM with good content + LLM-assisted triage can credibly cover a mid-size enterprise with 5-8 humans. Larger orgs scale the detection-engineering and IR functions more than tier-1 triage. The traditional "20-seat tier-1 room" is not the right model anymore.

Should the SOC sit inside platform / SRE or separate?

Both work. The orgs that ship cloud security well have SOC and platform close to each other - same chat channel at minimum, often the same on-call rotation. If they're separate, the friction shows up as "we asked for that log to flow last quarter" delays.

Is MDR (managed detection & response) a substitute for an in-house SOC?

It's a credible complement, not a substitute. MDR handles 24x7 monitoring and tier-1 triage; you still need someone in-house who owns detection engineering, IR judgment, and the relationship with the rest of the security org. Pure-MDR with no internal expertise is a known anti-pattern.

What about open-source SIEMs?

Wazuh, OpenSearch, Loki + Grafana - credible options for cost-sensitive shops. Detection content, ML correlation, and operational tooling are typically less mature than commercial SIEMs. Best fit when you have engineers happy to build and maintain the missing parts.

How do you measure if the SOC is working?

MTTD (detection), MTTA (acknowledge), MTTR (respond), MTTC (contain). Coverage against MITRE ATT&CK Cloud. False-positive rate per detection. Number of true-positive incidents detected by the SOC vs reported externally. Post-incident-review follow-through rate. None of these alone tell the story; together they do.
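
These time-based metrics are simple averages over incident timestamps. The sketch below assumes each incident record carries occurred, detected, acknowledged, and contained datetimes, and measures MTTA and MTTC from the moment of detection; teams define the start points differently, so treat the field names and the definitions as assumptions.

```python
from datetime import datetime
from statistics import mean


def soc_metrics(incidents) -> dict:
    """Mean time to detect, acknowledge, and contain, in minutes."""
    def avg_minutes(start_key: str, end_key: str) -> float:
        return mean(
            (i[end_key] - i[start_key]).total_seconds() / 60 for i in incidents
        )

    return {
        "MTTD": avg_minutes("occurred", "detected"),
        "MTTA": avg_minutes("detected", "acknowledged"),
        "MTTC": avg_minutes("detected", "contained"),
    }


# One hypothetical incident: alert at 02:14, contained at 02:31.
incidents = [
    {
        "occurred": datetime(2026, 1, 1, 2, 0),
        "detected": datetime(2026, 1, 1, 2, 14),
        "acknowledged": datetime(2026, 1, 1, 2, 16),
        "contained": datetime(2026, 1, 1, 2, 31),
    }
]
metrics = soc_metrics(incidents)
```

For this single incident the function reports 14 minutes to detect, 2 to acknowledge, and 17 to contain; trend lines over many incidents matter more than any one value.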

How does this relate to zero trust?

Zero trust says "assume breach and verify continuously." The SOC is the verification function - the team that closes the loop on the assumption. Without monitoring, zero-trust deployments are unverified claims; with monitoring, the principle becomes operationally measurable.

Where next

Threat research sources - the analysts and feeds the SOC reads.
Breach kill chains - concrete examples of what these detections catch.
CSPM vs CNAPP - the posture half of the stack.
Cloud security careers - SOC, detection engineering, IR.
Friday Zoom - SOC and detection-engineering war stories come up monthly. Drop in.