Cloud Pentesting & Red Teaming

AWS, Azure, and GCP attack paths and the open-source tools to exercise them. A vendor-neutral practitioner's guide to the methodology, the offensive toolkit (Pacu, ROADtools, BloodHound, Cloudfox, MicroBurst, Stratus Red Team), the legal and policy boundaries, and the discipline of adversary emulation. The offensive complement to detection engineering and a companion to the CTFs page.


Authorized testing only. The techniques and tools on this page are for use against systems you own or have explicit written authorization to test. Running them against anyone else's cloud - including a former employer's, a customer's, or a vendor's - is illegal under the U.S. Computer Fraud and Abuse Act, the U.K. Computer Misuse Act, and equivalent laws in most jurisdictions. Get the rules-of-engagement document signed before you start. The CSOH community does not condone unauthorized access.

The 30-second version: Cloud pentesting trades the network for the API and the host for the identity. The control plane is the new attack surface; the IAM graph is the new internal network; a leaked access key is the new SQL injection. The professional toolkit is open source and well-documented - Pacu for AWS, ROADtools and MicroBurst for Azure, gcphound and the Rhino privesc matrix for GCP, BloodHound across all three. Vulnerable training environments - CloudGoat, AWSGoat, Sadcloud, IAM Vulnerable - let you build the muscle memory without the legal risk.

This page is the methodology and the production-targeting perspective. If you want to practice the techniques in a safe environment, the CTFs page is the lab. If you want to know how defenders see these techniques, the detection engineering page is the mirror.

On this page

  1. Why cloud pentesting is different
  2. Legal & provider policy
  3. Rules of engagement
  4. Methodology frameworks
  5. Recon
  6. Initial access vectors
  7. AWS attack paths
  8. Azure attack paths
  9. GCP attack paths
  10. Kubernetes attack paths
  11. Container & supply-chain attacks
  12. Persistence in cloud
  13. Data exfiltration
  14. Open-source toolkit
  15. Pentest vs red team vs adversary emulation
  16. Reporting
  17. CSP offerings & vetted firms
  18. AWS / Azure / GCP side-by-side
  19. Maturity stages
  20. Common pitfalls
  21. Further reading
  22. FAQ

Why cloud pentesting is different

A traditional network pentest starts with an IP range and a goal of finding listening services with exploitable vulnerabilities. The mental model is a perimeter to breach, a network to traverse, and hosts to compromise. The toolkit reflects that: nmap, Metasploit, Burp, BloodHound for on-prem Active Directory.

A cloud pentest starts with a set of accounts, subscriptions, or projects and a goal of finding chains of API calls that produce unauthorized outcomes. The mental model is a control plane to misuse, an IAM graph to traverse, and identities to compromise. nmap rarely helps; SDK calls and IAM policy analysis do.

Three shifts that matter

Practical consequence: the cloud pentester's day is reading IAM policies, mapping trust relationships, enumerating buckets, and writing one-line scripts against provider SDKs - not running scanners against ports. Someone who is an expert at internal-network pivoting often struggles for the first month in cloud; the skill transfer is non-trivial. Tools like Pacu exist partly to bridge that gap by encoding the API-call shape of common techniques.
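As one illustration of "mapping trust relationships", the sketch below walks a role trust policy and lists principals that sit outside your own account. It is a minimal example under stated assumptions: the function names, the sample policy, and the account IDs are hypothetical, and it only looks at the AWS principal type, ignoring conditions, federated principals, and service principals.

```python
import json


def principal_account(principal: str) -> str:
    """Extract the account ID from an ARN, or return the value unchanged."""
    if principal.startswith("arn:"):
        # arn:partition:service:region:account:resource
        return principal.split(":")[4]
    return principal


def external_principals(trust_policy: dict, own_account_id: str) -> list[str]:
    """List AWS principals in Allow statements that are outside own_account_id."""
    found = []
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            found.append("*")
            continue
        aws = principal.get("AWS", [])
        if isinstance(aws, str):
            aws = [aws]
        for value in aws:
            if value == "*" or principal_account(value) != own_account_id:
                found.append(value)
    return found


# Hypothetical trust policy, for illustration only.
example_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": [
      "arn:aws:iam::111111111111:role/ci",
      "arn:aws:iam::222222222222:root"
    ]},
    "Action": "sts:AssumeRole"
  }]
}
""")

print(external_principals(example_policy, "111111111111"))
```

In a real engagement the policy documents would come from the provider SDK or an exported inventory; here they are inlined so the logic can be read on its own.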

Rules of engagement

Rules of engagement (RoE) is the artifact that turns "permission to test" into "permission to test these specific things in these specific ways." A serviceable RoE document covers:

Scope

Which accounts, subscriptions, projects, regions, services, and applications are in scope. Equally important: what is explicitly out - production databases, third-party integrations, customer data, executive identities. Ambiguity here is the single most common cause of post-test disputes.

Windows

The hours and dates testing is authorized. Some orgs allow 24/7 testing; others want business-hours-only or weekend-only to limit incident response load. Define time zones explicitly; "9-5" is meaningless on a global team.
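Scope and windows lend themselves to a mechanical guard that tooling can call before every action. The following is a minimal sketch, not a standard format: the `RulesOfEngagement` class, its field names, and the account IDs are all hypothetical, and scope is reduced to account IDs for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RulesOfEngagement:
    in_scope_accounts: frozenset[str]
    out_of_scope_accounts: frozenset[str]
    window_start: datetime  # timezone-aware
    window_end: datetime    # timezone-aware

    def permits(self, account_id: str, when: datetime) -> bool:
        """True only if the account is in scope and `when` is inside the window."""
        if when.tzinfo is None:
            # A naive timestamp is exactly the "9-5 in which zone?" ambiguity.
            raise ValueError("timestamps must carry an explicit time zone")
        if account_id in self.out_of_scope_accounts:
            return False
        if account_id not in self.in_scope_accounts:
            return False
        return self.window_start <= when <= self.window_end


# Hypothetical engagement: one in-scope account, an eight-hour window in UTC.
roe = RulesOfEngagement(
    in_scope_accounts=frozenset({"111111111111"}),
    out_of_scope_accounts=frozenset({"999999999999"}),
    window_start=datetime(2025, 1, 6, 14, 0, tzinfo=timezone.utc),
    window_end=datetime(2025, 1, 6, 22, 0, tzinfo=timezone.utc),
)

# 10:00 at UTC-5 is 15:00 UTC, so this falls inside the window.
local_time = datetime(2025, 1, 6, 10, 0, tzinfo=timezone(timedelta(hours=-5)))
print(roe.permits("111111111111", local_time))
```

Rejecting naive timestamps outright is a deliberate choice: it forces every caller to state a zone rather than silently assuming one.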

Techniques

What's permitted - phishing, password spraying, social engineering, denial-of-service simulation, malware deployment, physical access. Most pure cloud engagements exclude DoS and social engineering by default; red-team engagements typically include them with explicit approval.

Communications

Daily standups, end-of-day status reports, immediate notification of high-impact findings, the emergency kill-switch contact. Both sides should know who to call at 3 AM when a tester's automation runs longer than expected.

Escalation

What triggers a stop-test? Finding evidence of an active intrusion (not yours), discovering customer PII that wasn't expected in scope, accidentally affecting a production system, regulatory data accessed without authorization. Define the threshold and the response.

IR notification

Who on the SOC / IR team knows the test is happening, and what level of detail they're given. Fully informed = compliance test. Partially informed (deconflict only) = realistic exercise. Uninformed = full red team, with executive sponsor available to vouch on demand.

Sign every revision. An RoE that drifts mid-engagement without written approval is the source of every "I thought you said we could do that" conversation that ends a relationship with a customer or an internal stakeholder.

Methodology frameworks

You don't need to memorize all of these, but you do need one as your skeleton. The discipline of working through every phase - not just the ones that produced findings last time - is what separates pentesting from ad-hoc hacking.

Pick PTES or ATT&CK as the primary; reference the others. The deliverable should map every finding to a phase and a technique - the reader downstream (engineer, auditor, executive) gets to navigate it that way.

Recon

Cloud recon is less about port-scanning and more about enumerating the public surface area an organization has accidentally created: forgotten buckets, expired CDN domains, certificate-transparency leaks, credentials in public repos. The work is fast, asymmetric, and entirely legal when done against your own org (or with permission).

Public storage enumeration
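Storage enumeration usually starts from a candidate list built out of the organization's name and common suffixes. The sketch below only generates candidates and sends no requests; the function name and word list are hypothetical, and the length filter reflects the 3-63 character limit on S3 bucket names.

```python
from itertools import product


def candidate_bucket_names(org: str, words: list[str]) -> list[str]:
    """Combine an org name with common words into plausible bucket names."""
    org = org.lower()
    names = {org}
    for word, sep in product((w.lower() for w in words), ("-", ".", "")):
        names.add(f"{org}{sep}{word}")
        names.add(f"{word}{sep}{org}")
    # S3 bucket names must be between 3 and 63 characters long.
    return sorted(n for n in names if 3 <= len(n) <= 63)


# Hypothetical organization name and word list.
for name in candidate_bucket_names("acme", ["backup", "dev"]):
    print(name)
```

Each candidate would then be checked against the provider's storage endpoint, within the scope the rules of engagement allow.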

DNS, ASN, and certificate-transparency

Credential discovery
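A common first pass at credential discovery is a pattern match for AWS access key IDs in repository contents. This is a minimal sketch: the function name is hypothetical, the pattern covers only the `AKIA` (long-term) and `ASIA` (temporary) prefixes, and the sample key is the placeholder used in AWS documentation, not a live credential.

```python
import re

# 20 characters: a four-letter prefix followed by 16 uppercase alphanumerics.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_access_key_ids(text: str) -> list[str]:
    """Return the distinct access key IDs found in a blob of text."""
    return sorted(set(ACCESS_KEY_ID.findall(text)))


sample = """
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
region = us-east-1
"""
print(find_access_key_ids(sample))
```

A match is only a lead: the key ID alone grants nothing, and dedicated secret scanners add entropy checks and verification that this sketch omits.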

Initial access vectors

Cloud initial access overwhelmingly comes from credentials, not exploits. The 2019-2025 breach record (Capital One, SolarWinds-related, MOVEit, the Snowflake-customer wave, Sisense, Microsoft Midnight Blizzard, et al.) is largely a list of stolen, exposed, or phished credentials and access tokens. Concrete vectors:

AWS attack paths

AWS attack paths overwhelmingly involve IAM. Once you've established a foothold (credentials, role, instance), the workflow is: enumerate what you can do, find the path to what you want, execute it.

Enumeration

IMDS abuse

SSRF against http://169.254.169.254/latest/meta-data/iam/security-credentials/ returns the role name; a second GET with that role name appended to the path returns temporary credentials for the role. IMDSv2 requires a PUT to /latest/api/token with a TTL header first, and the returned token must accompany every later metadata request, which blocks simple SSRF. Many environments still allow IMDSv1; aws ec2 modify-instance-metadata-options --instance-id <instance-id> --http-tokens required is the fix.
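The two-step IMDSv2 exchange described above can be sketched with the standard library. The helper names are hypothetical; the header names are the ones IMDSv2 uses. Nothing runs against the network unless `fetch_role_credentials` is called, and that call only makes sense from inside an EC2 instance you are authorized to test.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: PUT for a session token (maximum TTL is 21600 seconds)."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def credentials_request(role_name: str, token: str) -> urllib.request.Request:
    """Step 2: GET the role credentials, presenting the session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/iam/security-credentials/{role_name}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_credentials(role_name: str, timeout: float = 2.0) -> str:
    """Run both steps and return the raw JSON credential document."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(credentials_request(role_name, token), timeout=timeout) as resp:
        return resp.read().decode()
```

The sketch makes the SSRF point concrete: a typical GET-only SSRF primitive cannot issue the PUT or attach the custom header, so it never obtains the token that step 2 requires.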

IAM privilege escalation chains

The named primitives - memorize them, then consult Rhino's full matrix for the rest:
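A few widely documented primitives can be checked mechanically against the actions an identity is granted. The sketch below is illustrative only: the `PRIMITIVES` table is a small hand-picked subset rather than the full matrix, the function names are hypothetical, and the matching ignores Deny statements, conditions, resource constraints, permission boundaries, and SCPs.

```python
from fnmatch import fnmatchcase

# Illustrative subset of well-known IAM privilege escalation primitives.
PRIMITIVES = {
    "CreatePolicyVersion": {"iam:CreatePolicyVersion"},
    "AttachUserPolicy": {"iam:AttachUserPolicy"},
    "CreateAccessKey": {"iam:CreateAccessKey"},
    "UpdateAssumeRolePolicy": {"iam:UpdateAssumeRolePolicy"},
    "PassRole to Lambda": {
        "iam:PassRole",
        "lambda:CreateFunction",
        "lambda:InvokeFunction",
    },
}


def is_allowed(action: str, granted_patterns: list[str]) -> bool:
    """IAM action names are case-insensitive and may use * and ? wildcards."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in granted_patterns)


def reachable_primitives(granted_patterns: list[str]) -> list[str]:
    """Names of primitives whose required actions are all granted."""
    return [
        name
        for name, needed in PRIMITIVES.items()
        if all(is_allowed(action, granted_patterns) for action in needed)
    ]


print(reachable_primitives(["iam:PassRole", "lambda:*"]))
print(reachable_primitives(["s3:GetObject"]))
```

Graph tools do the same thing at scale and with the policy-evaluation details this sketch leaves out; the value here is seeing that each primitive is just a set of actions that must all be present.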

Resource-specific paths

Pacu modules to know

Pacu packages most of the above into named modules. The first-day list:

Azure attack paths

Azure paths split between the management plane (subscriptions, resource groups, RBAC) and the identity plane (Entra ID, formerly Azure AD). The most damaging attacks chain across both - phish an identity, abuse RBAC inheritance, pivot to managed identities, escalate to Global Administrator.

Entra ID / Azure AD recon

Phishing & consent grants

RBAC and managed identity

Tools

GCP attack paths

GCP gets less open-source pentest tooling attention than AWS or Azure, but its attack surface is fully present. Service-account impersonation is the GCP equivalent of sts:AssumeRole, and most privilege escalations chain through it.
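Impersonation edges can be read directly off a service account's IAM policy. The following sketch assumes the policy has already been exported in the usual `bindings` shape; the function name, the sample members, and the service-account email are hypothetical, and only three predefined roles are considered, so custom roles and broad roles such as Owner or Editor are not covered.

```python
# Predefined roles that let a member act as, or mint credentials for, a service account.
IMPERSONATION_ROLES = {
    "roles/iam.serviceAccountTokenCreator",
    "roles/iam.serviceAccountUser",
    "roles/iam.serviceAccountKeyAdmin",
}


def impersonation_edges(sa_email: str, policy: dict) -> list[tuple[str, str, str]]:
    """Return (member, service_account, role) for every impersonation-capable binding."""
    edges = []
    for binding in policy.get("bindings", []):
        role = binding.get("role")
        if role not in IMPERSONATION_ROLES:
            continue
        for member in binding.get("members", []):
            edges.append((member, sa_email, role))
    return edges


# Hypothetical IAM policy attached to a service account.
example_policy = {
    "bindings": [
        {"role": "roles/iam.serviceAccountTokenCreator",
         "members": ["user:dev@example.com"]},
        {"role": "roles/viewer",
         "members": ["group:auditors@example.com"]},
    ]
}

for edge in impersonation_edges("deployer@demo-project.iam.gserviceaccount.com", example_policy):
    print(edge)
```

Collecting these tuples across every service account in a project yields the edge list for an impersonation graph, which is where chained escalations become visible.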

Recon

GCP IAM privilege escalation

Rhino Security Labs maintains the canonical GCP IAM privilege escalation matrix. The named primitives:

Resource-specific paths

Tools

Kubernetes attack paths

Kubernetes is its own attack surface that sits inside a cloud account. Compromising a pod gives you a foothold; the question is whether you can escape the pod, escalate within the cluster, or reach the cloud control plane via the cluster's identity.
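Whether a pod foothold can turn into an escape often shows up in the pod spec itself. The sketch below flags a handful of well-known risky settings in a pod spec given as a plain dictionary; the function name and sample spec are hypothetical, and the list of checks is illustrative rather than exhaustive.

```python
def risky_pod_settings(pod_spec: dict) -> list[str]:
    """Flag pod-spec settings that commonly enable escape or lateral movement."""
    findings = []
    # Sharing host namespaces weakens isolation from the node.
    for flag in ("hostPID", "hostNetwork", "hostIPC"):
        if pod_spec.get(flag):
            findings.append(flag)
    # hostPath volumes expose the node filesystem to the pod.
    for volume in pod_spec.get("volumes", []):
        if "hostPath" in volume:
            findings.append(f"hostPath:{volume['hostPath'].get('path')}")
    # Privileged containers run with nearly full access to the node.
    for container in pod_spec.get("containers", []):
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged"):
            findings.append(f"privileged:{container.get('name')}")
    # This sketch treats an unset field as "token mounted".
    if pod_spec.get("automountServiceAccountToken", True):
        findings.append("serviceAccountToken mounted")
    return findings


# Hypothetical pod spec, for illustration only.
example_spec = {
    "hostPID": True,
    "automountServiceAccountToken": False,
    "volumes": [{"name": "root", "hostPath": {"path": "/"}}],
    "containers": [{"name": "app", "securityContext": {"privileged": True}}],
}
print(risky_pod_settings(example_spec))
```

The service-account token check matters for the third path in the paragraph above: a mounted token is the cluster identity an attacker would use to reach further.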

Tools

More on hardening: the Kubernetes page.

Container & supply-chain attacks

The 2020-2025 wave (SolarWinds, Codecov, ua-parser-js, MOVEit, 3CX, the Okta support-system breach, Sisense, XZ Utils, npm typosquats) made supply-chain the most discussed attack vector in cloud security. From an offensive perspective, the patterns:

See threat research - supply chain attacks for the running incident log; see CI/CD for the defensive side.

Persistence in cloud

Persistence in cloud rarely looks like a host-resident implant. It looks like an IAM artifact that an attacker can use to come back even after the initial credentials are rotated.

Defenders: this is exactly the catalog detection engineering should be building rules against. The CloudTrail / Activity Log / Audit Log events for each of the above are well-documented; alerting on them by default catches a large fraction of real-world post-compromise behavior.
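On the AWS side, a first detection pass can be as simple as filtering CloudTrail records for IAM calls that create or modify durable access. The sketch below is a starting point under stated assumptions: the event-name set is an illustrative subset, the function name and sample records are hypothetical, and records are assumed to be already parsed into dictionaries.

```python
# IAM API calls that commonly create or extend durable access (illustrative subset).
PERSISTENCE_EVENTS = {
    "CreateUser",
    "CreateAccessKey",
    "CreateLoginProfile",
    "UpdateAssumeRolePolicy",
    "AttachUserPolicy",
    "AttachRolePolicy",
    "PutUserPolicy",
}


def persistence_candidates(records: list[dict]) -> list[tuple[str, str, str]]:
    """Return (eventTime, eventName, caller ARN) for IAM persistence-style events."""
    hits = []
    for record in records:
        if record.get("eventSource") != "iam.amazonaws.com":
            continue
        if record.get("eventName") not in PERSISTENCE_EVENTS:
            continue
        caller = record.get("userIdentity", {}).get("arn", "unknown")
        hits.append((record.get("eventTime", ""), record["eventName"], caller))
    return hits


# Hypothetical CloudTrail records, trimmed to the fields used above.
sample_records = [
    {"eventSource": "iam.amazonaws.com", "eventName": "CreateAccessKey",
     "eventTime": "2025-01-06T15:04:05Z",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/alice"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "eventTime": "2025-01-06T15:05:00Z",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/alice"}},
]
print(persistence_candidates(sample_records))
```

In production this filter would live in the SIEM's query language rather than Python, with allow-lists for the automation that legitimately makes these calls.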

Data exfiltration

The recurring pattern: cloud exfil prefers staying inside the cloud's own network where possible. Egress monitoring designed for traditional data exfiltration (lots of bytes leaving via HTTPS) misses most of it. Provider-side controls (resource perimeters, deny-by-default cross-account flows, replication monitoring) are the actual defense.

Open-source toolkit overview

The full landscape, organized by where you'll use it. Most tools are MIT or Apache-2.0; treat the active maintainer list as a signal of whether something is current.

AWS

Azure

GCP

Multi-cloud and graph

Adversary emulation & training

For where to practice with these - full lab and CTF list - see the CTFs page.

Pentest vs red team vs adversary emulation vs purple team

The terms get used interchangeably. They're not the same activity, and confusing them in scoping leads to underwhelmed customers.

Activity | Goal | Method | Deliverable | Typical duration
Pentest | Find exploitable issues in defined scope | Manual + tool-assisted technical testing | Finding-by-finding report with CVSS-style scoring | 1-4 weeks
Red team | Achieve a defined objective covertly | Whatever the RoE permits - phish, code, physical | Attack narrative + objective-completion summary | 4-12 weeks
Adversary emulation | Reproduce a specific actor's TTPs to test detections | Structured execution of mapped ATT&CK techniques | Coverage matrix (which techniques detected, which missed) | 1-3 weeks per emulation plan
Purple team | Collaboratively improve detection & response | Red-team executes, blue-team watches, both iterate live | Improved detections; closed gaps; team training | 1-5 days per exercise; ongoing program
Bug bounty | Crowdsource finding novel issues | External researchers; pay per valid report | Continuous flow of triaged findings | Continuous

A mature program uses several. The most common starting sequence: annual external pentest (for compliance), quarterly internal red team or Stratus-driven exercises (for change validation), continuous bug bounty (for novel-issue volume), regular purple-team days (for detection improvement).
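The coverage matrix that adversary emulation delivers reduces to a simple calculation once each executed technique is marked detected or missed. This is a minimal sketch: the function name and the results are hypothetical, and the keys are MITRE ATT&CK technique IDs chosen only as examples.

```python
def coverage(results: dict[str, bool]) -> tuple[float, list[str]]:
    """Return (fraction of techniques detected, sorted list of missed technique IDs)."""
    missed = sorted(tid for tid, detected in results.items() if not detected)
    if not results:
        return 0.0, missed
    return (len(results) - len(missed)) / len(results), missed


# Hypothetical outcome of one emulation run: technique ID -> was it detected?
run = {
    "T1078.004": True,   # Valid Accounts: Cloud Accounts
    "T1098.001": False,  # Account Manipulation: Additional Cloud Credentials
    "T1530": True,       # Data from Cloud Storage
    "T1552.005": False,  # Unsecured Credentials: Cloud Instance Metadata API
}

ratio, missed = coverage(run)
print(f"detected {ratio:.0%}; missed: {', '.join(missed)}")
```

Tracking the ratio per emulation plan over time is what turns individual exercises into a program metric.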

Reporting

The report is the deliverable. Most engagements live or die on it: a clean technical narrative that shows the chain, not just the findings, is what gets remediation prioritized. The structure that travels well:

The single most common reporting mistake is dropping a list of CVEs without showing exploitability. A CVE-2024-something on an internal Lambda runtime is a finding. A CVE-2024-something on a public Lambda function URL that accepts unauthenticated requests and uses a role with iam:PassRole on the admin role is a critical attack chain. Same vuln, different report.

CSP offerings & vetted firms

The cloud providers each ship offensive-security-adjacent services, and there's a small set of consulting firms that have built distinct cloud-pentest reputations.

Provider offerings

Cloud-pentest-known firms

Selection criteria: ask for redacted reports from prior engagements, ask which tools they maintain or contribute to, ask whether they'll use a named team or hand the work to whoever's available. The named-firm-but-junior-tester pattern is the dominant complaint about big-name consulting.

AWS, Azure, and GCP side-by-side

The pentest-relevant differences between the three clouds, reduced to a one-screen reference.

Dimension | AWS | Azure | GCP
Customer pentest policy | Permitted on most services without approval (since 2019) | Permitted per Rules of Engagement | Permitted per Acceptable Use Policy
Notification required | For disruptive techniques only (Simulated Events form) | For some high-impact tests | No, but contact required if AUP applies
Out-of-scope | Provider infrastructure, other tenants | Provider infrastructure, other tenants, M365 separately | Provider infrastructure, other tenants
Dominant identity model | IAM users + IAM roles + STS | Entra ID + RBAC + managed identities | Cloud IAM + service accounts
Privesc primitive | iam:PassRole + run/invoke service | RBAC role assignment / managed-identity abuse | iam.serviceAccounts.actAs + service that runs as it
Metadata endpoint | 169.254.169.254 (IMDSv1 / v2) | 169.254.169.254/metadata/identity (header required) | metadata.google.internal (Metadata-Flavor: Google)
Premier exploitation framework | Pacu | ROADtools + MicroBurst + AADInternals | No single dominant; gcp_enum, gcp_scanner, ScoutSuite
Graph tool | BloodHound CE, PMapper, awspx | BloodHound CE + AzureHound | BloodHound CE + gcphound
Provider attack-emulation tooling | AWS Threat Detection Tester, Detective | Defender for Cloud attack-path analysis | Security Command Center Attack Path Simulation
Vulnerable-by-design lab | CloudGoat, AWSGoat, Sadcloud, IAM Vulnerable | AzureGoat, MicroBurst-Lab | GCPGoat, GCP-Goat
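The metadata-endpoint row is the one most often needed mid-engagement, so it is worth having as code. The sketch below builds, but does not send, a credential-related metadata request for each provider. The `METADATA_PROBES` table and `build_probe` helper are hypothetical names; the Azure `api-version` and `resource` values are common examples rather than the only valid ones, and the AWS entry shows the IMDSv1-style path without a session token.

```python
import urllib.request

# provider -> (URL, required headers)
METADATA_PROBES = {
    "aws": (
        "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
        {},
    ),
    "azure": (
        "http://169.254.169.254/metadata/identity/oauth2/token"
        "?api-version=2018-02-01&resource=https://management.azure.com/",
        {"Metadata": "true"},
    ),
    "gcp": (
        "http://metadata.google.internal/computeMetadata/v1/"
        "instance/service-accounts/default/token",
        {"Metadata-Flavor": "Google"},
    ),
}


def build_probe(provider: str) -> urllib.request.Request:
    """Build (without sending) the metadata request for a provider."""
    url, headers = METADATA_PROBES[provider]
    return urllib.request.Request(url, headers=headers)


for name in METADATA_PROBES:
    probe = build_probe(name)
    print(name, probe.get_method(), probe.full_url, dict(probe.header_items()))
```

The required headers are the design point: Azure and GCP reject requests that lack them, which is the same SSRF mitigation idea that IMDSv2 adds on AWS.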

Maturity stages

A useful staging model for a cloud-offensive-testing program:

Stage 1 - Annual external pentest

One external engagement per year, scoped against the compliance framework (SOC 2, ISO 27001, PCI DSS). Findings tracked in a spreadsheet; retest scheduled six months later. The minimum for most audit programs. Cost: $40k-$150k depending on scope.

Stage 2 - Quarterly internal exercises

Stratus Red Team or CloudGoat-driven exercises run quarterly by the security team. Bug bounty stood up. Findings flow to the same backlog as external-pentest findings. Detection engineering starts pairing rules to each emulated technique.

Stage 3 - Continuous adversary emulation

Adversary-emulation plans for the threat actors most relevant to the business - credential-theft groups, ransomware affiliates, supply-chain actors. Each plan run continuously; gaps in detection coverage tracked as a primary security metric. Internal red team forming.

Stage 4 - Mature purple-team program

Dedicated red team, integrated with detection engineering and IR. Engagement objectives align to business risk (exfiltrate the customer database, achieve admin-level persistence undetected for 30 days). Findings inform product roadmap, not just remediation tickets.

Skipping stages is expensive. Stage 4 without Stage 1's compliance signoff means the audit fails. Stage 4 without Stage 2's tool familiarity means the red team can't reproduce its own findings in writing. Sequence matters.

Common pitfalls

Further reading

Reference wikis and matrices

Provider policies

Tools to bookmark

Related CSOH pages

FAQ

How is cloud pentesting different from traditional network pentesting?

The attack surface is the cloud control plane API, not a network of listening ports. Identity replaces network position as the primary perimeter - an attacker with a leaked access key or a stolen OIDC token is already "inside," regardless of what's open on TCP. The toolkit shifts accordingly: less nmap and Metasploit, more Pacu, ROADtools, BloodHound, Cloudfox, and provider SDKs. The vulnerability classes shift too: misconfigured trust policies, over-permissioned roles, IMDS abuse, public storage objects, and privilege-escalation chains across the IAM graph dominate the findings list.

Do I need permission from AWS, Azure, or GCP to pentest my own cloud account?

Mostly no - but read each provider's policy. AWS removed the pre-authorization requirement for most penetration testing against your own resources in 2019; some activities (DNS zone walking via Route 53 hosted zones, port flooding, protocol flooding, request flooding) are prohibited by the policy rather than approvable, DoS simulation falls under a separate DDoS Simulation Testing policy, and other simulated events require approval via the Simulated Events form. Azure documents allowed activities in its Penetration Testing Rules of Engagement; some operations require notification. GCP allows customer-owned testing without prior approval but expects compliance with the Acceptable Use Policy. In all three: you are only permitted to test resources you own or have explicit written authorization to test - and never the underlying provider infrastructure, other tenants, or third-party SaaS that happens to live in the same cloud.

What's the single highest-value tool to learn first?

For AWS, Pacu - Rhino Security Labs' exploitation framework. It's modular (60+ modules for enumeration, privilege escalation, persistence, and exfiltration), it teaches the API-call shape of each technique, and the source code is readable. Pair it with Cloudfox for fast situational awareness and PMapper or awspx for privilege-escalation-path analysis. For Azure, ROADtools (recon) plus BloodHound with AzureHound (graph analysis) is the equivalent starter kit. For GCP, gcphound or ScoutSuite plus the GCP IAM Privilege Escalation matrix from Rhino Security Labs.

What's the difference between a pentest, a red team, and adversary emulation?

A pentest is scope-bounded technical testing - find as many exploitable issues in the defined target as time allows, report them. A red team is goal-bounded adversary simulation - achieve a defined objective (exfiltrate a crown-jewel dataset, persist for 30 days undetected, demonstrate ransomware staging) using whatever techniques the rules of engagement permit, including phishing, physical, and supply-chain pivots. Adversary emulation is a structured exercise that reproduces a specific threat actor's known TTPs (mapped to MITRE ATT&CK) to test specific detections, usually as a purple-team collaboration with the SOC. Pentest finds bugs, red team tests resilience, adversary emulation tests detections.

Is it legal to run BloodHound or Pacu against my employer's cloud?

Only with explicit written authorization. "I work here and I'm curious" is not authorization. The Computer Fraud and Abuse Act (U.S.), the Computer Misuse Act (U.K.), and equivalent laws in most jurisdictions criminalize unauthorized access - and that includes employees exceeding their authorized scope, not just outside attackers. Get a written rules-of-engagement document signed by an executive with authority over the systems in scope, naming the testers, the windows, the techniques, and the targets. Without that document you have no defense if something breaks or if findings turn up evidence of someone else's actions.

How often should we be running cloud pentests?

Annual external pentests are table stakes for SOC 2 / ISO 27001 / PCI DSS programs. The valuable cadence is continuous: monthly internal exercises with Stratus Red Team, CloudGoat, or your CSPM's adversary-emulation features; quarterly focused engagements on specific changes (new region, new product, new IAM model); and an annual external red team that tests the whole program end to end. The compliance pentest checks the box; the continuous program finds the real issues before an actor with worse intent does.

Should we hire external testers or build an internal red team?

Both, eventually. External testers bring fresh eyes, current TTPs, and the credibility of an independent report - required for most compliance frameworks and most enterprise customer audits. Internal red teams bring depth on your specific environment, continuous coverage, and the relationship with detection engineering that makes purple teaming work. Most orgs start with annual external pentests, add Stratus / CloudGoat-style internal exercises in year two, and stand up an internal red team somewhere around 200-500 engineers if the threat model justifies it. Financial services, defense, and large SaaS reach that threshold sooner.

Where next