How We Deploy csoh.org Across AWS, GCP & Azure

A plain-English tour of how we host this one static site on three clouds at once - served active/active behind Cloudflare, deployed to each with no stored passwords. Eight layers of security, what each defends against, and the actual config files that make it work. Written for anyone who's ever deployed a website and wondered what a "real" multi-cloud setup looks like underneath.


Why this page exists. csoh.org is a community for cloud security practitioners. We host it ourselves - one static site served active/active from three clouds at once (AWS, GCP, and Azure) behind Cloudflare - and we treat the deployment as a teaching artifact: every choice we made is on this page, with a plain-English explanation of why it's there and what attack it stops. The Terraform and GitHub Actions YAML that actually runs in production is linked throughout so you can read the real thing.

Who this is for. If you've ever deployed a website to a shared web host (cPanel, Netlify, GitHub Pages) and you want to understand what a "real" cloud deployment looks like - this is your page. We assume you know HTML/HTTP basics. We don't assume you know GCP, IAM, CI/CD, or container security. Every term we use links to the glossary on first use.

How to read this page. Top-to-bottom for the full tour, or jump straight to the section you care about via the table of contents below. Each layer of defense gets its own section that opens with "the attack we're stopping" in plain language before we touch the technical detail.

On this page

  1. The big picture: defense in depth
  2. What attacks are we actually defending against?
  3. Architecture diagram
  4. Layer 1 - Cloudflare: the one edge in front of everything
  5. Layer 2 - Three cloud origins, active/active
  6. Layer 3 - TLS (the lock icon in the browser)
  7. Layer 4 - The origins themselves (S3, Cloud Run, Blob)
  8. Layer 5 - The bytes we ship to each origin
  9. Layer 6 - How we deploy to three clouds without a saved password
  10. Layer 7 - Protecting the deploy pipeline itself
  11. Layer 8 - Logging and what we'd see during an attack
  12. What we didn't do (and why)
  13. What this costs to run
  14. If you want to copy this for your own site
  15. Further reading

The big picture: defense in depth

The single most important idea on this page is called defense in depth: instead of relying on one strong wall, we stack many imperfect ones. If an attacker bypasses one layer, the next one is still in their way. None of the individual controls below is unbreakable - but because they're stacked, an attacker has to get lucky on every layer at once, while we only need one layer to hold.
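To put rough numbers on "every layer at once", here is a back-of-the-envelope sketch. The per-layer bypass probability is invented for illustration (it is not a measurement of our stack), and the calculation assumes layers fail independently, which real layers only approximate.

```python
from math import prod


def chance_all_layers_bypassed(bypass_probabilities):
    """Probability an attacker beats every layer, assuming independence."""
    return prod(bypass_probabilities)


# Hypothetical: eight individually weak layers, each stopping only half
# of the attempts that reach it.
eight_weak_layers = [0.5] * 8
print(chance_all_layers_bypassed(eight_weak_layers))  # 0.00390625
```

Even with deliberately pessimistic per-layer numbers, the stacked result is under half a percent, which is the intuition behind stacking imperfect walls.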

[Figure: Defense-in-depth onion - eight concentric layers around the application. The site sits at the center, surrounded by eight rings. From outermost to innermost: (1) the Cloudflare edge (CDN, DDoS absorption, WAF, load balancer); (2) three cloud origins, active/active with health checks; (3) TLS at Full (strict) plus security headers (HSTS, CSP); (4) locked-down origins (private S3, zero-IAM Cloud Run, $web-only Azure); (5) hardened bytes (digest pin, vulnerability scan, no leaked files); (6) keyless deploy to three clouds (OIDC federation, 1-hour tokens); (7) pipeline guards (CODEOWNERS, env); (8) 400-day audit logs.]
Defense in depth visualised. The site is at the center; each ring is a separate layer of defense an attacker would have to defeat to reach it.

Our stack has eight layers. Each one is a section on this page:

  1. Cloudflare in front of everything - terminates TLS, caches at the edge, runs the WAF, sets security headers, applies legacy redirects, and load-balances across our three origins with health checks. This one edge does all of it, in front of three interchangeable origins.
  2. Three cloud origins, active/active - the same site lives on AWS (S3 + CloudFront), GCP (Cloud Run), and Azure (Blob static website). Cloudflare spreads live traffic across all three and pulls any unhealthy one out of rotation automatically.
  3. TLS end-to-end (browser → Cloudflare, Cloudflare → each origin at Full strict) - encrypted, modern ciphers, certificates that auto-renew, no unauthenticated hop anywhere.
  4. The origins themselves - each locked down: S3 is private (reachable only via CloudFront), Cloud Run runs as a zero-permission identity, Azure serves only its public $web container.
  5. The bytes we ship - the GCP container is pinned to known-good bytes and vulnerability-scanned; the object-storage origins get only an allowlisted public file set (sensitive files are never uploaded).
  6. The deploy identity - GitHub Actions deploys to all three clouds without a single stored password, using keyless OIDC federation that grants ~1 hour of narrowly-scoped access per workflow run, per cloud.
  7. The pipeline itself - Code Owners review on the deploy workflow, branch protection on main, secret scanning, push protection.
  8. Logging - Cloudflare zone analytics at the edge, plus per-cloud origin + IAM/audit logs (GCP's kept for 400 days).

Read on for the plain-English version of each layer, what it's defending against, and what the actual config looks like.

What attacks are we actually defending against?

Before designing controls, you need a threat model - a list of "what could go wrong, and roughly how likely is it?" For a public, static site, the surface is smaller than you might think:

That eliminates most of the OWASP Top 10 right off the bat. What's actually left, ranked from most-to-least likely:

  1. Someone tampers with the build. An attacker compromises the base image we use, a GitHub Action we depend on, or a CI token, and slips malicious bytes into our deploy. The site looks normal but ships malware to readers. Mitigated by: pinning the base image to its content hash, pinning every GitHub Action to a specific commit, scanning the built image for vulnerabilities, and refusing to overwrite image tags after they're published.
  2. Someone steals the deploy credentials. Historically the worst single way to compromise a website: leak the CI's deploy password and now anyone with that password can publish whatever they want. Mitigated by: not having a deploy password at all (see Layer 6 - keyless deploys).
  3. Someone messes with our DNS or TLS. Misissued certificate, an on-path attacker downgrading HTTPS to HTTP, DNS hijack. Mitigated by: two-factor on Cloudflare, registrar lock on the domain, HSTS preload (browsers refuse plain HTTP for our domain, period), modern TLS only.
  4. Someone defaces the site. An attacker gets into the build pipeline somehow and pushes an embarrassing change. Mitigated by: everything in (1) and (2), plus a one-command rollback (every Cloud Run revision is pinned to a specific image hash; "go back to yesterday's deploy" is a single CLI call).
  5. Volumetric attack (DDoS) or an origin/region outage. Someone tries to take the site offline by sending an enormous amount of traffic - or one cloud simply has a bad day. Mitigated by: Cloudflare absorbing the bulk of traffic at its edge, edge rate limiting, the CDN serving cached content even if an origin goes down, and - new in the multi-cloud design - three independent origins behind a health-checked load balancer, so an entire cloud can fail and the site keeps serving from the other two.
  6. Bot scraping and probing. Bots constantly throw classic attack patterns (?id=1' OR 1=1--) at every endpoint on the public internet. Mitigated by: Cloudflare's WAF (the free Managed Ruleset) plus a rate-limit rule, silently dropping those requests at the edge so they never reach any of our three origins.

What we explicitly do not defend against, because none of it applies to a static site: authenticated session theft, broken access control, business-logic abuse, privilege escalation from an application server. If you're reading this page to copy the design for a site that does have logged-in users - you need more than what's here.

Architecture diagram

Here's the whole system in one picture. Don't worry if some of the labels are unfamiliar - every box is explained in its own section below.

[Figure: csoh.org production architecture - a flow diagram in three parts.
Request path (left): the reader's browser connects over HTTPS and sees Cloudflare's cert. Cloudflare, the one edge, terminates browser TLS (Universal SSL), caches at the edge (CDN) and absorbs DDoS, runs the WAF (free managed ruleset) plus a rate limit, sets security headers and legacy redirects, and load-balances active/active with health checks over Full (strict) TLS to every origin. Random steering spreads requests across the healthy origins: AWS (private S3 + CloudFront with OAC, HTTPS), GCP (Cloud Run, min=0, no LB, *.run.app), and Azure (Blob static website, $web, HTTPS). Content is byte-identical on all three.
Build & deploy path (right, out of band): a git push to main triggers deploy.yml, which builds once (search index + staged dist/) and fans out to three keyless-OIDC publish jobs, each publishing to its own origin. publish-aws assumes an IAM role via OIDC, then runs s3 sync and a CloudFront invalidation. publish-azure uses an Entra federated credential via OIDC, then runs a blob sync to $web. publish-gcp uses WIF (no keys) to build the container, scan it with Trivy, push an immutable tag to Artifact Registry, and deploy to Cloud Run; rollback is one CLI command.
Observability (underneath, edge + per-cloud): Cloudflare zone analytics and LB health; GCP origin/IAM/audit logs flow to a 400-day bucket. Edge metrics cover total traffic; origin logs cover what reached each cloud.]
The full request path on the left, the build & deploy path on the right, the logging pipeline running underneath both. Every box and arrow is config in infra/terraform/.

Everything in that diagram - every origin setting, every Cloudflare rule, every IAM permission across all three clouds - is defined as code in infra/terraform/, which has one directory per provider (aws/, gcp/, azure/, cloudflare/). We never click around in any cloud console to make changes; the consoles are read-only for normal operation. This is called infrastructure as code, and it's how you keep a real multi-cloud setup from drifting into three different snowflakes nobody can rebuild.

Layer 1 - Cloudflare: the one edge in front of everything

What it stops: volumetric attacks (DDoS), known-bad bots, exposing our origins to the public internet - and a whole cloud going down.

The big idea. Cloudflare isn't just a CDN sitting in front of a cloud load balancer - it is the load balancer, the WAF, the TLS terminator, the redirect engine, and the security-header layer, all at once. Running those same controls again on a per-cloud load balancer would be paying twice for them, so we don't: one edge (Cloudflare's free plan plus the ~$5/mo Load Balancing add-on) does all of it, in front of three interchangeable origins.

MITRE ATT&CK mitigated: T1498 (Network Denial of Service), T1499 (Endpoint Denial of Service), T1595 (Active Scanning).

How it works in plain English. When you type csoh.org in your browser, the DNS lookup returns a Cloudflare IP - not ours. Your browser opens a TLS connection to Cloudflare. Cloudflare looks at the request and one of three things happens:

  1. The page is cached at Cloudflare's edge. Cloudflare returns the cached response directly. We never see this request. (For a static site like ours, this is most traffic.)
  2. The page isn't cached. Cloudflare's load balancer picks a healthy origin, opens its own connection to it, and fetches the page on your behalf, then caches the result so the next reader gets the cached version.
  3. The request is bad. Cloudflare's bot mitigation, rate limiting, or threat intelligence flags it; the request is blocked before it ever reaches us.
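The three outcomes above can be sketched as a single decision function. This is a toy model, not Cloudflare's implementation: the function name, the is_flagged input, and the dict-based cache are all assumptions made for illustration.

```python
def handle_at_edge(path, is_flagged, cache, fetch_from_origin):
    """Return (outcome, body) for one request, mirroring the three cases above."""
    if is_flagged:
        return ("blocked", None)           # case 3: never reaches an origin
    if path in cache:
        return ("cache_hit", cache[path])  # case 1: served from the edge
    body = fetch_from_origin(path)         # case 2: fetch from an origin once...
    cache[path] = body                     # ...then keep it for the next reader
    return ("cache_miss", body)


# Usage: a dict stands in for the edge cache, a function for the origin.
cache = {}
origin_calls = []


def fake_origin(path):
    origin_calls.append(path)
    return f"<html>page for {path}</html>"


outcomes = [
    handle_at_edge("/", False, cache, fake_origin)[0],
    handle_at_edge("/", False, cache, fake_origin)[0],
    handle_at_edge("/?id=1' OR 1=1--", True, cache, fake_origin)[0],
]
print(outcomes)           # ['cache_miss', 'cache_hit', 'blocked']
print(len(origin_calls))  # 1
```

Three requests arrive, but the origin is contacted only once; that ratio is why a static site behind a CDN sees so little origin traffic.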

This pattern is called a reverse proxy or CDN. The security wins are big:

The trade-off, worth being honest about: Cloudflare's free-plan WAF is a lighter rule set than a tunable OWASP Core Rule Set. For a static site with no database or login that's an easy trade (see the "What we didn't do" section for how we'd restore parity). Our origins only ever see requests from Cloudflare IPs, not real readers - which is exactly why per-IP rate limiting belongs at Cloudflare's edge, where the real client IP is visible, rather than at the origin.
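To see why the rate limit has to key on the real client IP, here is a minimal fixed-window limiter. It is a sketch only: the class name, the limit of 3 requests per 60 seconds, and the addresses (documentation-range IPs) are hypothetical, and Cloudflare's actual rate-limiting rules work differently under the hood.

```python
from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most max_requests per key in each fixed time window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # (key, window index) -> requests seen; old windows are never pruned here
        self._counts = defaultdict(int)

    def allow(self, key, now):
        window = int(now // self.window_seconds)
        self._counts[(key, window)] += 1
        return self._counts[(key, window)] <= self.max_requests


# At the edge, the key is the real client IP.
edge = FixedWindowRateLimiter(max_requests=3, window_seconds=60)
noisy = [edge.allow("203.0.113.7", now=t) for t in range(4)]
quiet = edge.allow("198.51.100.9", now=3)
print(noisy, quiet)  # [True, True, True, False] True

# At an origin, every request appears to come from a Cloudflare address,
# so the same limiter would lump unrelated readers together.
origin = FixedWindowRateLimiter(max_requests=3, window_seconds=60)
seen_at_origin = [origin.allow("cloudflare-egress", now=t) for t in range(4)]
print(seen_at_origin)  # [True, True, True, False]
```

At the edge, the noisy client is throttled while the quiet one is untouched. At the origin, the fourth request is refused no matter which reader sent it.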

Layer 2 - Three cloud origins, active/active

What it stops: a single cloud (or region) outage taking the site down; vendor lock-in; the cost of running a dedicated cloud load balancer just to get an HTTPS front door.

MITRE ATT&CK mitigated: T1499 (Endpoint Denial of Service, via failover), T1498 (Network Denial of Service), T1195 (Supply Chain Compromise, via not depending on one vendor's pipeline).

The shape. The exact same static site lives on three clouds at once. Cloudflare's Load Balancer holds all three in one pool and uses random origin steering to spread live requests across every healthy origin - this is what "active/active" means: they all serve real traffic simultaneously, not "one live, two on standby." A health monitor probes each origin every minute; any that fails is pulled out of rotation automatically and slipped back in when it recovers.
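A minimal sketch of random steering over a health-checked pool, assuming a simple name-to-health mapping. The pool structure and function name are illustrative, not Cloudflare's API; the real health monitor and steering run inside Cloudflare.

```python
import random


def pick_origin(pool, rng=random):
    """Choose uniformly among the origins currently marked healthy."""
    healthy = [name for name, is_healthy in pool.items() if is_healthy]
    if not healthy:
        raise RuntimeError("no healthy origins in the pool")
    return rng.choice(healthy)


pool = {"aws": True, "gcp": True, "azure": True}

# The health monitor sees a failed probe and pulls one origin out of rotation.
pool["gcp"] = False
picks = {pick_origin(pool) for _ in range(200)}
print(sorted(picks))

# When the probe succeeds again, the origin simply rejoins the pool.
pool["gcp"] = True
```

All healthy origins are eligible on every request, which is the difference between active/active and a primary with standbys.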

Why each origin is shaped the way it is

The one hard requirement: every origin must answer over HTTPS with a valid certificate, so the Cloudflare→origin leg can run at Full (strict) - no unencrypted or unauthenticated hop anywhere. That requirement quietly drives each choice:

Notice the pattern: none of the three needs a cloud load balancer, a managed-cert dance, or a WAF product, because Cloudflare does all of that once at the edge. Each origin is reduced to "the cheapest way this vendor will hand me an HTTPS URL for a folder of files."

The three stacks, side by side

Each cloud ends up with a deliberately different shape, because each vendor's cheapest path to a valid-HTTPS origin is different. Here is the full stack on each, end to end - what serves the bytes, what's exposed, how it gets a cert, how CI publishes to it, and what (if anything) runs code:

[Figure: The AWS, GCP, and Azure deployment stacks side by side. Three columns, one per cloud, each showing its keyless deploy mechanism above its HTTPS origin, fed from a single GitHub Actions build at the top (one build, then keyless OIDC publish to all three clouds) and pulled by the Cloudflare edge at the bottom (health-checks all three and steers traffic to whichever origins are healthy; Full (strict) TLS on every Cloudflare → origin hop; one set of WAF, headers, redirects, and cache for all three).
AWS: OIDC → AssumeRoleWithWebIdentity to the csoh-site-publisher IAM role; s3 sync + CloudFront invalidate; private S3 bucket served through CloudFront (OAC); cert *.cloudfront.net; runs no code.
GCP: OIDC → Workload Identity Federation to the csoh-deployer service account; build, Trivy scan, Cloud Run deploy; Cloud Run (nginx container), ingress=all, scale-to-zero; cert *.run.app; runs as a zero-IAM service account.
Azure: OIDC → Entra federated credential (no client secret stored); az storage blob sync → $web; Storage Account static website, $web container only; cert *.web.core.windows.net; runs no code.]
Each cloud's deployment end to end: one build fans out to three keyless (OIDC) publish jobs, each landing on that vendor's cheapest valid-HTTPS origin; the Cloudflare edge pulls from whichever origins are healthy.
| Aspect | AWS | GCP | Azure |
|---|---|---|---|
| Serves the bytes | Private S3 bucket behind a CloudFront distribution | Cloud Run running our nginx container (scale-to-zero) | Storage Account static website ($web container) |
| Public surface | Only the CloudFront URL; bucket blocks all public access (OAC-keyed to the distribution) | The *.run.app URL (ingress = all) | Only the $web endpoint; every other blob stays private |
| Origin TLS cert | *.cloudfront.net (AWS-managed) | *.run.app (Google-managed) | *.web.core.windows.net (Azure-managed) |
| Keyless deploy auth | OIDC → sts:AssumeRoleWithWebIdentity → IAM role csoh-site-publisher | OIDC → WIF → impersonate csoh-deployer service account | OIDC → Entra federated credential on an app registration (no client secret) |
| Deploy permission scope | Write the one bucket + invalidate the one distribution | Push to Artifact Registry + deploy Cloud Run revisions | Storage Blob Data Contributor on the one account |
| How CI publishes | aws s3 sync --delete + CloudFront invalidate | docker build → Trivy scan → push immutable tag → gcloud run deploy | az storage blob sync into $web |
| Runs code? | No - static objects, no runtime identity to abuse | Yes (nginx) - runs as a zero-IAM service account | No - static objects, no runtime identity to abuse |
| Why this shape | S3's own website endpoint is HTTP-only, so CloudFront is the cheapest way to get a valid-HTTPS, private origin | Cloud Run gives HTTPS + a cert for free and idles to ~$0, so it needs no load balancer in front | The $web endpoint is HTTPS out of the box - the simplest valid origin of the three, no compute at all |

The throughline: we let each cloud do the one thing it does cheapest, and pushed everything else (TLS to the browser, caching, WAF, redirects, headers, failover) up to the single Cloudflare edge. That's why two origins run zero code and the third runs a zero-permission container - the less each origin is trusted to do, the smaller the blast radius if any one of them is ever compromised.

One subtlety: the Host header

Each origin answers on its own hostname (…cloudfront.net, …run.app, …web.core.windows.net). If Cloudflare forwarded the public Host: csoh.org to them, each would reject the request - it doesn't recognize that name. So every origin in the Cloudflare pool sets a Host-header override to its own hostname. Small detail, but it's the thing that most often trips people up the first time they put object storage behind a proxy.
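In code terms, the override amounts to swapping one header before the request goes to the origin. The sketch below is illustrative only: the function name is made up and the origin hostnames are placeholders, not our real ones. In production this is a per-origin setting in the Cloudflare pool, not code we run.

```python
def build_origin_request_headers(client_headers, origin_hostname):
    """Copy the client's headers, replacing Host with the origin's own hostname."""
    headers = {k: v for k, v in client_headers.items() if k.lower() != "host"}
    headers["Host"] = origin_hostname
    return headers


# Placeholder origin hostnames, for illustration only.
ORIGINS = {
    "aws": "example-distribution.cloudfront.net",
    "gcp": "example-service.run.app",
    "azure": "exampleaccount.web.core.windows.net",
}

incoming = {"Host": "csoh.org", "Accept": "text/html"}
for name, hostname in ORIGINS.items():
    print(name, build_origin_request_headers(incoming, hostname)["Host"])
```

The reader still asked for csoh.org; only the hop from Cloudflare to the origin carries the origin's own name.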

WAF, rate limiting, redirects, and caching - set once at the edge

All of it lives at Cloudflare, set once and applied no matter which origin serves the response:

Layer 3 - TLS (the lock icon in the browser)

What it stops: someone reading or modifying the page in transit between the user's browser and us.

MITRE ATT&CK mitigated: T1557 (Adversary-in-the-Middle), T1040 (Network Sniffing), T1565.002 (Transmitted Data Manipulation).

What's TLS? TLS (Transport Layer Security) is the modern name for what people called SSL - the encryption layer that makes URLs https:// instead of http://. Two computers establish a TLS connection, the server proves its identity with a certificate, both sides agree on a shared secret, and from there everything is encrypted. The lock icon in the browser is the user-facing signal.

Our setup has TLS at two separate layers, which often confuses people the first time they see it:

  1. Browser ↔ Cloudflare. Cloudflare's "Universal SSL" certificate, valid for csoh.org and www.csoh.org. This is the cert your browser actually validates and shows the lock icon for. It auto-renews on Cloudflare's normal cadence; we don't manage it.
  2. Cloudflare ↔ each origin, at Full (strict). A separate TLS connection on the back side of Cloudflare to whichever origin it picked. Each origin presents its own provider-managed cert (CloudFront's *.cloudfront.net, Cloud Run's *.run.app, Azure's *.web.core.windows.net), and Cloudflare's SSL/TLS mode is set to Full (strict), meaning it validates that origin cert rather than blindly trusting it. There is no unencrypted or unauthenticated hop anywhere in the path.

Why two layers and not just one? Because Cloudflare terminates the browser's connection, and it doesn't hold the origins' private keys (nor do the origins' certs cover csoh.org). Cloudflare obtains its own browser-trusted cert for csoh.org from a public certificate authority; each origin presents a cert its own cloud provider issued and renews. Each cert covers what its owner can prove they control, and neither side has to share secrets.
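The difference between Cloudflare's "Full" and "Full (strict)" modes is whether the origin's certificate is actually validated. As a rough analogy (this is Python's standard ssl module, not Cloudflare's code, and the function names are ours), the two client configurations look like this:

```python
import ssl


def full_strict_style_context():
    """Encrypts AND validates: certificate chain and hostname must check out."""
    return ssl.create_default_context()


def full_non_strict_style_context():
    """Encrypts but trusts any certificate, which is what strict mode avoids."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE  # accept any cert, even self-signed
    return ctx


strict = full_strict_style_context()
lax = full_non_strict_style_context()
print(strict.verify_mode == ssl.CERT_REQUIRED, strict.check_hostname)  # True True
print(lax.verify_mode == ssl.CERT_NONE, lax.check_hostname)            # True False
```

Both contexts encrypt the connection. Only the first would notice an impostor presenting the wrong certificate on the back side of the proxy.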

The hardening details

Other security headers - set once at the edge

HSTS is the highest-impact header but not the only one. With three different origins, having each one set headers identically would be three places to drift out of sync - so we set them once at Cloudflare (a response-header Transform Rule). Whichever cloud serves the bytes, the response carries the same headers:

You can see all of these by running curl -I https://csoh.org/ from any terminal. They're public; that's the point.
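If you want to check the HSTS header programmatically rather than by eye, a small parser is enough. The sketch below is an assumption-laden example: the sample header value is hypothetical (run the curl command for the real one), the function names are ours, and the thresholds follow the published HSTS preload list criteria (max-age of at least one year, includeSubDomains, preload).

```python
def parse_hsts(value):
    """Split a Strict-Transport-Security value into a dict of directives."""
    directives = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Valueless directives (includeSubDomains, preload) are stored as True.
        directives[name.strip().lower()] = arg.strip().strip('"') or True
    return directives


def is_preload_ready(value, min_max_age=31536000):
    """True if the header has a long max-age, includeSubDomains, and preload."""
    directives = parse_hsts(value)
    try:
        max_age = int(directives.get("max-age", 0))
    except (TypeError, ValueError):
        return False
    return (
        max_age >= min_max_age
        and "includesubdomains" in directives
        and "preload" in directives
    )


# Hypothetical header values for illustration.
print(is_preload_ready("max-age=63072000; includeSubDomains; preload"))  # True
print(is_preload_ready("max-age=300"))                                   # False
```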

Layer 4 - The origins themselves (S3, Cloud Run, Blob)

What it stops: reaching data an origin shouldn't expose; over-permissive cloud access if an origin were ever compromised.

MITRE ATT&CK mitigated: T1530 (Data from Cloud Storage), T1078.004 (Valid Accounts: Cloud Accounts), T1098.003 (Account Manipulation: Additional Cloud Roles).

Each origin is just "the cheapest HTTPS front door this cloud offers for a folder of static files" - but each is locked down so the only thing reachable is the site itself.

Each origin exposes only the site, nothing else

The compute origin's identity has zero permissions

The GCP origin is the one that runs code (nginx in a container), so it's the one that needs an identity. Every workload in GCP runs as a service account - an identity that holds the cloud permissions for whatever code is using it. A misconfigured workload SA is one of the most common cloud security mistakes: people grant "Editor" to the application's identity "to make it work," and now every CVE in the application is potentially also a path to "rewrite all the GCP resources in this project."

Our application is static nginx that makes no GCP API calls. So we created a dedicated service account (csoh-run-runtime) with zero IAM roles and run the container as that identity. If the container were ever compromised - RCE in nginx, malicious bytes in the image, anything - the attacker gets a foothold in a process that can't talk to anything else in the cloud project. Its blast radius is the container itself; that's the whole point. (The object-storage origins, AWS and Azure, run no code at all, so there's no runtime identity to abuse there - the attack surface is just "static files served read-only.")

This is a practical example of least privilege, one of the core ideas behind zero trust, applied to your own application: don't grant your code anything you can't justify, and "I might need it later" is not a justification.

Layer 5 - The bytes we ship to each origin

What it stops: shipping malicious or vulnerable bytes to production by accident - or accidentally publishing a file that should never be public.

MITRE ATT&CK mitigated: T1195.002 (Compromise Software Supply Chain), T1525 (Implant Internal Image), T1552.001 (Unsecured Credentials in Files).

Two origin types, two flavors of "what we ship." The GCP origin ships a container image (nginx + the site), so it gets the full container supply-chain treatment below. The object-storage origins (AWS, Azure) ship a folder of files - so for them, "supply chain" means making sure that folder contains only what's meant to be public.

0. The object-storage origins: an allowlist, not request-time blocking

An nginx origin can keep sensitive files (dotfiles, .py scripts, internal .json, anything with a key in it) present in the container but blocked at request time by nginx rules. Object storage, as we use it here, has no request-time rules - whatever you upload gets served. So for the object-storage origins we flip the model: a single build step (stage_site.sh) stages a dist/ directory containing only the public file set, and that's what gets synced to S3 and Azure. The allowlist (site-publish.filter) mirrors the nginx block rules exactly, and the build fails loudly if a secret-shaped file ever slips into dist/. Not uploading a file is a stronger guarantee than serving it and hoping a deny rule catches every request for it.
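The allowlist-then-verify idea can be sketched in a few lines. To be clear about what is assumed here: the real staging step is the stage_site.sh shell script and the real rules live in site-publish.filter; the Python below, its function names, and its glob patterns are hypothetical stand-ins chosen for illustration.

```python
import fnmatch
import shutil
import tempfile
from pathlib import Path

# Hypothetical patterns; the real rules live in site-publish.filter.
PUBLIC_PATTERNS = ["*.html", "*.css", "*.js", "*.png", "*.svg", "*.xml"]
SECRET_SHAPED = [".*", "*.py", "*.pem", "*.key"]


def stage_site(src, dist):
    """Copy only allowlisted files from src into dist; return what was staged."""
    src, dist = Path(src), Path(dist)
    staged = []
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        if any(part.startswith(".") for part in rel.parts):
            continue  # dotfiles and dot-directories are never staged
        if not any(fnmatch.fnmatch(rel.name, pat) for pat in PUBLIC_PATTERNS):
            continue  # not on the allowlist, so it is simply never copied
        target = dist / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        staged.append(rel.as_posix())
    return sorted(staged)


def fail_on_secret_shaped(dist):
    """Second check: refuse to publish if anything secret-shaped is in dist/."""
    for path in Path(dist).rglob("*"):
        if path.is_file() and any(fnmatch.fnmatch(path.name, p) for p in SECRET_SHAPED):
            raise RuntimeError(f"secret-shaped file in dist/: {path.name}")


with tempfile.TemporaryDirectory() as tmp:
    src, dist = Path(tmp, "repo"), Path(tmp, "dist")
    (src / "css").mkdir(parents=True)
    (src / "index.html").write_text("<h1>hi</h1>")
    (src / "css" / "site.css").write_text("body{}")
    (src / ".env").write_text("TOKEN=not-a-real-token")
    (src / "build_index.py").write_text("print('internal')")
    staged = stage_site(src, dist)
    fail_on_secret_shaped(dist)
    print(staged)  # ['css/site.css', 'index.html']
```

The two checks are deliberately redundant: the allowlist decides what gets copied, and the second pass fails the build if the allowlist itself is ever edited carelessly.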

What's a container image?

A container image is a packaged-up filesystem snapshot - your application code, plus the operating system files it needs to run, frozen as one shippable unit. For the GCP origin we push the image to a registry (Google's Artifact Registry); Cloud Run pulls it from there and runs it. The three controls below protect that image.

What's pinning? Pinning is naming a dependency by something the publisher cannot quietly redefine. A version like nginx:1.27-alpine or actions/checkout@v4 looks specific, but it's just a label - whoever owns it can repoint that label at different bytes tomorrow, and your build will pull the new bytes the next time it runs. Pinning means replacing that label with an immutable identifier - for container images, the SHA-256 digest of the exact bytes (@sha256:65645c…); for GitHub Actions, the full commit SHA (@a1b2c3d…). The label can move; the hash can't. If anyone tampers with the artifact upstream, the hash no longer matches, and the build fails closed instead of silently shipping the new bytes. We pin every external thing our deploy depends on (base image, every GitHub Action, the Cloud Run revision we route traffic to) for exactly this reason: it makes our deploy tamper-evident and reproducible. The trade-off is friction - somebody has to manually update the pin when we want a newer version - but that friction is the feature: an automated supply-chain attack can't propagate to us silently.
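One way to keep the "every GitHub Action is pinned" rule honest is a small lint over the workflow file. The sketch below is a simple line-based check written for illustration (it is not part of our pipeline as described, and the 40-character SHA in the sample is a placeholder, not a real commit).

```python
import re

USES_LINE = re.compile(r"^\s*-?\s*uses:\s*(?P<ref>\S+)")
FULL_SHA = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return every `uses:` reference that is not pinned to a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if not match:
            continue
        ref = match.group("ref")
        if ref.startswith("./"):
            continue  # local actions live in this repo and are reviewed with it
        if not FULL_SHA.search(ref):
            unpinned.append(ref)
    return unpinned


sample_workflow = """\
steps:
  - uses: actions/checkout@v4
  - uses: actions/checkout@0123456789abcdef0123456789abcdef01234567 # placeholder SHA
  - uses: ./.github/actions/local-step
"""
print(unpinned_actions(sample_workflow))  # ['actions/checkout@v4']
```

A tag like @v4 is flagged because its owner can move it; the 40-character commit SHA passes because nobody can.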

The supply chain for our container has three places where an attacker could substitute "what we meant to ship" with "what they preferred to ship." Each one gets a control:

1. The base image we start from

A typical Dockerfile starts with a line like FROM nginx:1.27-alpine, meaning "use whatever the nginx 1.27-alpine image is right now." But "right now" is whatever the registry returns. If someone compromised the registry, or the image owner's account, or any link in their build chain - that FROM line ships a malicious base layer into your image, and you'd never know unless you bought tooling specifically to detect it.

We pin to the content hash instead:

FROM nginx:1.27-alpine@sha256:65645c7bb6a0661892a8b03b89d0743208a18dd2f3f17a54ef4b76fb8e2f2a10

That long string after the @ is the cryptographic hash of the exact bytes we expect. Docker downloads the image, computes the hash, and refuses to use it if the hash doesn't match. The registry can't substitute different bytes without changing the hash, and the changed hash would make our build fail. Tamper-evident.
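The check Docker performs can be pictured as "hash what you received, compare with what you pinned, stop on mismatch". The sketch below is a simplification: real registries compute the digest over the image manifest rather than an arbitrary blob, and the function name and sample bytes here are made up.

```python
import hashlib


def verify_pinned(data, pinned):
    """Raise unless sha256(data) matches a pin of the form 'sha256:<hex>'."""
    algorithm, _, expected = pinned.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected:
        raise RuntimeError(f"digest mismatch: expected {expected}, got {actual}")


# Stand-in bytes for "the image we reviewed"; the pin is computed from them.
known_good = b"the exact bytes we reviewed"
pin = "sha256:" + hashlib.sha256(known_good).hexdigest()

verify_pinned(known_good, pin)  # passes silently

try:
    verify_pinned(b"bytes someone swapped in upstream", pin)
except RuntimeError:
    print("build fails closed")
```

The important property is the failure mode: a mismatch stops the build instead of quietly proceeding with different bytes.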

The trade-off: we have to manually update the hash when we want a newer base image. That's friction by design - it means an automated supply-chain attack doesn't propagate to us silently.

2. Stale packages on top of the pinned base

Pinning the hash freezes the base image's bytes. But the OS packages inside that base image (libssl, libxml2, libpng, etc.) keep getting new security fixes upstream. A digest pinned 6 months ago has 6 months of accumulated CVEs baked in.

We solve this by running apk upgrade immediately after the pinned base, in our Dockerfile:

RUN apk upgrade --no-cache && \
    rm -rf /var/cache/apk/*

That tells the package manager: "fetch the current versions of every installed package, in this build." We start from a known-good snapshot (the digest pin) and end with current security patches (the upgrade). The next control (Trivy scanning) verifies we haven't missed anything.

3. The CI build artifact

Even with both controls above, a newly disclosed CVE might affect a package we just included. So every container we build gets scanned, in CI, by Trivy - an open-source vulnerability scanner. The scan walks every package in the image, cross-references it against public CVE databases, and the build fails if anything HIGH or CRITICAL shows up:

trivy image \
  --exit-code 1 \
  --ignore-unfixed \
  --severity HIGH,CRITICAL \
  "${IMAGE}"

The --ignore-unfixed flag filters CVEs that don't have a fix available yet - those are noise we can't act on, and including them would just train people to ignore the scan output.
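To make the gate's logic explicit, here is the same filter expressed over a simplified version of Trivy's JSON report. Only a few report fields are modelled (Results, Vulnerabilities, VulnerabilityID, Severity, FixedVersion), the CVE IDs are fake, and the function is our own illustration rather than anything Trivy ships.

```python
def blocking_findings(report, severities=("HIGH", "CRITICAL"), ignore_unfixed=True):
    """Return the vulnerability IDs that would fail the build."""
    blocking = []
    for result in report.get("Results", []):
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") not in severities:
                continue  # below the severity threshold
            if ignore_unfixed and not vuln.get("FixedVersion"):
                continue  # no fix published yet, so nothing we can act on
            blocking.append(vuln["VulnerabilityID"])
    return blocking


# Fake findings for illustration.
sample_report = {
    "Results": [
        {
            "Target": "example-image (alpine)",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL", "FixedVersion": "1.2.3"},
                {"VulnerabilityID": "CVE-0000-0002", "Severity": "HIGH", "FixedVersion": ""},
                {"VulnerabilityID": "CVE-0000-0003", "Severity": "LOW", "FixedVersion": "4.5.6"},
            ],
        }
    ]
}

findings = blocking_findings(sample_report)
print(findings)                   # ['CVE-0000-0001']
print("exit code:", 1 if findings else 0)  # exit code: 1
```

Only the first finding blocks: the second has no fix available and the third is below the severity threshold.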

If the scan passes, the image is pushed to Artifact Registry (Google's container registry) with two important properties:

The Artifact Registry repo also has a retention policy: keep the 30 most recent images, delete untagged ones older than 7 days. We keep enough history for any sane rollback without paying for unbounded storage.
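Here is one reading of that retention policy as a decision function, assuming the "keep" rule takes precedence over the "delete" rule. The data shape, function name, and dates are invented for the example; the real policy is configured on the repository, not run as code by us.

```python
from datetime import datetime, timedelta, timezone


def images_to_delete(images, now, keep_recent=30, untagged_max_age=timedelta(days=7)):
    """Pick digests to delete: old untagged images outside the newest keep_recent."""
    newest_first = sorted(images, key=lambda img: img["uploaded"], reverse=True)
    protected = {img["digest"] for img in newest_first[:keep_recent]}
    doomed = []
    for img in newest_first:
        if img["digest"] in protected:
            continue  # the keep rule wins over the delete rule
        if not img["tags"] and now - img["uploaded"] > untagged_max_age:
            doomed.append(img["digest"])
    return doomed


# Hypothetical inventory, using keep_recent=2 to keep the example small.
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
inventory = [
    {"digest": "a", "tags": ["v3"], "uploaded": now - timedelta(days=1)},
    {"digest": "b", "tags": [], "uploaded": now - timedelta(days=2)},
    {"digest": "c", "tags": [], "uploaded": now - timedelta(days=10)},
    {"digest": "d", "tags": ["v1"], "uploaded": now - timedelta(days=20)},
]
print(images_to_delete(inventory, now, keep_recent=2))  # ['c']
```

Image "b" survives despite being untagged because it is among the most recent; "d" survives because it is tagged; only the old, untagged "c" is removed.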

Layer 6 - How we deploy to three clouds without a saved password

What it stops: credential theft from the deploy pipeline. Historically one of the most common ways websites get compromised - and with three clouds, three times the credentials that don't exist to steal.

MITRE ATT&CK mitigated: T1552.001 (Credentials In Files), T1552.004 (Private Keys), T1528 (Steal Application Access Token).

This is the most consequential design choice on this page. Read it carefully. The naïve way to deploy to three clouds would be to store three sets of long-lived credentials (an AWS access key, a GCP service-account JSON, an Azure client secret) in GitHub Secrets - tripling the blast radius of a leaked secrets store. We store none of them. Every cloud is reached with keyless OIDC federation: the same idea, implemented three times. We'll walk through the GCP version in detail because it's representative, then show how AWS and Azure do the identical dance.

The traditional approach (don't do this)

Most CI/CD pipelines deploy by storing a long-lived credential in a secret manager. For GCP, that means a service account key - a JSON file with cryptographic material that proves "I am this service account." Each workflow run reads the JSON from secrets, presents it to GCP, and uses the resulting access. This works. It's also the source of countless real breaches, because:

What we do instead: Workload Identity Federation

Workload Identity Federation (WIF) replaces "stored credential" with "prove who you are at the moment you ask for access."

[Figure: WIF token-exchange sequence diagram. Participants: the GitHub Actions runner (the workflow), GitHub Identity (the OIDC issuer), Google Cloud STS (Security Token Service), and GCP resources (Cloud Run, Artifact Registry, etc.). Steps:
  1. The runner asks GitHub Identity: "give me an OIDC token for this run" (requires permissions: id-token: write).
  2. GitHub returns a signed JWT with claims: repository, ref, workflow, run_id.
  3. The runner asks STS: "exchange this OIDC token for a GCP access token" (workloadidentity.exchangeToken). STS runs its policy check: assertion.repository == 'CloudSecurityOfficeHours/csoh.org'.
  4. STS returns a 1-hour GCP access token, scoped to impersonate the csoh-deployer service account.
  5. The runner pushes the image and deploys the revision (the token expires in 1 hour).
No stored credential appears anywhere in this flow.]
How a GitHub Actions run gets a Google Cloud access token without any stored password. The whole exchange takes a few hundred milliseconds at the start of every workflow run; the resulting token expires after one hour.

Walking the diagram step-by-step:

  1. Our GitHub Actions workflow runs. As part of starting up, GitHub mints a short-lived OIDC token for that specific workflow run. The token is signed by GitHub's identity service and includes verifiable claims: this is repo CloudSecurityOfficeHours/csoh.org, on branch main, in workflow deploy.yml, workflow run #12345.
  2. The workflow hands that token to Google Cloud's STS (Security Token Service), saying "exchange this for an access token, please."
  3. Google Cloud's STS checks: do I trust GitHub's identity service as a token issuer? Yes (we configured it to). And does this token's repository claim match the policy I've set? Our policy says it must equal exactly CloudSecurityOfficeHours/csoh.org. Yes.
  4. STS returns a 1-hour Google Cloud access token, scoped to impersonating one specific service account - csoh-deployer, our deploy-only identity.
  5. The workflow uses that token to push containers and deploy revisions. After 1 hour, the token expires.
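
The check in step 3 can be sketched in a few lines of Python. This is a toy stand-in for the exchange, not Google's implementation: it skips the signature verification that the real STS performs against GitHub's published keys, and the names exchange, decode_claims, and make_unsigned_jwt are hypothetical helpers invented for this illustration. The issuer URL, the repository claim, and the 1-hour lifetime come from the steps above.

```python
import base64
import json

EXPECTED_ISSUER = "https://token.actions.githubusercontent.com"
EXPECTED_REPOSITORY = "CloudSecurityOfficeHours/csoh.org"
TOKEN_LIFETIME_SECONDS = 3600  # the 1-hour lifetime from step 4


def decode_claims(jwt: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature.

    The real STS verifies the signature first; this sketch does not.
    """
    payload_b64 = jwt.split(".")[1]
    padding = "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64 + padding))


def exchange(jwt: str, now: float) -> dict:
    """Toy stand-in for step 3 and step 4: check claims, return a short-lived grant."""
    claims = decode_claims(jwt)
    if claims.get("iss") != EXPECTED_ISSUER:
        raise PermissionError("untrusted issuer")
    if claims.get("repository") != EXPECTED_REPOSITORY:
        raise PermissionError("repository claim does not match policy")
    if claims.get("exp", 0) <= now:
        raise PermissionError("OIDC token already expired")
    return {
        "service_account": "csoh-deployer",
        "expires_at": now + TOKEN_LIFETIME_SECONDS,
    }


def make_unsigned_jwt(claims: dict) -> str:
    """Build a fake, unsigned token. For this demo only."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{b64({'alg': 'none'})}.{b64(claims)}."


if __name__ == "__main__":
    now = 1_700_000_000
    ours = make_unsigned_jwt(
        {"iss": EXPECTED_ISSUER, "repository": EXPECTED_REPOSITORY, "exp": now + 300}
    )
    print(exchange(ours, now))

    fork = make_unsigned_jwt(
        {"iss": EXPECTED_ISSUER, "repository": "someone-else/csoh.org", "exp": now + 300}
    )
    try:
        exchange(fork, now)
    except PermissionError as err:
        print("rejected:", err)
```

The point of the sketch is the shape of the decision: the token is accepted only if the issuer is trusted and the repository claim matches exactly, and what comes back is a grant with an expiry rather than a reusable secret.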

Crucially: there is no JSON key anywhere in this flow. There's nothing for a leaked GitHub secret to reveal - the deploy auth is created on-demand, scoped to one workflow run, and discarded. If a workflow log somehow leaked a token, the attacker would hold an access token that's valid for at most one hour and scoped to "deploy to this one project," and they'd have to use it before it expired. There's nothing to rotate, because there's nothing stored.

The Terraform that wires this up is in gcp/wif.tf - about 30 lines.

The same pattern, on AWS and Azure

The exchange above isn't a GCP feature - it's the OIDC federation standard, and every major cloud speaks it. So the AWS and Azure publish jobs do the identical dance, just with each cloud's nouns:

Three clouds, three short-lived tokens minted on demand, zero stored credentials. A leaked GitHub secrets store would reveal nothing useful, because the deploy auth for every cloud is created per-run and discarded.
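
What each cloud ends up matching on is a claim inside GitHub's token. One commonly pinned claim is the subject (sub), which GitHub formats as repo:OWNER/REPO:environment:NAME when the job declares an environment and repo:OWNER/REPO:ref:REF otherwise. The sketch below builds those strings and does an exact-match trust check. It is an illustration only: github_oidc_subject and trust_allows are hypothetical helpers, and whether our AWS role and Azure credential pin sub, repository, or both is an assumption to check against the Terraform rather than something this snippet establishes.

```python
from typing import Optional


def github_oidc_subject(
    repo: str,
    environment: Optional[str] = None,
    ref: Optional[str] = None,
) -> str:
    """Build the `sub` claim GitHub issues for a workflow job.

    A job that declares an environment gets the environment form;
    otherwise the subject carries the git ref.
    """
    if environment:
        return f"repo:{repo}:environment:{environment}"
    if ref:
        return f"repo:{repo}:ref:{ref}"
    raise ValueError("need an environment or a ref")


def trust_allows(pinned_subject: str, presented_subject: str) -> bool:
    """Exact-match trust check: no wildcards, no prefix matching."""
    return pinned_subject == presented_subject


if __name__ == "__main__":
    repo = "CloudSecurityOfficeHours/csoh.org"
    pinned = github_oidc_subject(repo, environment="production")
    print(pinned)

    # A run on a feature branch with no environment presents a different subject.
    other = github_oidc_subject(repo, ref="refs/heads/feature-x")
    print(trust_allows(pinned, pinned), trust_allows(pinned, other))
```

Exact matching is the design choice worth copying: a wildcard in the pinned subject widens the set of workflow runs that can obtain a token.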

The deploy identities have narrow permissions

Each cloud's deploy identity can do exactly what it needs to publish, and nothing else. The GCP csoh-deployer service account can do exactly three things:

It can't read other GCP projects, disable logging, or escalate to admin. The AWS role and Azure principal are scoped just as tightly: write-one-bucket-and-invalidate-one-distribution, and write-one-storage-account, respectively.

And remember from Layer 4: the GCP runtime identity (csoh-run-runtime) the deployer sets on the running container has zero permissions. So even an attacker who compromises the deploy identity AND uses it to ship a malicious container to production… ends up with a malicious container that has no cloud access. The blast radius is bounded at every layer.

Layer 7 - Protecting the deploy pipeline itself

What it stops: a malicious or accidental change to the workflow that does the deploying.

MITRE ATT&CK mitigated: T1195.001 (Compromise Software Dependencies and Development Tools), T1199 (Trusted Relationship), T1078 (Valid Accounts).

The deploy workflow file can ship code to three production clouds. That makes the file itself as sensitive as a production secret - anyone who can change it can change what gets shipped, everywhere. We protect it with multiple layers on the GitHub side:

Walking the workflow (deploy.yml) - it builds once, then fans out to three publish jobs in parallel:

  1. Triggers on pushes to main matching specific paths (HTML, CSS, JS, Dockerfile, nginx.conf, the staging script, the workflow file itself), plus workflow_dispatch for manual runs.
  2. Concurrency group deploy with cancel-in-progress: true - if a newer commit lands while an older deploy is mid-flight, the older one is cancelled. Prevents the stale-content race where an older deploy publishes after a newer one finishes.
  3. Permissions block scopes the auto-injected GITHUB_TOKEN to contents: read + id-token: write (the latter is what lets each cloud's OIDC exchange happen). Nothing else.
  4. build job - regenerates the search index, runs stage_site.sh to produce the public dist/ folder, and uploads it as an artifact. One build, so all three origins serve byte-identical content.
  5. publish-aws - assumes the IAM role via OIDC, aws s3 sync --delete of dist/ to the bucket, then a CloudFront invalidation.
  6. publish-azure - logs in via the Entra federated credential, az storage blob sync of dist/ into $web (sync handles deletions too).
  7. publish-gcp - the container path: WIF auth, docker build with org.opencontainers.image.* labels, Trivy scan (fails on HIGH/CRITICAL), push to Artifact Registry under a single immutable hash-based tag, then gcloud run deploy. The push is idempotent (it skips if the tag already exists) so a rerun doesn't trip over immutable_tags=true. There's no cache-invalidation step in the deploy - Cloudflare caches at the edge and is purged separately.
  8. Every job declares environment: production, so a fork PR can't run any of them even if it could mint an OIDC token - the production environment is scoped to main, and runs from any other ref are refused.
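
The idempotent push in step 7 is easier to see in code. The sketch below is a minimal in-memory model, not our workflow: ImmutableRegistry, content_tag, and push_if_absent are hypothetical names, and deriving the tag from a SHA-256 of the image bytes is an assumption (the real tag could just as well come from the commit hash).

```python
import hashlib
from typing import Dict, Tuple


class ImmutableRegistry:
    """In-memory stand-in for a registry with immutable tags turned on."""

    def __init__(self) -> None:
        self._tags: Dict[str, bytes] = {}

    def has_tag(self, tag: str) -> bool:
        return tag in self._tags

    def push(self, tag: str, image: bytes) -> None:
        # With immutable tags, overwriting an existing tag is an error.
        if tag in self._tags:
            raise RuntimeError(f"tag {tag!r} already exists and is immutable")
        self._tags[tag] = image


def content_tag(image: bytes) -> str:
    """Derive a tag from the image bytes (assumed scheme, for illustration)."""
    return "sha-" + hashlib.sha256(image).hexdigest()[:12]


def push_if_absent(registry: ImmutableRegistry, image: bytes) -> Tuple[str, bool]:
    """Push only when the tag is missing. Returns (tag, pushed)."""
    tag = content_tag(image)
    if registry.has_tag(tag):
        return tag, False  # rerun of the same build: nothing to do
    registry.push(tag, image)
    return tag, True


if __name__ == "__main__":
    registry = ImmutableRegistry()
    image = b"pretend this is a container image"
    print(push_if_absent(registry, image))  # first run pushes
    print(push_if_absent(registry, image))  # rerun skips instead of failing
```

Checking before pushing is what lets a rerun of the same commit succeed: without the check, the second push would hit the immutable-tag error and fail the job for no useful reason.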

Layer 8 - Logging and what we'd see during an attack

What it stops: nothing directly - but it's how we'd notice if any of the layers above failed.

MITRE ATT&CK detection coverage: T1190 (Exploit Public-Facing Application, via WAF block logs), T1098 (Account Manipulation, via IAM change logs), T1078.004 (Valid Accounts: Cloud Accounts, via the audit log stream).

Even with everything above, you should assume something will eventually go wrong. A vulnerability in nginx, a leaked credential we didn't anticipate, a configuration drift no one caught - any of these can happen. Logging is what turns "an attack happened" into "an attack happened, here's exactly when, here's exactly what they did, and here's what we need to fix." Without logs, you can't answer any of those questions.

With three clouds, logging lives in two places. The edge - where total traffic is visible - is Cloudflare's zone analytics and Load Balancer health dashboards: requests, cache hit ratio, WAF blocks, and which origins are healthy. The origins log only what got past Cloudflare's cache and actually reached them. The GCP origin keeps the deepest forensic trail, because that's where the IAM and audit story lives.

By default, Google Cloud Logging keeps logs for 30 days. That's not enough for security work - supply-chain attacks specifically are often discovered months after the fact, and the logs you'd need for forensics are gone. We define a custom 400-day retention bucket and a log sink that routes the security-relevant events into it (see gcp/logging.tf). The filter captures three categories:

(resource.type="cloud_run_revision" AND httpRequest.status>=400)
OR protoPayload.serviceName="iam.googleapis.com"
OR protoPayload.@type="type.googleapis.com/google.cloud.audit.AuditLog"
  1. Every 4xx and 5xx from the Cloud Run origin. Tells us about errors and probes that reached this origin (rather than being served from cache or handled at the edge). Helpful for both performance triage and abuse detection.
  2. Every IAM change. If anyone modifies a permission, grants a role, creates a service account - we have it. The single highest-leverage admin event in any cloud project; you almost always want to know about it before the audit happens.
  3. The full audit log stream. Every API call against this project, who made it, when, with what outcome.
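
To make the filter's logic concrete, here is the same three-clause OR written as a Python predicate over log entries represented as plain dictionaries. It is a reading aid under stated assumptions, not how Cloud Logging evaluates filters: matches_security_sink is a hypothetical function, and the sample entries are made up and far smaller than real log entries.

```python
AUDIT_LOG_TYPE = "type.googleapis.com/google.cloud.audit.AuditLog"


def matches_security_sink(entry: dict) -> bool:
    """Mirror the sink filter: origin errors OR IAM calls OR audit logs."""
    resource_type = entry.get("resource", {}).get("type")
    status = entry.get("httpRequest", {}).get("status", 0)
    proto = entry.get("protoPayload", {})

    is_origin_error = resource_type == "cloud_run_revision" and status >= 400
    is_iam_call = proto.get("serviceName") == "iam.googleapis.com"
    is_audit_log = proto.get("@type") == AUDIT_LOG_TYPE
    return is_origin_error or is_iam_call or is_audit_log


if __name__ == "__main__":
    samples = {
        "404 from the origin": {
            "resource": {"type": "cloud_run_revision"},
            "httpRequest": {"status": 404},
        },
        "200 from the origin": {
            "resource": {"type": "cloud_run_revision"},
            "httpRequest": {"status": 200},
        },
        "IAM role grant": {
            "protoPayload": {
                "@type": AUDIT_LOG_TYPE,
                "serviceName": "iam.googleapis.com",
            },
        },
    }
    for label, entry in samples.items():
        print(f"{label}: {matches_security_sink(entry)}")
```

Successful 2xx requests are deliberately left out of the long-retention bucket; only the three categories above are routed there.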

WAF blocks and per-request edge logs live in Cloudflare, at the layer that does the blocking. (Cloudflare's free tier keeps less log history than a paid plan or our GCP sink would, which is part of the trade noted in "What we didn't do.")

What we don't have (yet): a SIEM, real-time alerting, or anomaly detection. For a community site the cost/benefit doesn't justify it; for a production SaaS workload, you'd want this same sink plus an export to BigQuery for long-term analytics or Pub/Sub for streaming detection. We also keep a Cloud Monitoring dashboard for the GCP origin's day-to-day metrics - request rate, latency percentiles, instance count (defined in gcp/monitoring.tf, "csoh.org Origin" in the GCP console) - and watch Cloudflare's analytics for the whole-site view.

What we didn't do (and why)

Listing controls we considered and rejected is more honest than pretending the design is finished. Each of these is a defensible choice for a small static site and a less-defensible choice as the threat surface grows. If you're copying this design for something bigger, this list is your homework.

What this costs to run

Approximate monthly bill at our traffic level (low - we're a community site):

Component                                            Approx / month
Cloudflare Load Balancing add-on (Free plan + LB)    ~$5-7
AWS S3 + CloudFront (free-tier egress)               ~$0-1
GCP Cloud Run (scale-to-zero) + Artifact Registry    ~$0-1
Azure Blob static website                            ~$0-1
Terraform state (GCS) + GCP logging                  < $2
Total                                                ~$8-12/mo

A few honest notes on cost:

If you want to copy this for your own site

This isn't a step-by-step tutorial - it's a checklist. Each item links to the actual file we use, so you can read the working version. The path is roughly:

  1. Pick how many clouds you actually want. The design degrades gracefully: one origin is a normal "static site behind Cloudflare," two gives you failover, three is what we run. Start with one and add origins later - the Cloudflare pool just grows.
  2. Stand up each origin from its Terraform directory. infra/terraform/ has one dir per cloud (aws/, gcp/, azure/, cloudflare/). Each is self-contained: terraform apply in aws/ builds the private bucket + CloudFront + the OIDC role, and so on. See infra/README.md for the per-cloud bootstrap.
  3. Wire the origins into the Cloudflare dir. Feed each origin's hostname (from its terraform output) into cloudflare/ and apply - that creates the Load Balancer, pool, health monitor, security-header + redirect + cache rules.
  4. Copy .github/workflows/deploy.yml and tools/stage_site.sh, and set the per-cloud resource IDs as repo Variables (the README lists exactly which terraform output feeds each one). Every cloud authenticates keyless via OIDC - no secrets to paste.
  5. On GitHub: create a production environment scoped to main; create a CODEOWNERS file requiring your review on workflow + infra paths; turn on branch protection, secret scanning, and Dependabot.
  6. Cut over safely. Verify each origin directly, add them to the Cloudflare LB with the old origin kept as a fallback, then flip DNS - the README has the staged cutover + rollback runbook.
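
For the "verify each origin directly" part of step 6, one simple check is to fetch the same path from every origin and compare hashes, since a single build is supposed to leave all origins serving byte-identical content. The sketch below assumes that approach; it is not the runbook in the README. The hostnames are placeholders, and origin_digests, all_identical, and http_fetch are hypothetical helpers.

```python
import hashlib
import urllib.request
from typing import Callable, Dict, Iterable


def http_fetch(url: str) -> bytes:
    """Fetch a URL body with the standard library."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def origin_digests(
    origins: Iterable[str],
    path: str,
    fetch: Callable[[str], bytes] = http_fetch,
) -> Dict[str, str]:
    """Return {origin hostname: SHA-256 hex digest of the body at `path`}."""
    return {
        origin: hashlib.sha256(fetch(f"https://{origin}{path}")).hexdigest()
        for origin in origins
    }


def all_identical(digests: Dict[str, str]) -> bool:
    """True when every origin returned exactly the same bytes."""
    return len(set(digests.values())) == 1


if __name__ == "__main__":
    # Placeholder hostnames: substitute the values from each terraform output.
    origins = [
        "aws-origin.example.com",
        "gcp-origin.example.com",
        "azure-origin.example.com",
    ]

    # Offline demo with a fake fetcher, so this runs without network access.
    def fake_fetch(url: str) -> bytes:
        return b"<html>same build everywhere</html>"

    digests = origin_digests(origins, "/index.html", fetch=fake_fetch)
    print(all_identical(digests))
```

Dropping the fetch argument makes it use real HTTPS requests. A mismatch before cutover usually means one origin is still serving an older deploy.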

Realistic time investment if you've never used Terraform before: a weekend to read everything carefully and a day to stand up all three clouds. If you only want one origin to start: an evening. If you hit a wall, bring it to Friday Zoom.

Further reading

Questions?

Bring them to Friday Zoom. Several of our regulars run nontrivial GCP setups (multi-project orgs, Binary Authorization in production, signed build artifacts) and are happy to walk through specifics for your environment. The meeting recaps often include cloud-deployment war stories.