The 30-second version: A CI/CD pipeline takes a developer's commit and turns it into running cloud infrastructure - automatically, repeatably, and with a security boundary at every step. CI (continuous integration) builds and tests the change; CD (continuous delivery / deployment) ships it to a cloud account.
The modern shape: a git push triggers a pipeline runner (GitHub Actions, GitLab CI, Jenkins, CodePipeline, Cloud Build, Azure DevOps), which builds an artifact, runs tests and security scans, then assumes a short-lived cloud role via OIDC federation - no long-lived access keys anywhere - and applies the change. Every step is in version control. Every step is auditable. The pipeline is the only identity allowed to change production.
On this page
- What CI/CD actually is
- Why it matters for cloud deployments
- The anatomy of a modern pipeline
- OIDC federation - replacing long-lived keys
- Secrets in pipelines
- AWS, Azure, and GCP side-by-side
- AWS - CodePipeline, OIDC, and the IAM model
- Azure - Azure DevOps, GitHub Actions, Workload Identity
- GCP - Cloud Build and Workload Identity Federation
- Deployment strategies - blue/green, canary, progressive
- Securing the pipeline itself
- Infrastructure-as-code in the pipeline
- Bootstrapping path
- Common pitfalls
- Further reading
- FAQ
What CI/CD actually is
Strip the marketing off and CI/CD is two related practices, glued together by an automated pipeline.
Continuous Integration (CI) is the practice of merging code changes into a shared branch frequently - many times per day per developer - and having an automated build and test run on every merge. The point is that problems surface in minutes, not at the end of a multi-week integration phase. CI is what makes "main is always green" a realistic goal.
Continuous Delivery (CD) extends the pipeline so every passing build is automatically packaged into a deployable artifact and shipped to a staging environment. A human still clicks "release to production." Continuous Deployment goes one step further: every passing build is automatically released to production, with the pipeline's automated checks and progressive rollout as the safety net.
In practice, the acronym "CD" stretches over both. What matters is the pipeline shape: commit → build → test → scan → deploy, automated end-to-end, with the cloud-side credentials short-lived and tightly scoped.
None of this is cloud-specific in principle. It became essential for cloud because cloud infrastructure is itself code - every account, every network, every IAM role - and the pipeline is the only sustainable way to keep that code in sync with reality across dozens of environments.
Why it matters for cloud deployments
Picture two teams running the same workload in the cloud. Both deploy to AWS, Azure, or GCP.
Team A deploys from a developer laptop. The laptop has a long-lived access key with broad permissions. The Terraform state lives on someone's hard drive. When the original engineer goes on vacation, nobody knows what version is in prod or how to roll it back. When a credential leaks to GitHub, the blast radius is "everything that engineer had access to" - usually most of the account.
Team B deploys from a pipeline. The pipeline runs in CI, authenticates to the cloud via OIDC for a session that lasts 15 minutes, applies a Terraform plan reviewed in a pull request, and writes the state to a remote backend with locking. Every deploy is in git history. Rolling back is reverting a commit. When a developer's laptop is compromised, the attacker gets nothing useful for production - the developer cannot deploy directly.
The CI/CD pipeline is the difference between those two paths. It's the operational expression of three security principles at once:
- Least privilege - the pipeline's role has exactly the permissions needed for one workload's deploy, and its credentials exist only for the minutes the job runs.
- Separation of duties - humans review code; the pipeline executes it. No human has standing write access to production.
- Auditability - every production change has a commit, a reviewer, and a pipeline run id. The audit answer is "look at git."
The pipeline also pays back hard on day 200, when an org has 50 workloads in 50 accounts. Without it, every workload's deploy is bespoke. With it, every workload's deploy follows the same shape - scan, plan, apply, verify - and a single change to the template improves all 50.
The anatomy of a modern pipeline
Every credible cloud-deploy pipeline has the same eight stages. The runner differs; the shape doesn't.
1. Source & trigger
A git event (push, merge to main, tag, manual dispatch) kicks off the pipeline. The commit SHA is the unit of work - every later stage refers back to it.
2. Build
Compile source. Package into the deploy artifact - a container image, a zip, a Terraform plan, a Helm chart. Pin every dependency version. Cache aggressively.
3. Test
Unit tests, integration tests, contract tests. Fast feedback first; slower end-to-end tests last. Fail closed - a flaky test fails the build rather than being ignored.
4. Scan
SAST on source, SCA on dependencies, secret scanning on the diff, container scanning on the image, IaC scanning on the Terraform / Bicep / CloudFormation. SBOM generation.
5. Sign & attest
Sign the artifact (Cosign / Sigstore). Attach provenance attestations (SLSA) so the deploy target can verify what built it.
6. Authenticate to cloud
OIDC token exchange for short-lived credentials. Scoped to one repo, one branch, one role. No long-lived access keys stored in CI.
7. Deploy
Terraform apply, helm upgrade, gcloud run deploy, az deployment, kubectl apply, ECS / Lambda / Functions update. Progressive rollout where possible (canary, blue/green).
8. Verify & observe
Smoke tests against the deployed environment. Watch error rates, latency, and security findings for a defined window. Auto-rollback on regression.
A pipeline missing any of these stages isn't necessarily wrong - small projects can collapse stages 3, 4, and 5 into one job - but it should be a deliberate choice, not an omission. The gaps are where breaches live.
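The "fail closed" behavior behind the stages above can be sketched in a few lines. This is a minimal illustration, not a real runner: the Stage type and run_pipeline function are hypothetical names, and each stage is assumed to be a callable that returns True on success.

```python
from typing import Callable, List, Tuple

# Hypothetical stage type: a name plus a callable returning True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure (fail closed)."""
    completed: List[str] = []
    for name, step in stages:
        try:
            ok = step()
        except Exception:
            # An error inside a stage counts as a failure, never as a pass.
            ok = False
        if not ok:
            return False, completed
        completed.append(name)
    return True, completed


if __name__ == "__main__":
    demo = [
        ("build", lambda: True),
        ("test", lambda: True),
        ("scan", lambda: False),   # simulated scan finding
        ("deploy", lambda: True),  # never reached
    ]
    print(run_pipeline(demo))
```

The design point is that deploy is unreachable unless every earlier stage returned success, so a skipped or crashed scan cannot silently turn into a release.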
OIDC federation - replacing long-lived keys
If there's one thing to take from this page, it's this: do not store long-lived cloud access keys in your CI system. Use OIDC federation instead.
The pattern works the same way on every cloud:
- Your CI runner (GitHub Actions, GitLab, CircleCI, Buildkite, Jenkins with a plugin) is configured as an OIDC identity provider. Every workflow run gets a unique, short-lived signed JWT describing which repo, which branch, which workflow, which run, which actor.
- The cloud side trusts that issuer for specific subject claims. AWS IAM, Azure Entra, and Google Cloud all support this out of the box.
- When the pipeline needs to deploy, it presents its OIDC token. The cloud verifies the issuer's signature, checks the subject matches a configured trust, and returns short-lived credentials (15 minutes to 1 hour) scoped to a specific role.
- The pipeline uses those credentials, then they expire.
Why this matters:
- No secrets to rotate - there's nothing in CI to rotate. The trust is configured once, in the cloud.
- No secrets to leak - there are no long-lived keys to exfiltrate from a compromised workflow or repo. The OIDC token expires in minutes and is bound to one workflow run.
- Tight scoping - trust can be scoped to repo:myorg/myrepo:ref:refs/heads/main. A workflow from a fork or feature branch cannot assume the production role.
- Audit clarity - every cloud API call carries the OIDC subject in CloudTrail / Azure activity log / Cloud Audit Logs. You can trace every production change back to a specific workflow run.
Long-lived access keys in CI are now considered a legacy pattern - virtually every leak postmortem from the last several years (Travis CI 2022, CircleCI 2023, dependency confusion incidents) features one. OIDC removes the secret entirely.
Trust scoping is the security boundary. When you configure OIDC on the cloud side, the subject claim is what stands between your production role and any random pull request. Scope to ref:refs/heads/main at minimum. Scope to specific environment: claims if your CI supports them. Never accept a wildcard subject - that's equivalent to leaving the role open to every workflow run in your repo, including untrusted fork PRs.
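To make the subject-claim check concrete, here is a rough sketch of the comparison the cloud side performs after it has verified the token's signature. It is an illustration only: is_trusted_subject is a hypothetical helper, signature and expiry validation are assumed to have happened already, and real providers implement this inside their own trust policies rather than in your code.

```python
from typing import Iterable, Mapping


def is_trusted_subject(
    claims: Mapping[str, str],
    allowed_subjects: Iterable[str],
    expected_issuer: str,
) -> bool:
    """Exact-match check on issuer and subject. Signature checks are assumed done."""
    allowed = list(allowed_subjects)
    # Refuse wildcard trust entries outright instead of quietly matching them.
    for entry in allowed:
        if "*" in entry:
            raise ValueError(f"wildcard subject not permitted: {entry!r}")
    if claims.get("iss") != expected_issuer:
        return False
    return claims.get("sub", "") in allowed


if __name__ == "__main__":
    issuer = "https://token.actions.githubusercontent.com"
    trusted = ["repo:myorg/myrepo:ref:refs/heads/main"]
    token = {"iss": issuer, "sub": "repo:myorg/myrepo:ref:refs/heads/feature-x"}
    print(is_trusted_subject(token, trusted, issuer))
```

A token minted for a feature branch carries a different sub value, so it fails the exact match even though it comes from the same repository and issuer.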
Secrets in pipelines - the hierarchy
OIDC (the previous section) handles authenticating the pipeline to the cloud. Most real pipelines also need other secrets: third-party SaaS API tokens, signing keys, database credentials for migration steps, webhook secrets, npm/PyPI publish tokens. The principle: in order of preference, prefer the option that leaves no static secret in CI at all.
- OIDC / workload identity to the destination. If the secret is for a cloud service, federate directly - don't store it. Most cloud APIs and a growing number of SaaS providers (npm, PyPI, Docker Hub via trusted publishing) now accept OIDC tokens directly.
- Vault retrieval at job-start using OIDC. Authenticate to Vault / AWS Secrets Manager / Azure Key Vault / GCP Secret Manager using the pipeline's OIDC identity, fetch the secret, hold it in memory for the job, never persist. The secret rotates in the vault; the pipeline picks up the new value on the next run.
- CI-native secret store, scoped tight. When neither of the above is possible, use the CI provider's secret store with the narrowest scope it supports (environment-scoped, not repo-wide; required approval on the environment).
- Static value in plaintext anywhere. Never. Workflow files, comments, debug logs, Slack DMs to the on-call engineer - this is how every credential-leak postmortem begins.
GitHub Actions
- Repository secrets - flat, available to every workflow in the repo. Lowest control. Use sparingly.
- Environment secrets - scoped to a named environment (e.g., production) that can require reviewers before a job runs, restrict deployment branches, and add a wait timer. The right place for production database credentials, signing keys, paid-API tokens. Environments docs.
- Organization secrets - shared across repos with an allowlist of which repos can see them. Good for SaaS tokens used across many services; still better to migrate to OIDC where supported.
- OIDC + vault - hashicorp/vault-action with method: jwt uses the workflow's GitHub OIDC token to authenticate; aws-actions/aws-secretsmanager-get-secrets after aws-actions/configure-aws-credentials fetches secrets via the assumed role.
- Push protection - GitHub secret-scanning push protection blocks commits containing detected secrets at git push. Enable org-wide.
- Tradeoffs - repository secret = convenient + broad blast radius; environment secret = approval gate + harder to misuse; OIDC = no blast radius but requires the target system to speak OIDC.
GitLab CI
- Masked variables - GitLab redacts the value in job logs. Required for any sensitive var.
- File-type variables - the value is materialized as a file at a path in $variable_name, useful for service-account JSON and certificate bundles that should never appear in env-var output.
- Protected variables - only available to jobs running on protected branches/tags. The GitLab equivalent of GitHub's environment scoping.
- id_tokens - per-job OIDC tokens with configurable audience claims. Works against AWS / Azure / GCP / Vault.
- Vault integration - first-class secrets: keyword in .gitlab-ci.yml that retrieves at job-start via OIDC.
- Secret Detection - GitLab Ultimate ships built-in secret-detection scanners; community equivalents (Gitleaks, TruffleHog) wire in as pre-merge jobs.
Azure DevOps
- Variable groups linked to Azure Key Vault - the pipeline reads variable values from Key Vault at runtime; the pipeline's service connection identity authenticates. Combined with a managed-identity-backed service connection, no static secret exists in Azure DevOps.
- Secret variables - marked-as-secret in pipeline or variable group, redacted in logs. Use sparingly; prefer Key Vault.
- Service connections with workload identity federation - Azure DevOps WIF service connections replace the older SPN-with-secret pattern entirely.
CircleCI, Buildkite, Jenkins, Argo
- CircleCI contexts - shared secret groups with restrictions on which projects can use them; OIDC tokens available per-job for cloud auth.
- Buildkite - OIDC tokens per job; secrets typically retrieved via the AWS/GCP/Vault SDK using the OIDC-derived identity. Buildkite doesn't store secrets natively, which is deliberate.
- Jenkins - Credentials Binding plugin, HashiCorp Vault plugin, AWS Secrets Manager Credentials Provider. Self-hosted Jenkins controllers are themselves high-value targets - the controller's secrets are the most-stolen credentials in Jenkins history.
- Argo Workflows / Argo CD - reference Kubernetes Secret objects sourced from External Secrets Operator or Vault Agent Injector. See Kubernetes - Cluster secrets.
Self-hosted runner hygiene
Self-hosted runners are the most-abused secret-exposure surface in CI. The compromise pattern is consistent: a runner persists between jobs, a malicious PR runs cat ~/.aws/credentials or scans environment variables, exfiltrates whatever the previous job left behind, and the attacker reuses it indefinitely.
- Ephemeral runners only - one job per VM, destroyed after. Actions Runner Controller on Kubernetes, Philips' AWS runner module, GitLab Kubernetes executor all do this.
- Block fork PRs from privileged jobs. A fork's pull_request workflow must not have access to repo secrets or production OIDC. Use pull_request_target only with extreme care (and never to run untrusted code).
- Network egress controls - the runner should not be able to reach arbitrary internet destinations during a build. Allowlist your registry, package mirrors, and cloud APIs; deny the rest. StepSecurity Harden-Runner makes this enforceable on GitHub-hosted runners.
Secret-scanning gates
Pre-commit and CI scanners catch the slip-ups the workflow above should have prevented. They are the safety net, not the strategy - if they're catching things regularly, the upstream needs fixing.
- Pre-commit - TruffleHog, Gitleaks, detect-secrets as pre-commit hooks. Cheapest place to catch a secret; runs on the developer's machine before commit.
- Push protection - GitHub push protection, GitLab Secret Detection with pre-receive enforcement. Server-side block at git push.
- CI scan - run the same tool as a required CI check on every PR, blocking merge on findings.
- Historical sweep - quarterly TruffleHog scan of full git history across every repo, including private. Old secrets in old commits are still findable.
- Auto-revoke - some providers (GitHub Advanced Security, AWS, Stripe, Slack) will auto-revoke a leaked token they detect in a public repo. Worth knowing what's automatic vs. what requires you to act.
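As a rough illustration of what a scanning gate does, the sketch below checks only the lines a diff adds for one well-known pattern: AWS access key IDs for long-lived IAM user keys, which start with AKIA followed by 16 uppercase alphanumerics. The function name scan_added_lines is hypothetical, and real tools such as Gitleaks or TruffleHog cover far more patterns and use entropy and verification checks that this omits.

```python
import re
from typing import List, Tuple

# Long-lived IAM user access key IDs: "AKIA" + 16 uppercase alphanumerics.
# This is one pattern only; real scanners ship hundreds.
AWS_ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def scan_added_lines(diff_text: str) -> List[Tuple[int, str]]:
    """Return (line number within the diff, matched key ID) for added lines."""
    findings: List[Tuple[int, str]] = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect lines the change adds: "+" but not the "+++" file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            findings.append((lineno, match.group(1)))
    return findings


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = "+++ b/settings.py\n+AWS_KEY = 'AKIAIOSFODNN7EXAMPLE'\n"
    for lineno, key_id in scan_added_lines(sample):
        print(f"diff line {lineno}: possible AWS access key ID {key_id}")
```

Wired in as a required check, a non-empty result would fail the job and block the merge.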
For the broader identity story underlying all of the above, see IAM - Secrets in developer workflows. For the underlying secrets-manager capabilities, see Data Security & KMS - Secrets management.
AWS, Azure, and GCP side-by-side
Each cloud ships a native CI/CD toolchain. Most teams use a hybrid - a third-party CI runner (GitHub Actions, GitLab) for build/test, and the cloud's native primitives for the deploy and runtime side. The capabilities map nearly one-to-one:
| Building block | AWS | Azure | GCP |
|---|---|---|---|
| Native CI/CD service | CodePipeline + CodeBuild + CodeDeploy | Azure DevOps Pipelines | Cloud Build + Cloud Deploy |
| Source hosting | CodeCommit (deprecated for new) - usually GitHub | Azure Repos - usually GitHub | Cloud Source Repositories - usually GitHub |
| Artifact registry | ECR, CodeArtifact, S3 | Azure Container Registry, Artifacts | Artifact Registry |
| OIDC federation | IAM OIDC provider + role with trust on token.actions.githubusercontent.com | Workload identity federation in Entra ID app registration | Workload Identity Federation pool + provider |
| Container deploy target | ECS, EKS, App Runner, Lambda | AKS, Container Apps, App Service, Functions | GKE, Cloud Run, Cloud Functions |
| Progressive delivery | CodeDeploy (canary, linear, all-at-once) | Deployment strategies in Pipelines + Container Apps revisions | Cloud Deploy phased rollouts + Cloud Run traffic splits |
| IaC native | CloudFormation, CDK | ARM, Bicep | Deployment Manager (deprecated), Config Controller |
| IaC vendor-neutral | Terraform / OpenTofu, Pulumi | Terraform / OpenTofu, Pulumi | Terraform / OpenTofu, Pulumi |
| Secret store | Secrets Manager, SSM Parameter Store | Key Vault | Secret Manager |
| GitOps option | Argo CD / Flux on EKS, AWS Proton | Argo CD / Flux on AKS, GitOps for AKS (Flux extension) | Argo CD / Flux on GKE, Config Sync |
Anyone who's built a CI/CD pipeline on one cloud has 80% of the conceptual model needed to build one on another. The remaining 20% is the IAM/Entra/Workload Identity specifics and the deploy primitives of each platform.
AWS - CodePipeline, OIDC, and the IAM model
AWS gives you two real choices: build your pipeline entirely inside AWS (CodePipeline / CodeBuild / CodeDeploy), or run the CI in GitHub Actions / GitLab and call out to AWS for the deploy. The second pattern is more common in practice - most teams already live in GitHub.
The OIDC setup
Create an IAM OIDC identity provider for token.actions.githubusercontent.com in your AWS account. Create an IAM role with a trust policy that requires the OIDC subject claim to match your repo and branch - e.g. repo:myorg/myrepo:ref:refs/heads/main. In your GitHub Actions workflow, use aws-actions/configure-aws-credentials with role-to-assume - no aws-access-key-id, no aws-secret-access-key.
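The trust policy described above can be sketched as a small generator. Treat it as an illustration under assumptions: the account ID is a placeholder, github_oidc_trust_policy is a hypothetical helper, and the audience value sts.amazonaws.com is the one commonly used with aws-actions/configure-aws-credentials. Check it against your own provider configuration.

```python
import json


def github_oidc_trust_policy(account_id: str, repo: str, branch: str) -> dict:
    """Build an IAM role trust policy that accepts one repo and one branch."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # StringEquals, not StringLike: no wildcard matching on the subject.
                    "StringEquals": {
                        f"{provider}:aud": "sts.amazonaws.com",
                        f"{provider}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID.
    policy = github_oidc_trust_policy("123456789012", "myorg/myrepo", "main")
    print(json.dumps(policy, indent=2))
```

Using StringEquals on the sub claim is the important choice: a workflow running on any other branch or in any other repository presents a different subject and is refused.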
The key AWS-specific decisions
- One deploy role per environment, not one per pipeline. Production gets a role; staging gets a role; sandbox gets a role. The trust policy on the prod role only matches the main branch.
- Permission boundary on the deploy role. The role can deploy your workload but can't escalate - it cannot create new IAM users, cannot change its own trust policy, cannot disable CloudTrail.
- CodeDeploy for canary/linear deployments on Lambda and ECS - pre-built rollout patterns with automated rollback on CloudWatch alarms.
- ECR with enhanced image scanning - Amazon Inspector continuous scanning on every push.
- State backend in S3 + DynamoDB for Terraform - versioned bucket, lock table, restricted to the deploy role.
For the in-AWS-only path: CodePipeline orchestrates, CodeBuild runs the build, CodeDeploy handles progressive rollout. It's heavier to set up than GitHub Actions + OIDC but keeps the entire pipeline inside the AWS account boundary - useful for FedRAMP / regulated workloads.
Azure - Azure DevOps, GitHub Actions, Workload Identity
Azure customers split roughly between Azure DevOps Pipelines (Microsoft's hosted CI/CD, still widely used) and GitHub Actions (also Microsoft-owned, now the default for new projects). Both authenticate to Azure the same way: workload identity federation.
The OIDC setup
Create an app registration in Entra ID. Add a federated identity credential that trusts your CI's OIDC issuer - https://token.actions.githubusercontent.com for GitHub Actions, or the Azure DevOps issuer URL. Assign the app the minimum RBAC role on the target subscription or resource group. In the pipeline, use Azure/login with client-id, tenant-id, subscription-id - no client-secret.
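For orientation, the federated identity credential on the app registration is a small JSON document. The sketch below builds one for a GitHub Actions workflow. The helper name and the credential name are hypothetical, and the audience api://AzureADTokenExchange is the value commonly used for this exchange, so confirm it against your tenant's setup.

```python
import json


def github_federated_credential(name: str, repo: str, branch: str) -> dict:
    """Build the federated identity credential body for one repo and branch."""
    return {
        "name": name,
        "issuer": "https://token.actions.githubusercontent.com",
        # Exact subject: only workflows on this branch of this repo match.
        "subject": f"repo:{repo}:ref:refs/heads/{branch}",
        "audiences": ["api://AzureADTokenExchange"],
    }


if __name__ == "__main__":
    # "prod-deploy-main" is an illustrative credential name.
    body = github_federated_credential("prod-deploy-main", "myorg/myrepo", "main")
    print(json.dumps(body, indent=2))
```

The resulting JSON is the kind of body you would pass when creating the credential through the portal, CLI, or IaC tooling. Nothing in it is a secret, which is why it can live in version control.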
The key Azure-specific decisions
- Service principal vs managed identity. For CI running outside Azure, use a service principal (app registration) with federated credentials. For CI running inside Azure (a self-hosted runner on a VM), prefer a managed identity - no federation needed.
- Scope RBAC to the resource group, not the subscription. The deploy identity should be Contributor on the workload's resource group, not the whole sub.
- Azure Pipelines environments with approval gates - production deploys require a named approver, enforced by the pipeline.
- Container Apps revisions for blue/green-style cutovers - multiple revisions live simultaneously with traffic splits between them.
- Bicep is the modern path; ARM templates work but are verbose. Terraform with the AzureRM provider is the vendor-neutral option used by most multi-cloud orgs.
Workload identity federation on Azure is the same security shape as OIDC on AWS - short-lived tokens, no client secrets, scoped to a specific repo and branch. The configuration is in Entra rather than IAM, but the threat model is identical.
GCP - Cloud Build and Workload Identity Federation
Google ships Cloud Build (CI), Cloud Deploy (continuous delivery for GKE and Cloud Run), and Artifact Registry (artifact storage). They integrate cleanly, and Cloud Build runs as a Google-managed service account in your project - so for fully-in-GCP pipelines, no federation is needed.
For GitHub Actions deploying to GCP, the pattern is Workload Identity Federation.
The Workload Identity Federation setup
Create a workload identity pool in your GCP project. Add a provider for token.actions.githubusercontent.com with attribute mappings that pull the GitHub subject into a Google identity. Grant the IAM service account permission to be impersonated by the pool subject, scoped to your repo. In GitHub Actions, use google-github-actions/auth with workload_identity_provider and service_account - no JSON service-account keys.
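The "scoped to your repo" part of that setup comes down to the member string on the service account's IAM binding. Here is a minimal sketch, assuming the provider maps the GitHub repository claim to attribute.repository. The helper names, project number, and pool ID are placeholders.

```python
def wif_principal_set(project_number: str, pool_id: str, repo: str) -> str:
    """Member string matching every identity in the pool from one repository.

    Assumes the provider's attribute mapping includes
    attribute.repository=assertion.repository.
    """
    return (
        f"principalSet://iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool_id}"
        f"/attribute.repository/{repo}"
    )


def impersonation_binding(project_number: str, pool_id: str, repo: str) -> dict:
    """IAM binding that lets that repository impersonate a service account."""
    return {
        "role": "roles/iam.workloadIdentityUser",
        "members": [wif_principal_set(project_number, pool_id, repo)],
    }


if __name__ == "__main__":
    # 123456789 and "github-pool" are placeholder values.
    print(impersonation_binding("123456789", "github-pool", "myorg/myrepo"))
```

Binding on attribute.repository rather than on the whole pool keeps other repositories that share the pool from impersonating the same service account. Restricting further to a branch would need an additional attribute mapping or an attribute condition on the provider.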
The key GCP-specific decisions
- Never download a service account key - they're effectively long-lived secrets. WIF or attached service accounts only. Org Policy iam.disableServiceAccountKeyCreation enforces this org-wide.
- Binary Authorization on GKE and Cloud Run - only deploy images that have been signed by your pipeline. Attestor-based admission rejects unsigned images at deploy time.
- Cloud Run revisions with traffic splits - deploy a new revision, send 1% of traffic to it, ramp up. Rollback is one gcloud run services update-traffic.
- Cloud Deploy delivery pipelines for promotion through dev → staging → prod with approval gates between stages.
- Artifact Registry vulnerability scanning on every push, integrated with Security Command Center.
The Cloud Build + Cloud Deploy + Artifact Registry trio is the most cohesive native CI/CD stack of the three big clouds - for a workload that lives entirely on GCP and uses containerized deploys, it's the path of least resistance.
Deployment strategies - blue/green, canary, progressive
"Click deploy and hope" is a deployment strategy. It's just a bad one. The cloud's value here is that better strategies are cheap - you have elastic capacity, traffic-shifting load balancers, and managed control planes that make non-trivial rollouts safe by default.
All-at-once
Replace every instance simultaneously. Fast, simple, no extra capacity. Outage during the swap if anything fails. Defensible only for stateless workloads with short startup time, in non-prod, or behind a feature flag.
Rolling
Replace instances one batch at a time (10% → 25% → 50% → 100%). No extra capacity, no outage during deploy, but rollback is slow because you're rolling backward through the same process. The default for Kubernetes deployments.
Blue/green
Two complete environments. Deploy to the idle one (green), smoke-test it, flip 100% of traffic over. Rollback is instant - flip back to blue. Doubles infrastructure during the deploy window. Best fit for stateful workloads where partial rollouts are risky.
Canary
New version runs alongside old. Send 1% of traffic to it; watch error rates and latency. Ramp to 5%, 25%, 100% over minutes or hours. Auto-rollback on metric regression. Needs reliable signals - a flaky SLO means false rollbacks. The safest strategy when you have the observability to support it.
Native support varies by deploy target:
- Lambda / Cloud Functions / Azure Functions - version aliases with traffic weights; CodeDeploy or built-in traffic split.
- ECS / Fargate / Cloud Run / Container Apps - multiple revisions live concurrently; traffic split by weight per revision.
- Kubernetes (EKS / AKS / GKE) - rolling by default; canary via Argo Rollouts, Flagger, or service-mesh weights (Istio, Linkerd).
- VMs / instance groups - managed instance groups, ASGs with canary deployment configurations.
Pick the simplest strategy that meets the workload's blast radius. A static site can do all-at-once. A payments service should not.
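The canary ramp described earlier in this section reduces to a simple loop. This is a sketch under assumptions: run_canary is a hypothetical function, error_rate_at stands in for whatever metrics query your platform provides, and a real controller (CodeDeploy, Argo Rollouts, Flagger) would also wait for a bake period at each step.

```python
from typing import Callable, Sequence, Tuple


def run_canary(
    steps: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float,
) -> Tuple[str, int]:
    """Ramp traffic through the given percentages; roll back on regression.

    Returns the outcome and the final traffic percentage on the new version.
    """
    if not steps:
        raise ValueError("at least one traffic step is required")
    current = 0
    for percent in steps:
        current = percent
        observed = error_rate_at(percent)  # hypothetical metrics lookup
        if observed > max_error_rate:
            # Regression detected: send all traffic back to the old version.
            return "rolled_back", 0
    return "promoted", current


if __name__ == "__main__":
    # Simulated service that starts failing once it takes 25% of traffic.
    def simulated(percent: int) -> float:
        return 0.05 if percent >= 25 else 0.001

    print(run_canary([1, 5, 25, 100], simulated, max_error_rate=0.01))
```

The quality of the signal matters more than the loop: if error_rate_at is noisy, the same code produces false rollbacks, which is the flaky-SLO problem noted above.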
Securing the pipeline itself
The pipeline is now the soft underbelly of cloud security. Its credentials are the keys to production; its source artifacts are what runs in prod; its build environment is where supply-chain attacks land. Treat the pipeline like a privileged production system, because it is one.
- OIDC federation for cloud auth. Already covered, but worth repeating. No long-lived access keys or service-account keys stored in CI.
- Branch protection on the trigger branch. The branch that the deploy role trusts must require pull requests, code review, and passing checks. A pipeline trusts the main branch - if anyone can push to main, the trust is meaningless.
- Pin actions and runners by SHA, not tag. Tags are mutable. A third-party action pinned to @v1 can be replaced under your feet; pinned to a commit SHA, it cannot. Dependabot updates the pin atomically.
- Restrict who can modify workflows. Workflow files are code that runs with deploy credentials. CODEOWNERS the .github/workflows/ directory; require security-team review for changes.
- Ephemeral, isolated runners. Self-hosted runners should be ephemeral (one job per VM, destroyed after) and network-isolated from any environment they don't deploy to. SolarWinds-style attacks (the SolarWinds compromise was disclosed in late 2020) rely on long-lived shared build infrastructure.
- Secrets scoped to environments. If you must store secrets in CI (API tokens for third-party SaaS, for example), scope them to environments that require approval to access. GitHub Actions environments, Azure Pipelines environments, and GitLab protected environments all do this.
- Sign and verify artifacts. Build artifacts get signed (Cosign, Sigstore); deploy targets verify the signature before accepting (Binary Authorization on GCP, Kyverno / Cosign-policy-controller on Kubernetes).
- SBOM and provenance. Generate an SBOM for every build; emit SLSA provenance attestations. When a CVE drops, the answer to "which prod systems are affected" is a query, not a meeting.
- Audit pipeline runs the same as cloud activity. Pipeline logs flow to the same SIEM as CloudTrail / Azure activity / Cloud Audit Logs. The pipeline is part of the security perimeter.
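The "pin by SHA" rule from the list above is easy to check mechanically. The sketch below scans workflow text for uses: references that are not pinned to a full 40-character commit SHA. It is a line-based illustration with a hypothetical function name, not a YAML parser, and it deliberately skips local actions and docker:// references.

```python
import re
from typing import List

# Matches "uses: owner/action@ref", with or without a leading list dash.
USES_LINE = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> List[str]:
    """Return action references not pinned to a full commit SHA."""
    unpinned: List[str] = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # local actions and container images are out of scope here
        _, _, version = ref.partition("@")
        if not FULL_SHA.match(version):
            unpinned.append(ref)
    return unpinned


if __name__ == "__main__":
    sample = "steps:\n  - uses: actions/checkout@v4\n"
    print(unpinned_actions(sample))
```

Run as a required check on changes under .github/workflows/, a non-empty result would block the merge until the reference is replaced with a commit SHA.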
Infrastructure-as-code in the pipeline
If the application code goes through the pipeline, the infrastructure code should too. The same review, scan, and deploy controls that protect your app should protect the network, IAM, and data stores it runs on.
The typical IaC pipeline shape:
- Format & validate - terraform fmt -check, terraform validate. Trivially cheap, catches typos before review.
- Policy scan - Checkov, tfsec, Trivy, Open Policy Agent / Conftest, or your CSPM/CNAPP's IaC scanner. Blocks merges that violate guardrails (public buckets, unencrypted disks, IAM wildcards).
- Plan on pull request - terraform plan output posted as a PR comment. Reviewers see exactly what will change.
- Apply on merge to main - pipeline assumes deploy role via OIDC, runs terraform apply, writes state to the remote backend. State backend access is restricted to the deploy role.
- Drift detection on a schedule - periodic plan-only runs alert when reality has drifted from code (someone clicked something in the console). Drift is the canary for broken IaC discipline.
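For the scheduled drift check, one workable approach is to rely on terraform plan -detailed-exitcode, which exits 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The wrapper below is a sketch: the function names and result labels are hypothetical, and it assumes terraform is installed and the working directory is already initialized.

```python
import subprocess
from typing import Sequence


def classify_plan_exit_code(code: int) -> str:
    """Map the -detailed-exitcode result to a drift status."""
    if code == 0:
        return "in_sync"  # plan succeeded, nothing to change
    if code == 2:
        return "drift"    # plan succeeded, changes pending
    return "error"        # the plan itself failed


def check_drift(
    workdir: str,
    command: Sequence[str] = ("terraform", "plan", "-detailed-exitcode", "-input=false"),
) -> str:
    """Run a plan-only check in workdir and classify the outcome."""
    result = subprocess.run(list(command), cwd=workdir, capture_output=True, text=True)
    return classify_plan_exit_code(result.returncode)


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, classify_plan_exit_code(code))
```

A scheduled job would call check_drift for each environment and alert on anything other than in_sync. Treating error as alert-worthy too avoids mistaking a broken check for a clean one.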
Terraform / OpenTofu is the dominant vendor-neutral choice. Pulumi is the same shape with familiar languages. Each cloud's native IaC (CloudFormation/CDK, Bicep, Config Controller) is a credible single-cloud alternative - pick what your team already operates.
For Kubernetes-targeted infrastructure, GitOps controllers (Argo CD, Flux) push this pattern further: the pipeline updates a Git repo with the desired state; the in-cluster controller pulls and reconciles. The cluster itself trusts only the Git repo and the controller's identity, not the CI runner.
Bootstrapping path
For a team standing up CI/CD from scratch into a cloud, a sane order of operations:
- Pick your CI runner first. GitHub Actions is the safe default - broad ecosystem, OIDC support on all three clouds, generous free tier. Azure DevOps, GitLab CI, CircleCI, Buildkite all work; the choice is mostly about where your team already lives.
- Set up OIDC federation before your first deploy. Configure the cloud-side identity provider and a deploy role with narrow permissions. Verify the trust by running a no-op assume-role from a workflow. Never ship the first version with long-lived keys "just to get it working" - those keys live forever afterward.
- Build a "hello world" pipeline that deploys one trivial resource. A single S3 bucket, a single resource group, a single Cloud Run service. The point is to validate the wiring end-to-end before you have anything to break.
- Add the security stages. Secret scanning on every PR. IaC scanning. Container scanning. Wire failures so they block the merge.
- Add tests. Unit tests on the CI side; smoke tests in CD. End-to-end tests on the deployed environment.
- Split prod from non-prod environments. Different deploy roles. Different trust policies. Different state backends. Production trust scoped to the main branch only; non-prod can be looser.
- Add a deploy strategy beyond all-at-once. Rolling or canary, depending on the workload. Wire an automated rollback signal - error rate threshold, healthcheck failure, alarm trip.
- Wire pipeline observability. Slack/Teams notifications on failure. Pipeline run audit log into your SIEM. Metrics on deploy frequency, lead time, change failure rate, MTTR (the DORA metrics).
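Three of the DORA metrics named in the last step can be computed from nothing more than a list of deploy records. The sketch below assumes a hypothetical Deploy record with a commit time, a deploy time, and a failure flag. How you decide that a deploy "failed" is a definition your team has to supply.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Deploy:
    committed_at: datetime
    deployed_at: datetime
    failed: bool  # True if the deploy caused an incident or a rollback


def deploy_frequency_per_day(deploys: List[Deploy], window_days: int) -> float:
    return len(deploys) / window_days


def change_failure_rate(deploys: List[Deploy]) -> float:
    if not deploys:
        return 0.0
    return sum(1 for d in deploys if d.failed) / len(deploys)


def median_lead_time(deploys: List[Deploy]) -> timedelta:
    """Median time from commit to running in the target environment."""
    if not deploys:
        raise ValueError("no deploys to measure")
    times = sorted(d.deployed_at - d.committed_at for d in deploys)
    mid = len(times) // 2
    if len(times) % 2 == 1:
        return times[mid]
    return (times[mid - 1] + times[mid]) / 2


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    history = [
        Deploy(start, start + timedelta(hours=h), failed=(h == 3))
        for h in (1, 2, 3, 4)
    ]
    print(deploy_frequency_per_day(history, window_days=2))
    print(change_failure_rate(history))
    print(median_lead_time(history))
```

Because every deploy already has a commit SHA and a pipeline run, these records can usually be derived from pipeline logs rather than collected by hand.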
Realistic time investment for a small team: a couple of weeks to a minimally-credible production pipeline; a quarter or two to one that you'd be comfortable scaling to ten more workloads on.
Common pitfalls
- Long-lived cloud keys "just for now." They live forever. Configure OIDC before the first deploy.
- One pipeline role used everywhere. The same role deploys staging and production, so a compromised staging deploy can write to prod. One role per environment, scoped trust per branch.
- Over-permissioned deploy roles. The deploy role has *:* on the account "to be safe." It can now create new IAM users, disable logging, exfiltrate the entire account. Permission boundaries and explicit denies should make this impossible.
- No verification stage. Pipeline reports "deploy succeeded" the moment the apply finishes - but nothing checks that the service actually serves traffic. Add smoke tests; auto-rollback on failure.
- Mutable artifact tags. Deploying image:latest means the deploy is non-reproducible - re-running the pipeline ships a different artifact. Always deploy a specific SHA or version.
- Untrusted third-party actions / plugins. A popular GitHub Action pulled in by tag can be hijacked. Pin by commit SHA; review the source of every new action you adopt.
- Console writes to production "for an emergency." Once that happens, IaC drifts from reality, and the pipeline can no longer be trusted to converge. Break-glass paths exist, but they should be rare, audited, and followed by a reconciliation commit.
- Skipping drift detection. Without scheduled plan-only runs, drift accumulates silently. By the time you notice, untangling it is a multi-week project.
- Handing trusted credentials to runners managed like untrusted infrastructure. Self-hosted runners need network isolation and ephemeral lifecycle. A persistent self-hosted runner with deploy credentials is one compromised dependency away from being a production attack vector.
Further reading
Cloud-native references
- AWS CodePipeline documentation
- GitHub - OIDC to AWS
- Azure Pipelines documentation
- Entra ID workload identity federation
- Google Cloud Build documentation
- GCP Workload Identity Federation
Supply chain & pipeline security
- SLSA - Supply-chain Levels for Software Artifacts
- Sigstore - signing and provenance
- OWASP Top 10 CI/CD Security Risks
- CISA / NSA - Defending CI/CD Environments (PDF)
Practice & metrics
- DORA - DevOps Research and Assessment metrics
- GitHub Actions - deploying
- Argo CD - GitOps for Kubernetes
- Flux - GitOps for Kubernetes
Related CSOH pages
- How we use GitHub Actions - the concrete pipeline that builds csoh.org.
- How we deploy across AWS, GCP & Azure - the small-scale version of these ideas, applied to csoh.org itself.
- Landing zones & cloud foundations - the account / network / logging baseline a pipeline deploys into.
- Cloud security best practices - the principles (least privilege, defense-in-depth, auditability) the pipeline operationalizes.
- Glossary - every acronym on this page, defined.
FAQ
What is the difference between CI and CD?
Continuous Integration (CI) is the build/test side - merging code changes frequently and verifying each merge automatically. Continuous Delivery (CD) is the deploy side - every passing build is automatically packaged and shippable, with a human gating the final release. Continuous Deployment automates the final step too: every passing build goes to production. "CI/CD" as a phrase covers all of it.
Should I use the cloud's native CI/CD or a third-party tool?
For most teams, GitHub Actions (or your existing CI) for build/test and the cloud's native primitives for deploy targets is the right answer. The cloud-native CI services (CodePipeline, Azure DevOps, Cloud Build) are credible but rarely a reason to leave a tool your team already uses. The exception is regulated environments where keeping the entire pipeline inside the cloud account boundary simplifies compliance.
Do I need progressive delivery for a small project?
Not on day one. A static site or a single-team prototype can do all-at-once deploys safely - the blast radius is small and the rollback is fast. Reach for canary or blue/green when the cost of a bad deploy (real users seeing errors, real money lost) is higher than the cost of running the extra capacity.
How does GitOps fit in?
GitOps is a CD pattern, not a replacement for CI. CI still builds and tests artifacts; CI commits the desired state (image tag, manifest) into a Git repo; an in-cluster controller (Argo CD, Flux) pulls and reconciles. The win is that the target cluster never trusts the CI runner - only itself and the Git repo - so a compromised CI cannot directly push into the cluster.
What about secrets - do I still need a secret store with OIDC?
Yes. OIDC removes cloud credentials from CI, but workloads still need third-party API keys, database passwords, signing keys. Those live in Secrets Manager / Key Vault / Secret Manager and are fetched by the workload at runtime - not stored in the pipeline. CI only ever has the cloud role needed to reference the secret, never the secret itself.
How fast should my pipeline be?
The DORA elite benchmark is lead time under one hour from commit to production. For most cloud workloads, that's achievable: 5-10 minutes of CI, 5-10 minutes of deploy + verify, the rest in queue. If your pipeline is over an hour, the problem is usually serialized stages that could run in parallel, or a verification stage waiting on a flaky external dependency.
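To see how much serialization costs, compare the sum of all stage durations with the longest dependency chain. The sketch below does that arithmetic; the stage names, durations, and dependency graph are hypothetical numbers chosen for illustration, not measurements.

```python
from functools import lru_cache

# Hypothetical stages: name -> (duration in minutes, stages it must wait for).
STAGES = {
    "build": (4, []),
    "unit_tests": (6, ["build"]),
    "security_scan": (5, ["build"]),
    "deploy": (5, ["unit_tests", "security_scan"]),
    "verify": (3, ["deploy"]),
}


def serial_minutes(stages: dict) -> int:
    """Wall-clock time if every stage runs one after another."""
    return sum(duration for duration, _ in stages.values())


def critical_path_minutes(stages: dict) -> int:
    """Wall-clock time if independent stages run in parallel."""

    @lru_cache(maxsize=None)
    def finish(name: str) -> int:
        duration, deps = stages[name]
        # A stage starts once its slowest dependency has finished.
        return duration + max((finish(dep) for dep in deps), default=0)

    return max(finish(name) for name in stages)


if __name__ == "__main__":
    print("serial:", serial_minutes(STAGES), "minutes")
    print("parallel:", critical_path_minutes(STAGES), "minutes")
```

With these made-up numbers the serial run takes 23 minutes and the parallel run 18, because the tests and the security scan overlap. The larger the independent stages, the larger the gap.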
Can I do CI/CD without containers?
Yes. The pipeline pattern is independent of the artifact type. Lambda zips, VM images (Packer), unikernels, static site bundles, Terraform plans - all flow through the same eight stages. Containers happen to be the dominant artifact today because they unify build + runtime; the CI/CD shape is older than containers and survives them.
How does this relate to zero trust?
The pipeline operationalizes zero trust at the deploy layer. "Verify explicitly" is OIDC federation - every deploy is a fresh authentication with a fresh subject claim. "Least privilege" is per-environment deploy roles with permission boundaries. "Assume breach" is auditability - every change traces to a commit, a reviewer, and a run id, so a compromise has a finite, observable blast radius.
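The "verify explicitly" and "least privilege" points come down to a trust decision on the token's subject claim. The sketch below mimics that decision locally. In practice the cloud provider's trust policy enforces it, not application code. The role names, the example-org/example-app repository, and the TRUSTED_SUBJECTS map are hypothetical; the subject strings follow the `repo:OWNER/REPO:ref:refs/heads/BRANCH` shape that GitHub Actions uses for branch-triggered runs.

```python
from fnmatch import fnmatchcase

# Hypothetical trust map: which subject claims each deploy role accepts.
# Staging trusts any branch of one repo; production trusts only main.
TRUSTED_SUBJECTS = {
    "staging-deploy": ["repo:example-org/example-app:ref:refs/heads/*"],
    "production-deploy": ["repo:example-org/example-app:ref:refs/heads/main"],
}


def may_assume(role: str, subject: str) -> bool:
    """Return True when the subject claim matches a pattern trusted by the role."""
    patterns = TRUSTED_SUBJECTS.get(role, [])  # unknown roles trust nothing
    return any(fnmatchcase(subject, pattern) for pattern in patterns)


if __name__ == "__main__":
    feature = "repo:example-org/example-app:ref:refs/heads/feature-x"
    print(may_assume("staging-deploy", feature))
    print(may_assume("production-deploy", feature))
```

Because the production pattern has no wildcard, a run from a feature branch or from another repository cannot obtain the production role even if the staging path is compromised.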
Where next
- How we use GitHub Actions - the working example: the pipeline that ships this site.
- How we deploy across AWS, GCP & Azure - what lands on the other side of that pipeline.
- Landing zones - the account/network/policy baseline a pipeline deploys into.
- Cloud security best practices - the principles your pipeline is implementing.
- Friday Zoom - supply-chain and CI/CD security come up regularly. Drop in and trade pipeline stories.