Kubernetes & Managed Kubernetes

The orchestrator that runs most cloud-native production workloads - and the single largest source of cloud-security misconfiguration. Vendor-neutral guide to what EKS, AKS, and GKE actually manage, what you still own, and the cluster-specific risks: pod escapes, identity chaining, RBAC sprawl, flat pod networking, and admission control.


The 30-second version: Kubernetes orchestrates containers across a fleet of machines. Managed Kubernetes (EKS on AWS, AKS on Azure, GKE on GCP) hosts the control plane for you - you still own the workloads, the workload identity, the network policy, the RBAC, and the admission controls. The Kubernetes API surface is the same on every cloud; the configuration surface is enormous; and the default posture is permissive.

The two largest cloud-specific Kubernetes risks: identity chaining (a compromised pod stealing the worker node's cloud role via IMDS) and RBAC sprawl (some operator you installed has cluster-admin and can read every secret). Fix those first, then layer on network policy, admission control, and runtime detection.

On this page

  1. What Kubernetes actually is
  2. Why Kubernetes runs the cloud-native world
  3. Shared responsibility - managed K8s edition
  4. The Kubernetes threat model
  5. Identity chaining - pod to node to cloud
  6. RBAC and the API server
  7. Pod security & escapes
  8. Networking - flat by default
  9. Admission control - the policy chokepoint
  10. Cluster secrets & vault integration
  11. Supply chain - what runs in your cluster
  12. EKS, AKS, and GKE side-by-side
  13. Hardening checklist
  14. Common pitfalls
  15. Further reading
  16. FAQ

What Kubernetes actually is

Kubernetes is a cluster orchestrator. You declare what you want - "run three replicas of this container with these env vars and this much memory, expose port 8080, restart on failure" - and Kubernetes places that across a fleet of worker nodes, restarts containers that crash, redirects traffic when something fails, and rolls out new versions without downtime.

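The architecture is split in two: a control plane (the API server, etcd, the scheduler, and the controller manager) that stores and reconciles desired state, and worker nodes (each running a kubelet and a container runtime) that run the pods.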

A pod is the unit of scheduling - one or more containers that share a network namespace and storage, scheduled together on the same node. A namespace is a logical grouping for pods, services, and policies. A service account is the pod's identity inside the cluster; combined with workload identity, it's also the pod's identity to the cloud.

None of this is cloud-specific - Kubernetes runs on bare metal, on your laptop (kind, minikube), on a Raspberry Pi cluster. But the dominant deployment shape today is managed Kubernetes in a cloud account.

Why Kubernetes runs the cloud-native world

Kubernetes hit critical mass in cloud because it solved problems the cloud's first-party services left open:

The security cost: Kubernetes is also one of the largest sources of misconfiguration in cloud environments. Wiz, Snyk, and Datadog all publish State-of-Cloud reports, and Kubernetes-specific misconfigurations feature prominently in them year after year. The platform is powerful precisely because its configuration surface is huge; that surface is also the attack surface.

Shared responsibility - managed K8s edition

The single most useful diagram for cloud-Kubernetes security is the responsibility split. The cloud takes more on the higher-tier managed modes; the user always owns the workload layer.

| Layer | Self-managed K8s on VMs | Standard managed (EKS / AKS / GKE Standard) | Fully-managed (GKE Autopilot / EKS Auto Mode) |
|---|---|---|---|
| Control plane | You | Cloud | Cloud |
| etcd encryption & backup | You | Cloud (with user-managed KMS option) | Cloud |
| Worker node OS patching | You | You (with managed node groups: cloud auto-upgrades but you trigger) | Cloud |
| Cluster networking (CNI) | You | You (cloud provides default; you can swap) | Cloud (Autopilot fixes choices) |
| Workload identity setup | You | You (cloud provides plumbing) | You |
| RBAC | You | You | You |
| Network policy / pod isolation | You | You | You (Autopilot enforces some defaults) |
| Admission control | You | You | You (Autopilot pre-installs some controls) |
| Pod / workload security context | You | You | You |
| Image security | You | You | You |
| Runtime threat detection | You | Cloud detection available; you install runtime agents | Cloud detection available; you install runtime agents |

What this table shows: even on the most managed offering, everything above the node OS and the CNI is yours. The cloud cannot configure your RBAC; it cannot decide which workloads should talk to which; it cannot decide whether a sidecar's permissions are reasonable. That is the security work of running Kubernetes in the cloud.

The Kubernetes threat model

Almost every Kubernetes-in-cloud breach narrative follows the same arc:

  1. Initial foothold in a pod. Vulnerable web app, leaked credentials, supply chain attack in a base image. Any code-execution primitive will do.
  2. Use the pod's service-account token to talk to the API server. Default token is auto-mounted into every pod. RBAC determines how far this gets - too often, to cluster-admin via some operator that was installed unaudited.
  3. Reach the worker node's IMDS if workload identity isn't configured. Steal the node IAM role. Now you have the node's full cloud permissions - usually broad, because the node needs to pull images, mount volumes, write logs, manage load balancers.
  4. Move laterally to other pods on the same node (shared kernel) or to other pods over the flat pod network. Default-allow networking makes this trivial.
  5. Persist via a workload - deploy a new DaemonSet, create a backdoor service account, modify a ConfigMap. With cluster-admin, persistence is trivial.
  6. Exfiltrate via the cluster's outbound network - typically unrestricted to the public internet.

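Every defense on this page targets a step in that arc. RBAC scoping limits step 2 and step 5, and disabling unneeded service-account token mounts removes the credential step 2 relies on. Workload identity, combined with blocking IMDS from pods, breaks step 3. Network policy breaks step 4 and, with an egress allow-list, constrains step 6. Admission control does not stop the foothold in step 1, but it removes the worst primitives an attacker can build on from there (privileged pods, hostPath). Runtime detection catches steps that slip through.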

Identity chaining - pod to node to cloud

This is the cloud-Kubernetes security failure that produces the worst outcomes. Same shape as the container version (see Containers), with two added wrinkles:

The fix is workload identity

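Every cloud now offers a per-pod identity that breaks both legs: IRSA or EKS Pod Identity on EKS, AKS Workload Identity (Entra federated) on AKS, and GKE Workload Identity on GKE.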

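Once workload identity is configured, also block IMDS from pods. On EKS, require IMDSv2 and set the PUT response hop limit to 1 on worker nodes, so the session-token response cannot cross the extra network hop into a pod's network namespace. Pods that run with hostNetwork: true share the node's network namespace and are not stopped by the hop limit, so keep hostNetwork out of ordinary workloads. On AKS and GKE, the network plugin / hostNetwork enforcement does the equivalent.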
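The EKS setting above can be checked mechanically. The sketch below is a minimal illustration, assuming the node's metadata options are available as a dict shaped like the EC2 `MetadataOptions` structure (`HttpTokens`, `HttpPutResponseHopLimit`); the function name and the sample values are hypothetical.

```python
def imds_findings(metadata_options):
    """Return a list of problems found in an EC2 MetadataOptions-style dict.

    Assumption: the caller has already fetched the options for a worker node
    or launch template; this function only inspects the values.
    """
    findings = []
    # IMDSv2 is only enforced when session tokens are required.
    if metadata_options.get("HttpTokens") != "required":
        findings.append("IMDSv1 still allowed: set HttpTokens to 'required'")
    # Any hop limit other than 1 lets the token response travel past the node.
    if metadata_options.get("HttpPutResponseHopLimit") != 1:
        findings.append("hop limit is not 1: pods may reach IMDS via the node")
    return findings


if __name__ == "__main__":
    hardened = {"HttpTokens": "required", "HttpPutResponseHopLimit": 1}
    permissive = {"HttpTokens": "optional", "HttpPutResponseHopLimit": 2}
    print(imds_findings(hardened))
    print(imds_findings(permissive))
```

A check like this fits in a periodic audit job or a pre-deploy gate for node templates; it does not replace blocking IMDS at the network layer.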

Finally - disable automatic service-account token mounting for pods that don't need to talk to the API server. Set automountServiceAccountToken: false on every workload that isn't an operator or otherwise a client of the Kubernetes API. This removes one of the attacker's default tools.
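The field can be set on the pod spec or on the ServiceAccount, and the pod-level value takes precedence. A minimal sketch of that precedence, assuming manifests are already loaded as plain dicts (the function name is hypothetical):

```python
def token_is_mounted(pod_spec, service_account=None):
    """Decide whether a pod gets a service-account token auto-mounted.

    Mirrors the Kubernetes precedence: an explicit pod-level setting wins,
    then the ServiceAccount's setting, then the default (mounted).
    """
    pod_setting = pod_spec.get("automountServiceAccountToken")
    if pod_setting is not None:
        return pod_setting
    sa_setting = (service_account or {}).get("automountServiceAccountToken")
    if sa_setting is not None:
        return sa_setting
    return True  # Kubernetes default when nothing is specified


if __name__ == "__main__":
    sa_off = {"kind": "ServiceAccount", "automountServiceAccountToken": False}
    print(token_is_mounted({}))                                            # default
    print(token_is_mounted({}, sa_off))                                    # SA disables it
    print(token_is_mounted({"automountServiceAccountToken": True}, sa_off))  # pod overrides SA
```

Because the pod-level field overrides the ServiceAccount, auditing only ServiceAccounts can miss workloads that re-enable the mount.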

RBAC and the API server

Kubernetes RBAC is the equivalent of cloud IAM, scoped to the API server. Roles grant verbs (get, list, create, delete) on resource types within a namespace. ClusterRoles grant them cluster-wide. RoleBindings and ClusterRoleBindings attach roles to users, groups, or service accounts.
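To make the Role structure concrete, here is a small sketch that scans the `rules` of a Role or ClusterRole (loaded as a dict) for wildcard grants and secret-read access. The function name and the sample roles are hypothetical, and the checks are a starting point rather than a complete RBAC audit.

```python
READ_VERBS = {"get", "list", "watch", "*"}


def risky_rules(role):
    """Return (rule_index, reason) pairs for broad rules in a Role/ClusterRole dict."""
    findings = []
    for index, rule in enumerate(role.get("rules", [])):
        verbs = rule.get("verbs", [])
        resources = rule.get("resources", [])
        if "*" in verbs:
            findings.append((index, "wildcard verb"))
        if "*" in resources:
            findings.append((index, "wildcard resource"))
        # Reading Secrets is effectively reading every credential in scope.
        if "secrets" in resources and READ_VERBS & set(verbs):
            findings.append((index, "can read secrets"))
    return findings


if __name__ == "__main__":
    scoped = {
        "kind": "Role",
        "rules": [{"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get"]}],
    }
    broad = {
        "kind": "ClusterRole",
        "rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}],
    }
    print(risky_rules(scoped))
    print(risky_rules(broad))
```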

Common ways RBAC goes wrong:

What good looks like

Pod security & escapes

The escape paths covered on the Containers page all apply inside Kubernetes - privileged pods, sensitive capabilities, hostPath / hostPID / hostNetwork, kernel CVEs. Kubernetes adds tooling to enforce hardening cluster-wide:

Pod Security Standards (PSS)

Kubernetes defines three Pod Security Standards profiles: Privileged (no restrictions), Baseline (blocks the worst known escape paths - privileged containers, hostPath, hostPID, hostNetwork, hostIPC, certain capabilities), and Restricted (enforces hardened defaults: non-root user, no privilege escalation, all capabilities dropped, RuntimeDefault seccomp, a limited set of volume types). A read-only root filesystem is not part of Restricted; set readOnlyRootFilesystem: true per container.

Apply with namespace labels - pod-security.kubernetes.io/enforce: restricted on production namespaces; baseline on namespaces that need slightly more flexibility; never privileged in prod. Pod Security Admission, the built-in controller that enforces PSS, has been stable since Kubernetes 1.25, the same release that removed the deprecated PodSecurityPolicy.
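As an illustration, the sketch below builds a Namespace manifest carrying the Pod Security labels and prints it as JSON, which kubectl accepts alongside YAML. The `enforce`, `audit`, and `warn` label keys are the standard Pod Security Admission modes; the helper name and the "payments" namespace are hypothetical.

```python
import json

LEVELS = ("privileged", "baseline", "restricted")
PREFIX = "pod-security.kubernetes.io/"


def namespace_with_pss(name, enforce="restricted", audit=None, warn=None):
    """Build a Namespace manifest labelled for Pod Security Admission."""
    labels = {}
    for mode, level in (("enforce", enforce), ("audit", audit), ("warn", warn)):
        if level is None:
            continue
        if level not in LEVELS:
            raise ValueError(f"unknown Pod Security level: {level}")
        labels[PREFIX + mode] = level
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": labels},
    }


if __name__ == "__main__":
    # Enforce baseline today while auditing and warning against restricted.
    manifest = namespace_with_pss("payments", enforce="baseline",
                                  audit="restricted", warn="restricted")
    print(json.dumps(manifest, indent=2))
```

Setting audit and warn one level stricter than enforce shows what would break before the stricter level is enforced.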

Per-pod hardening

The minimum hardened pod spec for production:
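One plausible shape for such a spec, sketched in Python as a dict that can be dumped to JSON for kubectl. The field selection is an assumption based on the Restricted profile plus a read-only root filesystem and a disabled token mount; the pod name, image reference, and resource limits are hypothetical placeholders.

```python
import json


def hardened_pod(name, image):
    """Build a Pod manifest with commonly recommended hardening fields set."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # No API token unless the workload actually calls the API server.
            "automountServiceAccountToken": False,
            "securityContext": {
                "runAsNonRoot": True,
                "seccompProfile": {"type": "RuntimeDefault"},
            },
            "containers": [{
                "name": name,
                "image": image,
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "readOnlyRootFilesystem": True,
                    "capabilities": {"drop": ["ALL"]},
                },
                # Hypothetical limits; size them for the real workload.
                "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference used only for illustration.
    print(json.dumps(hardened_pod("api", "registry.example.com/api:1.0"), indent=2))
```

Workloads that need a writable path can mount an emptyDir volume at that path rather than giving up the read-only root filesystem.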

Sandbox runtimes for high-risk workloads

For workloads that run customer-supplied or otherwise untrusted code, the shared host kernel is not enough isolation. Options include gVisor, Kata Containers, and Fargate.

The performance cost depends on the workload (it is heaviest for syscall- and I/O-intensive code) but is usually modest; the escape-resistance gain is significant.

Networking - flat by default

Out of the box, every pod can talk to every other pod. The cluster's pod CIDR is flat - no segmentation between namespaces, no segmentation between trust levels, no segmentation at all. This is a Kubernetes design choice (it makes service discovery simple) and a security problem (a compromised pod scans the cluster).

NetworkPolicy

Kubernetes NetworkPolicy resources define allowed ingress and egress at the pod label level. The standard pattern:

  1. Default-deny for all pods in production namespaces (no ingress, no egress).
  2. Explicitly allow what's needed - "frontend can ingress from ingress-controller, egress to backend"; "backend can ingress from frontend, egress to database."
  3. Always allow egress to kube-dns in the kube-system namespace, otherwise DNS breaks.
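  4. Never allow egress to the link-local range (169.254.0.0/16) - IMDS lives there, at 169.254.169.254. NetworkPolicy has no deny rule, so in practice this means carving the range out of any broad egress allow (an ipBlock with an except entry). The exception is a platform whose workload identity is itself served from that address, as with GKE's metadata server; there, allow only what the provider documents.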

NetworkPolicy requires a CNI plugin that supports it. All three managed clouds support this (Cilium, Calico, AWS VPC CNI with restrictions, Azure CNI Overlay, GKE Dataplane V2). Verify before assuming.
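The pattern above can be sketched as three small manifests, built here as Python dicts and printed as JSON for kubectl. This is an illustration under stated assumptions: the cluster DNS pods are assumed to carry the label k8s-app=kube-dns, addresses are IPv4, and the policy names, the "prod" namespace, and the helper functions are hypothetical.

```python
import ipaddress
import json

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")  # contains IMDS at 169.254.169.254


def _policy(name, namespace, spec):
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def default_deny(namespace):
    # An empty podSelector selects every pod; naming both policy types with
    # no rules means nothing is allowed in or out.
    return _policy("default-deny", namespace,
                   {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]})


def allow_dns(namespace):
    # Assumption: DNS pods live in kube-system and are labelled k8s-app=kube-dns.
    dns_peer = {
        "namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "kube-system"}},
        "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
    }
    ports = [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}]
    return _policy("allow-dns", namespace, {
        "podSelector": {},
        "policyTypes": ["Egress"],
        "egress": [{"to": [dns_peer], "ports": ports}],
    })


def allow_egress_to(namespace, cidr):
    block = {"cidr": cidr}
    # 'except' entries must fall inside 'cidr', so only carve out link-local
    # when the allowed range actually contains it.
    if LINK_LOCAL.subnet_of(ipaddress.ip_network(cidr)):
        block["except"] = [str(LINK_LOCAL)]
    return _policy("allow-egress", namespace, {
        "podSelector": {},
        "policyTypes": ["Egress"],
        "egress": [{"to": [{"ipBlock": block}]}],
    })


if __name__ == "__main__":
    for manifest in (default_deny("prod"), allow_dns("prod"),
                     allow_egress_to("prod", "0.0.0.0/0")):
        print(json.dumps(manifest, indent=2))
```

NetworkPolicies are additive, so the default-deny policy and the narrower allows coexist: a connection is permitted if any policy selecting the pod allows it.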

Service mesh

NetworkPolicy is L3/L4 - IP and port. For L7 controls (HTTP method, path, header), and for mutual TLS between every workload, a service mesh adds an identity-aware proxy alongside every pod. Options:

For a small cluster, NetworkPolicy is usually enough. For multi-team production with workloads at different trust levels, the mesh's mTLS and identity-based authorization are the right primitives.

Ingress / egress controls

Admission control - the policy chokepoint

Every change to the cluster flows through the API server. Admission controllers intercept those requests and can mutate or reject them. This is the chokepoint where you enforce "no privileged pods," "no images without signatures," "every namespace must have a NetworkPolicy" - at the place the cluster cannot bypass.
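To show what "intercept and reject" means in practice, here is a minimal sketch of the decision logic inside a validating admission webhook. It takes an `admission.k8s.io/v1` AdmissionReview request (as a dict) and returns the AdmissionReview response. The HTTPS server, TLS setup, and webhook registration are omitted, and the two rules checked are illustrative, not a full policy.

```python
def review_pod(admission_review):
    """Reject Pods that use privileged containers or hostPath volumes."""
    request = admission_review["request"]
    spec = request["object"].get("spec", {})
    reasons = []

    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        if (container.get("securityContext") or {}).get("privileged"):
            reasons.append(f"container {container['name']} is privileged")

    if any("hostPath" in volume for volume in spec.get("volumes", [])):
        reasons.append("hostPath volumes are not allowed")

    # The response must echo the request uid so the API server can match it.
    response = {"uid": request["uid"], "allowed": not reasons}
    if reasons:
        response["status"] = {"code": 403, "message": "; ".join(reasons)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    bad_pod = {
        "kind": "Pod",
        "spec": {
            "containers": [{"name": "app", "securityContext": {"privileged": True}}],
            "volumes": [{"name": "host", "hostPath": {"path": "/"}}],
        },
    }
    # Hypothetical uid for illustration.
    print(review_pod({"request": {"uid": "demo-uid", "object": bad_pod}}))
```

In most clusters a policy engine supplies this plumbing; the sketch only shows the shape of the decision it makes on each request.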

Built-in admission

Kubernetes ships several admission controllers; on managed clouds, the relevant ones are typically enabled by default. Pod Security Standards (covered above) is the most important built-in admission control.

Policy engines

For richer rules - beyond what PSS covers - install a policy engine:

Policies worth running on day one

Audit before enforce

Roll new policies out in audit mode first - log violations without blocking. Once the log is clean (or fixes have shipped), flip to enforce. This pattern avoids the "we deployed a policy and now half the cluster won't start" outage.
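The difference between the two modes comes down to one branch. A toy sketch, with hypothetical names, of how a policy check might treat the same violations under audit and enforce:

```python
def decide(violations, mode):
    """Return (allowed, log_lines) for a policy running in 'audit' or 'enforce' mode."""
    if mode not in ("audit", "enforce"):
        raise ValueError(f"unknown mode: {mode}")
    # Violations are always logged; only enforce mode turns them into a rejection.
    log_lines = [f"[{mode}] {violation}" for violation in violations]
    allowed = mode == "audit" or not violations
    return allowed, log_lines


if __name__ == "__main__":
    found = ["container app is privileged"]
    print(decide(found, "audit"))
    print(decide(found, "enforce"))
    print(decide([], "enforce"))
```

The audit log produced in the first mode doubles as the list of fixes to ship before the flip.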

Cluster secrets & vault integration

A kind: Secret in Kubernetes holds base64-encoded values, not encrypted ones. kubectl get secret foo -o jsonpath='{.data.<key>}' | base64 -d recovers the plaintext of any key, and the same applies to anyone reading etcd directly. Production clusters need (1) etcd-level encryption-at-rest, and (2) an external source of truth for the secret values so rotation, audit, and lifecycle live in a real secrets manager - not in git and not in kubectl. The patterns below are typically combined, not picked individually. For the crypto and capability details behind the secrets managers themselves, see Data Security & KMS - Secrets management; for the broader workflow story, see IAM - Secrets in developer workflows.
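A short demonstration of why base64 is encoding rather than protection: the sketch below builds a Secret manifest the way the API stores it and then recovers every value with the standard library alone. The secret name, key, and value are made up for the example.

```python
import base64

# A Secret as the API returns it: values under .data are base64 strings.
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "foo"},
    "data": {"password": base64.b64encode(b"hunter2").decode()},
}


def decode_secret(manifest):
    """Return the plaintext of every key in a Secret's .data map."""
    return {
        key: base64.b64decode(value).decode()
        for key, value in manifest.get("data", {}).items()
    }


if __name__ == "__main__":
    print(decode_secret(secret))  # {'password': 'hunter2'}
```

No key or credential is involved in the decode step, so anyone who can read the object can read the value.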

1. Encrypt etcd at rest with a KMS provider

The minimum bar. Configure the API server's KMS encryption provider so Secret resources are encrypted with a customer-managed key before being written to etcd. Managed offerings make this one toggle:

This protects against etcd-snapshot exfiltration, backup theft, and direct etcd reads. It does not protect against an attacker with API access - the API server will decrypt and serve. KMS encryption is necessary but not sufficient.

2. External Secrets Operator (ESO)

External Secrets Operator is the de-facto standard. You define an ExternalSecret CRD that references a value in an external store (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, HashiCorp Vault, 1Password, Doppler, Infisical, Akeyless, GitHub, and many more), and ESO materializes it as a Kubernetes Secret, refreshing on a schedule. The cluster doesn't own the secret - the secrets manager does, and ESO is the synchronizer.
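For a sense of what an ExternalSecret looks like, here is a hedged sketch that builds one as a dict. The field layout follows the `external-secrets.io/v1beta1` schema as commonly documented; the apiVersion served by your installed ESO release may differ, and the store name, remote key, and helper function are hypothetical.

```python
import json


def external_secret(name, namespace, store_name, remote_key,
                    secret_key="password", refresh="1h"):
    """Build an ExternalSecret that syncs one remote value into a K8s Secret."""
    return {
        # Assumption: check which apiVersion your ESO installation serves.
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": refresh,  # how often ESO re-reads the store
            "secretStoreRef": {"name": store_name, "kind": "ClusterSecretStore"},
            "target": {"name": name},    # name of the Secret ESO creates
            "data": [{
                "secretKey": secret_key,           # key inside the K8s Secret
                "remoteRef": {"key": remote_key},  # path in the external store
            }],
        },
    }


if __name__ == "__main__":
    manifest = external_secret("db-credentials", "prod",
                               "aws-secrets", "prod/db/password")
    print(json.dumps(manifest, indent=2))
```

Only the reference lives in the cluster manifest; the value itself stays in the secrets manager until ESO materializes it.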

3. Secrets Store CSI Driver

Secrets Store CSI Driver mounts secrets directly from the external store as a tmpfs volume in the pod - they never become K8s Secret objects, never hit etcd. Provider plugins exist for AWS, Azure, GCP, and Vault.

4. Vault Agent Injector

Vault Agent Injector is a mutating webhook that adds a Vault Agent sidecar (or init container) to annotated pods. The agent authenticates to Vault using the pod's service-account JWT, fetches secrets, and writes them to a shared volume (or templated config file). Right answer when the team already runs Vault and wants dynamic secrets (DB credentials minted per-pod, expiring with the pod).

5. SOPS for GitOps

SOPS encrypts the values inside YAML / JSON / env / ini files using a cloud KMS key (or age / PGP), leaving the keys readable so diffs make sense. The encrypted file is safe to commit to git. Flux and Argo CD both have first-class SOPS integration: the reconciler decrypts at apply-time using its workload identity. Right answer for GitOps-first teams that want declarative cluster state including secrets, with the source of truth in git.

6. Sealed Secrets

Bitnami Sealed Secrets is the older "encrypt-to-the-cluster" pattern - a controller in-cluster holds a private key, anyone can encrypt against the matching public key, only that controller can decrypt. Simple to operate, but the keypair is cluster-bound (multi-cluster requires re-sealing) and the controller is a single point of failure for decryption.

7. SPIFFE / SPIRE for cross-environment workload identity

SPIFFE / SPIRE issues short-lived X.509 or JWT identities (SVIDs) to workloads based on attestation (which node, which pod, which service account). The cross-environment story is the point: a workload in Kubernetes, a workload on a VM, a workload on-prem can all hold a SPIFFE identity that downstream services trust uniformly. Right answer for multi-cluster, multi-cloud, or hybrid-on-prem deployments where you want one identity fabric instead of one per platform.

Picking among them

What about kubectl create secret?

Fine for ephemeral demos and bootstrap. Not fine as the source of truth for anything in production. The path to "we don't know what's deployed" starts with a human running kubectl create secret at 2am to fix an outage; the secret is now nowhere except etcd, and nobody knows it exists six months later when the rotation is overdue. Every production secret should originate in a secrets manager and arrive in the cluster via ESO / CSI / Vault Agent.

For pipeline-side patterns (how the secret gets into the manager in the first place from CI), see CI/CD - Secrets in pipelines.

Supply chain - what runs in your cluster

The cluster runs more than your code. Every Helm chart you install, every operator, every CRD, every add-on (logging, metrics, ingress, secrets, GitOps) is third-party code with cluster permissions. The supply chain extends beyond your own image build.

EKS, AKS, and GKE side-by-side

All three managed Kubernetes services run upstream Kubernetes - same API, same primitives, same kubectl. The differences are in the managed scope, the cloud integrations, and the operational defaults.

| Building block | EKS | AKS | GKE |
|---|---|---|---|
| Control plane management | Cloud (HA across 3 AZ) | Cloud (free tier; paid for SLA) | Cloud (regional or zonal) |
| Fully-managed mode | EKS Auto Mode, Fargate profiles | Container Apps (separate product) | GKE Autopilot |
| Workload identity | IRSA, EKS Pod Identity | AKS Workload Identity (Entra federated) | GKE Workload Identity |
| Default CNI | AWS VPC CNI (pod = VPC IP) | Azure CNI Overlay or kubenet | GKE Dataplane V2 (Cilium-based) or routes |
| NetworkPolicy support | VPC CNI w/ policy or Cilium | Calico, Cilium, Azure NPM | Built-in via Dataplane V2 |
| Image admission | Kyverno / Gatekeeper + AWS Signer | Image integrity policies, Defender for Containers | Binary Authorization (native) |
| Runtime threat detection | GuardDuty EKS Protection & Runtime Monitoring | Defender for Containers | SCC Container Threat Detection |
| Audit log destination | CloudWatch Logs | Azure Monitor / Log Analytics | Cloud Audit Logs |
| Private control plane | Private endpoint option | Private cluster option | Private cluster option |
| Auto-upgrade | Managed node groups with maintenance windows | Auto-upgrade channel | Release channels (rapid / regular / stable) |

For teams new to Kubernetes, fully-managed modes (GKE Autopilot, EKS Auto Mode + Fargate, AKS + Container Apps for workloads that fit) remove enormous classes of operational and security work - no node patching, no privileged-pod admission decisions to make, no CNI choice to second-guess. The trade-off is reduced flexibility; for most application workloads, that's a feature.

Hardening checklist

The non-negotiable Kubernetes-in-cloud hardening list:

Identity

Workload identity (IRSA / WIF) for every pod that talks to the cloud. IMDS blocked from pods. Default SA token mounting disabled. Cloud SSO for human cluster access.

RBAC

One service account per workload. Narrowest possible Role; no wildcards. ClusterRoleBindings audited. API server audit logs flowing to SIEM.

Pod & node

Pod Security Standards restricted on prod namespaces. Sandbox runtime (gVisor / Kata / Fargate) for untrusted workloads. Nodes auto-patched on a defined cadence.

Network & admission

Default-deny NetworkPolicy. Egress allow-list. Admission control (Kyverno) rejecting risky configs. Signed-image enforcement at admission.

Add: runtime detection (Falco, Tetragon, your CNAPP runtime module); centralized container and audit logs; image scanning at build and registry; an SBOM database for fast CVE response; cluster autoscaler limits to cap blast radius from runaway pods.

Common pitfalls

Further reading

Official documentation

Benchmarks & guidance

Tooling

Related CSOH pages

FAQ

Do I need Kubernetes at all?

Often, no. If you have one or two workloads, a serverless container service (Cloud Run, Fargate, Container Apps) gives you containerized deployment without operating a cluster. The decision point is around 5-10 distinct workloads with shared infrastructure needs (service-to-service identity, scheduled jobs, multi-region rollout). Below that, Kubernetes is overhead; above it, the operational model pays back.

EKS vs AKS vs GKE - which is "most secure"?

The Kubernetes API surface is the same on all three. GKE has historically led on built-in defaults - Workload Identity, Binary Authorization, GKE Sandbox, Autopilot - but EKS and AKS have closed the gap and offer equivalents. The bigger driver is "which cloud do you already live in?" - running EKS from an Azure organization or GKE from AWS adds federation complexity you don't need.

Should I use Helm?

For deploying off-the-shelf software (a database operator, an ingress controller), Helm is the lingua franca and you'll use it whether you like it or not. For your own workloads, Helm or Kustomize or raw YAML are all defensible. The security-relevant choice is "are these manifests in git, reviewed, and deployed by a pipeline" - not which tool generated them.

What about GitOps?

GitOps controllers (Argo CD, Flux) pull desired state from a Git repo and reconcile it into the cluster. Two big security wins: the cluster never trusts the CI runner (only itself and the Git repo), and every cluster change is in git history. For production workloads, GitOps is the right deploy model in 2026. See the CI/CD page for how it sits next to the build pipeline.

Are CRDs (custom resources) a security risk?

The CRD itself is data - it doesn't run code. But CRDs are usually paired with operators (controllers) that do run code, often with broad cluster permissions. Auditing CRD-bearing operators for RBAC scope and image provenance is the same exercise as auditing any third-party software with cluster-level access.

How does this relate to zero trust?

Kubernetes plus a service mesh is one of the cleanest places to put zero-trust principles into production. mTLS between every workload (verify explicitly), per-pod workload identity (least privilege), default-deny network policy (assume breach), continuous runtime visibility (verify continuously). The cluster gives you the primitives; the configuration is the work.
