What is a container escape?

A container escape is any technique that lets code running inside a container affect the host or other containers on it. The most common paths: a process running as root in a container with a host filesystem mount, a container with --privileged or CAP_SYS_ADMIN, a kernel or runtime vulnerability reachable from inside the container, since every container shares the host kernel (e.g. CVE-2022-0492 in cgroups v1, Dirty Pipe, or the Leaky Vessels flaws in runc), or a container with hostPID/hostNetwork that can see and signal host processes. The fix is layered: drop capabilities, run as non-root, enforce seccomp/AppArmor profiles, and patch the host kernel and container runtime.

Why is the instance metadata service (IMDS) dangerous for containers?

Containers running on a cloud VM can usually route through the host to the instance metadata service at 169.254.169.254 and request the VM's IAM credentials. A compromised container can pivot from 'I can run code in one container' to 'I have the entire VM's cloud permissions.' This is identity chaining, and it is the same credential-theft pattern as the Capital One breach, where the entry point was an SSRF rather than a container. The fix on AWS is IMDSv2 with hop-limit 1 (so containers on a bridged network can't get a token through the host), or - better - give each workload its own scoped identity via IRSA / Workload Identity / managed-identity-per-pod, and block IMDS access from workloads entirely.

Are managed container services more secure than self-hosted?

Generally yes - Cloud Run, Fargate, Container Apps, and similar managed services put each workload behind a stronger sandbox than a plain shared-kernel container (Firecracker micro-VMs on Fargate, gVisor on Cloud Run, provider-managed isolation on Container Apps). The isolation boundary is hardened, the host is patched by the cloud, and the user can't misconfigure --privileged. The tradeoff is reduced control: no DaemonSets, no privileged sidecars, no kernel modules. For most application workloads that's a feature, not a limitation.

Containers & Cloud Security

Q: Is a container a security boundary?

A container is a process-isolation boundary, not a strong security boundary. Containers on the same host share a kernel - a kernel vulnerability, a misconfigured capability, or a privileged container can let a process inside one container affect another or the host. Treat the host (the VM the container runs on) as the real boundary; treat the container as a lightweight isolation layer that needs hardening on top. If you need a hard tenant boundary, use sandboxed runtimes (gVisor, Kata Containers, Firecracker) or separate VMs.


Last updated 2026-05-15 · By Shawn Nunley · Vendor-neutral · View source on GitHub

The 30-second version: A container is a Linux process (or set of processes) isolated by namespaces and cgroups, packaged with its dependencies into an immutable image. Multiple containers share one host kernel - so the boundary is real but thin. In the cloud, containers run on a VM whose instance metadata service hands out the VM's IAM credentials. A compromised container that can reach IMDS gets the whole VM's cloud permissions.

That's the dominant cloud-container risk: identity chaining - pivoting from "code execution in one container" to "the cloud role attached to the VM." Add container escapes via privileged flags or kernel CVEs, flat pod-to-pod networking, and pulled-from-public-internet base images, and you have the full container-security surface in the cloud.

What a container actually is
Why containers and cloud are inseparable
The boundary - is a container a security boundary?
The six risk categories that matter
Container escapes - the real paths
Identity chaining - IMDS and stolen cloud roles
Networking - flat by default
Supply chain - images, registries, signing
Minimal & hardened base images
Runtime detection
AWS, Azure, and GCP container services
Hardening checklist
Common pitfalls
Further reading
FAQ

What a container actually is

A container is a Linux process (or group of processes) with three things wrapped around it:

Namespaces - kernel features that give the process its own view of system resources. PID namespace makes the process think it's the only one on the system. Mount namespace gives it its own filesystem root. Network namespace gives it its own interfaces. UTS, IPC, user, cgroup namespaces round it out. Different containers see different namespaces; the kernel keeps the views separate.
Cgroups (control groups) - kernel mechanism for capping CPU, memory, I/O, and PID counts per group of processes. The container can't starve the host, can't fork-bomb the system, can't allocate more memory than its limit.
An image - a tarball containing the filesystem the process sees, plus a manifest describing how to run it. Built once, shipped to a registry, pulled to any host, run identically.

That's the entire abstraction. There is no "container kernel" - the host's kernel runs every container's code. There is no hypervisor - containers are not VMs. They start in milliseconds because they're just processes; the only thing being created is the namespace and cgroup wiring around an exec.
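To make the "just a process with namespaces and cgroups" point concrete, here is a minimal Python sketch that reads the namespace identifiers the kernel exposes under /proc/<pid>/ns and parses a cgroup v2 cpu.max value. The function names are illustrative, and the code assumes a Linux host with cgroup v2; on other systems read_namespaces simply returns an empty result.

```python
import os
import re
from typing import Dict, Optional, Tuple

# Namespace symlinks look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(link: str) -> Tuple[str, int]:
    """Split a /proc/<pid>/ns/* symlink target into (type, inode)."""
    match = NS_LINK.match(link)
    if match is None:
        raise ValueError(f"not a namespace link: {link!r}")
    return match.group(1), int(match.group(2))


def read_namespaces(pid: str = "self") -> Dict[str, int]:
    """Return {entry name: namespace inode} for a process (empty off Linux)."""
    ns_dir = f"/proc/{pid}/ns"
    result: Dict[str, int] = {}
    if not os.path.isdir(ns_dir):
        return result
    for entry in os.listdir(ns_dir):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry; skip it
        result[entry] = inode
    return result


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max ("max 100000" or "50000 100000") into CPUs.

    Returns None when no CPU cap is set.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)
```

Two processes are in the same namespace exactly when the inode numbers match, so comparing read_namespaces("self") with read_namespaces("1") from inside a container shows which views it shares with PID 1.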

Container runtimes (containerd, CRI-O, Docker Engine, runc underneath) automate the namespace setup and image management. The Open Container Initiative (OCI) standardizes the image format and runtime API so the same image runs on any conformant runtime.

Why containers and cloud are inseparable

The cloud's value proposition is "elastic capacity priced by the second." Containers are the most efficient unit to fill that capacity:

Sub-second startup. A VM typically takes tens of seconds to boot. A container typically starts in well under a second. When traffic spikes 10x, containers are far more likely to scale in time.
Density. Hundreds of small containers can fit on a single VM, where the same capacity split into separate VMs would host only a handful of workloads. Cloud compute is billed by the VM; density translates directly to cost.
Immutable artifacts. The image is content-addressed. The same image hash that ran in staging runs in prod - no "works on my machine," no config drift.
One artifact across clouds. An OCI image runs on EKS, AKS, GKE, ECS, Cloud Run, Container Apps, App Runner, Lambda (with caveats). The portability argument matters when you have to leave one cloud.
Operability. Health checks, log shipping, metrics, auto-restart on crash - the container runtime gives you these for free, regardless of language.

The cost of that efficiency, from a security standpoint, is a shared kernel and a shared network path through the host. Both become attack surface when the threat model is "untrusted code in a container" instead of "trusted code on a VM."

The boundary - is a container a security boundary?

It is not. Or more precisely: a container is a process-isolation boundary, not a tenant-isolation boundary. The difference matters.

If you run two of your own microservices in two containers on the same VM, the isolation is fine for most purposes - the kernel keeps their namespaces separate, cgroups keep one from starving the other. That's process isolation. It works.

If you run two different customers' code in two containers on the same VM, the isolation is not fine. A kernel CVE affects both. A misconfigured capability on one container can let it inspect or modify the other. A CPU side-channel attack (Spectre, Foreshadow) can leak data across shared hardware. That's a tenant boundary, and the container abstraction was not designed for it.

The cloud providers know this. That's why AWS Fargate, Azure Container Apps, Cloud Run, and similar managed services run each workload inside a stronger sandbox (Firecracker micro-VMs on Fargate, gVisor on Cloud Run, provider-managed isolation on Container Apps). The container API is preserved; the actual isolation comes from the sandbox layer, not from namespaces alone.

If you self-host on EKS / AKS / GKE / vanilla VMs, you control where containers land. If two workloads should not share a kernel - different trust levels, customer multi-tenancy, regulated data - use sandbox runtimes (gVisor, Kata Containers) or pin them to different node pools. The platform won't do this for you.

The six risk categories that matter

Container security in the cloud collapses into six recurring categories. The names differ across vendors; the threats don't.

1. Container escape

Process inside a container affects the host or other containers. Privileged flags, kernel CVEs, hostPath mounts, raw capabilities. The classic "get root on the node."

2. Identity chaining

Compromised container reaches the instance metadata service and steals the VM's IAM credentials - pivoting from one workload's permissions to the host's entire cloud role.

3. Lateral network movement

Default container networking is flat. Any pod can reach any other pod, and often the cloud control plane. One compromised container scans the internal network for soft targets.

4. Supply chain

Base images pulled from public registries, dependencies pulled from public package managers, build-time injection. Compromise the image; compromise every deploy.

5. Secrets exposure

Secrets baked into images, passed as env vars (visible in logs and on disk), or readable through the API server. The container becomes a secrets-extraction target.

6. Runtime visibility gap

Without runtime instrumentation, "the container is running" is all you know. Process trees, network connections, syscalls - none of it is captured by default.

Container escapes - the real paths

"Container escape" sounds exotic. In practice it's a small number of well-known paths, almost always opened by configuration choices the user made.

The privileged container

Running with --privileged (or securityContext.privileged: true in Kubernetes) disables almost every isolation feature the runtime adds. The container has all Linux capabilities, including CAP_SYS_ADMIN; sees all devices; can mount filesystems; can load kernel modules. From a privileged container, host root is typically one mount of the host disk and a chroot away. Trail of Bits has the canonical write-up.

Sensitive capabilities without --privileged

Even without the privileged flag, specific capabilities are dangerous: CAP_SYS_ADMIN, CAP_SYS_PTRACE, CAP_SYS_MODULE, CAP_NET_ADMIN, CAP_DAC_READ_SEARCH. Drop all capabilities by default and add back only what the workload needs.
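One way to check what a running container actually holds is to decode the CapEff bitmask from /proc/<pid>/status. The sketch below flags only the five capabilities named above; the bit positions follow the numbering in linux/capability.h, and the helper names are illustrative.

```python
from typing import List

# Bit positions from linux/capability.h for the capabilities called out above.
RISKY_CAPS = {
    2: "CAP_DAC_READ_SEARCH",
    12: "CAP_NET_ADMIN",
    16: "CAP_SYS_MODULE",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def cap_eff_from_status(status_text: str) -> str:
    """Extract the hex CapEff value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return line.split()[1]
    raise ValueError("no CapEff line found")


def risky_capabilities(cap_eff_hex: str) -> List[str]:
    """Return the names of risky capabilities set in a CapEff bitmask."""
    mask = int(cap_eff_hex, 16)
    return [name for bit, name in sorted(RISKY_CAPS.items()) if mask & (1 << bit)]
```

For example, the mask 00000000a80425fb, which is commonly seen for Docker's default capability set, yields an empty list, while a mask with every bit set yields all five names.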

HostPath / hostPID / hostNetwork / hostIPC

Mounting / (or /var/run/docker.sock) from the host into the container is direct escape. So is hostPID: true (you can signal host processes) and hostNetwork: true (you see and bind host interfaces, including the IMDS interface). Admission controllers should reject these by default in production.

Kernel CVEs

Because the host kernel is shared, a kernel vulnerability is a container-escape vulnerability. Notable examples: CVE-2022-0492 (cgroups v1 release_agent), CVE-2022-0185 (filesystem context heap overflow), and Dirty Pipe (CVE-2022-0847). Runtime bugs are the neighboring class: Leaky Vessels (2024) was a set of flaws in runc and BuildKit rather than in the kernel. The defense is keeping host kernels and container runtimes patched - which on managed services is the cloud's job, on self-managed nodes is yours.

Misconfigured docker.sock

A container that can talk to /var/run/docker.sock can ask the daemon to start a new container - privileged, with the host root mounted. Exposed Docker sockets and daemon APIs have featured in many crypto-jacking incidents. Never mount the docker socket into anything other than CI/admin tooling, and even then, isolate.

Defense in layers

Run as non-root. USER 1000 in the Dockerfile; runAsNonRoot: true in the pod spec. Root inside a container is one CVE away from root on the host.
Read-only root filesystem. readOnlyRootFilesystem: true. Mount writable paths explicitly with tmpfs or volumes.
Drop all capabilities; add back the minimum. capabilities: drop: ["ALL"] in the security context.
Seccomp profile. RuntimeDefault blocks several dozen risky syscalls (for example keyctl, add_key, kexec_load, and kernel-module loading; the exact list depends on the runtime). Use a tighter custom profile for sensitive workloads.
AppArmor / SELinux. Mandatory access control layered on top of capabilities. Distros vary; both managed and self-managed clusters can run with profiles enabled.
Sandboxed runtimes for untrusted code. gVisor (user-space syscall interception) or Kata (per-container micro-VM). Slight perf cost; significant escape-prevention benefit.
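The escape paths and the layered defenses above can be checked mechanically. The following Python sketch audits a Kubernetes pod spec (already parsed into a dict) for those settings. It is an illustration of what an admission check looks for, not a replacement for Pod Security Standards or a policy engine; the function name and finding strings are made up for the example.

```python
from typing import Dict, List


def audit_pod_spec(spec: Dict) -> List[str]:
    """Return human-readable findings for risky settings in a pod spec."""
    findings: List[str] = []

    # Host namespace sharing.
    for flag in ("hostPID", "hostNetwork", "hostIPC"):
        if spec.get(flag):
            findings.append(f"{flag} is enabled")

    # hostPath volumes, including the Docker socket.
    for volume in spec.get("volumes") or []:
        host_path = volume.get("hostPath")
        if host_path:
            findings.append(f"hostPath volume mounts {host_path.get('path')}")

    pod_sc = spec.get("securityContext") or {}
    for container in spec.get("containers") or []:
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}
        if sc.get("privileged"):
            findings.append(f"{name}: privileged")
        # Container-level settings override pod-level ones.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            findings.append(f"{name}: runAsNonRoot not set")
        if not sc.get("readOnlyRootFilesystem"):
            findings.append(f"{name}: root filesystem is writable")
        dropped = (sc.get("capabilities") or {}).get("drop") or []
        if "ALL" not in dropped:
            findings.append(f"{name}: capabilities not dropped")
        profile = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if profile.get("type") not in ("RuntimeDefault", "Localhost"):
            findings.append(f"{name}: no seccomp profile")
    return findings
```

A spec that sets runAsNonRoot, readOnlyRootFilesystem, capabilities.drop: ["ALL"], and a RuntimeDefault seccomp profile returns no findings; a privileged pod that mounts /var/run/docker.sock returns several.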

Identity chaining - IMDS and stolen cloud roles

This is the cloud-specific container risk. Every cloud VM has an Instance Metadata Service at the link-local address 169.254.169.254 (or fd00:ec2::254 on AWS IPv6, similar paths on Azure and GCP). The metadata service hands out, among other things, the IAM credentials of the role attached to the VM.

By default, containers running on the VM can route to that address through the host - they can reach 169.254.169.254. A compromised container with code execution can run curl http://169.254.169.254/... and walk away with the VM's full cloud permissions. That's identity chaining: workload compromise → host compromise → cloud compromise, in one hop.

This is exactly the Capital One breach pattern (see the Capital One kill chain) - except instead of an SSRF in a WAF, the entry point is whatever runs in your container.

The fix has two parts

1. Per-workload identity, not per-host. Every cloud now offers a way to give an individual workload (pod, container, function) its own short-lived cloud credentials, scoped to that workload only:

AWS - IAM Roles for Service Accounts (IRSA) on EKS, EKS Pod Identity, task roles on ECS, function roles on Lambda.
Azure - Workload Identity on AKS (federated to Entra), managed identity per Container Apps revision.
GCP - Workload Identity Federation on GKE, per-revision service account on Cloud Run.

With per-workload identity, the container has only the permissions it needs. A compromise gets you that workload's credentials - not the VM's.

2. Block IMDS from containers entirely. Belt-and-suspenders: even with workload identity in place, the host's IMDS shouldn't be reachable from containers.

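AWS - enforce IMDSv2 with hop-limit 1 (HttpPutResponseHopLimit: 1). Containers on a bridged network sit at least one hop behind the host, so the token response expires in transit and never reaches them. Workloads that use the host network are not covered by this and need per-workload identity or an explicit block.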
Azure - block 169.254.169.254 at the iptables / network policy layer on the host.
GCP - workload identity already breaks this path; metadata server endpoints have additional Metadata-Flavor: Google header enforcement.

If your container framework is "the VM has a god role and every container uses it" - that's the legacy pattern. Move every workload to per-pod / per-task / per-function identity. Block IMDS from containers as a hard rule.
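As a rough illustration of the AWS half of that rule, the sketch below inspects the MetadataOptions block that the EC2 API returns for an instance and reports whether IMDSv1 is still allowed or the hop limit lets containers through. The field names (HttpEndpoint, HttpTokens, HttpPutResponseHopLimit) are the EC2 ones; the function itself and its messages are hypothetical, and fetching the data is left to whatever SDK or CLI you already use.

```python
from typing import Dict, List


def imds_findings(metadata_options: Dict) -> List[str]:
    """Flag instance metadata settings that leave IMDS reachable from containers."""
    # A disabled endpoint cannot hand out credentials at all.
    if metadata_options.get("HttpEndpoint") == "disabled":
        return []

    findings: List[str] = []
    if metadata_options.get("HttpTokens") != "required":
        findings.append("IMDSv1 still allowed (HttpTokens is not 'required')")

    hop_limit = metadata_options.get("HttpPutResponseHopLimit")
    if hop_limit is None or hop_limit > 1:
        findings.append("hop limit above 1 lets bridged containers obtain tokens")
    return findings
```

Running this over every instance that hosts containers gives a quick list of nodes where the "god role" is still one curl away.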

Networking - flat by default

Container networking starts permissive and gets restricted. By default in most environments:

Any container can open a TCP connection to any other container in the same network.
Any container can reach the host's IP and any port the host exposes.
Egress to the public internet is unrestricted.
DNS resolution can be used to enumerate internal services and other tenants.

That means a single compromised container is a scanner. It probes for accessible internal services, vulnerable management endpoints (Redis with no auth, Elasticsearch open to 0.0.0.0, a CI agent listening on the cluster network), and the cloud control plane (IMDS being the prime target).

What containment looks like

Network policies. On Kubernetes, NetworkPolicy resources define allowed ingress/egress per pod. Default-deny in production namespaces, then explicitly allow what's needed.
Service mesh. Istio, Linkerd, Cilium, Consul - mTLS between services, identity-based authorization, traffic-flow telemetry. Brings the "every connection is authenticated" zero-trust model down to the pod level.
Egress controls. Restrict outbound to a curated allow-list (your package registries, your dependencies, your APIs). Egress proxies (Cilium's L7 policies, NAT gateways with allow-list rules, service mesh egress gateways) make this practical.
No direct external exposure. Containers behind a load balancer, ingress, or API gateway - never with a public IP directly on the pod.
Block link-local addresses. Drop traffic to 169.254.0.0/16 from containers unless explicitly required.

Flat network is fine for a single-team sandbox. It is not fine for production. Treat the assumption "this container only needs to talk to these specific places" as the design baseline.
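The default-deny and link-local rules above map onto two small Kubernetes NetworkPolicy objects. The Python sketch below builds them as plain dicts that can be serialized to JSON or YAML; the policy names and helper functions are illustrative, and enforcement assumes the cluster's network plugin actually implements NetworkPolicy.

```python
import ipaddress
from typing import Dict, List

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def default_deny_policy(namespace: str) -> Dict:
    """Deny all ingress and egress for every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        # An empty podSelector selects all pods; listing both policy types
        # with no rules means nothing is allowed.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def egress_allow_policy(namespace: str, cidrs: List[str]) -> Dict:
    """Allow egress only to the given CIDRs, carving out link-local space."""
    peers = []
    for cidr in cidrs:
        block = {"cidr": cidr}
        network = ipaddress.ip_network(cidr)
        # Only add the exception when the allowed range would include IMDS.
        if network.version == 4 and LINK_LOCAL.subnet_of(network):
            block["except"] = [str(LINK_LOCAL)]
        peers.append({"ipBlock": block})
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "egress-allow-list", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{"to": peers}],
        },
    }
```

Applying the default-deny policy first and then adding narrow allow rules keeps the baseline closed, which matches the "only these specific places" design assumption.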

Supply chain - images, registries, signing

The container image is the artifact that ships to every environment. Compromising it compromises every deploy. The supply chain risks layer up:

Public base images. Pulling FROM ubuntu:latest means inheriting whatever's in that image - including unpatched CVEs, packages you don't need, and a writable root. Pinning to a digest (FROM ubuntu@sha256:...) at least makes the input deterministic; switching to a distroless or minimal base (Distroless, Wolfi, Chainguard, Alpine when libc-compatible) shrinks attack surface dramatically.
Public dependency repositories. npm, PyPI, RubyGems, Maven Central, NuGet, Go modules. All have been the source of typosquats, dependency confusion attacks, and account takeovers that injected malicious code. Lockfiles + vulnerability scanning + private mirrors are the standard mitigations.
Layer cache poisoning. Build caches in CI/CD shared across jobs can leak credentials and let one job's layer poison another's image.
Registry compromise. If anyone can push to your image registry, anyone can replace prod images. Lock down registry write access to your CI pipeline; require signed images.
Long-running images. The image you built six months ago still has six-month-old CVEs. Rebuild on a schedule even when the source hasn't changed, so base-image patches flow through.
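Digest pinning is easy to check automatically. The sketch below, with illustrative function names, tests whether an image reference ends in a sha256 digest and lists the FROM lines in a Dockerfile that do not. It is a simple line-based parser that ignores build arguments and line continuations, so treat it as a starting point rather than a full Dockerfile parser.

```python
import re
from typing import List

DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """True when the reference ends in an immutable sha256 digest."""
    return DIGEST_SUFFIX.search(image_ref) is not None


def unpinned_base_images(dockerfile_text: str) -> List[str]:
    """Return base images in FROM lines that are not pinned to a digest."""
    stages = set()  # names of earlier build stages (FROM ... AS name)
    unpinned: List[str] = []
    for raw in dockerfile_text.splitlines():
        parts = raw.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        refers_to_stage = image in stages
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
        # "scratch" and earlier stages are not pulled from a registry.
        if image == "scratch" or refers_to_stage:
            continue
        if not is_digest_pinned(image):
            unpinned.append(image)
    return unpinned
```

Run in CI, a non-empty result from unpinned_base_images is a reasonable signal to fail the build or open a pinning PR.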

What good looks like

Scan every image at build and push time - Trivy, Grype, Snyk, Wiz, the cloud's native scanner (ECR scanning, Defender for Cloud, Artifact Registry vulnerability scanning). Fail the build on HIGH/CRITICAL.
Sign every image - Cosign / Sigstore. Push signature attestations to the registry.
Verify at deploy time - admission controllers reject unsigned images. AWS Signer + Kyverno on EKS; Binary Authorization on GKE; Defender admission control on AKS.
Generate and store SBOMs - Syft, the cloud's SBOM tools. When the next big CVE drops, you query the SBOM database instead of running through your fleet image-by-image.
SLSA provenance - emit attestations describing which builder built the image from which source. Reject images without provenance in prod.
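"Fail the build on HIGH/CRITICAL" usually comes down to a small gate over the scanner's JSON report. The sketch below assumes a Trivy-style report layout (a top-level Results list whose entries carry a Vulnerabilities list with VulnerabilityID and Severity fields); other scanners use different shapes, so the field names are an assumption to adjust, and the function names are illustrative.

```python
from typing import Dict, List, Tuple

BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report: Dict) -> List[Tuple[str, str]]:
    """Return (target, vulnerability id) pairs that should fail the build."""
    hits: List[Tuple[str, str]] = []
    for result in report.get("Results") or []:
        # The vulnerability list may be missing or null when nothing was found.
        for vuln in result.get("Vulnerabilities") or []:
            if str(vuln.get("Severity", "")).upper() in BLOCKING_SEVERITIES:
                hits.append(
                    (result.get("Target", "?"), vuln.get("VulnerabilityID", "?"))
                )
    return hits


def gate_exit_code(report: Dict) -> int:
    """Exit code for a CI step: 1 blocks the build, 0 lets it continue."""
    return 1 if blocking_findings(report) else 0
```

In a pipeline, the step would load the scanner's JSON output, call gate_exit_code, and pass the result to sys.exit.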

The pipeline that builds the image is part of the supply chain. See the CI/CD page for how to lock that down.

Minimal & hardened base images

A standard Ubuntu or Debian base ships with hundreds of packages your application doesn't use - a shell, a package manager, network tools, build toolchains. Every one of those packages is potential CVE surface and potential attacker tooling. The biggest single lever you have to reduce both is to start from a minimal base image.

The category has grown into a small market of its own. Each vendor takes a slightly different cut at the same problem: ship the smallest, most-current, most-trustworthy base image possible, so the application is the only thing left to harden.

Distroless

Google's open-source minimal images - language runtimes only, no shell, no package manager, no busybox. The original "remove everything you don't need" base. Free, well-maintained, but rebuild cadence is community-paced and CVE coverage is best-effort.

Chainguard Images

Commercial-grade minimal images built on the Wolfi "undistribution." Daily rebuilds, signed with Sigstore, SLSA L2 provenance, SBOM attached, often zero known CVEs at any moment. Free tier covers the latest tag of a subset of images; pinned versions and LTS are paid.

Minimus

Newer entrant focused on minimal, continuously-rebuilt images plus a remediation workflow - they ship CVE-free replacements for your existing base images and track which of your workloads still ship vulnerable bases. Commercial; the value proposition is "one click to a clean fleet."

Wiz secured images

Wiz ships hardened base images bundled with their CNAPP - minimal, signed, scanned in the same plane as the runtime workloads they observe. Differentiator is the closed loop: the image, the registry scan, and the production-runtime detection are all in one product.

RapidFort

Profiles your container's actual syscall and file usage in CI, then strips out everything the workload never touches. Less of a "use my image" play, more of a "shrink whatever image you already have" approach.

Docker Hardened Images

Docker's own commercial minimal-image line, integrated with Docker Hub and Scout. Same general shape - minimal, signed, rebuilt - sold through the registry you already pull from.

Why this category exists at all

Three real problems push organizations off "FROM ubuntu":

The "Ubuntu has 90 CVEs today" reality. Even a freshly-rebuilt mainstream base image typically has dozens of known vulnerabilities - almost always in packages your app never touches. The scanner sees them; the audit sees them; the auditor doesn't know libxslt is unreachable from your Go binary. You spend cycles either patching or arguing it doesn't matter.
The shell-in-prod problem. When an attacker pops a web app shell, the next thing they run is curl, wget, bash, id, find. A minimal base with no shell and no curl is dramatically less useful to them - a "code execution" primitive without standard tooling is a much harder pivot than one in a full Linux userspace.
Rebuild cadence. Your Ubuntu base from six months ago has six months of accumulated CVEs. The minimal-image vendors solve this by rebuilding daily from upstream sources, signing each build, and pushing the new digest. Your CI just re-pulls; the CVE count resets.

How minimal images help the ecosystem, not just one image

The category's real impact is upstream of any one shop:

Shifts CVE work from every consumer to one producer. Instead of 10,000 organizations each patching their Ubuntu images, one vendor maintains the rebuild pipeline and ships the result. Aggregate vulnerability-management cost across the ecosystem drops by orders of magnitude.
Normalizes signing & provenance. Every serious minimal-image provider ships Sigstore signatures and SLSA provenance attestations. Adoption of those primitives in downstream pipelines (Binary Authorization, Kyverno verify-image rules) is partly being pulled by these vendors making them table stakes.
Pressures distro maintainers. The rise of Wolfi, Distroless, and Chainguard has visibly accelerated Alpine, Ubuntu Minimal, and Debian's own rebuild and SBOM efforts. The "we always lagged on CVEs" defense doesn't hold when there's a CVE-free alternative on Docker Hub.
Shrinks the average attack surface in production. An ecosystem where most production containers carry no shell and no package manager is one where post-exploitation tooling has to be staged in (and that staging is itself detectable). Even partial adoption makes the median attacker's job harder.
Better signal for runtime detection. Falco / Tetragon / CNAPP-runtime rules generate fewer false positives against a minimal image - there's no legitimate reason for /bin/sh to execute if the image doesn't contain it. The alerts become high-signal because the baseline is so quiet.

How to actually adopt one

Start with new workloads. Retrofitting FROM ubuntu images often surfaces accidental dependencies on system binaries. Greenfield workloads adopt minimal bases with near-zero friction.
Use multi-stage builds. Build with a full toolchain image; copy the artifact into the minimal final stage. The fat image never reaches production; the published image carries only the runtime.
Pin to digests. Minimal-image vendors push new digests daily - that's the feature. Pin image@sha256:... in your manifests and let your dependency-update bot (Dependabot, Renovate) bump them in PR.
Plan for the missing shell. Debugging a distroless container in prod requires kubectl debug with an ephemeral debug container, or sidecar tooling. Train the team on this before the incident.
Verify signatures at admission. See the supply-chain section. Pulling Chainguard or Minimus is half the value; enforcing that only their signed digests can run in prod is the other half.

None of this replaces the rest of the container hardening list - non-root user, dropped capabilities, read-only root filesystem, network policy, workload identity. It removes a whole category of base-image vulnerabilities before the rest of the hardening even applies.

Runtime detection

Scanning catches what's in the image. Network policy controls what can talk. Neither tells you when a process inside a running container does something unusual - a shell spawning from a web server, an unexpected outbound connection, a credential-file read, a privilege escalation attempt.

Runtime detection closes that gap. The dominant approach is eBPF-based syscall instrumentation:

Falco - CNCF project, the de facto open-source runtime-security engine. Rule-based detection on syscalls, container metadata, and Kubernetes audit logs.
Tetragon - from Isovalent/Cilium, eBPF runtime enforcement (can block, not just alert).
CNAPP runtime modules - Wiz, Aqua, Sysdig, CrowdStrike, Defender for Containers. Same idea, packaged as managed agents with SaaS analytics.

What runtime detection catches: shell spawns inside production containers, processes writing to /etc/shadow, network connections to known C2 infrastructure, capability escalation, attempts to access /proc/1/root, anomalous outbound DNS. The signal is high; tuning the rules to fit your normal workload behavior is the work.
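To show the shape of such a rule, here is a toy Python detector over process-execution and file-open events. The event fields (exe, parent_exe, path, type) and the process lists are hypothetical and do not correspond to Falco's or Tetragon's actual schemas; real engines express the same logic in their own rule languages against kernel-level event streams.

```python
import os
from typing import Dict, List

SHELLS = {"sh", "bash", "dash", "ash", "zsh"}
# Hypothetical list of server processes that should never spawn a shell.
SERVER_PARENTS = {"nginx", "httpd", "node", "java", "gunicorn"}
SENSITIVE_PATHS = ("/etc/shadow", "/proc/1/root")


def classify_event(event: Dict) -> List[str]:
    """Return the names of the rules an event triggers (possibly none)."""
    alerts: List[str] = []
    if event.get("type") == "exec":
        exe = os.path.basename(event.get("exe", ""))
        parent = os.path.basename(event.get("parent_exe", ""))
        if exe in SHELLS and parent in SERVER_PARENTS:
            alerts.append("shell spawned by server process")
    elif event.get("type") == "open":
        path = event.get("path", "")
        if any(path == p or path.startswith(p + "/") for p in SENSITIVE_PATHS):
            alerts.append("sensitive path accessed")
    return alerts
```

The tuning work mentioned above is mostly about the two sets at the top: which parents are legitimate shell spawners in your workloads, and which paths are sensitive for you.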

If you don't have runtime visibility, the answer to "did anything weird run in our containers last month?" is "we don't know."


AWS, Azure, and GCP container services

Every cloud ships an orchestrator (managed Kubernetes), a serverless container runtime, and a registry. The capabilities map closely:

Building block	AWS	Azure	GCP
Managed Kubernetes	EKS	AKS	GKE (Standard & Autopilot)
Serverless containers	Fargate (with ECS or EKS), App Runner	Container Apps, Container Instances	Cloud Run
Simple container service	ECS (Elastic Container Service)	Container Apps	Cloud Run / Cloud Run Jobs
Image registry	ECR (Elastic Container Registry)	Azure Container Registry	Artifact Registry
Image scanning	ECR scanning (basic + enhanced via Inspector)	Defender for Containers (registry + runtime)	Artifact Registry vulnerability scanning
Per-workload identity	IRSA, EKS Pod Identity, ECS task roles	AKS workload identity (federated to Entra)	GKE workload identity, Cloud Run service accounts
Sandboxed runtime	Fargate (Firecracker micro-VM)	Container Apps (managed isolation), AKS Confidential Containers	Cloud Run (gVisor), GKE Sandbox (gVisor)
Admission/signing enforcement	AWS Signer + Kyverno on EKS	Image integrity policies on AKS	Binary Authorization on GKE & Cloud Run
Runtime threat detection	GuardDuty for EKS & ECS Runtime Monitoring	Defender for Containers	Security Command Center Container Threat Detection

For a workload that doesn't need cluster primitives, the serverless options (Fargate, Container Apps, Cloud Run) are dramatically simpler to secure: no nodes to patch, no kernel to share, no --privileged footgun. For workloads that do need Kubernetes, see the Kubernetes page for the cluster-specific considerations.

Hardening checklist

The non-negotiable container hardening list for cloud workloads:

Image

Minimal/distroless base. Pinned to digest. No secrets baked in. Scanned at build, push, and on a schedule. Signed; signature verified at deploy.

Runtime config

Non-root user. Read-only root filesystem. All capabilities dropped. RuntimeDefault seccomp profile. No --privileged, no hostPath, no hostNetwork.

Identity

Per-workload cloud identity (IRSA / Workload Identity). IMDS unreachable from containers. No long-lived cloud credentials in env vars or images.

Network

Default-deny network policy. Explicit egress allow-list. mTLS for service-to-service. No direct public IPs on workload containers.

Layer on top: admission controllers that enforce these defaults (Pod Security Standards, Kyverno, OPA Gatekeeper); runtime detection (Falco, Tetragon, or your CNAPP's runtime module); centralized logs from every container; and an SBOM database for fast CVE response.
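The "SBOM database for fast CVE response" can start as something very small. The sketch below assumes each image's SBOM is stored as a CycloneDX-style JSON document with a components list of name/version entries, keyed by image reference; the storage layout, function name, and sample data are hypothetical.

```python
from typing import Dict, List, Set


def images_with_package(
    sboms: Dict[str, Dict], package: str, bad_versions: Set[str]
) -> List[str]:
    """Return the images whose SBOM lists the package at an affected version."""
    affected: List[str] = []
    for image, sbom in sboms.items():
        for component in sbom.get("components") or []:
            if (
                component.get("name") == package
                and component.get("version") in bad_versions
            ):
                affected.append(image)
                break  # one match is enough to flag the image
    return sorted(affected)
```

When an advisory lands, the question "which running images contain package X at version Y?" becomes a single lookup instead of a fleet-wide rescan.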

Common pitfalls

Treating containers as the security boundary. They're not: a single kernel CVE can erase the isolation. Plan for what happens when a container is compromised, not just how to prevent it.
The VM god-role. All containers inherit the host's broad IAM role. One compromise = everything. Move to per-workload identity.
Secrets in env vars. They land in container metadata, in logs, in /proc/1/environ, in kubectl describe output. Use a secrets store mounted as files with strict permissions, or fetched at runtime.
"It's just an internal service, no need for auth." Internal containers get compromised. Authenticate every connection, even between your own services.
Pulling latest tag in production. Today's latest isn't tomorrow's. Pin to immutable digests; let the pipeline manage upgrades.
Running as root inside containers because "it's only inside the container." That holds only until a kernel CVE or a misconfiguration turns container root into host root. Run as non-root by default.
No runtime visibility. If a container is compromised and you have no runtime detection, you find out from the breach report.
Building once, never rebuilding. Base images get patched; your image doesn't, until you rebuild. Schedule rebuilds even when source code hasn't changed.
Sidecar privilege bleed. A sidecar (logging agent, mesh proxy) shares the pod's network namespace and often its volumes, so any extra privilege granted to the sidecar widens the blast radius of the whole pod. Audit your sidecars; some popular ones run with more than they need.
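For the secrets-in-env-vars pitfall, a quick audit plus the file-based alternative can be sketched in a few lines of Python. The name patterns and helper names below are illustrative guesses at what a secret-bearing variable looks like, and the mount path in the usage note is hypothetical.

```python
import re
from pathlib import Path
from typing import List, Mapping

# Heuristic: variable names that usually carry credentials.
SECRET_NAME = re.compile(
    r"(SECRET|TOKEN|PASSWORD|PASSWD|API_?KEY|PRIVATE_?KEY)", re.IGNORECASE
)


def secret_like_env_names(env: Mapping[str, str]) -> List[str]:
    """Return environment variable names that look like they hold secrets."""
    return sorted(name for name in env if SECRET_NAME.search(name))


def read_secret_file(path: str) -> str:
    """Read a secret mounted as a file, stripping the trailing newline."""
    return Path(path).read_text().strip()
```

Calling secret_like_env_names(os.environ) at startup and logging only the names (never the values) shows which workloads still depend on environment variables; read_secret_file("/var/run/secrets/app/db-password") is the kind of call that replaces them once the secret is mounted as a file.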

FAQ

Are Docker, containerd, and Kubernetes the same thing?

No. Docker is a developer-facing tool (CLI + daemon) that builds and runs containers. containerd is the lower-level runtime that manages images and container lifecycle, handing process creation to runc; Docker uses it under the hood, and so do most Kubernetes distributions. Kubernetes is an orchestrator - it decides which containers run where, across many hosts. You can run containers without Kubernetes (ECS, Cloud Run, plain Docker); you can't run Kubernetes without a container runtime.

What's the difference between containers and VMs?

A VM has its own kernel; a container shares the host's. VMs boot in seconds with hundreds of MB of memory overhead; containers start in milliseconds with no kernel overhead. VMs are a strong isolation boundary (hypervisor); containers are a weaker one (kernel namespaces). The cloud's "best of both" answer is the micro-VM - Firecracker, Kata - which gives the container API on top of a fast hypervisor.

Is rootless Docker enough?

It helps a lot - the daemon no longer runs as root, so compromising the daemon or escaping through it yields an unprivileged host user rather than root. But it's not a silver bullet: the kernel is still shared, kernel CVEs still escape, the workload still needs to drop capabilities and run as a non-root user inside the container. Rootless removes one big foot-gun; the rest of the hardening still applies.

How is "containerless" / serverless different?

Cloud Run, Container Apps, Fargate, App Runner - these run your container image, but the platform manages the host, the kernel, the runtime sandboxing, and the scaling. You don't have a node to log into. The trade-off is reduced control (no DaemonSets, no privileged sidecars) for significantly less security surface to manage. For most application workloads, this is the right default in 2026.

How does this relate to zero trust?

Containers operationalize zero trust at the workload layer when you do this right: per-workload identity (verify explicitly), default-deny networking (least privilege), mTLS between services (verify continuously), runtime detection (assume breach). The container is a unit of identity, not just a unit of deployment.

Where next

Kubernetes & managed Kubernetes - the cluster layer on top of containers.
CI/CD for cloud deployments - the pipeline that builds and signs your images.
Capital One kill chain - identity chaining via the metadata service, end to end.
CSPM vs CNAPP - runtime visibility and posture tools.
Friday Zoom - container escape stories show up monthly. Drop in.