Data Security, KMS & Secrets for Cloud

Last updated 2026-05-17 · By Shawn Nunley · Vendor-neutral

The 30-second version: Cloud data security is two disciplines stitched together: cryptography (the math - AES-GCM, RSA, ECDSA, the curves, the modes, the algorithms) and key management (the operational layer - who holds the keys, how they rotate, how they're audited, what happens when one is compromised). The first is solved; the providers ship the same NIST-approved primitives. The second is where every cloud breach involving "encrypted" data actually fails: a KMS key with a permissive policy, a secret committed to git, a database connection string in an environment variable, a service account with permanent credentials.

The modern stack is customer-managed keys (CMK) in a cloud KMS, envelope encryption for everything that scales (DEK/KEK), an HSM only when a regulator or contract names it, a secrets manager with rotation and dynamic credentials, TLS everywhere with mTLS for service-to-service, and a small set of confidential-computing workloads where the provider must never see plaintext. The work is making all of that legible to an auditor and resilient to the human mistakes that produce most incidents.

Data classification
Encryption at rest
Encryption in transit
KMS fundamentals - envelope encryption
BYOK, HYOK, CMK & customer-managed keys
HSMs & FIPS levels
Secrets management
Kubernetes secrets
Key rotation
Tokenization vs encryption vs hashing
Data Loss Prevention (DLP)
Confidential computing
Database encryption nuances
AWS, Azure, and GCP side-by-side
Maturity stages
Common pitfalls
Further reading
FAQ

Data classification - the upstream of every other control

Every other control on this page is downstream of one question: what is this data, and how bad is it if the wrong person sees it? An org that has never classified its data ends up either over-protecting everything (expensive, slow, and so painful that engineers route around the controls) or under-protecting everything (cheap right up until the incident). Classification is the cheapest, highest-leverage upstream investment in a data-security program.

A workable four-tier scheme - the one most enterprises converge on after a few iterations:

Public

Marketing pages, open-source code, published research. No restriction on disclosure. The control is integrity (don't let it be tampered with), not confidentiality. Sits in CDNs, public buckets, public repos - and the only data-security question is "did we accidentally publish something from a higher tier here?"

Internal

Internal documents, employee directories, non-sensitive operational metrics. Disclosure is embarrassing but not damaging. Encrypted at rest by default; access controlled by org-membership and SSO; no special handling beyond standard logging.

Confidential

Customer data, source code, financial records, employee PII, business plans. Disclosure damages the business or a customer. Encrypted at rest with customer-managed keys; access logged and reviewed; egress monitored; DLP rules apply; backups separately encrypted; production environments segregated from dev/test.

Restricted

Cardholder data (PCI), protected health information (HIPAA), authentication secrets, signing keys, source data subject to legal hold, defense or government data. Disclosure is regulator-reportable, contractually breach-triggering, or career-ending. Tokenized or vaulted where possible; HSM-backed keys; field-level encryption; tight IAM with break-glass auditing; often segregated cloud accounts, sometimes confidential computing.

The tiers are not academic. They determine which KMS key family encrypts the data, which secrets manager holds the credentials, whether DLP rules apply, which network paths the data can traverse, where backups land, and whether confidential computing is on the table. Without classification, every control discussion is about "data" in the abstract - and abstract controls tend to be either everywhere (expensive) or nowhere (insecure).

Practical execution: classify by data owner and storage location, not by individual record. A bucket holds restricted data; a table contains confidential records; a Kafka topic carries internal events. Tag the resource with the classification, propagate the tag through IaC, and let policy enforcement key off the tag. Tools like Macie, Purview Information Protection, and Cloud DLP (now part of Google's Sensitive Data Protection) can auto-discover and label data at scale, but the human-curated owner mapping is still the spine.
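Tag-driven enforcement can be sketched as a small policy check. This is a minimal Python sketch with assumed names. The `classification` tag key, the `Resource` shape, and the tier-to-control table are hypothetical. In practice the same logic would live in an SCP, Azure Policy, Org Policy, or an IaC policy engine.

```python
from dataclasses import dataclass, field

# Hypothetical minimum controls per tier, following the four-tier scheme above.
# Control names are illustrative, not any provider's policy language.
TIER_RULES = {
    "public": {"key": "provider"},
    "internal": {"key": "provider"},
    "confidential": {"key": "cmk", "access_logging": True},
    "restricted": {"key": "hsm", "access_logging": True},
}

# Key-management options ordered weakest to strongest.
KEY_STRENGTH = {"none": 0, "provider": 1, "cmk": 2, "hsm": 3}


@dataclass
class Resource:
    name: str
    tags: dict = field(default_factory=dict)
    key: str = "none"              # "none" | "provider" | "cmk" | "hsm"
    access_logging: bool = False


def violations(res: Resource) -> list[str]:
    """Return policy findings for one resource, driven only by its classification tag."""
    tier = res.tags.get("classification")
    if tier not in TIER_RULES:
        return [f"{res.name}: missing or unknown classification tag"]
    rules = TIER_RULES[tier]
    findings = []
    if KEY_STRENGTH[res.key] < KEY_STRENGTH[rules["key"]]:
        findings.append(f"{res.name}: {tier} data needs at least '{rules['key']}' keys, has '{res.key}'")
    if rules.get("access_logging") and not res.access_logging:
        findings.append(f"{res.name}: {tier} data requires access logging")
    return findings
```

An untagged resource is itself a finding. That is how "classify first" becomes enforceable rather than aspirational.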

Encryption at rest - what the cloud gives you, what you add

All three major clouds encrypt managed storage at rest by default, with provider-owned keys, using AES-256. S3, EBS, RDS, DynamoDB, Azure Blob, Azure Disks, Azure SQL, GCS, Persistent Disks, BigQuery, Spanner - all of it, encrypted out of the box. This is necessary and insufficient. The defaults protect against physical theft of disks from a data center; they do not protect against an IAM principal in your account who has permission to read the bucket. "Encrypted at rest" in the marketing sense is not what an auditor or regulator means by the same phrase.

The hierarchy you actually want

Provider-managed keys (AWS-owned, Microsoft-managed, Google-managed). The default. Zero operational overhead, no key policy to manage, no rotation work. Appropriate for public and internal-tier data. Cannot satisfy a regulator who asks "can you revoke the cloud provider's ability to decrypt this?"
Customer-managed keys (CMK). Keys you create inside the cloud KMS - AWS KMS, Azure Key Vault, GCP Cloud KMS. The key material is generated inside the provider's HSM, but you own the policy, the rotation schedule, the audit logs, and (importantly) the ability to revoke. This is the right default for confidential-tier data. Minor cost per key and per call; meaningful operational discipline (key policies must be correct or workloads break).
BYOK (Bring Your Own Key). You generate the key material outside the cloud (typically on an on-prem HSM) and import it. Useful when a regulator asks where the key entropy came from. The provider's KMS still operates the crypto.
HYOK / External keys. The key never enters the cloud at all. AWS KMS External Key Store (XKS), Azure Key Vault Managed HSM with HYOK, GCP External Key Manager. The cloud KMS calls out to your external key manager for every operation. Maximum control, real latency cost, real availability risk - the external key manager becoming unavailable means the data is unavailable.

For an opinionated default at most companies: customer-managed keys for everything containing confidential or restricted data, provider-managed for the rest. Don't go to BYOK or HYOK unless a named requirement forces it; the operational tax is real, and CMK satisfies most "we control the keys" questions.

Storage-level details

S3 / Blob / GCS. SSE-S3 (provider key), SSE-KMS (CMK), SSE-C (customer-supplied per-request key, rare). For confidential data, use SSE-KMS with a dedicated CMK per workload - not the default account-level key. Bucket-level "encrypt by default" should be enforced via SCP / Azure Policy / Org Policy so a developer can't create an unencrypted bucket.
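EBS / Azure Disks / Persistent Disks. Azure managed disks and GCP persistent disks are always encrypted with platform keys. EBS has a per-region "encryption by default" account setting. Enable it, and make policy fail any volume that lacks encryption (or lacks a CMK, for confidential data).
RDS / Azure SQL / Cloud SQL. On RDS and Cloud SQL, the encryption choice is fixed at instance creation. You cannot retrofit encryption (RDS) or a CMK (Cloud SQL) onto an existing instance; you have to restore into a new one from an encrypted snapshot copy or a dump. Azure SQL enables TDE by default and lets you switch the TDE protector to a customer-managed key later. On every platform, provisioning is the cheapest time to get it right.
Backups and snapshots. Easy to forget: snapshots normally inherit the source key, but copies, exports, and cross-region or cross-account replicas can be re-encrypted under a different key. If that key is a provider default, the data has left your CMK's policy. Use the same CMK for copies, or a CMK in the destination account or region with an explicit key policy.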

Encryption in transit - TLS everywhere, mTLS for service-to-service

The minimum bar in 2026 is TLS 1.2+ on every external endpoint and every internal service-to-service hop. Anything less is a finding in every framework and a real-world risk on any shared network. TLS 1.3 is now broadly supported and should be preferred where it works; legacy TLS 1.0 / 1.1 should be off by policy, not by intention.
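In application code, you can pin the same floor explicitly rather than leave it to library defaults. Here is a minimal sketch using Python's standard `ssl` module; the function name is illustrative.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    # create_default_context() already enables certificate and hostname verification.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Pin the floor in code; the maximum is left alone so TLS 1.3 is used when available.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

Pass the context to `ctx.wrap_socket(sock, server_hostname=host)` or to any HTTP client that accepts an `ssl.SSLContext`. Because only the minimum is pinned, TLS 1.3 is negotiated whenever both sides support it.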

Certificate management

AWS Certificate Manager (ACM) - free public certs for ALB / NLB / CloudFront / API Gateway, with automatic renewal. AWS Private CA (formerly ACM Private CA) for internal PKI. The right default on AWS.
Azure Key Vault Certificates - managed certs integrated with App Service, Front Door, Application Gateway. Pairs with a public CA (DigiCert, GlobalSign integration) or internal CA.
Google Cloud Certificate Manager - managed certs for Load Balancing, with Google-managed and customer-uploaded options. Pairs with Certificate Authority Service (CAS) for private PKI.
cert-manager - the Kubernetes-native cert lifecycle controller. Issues from Let's Encrypt, internal CAs, Vault, or the cloud CA services; renews automatically. Pair with a ClusterIssuer pointed at ACM PCA / CAS / Key Vault for production clusters.
Let's Encrypt - free 90-day public certs, automation-only issuance. The right answer for most public-facing services that don't need EV/OV validation.

mTLS for service-to-service

Server-only TLS verifies the server to the client. Mutual TLS (mTLS) verifies both - every service presents a cert, every connection validates both sides. The argument for mTLS is that any plaintext-trusted internal network is a flat network; if a workload is compromised, the attacker can talk to every other service. mTLS forces the attacker to also steal a valid client cert.

The 2026 way to do mTLS at scale is a service mesh (Istio, Linkerd, Consul Connect) or a cloud-native equivalent (Cloud Service Mesh, formerly Anthos Service Mesh; AWS VPC Lattice). AWS App Mesh reaches end of support in 2026 and is not a sensible new choice. The mesh handles cert issuance and rotation transparently via SPIFFE/SPIRE-style identity, so applications don't manage certs themselves.

For non-mesh workloads, SPIFFE / SPIRE provides workload identity and short-lived certs for any compute. The cloud-native paths are the AWS Private CA issuer for cert-manager, Microsoft Entra Workload ID, and GCP Workload Identity Federation.
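At the application layer, "verify both sides" comes down to a server context that demands a client certificate. Here is a minimal sketch with Python's `ssl` module. The file paths are hypothetical placeholders, for example a cert/key pair issued by an internal CA or SPIRE, plus that CA's bundle. In a mesh, the sidecar does this instead of the application.

```python
import ssl
from typing import Optional


def make_mtls_server_context(cert_file: Optional[str] = None,
                             key_file: Optional[str] = None,
                             client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server context that requires and verifies a client certificate (mTLS)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # On a server context, CERT_REQUIRED makes the handshake fail unless the
    # client presents a certificate that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)   # this server's identity
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)            # CA that issues client certs
    return ctx
```

The client side mirrors this by calling `load_cert_chain` on its own context, so that it presents a certificate during the handshake.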

Don't forget the boring parts

HSTS on every public site. Strict-Transport-Security: max-age=31536000; includeSubDomains; preload. Then submit to the preload list.
HTTP→HTTPS redirects at the edge, not the application.
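Database connections - RDS, Aurora, Cloud SQL, and Azure SQL all support TLS, but it is not always enforced by default. Enforce it at the parameter-group (or flag) level: rds.force_ssl for RDS PostgreSQL and SQL Server, require_secure_transport for MySQL and MariaDB, and the equivalent settings on the other clouds.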
Inter-region replication - same story. S3 cross-region replication is over TLS, but check the assumption for any data-replication path that crosses a region or VPC boundary.
Cert expiry monitoring - every team has the story of the outage caused by an expired cert nobody owned. The cloud-native cert managers above eliminate this for managed endpoints; the residual risk is internal services using uploaded certs that nobody renewed. Monitor expiry as a first-class metric.
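Expiry monitoring can start as simply as reading `notAfter` from each endpoint's certificate and alerting below a threshold. Here is a sketch using only Python's standard library. The function names and the days-remaining metric are assumptions; in production this would feed whatever metrics system you already run.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a certificate's notAfter time (negative if expired).

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. 'Jun  1 12:00:00 2026 GMT'.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect with full verification and return the peer certificate's notAfter."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job would call `days_until_expiry(fetch_not_after(host))` for each internal endpoint and page when the result drops below a chosen threshold, such as 21 days.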

KMS fundamentals - envelope encryption

Every cloud KMS works the same way under the hood: envelope encryption. Understanding the pattern is the foundation for understanding everything else on this page.

The naïve approach to encrypting an object is to send its bytes to the KMS and store the ciphertext that comes back. This breaks immediately at scale. KMS encrypt APIs cap the payload at a few kilobytes (4 KiB for AWS KMS Encrypt), so a large object would need thousands of round-trips. The KMS becomes a throughput bottleneck, and per-request pricing makes the bill grow with data volume. Envelope encryption solves this with two layers.

The two-layer scheme

The application asks KMS for a new data key (DEK). KMS generates a random 256-bit AES key, encrypts it with the long-lived key encryption key (KEK) inside the HSM, and returns both - the plaintext DEK (for immediate use) and the encrypted DEK (for storage).
The application uses the plaintext DEK to encrypt the data locally with an authenticated cipher such as AES-256-GCM or ChaCha20-Poly1305. Then the plaintext DEK is wiped from memory; only the ciphertext data and the encrypted DEK remain.
To decrypt: the application sends the encrypted DEK to KMS, KMS unwraps it with the KEK and returns the plaintext DEK, the application decrypts the data, the plaintext DEK is wiped again.

The encrypted DEK is small (a few hundred bytes); it travels with the data, gets stored alongside it. The plaintext KEK never leaves the HSM. The pattern means:

Bulk crypto is fast. Encrypting a 10 GB object means one KMS call (for the DEK) and one local AES operation - not 10 GB of API traffic.
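Rotation is cheap. Rotating the KEK never touches the bulk data. Cloud KMSs keep old key versions available for decryption, so existing ciphertext stays readable. If you want old objects moved onto the new version, you re-wrap only the encrypted DEK (a few hundred bytes per object).
Revocation is near-instant. Disabling the KEK, or scheduling it for deletion, makes every DEK it wrapped impossible to unwrap. The data under it becomes unreadable without any need to crawl billions of objects to "delete" them. The exception is any plaintext DEK an application still holds in a cache.
The blast radius of a stolen DEK is bounded. A leaked DEK decrypts only the data it was minted for. The KEK's key material never leaves the HSM, so it cannot be exfiltrated. The realistic threat is a principal that is allowed to call Decrypt with it, which is why the key policy matters.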
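The two-layer flow can be sketched end to end in a few dozen lines. This is a toy under loud assumptions. `ToyKMS` is a hypothetical in-process stand-in for a cloud KMS; its method names only echo the shape of calls like AWS KMS `GenerateDataKey` and `Decrypt`. AES-GCM is not in Python's standard library, so the cipher here is a stand-in built from an HMAC-SHA256 keystream plus an HMAC tag. The sketch shows the structure, not production cryptography. Real code should use AES-256-GCM from a vetted library or the provider's encryption SDK.

```python
import hashlib
import hmac
import secrets


# --- Toy AEAD: stand-in for AES-256-GCM, for illustration only ---
def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, b"enc" + nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag                       # nonce | ciphertext | tag


def open_(key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):    # reject tampered data or the wrong key
        raise ValueError("authentication failed")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))


class ToyKMS:
    """Hypothetical KMS: the KEK lives only inside this object (the 'HSM')."""

    def __init__(self):
        self._kek = secrets.token_bytes(32)

    def generate_data_key(self) -> tuple[bytes, bytes]:
        dek = secrets.token_bytes(32)
        return dek, seal(self._kek, dek)          # (plaintext DEK, wrapped DEK)

    def decrypt_data_key(self, wrapped_dek: bytes) -> bytes:
        return open_(self._kek, wrapped_dek)


def encrypt_object(kms: ToyKMS, data: bytes) -> tuple[bytes, bytes]:
    dek, wrapped_dek = kms.generate_data_key()    # one KMS call per object
    ciphertext = seal(dek, data)                  # bulk crypto happens locally
    del dek  # drops the reference only; Python cannot guarantee the bytes are zeroed
    return wrapped_dek, ciphertext                # stored side by side


def decrypt_object(kms: ToyKMS, wrapped_dek: bytes, ciphertext: bytes) -> bytes:
    dek = kms.decrypt_data_key(wrapped_dek)       # one KMS call to unwrap
    return open_(dek, ciphertext)
```

Only `generate_data_key` and `decrypt_data_key` touch the KEK. The bulk `seal` and `open_` calls run locally, which is the whole point of the pattern.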

Key hierarchies

Mature programs don't have a single KEK; they have a tree. A typical hierarchy:

Root key - held in an HSM, used only to wrap account/region master keys. Touched extremely rarely.
Account / region master KEKs - one per account, per region. Used to wrap workload-level KEKs.
Workload KEKs - one per service or per data classification. Different access policies; rotation cadences can differ.
DEKs - generated per object or per session. Ephemeral.

In practice on cloud, the "root" lives in the cloud KMS or an HSM, and you get one or two layers above DEKs (account-level CMK + workload-level CMK). The benefit of layering is that you can rotate a workload key without touching the account key, and revoke access to a workload key without affecting others.

Key policies vs IAM

A subtle and frequently misconfigured detail: in AWS KMS, the key policy is the primary access-control mechanism, and IAM is secondary. A user with full IAM permissions on KMS does not automatically have permission to use a specific key. The key policy must also grant it, or delegate to IAM by granting the account's root principal (which the default key policy does). This is intentional: a key policy that names specific principals instead of delegating keeps an account-wide IAM compromise from automatically yielding key access. Azure Key Vault and GCP Cloud KMS use a single RBAC / IAM model on the key resource. The same defense-in-depth principle applies: keep the key policy tight, audit who is on it, and never use a wildcard * principal unless you really mean it.
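To make the AWS case concrete, here is a key policy sketched as a Python dict, plus a small check that flags Allow statements with a wildcard principal. The account ID is the placeholder AWS uses in its documentation, and the role name is hypothetical. The linter deliberately ignores `Condition` blocks, so a constrained wildcard still gets flagged for human review.

```python
ACCOUNT = "111122223333"   # placeholder account ID

# Sketch of a key policy. The first statement delegates to IAM by naming the
# account root principal; the second lets one (hypothetical) workload role use the key.
key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT}:root"},
            "Action": "kms:*",
            "Resource": "*",
        },
        {
            "Sid": "Allow use by the orders service",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT}:role/orders-service"},
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": "*",
        },
    ],
}


def wildcard_principals(policy: dict) -> list[str]:
    """Return the Sids of Allow statements whose principal includes a bare '*'."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):          # a single statement may be given as an object
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        values = []
        if principal == "*":
            values = ["*"]
        elif isinstance(principal, dict):
            for v in principal.values():
                values.extend(v if isinstance(v, list) else [v])
        if "*" in values:
            flagged.append(stmt.get("Sid", "<no Sid>"))
    return flagged
```

A check like this belongs in CI next to the IaC that defines the key. That way a wildcard principal shows up in review before the key policy ships.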

BYOK, HYOK, CMK & customer-managed keys

The terminology in this space is genuinely confusing because different vendors use the same acronyms with different meanings. The practical taxonomy:

Model	Key material generated	Key material lives	Crypto operations performed by	When to use
Provider-managed	In provider HSM	Provider HSM	Provider KMS	Default. Public / internal data.
CMK (customer-managed)	In provider HSM	Provider HSM, customer-controlled policy	Provider KMS	Default for confidential data. Right answer for most.
BYOK (import)	On customer HSM	Provider HSM (imported)	Provider KMS	Regulator asks where entropy came from.
HYOK / External keys	On customer HSM	Customer HSM (never enters cloud)	Customer's external HSM, called by provider	Provider must never have unilateral plaintext access.

The right model for most cloud workloads is CMK. It satisfies "do you control the keys?" in every audit context that matters. It costs at most around a dollar per key per month plus per-request fees, and it avoids the latency and availability penalties of external keys. The cloud's HSMs are FIPS-validated and operationally hardened in ways that most on-prem HSM deployments aren't.

BYOK is a real, named requirement in a small number of regulatory contexts (some financial-services and government regimes that audit key generation entropy). It is also frequently asked for by procurement teams without a clear understanding of what they're buying - pushing back is reasonable. If you must do BYOK, the operational pattern is: generate the key on an on-prem HSM (Thales / Entrust / Utimaco), import to the cloud KMS via the documented BYOK flow, store the original somewhere safe in case you need to re-import, and rotate on a defined cadence.

HYOK is the answer to "we must be able to revoke the provider's access by yanking the key out of their environment, instantly." That requirement is rare and consequential - every cryptographic operation now depends on the external key manager being reachable, which means the data is unavailable any time the external key manager is. Implementations: AWS KMS External Key Store (XKS), Azure Key Vault Managed HSM with HYOK, Google Cloud External Key Manager. Vendors providing the external HSM side: Thales CipherTrust, Entrust KeyControl, Fortanix, Virtru.

HSMs & FIPS levels

An HSM (Hardware Security Module) is a dedicated cryptographic appliance - physical or virtual-but-isolated - that stores keys and performs cryptographic operations without ever exposing the key material to general-purpose compute. The cloud KMS services are backed by HSMs under the hood (the multi-tenant kind); dedicated HSM services give you a single-tenant boundary.

The cloud HSM offerings

AWS CloudHSM - single-tenant FIPS 140-2 Level 3 HSMs (Cavium / Marvell hardware). PKCS#11, JCE, KSP/CNG APIs. Used when an application requires direct HSM API access, or when key isolation per tenant is mandated.
Azure Key Vault Managed HSM - single-tenant FIPS 140-3 Level 3 HSMs. Same API surface as the standard Key Vault, just dedicated hardware. The 2026 default for regulated Azure workloads needing dedicated HSM.
Google Cloud HSM - FIPS 140-2 Level 3 HSMs, integrated transparently with Cloud KMS (no separate API; you just choose the HSM protection level when creating the key). The lowest-friction model of the three.
Azure Dedicated HSM - full Thales Luna HSMs in a VNet. Lift-and-shift target for existing on-prem HSM applications; less common now that Managed HSM exists.

FIPS 140-2 / 140-3 levels - what they actually mean

NIST FIPS 140 is the standard cryptographic-module validation framework. Level numbers don't compose simply; they describe distinct properties:

Level 1. Approved algorithms, no physical protection. A software cryptographic library can be Level 1.
Level 2. Tamper-evident physical packaging - you can tell if someone opened it.
Level 3. Tamper-resistant - the device actively responds to physical tampering (zeroizes keys, locks down). This is the common bar for cloud HSM services.
Level 4. Tamper-active environmental protection - voltage, temperature, EM emission protection. Rare; nation-state / defense use.

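FIPS 140-3 is the current version, superseding 140-2. New validations are 140-3, and remaining 140-2 certificates move to the historical list in September 2026. Most cloud HSM services validate at Level 3, which is what regulated industries typically require. If a customer or regulator asks for "FIPS 140-3 Level 3", the cloud HSM offerings (or the multi-tenant KMS, depending on the precise requirement) usually check the box. Confirm the certificate version for the specific service first, since some of the offerings above have been validated under 140-2.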

Practical guidance: use the cloud KMS for almost everything. Step up to a dedicated HSM when one of: (a) a regulator names "dedicated HSM" or "single-tenant key isolation" specifically; (b) the application requires PKCS#11 or KSP/CNG direct access; (c) you have an existing on-prem HSM-rooted PKI you're migrating. Don't pay for a dedicated HSM because someone said "HSM" in a meeting.

Secrets management - passwords, tokens, API keys, certs

Encryption protects data; secrets management protects the credentials that grant access to data. The two failure modes a secrets manager prevents: secrets committed to source control, and secrets that never rotate. Every cloud has a managed secrets service; HashiCorp Vault is the open-source / multi-cloud heavyweight; the ecosystem has converged on a common feature surface.

Capability	AWS Secrets Manager	Azure Key Vault	GCP Secret Manager	HashiCorp Vault
Automatic rotation	Native for RDS, DocumentDB, Redshift; Lambda for custom	Native for Azure SQL, MySQL, PostgreSQL via Event Grid	Customer Lambda-style via Cloud Functions / Workflows	Native for many engines (DB, AD, cloud)
Dynamic secrets	Limited - rotation, not on-demand minting	No	No	Yes - the headline feature
Versioning	Yes (AWSCURRENT / AWSPENDING / AWSPREVIOUS)	Yes (per-secret version history)	Yes (numbered versions)	Yes
Cross-region replication	Yes	Via geo-replication of Key Vault	Yes (per-secret replication policy)	Performance replication tier
Audit log	CloudTrail (every Get is logged)	Activity Log + Key Vault diagnostic logs	Cloud Audit Logs	Vault audit devices (file, syslog, socket)
Cost shape	Per-secret/month + API calls	Per-transaction	Per-version-month + access ops	License + ops (open-source available)
Multi-cloud	No	No	No	Yes - the headline reason to choose Vault

Dynamic secrets - why Vault is different

The cloud-native secrets managers store and rotate static secrets: they hold a value, return it on request, and rotate it on a schedule. HashiCorp Vault's headline capability is dynamic secrets. When a workload requests a database credential, Vault mints a brand-new one on the fly (it creates the user in the database, returns the credential, and sets a TTL), then revokes it when the TTL expires. The credential exists only for the lifetime of the lease. There is no "current password" to rotate; every credential is unique to the workload that requested it and expires on its own.

Dynamic secrets shrink the exposure window for whole categories of workload. A credential exfiltrated from memory is useful only until its lease expires, and because it was minted for one requester, it points straight at the workload that leaked it. AWS, Azure, and GCP have partial equivalents (IAM roles for EC2, managed identities, Workload Identity). Those mainly cover the cloud's own APIs, plus some managed databases through IAM or Entra authentication; they don't extend to the message broker, the CA, or the partner API. Vault's value proposition is uniform dynamic-secrets mechanics across all of them.
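The lease lifecycle can be sketched without Vault itself. `ToyDynamicBroker` below is hypothetical and in-memory. A dict stands in for the database's user table, and the comments mark where a real broker (such as Vault's database secrets engine) would run its configured create and revoke statements. The clock is injectable so expiry can be tested.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Lease:
    username: str
    password: str
    expires_at: float


class ToyDynamicBroker:
    """Hypothetical broker: mints a unique DB login per request and revokes it at TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.db_users: Dict[str, str] = {}    # stands in for the database's user table
        self.leases: Dict[str, Lease] = {}

    def issue(self, role: str) -> Lease:
        username = f"{role}-{secrets.token_hex(4)}"
        password = secrets.token_urlsafe(24)
        self.db_users[username] = password    # real broker: create the user with the role's grants
        lease = Lease(username, password, self.clock() + self.ttl)
        self.leases[username] = lease
        return lease

    def revoke_expired(self) -> int:
        now = self.clock()
        expired = [u for u, lease in self.leases.items() if lease.expires_at <= now]
        for u in expired:
            self.db_users.pop(u, None)        # real broker: drop the user
            del self.leases[u]
        return len(expired)

    def can_login(self, username: str, password: str) -> bool:
        stored = self.db_users.get(username)
        return stored is not None and secrets.compare_digest(stored, password)
```

Two requests for the same role get two different logins. Once the TTL passes, the revoke sweep removes them, and there is no shared password left to rotate.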

Ergonomics - the often-decisive factor

Secrets managers fail when developers route around them. The patterns that make adoption stick:

Workload identity, not credentials. The application authenticates to the secrets manager via its cloud identity (IAM role, managed identity, workload identity) - never with a static credential. This is the prerequisite; if your app needs a username/password to fetch its other secrets, you've recursed the problem.
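SDK or sidecar injection, not env vars. Retrieve secrets at runtime and hold them in memory (or in an in-memory volume). Never bake them into images, write them to persistent disk, or export them as environment variables. AWS Secrets Manager Agent, Azure Key Vault SDK, Vault Agent sidecars.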
Reasonable caching with bounded TTL. Hitting the secrets manager on every request is slow and expensive. Cache for a defined TTL (5-15 minutes is typical) and refresh on cache miss or rotation event.
Local-dev parity. If the secrets manager doesn't have a sane local-dev story (LocalStack, a dev tenant, a stub), developers will hardcode a value "just for testing" and forget. Provide the dev path explicitly.
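Here is a minimal sketch of the bounded-TTL cache. It assumes a `fetch` callable that wraps whatever client your secrets manager provides, for example a function around boto3's Secrets Manager `get_secret_value`. The class name and the `invalidate` hook for rotation events are illustrative, not part of any SDK.

```python
import time
from typing import Callable, Dict, Tuple


class SecretCache:
    """Caches secret values for a bounded TTL; `fetch` retrieves a secret by name."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 600,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}   # name -> (value, expires_at)

    def get(self, name: str) -> str:
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]                     # fresh cache hit: no network call
        value = self._fetch(name)             # miss or expired: go to the secrets manager
        self._entries[name] = (value, now + self._ttl)
        return value

    def invalidate(self, name: str) -> None:
        """Call from a rotation-event handler so the next get() refetches."""
        self._entries.pop(name, None)
```

The TTL bounds how stale a value can get after rotation. The `invalidate` hook closes that gap when the secrets manager emits a rotation event.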

How developers actually consume secrets - the four surfaces

The capabilities above are necessary, but in practice secrets leak from the workflows that consume them, not from the vaults that store them. Four surfaces matter, each with its own integration pattern. The deep treatment for each lives on the corresponding topic page; the summary here is the bridge.

Local development & IDEs. aws sso login, az login, gcloud auth, plus wrappers like aws-vault, op run (1Password), doppler run, infisical run, and Vault Agent on workstations. The goal: kill the .env file. See IAM - Secrets in developer workflows.
CI/CD pipelines. OIDC federation to the cloud (no static keys in CI), vault retrieval at job-start using the OIDC identity, environment-scoped CI secret stores as a last resort, and secret-scanning gates (TruffleHog, Gitleaks, GitHub push protection) as the safety net. See CI/CD - Secrets in pipelines.
Terraform / OpenTofu. Dynamic provider credentials (HCP Terraform, GitHub Actions, GitLab) for the cloud auth, and Vault / cloud-native data sources (aws_secretsmanager_secret_version, azurerm_key_vault_secret, google_secret_manager_secret_version) for secret values. State files must be treated as secrets - encrypted backends are mandatory; OpenTofu's native state encryption raises the floor.
Kubernetes clusters. KMS-backed etcd encryption as the baseline, then External Secrets Operator, Secrets Store CSI Driver, Vault Agent Injector, SOPS, or Sealed Secrets depending on the operating model. See Kubernetes secrets (below) for the cryptography angle and Kubernetes - Cluster secrets & vault integration for the cluster-operator view.

The connective tissue across all four: workload identity is how the consumer authenticates to the vault. If your application needs a static credential to fetch its other static credentials, you've recursed the problem. Federated identity at every retrieval is the only pattern that scales.

Kubernetes secrets - base64 isn't encryption

The number-one cloud-security pitfall in Kubernetes is the assumption that kind: Secret means encrypted. It doesn't. A Kubernetes Secret is an ordinary API object whose values are base64-encoded, and echo $value | base64 -d recovers the plaintext. Anyone with the RBAC to call kubectl get secret, or with direct access to etcd, can read it.
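A two-line Python demonstration with a made-up value shows why: the encoding round-trips without any key.

```python
import base64

# The `data` field of a Secret, as `kubectl get secret -o yaml` would show it.
secret_data = {"password": base64.b64encode(b"hunter2").decode()}

# Anyone who can read the object can reverse the encoding; no key is involved.
recovered = base64.b64decode(secret_data["password"]).decode()
```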

The patterns that actually protect secrets in Kubernetes - typically used in combination:

1. Encrypt etcd at rest

The first thing: configure the API server to encrypt Secret resources before writing them to etcd. KMS providers for AWS KMS, Azure Key Vault, and GCP KMS make this seamless on managed clusters (EKS, AKS, GKE all support it natively). This protects against etcd snapshot exfiltration and against attackers who reach the underlying storage. It does not protect against an attacker who has API access - the API server will happily decrypt and serve.
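On a self-managed control plane, this is the `EncryptionConfiguration` file passed to the API server via `--encryption-provider-config`. On EKS, AKS, and GKE you enable the provider's setting instead and never write the file. The sketch below builds the file in Python. The plugin name and socket path are hypothetical and depend on which KMS plugin you deploy.

```python
import json


def encryption_config(plugin_name: str, socket_path: str) -> dict:
    """EncryptionConfiguration that encrypts Secrets through a KMS v2 plugin."""
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [
            {
                "resources": ["secrets"],
                "providers": [
                    # The first provider encrypts new writes; providers are tried in order on reads.
                    {"kms": {"apiVersion": "v2", "name": plugin_name,
                             "endpoint": f"unix://{socket_path}", "timeout": "3s"}},
                    # identity keeps Secrets written before encryption was enabled readable;
                    # they are encrypted the next time each one is rewritten.
                    {"identity": {}},
                ],
            }
        ],
    }


# JSON is valid YAML, so this output can be written straight to the config file.
print(json.dumps(encryption_config("cloud-kms", "/var/run/kmsplugin/socket.sock"), indent=2))
```

Provider order is the design choice that matters. Putting `identity` first would silently write new Secrets in plaintext.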

2. External secrets operator (ESO)

External Secrets Operator is the de-facto standard for pulling secrets from an external secrets manager (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, Vault, 1Password, others) into Kubernetes Secret objects. The cluster is not the source of truth; ESO re-syncs each Secret on a refresh interval. This keeps the authoritative value in the secrets manager, and a rotated value reaches the cluster on the next refresh (workloads still need to reload it).

3. Secrets Store CSI driver

The Secrets Store CSI Driver mounts secrets directly as volumes from the external secrets manager - they never become Kubernetes Secret objects at all. Lower attack surface than ESO (no etcd write), but a different operational model (volume-mounted files rather than env vars). Most cloud providers ship a provider plugin (AWS, Azure, GCP, Vault).

4. SOPS for git-stored encrypted secrets

SOPS (Secrets OPerationS) encrypts individual values inside YAML / JSON / env / ini files, using a cloud KMS, age, or PGP. The encrypted file is safe to commit to git. Only the values are encrypted while the keys stay readable, so diffs remain meaningful. Pairs with Flux and Argo CD for GitOps workflows. Right answer for declarative cluster configs where the secrets need to ship through git.

5. Sealed Secrets

Bitnami Sealed Secrets is the original "commit encrypted secrets to git" pattern. An in-cluster controller holds a keypair; anyone can encrypt a secret against its public key, and only that controller can decrypt. The trade-offs vs SOPS: secrets are bound to a specific cluster's keypair, and the controller is a single point of failure for decryption.

Picking among them

Production cluster, secrets live in a cloud secrets manager: ESO or CSI driver.
GitOps workflow with secrets in repo: SOPS (multi-cluster) or Sealed Secrets (single-cluster).
Multi-cloud, want one secrets layer for everything: Vault + ESO/CSI.
Any of the above without etcd encryption enabled: incomplete - turn on etcd encryption first.

See the Kubernetes page for the broader cluster-hardening context.

Key rotation - what it actually does

"Rotate your keys" is one of those compliance-checkbox phrases that hides genuinely different mechanics depending on what kind of key. Treating them all the same is how programs end up either over-rotating (operational pain, outages) or under-rotating (the auditor finds the 4-year-old static key).

KEK rotation - re-wrap, not re-encrypt

When AWS KMS, Azure Key Vault, or GCP Cloud KMS rotates a customer-managed symmetric key, it creates a new key version. New encrypt operations use the new version, and the data itself is not re-encrypted. Encrypted DEKs already in storage stay wrapped with the old version. The KMS keeps that version available for decryption unless you explicitly disable or destroy it. When an object is next rewritten, it gets a fresh DEK wrapped under the current version, so the estate migrates forward on its own.

Automatic rotation on a schedule (annual is the common choice) is offered on all three clouds for customer-managed symmetric keys; turn it on and largely forget about it. Asymmetric keys (RSA, ECDSA) used for signing typically don't auto-rotate; AWS KMS, for example, auto-rotates only symmetric keys. The reason is that every verifier has to learn the new public key. Rotate them manually on a planned cadence, with an overlap window in which both public keys are trusted.
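The version bookkeeping can be sketched as follows. `VersionedKEK` is hypothetical, and its wrap function is an unauthenticated XOR with an HMAC-SHA256 keystream, chosen only to keep the example in the standard library. A real KMS uses an authenticated key wrap inside the HSM. What the sketch does show faithfully is that the key version travels with each wrapped DEK, which is why old data stays readable after rotation.

```python
import hashlib
import hmac
import secrets


def _xor_wrap(kek: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy wrap for up to 32 bytes: XOR with HMAC-SHA256(kek, nonce). Self-inverse.
    # Stand-in for a real authenticated key wrap; illustration only.
    stream = hmac.new(kek, nonce, hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(data, stream))


class VersionedKEK:
    """Hypothetical customer-managed key with versions, as after KMS rotation."""

    def __init__(self):
        self.versions = {1: secrets.token_bytes(32)}
        self.primary = 1

    def rotate(self) -> None:
        # The new version becomes primary; older versions are retained for decryption.
        self.primary += 1
        self.versions[self.primary] = secrets.token_bytes(32)

    def wrap(self, dek: bytes) -> tuple:
        nonce = secrets.token_bytes(16)
        return (self.primary, nonce, _xor_wrap(self.versions[self.primary], nonce, dek))

    def unwrap(self, wrapped: tuple) -> bytes:
        version, nonce, blob = wrapped        # the version is stored with the wrapped DEK
        return _xor_wrap(self.versions[version], nonce, blob)

    def rewrap(self, wrapped: tuple) -> tuple:
        # Optional: move a DEK onto the primary version without touching the bulk data.
        return self.wrap(self.unwrap(wrapped))
```

Rotation here is a metadata change plus one new key. Nothing that was already wrapped stops working, and `rewrap` only ever touches the 32-byte DEK.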

DEK rotation - per-object or per-session

DEKs are ephemeral by design. A new one is generated for each object or each session. "Rotating" a DEK means generating a new one for the next write; the old encrypted data is still readable via its own (still-wrapped) DEK until it's overwritten. There is no separate operation to perform.

Secret rotation - actually changing the credential

When you rotate a database password or an API key, the credential value changes. Every consumer needs the new value. The two patterns:

Brokered rotation. A secrets manager owns the password and changes it in two places: its own stored value and the credential in the producer (the database). Consumers read the secret on each connection, so they always get the current value. AWS Secrets Manager's RDS rotation works this way.
Two-secret overlap. A new credential is provisioned alongside the old one; AWS Secrets Manager tracks the two with the AWSPENDING and AWSCURRENT staging labels. Consumers start receiving the new value, and after a grace period the old one is decommissioned. Required when the producer can't update atomically.
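The overlap pattern can be sketched as a small state machine. `ToyStagedSecret` is hypothetical. It reuses the AWSCURRENT / AWSPENDING / AWSPREVIOUS staging labels, and its four step methods loosely mirror the createSecret / setSecret / testSecret / finishSecret steps of a Secrets Manager rotation function. A set of accepted passwords stands in for the database.

```python
import secrets
from typing import Dict, Set


class ToyStagedSecret:
    """Hypothetical secret with AWS-style staging labels and an overlap window."""

    def __init__(self, initial: str):
        self.versions: Dict[str, str] = {"v1": initial}
        self.labels: Dict[str, str] = {"AWSCURRENT": "v1"}
        self.db_passwords: Set[str] = {initial}   # credentials the database accepts
        self._n = 1

    def get(self, label: str = "AWSCURRENT") -> str:
        return self.versions[self.labels[label]]

    def create_secret(self) -> None:
        # Generate the next value and stage it as pending.
        self._n += 1
        vid = f"v{self._n}"
        self.versions[vid] = secrets.token_urlsafe(24)
        self.labels["AWSPENDING"] = vid

    def set_secret(self) -> None:
        # The database now accepts the pending value too: the overlap window opens.
        self.db_passwords.add(self.get("AWSPENDING"))

    def test_secret(self) -> bool:
        return self.get("AWSPENDING") in self.db_passwords

    def finish_secret(self) -> None:
        # Promote pending to current; keep the old version labelled as previous.
        self.labels["AWSPREVIOUS"] = self.labels["AWSCURRENT"]
        self.labels["AWSCURRENT"] = self.labels.pop("AWSPENDING")

    def retire_previous(self) -> None:
        # After the grace period, the database stops accepting the old credential.
        self.db_passwords.discard(self.get("AWSPREVIOUS"))
```

Between `finish_secret` and `retire_previous`, both credentials work. Consumers that cached the old value keep running until they refresh.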

Cadence by data class

Restricted (PCI, HIPAA, signing keys). KEKs annually (or per regulator); secrets every 30-90 days; static service-account keys eliminated entirely.
Confidential. KEKs annually; secrets every 90 days; certs at issuer cadence (90 days for Let's Encrypt, renewed automatically for ACM-managed certs, 1 year for many internal CAs).
Internal / public. KEKs annually (it's nearly free); secrets when needed (compromise, departure).

The single highest-leverage move is eliminate long-lived static credentials - service account keys, IAM access keys, machine passwords. Replace with workload identity (IRSA, Managed Identity, Workload Identity Federation, SPIFFE) and short-lived OIDC tokens. A credential that lives for 15 minutes doesn't need a rotation policy.

Tokenization vs encryption vs hashing

Three related-but-distinct techniques. Choosing wrong is a common cause of "we technically protect this data, but the protection doesn't help us."

Technique	Reversible?	Original value usable in place?	Reduces audit scope?	Use when
Encryption	Yes, with the key	After decrypt	No (the encrypted data is still the data)	Application needs the original; protection is at rest / in transit
Tokenization	Yes, by lookup against the vault	No (token has no relation to value)	Yes - non-vault systems hold only tokens	PCI scope reduction, GDPR pseudonymization, sharing data with third parties
Hashing (one-way)	No	No (only comparison)	Yes - hash is not the data	Passwords, deduplication keys, anonymous IDs - never recover the original

Tokenization - the PCI scope-reduction lever

The classic example: a SaaS company processes credit cards. The naïve approach stores the PAN (primary account number) encrypted in the production database; every system that touches that database is now in PCI DSS scope. The tokenized approach stores the PAN only in a dedicated tokenization vault (typically a third-party service: Stripe, Adyen, Braintree, Skyflow, VGS); every other system holds an opaque token like tok_3K7s.... PCI scope shrinks to the vault and the small set of systems that detokenize. Order of magnitude cost reduction in audit, in pen-test, in controls implementation.

Tokenization is also the cleanest implementation of GDPR pseudonymization (Article 4(5)) - the personal data is in the vault, the analytics warehouse holds tokens, the lookup table is the controlled re-identification path.

Hashing - only one-way

Hashing is for cases where you never need to recover the original. Two common uses: password storage (use a slow KDF - Argon2id, scrypt, or bcrypt with a strong cost factor - never raw SHA-256, which can be brute-forced billions of times per second on a GPU), and stable anonymous identifiers (compute a keyed hash such as HMAC-SHA256 of an email address with a secret key, producing a consistent identifier that can't be linked back to the address without the key).
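The anonymous-identifier case is a keyed hash. This minimal sketch uses HMAC-SHA256 from the Python standard library. It assumes the key is fetched from a secrets manager (it is generated inline here only for illustration) and that emails are normalized by trimming and lowercasing, which is a design assumption rather than a rule. `pseudonymous_id` is a hypothetical helper name.

```python
import hashlib
import hmac
import secrets


def pseudonymous_id(email: str, key: bytes) -> str:
    """Stable identifier for an email that can't be linked back without the key.

    Hypothetical helper. Unlike a plain SHA-256 of the email, an attacker who
    lacks the key can't hash a list of known addresses and match them up.
    """
    normalized = email.strip().lower()  # assumption: treat addresses case-insensitively
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Illustration only: in practice, load the key from a secrets manager and keep
# it stable, because rotating it changes every identifier.
key = secrets.token_bytes(32)
print(pseudonymous_id("Alice@Example.com", key) == pseudonymous_id("alice@example.com ", key))
```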

For password hashing specifically: Argon2id with at least 19 MiB memory, 2 iterations, 1 parallelism is the OWASP 2024 recommendation. If using bcrypt, cost factor 12+. Never SHA-256, MD5, or any unsalted hash.
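The Python standard library has no Argon2id, so this sketch swaps in scrypt, which the paragraph above lists as acceptable. It uses `hashlib.scrypt`, available when Python is built against OpenSSL 1.1 or later. The parameters N=2^17, r=8, p=1 are the scrypt minimum in the same OWASP guidance. For Argon2id itself, a maintained third-party library is the usual route. The `scrypt$n$r$p$salt$hash` storage format is a hypothetical convention.

```python
import base64
import hashlib
import hmac
import secrets

# OWASP's minimum scrypt cost: N=2**17 (128 MiB of memory), r=8, p=1.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**17, 8, 1
MAXMEM = 256 * 1024 * 1024  # OpenSSL's default cap (32 MiB) is too small for N=2**17


def hash_password(password: str, n: int = SCRYPT_N, r: int = SCRYPT_R, p: int = SCRYPT_P) -> str:
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    dk = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=n, r=r, p=p,
                        maxmem=MAXMEM, dklen=32)
    # Store the parameters with the hash so cost can be raised later without
    # breaking existing records (hypothetical "scrypt$n$r$p$salt$hash" format).
    return "$".join(["scrypt", str(n), str(r), str(p),
                     base64.b64encode(salt).decode(), base64.b64encode(dk).decode()])


def verify_password(password: str, stored: str) -> bool:
    scheme, n, r, p, salt_b64, dk_b64 = stored.split("$")
    if scheme != "scrypt":
        raise ValueError("unsupported hash scheme")
    expected = base64.b64decode(dk_b64)
    dk = hashlib.scrypt(password.encode("utf-8"), salt=base64.b64decode(salt_b64),
                        n=int(n), r=int(r), p=int(p), maxmem=MAXMEM, dklen=len(expected))
    return hmac.compare_digest(dk, expected)  # constant-time comparison
```

Call `hash_password` once at signup and `verify_password` at login. Because the parameters travel with each hash, you can raise the cost later and rehash a user's password on their next successful login.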

Format-preserving encryption (FPE)

A specialty technique: it encrypts so that the ciphertext has the same length and character set as the plaintext (e.g., a 16-digit number stays 16 digits). It is useful when retrofitting encryption into a legacy system that can't accept variable-length blobs. NIST SP 800-38G defines the FF1 and FF3 modes; FF3 was later revised to FF3-1 after published attacks, so FF1 is the more conservative choice. Use FPE only when format preservation is genuinely required; standard AES is preferable when the schema allows it.

Data Loss Prevention (DLP)

DLP is the discipline of finding sensitive data where it shouldn't be - PCI card numbers in a Slack channel, social security numbers in a misclassified bucket, source code on a personal Gmail. The cloud-native services emphasize data discovery and classification; the third-party tools emphasize egress monitoring and exfiltration prevention. Mature programs run both.

Cloud-native DLP

Amazon Macie - sensitive-data discovery in S3. Built-in identifiers for PII, PCI, PHI, credentials; custom identifiers via regex and keyword. Findings to Security Hub and EventBridge. Pricing has historically been the limiting factor at scale; read the docs before turning it loose on a petabyte.
Microsoft Purview Information Protection + Purview DLP - broader scope: data discovery across M365, Azure storage, on-prem, plus DLP policies that block sharing and copying. Strong fit if you're already a heavy M365 / Azure shop.
GCP Sensitive Data Protection (formerly Cloud DLP) - API-based inspection of any data you send it; integrates with BigQuery, GCS, and arbitrary text. Strong de-identification primitives (tokenization, masking, FPE) built in, plus re-identification risk analysis such as k-anonymity.

Third-party DLP

Nightfall - SaaS-data-leak focused (Slack, GitHub, GDrive, Jira, Salesforce). Good fit for the "PCI ended up in a Slack channel" problem class.
Cyberhaven - data lineage and tracing approach; tracks where data actually flows rather than relying on classification patterns. Useful for IP-theft / insider-risk programs.
Netskope, Zscaler, Skyhigh Security - CASB / SSE platforms with DLP modules for SaaS and web egress.
Varonis, BigID - data discovery and classification at enterprise scale, deeper than Macie/Purview on legacy and on-prem data stores.

DLP is high-noise by nature - false positives are inevitable, and a DLP program that doesn't tune aggressively becomes either ignored (alert fatigue) or hated (false blocks). Start with monitor mode, classify the findings by signal-to-noise ratio, and only move the highest-confidence rule families into blocking mode after tuning.
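A common noise-reduction idea behind card-number detectors fits in a few lines. A regex finds candidates, and the Luhn checksum discards most random digit strings. This is a monitor-mode sketch that assumes plain-text input. `find_pans` and `luhn_ok` are hypothetical helpers. Findings are masked (first six, last four digits) so the report itself doesn't leak PANs.

```python
import re

# Candidate PANs: 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, then sum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pans(text: str) -> list:
    """Monitor mode: report masked likely card numbers, never block."""
    hits = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            hits.append(digits[:6] + "*" * (len(digits) - 10) + digits[-4:])
    return hits


print(find_pans("refund to 4111 1111 1111 1111 please"))
```

Even with the checksum, roughly one random digit string in ten passes Luhn. That is why the tuning step above matters before anything moves to blocking mode.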

Confidential computing - protecting data in use

Encryption at rest and in transit protect data when it's stored and when it's moving. They don't protect it while it's being processed - at the moment the CPU operates on data, it's in plaintext in memory. Confidential computing closes that third gap with hardware Trusted Execution Environments (TEEs) that keep memory encrypted even from the host OS, hypervisor, and cloud-provider personnel.

The hardware primitives

Intel SGX - application-level enclaves. Small TCB, strong isolation, but requires application-level porting. Older approach; deprecated for client CPUs, available on select Xeon Scalable.
Intel TDX (Trust Domain Extensions) - VM-level confidential computing. The whole VM runs in a TDX trust domain; the hypervisor cannot read its memory. Intel's mainstream approach as of 2026.
AMD SEV-SNP - Secure Encrypted Virtualization with Secure Nested Paging. VM-level memory encryption with integrity protection. The mainstream AMD approach; broadly deployed in cloud.
ARM CCA (Confidential Compute Architecture) - Realms on ARMv9. Emerging; relevant for ARM-based cloud instances.

The cloud offerings

AWS Nitro Enclaves - isolated CPU/memory regions carved out of a parent EC2 instance; no external networking, no persistent storage, attestation via the Nitro hypervisor. A different model from SGX/TDX/SEV: it leverages the Nitro architecture rather than CPU enclaves directly. Popular for key-handling workloads (decrypting with KMS keys whose policy requires enclave attestation, signing services), private LLM inference, and multi-party computation.
Azure Confidential VMs & Confidential Containers - VMs on AMD SEV-SNP or Intel TDX; container support via confidential containers on Azure Container Instances and AKS. Broadly available for sensitive workloads.
Google Confidential VMs & Confidential GKE Nodes - AMD SEV-SNP and Intel TDX. Confidential GKE Nodes is the easiest entry path on GCP; just a node-pool flag.

When confidential computing matters

It's overkill for most workloads. The named use cases:

Cloud-provider zero-trust scenarios. Regulatory or contractual requirement that the provider not be able to read data even in theory - sovereign workloads, certain financial-services and healthcare deployments.
Multi-party computation. Two distrustful parties want to compute on combined data (joint analytics, fraud detection across institutions) without revealing the underlying records to each other. The TEE is the trusted intermediary.
High-value secrets in use. Signing services, LLM inference on proprietary models, root-of-trust operations. The TEE protects the secret while it's being used, not just while it's stored.
Attestation requirements. "Prove to me that this code is running on this hardware in this configuration." TEE remote attestation produces a hardware-signed measurement.
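The measurement in remote attestation is built by hashing each loaded component into a register, so any change to code or load order changes the final value. This minimal sketch shows TPM-style extend semantics with SHA-384 and the relying party's release decision. It deliberately omits verifying the hardware signature and certificate chain on the attestation document, which is vendor-specific. `measure` and `release_secret_if_trusted` are hypothetical names. This is the general shape of, for example, a KMS key policy that only allows decryption for an enclave reporting specific measurements.

```python
import hashlib
import hmac


def extend(register: bytes, component: bytes) -> bytes:
    """TPM-style extend: new = SHA-384(old || SHA-384(component))."""
    return hashlib.sha384(register + hashlib.sha384(component).digest()).digest()


def measure(components) -> bytes:
    register = bytes(48)  # registers start zeroed at reset
    for c in components:
        register = extend(register, c)
    return register


def release_secret_if_trusted(reported: bytes, expected: bytes, secret: bytes):
    """Release the secret only to a workload with the expected measurement.

    ASSUMPTION: `reported` was read from an attestation document whose hardware
    signature and certificate chain were already verified (omitted here).
    """
    if hmac.compare_digest(reported, expected):
        return secret
    return None


expected = measure([b"kernel-v1", b"signer-app-v1"])
reported = measure([b"kernel-v1", b"signer-app-v1-patched"])
print(release_secret_if_trusted(reported, expected, b"wrapped-signing-key"))  # None
```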

If none of those apply, default to good IAM, good KMS, and good network controls. Confidential computing has real cost - instance-type restrictions, slightly higher pricing, attestation operational overhead - that's worth paying when the requirement is named and skipped otherwise.

Database encryption nuances

"Encrypt the database" hides several distinct controls operating at different layers. Knowing which one you have determines what threat it actually mitigates.

Transparent Data Encryption (TDE)

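The database engine encrypts files on disk as it writes them and decrypts them as it reads. It is transparent to the application; SQL sees plaintext. It protects against attackers who get the raw storage (a stolen disk, an exfiltrated backup file). It does not protect against attackers with database credentials or with DBA access. Azure SQL enables TDE by default. Cloud SQL and AWS RDS provide the equivalent storage-level encryption: on by default in Cloud SQL, and chosen at instance creation in RDS, which also supports native TDE for Oracle and SQL Server. All three accept provider-managed or customer-managed keys.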

TDE alone is what "encrypted database" means in most marketing copy. It's necessary but not sufficient for sensitive data: anyone who can log into the database sees plaintext.

Column-level / field-level encryption

Specific columns (the SSN, the credit card, the medical record number) are encrypted, with the key held outside the database. The application encrypts before insert and decrypts after read; the database stores only ciphertext. Protects against DBA access, against compromised database credentials with limited read scope, and against backup exfiltration. Adds complexity - searching encrypted columns requires special handling.

Deterministic vs randomized encryption

A subtle but important choice for column-level encryption:

Randomized. Each encryption of the same plaintext produces different ciphertext (random IV). Stronger; no information leakage. But: WHERE column = 'value' doesn't work - every row has different ciphertext.
Deterministic. Same plaintext always produces the same ciphertext. Allows equality lookups (you encrypt the search value and compare). But: leaks frequency information (an attacker sees which rows have the same value).

The right answer depends on the column. A patient's last name probably needs randomized (and you index on a separate keyed hash for lookups); a country code probably tolerates deterministic.
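One common way to build the separate keyed hash is a blind index: an HMAC of the normalized value, truncated and stored in its own indexed column next to the randomized ciphertext. This sketch assumes a dedicated index key, never the encryption key, supplied by the KMS or secrets manager. `blind_index` is a hypothetical helper. Truncation is the design lever here. Equal values still match, so lookups work, but unrelated values also collide occasionally, which blurs (without eliminating) the frequency leakage a full-length hash would have.

```python
import hashlib
import hmac


def blind_index(value: str, index_key: bytes, bits: int = 32) -> str:
    """Truncated keyed hash for equality lookups on a randomized-encrypted column."""
    if bits % 8 or not 8 <= bits <= 256:
        raise ValueError("bits must be a multiple of 8 between 8 and 256")
    normalized = value.strip().casefold()  # assumption: case-insensitive matching
    digest = hmac.new(index_key, normalized.encode("utf-8"), hashlib.sha256).digest()
    return digest[: bits // 8].hex()
```

To look up a value, compute its index and query with something like `WHERE last_name_bidx = ?` (a hypothetical column). Then decrypt the candidate rows and drop the false positives that truncation introduces.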

Application-level encryption

The encryption happens entirely in the application; the database is just a blob store. This gives maximum control at maximum complexity. No database features (indexing, full-text search, aggregations) work on the encrypted fields without specialized infrastructure such as searchable encryption, order-revealing encryption, or homomorphic encryption, all with real performance costs.

The pragmatic guidance: TDE everywhere as the floor. Column-level for restricted data (PCI / PHI / authentication). Application-level for the rare case where you don't want the database engine to ever have access to plaintext (and you're willing to live without indexes on those fields).

AWS, Azure, and GCP side-by-side

The native data-security capabilities each cloud ships, reduced to a one-screen reference:

Capability	AWS	Azure	GCP
Managed KMS	AWS KMS (multi-tenant HSM-backed)	Azure Key Vault (Standard / Premium)	Cloud KMS
Dedicated HSM	CloudHSM (FIPS 140-2 L3)	Key Vault Managed HSM (FIPS 140-3 L3); Dedicated HSM (Thales Luna)	Cloud HSM (FIPS 140-2 L3)
External keys (HYOK)	KMS External Key Store (XKS)	Key Vault Managed HSM HYOK	External Key Manager (EKM)
Secrets management	Secrets Manager; SSM Parameter Store (free standard tier, simpler)	Key Vault (secrets + keys + certs unified)	Secret Manager
Certificate management	ACM (public + Private CA)	Key Vault Certificates	Certificate Manager + CAS
Confidential compute	Nitro Enclaves; SEV-SNP & TDX instance families	Confidential VMs (SEV-SNP, TDX); Confidential Containers	Confidential VMs (SEV-SNP, TDX); Confidential GKE Nodes
Data classification / DLP	Macie	Purview Information Protection & DLP	Sensitive Data Protection (Cloud DLP)
Workload identity	IAM Roles for Service Accounts (IRSA); EC2 Instance Profile	Managed Identity; AKS Workload Identity	Workload Identity Federation; GKE Workload Identity
Database TDE	RDS, Aurora, Redshift - CMK supported	Azure SQL, Managed Instance - CMK supported	Cloud SQL, Spanner, BigQuery - CMEK supported
Storage default encryption	S3, DynamoDB - always on; EBS via the account-level encryption-by-default setting; RDS when enabled at creation	Blob, Disks, SQL - on by default	GCS, PD, Cloud SQL - on by default
Key audit log	CloudTrail (every KMS operation)	Key Vault diagnostic logs to Log Analytics	Cloud Audit Logs (Data Access logs for encrypt/decrypt must be enabled)

For roughly 80% of needs, the native primitives are functionally equivalent. The interesting differences are operational ergonomics. Azure Key Vault unifies keys, secrets, and certs in one service, which is convenient when you want everything together and awkward when you want them on separate policies. AWS separates KMS, Secrets Manager, and ACM, which gives cleaner separation but more services to wire up. GCP's Cloud KMS is the lowest-friction model for envelope encryption integration with managed data services, most of which have a one-line "use this CMEK" flag.

Maturity stages

A staging model for a cloud data-security program:

Stage 1 - Defaults on

Provider-managed encryption at rest on all storage. TLS on all public endpoints. Some secrets are in environment variables, some in Parameter Store / Key Vault. No formal data classification. KMS used "for things that need it" without a written policy. Key rotation is manual when it happens.

Stage 2 - Customer-managed

Data classification scheme published and tagged on resources. Confidential and restricted data on customer-managed KMS keys with automatic annual rotation. Secrets centralized in a managed secrets service with rotation enabled for databases. No static IAM access keys in production. TLS 1.2+ enforced via policy. etcd encryption at rest enabled on Kubernetes.

Stage 3 - Engineered

Key hierarchy with per-workload CMKs. Dynamic secrets for databases via Vault or equivalent. mTLS between services via service mesh. External Secrets Operator (or equivalent) in every cluster. DLP discovery running on data lakes. CSPM rules ensure no untagged or improperly classified resources. SBOM and signed artifacts in the deploy pipeline.

Stage 4 - Strategic

Confidential computing for the named workloads. HYOK for the specific regulatory case. Field-level encryption for restricted columns. Tokenization vault reducing PCI scope. Hardware-attested LLM inference. Key compromise drills run quarterly. Data security informs product (tenant-managed keys as a feature, data-residency as a SKU).

The skip-stage cost is real here. Jumping to Stage 4 (confidential computing, HYOK) before Stage 2 (basic CMK and secrets hygiene) is paying for nation-state-resistant cryptography while still committing API keys to git.

Common pitfalls

Secrets in environment variables committed to git. The single most common cloud-breach precursor. A leaked AWS key can be exploited within minutes of being pushed. The answer is pre-commit scanning (Gitleaks, TruffleHog) plus a secrets manager plus workload identity. Revoke and rotate every secret that touched a public repo immediately. Deleting the commit doesn't help, because clones, forks, and scrapers already have it.
IAM-only key policies. The default AWS KMS key policy grants full access to the account root principal, which delegates control of the key to IAM. If that is the only statement, any principal with broad IAM permissions (an admin role, for example) can use the key, so a compromised developer with admin IAM means compromised keys. Tighten key policies to specific principals; don't rely on IAM alone.
Encrypted bucket with public read. S3 encryption protects the bytes from someone who steals the disk. It does not stop a public-read bucket policy from serving the plaintext to the internet. "Encrypted" and "private" are independent properties. Both must be enforced.
No rotation policy because "automatic rotation is on." Automatic KEK rotation on the cloud KMS doesn't rotate database passwords, API keys, certificates, or service account keys. Inventory every secret category, define a rotation cadence per category, automate or alert when it lapses.
One KMS key for everything. A single account-level CMK encrypting every workload's data is one access-policy mistake away from a blast-radius incident. Per-workload CMKs are cheap (on the order of a dollar per key per month on AWS KMS) and contain the blast radius.
Backups encrypted with a different key authority than primary. The production database uses your CMK, the cross-region snapshot uses the default key, the audit log says "encrypted" but you can't actually revoke the snapshot. Check the encryption configuration of every replica, snapshot, and backup explicitly.
Treating kind: Secret as encrypted. It isn't. Enable etcd encryption-at-rest with a KMS provider, and pull real secrets from a real secrets manager via ESO or the CSI driver.
Static service account / IAM keys with no expiry. A 4-year-old service account JSON file in a CI environment is a finding waiting to happen. Eliminate via Workload Identity Federation / OIDC / IRSA. If you must use a key, set a calendar reminder and rotate.
TDE called out as "encryption" in audit responses without context. TDE protects against disk theft, not against credential compromise. If the auditor's actual question is "can a rogue DBA read this data?" TDE is not the answer; column-level encryption or tokenization is.
Self-signed or never-renewed internal certs. Internal-only doesn't mean unmonitored. Expired internal certs cause outages; self-signed certs prevent meaningful trust verification. Run a proper internal CA (ACM PCA, CAS, Vault PKI) and let cert-manager / equivalent handle the lifecycle.
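The IAM-only key-policy pitfall is easy to lint for. This minimal sketch assumes the key policy JSON has already been fetched, for example with the KMS GetKeyPolicy API. It flags policies whose only Allow statement is the account-root statement, which is what the default policy looks like. `root_only_findings` is a hypothetical helper. A real check would also inspect Conditions, cross-account principals, and `Principal: "*"`.

```python
import json


def root_only_findings(policy_json: str, account_id: str) -> list:
    """Flag KMS key policies whose only Allow grant is the account-root statement.

    With only that statement, key access is delegated entirely to IAM, so any
    principal with broad enough IAM permissions in the account can use the key.
    Hypothetical lint; ignores Conditions and Deny statements for brevity.
    """
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be an object, not a list
        statements = [statements]
    root = {f"arn:aws:iam::{account_id}:root", account_id}
    specific_grants = 0
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        aws = principal.get("AWS", []) if isinstance(principal, dict) else principal
        aws = [aws] if isinstance(aws, str) else aws
        if aws and set(aws) <= root:
            continue  # the default "Enable IAM User Permissions" style grant
        specific_grants += 1
    if specific_grants == 0:
        return ["key policy relies on IAM alone: add statements naming the "
                "key-administrator and key-user roles"]
    return []
```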

FAQ

What is envelope encryption and why does every cloud KMS use it?

Envelope encryption is a two-layer scheme: the data is encrypted with a data encryption key (DEK), and the DEK itself is encrypted with a key encryption key (KEK) that lives inside the KMS or HSM. The encrypted DEK travels alongside the data; the plaintext KEK never leaves the secure module. Every cloud KMS works this way for performance and blast-radius reasons. Bulk symmetric encryption of a 10 GB object with a local DEK is orders of magnitude faster than round-tripping every byte through KMS, and rotating the KEK only requires re-wrapping the small DEKs, not re-encrypting petabytes of data. It also means that revoking access to the KEK renders every ciphertext it protects undecryptable, including copies an attacker already holds, once any plaintext DEKs cached by applications expire.

What is the difference between BYOK, HYOK, and customer-managed keys?

Customer-managed keys (CMK) are keys you create and administer inside the provider's KMS - the key material is generated by the provider's HSMs, and you control the policy, rotation, and lifecycle. BYOK (Bring Your Own Key) means you generate the key material outside the cloud (often on an on-prem HSM) and import it; the cloud KMS still operates the cryptography, but the root entropy came from you. HYOK (Hold Your Own Key) means the key material never enters the cloud at all - the KMS calls out to an external key manager (Azure Key Vault Managed HSM with HYOK, GCP External Key Manager, AWS KMS XKS) for every cryptographic operation. CMK is the right default for almost everyone; BYOK is for orgs with a regulatory or contractual requirement to control key generation; HYOK is for the small set of workloads where the provider must never have unilateral access to the plaintext, and you are willing to accept the latency, availability, and operational cost.

Are Kubernetes secrets actually encrypted?

By default, no. A Kubernetes Secret object is base64-encoded - not encrypted. Anyone with read access to the etcd database, or to the Kubernetes API with the right RBAC, sees the plaintext after a trivial decode. To get real encryption you need encryption-at-rest configured on etcd (a KMS provider plugin pointing at AWS KMS, Azure Key Vault, or GCP KMS), and you should additionally pull secrets from an external secrets manager via External Secrets Operator, the Secrets Store CSI driver, or SOPS-encrypted manifests. Sealed-secrets is a related pattern that lets you commit encrypted secrets to git safely. Treat the in-cluster Secret object as a delivery mechanism, not as the source of truth.
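To see how thin the protection is, decode the `data` field of a Secret the way any reader with `get` access can. This sketch uses hypothetical values, and no key is involved anywhere.

```python
import base64

# The `data` field of a Secret as it appears in the object (hypothetical values).
secret_data = {
    "username": "YWRtaW4=",
    "password": "czNjcjN0",
}

# "Decrypting" needs no key at all: base64 is an encoding, not a cipher.
decoded = {k: base64.b64decode(v).decode("utf-8") for k, v in secret_data.items()}
print(decoded)
```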

How often should keys be rotated?

It depends on the key's role. KEKs in a cloud KMS rotate cheaply because rotation only re-wraps the DEKs and does not re-encrypt the underlying data. Annual is the common cadence, and AWS, Azure, and GCP all support automatic annual rotation of customer-managed symmetric keys. DEKs rotate per object or per session; the application generates a fresh DEK for each large object or each connection. Long-lived secrets (database passwords, API tokens) should rotate on a 30-90 day cadence. Ideally a secrets manager drives this by brokering dynamic credentials, so the rotation is invisible to the consumer. Certificates rotate per their lifetime: 90 days for Let's Encrypt, while other public and private CAs vary, so automate renewal rather than tracking dates by hand. Static service-account keys should not exist at all - replace them with workload identity or short-lived OIDC tokens.

When should I tokenize instead of encrypt?

Tokenize when the goal is to remove sensitive data from a system entirely, not just to make it unreadable in storage. The canonical use case is PCI DSS scope reduction: a tokenization service holds the cardholder data, every other system holds only opaque tokens, and PCI scope shrinks to the tokenization vault. Tokenization is also a good fit for GDPR pseudonymization - the analytics system gets tokens, the lookup table sits in a controlled environment. Encrypt when the data has to remain usable in place (the application needs to decrypt and operate on it). Hash when you only need a one-way comparison and never need to recover the original (passwords, deduplication keys); use a slow KDF like Argon2id, scrypt, or bcrypt for password hashing, never a raw SHA-256.

Do I need an HSM, or is the cloud KMS enough?

For the vast majority of workloads, the cloud's default KMS is enough. AWS KMS protects every key in FIPS-validated HSMs (FIPS 140-2 Level 3, and increasingly FIPS 140-3). Azure Key Vault Premium and Google Cloud KMS keys created at the HSM protection level are likewise HSM-backed. Key Vault Standard and Cloud KMS's software protection level, by contrast, rely on FIPS 140-2 Level 1 software modules. On the HSM-backed tiers you never see the HSM, but the key material is HSM-protected. Step up to a dedicated HSM (CloudHSM, Azure Key Vault Managed HSM, Google Cloud HSM) when you need any of the following: single-tenant key isolation, full FIPS 140-3 Level 3 attestation for an auditor who specifically asks, or PKCS#11 / JCE / KSP-style direct API access for legacy applications. Dedicated HSMs cost meaningfully more and have lower availability guarantees than shared KMS, so pay for them only when the requirement is named.

What is confidential computing and when does it matter?

Confidential computing keeps data protected while it is being processed, not just at rest and in transit. Memory is encrypted by the CPU, and register state is shielded from the host. Hardware Trusted Execution Environments (TEEs) - Intel SGX and TDX, AMD SEV-SNP, AWS Nitro Enclaves, Azure Confidential VMs, Confidential GKE - create an isolated execution context. Its memory is encrypted, and its state cannot be inspected by the host OS, the hypervisor, or cloud provider personnel. It matters in three scenarios. The first is a regulatory or contractual requirement that the cloud provider not be able to read the data even theoretically (sovereign workloads, some financial-services and healthcare use cases). The second is multi-party computation, where two distrustful parties want to compute on combined data without revealing it to each other. The third is processing high-value secrets like signing keys or model weights, where you want hardware attestation of the runtime. For most workloads, confidential computing is overkill; for the ones that need it, nothing else will do.

Where next

IAM & Identity - the access control that decides who can call your KMS in the first place.
Zero Trust - the architectural pattern data security plugs into.
GRC - turning key-rotation and encryption posture into audit evidence.
Kubernetes - cluster-level secrets, etcd encryption, and CSI patterns.
CSPM vs CNAPP - the posture tooling that catches data-security drift.
Friday Zoom - KMS hierarchies, secrets rotation, and confidential computing come up often. Drop in.