Backup, DR & Ransomware Resilience in Cloud

Last updated 2026-05-17 · By Shawn Nunley · Vendor-neutral

The 30-second version: Cloud ransomware doesn't just encrypt your production data - it encrypts your backups first. The discipline of backup and DR moved from operations to security the day attackers learned that destroying the recovery path is what makes a ransom non-negotiable. Immutability is the new MFA: a backup that can be deleted by any production identity is not a backup, it is a liability with a retention policy.

The cloud-native pattern that holds: 3-2-1-1-0 (three copies, two media types, one offsite, one immutable, zero errors verified by restore), a pull-based backup architecture into a separate account / subscription / project with no admin trust to production, Compliance-mode Object Lock / Vault Lock with KMS keys in a different trust domain, and a quarterly restoration drill that proves you can actually do this under stress. Everything else is documentation.

Why this is a security control, not just ops
The 3-2-1 rule (and the 3-2-1-1-0 update)
RTO and RPO
Backup vs replication vs DR vs HA
Immutability - what cloud providers offer
Air-gapping in cloud
Encryption & KMS key custody
The cloud ransomware kill chain
Cloud-specific ransomware techniques
Backup architectures
AWS backup landscape
Azure backup landscape
GCP backup landscape
Database DR patterns
Restoration drills
The Encryption Key Custody pattern
Cyber insurance
Third-party backup/DR platforms
Tabletop exercises
Maturity stages
AWS / Azure / GCP comparison
Common pitfalls
Further reading
FAQ

Why this is a security control, not just ops

For most of computing's history, backup was an operations problem. The threats were tape failures, disk failures, building fires, accidental rm -rf. The control objective was availability - can we resume serving customers? Backup teams reported to infrastructure leaders, ran on quarterly tape-rotation cadences, and were measured on whether the nightly job ran clean.

Ransomware changed all of it. Once attackers learned that a victim with viable backups will not pay, the rational economic move for the attacker became to destroy or encrypt the backups before the production data. Modern ransomware operators now spend days to weeks in a target environment specifically to find, enumerate, and corrupt every backup destination they can reach. The attack on production is the last step, not the first.

That single shift moves backup squarely into the security org. The control objective is no longer "can we recover from a server fire?" - it is "can we recover from a thinking adversary who has spent two weeks inside our cloud account learning where our backups live, who has access to delete them, and how to break the chain of custody on the encryption keys?"

The practical implications:

Immutability is the new MFA. Just as multi-factor authentication moved from "nice to have" to "the floor of basic hygiene" over a decade, immutability - a write-once, read-many guarantee that even a privileged identity cannot bypass - moved from "premium feature" to "table stakes" in the ransomware era. A backup that can be deleted by anyone, including by your own root account or global admin, is not a defensive control. It is, at best, a recovery convenience.
The backup destination is a security boundary. It needs its own IAM model, its own network controls, its own logging, its own detection rules, its own incident-response runbooks. Treating it as just another bucket in production is how programs die.
Restoration is a tested capability, not a documented one. The auditor question "do you have backups?" has been replaced by "when did you last restore, at what scale, in how long, against what threat model?"
Cyber-insurance carriers now require it. The market that used to accept "we use AWS, we have backups" now demands documented immutability, separate-account isolation, MFA on the recovery role, and tested restore. (See cyber insurance below.)

If your org still has backup reporting into pure infrastructure with no security review of the destination architecture, you have a 2018 program defending a 2026 threat model. Fixing it is among the highest-leverage moves any cloud security team can make.

The 3-2-1 rule (and the 3-2-1-1-0 update)

The classic 3-2-1 rule, coined by photographer Peter Krogh in 2005 for protecting digital negatives, became the lingua franca of backup practitioners because it survived every technology change since:

3 copies of the data (the original + 2 backups)
2 different storage media
1 copy stored offsite

It held up well for two decades of tape and disk. Cloud and ransomware exposed two gaps: nothing in the rule prevented an attacker from deleting all three copies if they had the right credentials, and nothing required that the copies actually be readable. Veeam popularized the updated formulation, now widely adopted:

3-2-1-1-0:

3 - copies of the data
2 - different storage media (or in cloud, different storage classes / services)
1 - copy offsite (cross-region, cross-account, cross-cloud, or on-prem)
1 - copy immutable or air-gapped - cannot be modified or deleted within retention
0 - errors. Every restore is verified. If you don't restore, you don't have a backup.

The "1" for immutability and the "0" for verification are the security additions. Both are non-negotiable in a ransomware-aware architecture. The remaining three (copies / media / offsite) translate cleanly to cloud - though the meaning of "different media" deserves a footnote.

Different media in cloud, practically: the spirit of the original rule was to avoid a single technology failure wiping all copies. In cloud, the equivalent is to avoid a single service or account failure / compromise wiping all copies. Two S3 buckets in the same account are not "two media." S3 in the production account + S3 in a separate account fed by cross-region replication + an S3 Glacier vault under Vault Lock is closer in spirit - three copies behind three trust boundaries, with three distinct failure modes.
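The rule can be turned into a mechanical check over a backup inventory. The sketch below is one possible reading, not a standard implementation: the `BackupCopy` fields, the account and bucket names, and the choice to treat a distinct (account, service) pair as a distinct "medium" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a dataset; every value used below is illustrative."""
    name: str
    account: str                    # trust boundary: account / subscription / project
    service: str                    # stands in for "media", e.g. "s3" or "glacier-vault"
    region: str
    immutable: bool = False         # Compliance-mode lock or equivalent
    restore_verified: bool = False  # last restore test of this copy succeeded
    primary: bool = False           # True for the production original


def check_3_2_1_1_0(copies, prod_account, prod_region):
    """Return a pass/fail flag for each clause of 3-2-1-1-0."""
    backups = [c for c in copies if not c.primary]
    # Assumption: two buckets in the same account and service count as ONE medium.
    media = {(c.account, c.service) for c in copies}
    return {
        "3_copies": len(copies) >= 3,
        "2_media": len(media) >= 2,
        "1_offsite": any(
            c.account != prod_account or c.region != prod_region for c in backups
        ),
        "1_immutable": any(c.immutable for c in backups),
        "0_errors": bool(backups) and all(c.restore_verified for c in backups),
    }


inventory = [
    BackupCopy("prod-bucket", "prod", "s3", "us-east-1", primary=True),
    BackupCopy("same-account-copy", "prod", "s3", "us-east-1"),
    BackupCopy("backup-account-copy", "backup", "s3", "us-west-2",
               immutable=True, restore_verified=True),
]
report = check_3_2_1_1_0(inventory, prod_account="prod", prod_region="us-east-1")
print(report)
```

In this sample inventory every clause passes except the final "0": the same-account copy has never been restored, so it does not yet count as a verified backup.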

RTO and RPO

The two numbers every backup conversation eventually returns to. They are business decisions disguised as engineering parameters; the security team's job is to translate the business intent into a defensible target, then deliver against it.

RTO - Recovery Time Objective

How long after a disaster can the service be down before the impact becomes unacceptable? Measured in elapsed time from outage to restoration. A trading platform may demand RTO of seconds; a quarterly batch job can live with RTO of days. RTO drives the architecture of recovery - hot standby vs warm vs cold, replicated databases vs nightly restore.

RPO - Recovery Point Objective

How much data, measured in time, can you afford to lose? An RPO of 5 minutes means you need snapshots or replication at minute-level granularity. An RPO of 24 hours means a nightly backup is sufficient. RPO drives the frequency of backup - and is the budget line most often quietly relaxed when the math gets uncomfortable.

Setting RTO / RPO per data class

One number for the whole org is an anti-pattern; the cost of an RPO of seconds for marketing brochures is wasted. The pragmatic approach is to classify systems into tiers, define RTO / RPO per tier, and document which systems belong in each:

Tier	Typical RTO	Typical RPO	Pattern	Cost
Tier 0 - life safety / regulated	Seconds to minutes	Seconds (near-zero data loss)	Active-active multi-region, synchronous replication	2-3× single-region
Tier 1 - revenue critical	Minutes to 1 hour	Minutes	Active-passive multi-region, async replication + immutable backup	1.5-2×
Tier 2 - business critical	4-24 hours	1 hour	Cross-region snapshots, restore on demand	~1.2×
Tier 3 - important	1-3 days	24 hours	Daily backups, standard recovery	~1.05×
Tier 4 - bulk / archive	1 week+	7 days	Weekly backup to cold storage	~1.01×

The honest conversation with the business is about the cost slope toward the top of the table. An RTO of minutes for everything sounds great until the bill arrives; an RTO of days for everything is cheap until a tier-1 system has an actual incident. Tiering is the answer; the security team is responsible for forcing the conversation, not for the answer.
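Once tiers are agreed, checking a system against its targets is simple arithmetic. The following is a minimal sketch: the table gives ranges, so the exact upper bounds in `TIER_TARGETS` are assumptions, and worst-case data loss is approximated by the interval between backups.

```python
from datetime import timedelta

# Upper bounds per tier. The table gives ranges; the exact figures chosen here
# (for example 15 minutes for "Minutes") are assumptions for illustration.
TIER_TARGETS = {
    0: {"rto": timedelta(minutes=5), "rpo": timedelta(seconds=30)},
    1: {"rto": timedelta(hours=1), "rpo": timedelta(minutes=15)},
    2: {"rto": timedelta(hours=24), "rpo": timedelta(hours=1)},
    3: {"rto": timedelta(days=3), "rpo": timedelta(hours=24)},
    4: {"rto": timedelta(days=7), "rpo": timedelta(days=7)},
}


def target_violations(tier, backup_interval, measured_restore):
    """Compare backup cadence and the last drill result with the tier targets.

    Worst-case data loss is approximated by the interval between backups.
    """
    target = TIER_TARGETS[tier]
    violations = []
    if backup_interval > target["rpo"]:
        violations.append(
            f"RPO: backs up every {backup_interval}, target {target['rpo']}"
        )
    if measured_restore > target["rto"]:
        violations.append(
            f"RTO: last restore took {measured_restore}, target {target['rto']}"
        )
    return violations


# A tier-1 system on nightly backups misses its RPO even if restores are fast.
print(target_violations(1, timedelta(hours=24), timedelta(minutes=40)))
# A tier-2 system with hourly snapshots and a 6-hour restore is within target.
print(target_violations(2, timedelta(hours=1), timedelta(hours=6)))
```

Feeding `measured_restore` from the most recent restoration drill, rather than from a design estimate, keeps the comparison honest.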

Backup vs replication vs DR vs HA - four distinct disciplines

These four terms get conflated constantly. They are different things, each with a different threat model. The mature program runs all four; the immature program runs one of them and calls it the others.

Discipline	Protects against	Does NOT protect against	Typical mechanism
High availability (HA)	Single-instance / single-AZ failure	Region failure, data corruption, ransomware	Multi-AZ deployment, load balancer, auto-scaling
Replication	Single-region infrastructure failure	Corruption, deletion, ransomware - all faithfully copied	Cross-region async, storage-tier replication
Disaster recovery (DR)	Loss of a region / site / data center	Logical corruption that replicated before failover	Warm or hot standby in second region, ASR-style orchestration
Backup	Corruption, deletion, ransomware, time-travel needs	Real-time failures (RTO measured in hours, not seconds)	Point-in-time snapshots, immutable retention

The conflations that cause real outages:

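"We use S3 cross-region replication, so we have a backup." No - CRR faithfully replicates overwrites and, when delete-marker replication is enabled, deletions. If ransomware overwrites the primary with encrypted objects, CRR copies those encrypted objects to the secondary. Replication is not backup.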
"We use multi-AZ RDS, so we have DR." No - Multi-AZ is HA. A regional outage takes both AZs with it. Cross-region read replicas or snapshots are DR.
"We have automated RDS backups, so we have ransomware protection." Maybe - depends on whether those backups are in a separate account, whether retention is locked, and whether the prod IAM role can delete them. Without those, an attacker with sufficient RDS permissions can delete the snapshots before the encryption attack.
"We have a DR site, so we have backup." No - a DR site protects against site loss but inherits any logical corruption that propagated before failover. The hot standby of a ransomware-encrypted database is a ransomware-encrypted database.
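A toy model makes the difference between replication and backup concrete. It is deliberately simplified and hypothetical (`ToyStore` and its methods are invented for this sketch): the replica mirrors every write, while a snapshot is a frozen point-in-time copy that is assumed to be out of the attacker's reach.

```python
class ToyStore:
    """Toy model: a primary, a mirrored replica, and point-in-time snapshots."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.snapshots = []  # each entry is a frozen copy of the primary

    def write(self, key, value):
        self.primary[key] = value
        self.replica[key] = value  # replication mirrors every write, good or bad

    def snapshot(self):
        self.snapshots.append(dict(self.primary))

    def restore(self, index):
        self.primary = dict(self.snapshots[index])


store = ToyStore()
store.write("orders.db", "good data")
store.snapshot()
store.write("orders.db", "ENCRYPTED BY ATTACKER")

print(store.replica["orders.db"])       # the replica mirrored the attack
print(store.snapshots[0]["orders.db"])  # the snapshot still holds the clean copy
store.restore(0)
print(store.primary["orders.db"])       # recovered from the snapshot
```

The replica is only as healthy as the last write it received; the snapshot is what makes going back in time possible.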

Immutability - what cloud providers actually offer

Immutability is the property that, once written, an object cannot be modified or deleted until a defined retention period expires - even by the account root, even by a global admin. It is the single most important technical control against cloud-native ransomware. The major clouds each implement it differently; the details matter.

AWS - S3 Object Lock

S3 Object Lock enforces write-once-read-many (WORM) at the object level. Configured per object or via a default bucket-level policy; requires versioning enabled on the bucket. Two modes:

Governance mode - retention can be shortened or removed by IAM principals with the s3:BypassGovernanceRetention permission. Useful for accidental-deletion protection; bypassable by sufficiently privileged attackers. Do not rely on Governance mode for ransomware defense.
Compliance mode - retention cannot be shortened or removed by anyone, including the AWS account root, until the retention period expires. Object cannot be deleted. This is the only setting that satisfies the "even root can't delete" property required for serious ransomware resilience.
Legal hold - independent of retention; pins an object until the hold is removed by a principal with s3:PutObjectLegalHold. Used for litigation, investigations, or extending retention.

Pair Object Lock with Glacier Vault Lock for archive tiers - Vault Lock policies are write-once and cannot be changed after the lock is engaged (even by root). The combination is the AWS gold standard for immutable archive.
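As a hedged illustration of the Compliance-mode setting, the sketch below builds the default-retention configuration and passes it to boto3's `put_object_lock_configuration`. The helper names and the 30-day figure are assumptions; boto3 is a third-party package and is imported lazily so the configuration builder can be exercised without it. The bucket is assumed to already have versioning and Object Lock enabled.

```python
def default_retention_config(days, mode="COMPLIANCE"):
    """Build the ObjectLockConfiguration structure for a bucket-level default."""
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError(f"unknown Object Lock mode: {mode}")
    if days < 1:
        raise ValueError("retention must be at least one day")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }


def apply_default_retention(bucket, days):
    """Apply the default retention to a bucket (hypothetical helper).

    Compliance-mode retention cannot be shortened afterwards, so try this
    against a throwaway bucket first.
    """
    import boto3  # third-party; imported here so the builder works without it

    s3 = boto3.client("s3")
    s3.put_object_lock_configuration(
        Bucket=bucket,
        ObjectLockConfiguration=default_retention_config(days),
    )


print(default_retention_config(30))
```

Keeping the configuration in a pure function makes it easy to review and unit test separately from the call that makes it irreversible.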

Azure - Blob Immutable Storage

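Azure offers immutability at the blob storage layer through two policy types, time-based retention and legal hold, which can be scoped to a container or to individual blob versions: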

Time-based retention - set a retention interval; while active, blobs cannot be modified or deleted. The policy can be locked (the interval cannot be shortened or the policy removed, even by the subscription owner; it can only be extended, up to five times) or unlocked (can still be changed or deleted, intended for testing). Lock the policy in production.
Legal hold - pins blobs until the hold is explicitly removed by a principal with the appropriate role.
Version-level immutability - apply immutability per blob version rather than per container, enabling granular policies.

Azure Backup Recovery Services Vaults also support soft delete (default 14 days, can be extended to 180 days and made always-on) and multi-user authorization (MUA) - a Resource Guard pattern that requires approval from the owner of a separate Resource Guard, ideally held in a different subscription or tenant, for destructive operations on the vault. MUA is the Azure-native answer to "we need two-person rule on backup deletion."

GCP - Cloud Storage Bucket Lock

GCS Bucket Lock applies a retention policy to a bucket; objects within cannot be deleted or overwritten until they exceed the retention period. The retention policy itself can be locked - once locked, neither the project owner nor org admin can shorten or remove it. The lock is permanent for the lifetime of the bucket; the only way out is bucket deletion, which requires the bucket to be empty (impossible while objects are within retention).

Pair with object versioning for protection against malicious overwrites, and use VPC Service Controls to draw a perimeter around the backup project preventing data exfiltration even by privileged identities.

The compliance-mode guarantee

The crucial property all three providers offer in their strongest modes: once configured, the cloud platform itself enforces the retention, and not even the account root / subscription owner / project owner can override it. This is fundamentally different from "we set up IAM so admins can't delete" - which any sufficiently privileged attacker who escalates to admin can undo. Compliance-mode immutability is enforced by the provider's control plane, outside the customer's IAM domain.

The practical implication: configure immutability with retention longer than your worst-case dwell-time-plus-recovery window. A common floor is 30 days; many regulated environments run 90+. Ransomware operators can dwell for weeks, sometimes longer; a 7-day immutability window is theater.
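The sizing rule reduces to a one-line calculation. This is a rough sketch: the 30-day floor comes from the paragraph above, while the optional `margin_days` parameter and the sample figures are assumptions.

```python
def minimum_retention_days(dwell_days, recovery_days, margin_days=0, floor_days=30):
    """Retention must outlast worst-case dwell time plus the recovery window."""
    return max(floor_days, dwell_days + recovery_days + margin_days)


print(minimum_retention_days(dwell_days=14, recovery_days=7))   # the floor applies
print(minimum_retention_days(dwell_days=60, recovery_days=14))  # dwell dominates
```

The inputs are the hard part: dwell time is an estimate from threat intelligence and your own detection coverage, and the recovery window should come from a measured drill.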

Air-gapping in cloud - the virtual air gap

Classical air-gapping means a physical separation - backup tape stored in a vault no network can reach. Cloud has no physical air gap to offer, but it can offer a virtual one: a destination so isolated by IAM, network, and KMS boundaries that the path from production to backup is effectively one-way.

The patterns that approximate an air gap

Cross-account destination. Backups land in a separate AWS account / Azure subscription / GCP project. The backup account has no IAM trust relationship that allows production identities to write or delete; the backup service in the backup account pulls from production using a read-only role assumed in the production account. Even full compromise of the production account does not grant write or delete on the backup destination.
Cross-tenant destination. One step further - backups land in a destination owned by a different cloud tenant (a different AWS Organization, Azure tenant, GCP organization). The trust between tenants is established at backup configuration time and can be torn down. Required for the most sensitive workloads and increasingly demanded by cyber-insurance underwriters.
Cross-cloud destination. Production runs in AWS, immutable backups land in Azure (or vice versa). Full compromise of one cloud provider doesn't grant access to the other. Cost and complexity are real, but the threat model - including the rare-but-real "provider account-level compromise" - is meaningfully reduced.
Offline / out-of-cloud destination. Periodic copies to on-prem or tape. The original air gap, sometimes still warranted for the most critical data. Cost is the tradeoff; for tier-0 systems it's often paid.

The common thread: production cannot reach the backup destination through a normal data path. The backup service, identified separately, comes to production with read-only credentials; production never has credentials with write or delete to the destination.
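In AWS terms, the one-way relationship can be expressed as a role that lives in production, trusts only the backup account's service role, and carries read-only permissions. The sketch below builds both policy documents; the account ID, role name, and bucket name are placeholders, and a real deployment would cover more services than S3.

```python
import json


def pull_role_trust_policy(backup_role_arn):
    """Trust policy for a role in PRODUCTION that only the backup role may assume."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": backup_role_arn},
            "Action": "sts:AssumeRole",
        }],
    }


def pull_role_permissions(source_bucket):
    """Read-only permissions on the production bucket: no Put*, no Delete*."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:ListBucket",
                "s3:ListBucketVersions",
            ],
            "Resource": [
                f"arn:aws:s3:::{source_bucket}",
                f"arn:aws:s3:::{source_bucket}/*",
            ],
        }],
    }


# Placeholder identifiers, not real accounts or roles.
trust = pull_role_trust_policy("arn:aws:iam::222222222222:role/backup-puller")
perms = pull_role_permissions("prod-data")
print(json.dumps(trust, indent=2))
print(json.dumps(perms, indent=2))
```

Nothing in production is granted access to the destination; the only grant that exists runs from the backup account toward the source, and it is read-only.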

Encryption of backups & KMS key custody

Every cloud backup is encrypted. That is necessary but not sufficient. The threat model that matters is not "the storage service was breached" - it's "the attacker has credentials in your environment and is looking for ways to render your backups unrecoverable." Key compromise is one such way.

The trust-domain principle

The KMS keys that encrypt your backups must be in a separate trust domain from the keys that encrypt production. If the same key encrypts production data and the backup of production data, an attacker who compromises that key wins twice - they can read or destroy both copies. Worse, if the attacker can schedule deletion of the key (AWS KMS allows scheduled deletion with a waiting period as short as 7 days; Azure Key Vault keys can be purged unless purge protection is enabled), they can render every backup encrypted by that key permanently unrecoverable in a week without ever touching the backup data itself.

The pattern

Separate accounts / subscriptions / projects for backup keys. The KMS keys used for backup encryption live in the backup account, not production. Key administration and key usage are separate roles, and neither is held by a production identity.
HSM-backed keys for the most sensitive workloads. AWS CloudHSM, Azure Managed HSM, GCP Cloud HSM. The key material is FIPS 140-2 / 140-3 Level 3 protected and cannot be exfiltrated even by the cloud provider's personnel.
Multi-region key replication for DR. If a region disappears, you still need the key to decrypt the backups elsewhere. AWS multi-region KMS keys, Azure Key Vault georeplication, GCP regional vs multi-regional keys all address this.
Purge / scheduled-deletion protection. Enable purge protection (Azure) or the maximum scheduled-deletion window (AWS, 30 days); alert immediately on any attempt to schedule key deletion. A scheduled key deletion is a high-fidelity ransomware indicator.
Break-glass key access. The identity that performs daily backup operations does not have permissions to disable or delete keys. A separate, normally-disabled break-glass role exists for emergency operations and requires a documented approval flow to activate.

This is the discipline most cloud organizations get partially right and then live with the gap until it bites. Closing the gap is rarely expensive - it is mostly a matter of moving keys into the right accounts and editing IAM. The leverage is enormous.
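The role separation above can be sketched as an AWS KMS key policy. This is an illustrative shape rather than a vetted policy: the account ID and role names are hypothetical, and the action lists are trimmed to the minimum needed to show administration, usage, and break-glass as three distinct principals.

```python
def backup_key_policy(account_id, admin_role, usage_role, break_glass_role):
    """Key policy with separate admin, usage, and break-glass principals."""

    def role_arn(name):
        return f"arn:aws:iam::{account_id}:role/{name}"

    destructive = ["kms:ScheduleKeyDeletion", "kms:DisableKey"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Admins manage the key but cannot use or destroy it.
                # kms:PutKeyPolicy can rewrite this policy, so alert on it.
                "Sid": "KeyAdministration",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn(admin_role)},
                "Action": ["kms:DescribeKey", "kms:GetKeyPolicy",
                           "kms:PutKeyPolicy", "kms:EnableKeyRotation",
                           "kms:TagResource"],
                "Resource": "*",
            },
            {   # The daily backup identity can encrypt and decrypt, nothing else.
                "Sid": "KeyUsage",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn(usage_role)},
                "Action": ["kms:Encrypt", "kms:Decrypt",
                           "kms:GenerateDataKey*", "kms:DescribeKey"],
                "Resource": "*",
            },
            {   # Destructive operations are reserved for the break-glass role.
                "Sid": "BreakGlassOnly",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn(break_glass_role)},
                "Action": destructive + ["kms:CancelKeyDeletion"],
                "Resource": "*",
            },
            {   # Explicit deny for everyone else, whatever their IAM policies say.
                "Sid": "DenyDestructiveUnlessBreakGlass",
                "Effect": "Deny",
                "Principal": {"AWS": "*"},
                "Action": destructive,
                "Resource": "*",
                "Condition": {"ArnNotEquals": {
                    "aws:PrincipalArn": role_arn(break_glass_role)}},
            },
        ],
    }


policy = backup_key_policy("222222222222", "backup-key-admin",
                           "backup-service", "backup-break-glass")
print([s["Sid"] for s in policy["Statement"]])
```

The explicit deny is the part that matters under attack: an identity that later acquires broad IAM permissions still cannot schedule deletion of the key unless it is the break-glass role.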

The cloud ransomware kill chain

Modern cloud ransomware doesn't look like the on-prem ransomware of 2017. The kill chain has adapted to the cloud's architecture - most importantly, to the fact that the cloud's control plane and data plane are both API-accessible to anyone with credentials.

The seven steps

Initial access. Compromised credentials (most often), exposed access keys in a public git repo, a vulnerable web app providing a path to an instance profile, a phished cloud-console session, a third-party SaaS connector with broader scopes than needed. The path varies; the outcome is the same - the attacker has some cloud identity.
Discovery & enumeration. aws sts get-caller-identity equivalents in every cloud, then enumeration of IAM, accounts, organizations, S3 buckets, KMS keys, EBS snapshots, RDS instances. Tools like Pacu, ScoutSuite, ROADtools automate it. The attacker is now building a map of your environment - and crucially, finding your backups.
Persistence & privilege escalation. Create additional IAM users / access keys / service principals as backdoors. Modify trust policies. Add federation. The goal is to outlive the inevitable detection of the initial compromise.
Exfiltration. Modern ransomware is double-extortion - encrypt for ransom and leak for additional pressure. Data is copied out to attacker-controlled storage. In cloud, this often goes through legitimate-looking egress points to evade DLP.
Encrypt the backups FIRST. This is the step that makes cloud-native ransomware brutal. The attacker - using the map built in step 2 and the credentials accumulated along the way - destroys, encrypts, or sabotages every backup destination they can reach. Snapshots deleted. KMS keys scheduled for deletion. S3 versions deleted (or versioning disabled and objects overwritten). Backup retention shortened. Cross-region replication destinations encrypted with attacker keys. Recovery Services Vaults soft-deleted, then hard-deleted after the soft-delete window. Whatever the attacker can reach, they break.
Encrypt production. Now the production data is encrypted, dropped, or held. EBS volumes re-encrypted with attacker keys. RDS instances dropped. S3 objects overwritten with encrypted versions. The blast radius depends on the privilege the attacker has accumulated.
Ransom note. Often delivered via the compromised cloud console, sometimes via email or Slack. Demand is sized to the perceived recovery cost; modern operators research the target's revenue and insurance posture and price accordingly.

The order of step 5 before step 6 is the most important technical fact in this entire page. If your backup architecture cannot survive step 5 - if the attacker can reach your backups with the credentials they accumulate during dwell time - then no amount of preparation for step 6 helps. The whole defensive strategy is to make step 5 fail.

Cloud-specific ransomware techniques

Specific TTPs (tactics, techniques, procedures) seen across reported incidents and red-team engagements in cloud environments:

KMS key deletion + recreation. Schedule deletion of the customer-managed keys used by S3, EBS, RDS. After the deletion window, every object encrypted with the key is permanently unrecoverable. The attacker has nothing to decrypt; the data is destroyed by the cloud platform itself. (Counter: alert on any kms:ScheduleKeyDeletion; enforce the maximum 30-day window so the security team has time to react.)
S3 versioning rollback. If versioning is enabled but Object Lock is not, an attacker with appropriate S3 permissions can delete all non-current versions, leaving only the encrypted overwrite. The "we have versioning" defense fails silently. (Counter: Object Lock in Compliance mode + MFA delete + cross-account replication.)
Snapshot deletion. EBS, RDS, ElastiCache, and other snapshots can typically be deleted by anyone with the corresponding delete permission (ec2:DeleteSnapshot, rds:DeleteDBSnapshot, elasticache:DeleteSnapshot). Most production IAM roles have far broader permissions than needed; the snapshot delete is often available. (Counter: cross-account snapshot copies into the backup account, AWS Backup vaults with Vault Lock, snapshot share permissions reviewed.)
IAM-based "ransom" (account lockout). The attacker locks the legitimate owners out of their own account - rotate root credentials, replace MFA devices, modify IAM policies to deny the legitimate admins, delete federation, modify the SAML trust. Now the attacker controls the account; legitimate owners file a support ticket and wait. The data may not even be encrypted; the access itself is the ransom. (Counter: out-of-band root credential storage, MFA on root with multiple devices in physical safe, organization-level controls preventing single-account compromise from cascading, GuardDuty / Defender for Cloud / SCC alerts on root activity.)
Storage tier downgrades. An attacker with sufficient S3 / Blob / GCS permissions can change the storage class of large datasets to a tier optimized for cold storage (Glacier Deep Archive, Azure Archive). Retrieval costs are then significant - sometimes more than the ransom. This is a quieter, harder-to-detect variant; the data is still there but expensive to bring back on a timeline. (Counter: monitor lifecycle policy changes, alert on bulk storage class transitions, ensure SCPs / Azure Policy block ad-hoc tier changes outside policy.)
Cross-region replication destination redirect. The attacker modifies the replication configuration to point to an attacker-controlled bucket, then encrypts the source. The legitimate cross-region backup is now in attacker storage. (Counter: SCP-restricted replication destination accounts, alerts on replication configuration changes.)
Backup configuration drift. The attacker disables backup jobs, shortens retention, or modifies schedules to silently degrade the backup posture during dwell time. The "we have backups" claim is now true on paper but increasingly hollow as the attack window grows. (Counter: configuration baselines monitored by CSPM/CNAPP, drift alerts on backup-related resources, separation of duties on backup configuration changes.)
Identity-provider compromise. If the IdP (Okta, Entra ID, Google Workspace) is compromised, the attacker can grant themselves any role in any connected cloud account. The blast radius is every cloud the IdP federates to. (Counter: hardware-key MFA on IdP admins, conditional access policies, separate break-glass accounts not federated, IdP-specific logging and detection.)

Map your detection coverage against these techniques explicitly. The detection engineering page covers the rule-writing side; the Cloud SOC page covers the response side.
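One way to begin that mapping is a watchlist keyed on CloudTrail's `eventSource` and `eventName` fields. The sketch below is a starting point under stated assumptions: the watchlist is partial, the technique labels are informal, the sample events are fabricated for the example, and the event names should be confirmed against your own trail before anything alerts on them.

```python
# (eventSource, eventName) -> technique from the list above. Partial and illustrative.
WATCHLIST = {
    ("kms.amazonaws.com", "ScheduleKeyDeletion"): "KMS key deletion",
    ("kms.amazonaws.com", "DisableKey"): "KMS key deletion",
    ("ec2.amazonaws.com", "DeleteSnapshot"): "Snapshot deletion",
    ("rds.amazonaws.com", "DeleteDBSnapshot"): "Snapshot deletion",
    ("backup.amazonaws.com", "DeleteRecoveryPoint"): "Snapshot deletion",
    ("s3.amazonaws.com", "PutBucketVersioning"): "S3 versioning rollback",
    ("s3.amazonaws.com", "PutBucketReplication"): "Replication destination redirect",
    ("backup.amazonaws.com", "UpdateBackupPlan"): "Backup configuration drift",
    ("backup.amazonaws.com", "DeleteBackupPlan"): "Backup configuration drift",
}


def classify(event):
    """Return the matching technique for a CloudTrail-style event, or None."""
    return WATCHLIST.get((event.get("eventSource"), event.get("eventName")))


def triage(events):
    """Return (principal ARN, technique) for every watchlisted event."""
    findings = []
    for event in events:
        technique = classify(event)
        if technique is not None:
            arn = event.get("userIdentity", {}).get("arn", "unknown")
            findings.append((arn, technique))
    return findings


# Fabricated sample events, shaped like CloudTrail records.
sample = [
    {"eventSource": "kms.amazonaws.com", "eventName": "ScheduleKeyDeletion",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:role/app"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "DeleteSnapshot"},
]
print(triage(sample))
```

A production rule would also weigh who made the call and from where; a snapshot deletion by the backup service's own lifecycle role is routine, while the same call from an application role is not.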

Backup architectures

Two architectural choices dominate the design. Get these right and most other decisions become local optimizations; get them wrong and no amount of tool selection will save the program.

Pull-from-prod, not push-to-backup

The single most important pattern: the backup destination is the active party in the relationship, not the production environment. The backup service, running in a separate account, assumes a read-only role into production, reads the data, writes to its own immutable storage. Production identities have no credentials with write or delete on the backup destination. Compromise of production does not yield access to backups.

Why this is the right shape:

The principle of least privilege flows in the correct direction - the destination grants read on the source, never the source grants write on the destination.
An attacker with full production credentials still cannot reach the backup destination.
The backup service's role can be tightly scoped to the read APIs needed, with no permission to modify or delete production resources - eliminating the risk of a compromised backup service damaging production.
Auditing the relationship is straightforward - the backup account's IAM grants are the entire trust surface.

Anti-pattern: production IAM roles with write permissions on a "backup bucket" in the same account. Common, easy to set up, and useless against ransomware.
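The anti-pattern is easy to search for. The following sketch scans an IAM policy document for Allow statements that grant destructive S3 actions on a backup bucket. It is a simplified linter under stated assumptions: it ignores NotAction, NotResource, and Condition blocks, the `DESTRUCTIVE_ACTIONS` list is partial, and the bucket name and sample policy are made up.

```python
from fnmatch import fnmatchcase

# Partial list of S3 actions that can destroy or degrade backups.
DESTRUCTIVE_ACTIONS = [
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:DeleteObjectVersion",
    "s3:PutBucketVersioning",
    "s3:PutLifecycleConfiguration",
    "s3:DeleteBucket",
]


def _as_list(value):
    return [value] if isinstance(value, (str, dict)) else list(value)


def risky_statements(policy, backup_bucket):
    """Find Allow statements granting destructive actions on the backup bucket."""
    targets = [
        f"arn:aws:s3:::{backup_bucket}",
        f"arn:aws:s3:::{backup_bucket}/example-key",  # stands in for any object
    ]
    findings = []
    for stmt in _as_list(policy["Statement"]):
        if stmt.get("Effect") != "Allow" or "Action" not in stmt:
            continue
        # IAM action names are case-insensitive; "*" and "?" act as wildcards.
        patterns = [a.lower() for a in _as_list(stmt["Action"])]
        resources = _as_list(stmt.get("Resource", []))
        hits = sorted(
            d for d in DESTRUCTIVE_ACTIONS
            if any(fnmatchcase(d.lower(), p) for p in patterns)
        )
        reachable = any(fnmatchcase(t, r) for t in targets for r in resources)
        if hits and reachable:
            findings.append({"sid": stmt.get("Sid", "(no Sid)"), "actions": hits})
    return findings


# A typical over-broad production policy (made up for the example).
prod_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "AppData", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
}
print(risky_statements(prod_policy, "org-backups"))
```

Any finding against a production role means the backup destination is reachable from production, which is exactly the path the pull-based design is meant to remove.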

Separate backup account / subscription / project

The destination lives in its own organizational unit. No admin trust from production. Separate IAM admins, separate KMS admins, separate logging, separate monitoring, separate billing if practical. Treat it as a tenant of one.

AWS: a dedicated AWS account in a "Security" or "Backup" OU. SCP-restricted to block deletion of backup vaults, KMS keys, snapshots. AWS Backup vault policies enforce cross-account access with explicit deny on bulk operations.
Azure: a dedicated subscription, optionally a dedicated tenant for the strongest separation. MUA (multi-user authorization) Resource Guard requires approvals from a different tenant.
GCP: a dedicated project, optionally a separate folder or organization. VPC Service Controls perimeter around the project. Org-policy constraints preventing data movement out.

Sketch of the canonical architecture:

Production Account (or Sub, or Project)
   │
   ├─ Workloads, data, primary KMS keys (production trust domain)
   │
   ▼  (one-way pull, read-only role)
Backup Account (separate trust domain)
   ├─ Backup service (AWS Backup / Azure Backup / GCP Backup & DR)
   ├─ Immutable destination buckets/vaults (Object Lock / Vault Lock)
   ├─ Backup KMS keys (separate from prod keys, HSM-backed for tier-0)
   ├─ Separate IAM admins, separate KMS admins
   ├─ MFA-enforced break-glass restore role (normally disabled)
   └─ Alerting: every privileged action logged + alerted
        │
        ▼  (optional further isolation)
Out-of-cloud / cross-cloud / on-prem destination
        for tier-0 data

Layer DR on top of this - the backup architecture is independent of (and often orthogonal to) the DR architecture. A multi-region active-passive DR setup with replication serves the "regional failure" RTO; the backup architecture serves the "logical corruption" RTO. Both are needed.

AWS backup landscape

AWS has consolidated most of its backup story under AWS Backup, with several adjacent service-specific capabilities still very much in use.

The native services

AWS Backup - the centralized, policy-driven backup service. Backup plans assign schedules and retention to resources. Supports EBS, EC2 AMI, RDS, Aurora, DynamoDB, EFS, FSx, Storage Gateway, S3, VMware on AWS, and more. Vault Lock enforces compliance-mode immutability on backup vaults; once locked in Compliance mode, retention cannot be shortened or removed, even by root. Cross-account and cross-region copy supported as first-class operations.
S3 with versioning, Object Lock, replication. The pattern for object data - versioning + Compliance-mode Object Lock + cross-account replication to a hardened destination. MFA delete on the source for additional guardrail.
EBS snapshots. Per-volume, can be copied cross-account and cross-region. AWS Backup is the recommended orchestration layer rather than ad-hoc snapshot scripts. Fast Snapshot Restore for fast RTO on critical volumes.
RDS automated backups + manual snapshots. Automated backups are retained for up to 35 days; manual snapshots persist indefinitely until deleted. Cross-region copy + cross-account share for isolation. RDS Multi-AZ is HA, not backup.
Glacier Vault Lock - the archive tier with the strongest immutability primitive. Vault Lock policies are write-once, cannot be modified once locked. The right destination for long-term immutable archive of tier-1+ data.
DynamoDB point-in-time recovery (PITR) - continuous backup of DynamoDB tables for 35 days. Restore to any second in that window. Combine with on-demand backups for longer retention.
AWS Elastic Disaster Recovery (DRS) - cross-region / cross-AZ replication-based DR for servers. Continuous block-level replication; failover into AWS within minutes. Complements backup, doesn't replace it.
Backup with logically air-gapped vaults - newer AWS Backup capability that stores backup data in AWS-managed accounts the customer never touches, accessible only via the AWS Backup API. Higher cost, stronger isolation.

Reference architecture

A solid AWS pattern for a regulated org:

AWS Organizations with a dedicated Backup Account in a Security OU.
SCPs prevent deletion of AWS Backup vaults, KMS keys with backup-related tags, Glacier vaults, and Vault Lock policies - across the entire organization.
AWS Backup plans defined in the management account, applied via Organization Backup Policies to member accounts.
Backup vaults in the Backup Account, locked in Compliance mode, with retention sized to the highest data-class RPO + dwell-time buffer.
S3 buckets with versioning + Compliance Object Lock + replication into the Backup Account.
KMS keys for backups in the Backup Account, separate admins, multi-region keys for DR, alerts on kms:ScheduleKeyDeletion.
Quarterly Game-Day restore drill from the Backup Account into a clean DR account.
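The SCP item in the list above might look roughly like the sketch below. Treat it as a starting shape under stated assumptions, not a tested policy: the break-glass role name and the `purpose=backup` tag are hypothetical, the action list is partial, and any SCP should be trialled on a sandbox OU before it is attached organization-wide.

```python
import json


def backup_protection_scp(break_glass_role="backup-break-glass",
                          key_tag=("purpose", "backup")):
    """Build an SCP that denies destructive backup operations org-wide."""
    # Every statement exempts one named break-glass role in any account.
    exempt = {"ArnNotLike": {
        "aws:PrincipalArn": f"arn:aws:iam::*:role/{break_glass_role}"}}
    tag_key, tag_value = key_tag
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyBackupVaultDestruction",
                "Effect": "Deny",
                "Action": [
                    "backup:DeleteBackupVault",
                    "backup:DeleteRecoveryPoint",
                    "backup:DeleteBackupVaultLockConfiguration",
                    "glacier:DeleteVault",
                ],
                "Resource": "*",
                "Condition": exempt,
            },
            {
                # Only keys carrying the (hypothetical) backup tag are covered.
                "Sid": "DenyTaggedKeyDeletion",
                "Effect": "Deny",
                "Action": ["kms:ScheduleKeyDeletion", "kms:DisableKey"],
                "Resource": "*",
                "Condition": {
                    **exempt,
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value},
                },
            },
        ],
    }


scp = backup_protection_scp()
document = json.dumps(scp, separators=(",", ":"))
print(len(document), "characters")  # SCPs have a 5,120-character size limit
print(json.dumps(scp, indent=2))
```

Because SCPs apply to every principal in the member accounts, including account administrators, they give the backup account a guardrail that a compromised admin role in that account cannot lift on its own.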

Azure backup landscape

Azure splits backup and DR more explicitly than AWS does - Azure Backup for backup, Azure Site Recovery (ASR) for DR, and several service-specific protections sitting around them.

The native services

Azure Backup - Recovery Services Vault for VMs, file shares, on-prem servers via MARS agent. Soft delete (default 14 days, extensible) and immutable vaults (a Recovery Services Vault can be marked immutable, preventing destructive operations during retention) are the immutability primitives. Multi-user authorization (MUA) via Resource Guard requires approval from a principal in a different tenant for high-impact operations on the vault - the Azure native answer to two-person rule.
Azure Backup Vault (distinct from Recovery Services Vault) - newer vault model used by Azure Database for PostgreSQL, Blob, Disk, Kubernetes backup, and other workload-specific scenarios.
Azure Site Recovery (ASR) - replication-based DR for Azure VMs, VMware, and Hyper-V. Continuous replication to a target region with documented RTO measured in minutes. Failback supported. ASR is the DR pillar, not the backup pillar.
Blob versioning + immutable storage policies. Per-container or per-version. Time-based retention locked at the policy level. Pair with soft delete for blobs and containers.
Managed disk snapshots. Cross-subscription / cross-region copy supported. Increasingly orchestrated through Azure Backup rather than ad-hoc.
Azure SQL - auto-backups + long-term retention (LTR). Built-in automated backups retained 1-35 days (7 by default); LTR extends to 10 years with weekly / monthly / yearly granularity. Backups are encrypted with TDE keys; configure customer-managed keys for separation.
Azure Files snapshots. Up to 200 snapshots per share; soft delete additionally protects against accidental share deletion.

Reference architecture

Azure pattern for a regulated org:

Dedicated management group + subscription for backup. Optionally a separate tenant for cross-tenant MUA.
Recovery Services Vaults marked immutable, with soft delete enforced and MUA enabled via a Resource Guard in the backup tenant.
Azure Policy initiatives enforcing backup configuration across the org - alerting on drift.
Customer-managed keys (CMK) in a Key Vault with purge protection enabled, key admins in the backup subscription only.
ASR for tier-0/1 DR, complementary to Azure Backup for backup.
Blob immutable storage policies on every backup destination container, retention sized to dwell + RPO buffer.

GCP backup landscape

GCP's backup story consolidated around the Backup and DR Service (originating from Google's acquisition of Actifio, announced in December 2020), with the storage-native primitives sitting underneath.

The native services

Backup and DR Service - GCP's enterprise backup orchestration. Application-consistent backups for Compute Engine, VMware Engine, databases (Oracle, SQL Server, SAP HANA, PostgreSQL), and file systems. Centralized policy, retention, restore. The Actifio engine under Google branding.
GCS Bucket Lock - retention policies that, once locked, cannot be removed or shortened. Pair with object versioning and turbo replication across regions.
Persistent Disk snapshots + scheduled snapshot policies. Per-disk or per-resource-policy. Cross-region by default for multi-regional locations. Snapshot retention policies enforce minimum and maximum retention.
Cloud SQL automated backups + PITR. Automated backups for 7-365 days; point-in-time recovery via write-ahead-log retention. Cross-region replicas for DR.
Spanner / Bigtable / Firestore - managed databases with their own backup and restore primitives. Spanner offers multi-region instances with synchronous replication; database backups can be retained up to 1 year.
BigQuery snapshots and time-travel. 7-day time-travel by default; table snapshots persist beyond that. Cross-region copy supported for DR.
VPC Service Controls - perimeter around the backup project preventing data exfiltration to outside-perimeter destinations, even by identities with broad IAM.

Reference architecture

GCP pattern for a regulated org:

Dedicated backup folder + project in the GCP organization. Optionally a separate org for the strongest separation.
Org policy constraints prevent disabling of locked retention, prevent public bucket creation, prevent service-account key creation broadly.
GCS buckets in the backup project with locked retention policies, object versioning, and CMEK from a backup-project-only KMS keyring.
VPC Service Controls perimeter encloses the backup project, blocking cross-perimeter data movement.
Backup and DR Service appliances in the backup project, IAM grants from production projects scoped to the read APIs needed.
Multi-regional or dual-regional GCS for the strongest durability + availability story for backup data.

Database DR patterns

Databases have their own RTO / RPO economics because the data volume is usually large, the change rate is high, and consistency requirements are strict. The native patterns each cloud provides:

Read replicas across regions

Async replication of a primary to one or more read-only secondaries, optionally in other regions. Failover promotes a replica to primary. RPO is the replication lag (seconds to minutes); RTO depends on whether the promotion is automatic or manual.

RDS / Aurora - cross-region read replicas; Aurora Global Database for sub-second lag and one-minute RTO across regions.
Azure SQL - geo-replicas, Auto-failover groups for automated failover with optional read-only listener endpoints.
Cloud SQL - cross-region read replicas; Spanner for natively multi-regional strongly-consistent databases.

Point-in-time recovery (PITR)

Continuous capture of database changes (transaction logs / change feed) enabling restore to any second within the retention window. The right primitive for "we made a bad schema change three hours ago" or "an attacker dropped a table at 3am."

RDS / Aurora PITR - restore to any second within the automated backup retention window (up to 35 days); the latest restorable time typically trails the present by about five minutes.
Azure SQL PITR - up to 35 days; LTR for longer retention.
Cloud SQL PITR - restore to any point within the retention window.
DynamoDB / Spanner / Firestore - service-specific PITR, with per-second granularity (DynamoDB) or based on version history (Spanner).
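
For the "attacker dropped a table at 3am" case, the first decision is which timestamp to restore to. A minimal sketch, assuming the incident time is known and that a one-minute safety margin is acceptable; the function name and the margin are illustrative, not any provider's API.

```python
from datetime import datetime, timedelta, timezone


def choose_restore_time(incident_at, now, retention_days, margin=timedelta(minutes=1)):
    """Pick a restore point just before the incident, inside the PITR window.

    Raises ValueError when the incident predates the retention window, which is
    the case where PITR cannot help and a longer-retention backup is needed.
    """
    window_start = now - timedelta(days=retention_days)
    target = incident_at - margin  # margin is an assumption, tune per system
    if target < window_start:
        raise ValueError("incident is older than the PITR retention window")
    if target > now:
        raise ValueError("incident time is in the future")
    return target


now = datetime(2026, 5, 17, 9, 0, tzinfo=timezone.utc)
dropped_at = datetime(2026, 5, 17, 3, 0, tzinfo=timezone.utc)
print(choose_restore_time(dropped_at, now, retention_days=35))
```

The ValueError branch is the important one: it is the point where the runbook has to switch from PITR to snapshots or long-term retention.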

Cross-region snapshots

For the cold-start scenario - recovery from a complete region loss or full corruption that propagated through replication. The snapshot is independent of the live database; restoration is slower but the trust boundary is cleaner.

Native multi-region replication

Some managed databases offer synchronous or near-synchronous multi-region replication - Spanner is the clearest example, with externally consistent transactions across continents. Aurora Global, Azure SQL geo-replication, and Cosmos DB multi-region writes are similar at lower consistency guarantees. The right choice when RTO must be measured in seconds.

The combination that holds across providers: native replication for HA / DR, plus immutable snapshots in a separate trust domain for ransomware resilience. The replica protects against infrastructure failure; the immutable snapshot protects against logical corruption.

Restoration drills

The single most underinvested practice in cloud backup programs. The gap between "we have backups" and "we can restore at scale within RTO under stress" is enormous and only revealed by doing the restore.

Why drills are the control, not the documentation

Every cloud provider's documentation makes restoration look like a button-click. In practice, restoration at scale runs into:

Missing IAM permissions on the restore role that no one noticed because daily backups never exercised that path.
KMS keys not available in the target region.
Network configuration in the target environment that doesn't match production - security groups, subnets, NACLs, routing.
DNS pointing at the broken primary; cutover requires a parallel runbook nobody has run.
Restore-rate limits - bringing back 50 TB takes hours regardless of how fast the read APIs are documented to be.
Dependencies on services that themselves need to be restored in the right order.
The team has never done this together at this scale, and it's now 2am during a real incident.
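
The restore-rate point is easy to check with arithmetic before a drill. A rough sketch, assuming decimal units and a flat efficiency factor to account for throttling and retries; the throughput figures in the example are placeholders, not provider limits.

```python
def estimate_restore_hours(size_tb, throughput_mb_per_s, parallel_streams=1, efficiency=0.7):
    """Rough wall-clock hours to restore size_tb terabytes (decimal units).

    efficiency is an assumed fudge factor for throttling, retries, and ramp-up.
    """
    if throughput_mb_per_s <= 0 or parallel_streams < 1 or not 0 < efficiency <= 1:
        raise ValueError("throughput, streams, and efficiency must be positive")
    size_mb = size_tb * 1_000_000  # 1 TB = 1,000,000 MB (decimal)
    effective_rate = throughput_mb_per_s * parallel_streams * efficiency
    return size_mb / effective_rate / 3600


# 50 TB over 8 streams of 250 MB/s each at 70% efficiency: roughly 10 hours
print(round(estimate_restore_hours(50, 250, parallel_streams=8), 1))
```

If the estimate already exceeds the RTO on paper, the drill will not rescue it; the architecture needs more parallelism or a warmer copy.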

The cadence

Monthly per-system small restore. Pick a random database, restore to a sandbox, verify integrity. 30 minutes of engineer time. Surfaces the small drift.
Quarterly full-system restore. One tier-1 system, restored from immutable backup to a clean recovery account, with the runbook driving the work. 4-8 hours. Surfaces the medium drift.
Annual full-stack drill. Multiple systems, dependencies-aware, against a defined scenario (lost region + lost prod identities + KMS key unavailable). Day-long exercise. Surfaces the systemic gaps.
Continuous validation for the most mature programs - backup data is continuously sampled and integrity-verified by a job that does not have write access anywhere. Catches silent corruption between scheduled drills.
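
Continuous validation can be as simple as re-hashing a random sample of backup objects against digests recorded at backup time. A minimal sketch, assuming a manifest of SHA-256 digests kept outside the backup store and a read-only reader function; both names are hypothetical, and the in-memory dict stands in for a real object store.

```python
import hashlib
import random


def verify_sample(manifest, read_object, sample_size, seed=None):
    """Re-hash a random sample of backup objects and return the keys that fail.

    manifest: dict of object key -> expected SHA-256 hex digest (recorded at backup time).
    read_object: callable returning the object's bytes; it should use a read-only identity.
    """
    rng = random.Random(seed)
    keys = sorted(manifest)
    chosen = rng.sample(keys, min(sample_size, len(keys)))
    failures = []
    for key in chosen:
        digest = hashlib.sha256(read_object(key)).hexdigest()
        if digest != manifest[key]:
            failures.append(key)
    return sorted(failures)


# In-memory stand-in for a backup store, with one silently corrupted object.
store = {"db/0001": b"alpha", "db/0002": b"bravo", "db/0003": b"charlie"}
manifest = {k: hashlib.sha256(v).hexdigest() for k, v in store.items()}
store["db/0002"] = b"bravo-corrupted"
print(verify_sample(manifest, store.__getitem__, sample_size=3))
```

The manifest only adds assurance if it lives in a different trust domain from the data it describes; otherwise an attacker can rewrite both.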

Tabletop scenarios worth running

"All production IAM access keys are revoked. KMS key in the prod account has been scheduled for deletion. You have 30 days to recover the last 7 days of customer data into a new account."
"The backup account itself has been compromised. The cross-cloud backup in Azure is the only known-good copy. Restore tier-1 services into a clean AWS Organization within 24 hours."
"Three users report missing data on a Tuesday morning. The change happened sometime between Friday night and Monday morning. What's your point-in-time recovery strategy and how long to RTO?"
"You receive a ransom note. The attacker claims to have deleted all S3 versioning and modified your Vault Lock retention. Verify the claims; recover where possible; respond."

Capture the times, the friction points, the missing tooling. Each drill should produce a list of runbook fixes; the program matures by closing those gaps over the year before the next drill.

The Encryption Key Custody pattern

An advanced control increasingly seen in regulated and high-target organizations. The principle: the encryption keys for backups should be HSM-backed and accessible only via a break-glass process, not the same identity that runs daily backups.

The pattern

Two key sets. A working key set used by the daily backup service to encrypt newly-written backups, and a custody key set required to decrypt and read them for restore.
Custody keys are HSM-backed. AWS CloudHSM, Azure Managed HSM, GCP Cloud HSM. The HSMs are FIPS 140-validated, the key material cannot be exported from the HSM in plaintext, and key usage is itself audited inside the HSM.
Custody keys are dormant. The IAM principals with usage permission on custody keys are normally disabled. Enabling them requires a documented approval flow - typically a security ticket, two-person rule, time-bounded grant.
Working keys can encrypt, custody keys can decrypt. Asymmetric capability separation. The daily backup runner cannot read its own backups; only a restoration-authorized identity using the custody key can.
Custody keys are replicated. Loss of the custody key set is loss of all backups encrypted by it. Replicate across regions, ideally across cloud providers for the most critical data.
Custody key usage is heavily monitored. Any usage event is high-fidelity - it should either be a documented restoration or an incident. SOC alerting on first principal use.
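
The dormant-key approval flow can be modeled in a few lines. This is a sketch of the logic only, assuming a two-approver rule and a four-hour window; the class, the principal names, and the ticket ID are hypothetical, and a real implementation would drive IAM or key-policy changes rather than an in-memory object.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CustodyGrant:
    """A time-bounded, two-person-approved request to use a custody key."""
    requester: str
    ticket: str
    ttl: timedelta = timedelta(hours=4)  # assumed window, tune per policy
    approvers: set = field(default_factory=set)
    activated_at: Optional[datetime] = None

    def approve(self, approver, now):
        if approver == self.requester:
            raise PermissionError("requester cannot approve their own grant")
        self.approvers.add(approver)
        # The grant activates once two distinct approvers have signed off.
        if len(self.approvers) >= 2 and self.activated_at is None:
            self.activated_at = now

    def is_active(self, now):
        if self.activated_at is None:
            return False
        return self.activated_at <= now < self.activated_at + self.ttl


t0 = datetime(2026, 5, 17, 2, 0, tzinfo=timezone.utc)
grant = CustodyGrant(requester="oncall-dba", ticket="SEC-0000")
grant.approve("security-lead", t0)
grant.approve("platform-director", t0 + timedelta(minutes=10))
print(grant.is_active(t0 + timedelta(hours=1)))  # inside the window
print(grant.is_active(t0 + timedelta(hours=5)))  # after expiry
```

Expiry by default is the design choice that matters: a grant nobody remembers to revoke still closes on its own.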

This pattern adds operational friction; it is reserved for the highest-impact data. Not every workload needs it. For tier-0 systems handling regulated, financial, or otherwise irreplaceable data, the friction is the point - even a fully compromised cloud account cannot decrypt the backups without independent approval.

Cyber insurance - what carriers ask for now

The cyber insurance market reset between 2022 and 2025. Premiums tripled or worse, sublimits were introduced for ransomware specifically, and underwriters now ask detailed technical questions on the application. The 2026 reality is that minimum hygiene is a precondition for coverage - and certain failures lead to non-payment after a loss.

What carriers now require

Immutable backups. Documented Object Lock / Vault Lock / immutable Blob Storage / Bucket Lock with retention sized to the dwell-time threat model. Spreadsheet attestation alone is no longer accepted; carriers ask for evidence - policy documents, CSPM exports, audit reports.
Backups isolated in a separate account / subscription / tenant. Same-account backups disqualify many policies outright or carry significant exclusions.
MFA enforced on privileged identities - including specifically on accounts with backup and restore privileges, and on cloud-provider root.
EDR / endpoint protection deployed and current on all administrative endpoints.
Documented and tested restoration. The most-asked-about control of the last 18 months. Carriers want to see the drill cadence, the last drill date, and the time-to-restore metric.
Isolated recovery environment (IRE). A pre-staged clean environment to restore into without trusting the potentially-compromised production. Becoming standard for tier-1 insureds.
Incident response runbook with named contacts, retainer with an IR firm, communication plan. (See Incident Response.)
Vulnerability management posture - patch SLAs, exposed surface management, CVE prioritization workflow.

What carriers will and will not pay

Common patterns in 2024-2026 carrier behavior:

Ransom payment - covered in many policies, but increasingly sublimited (often $1-5M against larger overall policies), with growing reluctance to fund payment if alternatives exist. Some sanctioned-actor payments are flatly excluded.
Restoration costs - typically covered when the backup architecture meets stated controls. Not covered when post-loss investigation shows the controls were misrepresented on the application.
Business interruption - covered up to defined waiting periods and sublimits.
Forensics / IR firm costs - covered, often via panel firms the carrier pre-approves.
Regulatory fines - varies by jurisdiction; some are insurable, some are not.
Misrepresentation on the application - universally a path to non-payment. If your application said "immutable backups" and the post-loss investigation finds they were not immutable, expect a denial. Be honest on the application.

The pragmatic takeaway: build to the carrier's expected controls regardless of whether you're seeking coverage; they encode the floor of reasonable cloud-ransomware hygiene in 2026. If you cannot answer "yes" to immutability, isolation, MFA, EDR, and tested restore - you have program gaps, not insurance gaps.

Third-party backup & DR platforms

Native cloud backup is excellent for within-cloud, single-cloud scenarios. Third-party platforms add cross-cloud, on-prem, SaaS coverage, centralized management, and frequently a better restore UX. The major categories:

Enterprise backup platforms

Veeam - broad coverage across AWS, Azure, GCP, on-prem, Microsoft 365, Salesforce. Strong in hybrid environments. Veeam Backup for AWS / Azure / GCP integrate with cloud-native primitives.
Rubrik - converged data management with ransomware-recovery-focused capabilities. Immutable backup, anomaly detection, and threat hunting on backup metadata.
Cohesity - hyperconverged backup, with strong ransomware-detection capabilities and integration with major SIEMs.
Commvault - long-established enterprise backup; modernized cloud offerings via Metallic SaaS.

Cloud-native / SaaS backup platforms

Druva - SaaS-delivered backup for cloud workloads, SaaS apps, endpoints. No backup infrastructure to manage.
Clumio - purpose-built for AWS workloads (acquired by Commvault in 2024). Air-gapped, immutable backup as a service.
HYCU - SaaS-app and cloud-workload focused, broad app coverage.
N-able - MSP-focused backup for Microsoft 365 and other SaaS.
Acronis - endpoint and workload backup with integrated cyber-protection features.

When to add a third party on top of native

Multi-cloud. AWS Backup doesn't back up Azure; Azure Backup doesn't back up GCS. A single pane of glass across clouds is a strong reason to add a third party.
SaaS coverage. Microsoft 365, Google Workspace, Salesforce, ServiceNow, Atlassian, GitHub data is not natively backed up to your specification by the SaaS provider. (Microsoft's stance is that 365 is durable, not backed up - that distinction matters in a ransomware scenario.) Third parties fill the gap.
Cross-cloud recovery. "Recover into a different cloud" is a third-party capability; native services don't cross cloud boundaries.
Unified RBAC and audit. Centralized backup policy and audit across a heterogeneous environment.
Specialized features. Anomaly detection on backup metadata, ransomware-specific recovery workflows, isolated recovery environment orchestration.

Many mature programs run both - native services for within-cloud day-to-day backup, a third party as the cross-environment, cyber-resilience-focused overlay. The cost is real; the right answer depends on where your data actually lives and what your restore scenarios look like.

Tabletop exercises

Tabletop exercises sit alongside restoration drills - the drill validates the technical capability; the tabletop validates the decision-making, coordination, and communications under pressure. Both are necessary; neither is sufficient.

What makes a tabletop useful

A specific, plausible scenario. Not "ransomware happens" - "Your DevOps lead's AWS access keys leak in a GitHub commit on Friday at 4pm. By Monday morning, your EBS snapshots are deleted, your S3 versioning is disabled, and your KMS key in the prod account has been scheduled for deletion."
Cross-functional attendance. Security, engineering, legal, communications, finance, executive leadership. Ransomware response is a business event, not a security event.
A facilitator who injects realistic obstacles. "Your SaaS-based ticketing system is down because the IdP is compromised. How are you coordinating?" "Your CFO is on a plane. How are you getting authorization to engage the IR retainer?"
Time pressure. Two-hour exercises produce different lessons than four-hour ones. Both are useful.
Written findings. Every gap surfaced becomes a runbook fix.

See Incident Response for full IR program design. The backup-specific tabletop is one of several scenarios a mature IR program rotates through annually.

Maturity stages

A staging model for cloud backup and DR programs:

Stage 1 - Backups exist

Some backups are running. Retention is set. Configuration may have drifted from policy. Restoration has not been tested at scale. Backups likely share the same trust domain as production. The default state of most cloud programs at year 1-2.

Stage 2 - Restored once

At least one successful restore has been completed. The team has discovered the IAM, KMS, and runbook gaps that surface only when you actually do it. Documented runbooks exist for the most critical systems. Still no immutability and still no separation.

Stage 3 - Tested quarterly

Regular drill cadence in place. Per-system, per-tier RTO / RPO defined and tested against. Time-to-restore measured and trending. Cross-region snapshots in use. Some immutability via Object Lock / Bucket Lock / immutable Blob policies, but possibly not in compliance mode.

Stage 4 - Immutable + isolated

Compliance-mode immutability on tier-1+ data. Separate account / subscription / project for backup destinations. KMS keys in a separate trust domain. Cross-account, cross-region replication. Cyber-insurance-grade controls. Annual tabletop with cross-functional participation.

Stage 5 - Continuous validation

Backup integrity continuously verified by an isolated process. Cross-cloud copies for the most critical data. HSM-backed custody keys with break-glass access. Isolated recovery environment pre-staged. Restore-time measured continuously. The program is a competitive advantage in regulated sales.

As with most security maturity models, skipping stages is a recipe for cost and complexity without resilience. Immutability without restoration testing leaves you with verifiably-intact backups you can't actually use under pressure.
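
The no-skipping rule can be made explicit in a self-assessment script. A minimal sketch; the capability names are invented labels that condense the stage descriptions above, not a standard taxonomy.

```python
# Capability labels are hypothetical shorthand for the stage descriptions.
STAGE_REQUIREMENTS = [
    (1, {"backups_running"}),
    (2, {"restore_completed", "runbooks_documented"}),
    (3, {"quarterly_drills", "rto_rpo_defined"}),
    (4, {"compliance_mode_immutability", "separate_backup_account",
         "separate_kms_trust_domain"}),
    (5, {"continuous_integrity_validation", "hsm_custody_keys",
         "isolated_recovery_environment"}),
]


def maturity_stage(capabilities):
    """Return the highest stage reached without skipping any earlier stage."""
    have = set(capabilities)
    stage = 0
    for number, required in STAGE_REQUIREMENTS:
        if not required <= have:
            break  # a gap here caps the score, whatever comes later
        stage = number
    return stage


# Immutable and isolated, but never restored: still stage 1.
print(maturity_stage({
    "backups_running",
    "compliance_mode_immutability",
    "separate_backup_account",
    "separate_kms_trust_domain",
}))
```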

AWS, Azure, and GCP backup & DR capabilities side-by-side

The native services each cloud ships, reduced to a one-screen reference:

Capability	AWS	Azure	GCP
Centralized backup service	AWS Backup	Azure Backup (Recovery Services Vault, Backup Vault)	Backup and DR Service (Actifio)
Object-store immutability	S3 Object Lock (Governance / Compliance), MFA delete	Blob immutable storage policies (locked / unlocked, version-level)	GCS Bucket Lock (locked retention policy)
Archive-tier immutability	Glacier Vault Lock	Archive tier with immutability policy	Archive storage class with Bucket Lock
Vault-level immutability	AWS Backup Vault Lock (Compliance mode)	Immutable vault + MUA Resource Guard	Backup and DR retention enforcement
Two-person rule on delete	SCP + KMS grants + MFA; not built-in single feature	Multi-User Authorization (MUA) via Resource Guard	Org policy + IAM separation; no single MUA primitive
DR (replication-based)	Elastic Disaster Recovery (DRS), CloudEndure heritage	Azure Site Recovery (ASR)	Backup and DR Service, custom replication patterns
DB point-in-time recovery	RDS / Aurora PITR (35 days), DynamoDB PITR	Azure SQL PITR (35 days) + LTR (10 years)	Cloud SQL PITR, Spanner backups, BigQuery time-travel
Cross-region copy	AWS Backup copy, S3 CRR, snapshot copy	GRS / RA-GRS storage, ASR cross-region	Multi-regional GCS, cross-region snapshots, Backup and DR
Cross-account isolation	Multi-account via Organizations + SCPs	Multi-subscription, optionally multi-tenant for MUA	Multi-project + VPC Service Controls perimeter
Air-gapped vault offering	AWS Backup logically air-gapped vaults	Immutable RSV + MUA across tenants	Backup and DR appliance isolation
HSM-backed keys	AWS CloudHSM, KMS custom key store (CloudHSM-backed)	Azure Managed HSM	Cloud HSM, External Key Manager
Ransomware-specific detection	GuardDuty malware protection, S3 Malware Scanning	Defender for Cloud (CSPM + workload), Defender for Storage	Security Command Center Premium / Enterprise findings
SaaS / Microsoft 365 backup	Not native - third-party (Veeam, Druva, etc.)	Native is durability, not backup - third-party recommended	Workspace native + third-party for fuller restore options

The native services are excellent for within-cloud scenarios. They do not natively cross clouds; multi-cloud organizations still typically deploy a third-party overlay for unified visibility and cross-environment recovery. The native capabilities and the third-party platforms are complements, not substitutes, at scale.

Common pitfalls

Backups in the same account / subscription / project as production. Single compromise destroys both. The most common and most consequential mistake. Move them.
No immutability - or immutability in Governance mode only. Bypassable retention is theater. Compliance mode (or equivalent locked policy) is the floor.
Restoration never tested at scale. "We have backups" without "we have restored" is documentation, not a control. Quarterly drills minimum.
Encryption key reused for prod and backup. A single key compromise destroys both copies. Keys for backup live in a separate trust domain, period.
"We use replication, so we have backup." Replication faithfully copies deletions and encryption. It is not a backup. You need both.
Backup retention shorter than ransomware dwell time. A 7-day retention against a 90-day dwell-time attacker means every backup you hold was taken after the compromise, so you have nothing clean to restore. Size retention against the threat model, not against the storage budget.
Cross-region replication into an account the attacker also controls. If the destination shares trust with the source, it's not isolation. Cross-region + cross-account, both.
No alerting on backup-destination changes. Retention shortening, vault deletion attempts, KMS deletion scheduling - all high-fidelity ransomware indicators. Alert on each.
SaaS data assumed to be backed up by the SaaS vendor. Microsoft 365, Salesforce, GitHub - vendors guarantee durability, not customer-specified restoration. Use a third-party SaaS backup for any business-critical SaaS.
Runbook stored only in the system that's down. The restore runbook lives in Confluence; Confluence runs on the compromised cloud account; the runbook is unreachable. Copy runbooks to an out-of-band location - a printed binder, an offline wiki, a separate SaaS the IR team can reach.
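
The alerting pitfall above is one of the cheapest to close. A minimal sketch of a filter over CloudTrail records, assuming events arrive as parsed JSON dicts; the field names (eventSource, eventName, errorCode, recipientAccountId, userIdentity) are CloudTrail's, while the choice of which events to page on is an assumption.

```python
# (eventSource, eventName) pairs treated as high-fidelity; the selection is
# an assumption for this sketch and should be extended per environment.
HIGH_FIDELITY_EVENTS = {
    ("kms.amazonaws.com", "ScheduleKeyDeletion"),
    ("kms.amazonaws.com", "DisableKey"),
    ("backup.amazonaws.com", "DeleteBackupVault"),
    ("backup.amazonaws.com", "DeleteRecoveryPoint"),
    ("backup.amazonaws.com", "DeleteBackupVaultLockConfiguration"),
    ("s3.amazonaws.com", "PutBucketVersioning"),
    ("s3.amazonaws.com", "DeleteBucket"),
}


def backup_destination_alerts(events, backup_account_id):
    """Return (event name, principal ARN, outcome) for events worth paging on."""
    alerts = []
    for event in events:
        key = (event.get("eventSource"), event.get("eventName"))
        if key not in HIGH_FIDELITY_EVENTS:
            continue
        if event.get("recipientAccountId") != backup_account_id:
            continue
        # Denied attempts are signals too: someone tried.
        outcome = "denied" if event.get("errorCode") else "succeeded"
        principal = event.get("userIdentity", {}).get("arn", "unknown")
        alerts.append((event["eventName"], principal, outcome))
    return alerts


sample = [
    {"eventSource": "kms.amazonaws.com", "eventName": "ScheduleKeyDeletion",
     "recipientAccountId": "222222222222", "errorCode": "AccessDenied",
     "userIdentity": {"arn": "arn:aws:iam::222222222222:user/example"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "recipientAccountId": "222222222222"},
]
print(backup_destination_alerts(sample, "222222222222"))
```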

FAQ

Is replication a backup?

No. Replication copies the current state - including deletions and corruption - from primary to secondary, usually in seconds. If ransomware encrypts your primary, replication faithfully encrypts the secondary. A backup is a point-in-time copy retained on a schedule, ideally immutable, in a trust boundary the production identities cannot reach. Replication is for availability and disaster recovery from infrastructure failure; backup is for recovery from corruption, accidental deletion, malicious deletion, and ransomware. You need both; neither substitutes for the other.
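
A toy model makes the difference concrete. This is an illustration with in-memory dicts, not a simulation of any provider's replication.

```python
import copy

primary = {"orders": "clean", "users": "clean"}
snapshots = []  # point-in-time copies; a real system would store these immutably


def replicate(source):
    """Replication mirrors the current state, whatever it is."""
    return dict(source)


def take_snapshot(source):
    """A backup freezes the state as of the moment it was taken."""
    snapshots.append(copy.deepcopy(source))


take_snapshot(primary)
replica = replicate(primary)

# Ransomware encrypts the primary; replication copies the damage.
primary = {name: "ENCRYPTED" for name in primary}
replica = replicate(primary)

print(replica)       # the replica is encrypted too
print(snapshots[0])  # the earlier snapshot still holds the clean state
```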

Should I pay the ransom?

Engage law enforcement (FBI / your country's equivalent) and counsel before any decision; in some jurisdictions paying certain sanctioned actors is illegal. The practical reality: paying does not reliably restore data - published industry data puts full recovery after payment well under 50%, decryption tools are often slow and buggy, and paid victims are statistically more likely to be hit again. Insurance carriers increasingly will not reimburse a ransom without documented exhaustion of alternatives. The single best predictor of not having to pay is having tested, immutable, isolated backups you can restore from within a known RTO.

How long should I keep backups?

Drive retention from three inputs: the legal / regulatory minimum (HIPAA 6 years for some records, SOX 7 years for financial, GDPR limits maxima not minima), the business RPO and recovery scenario (a ransomware dwell time of 60-90 days means a backup younger than that may already be compromised), and cost. Common pattern: hourly snapshots for 24-48 hours, daily for 30 days, weekly for 90 days, monthly for 1 year, annual for 7 years, all with immutability locks matching retention. Move long-tail copies to cold storage (Glacier Deep Archive, Azure Archive, GCS Archive) to control cost.
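
The tiered schedule above maps naturally onto a function that decides how long each backup's lock should last. A sketch under stated assumptions: backups run on the hour, the midnight backup is the one promoted to longer tiers, and Sunday and the first of the month are the promotion days. None of these conventions come from a provider; adjust them to your own schedule.

```python
from datetime import datetime, timedelta, timezone

# Tier -> how long a backup of that tier is kept (from the common pattern).
RETENTION = {
    "hourly": timedelta(hours=48),
    "daily": timedelta(days=30),
    "weekly": timedelta(days=90),
    "monthly": timedelta(days=365),
    "annual": timedelta(days=7 * 365),
}


def classify(taken_at):
    """Assign the longest-lived tier a backup qualifies for (assumed convention)."""
    if taken_at.hour != 0:
        return "hourly"
    if taken_at.month == 1 and taken_at.day == 1:
        return "annual"
    if taken_at.day == 1:
        return "monthly"
    if taken_at.weekday() == 6:  # Sunday
        return "weekly"
    return "daily"


def retain_until(taken_at):
    """Lock expiry for a backup: its timestamp plus its tier's retention."""
    return taken_at + RETENTION[classify(taken_at)]


backup_time = datetime(2026, 5, 18, 0, 0, tzinfo=timezone.utc)
print(classify(backup_time), retain_until(backup_time))
```

Computing the expiry at write time matters because a Compliance-mode lock cannot be shortened afterwards; a mistake here is paid for in storage, not in data loss.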

Does S3 Object Lock prevent ransomware?

It prevents object deletion and overwrite for the duration of the lock - which is the most important property - but only if you use Compliance mode (Governance mode is bypassable by IAM principals with the right permission). Object Lock requires bucket versioning, and a bucket cannot be deleted while it still holds locked object versions, so the bucket is protected for as long as the locks are. What Object Lock does not do: it does not protect objects written without a retention setting or whose retention has expired, does not prevent KMS key disablement that renders objects unreadable, and does not prevent retention being set too short before an attack. The full pattern is: Compliance-mode Object Lock + bucket versioning + replication to a separate account + KMS keys whose admin and usage roles are in a different trust domain + MFA-delete on the source bucket.
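
A posture check for the Compliance-mode and retention-length conditions can run against the bucket's Object Lock configuration. A minimal sketch, assuming the input dict has the shape of S3's GetObjectLockConfiguration response and was fetched elsewhere; the function name and the 90-day threshold are illustrative.

```python
def object_lock_gaps(config, min_days):
    """List reasons a bucket's Object Lock default retention falls short.

    config follows the shape of S3's GetObjectLockConfiguration response.
    """
    lock = config.get("ObjectLockConfiguration", {})
    if lock.get("ObjectLockEnabled") != "Enabled":
        return ["Object Lock is not enabled"]
    retention = lock.get("Rule", {}).get("DefaultRetention")
    if not retention:
        return ["no default retention; objects are locked only if the writer sets it"]
    gaps = []
    if retention.get("Mode") != "COMPLIANCE":
        gaps.append("default retention is not in Compliance mode")
    # A default retention is expressed in Days or Years; approximate years as 365 days.
    days = retention.get("Days", 0) + 365 * retention.get("Years", 0)
    if days < min_days:
        gaps.append(f"default retention of {days} days is below the required {min_days}")
    return gaps


governance_bucket = {
    "ObjectLockConfiguration": {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "GOVERNANCE", "Days": 30}},
    }
}
for gap in object_lock_gaps(governance_bucket, min_days=90):
    print(gap)
```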

How often should we test restoration?

Quarterly at minimum for any tier-1 system; monthly or continuous validation for the most critical. The gap between "we have backups" and "we can restore at scale within RTO" is enormous and only revealed by doing the restore. A useful cadence: per-system small restore monthly, full-system restore of one tier-1 system quarterly, full-stack drill annually (or after any major architecture change). The numbers to capture from each test: time to restore, data loss vs RPO, manual steps required, missing IAM, missing keys, missing tooling. Each finding feeds the runbook.
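
The numbers worth capturing fit in a small record that can be compared against the objectives and trended over time. A sketch with hypothetical field names and example values.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class DrillResult:
    """Outcome of one restore test, measured against the system's objectives."""
    system: str
    rto: timedelta
    rpo: timedelta
    time_to_restore: timedelta
    data_loss: timedelta
    manual_steps: int = 0
    findings: list = field(default_factory=list)  # each one becomes a runbook fix

    def met_objectives(self):
        return self.time_to_restore <= self.rto and self.data_loss <= self.rpo

    def summary(self):
        status = "PASS" if self.met_objectives() else "FAIL"
        return (f"{self.system}: {status} restore={self.time_to_restore} (RTO {self.rto}) "
                f"loss={self.data_loss} (RPO {self.rpo}) findings={len(self.findings)}")


result = DrillResult(
    system="billing-db",  # hypothetical system name
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=15),
    time_to_restore=timedelta(hours=5),
    data_loss=timedelta(minutes=5),
    manual_steps=12,
    findings=["restore role lacked decrypt permission on the backup key"],
)
print(result.summary())
```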

Do I need a third-party backup tool if my cloud provider has one?

It depends on three things: whether you're multi-cloud (native tools don't cross clouds), whether you need recovery into a different cloud or on-prem (cyber-resilience may require an out-of-cloud copy), and whether your RPO / RTO needs exceed what native tooling offers. Native services are excellent at within-cloud recovery and integrate with the rest of the cloud's IAM, KMS, and logging. Third parties (Veeam, Rubrik, Cohesity, Commvault, Druva, Clumio, HYCU) add cross-cloud, on-prem, and SaaS coverage, central management, and often a better restore UX. Many enterprises run both - native for the day-to-day, third party for cross-environment and cyber-resilience.

Is the backup account itself a target?

Yes - and the assumption that it is must drive the architecture. A modern ransomware operator's first move after gaining persistence in your production environment is to enumerate trust relationships and look for the backup destination. The backup account or subscription should have: no inbound trust from production, separate IAM admins from production, separate KMS keys with separate key admins, MFA enforced on every role with write or delete capability, alerts on every privileged action, and a break-glass restore-only role that is normally disabled.
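
The "no inbound trust from production" property can be audited by scanning role trust policies in the backup account. A minimal sketch for AWS, assuming the trust policies have already been fetched as parsed JSON; the account IDs and role name are placeholders, and only the AWS principal type is inspected here.

```python
import re

# Matches IAM / STS ARNs in any partition and captures the account ID.
ACCOUNT_IN_ARN = re.compile(r"^arn:aws[\w-]*:(?:iam|sts)::(\d{12}):")


def principal_account(principal):
    """Extract the 12-digit account ID from an ARN or a bare account ID."""
    if re.fullmatch(r"\d{12}", principal):
        return principal
    match = ACCOUNT_IN_ARN.match(principal)
    return match.group(1) if match else None


def untrusted_principals(trust_policy, allowed_accounts):
    """Return AWS principals in Allow statements from accounts not on the allow-list."""
    flagged = []
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if principal == "*":
            flagged.append("*")
            continue
        aws = principal.get("AWS", [])
        for arn in [aws] if isinstance(aws, str) else aws:
            if principal_account(arn) not in allowed_accounts:
                flagged.append(arn)
    return flagged


trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Principal": {"AWS": [
            "arn:aws:iam::111111111111:root",          # placeholder: production
            "arn:aws:iam::222222222222:role/restore",  # placeholder: backup account
        ]},
    }],
}
print(untrusted_principals(trust_policy, allowed_accounts={"222222222222"}))
```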

Where next

Incident Response - when the backup architecture is the thing standing between you and ransom payment, response coordination matters.
Data Security & KMS - the encryption and key-custody patterns this page depends on, in depth.
Detection Engineering - detecting the seven kill-chain steps before step five succeeds.
Cloud SOC - the operational practice that runs the detections.
IAM & Identity - the cross-account trust patterns under the backup architecture.
Friday Zoom - ransomware-resilience comes up regularly. Bring your architecture; we'll workshop it.
