Cloud Network Security (VPC, Egress, WAF, SASE, ZTNA)


Last updated 2026-05-17 · By Shawn Nunley · Vendor-neutral

The 30-second version: Cloud network security is what's left after you accept that there's no perimeter. The network is software-defined, every workload has a public-by-default API edge, and the most common modern attack path runs outbound - a compromised workload calling an attacker-owned domain - not inbound. The discipline is four things stacked: good VPC/VNet design (tiered subnets, hub-and-spoke, no default VPC in prod), private connectivity to cloud APIs (PrivateLink / Private Link / PSC, with endpoint policies), controlled egress (Network Firewall, FQDN allow-lists, DNS firewall), and identity-aware overlay (ZTNA, service mesh mTLS, microsegmentation).

This page is the network mechanics. The architectural pattern that sits on top - zero trust - is its own page; this one is the wires, rules, and primitives that make zero trust enforceable in AWS, Azure, and GCP.

Why cloud networking is different
VPC / VNet design fundamentals
AWS networking primitives
Azure networking primitives
GCP networking primitives
Security groups vs NACLs vs firewall
Private connectivity & service endpoints
Egress controls
DNS security
WAF
DDoS protection
Bot management & API security
Service mesh & east-west mTLS
SASE / SSE
ZTNA
Microsegmentation
eBPF & cloud networking
Cross-cloud connectivity
Flow logs & network observability
AWS / Azure / GCP side-by-side
Maturity stages
Common pitfalls
Further reading
FAQ
Where next

Why cloud networking is different

On-prem networking is built around a perimeter and a small number of choke points - a firewall at the edge, a DMZ behind it, an internal LAN behind that. The perimeter is the policy plane: in front of it, untrusted; behind it, trusted (more or less). Cloud breaks the model in three ways.

The perimeter is gone. Every VPC has an internet-routable edge unless you actively block it; every managed service (S3, Cosmos DB, Cloud Storage) has a public endpoint by default; every IAM principal's credentials work from anywhere on earth if they leak. The "firewall at the edge" stops being a thing because there is no single edge - there are thousands, one per resource. Identity, not network position, becomes the trust anchor.

The network is software-defined. A CIDR block, a route table, a security group, an NSG, a transit gateway attachment - all of these are API calls. Which means they change in seconds, they leave a CloudTrail / Activity Log / Cloud Audit Log record, and they can be expressed as code (Terraform, Pulumi, CDK, Bicep, Deployment Manager). The bad news: a single misconfigured security group rule can expose a database to the internet in one API call. The good news: you can detect that misconfig the moment it happens, and you can write policy-as-code that prevents it in CI.

Egress is the new ingress. The classic on-prem attack went in through a firewall, found a foothold, moved laterally. The modern cloud attack often starts with an SSRF or a leaked credential and works out - the compromised workload calls an attacker-controlled domain to exfiltrate data or pull down a second-stage payload. The network control that stops it is outbound filtering: FQDN allow-lists, DNS sinkholing, VPC endpoint policies. Most cloud network security programs are inbound-heavy and egress-blind, which is exactly backwards from where the attacks happen.

The rest of this page is what you do about all three.

VPC / VNet design fundamentals

Before any of the security controls work, the underlying topology has to be sane. The same design choices come up across AWS, Azure, and GCP, with vocabulary differences. The patterns:

CIDR planning

Pick the IP space before you build anything. The mistakes are recoverable but expensive. Practical rules:

Use RFC 1918 space. 10.0.0.0/8 is the standard choice for a multi-region multi-account program; reserve 172.16.0.0/12 and 192.168.0.0/16 for special cases (lab, on-prem overlap, partner peering).
Never overlap. The single largest source of "we can't peer these VPCs" pain is overlapping CIDRs across accounts. Maintain an IP address management (IPAM) source of truth - AWS VPC IPAM, Azure IPAM solutions, Infoblox, BlueCat, NetBox, phpIPAM - and allocate from it.
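Size for growth. A /16 per VPC gives you 65,536 addresses; a /24 per subnet gives you 256, minus the handful each cloud reserves per subnet (five on AWS and Azure, four on GCP). Cloud-native workloads (Kubernetes, especially) burn through IPs quickly because, with VPC-native pod networking, every pod takes one. Plan large.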
Reserve space for transit. The transit/inspection VPC needs its own range that doesn't overlap with anything it will route between.
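The "never overlap" and "size for growth" rules can be checked mechanically before anything is deployed. The following is a minimal sketch using Python's standard ipaddress module; the allocation names and ranges are hypothetical, and in practice the input would come from whatever IPAM system holds your source of truth.

```python
import ipaddress
from itertools import combinations

# Hypothetical allocations; in practice these would be exported from your IPAM.
allocations = {
    "prod-us-east-1": "10.0.0.0/16",
    "prod-eu-west-1": "10.1.0.0/16",
    "transit-hub": "10.255.0.0/20",
    "legacy-lab": "10.1.128.0/17",  # deliberately overlaps prod-eu-west-1
}

def find_overlaps(allocs):
    """Return sorted pairs of allocation names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in allocs.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    )

overlaps = find_overlaps(allocations)
print("overlapping allocations:", overlaps)

# Sizing check: how many /24 subnets fit in one /16 VPC.
vpc = ipaddress.ip_network(allocations["prod-us-east-1"])
subnets_24 = list(vpc.subnets(new_prefix=24))
print(vpc.num_addresses, "addresses,", len(subnets_24), "possible /24 subnets")
```

Running a check like this in CI against the IPAM export is one way to catch an overlap before it blocks a peering or transit attachment.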

Subnetting strategy: tiered subnets

The reference pattern is three tiers per AZ:

Public subnet - routable to the internet via an Internet Gateway. Holds public-facing load balancers, NAT Gateways, bastion hosts (if you must have them - ZTNA is better), and nothing else. No workloads.
Private subnet - no direct internet route; egress goes through a NAT Gateway (or, better, a Network Firewall in front of NAT). Holds the application tier: EKS / AKS / GKE nodes, EC2/VM instances, Lambda functions with VPC config, container services.
Data subnet - strictest. No internet route at all (egress and ingress). Holds RDS, Aurora, Cosmos DB, Cloud SQL, ElastiCache, Memorystore. Security groups restrict access to the application tier's security group only.

The tiers serve two purposes: they make blast radius small (a compromised public-tier load balancer can't directly reach the database), and they make the routing/firewall policy legible (the route table for "private" tier always sends egress through Network Firewall; the data tier never has an egress route).
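Because each tier has exactly one expected default-route behaviour, the tiering can be audited as a simple invariant. The sketch below assumes a hypothetical inventory format (a list of dicts with name, tier, and routes) and hypothetical target labels; a real audit would read route tables from the cloud provider's API.

```python
# Expected default-route target per tier; None means "no default route at all".
# The target labels are illustrative, not provider identifiers.
EXPECTED_DEFAULT_ROUTE = {
    "public": "internet-gateway",
    "private": "network-firewall",
    "data": None,
}

def tier_violations(subnets):
    """Return names of subnets whose default route does not match their tier."""
    bad = []
    for subnet in subnets:
        actual = subnet["routes"].get("0.0.0.0/0")
        if actual != EXPECTED_DEFAULT_ROUTE[subnet["tier"]]:
            bad.append(subnet["name"])
    return bad

# Hypothetical inventory: two subnets break the tier rules.
subnets = [
    {"name": "pub-a", "tier": "public", "routes": {"0.0.0.0/0": "internet-gateway"}},
    {"name": "app-a", "tier": "private", "routes": {"0.0.0.0/0": "network-firewall"}},
    {"name": "app-b", "tier": "private", "routes": {"0.0.0.0/0": "nat-gateway"}},
    {"name": "db-a", "tier": "data", "routes": {}},
    {"name": "db-b", "tier": "data", "routes": {"0.0.0.0/0": "nat-gateway"}},
]

violations = tier_violations(subnets)
print("subnets violating tier routing:", violations)
```

Here app-b bypasses the firewall and db-b has an egress route it should never have; both are the kind of drift the tier model is meant to make visible.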

AZ / region resilience

Minimum two AZs per region for any production workload; three is the standard when the service supports it. Same subnet tiering replicated in each AZ. For multi-region: separate VPCs per region, connected via Transit Gateway (AWS), vWAN (Azure), or Network Connectivity Center (GCP), with CIDR planning that allows region-to-region traffic without overlap.

Transit hub-and-spoke

Past three or four VPCs/VNets, full-mesh peering becomes unmanageable (O(n²) connections, no central inspection point). The standard pattern is hub-and-spoke: every workload VPC peers only to a central transit/inspection VPC, which runs the Network Firewall, the egress to internet, the on-prem connectivity, and the cross-region links. AWS implements it with Transit Gateway; Azure with Virtual WAN; GCP with Network Connectivity Center. The hub becomes the policy choke-point: one Network Firewall instance inspects everything that crosses VPC boundaries.
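The O(n²) claim is easy to make concrete: a full mesh of n networks needs n(n-1)/2 peering connections, while hub-and-spoke needs one attachment per network. A small sketch of the arithmetic (function names are illustrative):

```python
def full_mesh_links(n):
    """Peering connections needed so every one of n networks reaches every other."""
    return n * (n - 1) // 2

def hub_and_spoke_links(n):
    """Attachments needed when each of n workload networks connects only to a hub."""
    return n

for n in (4, 10, 50):
    print(f"{n} networks: mesh={full_mesh_links(n)}, hub-and-spoke={hub_and_spoke_links(n)}")
```

At 50 VPCs the mesh needs 1,225 peerings against 50 hub attachments, and none of those peerings passes through a shared inspection point.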

The default VPC is not your friend

AWS accounts ship with a default VPC in every region, and new GCP projects get a default auto-mode network; both come with internet-routable subnets and permissive default rules. Block or remove them in your landing zone - AWS Control Tower has a guardrail for this, and GCP has the Organization Policy constraint compute.skipDefaultNetworkCreation. Azure creates no default VNet, so the equivalent guardrail there is Azure Policy restricting ad-hoc VNet and public-IP creation. The number of accidental "public Redis on the default VPC because someone forgot to specify a VPC" incidents is non-trivial.

AWS networking primitives

The objects you'll touch:

VPC - the network boundary. Regional. Contains subnets.
Subnet - AZ-scoped. Public or private based on whether its route table has a route to an Internet Gateway.
Route table - per-subnet (default) or per-resource. Defines where traffic for a destination CIDR goes - IGW, NAT, TGW, VPC endpoint, peering connection.
Internet Gateway (IGW) - the egress to internet for public subnets, and the route by which inbound public traffic reaches public ELBs / public instances.
NAT Gateway - managed source-NAT for private subnets to reach the internet outbound. Expensive ($0.045/hr + data) and stateful; per-AZ for HA.
Transit Gateway (TGW) - the hub for multi-VPC, multi-account, on-prem connectivity. Holds route tables that decide which VPCs can talk to which. The standard hub-and-spoke implementation.
VPC Peering - direct 1:1 connection between two VPCs. Non-transitive, doesn't scale. Use only for two-VPC cases; everything else should be TGW.
VPC Endpoints (interface, gateway) - private connectivity to AWS services without traversing the internet. Gateway endpoints (S3, DynamoDB) are free; Interface endpoints (PrivateLink) cost per-hour per-AZ + data.
AWS PrivateLink (VPC Endpoint Services) - you can publish your own service over PrivateLink for consumers to reach without internet. The right pattern for SaaS sold to AWS customers.
VPC Lattice - service-to-service connectivity across VPCs and accounts with identity-aware policy. Newer; takes some of the service-mesh problem space.
Cloud WAN - global network manager that abstracts TGW + Direct Connect + on-prem into a policy-driven topology.
Network Firewall - managed firewall (Suricata under the hood) for inline FQDN filtering, IDS/IPS, TLS inspection. Sits in front of NAT typically.
Direct Connect - dedicated physical link to AWS for hybrid connectivity.
Client VPN, Site-to-Site VPN - managed VPN endpoints (use ZTNA in preference for user access).

The reference architecture (multi-account): a transit/inspection account holding the TGW and the Network Firewall, workload accounts each with a VPC attached to the TGW, all internet egress flowing through the inspection VPC.

Azure networking primitives

VNet - the network boundary. Regional. Contains subnets.
Subnet - per-VNet. No AZ scoping at the subnet level (Azure puts AZ-awareness in resources, not subnets).
NSG (Network Security Group) - stateful 5-tuple ACL. Applied at NIC or subnet. AWS's "security group" equivalent.
ASG (Application Security Group) - name-based grouping of NICs to simplify NSG rule writing.
UDR (User-Defined Routes) - route tables. Required to force-tunnel egress through a firewall instead of out the default internet route.
Azure Firewall - managed L3-L7 firewall with FQDN filtering, threat intel, TLS inspection (Premium). The standard hub inspection device.
Azure DDoS Protection - the Network Protection plan (formerly Standard) protects public IP resources in every virtual network linked to the plan.
Application Gateway + WAF - L7 load balancer with optional WAF (OWASP CRS managed rules). Regional, with Application Gateway for Containers as the AKS-aware variant.
Azure Front Door - global L7 load balancer with WAF, similar to CloudFront + AWS WAF.
Private Link / Private Endpoint - private connectivity to Azure PaaS services (Storage, SQL DB, Cosmos DB, Key Vault, etc.) over a VNet-internal IP.
Service Endpoints - older mechanism (just routes traffic over the Azure backbone with source-IP signaling, not a true private endpoint). Mostly superseded by Private Link.
Virtual WAN (vWAN) - Azure's transit-hub equivalent. SD-WAN + branch + ExpressRoute aggregation.
VNet Peering - 1:1 connection between VNets in the same or different regions/subscriptions.
ExpressRoute - dedicated circuit to Azure.
Azure Bastion - managed jump-host. Use ZTNA where possible instead.

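Azure-specific gotcha: NSGs apply to NICs and subnets, and rules at both layers are evaluated; traffic passes only if both allow it. A restrictive NSG at one layer silently overrides a permissive one at the other, so a rule added at the NIC can appear to do nothing because the subnet NSG still denies the flow (and vice versa). Document which layer holds policy.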
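The two-layer evaluation can be sketched as a toy model. This is a simplification under stated assumptions: rules match on destination port only, traffic is internet-sourced (so the built-in VNet and load-balancer allow rules are ignored), and both layers have an NSG attached. The Rule type and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    priority: int  # lower number is evaluated first
    port: int
    action: str    # "Allow" or "Deny"

def nsg_decision(rules, port):
    """First matching rule by priority wins; no match falls through to deny."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port == port:
            return rule.action
    return "Deny"  # stands in for the built-in DenyAllInbound default rule

def inbound_allowed(subnet_rules, nic_rules, port):
    """Inbound traffic has to be allowed by the subnet NSG and by the NIC NSG."""
    return (nsg_decision(subnet_rules, port) == "Allow"
            and nsg_decision(nic_rules, port) == "Allow")

subnet_nsg = [Rule(100, 443, "Allow"), Rule(110, 22, "Allow")]
nic_nsg = [Rule(100, 443, "Allow")]  # no rule for 22, so the NIC layer denies it

print(inbound_allowed(subnet_nsg, nic_nsg, 443))
print(inbound_allowed(subnet_nsg, nic_nsg, 22))
```

Port 22 is allowed at the subnet but still blocked, which is the behaviour behind many "I added a rule and nothing changed" reports.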

GCP networking primitives

GCP's networking model is meaningfully different from AWS and Azure in one important way: VPCs are global, not regional. A single VPC spans all GCP regions; subnets are regional within it. This makes some patterns simpler (no peering between regions of "the same network") and some patterns harder (the blast radius of a VPC misconfiguration is wider).

VPC - global. Holds regional subnets.
Subnet - regional. Auto-mode VPCs auto-create subnets in every region (avoid in production; use custom-mode).
Routes - defined per VPC rather than per subnet; a route applies to every instance in the network unless it is scoped to specific instances with network tags.
Firewall rules - global, applied by tags or service accounts. Stateful. The closest equivalent to AWS security groups, but applied at the VPC level rather than per-NIC.
Hierarchical Firewall Policies - org / folder-scoped policies that override or apply alongside VPC rules. The right tool for "no workload anywhere may expose port 22 to the internet."
Cloud NAT - managed source-NAT. Regional, with options to disable internet egress entirely.
Cloud Router - dynamic routing for VPN, Interconnect, and Cloud NAT.
Private Service Connect (PSC) - GCP's PrivateLink equivalent. Connects consumers to producers (Google APIs, your own services, third-party SaaS) via private endpoints.
VPC Service Controls (VPC-SC) - unique to GCP. Creates a "service perimeter" around a set of GCP projects + services so that even with valid IAM credentials, data can't be moved outside the perimeter without an access policy allowing it. The strongest data-exfiltration control any cloud ships natively.
Cloud Armor - DDoS + WAF + bot management. The Google Front End is the first line of defense; Cloud Armor adds rules on top.
Network Connectivity Center (NCC) - transit-hub abstraction for multi-VPC and hybrid.
Cloud Interconnect - dedicated or partner physical connectivity.
Shared VPC (XPN) - host one VPC across many service projects. The standard multi-project landing-zone pattern.
Identity-Aware Proxy (IAP) - Google's ZTNA. Identity-aware tunneling to SSH/RDP and to web apps.

The combination most worth knowing: VPC Service Controls + Private Service Connect + Cloud NAT with disabled egress is the strongest "data can't leave" posture available in any cloud out of the box.

Security groups vs NACLs vs firewall

Three different objects with overlapping but distinct uses. Picking the wrong one is the source of most "I added a rule and it still doesn't work" tickets.

Dimension	Security Group / NSG	NACL (AWS) / Subnet NSG	Network Firewall
State	Stateful - return traffic auto-allowed	Stateless (AWS NACL) - must allow both directions	Stateful, with deep packet inspection
Scope	Per ENI / NIC	Per subnet boundary	Inline traffic flow (VPC route target)
Layer	L3-L4 (IP/port)	L3-L4 (IP/port)	L3-L7 (FQDN, regex, IDS signature, TLS)
Rule type	Allow only on AWS security groups (deny-by-default); Azure NSGs also take prioritized deny rules	Allow and deny, ordered	Allow, deny, alert, drop
Reference target	Other SG IDs (powerful) or CIDRs	CIDRs only	FQDNs, prefix lists, threat-intel feeds
Cost	Free	Free	~$0.395/hr/AZ + data (AWS NF, comparable Azure/GCP)
Best for	Default microseg, app-to-DB scoping	Broad denylist at subnet edge	Egress FQDN filtering, IDS, TLS inspection

The practical layering: security groups as the primary policy (default-deny, allow only what's needed, reference SG IDs not CIDRs), NACLs as a narrow blunt instrument (block known-bad CIDRs, hard-deny RDP/SSH from internet at the subnet edge as belt-and-suspenders), and Network Firewall on egress and east-west inspection (where FQDN allow-lists and IDS earn their keep). Don't try to do all your policy in one layer.
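The stateful/stateless row of the table is the one that most often bites in practice. The toy model below is a sketch, not any provider's implementation: a stateful group remembers accepted inbound connections and lets the reply out automatically, while a stateless ACL needs an explicit outbound rule covering the client's ephemeral port. Class and method names are hypothetical.

```python
class StatefulGroup:
    """Security-group-like filter: replies to accepted connections are auto-allowed."""

    def __init__(self, inbound_ports):
        self.inbound_ports = set(inbound_ports)
        self.tracked = set()  # connections accepted inbound

    def inbound(self, src, src_port, dst_port):
        if dst_port in self.inbound_ports:
            self.tracked.add((src, src_port, dst_port))
            return True
        return False

    def outbound_reply(self, dst, dst_port, src_port):
        # The reply is allowed only because the connection was tracked.
        return (dst, dst_port, src_port) in self.tracked


class StatelessAcl:
    """NACL-like filter: each direction is judged on its own rules."""

    def __init__(self, inbound_ports, outbound_port_ranges):
        self.inbound_ports = set(inbound_ports)
        self.outbound_port_ranges = list(outbound_port_ranges)

    def inbound(self, src, src_port, dst_port):
        return dst_port in self.inbound_ports

    def outbound_reply(self, dst, dst_port, src_port):
        return any(lo <= dst_port <= hi for lo, hi in self.outbound_port_ranges)


client = ("203.0.113.10", 50000)  # documentation-range IP, ephemeral source port

sg = StatefulGroup(inbound_ports=[443])
sg_in = sg.inbound(client[0], client[1], 443)
sg_out = sg.outbound_reply(client[0], client[1], 443)

acl_missing = StatelessAcl(inbound_ports=[443], outbound_port_ranges=[])
acl_fixed = StatelessAcl(inbound_ports=[443], outbound_port_ranges=[(1024, 65535)])

print("stateful:", sg_in, sg_out)
print("stateless, no outbound rule:", acl_missing.outbound_reply(client[0], client[1], 443))
print("stateless, ephemeral range allowed:", acl_fixed.outbound_reply(client[0], client[1], 443))
```

The stateless ACL accepts the request but drops the response until the ephemeral range is opened outbound, which is why NACLs are better kept to a few blunt rules.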

Private connectivity & service endpoints

One of the highest-leverage moves in cloud network security: stop reaching cloud-provider APIs (S3, Cosmos DB, Cloud Storage, KMS, Secrets Manager, etc.) over the public internet. Every major cloud has the primitive; what they cost and what they actually buy you differs.

AWS: VPC Endpoints & PrivateLink

Gateway endpoints - free; available only for S3 and DynamoDB. Use these unconditionally if you have a VPC that talks to either service.
Interface endpoints (PrivateLink) - ENIs in your VPC for any of ~150 AWS services. ~$0.01/hr/AZ + data. Use for KMS, STS, Secrets Manager, ECR, SSM, CloudWatch Logs, Lambda, SQS, SNS, and any other service you call from private subnets without internet egress.
VPC endpoint policies - IAM policy documents attached to the endpoint that scope which S3 buckets, KMS keys, or STS roles can be reached through that endpoint. This is the security feature, not the connectivity feature. A common control: VPC endpoint policy restricts STS to your own org's accounts, blocking confused-deputy exfil to attacker AWS accounts.
PrivateLink for third-party - Datadog, Snowflake, Confluent, MongoDB Atlas, dozens more SaaS publish PrivateLink endpoints. Use them; they cost the same as AWS-service PrivateLink and eliminate the third-party from your egress allow-list.

Azure: Private Link & Private Endpoint

Private Endpoint - a NIC in your VNet with a private IP, mapping to the PaaS service (Storage Account, SQL DB, Cosmos DB, Key Vault, etc.). DNS resolution is overridden so the service's public hostname resolves to the private IP.
Private Link Service - publish your own service over Private Link for tenant consumers.
Service Endpoints - the older, simpler primitive. Doesn't give a private IP - just routes traffic over the Azure backbone with source-IP signaling so the PaaS service can scope by source-VNet. Cheaper but less defense-in-depth; Private Endpoint is usually the right answer.
Private DNS zones - required for Private Endpoint to work correctly with the auto-rewritten hostnames. The most common Private Link misconfiguration is DNS not resolving inside the VNet.

GCP: Private Service Connect (PSC)

PSC for Google APIs - reach Google services (Cloud Storage, BigQuery, Cloud KMS, etc.) via a consumer VPC's private IP. Combine with VPC Service Controls and you have an exfiltration-resistant data plane.
PSC for published services - your own service exposed to consumers via PSC endpoints.
Private Google Access - older option; routes Google API traffic over the Google backbone via the subnet's existing internet route, without a public IP on the VM. Still requires the egress route; PSC is stricter.

The actual security value

Private endpoints are often sold as "cheaper than NAT egress" - and they are, for high-volume cloud-API traffic. But the security value is separate and larger: an endpoint policy lets you write a rule that says "this VPC can only reach S3 buckets in our own org's accounts," which IAM alone can't enforce (identity policies bind only your own principals, and an attacker can bring their own credentials; the endpoint policy applies to every request that crosses the endpoint, whoever signed it). Pair endpoint policies with VPC Service Controls (GCP), Resource Control Policies (AWS), or storage-account firewall rules (Azure) and you build a network-level data-exfiltration boundary.
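As an illustration of the "only our own org's buckets" rule, here is a sketch of what an S3 VPC endpoint policy document might look like, built as a Python dict and serialized to JSON. It relies on the aws:ResourceOrgID global condition key; the organization ID is a placeholder, and a real policy usually needs exceptions for AWS-owned buckets that some services and package repositories depend on.

```python
import json

ORG_ID = "o-exampleorgid"  # placeholder; substitute your AWS Organizations ID

# Sketch of an endpoint policy: requests through this endpoint are allowed only
# when the target resource belongs to an account in the named organization.
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowOnlyOrgOwnedResources",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "*",
            "Condition": {"StringEquals": {"aws:ResourceOrgID": ORG_ID}},
        }
    ],
}

policy_json = json.dumps(endpoint_policy, indent=2)
print(policy_json)
```

Because the policy is attached to the endpoint rather than to a principal, it also constrains requests signed with credentials from outside your organization.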

Egress controls

This is the most-overlooked area in cloud network security. The cloud's defaults give you wide-open outbound: any private subnet with a NAT Gateway can reach any IP on the internet on any port. That's the path attackers use after they get in - SSRF to metadata service, credential theft, then beaconing out to a C2 server, then bulk data egress to an attacker-controlled S3 bucket. The controls, in order of increasing strictness and cost:

Level 0 - NAT-only (don't)

Private subnet → NAT Gateway → internet. No filtering on what destinations are allowed. The default for most cloud accounts. This is not a control; it's the absence of one. If a workload is compromised, the attacker has full outbound connectivity.

Level 1 - VPC endpoint policies

For traffic to AWS/Azure/GCP services, route it through private endpoints with restrictive endpoint policies. Scope which buckets / keys / accounts the workload can reach. This isn't a general egress control (it doesn't stop traffic to non-cloud destinations), but for the largest single egress class (cloud-API calls), it's substantial.

Level 2 - Network Firewall with FQDN allow-list

AWS Network Firewall, Azure Firewall, GCP Cloud NGFW (Palo Alto), or a third-party NGFW (Palo Alto VM-Series, Fortinet FortiGate-VM, Check Point CloudGuard, Cisco Secure Firewall Threat Defense). Place inline on egress, configure FQDN allow-listing: workloads can only egress to a curated list of domains (your own SaaS dependencies, package mirrors, OS update servers). Everything else is dropped. This is the single largest egress posture improvement available and is achievable in a few weeks for most environments.
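The matching logic behind an FQDN allow-list is simple, and the edge cases are where mistakes happen. The sketch below is a hypothetical matcher, not any firewall's rule syntax: an entry with a leading dot matches subdomains only, any other entry must match exactly, and everything else is dropped. Real firewalls typically match on the TLS SNI or HTTP Host header, which the client controls, so this is one layer rather than a complete control.

```python
def normalize(name):
    """Lowercase and drop surrounding whitespace and any trailing root dot."""
    return name.strip().rstrip(".").lower()

def is_allowed(host, allow_list):
    """Default-deny match of a hostname against an allow-list.

    Entries starting with "." match any subdomain (but not the bare domain);
    all other entries must match the hostname exactly.
    """
    host = normalize(host)
    for entry in allow_list:
        entry = normalize(entry)
        if entry.startswith("."):
            if host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False

# Hypothetical allow-list for a build environment.
ALLOW = ["api.github.com", ".amazonaws.com", "pypi.org", "files.pythonhosted.org"]

for candidate in ("s3.us-east-1.amazonaws.com", "PyPI.org.",
                  "evil-amazonaws.com", "pypi.org.attacker.example"):
    print(candidate, "->", "allow" if is_allowed(candidate, ALLOW) else "drop")
```

The last two candidates show why suffix matching has to include the dot: a naive "ends with amazonaws.com" check would let a lookalike domain through.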

Level 3 - TLS inspection

Network Firewall in TLS-terminating mode (with the firewall's CA certificate installed in each workload's trust store). Lets the firewall see inside the encrypted traffic for IDS/IPS, data-loss prevention, and DNS-over-HTTPS detection. Clients that pin certificates break under it, and decrypting privacy-sensitive traffic carries its own risk; pick which traffic to inspect carefully.

Level 4 - Egress proxy with auth

Workloads must authenticate to an HTTP proxy (Squid with auth, smokescreen, sysdig egress, Cloudflare Gateway, Zscaler Internet Access) to reach the internet. Adds identity to the egress flow: a stolen credential without proxy access can't egress at all. The strongest pattern, but the most operational tax.

Level 5 - VPC Service Controls (GCP only)

The strongest data-egress control any cloud offers. Defines a service perimeter around a set of projects and Google services; data inside cannot flow to projects outside, even with valid IAM credentials. The right answer for any regulated workload (FedRAMP, HIPAA, PCI) running on GCP. AWS doesn't have a native equivalent (Resource Control Policies + VPC endpoint policies approximate it); Azure doesn't either (Private Link + storage firewall + Defender for Cloud's exfil-detection approximate it).

Egress filtering for Kubernetes

The above describes network-level egress. Inside the cluster, Cilium NetworkPolicy (with L7 visibility), Calico, or Kubernetes NetworkPolicy can scope egress per pod. Pair with cluster-level egress (cluster traffic egresses via a single NAT or Network Firewall) and the result is per-pod FQDN allow-listing. See the Kubernetes page for the details.

DNS security

DNS is the cheapest exfiltration channel and the most common C2 covert channel. Block it badly and you break the internet for your workloads; ignore it and attackers love you. The native controls:

Route 53 Resolver DNS Firewall - block, allow, or alert on DNS queries from your VPCs against managed threat-intel lists (newly-observed-domains, malware, C2) or custom domain lists. The single highest-value DNS control on AWS; turn it on with the AWS-managed rule groups by default.
Azure DNS Private Resolver + DNS security extensions - newer; integrates with Defender for DNS and Azure Firewall's DNS proxy for FQDN filtering.
Cloud DNS + DNS security policies - GCP's equivalent, with Cloud DNS response policies for sinkholing and allow/block lists.
Third-party DNS - Cisco Umbrella (Talos threat intel), Cloudflare Gateway DNS, Quad9, NextDNS, Infoblox BloxOne. Right for hybrid environments where the same DNS policy needs to apply across cloud + on-prem + endpoints.

DNS exfiltration detection

The signature is high entropy in subdomains (aHVnZXBheWxvYWQ.evil.example), high query volume to one zone, queries to recently-registered domains. Most modern detection platforms (GuardDuty for AWS, Defender for DNS for Azure, Chronicle for GCP, plus Wiz, CrowdStrike, SentinelOne on the workload side) include DNS-exfil detectors. Make sure DNS query logs flow to them - VPC DNS query logs are off by default on most clouds.
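The "high entropy in subdomains" signal can be sketched with a Shannon-entropy score over the subdomain portion of a query name. This is an illustrative heuristic only: the length and entropy thresholds are assumptions, the "last two labels are the registered domain" shortcut ignores the public suffix list, and entropy alone produces false positives (CDN hashes, for example), so production detectors combine it with volume and domain-age signals.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )

def looks_like_exfil(qname, registered_labels=2, min_len=12, threshold=3.5):
    """Flag long, high-entropy subdomains. Thresholds here are illustrative."""
    labels = qname.rstrip(".").split(".")
    subdomain = "".join(labels[:-registered_labels])
    return len(subdomain) >= min_len and shannon_entropy(subdomain) >= threshold

for name in ("aHVnZXBheWxvYWQ.evil.example",
             "static-assets.example.com",
             "www.example.com"):
    print(name, "->", "suspicious" if looks_like_exfil(name) else "ok")
```

Only the first name, the base64-looking label from the text above, is flagged with these thresholds.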

WAF

A Web Application Firewall sits in front of your HTTP services and applies L7 rules - block SQL injection patterns, OWASP Top 10 attack signatures, rate limits, bot signatures, geographic blocks. Useful, but bounded: a WAF stops the script-kiddie OWASP-payload attacks and most untargeted scanning, not a determined attacker with bespoke payloads. The market segments:

Cloud-native

AWS WAF - attaches to CloudFront, ALB, API Gateway, AppSync, App Runner. Managed rule groups from AWS and Marketplace (Fortinet, F5, Imperva, Cyber Security Cloud). The AWSManagedRulesCommonRuleSet + AWSManagedRulesKnownBadInputsRuleSet + AWSManagedRulesBotControlRuleSet is the starting baseline.
Azure WAF on Application Gateway or Azure Front Door, with the OWASP CRS managed rule set and Microsoft Threat Intelligence rule set.
Cloud Armor - GCP's WAF, attached to global external HTTPS load balancers. Includes preconfigured OWASP CRS rules and reCAPTCHA Enterprise integration for bot defense.

CDN-native (likely already in front of your app)

Cloudflare WAF - strong managed rulesets, the largest threat-intel feed in the WAF market, custom rules in a wirefilter-like DSL. Pairs with Cloudflare Bot Management and Rate Limiting.
Akamai App & API Protector (formerly Kona Site Defender) - enterprise-grade, deep Akamai-edge integration.
Fastly Next-Gen WAF - Signal Sciences-derived; strong on API-shaped traffic and developer ergonomics.
Imperva WAF, F5 BIG-IP Advanced WAF, Barracuda WAF - long-established players.

Picking one

If you already have a CDN, use its WAF. Stacking three WAFs (Cloudflare → AWS WAF → app-level ModSecurity) means three teams tuning rules and triple the false-positive surface for one ruleset's worth of protection. The right pattern: one WAF as the policy plane, ideally co-located with the CDN, with the cloud-native WAF as a thin fall-back for direct origin protection (block all non-CDN-origin traffic).

Rate limiting and CAPTCHA

The most effective WAF rule is usually a rate limit, not a signature. Per-IP, per-session, per-API-key, with a CAPTCHA challenge above a threshold. AWS WAF, Cloud Armor, Cloudflare, Fastly, and Akamai all support this; tune it before tuning the signature rules.
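A per-key rate limit with a challenge above the threshold can be sketched as a sliding-window counter. This is a generic illustration, not how any named WAF implements its rate-based rules; the class name, the "challenge" action, and the explicit timestamp argument (used so the example is deterministic) are all assumptions.

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Count requests per key over a trailing window; challenge above the limit."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of recent requests

    def check(self, key, now):
        """Return "allow" or "challenge" for a request from `key` at time `now`."""
        recent = self.hits[key]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)
        return "allow" if len(recent) <= self.limit else "challenge"

limiter = SlidingWindowLimiter(limit=3, window_seconds=10)

# Four requests in quick succession from one client, then one much later.
burst = [limiter.check("198.51.100.7", t) for t in (0, 1, 2, 3)]
later = limiter.check("198.51.100.7", 20)
other = limiter.check("198.51.100.8", 3)  # a different key has its own budget

print(burst, later, other)
```

The key can be an IP, a session ID, or an API key; keying on more than one of them is what keeps a single noisy NAT address from locking out every user behind it.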

DDoS protection

The volumetric attack surface - L3/L4 floods, HTTP floods, slowloris, amplification - sits in front of the WAF. Modern attacks regularly cross 1 Tbps; the largest in 2025 hit 5+ Tbps. The options:

AWS Shield - Standard protects every AWS resource at no extra cost (basic infrastructure-layer mitigation). Advanced ($3,000/month + data) adds a dedicated DDoS response team, cost protection (you don't pay for the surge in CloudFront/Route 53/ALB data charges from an attack), and health-based detection. Worth the price only for revenue-critical public surfaces.
Azure DDoS Protection - basic protection is free at the platform level; Network and IP protection plans add tuning and rapid-response team access.
Cloud Armor - GCP's L7 DDoS protection. Always-on Adaptive Protection uses ML to detect anomalous patterns; Cloud Armor Edge Security Policies block at the Google network edge before reaching your origin.
Cloudflare DDoS Protection - unmetered DDoS mitigation across all plans; one of the largest mitigation networks in operation.
Akamai Prolexic - enterprise-grade BGP-routed DDoS scrubbing service. Right for organizations with non-HTTP critical services (gaming, financial trading, large enterprises with dedicated peering).
Imperva, F5 (Silverline), Radware, NETSCOUT (Arbor) - additional enterprise scrubbing providers.

For most cloud-native organizations, Cloudflare or the cloud's own offering (Shield Standard + Cloud Armor + Azure DDoS) covers the volumetric layer; layer the WAF on top.

Bot management & API security

Bot management

Distinguishing automated traffic from human is its own discipline. The signal is hundreds of behavioral, fingerprinting, and reputational features - not a simple ruleset. The players:

HUMAN (formerly PerimeterX BotDefender) - strong on ad fraud, account takeover, credential stuffing.
Cloudflare Bot Management - large data corpus from Cloudflare's edge; tight integration with their WAF and rate-limit.
DataDome - invisible-CAPTCHA approach, strong API and mobile-app bot defense.
Akamai Bot Manager - enterprise scale, deep edge integration.
Imperva Advanced Bot Protection, F5 Distributed Cloud Bot Defense - additional enterprise options.

For most workloads, the WAF's bot-control rule group plus per-route rate limits is enough. Dedicated bot-management makes sense when account takeover, scraping, or ad fraud is a measurable line item.

API security

APIs have a different threat model from web apps: BOLA (broken object-level authorization), excessive data exposure, mass assignment, missing rate limits per consumer. The OWASP API Security Top 10 is the reference. The dedicated tools:

Salt Security, Traceable, Noname Security (now part of Akamai), Cequence, Apiiro - runtime API observability + ML-based abuse detection.
WAF + API gateway - for many environments, AWS API Gateway / Azure APIM / Apigee / Kong / Tyk with WAF integration covers 80% of the API-security need at a fraction of the cost of a dedicated tool.
Shift-left - StackHawk, 42Crunch, Apiiro, Escape: scan OpenAPI specs and API endpoints in CI for BOLA/BFLA and authn/authz gaps. The most cost-effective intervention is usually here, not at runtime.

Service mesh & east-west mTLS

Inside a Kubernetes cluster (and increasingly across clusters), service-to-service traffic needs the same protections as user-to-service: encryption, identity, authorization, observability. Service meshes provide these as platform features without each app re-implementing them.

Istio - the most-deployed open-source mesh. Envoy sidecars (or ambient mode without sidecars), mTLS by default, AuthorizationPolicy for service-to-service ACLs, integrates with SPIFFE/SPIRE for workload identity.
Linkerd - CNCF graduated mesh. Rust-based proxy (linkerd2-proxy), lower resource overhead than Istio, simpler config. mTLS by default. The right choice when "I want mesh but not Istio."
Cilium Service Mesh - sidecarless, eBPF-based. mTLS, L7 policy, identity-aware. The technically-strongest option for new deployments; pairs naturally with Cilium NetworkPolicy.
AWS App Mesh - Envoy-based, managed control plane. Note AWS announced deprecation in 2024 in favor of VPC Lattice for cross-VPC service mesh.
GCP Anthos Service Mesh (renamed Cloud Service Mesh in 2024) - managed Istio with Google's policy and observability layers.
AKS Istio add-on - Microsoft's supported Istio integration for AKS.
HashiCorp Consul Connect - mesh that spans Kubernetes and VMs, with the Consul service catalog underneath.

The mTLS-everywhere claim

"Mesh gives us mTLS everywhere" is the most common pitch and the most common over-claim. It does - within the mesh. Traffic that leaves the mesh (to a managed database, to a third-party API, from an unmeshed VM) gets no mesh mTLS; it is protected only by whatever TLS the application configures itself, unless you wire it through a mesh egress gateway. Plan for the boundary, not just the interior.

When you don't need a mesh

For a single cluster, single team, modest service count, and a language with good native TLS/JWT libraries - you may not need a mesh at all. The operational tax (sidecar resource overhead, debug complexity, mesh upgrades) is real. Cilium's sidecarless mode and Istio ambient mode both make this lighter; ambient + L4 policy is plausibly enough for many teams.

SASE / SSE

SASE (Secure Access Service Edge) is a bundle Gartner named in 2019: SD-WAN + SWG (Secure Web Gateway) + CASB (Cloud Access Security Broker) + ZTNA + FWaaS (Firewall-as-a-Service), all delivered from a vendor's globally distributed PoPs. SSE (Security Service Edge) is the same minus SD-WAN - security-only, no networking layer. The players in 2026:

Zscaler - ZIA (web gateway), ZPA (ZTNA), ZDX (digital experience). The category-defining vendor; deep enterprise mindshare.
Netskope - strong CASB heritage, modern SSE platform.
Palo Alto Prisma Access - Palo Alto's cloud-delivered NGFW + ZTNA.
Cloudflare One - Cloudflare's SASE: Access (ZTNA), Gateway (SWG / DNS / Egress), CASB, Browser Isolation. Strong developer ergonomics; cheap for SMB / mid-market.
Cisco Umbrella + Cisco+ Secure Connect - DNS-led origin, broadening into full SASE.
Cato Networks - built SASE from scratch as a single platform; strong on SD-WAN integration.
Fortinet FortiSASE - extends FortiGate / FortiClient into the SASE shape.
Versa Networks, iboss, Forcepoint ONE, Check Point Harmony SASE - additional SASE/SSE platforms.

What SASE actually replaces

The MPLS-and-branch-firewall world. If you have offices that backhaul traffic to a central firewall, SASE shrinks that to "all users connect to the nearest SASE PoP, policy is applied there, traffic egresses from there." For a cloud-native, no-offices company, SASE is mostly the SWG + DNS + ZTNA layer; the FWaaS and SD-WAN portions are less load-bearing.

ZTNA

Zero Trust Network Access - the modern replacement for client VPN. Instead of "VPN in, you're on the corporate network," ZTNA brokers per-application access based on user identity, device posture, and contextual signals. The app doesn't get a public IP; the user doesn't get a network. Both connect to the broker. The market:

Cloudflare Access - identity-aware reverse proxy in front of any HTTP app or SSH/RDP target. Tight integration with Cloudflare's edge, easy IdP setup, generous free tier.
Zscaler Private Access (ZPA) - the enterprise standard. App connectors initiate outbound to ZPA cloud; no inbound exposure.
Palo Alto Prisma Access - ZTNA inside Palo Alto's SASE platform.
Twingate - developer-friendly, simple deployment, agentless options for SaaS.
Tailscale - WireGuard-based mesh VPN with ACLs and identity integration. Often the lowest-friction option for engineering teams; production-grade with proper ACLs and tag-based policy.
NetFoundry - Ziti-based programmable ZTNA, strong for app-embedded use cases.
Banyan Security (SonicWall) - device-trust-strong ZTNA.
Cloud-native:
- Azure App Proxy / Entra ID Application Proxy - reverse-proxy internal apps through Entra ID with conditional access. Included at no extra cost with Entra ID P1/P2 licensing.
- GCP Identity-Aware Proxy (IAP) - front any GCP web app or SSH endpoint with Google identity; trivial setup if you're GCP-native.
- AWS Verified Access - newer AWS service for VPN-less app access with identity and device-trust signals.

The pattern to replace VPN with

For most cloud-native shops: Cloudflare Access or Tailscale (engineering ergonomics), Azure App Proxy or GCP IAP (free with the cloud you already use), or Zscaler ZPA (enterprise scale). The bastion-host-on-a-public-subnet pattern is the wrong answer in 2026; ZTNA does the same job with identity-aware policy and audit logs.
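As a rough illustration of what "brokers per-application access based on user identity, device posture, and contextual signals" means in practice, the sketch below models a broker's decision as a pure function. The `AccessRequest` fields, the `APP_POLICY` table, and the app and group names are all hypothetical; real ZTNA products evaluate far richer signals and have their own policy languages.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessRequest:
    """Signals a broker might see for one request (hypothetical shape)."""
    user_groups: frozenset
    device_compliant: bool
    mfa_verified: bool
    app: str

# Hypothetical per-application policy: the IdP group each app requires.
APP_POLICY = {
    "grafana": "sre",
    "admin-console": "platform-admins",
}

def decide(req: AccessRequest) -> bool:
    """Return True only if identity, posture, and context all pass."""
    required_group = APP_POLICY.get(req.app)
    if required_group is None:
        return False  # unknown application: default deny
    return (
        required_group in req.user_groups
        and req.device_compliant
        and req.mfa_verified
    )

if __name__ == "__main__":
    ok = AccessRequest(frozenset({"sre"}), True, True, "grafana")
    print(decide(ok))
```

The point of the shape is that access is granted per application, never per network: a user in `sre` on a compliant device reaches `grafana` and nothing else.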

Microsegmentation

Fine-grained policy on east-west traffic between workloads - beyond what security groups or NSGs can express. Two flavors:

Host-based (agent on every workload)

Illumio - the category leader; label-based policy across hybrid environments, application-dependency-mapping that makes the rest of the work tractable.
Akamai Guardicore Segmentation (formerly Guardicore Centra) - strong process-level visibility, ransomware-containment use case.
ColorTokens, vArmour, Cisco Secure Workload (formerly Tetration) - enterprise alternatives.

Network-based (policy in the cloud / cluster fabric)

Cloud-native - security groups (AWS), NSGs (Azure), VPC firewall rules (GCP). Adequate for most environments if used with discipline: default-deny + reference-by-SG-ID.
Calico - Kubernetes NetworkPolicy, with Calico Enterprise adding global network policy, threat-feed integration, and observability.
Cilium - eBPF-based CNI with L3/L4/L7 NetworkPolicy, identity-aware policy, no kube-proxy required. The technical state of the art for Kubernetes microsegmentation.
Kubernetes NetworkPolicy - the standard. Default deny-all egress + allow-list per pod is the minimum bar; most managed Kubernetes services support it via their CNI.
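A minimal sketch of the default-deny baseline mentioned above, built as a Python dict and emitted as JSON (which `kubectl apply -f` accepts alongside YAML). The policy name and the `payments` namespace are placeholders chosen for illustration.

```python
import json

def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that denies all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Listing both types with no allow rules denies both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }

if __name__ == "__main__":
    print(json.dumps(default_deny_policy("payments"), indent=2))
```

Note that denying all egress also blocks DNS lookups, so in practice this baseline is paired with an explicit allow rule for the cluster DNS service before per-pod allow-lists are added.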

The depend-on-labels reality

Microsegmentation is only as good as the labels/tags you've applied to workloads. The work of microseg is 80% inventory and label hygiene, 20% rule-writing. Tools that auto-map dependencies and recommend rules (Illumio Core, Guardicore, Cilium Hubble) collapse that 80% considerably; tools that don't will sit unused.

eBPF & cloud networking

eBPF (extended Berkeley Packet Filter) lets you safely run sandboxed programs in the Linux kernel without changing kernel source or loading kernel modules. For networking, it means programmable packet processing in the data path - at line rate, without sidecar tax. The networking-relevant projects:

Cilium - eBPF CNI for Kubernetes. Replaces kube-proxy, gives identity-aware L3-L7 NetworkPolicy, native service mesh (with or without sidecars), Hubble for observability. It is the engine behind GKE Dataplane V2 and Azure CNI Powered by Cilium, and the default CNI in EKS Anywhere.
Tetragon - Cilium's sibling for runtime security: process, file, and network observability at kernel level. Sees what eBPF-enabled networking sees, plus the host actions.
Falco - CNCF runtime-detection tool with eBPF data sources. Network-relevant rules detect lateral movement, unusual connections, port scanning from inside the cluster.
Pixie (now part of New Relic / open source) - eBPF-based observability for Kubernetes. Auto-instruments HTTP, gRPC, DNS, MySQL, PostgreSQL traffic without sidecars or code changes.
Inspektor Gadget, bpftrace, kubectl-trace - the diagnostic tools you reach for in incident response when you need packet-level visibility without deploying anything new.

The strategic point: eBPF is the data plane for cloud-native networking and security for the next decade. Programs that have invested in Cilium + Hubble + Tetragon end up with a coherent stack covering CNI, NetworkPolicy, service mesh, and runtime detection on the same engine - instead of four point products with overlapping agents.

Cross-cloud connectivity

For multi-cloud and hybrid programs, how the clouds (and on-prem) reach each other is its own discipline. Options:

Internet + VPN - IPsec tunnels over public internet. Cheapest, lowest performance, often the right answer for non-critical paths.
Cloud-native private links to provider backbones - AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect. Direct circuits to the cloud provider's edge. Best performance, highest cost.
Network-as-a-Service / NaaS exchanges:
- Megaport - cloud-on-ramp marketplace with Megaport Cloud Router for cross-cloud routing without on-prem.
- Equinix Fabric - virtual circuits between Equinix data centers and major clouds.
- Console Connect (PCCW Global), Packet Fabric, InterCloud - additional NaaS providers.
SD-WAN - Cisco Viptela, VeloCloud (formerly VMware / Broadcom, now Arista), Versa, Aryaka, Silver Peak / Aruba EdgeConnect, Fortinet Secure SD-WAN. Right when you have branch offices in the mix.
Overlay networks - Tailscale, Twingate, Nebula, ZeroTier, Netmaker. Use for app-level connectivity (workload-to-workload across clouds), not for site-to-site at scale.

The performance trap: cross-cloud egress is expensive ($0.05-$0.12/GB on most clouds outbound to the internet, sometimes more to specific destinations), and the path matters. Direct interconnect between AWS and GCP via Megaport in a colo can be substantially cheaper at volume than internet egress; do the math at GB/month scale.
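One way to "do the math" is a break-even calculation between per-GB internet egress and a private interconnect that carries a fixed monthly charge plus a lower per-GB rate. The sketch below is only the arithmetic; the rates used in the example run are made-up round numbers, not quotes from any provider, so substitute your own pricing.

```python
def monthly_cost(gb_per_month: float, per_gb_rate: float,
                 fixed_monthly: float = 0.0) -> float:
    """Total monthly cost: volume charge plus any fixed port/circuit fee."""
    return gb_per_month * per_gb_rate + fixed_monthly

def break_even_gb(internet_rate: float, interconnect_rate: float,
                  interconnect_fixed: float) -> float:
    """GB/month above which the interconnect becomes the cheaper path."""
    saving_per_gb = internet_rate - interconnect_rate
    if saving_per_gb <= 0:
        return float("inf")  # the interconnect never pays for itself
    return interconnect_fixed / saving_per_gb

if __name__ == "__main__":
    # Hypothetical inputs: $0.10/GB internet, $0.02/GB + $800/month interconnect.
    threshold = break_even_gb(0.10, 0.02, 800.0)
    print(f"break-even at roughly {threshold:,.0f} GB/month")
    for volume in (5_000, 50_000):
        print(volume,
              round(monthly_cost(volume, 0.10), 2),
              round(monthly_cost(volume, 0.02, 800.0), 2))
```

Below the threshold the fixed fee dominates and plain internet egress wins; above it the per-GB saving compounds, which is why the comparison only makes sense at realistic monthly volumes.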

Flow logs & network observability

The audit trail of every connection in your cloud is the single highest-value piece of data you control. Get it on, get it stored, get it queryable.

VPC Flow Logs (AWS) - per-ENI, per-subnet, or per-VPC. Fields include source/dest IP, port, protocol, packets, bytes, action (ACCEPT/REJECT). Custom format adds VPC-ID, instance-ID, TCP flags, more. Send to S3 (cheap, queried via Athena) or CloudWatch Logs (more expensive, real-time).
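NSG Flow Logs (Azure) - per-NSG. Version 2 adds bytes/packets per flow. Stored in Storage Accounts; Traffic Analytics builds a dashboard. Microsoft has announced the retirement of NSG flow logs in favor of virtual network (VNet) flow logs, so plan new deployments around VNet flow logs.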
VPC Flow Logs (GCP) - per-subnet, configurable sampling rate. Streams to Cloud Logging; export to BigQuery for analysis.
VPC Traffic Mirroring (AWS), Azure Virtual Network TAP, GCP Packet Mirroring - full packet capture for selected workloads. Use sparingly (expensive at scale); reach for it when you need DPI for IR or compliance.
Third-party packet/flow tools - Arkime (formerly Moloch), ntopng, Corelight (Zeek), ExtraHop Reveal(x), Vectra. NDR (network detection and response) products that consume mirrored or captured traffic for behavioral analytics.
Hubble (Cilium), Pixie - Kubernetes-specific flow observability with eBPF.

Practical defaults: enable VPC Flow Logs on every production VPC at full fidelity, store in cheap object storage with 1-2 year retention, query via Athena/Log Analytics/BigQuery for IR, and stream a sampled subset (or filtered "rejected" flows + high-value source/destination flows) to your SIEM. Cost is meaningful but not prohibitive; the IR value is enormous.
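To make the "filtered subset to the SIEM" idea concrete, here is a small sketch that parses AWS VPC Flow Log records in the default (version 2) space-separated format and keeps only rejected flows plus flows to a watch-list of destinations. The `siem_subset` function and the `watched_destinations` parameter are illustrative names, and the sample records follow the shape shown in AWS documentation; custom log formats would need a different field list.

```python
from typing import Iterable, Iterator, NamedTuple, Optional

# Field order of the default (version 2) VPC Flow Log format.
FIELDS = ("version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status")

class FlowRecord(NamedTuple):
    interface_id: str
    srcaddr: str
    dstaddr: str
    dstport: int
    protocol: int
    byte_count: int
    action: str

def parse_record(line: str) -> Optional[FlowRecord]:
    """Parse one default-format line; return None for malformed or no-data rows."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    row = dict(zip(FIELDS, parts))
    if row["log_status"] != "OK":  # NODATA / SKIPDATA rows carry "-" placeholders
        return None
    return FlowRecord(row["interface_id"], row["srcaddr"], row["dstaddr"],
                      int(row["dstport"]), int(row["protocol"]),
                      int(row["bytes"]), row["action"])

def siem_subset(lines: Iterable[str],
                watched_destinations: frozenset = frozenset()) -> Iterator[FlowRecord]:
    """Yield rejected flows and any flow to a watched destination address."""
    for line in lines:
        rec = parse_record(line)
        if rec is None:
            continue
        if rec.action == "REJECT" or rec.dstaddr in watched_destinations:
            yield rec

SAMPLE = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-1235b8ca123456789 - - - - - - - 1431280876 1431280934 - NODATA",
]

if __name__ == "__main__":
    for record in siem_subset(SAMPLE):
        print(record)
```

The full-fidelity logs still land in object storage; a filter like this only decides what is worth the per-GB ingest cost of the SIEM.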

AWS, Azure, and GCP side-by-side

The native network-security capabilities each cloud ships, reduced to a one-screen reference:

Capability	AWS	Azure	GCP
Network boundary	VPC (regional)	VNet (regional)	VPC (global, with regional subnets)
Stateful host firewall	Security Group	NSG (NIC + subnet) + ASG	VPC firewall rules + Hierarchical Firewall Policies
Stateless subnet ACL	NACL	(NSG at subnet - stateful)	(VPC firewall - stateful)
Managed L7 firewall	AWS Network Firewall, Cloud NGFW	Azure Firewall Premium	Cloud NGFW (Palo Alto), Cloud Armor
Transit hub	Transit Gateway, Cloud WAN	Virtual WAN	Network Connectivity Center
Private cloud-API endpoints	VPC Endpoints (Gateway + Interface PrivateLink)	Private Link / Private Endpoint	Private Service Connect
Egress NAT	NAT Gateway (per AZ)	NAT Gateway (regional), VNG	Cloud NAT (regional)
DNS security	Route 53 Resolver DNS Firewall	DNS Private Resolver + Defender for DNS	Cloud DNS response policies
WAF	AWS WAF (CloudFront / ALB / API GW)	App Gateway WAF, Front Door WAF	Cloud Armor
DDoS	Shield Standard (free), Shield Advanced	DDoS Protection (Network / IP)	Cloud Armor + adaptive protection
ZTNA / app proxy	AWS Verified Access	Entra ID App Proxy	Identity-Aware Proxy (IAP)
Exfil control perimeter	VPC endpoint policies + RCPs (approximate)	Storage firewall + Private Link (approximate)	VPC Service Controls (best in class)
Flow logs	VPC Flow Logs (S3 / CW)	NSG Flow Logs + Traffic Analytics	VPC Flow Logs (Logging / BigQuery)
Packet mirroring	Traffic Mirroring	Virtual Network TAP	Packet Mirroring
Dedicated hybrid link	Direct Connect	ExpressRoute	Cloud Interconnect

The non-obvious takeaways: GCP's VPC Service Controls is the strongest single egress/exfil control any cloud offers natively - if you have a regulated workload and a choice of cloud, this is a real differentiator. Azure's NSG-at-NIC-and-subnet model gives you a layering option AWS doesn't have, but adds confusion to debugging. AWS has the broadest set of network primitives but the least opinionated defaults; you do the assembly.

Maturity stages

A staging model for a cloud network-security program:

Stage 1 - Flat VPC

Default VPC (or one big VPC). Permissive security groups (0.0.0.0/0 all over). No flow logs. No NAT-egress filtering. Public-IP-on-instance is common. Most early-stage cloud environments live here longer than they should.

Stage 2 - Segmented

Custom VPCs with tiered subnets (public/private/data). Security groups follow least-privilege and reference SG IDs, not CIDRs. Flow logs enabled and stored. Private endpoints for the most-used cloud APIs (S3, KMS). NACLs as a backup deny-list. WAF in front of public surfaces.

Stage 3 - Egress-controlled

Hub-and-spoke topology via Transit Gateway / vWAN / NCC. Network Firewall inline on egress with FQDN allow-list. DNS firewall enabled with managed threat-intel rule groups. Most cloud-API traffic on PrivateLink with endpoint policies. CSPM continuously alerts on policy drift.
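The core of an FQDN allow-list is a matching rule that cannot be fooled by look-alike names. The sketch below shows one plausible rule, assuming a convention where an entry with a leading dot (such as `.example.com`) covers the apex domain and all subdomains; the function names and entries are illustrative, and managed firewalls implement their own matching semantics.

```python
def normalize(host: str) -> str:
    """Lower-case and drop a trailing root dot so comparisons are consistent."""
    return host.strip().rstrip(".").lower()

def is_allowed(host: str, allow_list) -> bool:
    """Default-deny check of a hostname against exact and dot-prefixed entries."""
    host = normalize(host)
    for entry in allow_list:
        entry = normalize(entry)
        if entry.startswith("."):
            # ".example.com" matches "example.com" and "a.b.example.com",
            # but not "notexample.com" or "example.com.evil.net".
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False

ALLOW = [".amazonaws.com", "api.github.com"]

if __name__ == "__main__":
    for name in ("s3.us-east-1.amazonaws.com", "amazonaws.com.evil.net"):
        print(name, is_allowed(name, ALLOW))
```

Matching on the dot boundary rather than a bare suffix is the design choice that matters: a plain `endswith("amazonaws.com")` would let `notamazonaws.com` through.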

Stage 4 - Microsegmented

Kubernetes NetworkPolicy default-deny + Cilium / Calico per-namespace policy. Service mesh mTLS on east-west. Host-based microsegmentation (Illumio / Guardicore) for hybrid workloads. ZTNA replaces VPN. SASE/SSE in front of user traffic. Flow logs feed an NDR platform with behavioral analytics.

Stage 5 - Continuous-verified

Identity-aware policy at every hop. VPC Service Controls (GCP) or equivalent network-level exfil boundary. Policy-as-code in CI prevents misconfigurations from ever shipping. Network behavior baselines and anomaly detection auto-isolate compromised workloads. Egress policy is per-workload, not per-VPC.
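A policy-as-code gate can be as small as a function that inspects planned security-group rules and fails the build on open ingress. The sketch below assumes a simplified, hypothetical input shape (a list of dicts with `id` and `ingress` keys), not the output of any specific IaC tool; a real pipeline would first translate a plan file into something like it.

```python
import sys

OPEN_CIDRS = {"0.0.0.0/0", "::/0"}

def find_violations(security_groups, public_lb_groups=frozenset()):
    """Return (group_id, port) pairs for world-open ingress outside public LBs."""
    violations = []
    for sg in security_groups:
        if sg["id"] in public_lb_groups:
            continue  # explicitly exempted: public load balancer groups
        for rule in sg.get("ingress", []):
            if OPEN_CIDRS & set(rule.get("cidrs", [])):
                violations.append((sg["id"], rule.get("port")))
    return violations

# Hypothetical planned state used for illustration only.
PLANNED = [
    {"id": "sg-public-alb", "ingress": [{"port": 443, "cidrs": ["0.0.0.0/0"]}]},
    {"id": "sg-app", "ingress": [{"port": 22, "cidrs": ["0.0.0.0/0"]}]},
    {"id": "sg-db", "ingress": [{"port": 5432, "cidrs": ["10.0.1.0/24"]}]},
]

if __name__ == "__main__":
    found = find_violations(PLANNED, public_lb_groups=frozenset({"sg-public-alb"}))
    for group_id, port in found:
        print(f"open ingress on {group_id} port {port}")
    sys.exit(1 if found else 0)  # a non-zero exit fails the CI job
```

Keeping the exemption list explicit makes every world-open rule a reviewed decision rather than an accident.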

Most production cloud environments live at Stage 2 with aspirations toward Stage 3. Stage 4 takes intentional investment; Stage 5 is rare and usually limited to regulated workloads where it earns its keep.

Common pitfalls

NAT-only egress with no filter. The default. Workload compromised → attacker has unrestricted outbound. Network Firewall with FQDN allow-list is the single biggest improvement you can ship in a sprint.
Default VPC in production. Permissive defaults, present-in-every-region, often unattributed in IPAM. Disable creation org-wide; if one exists in a prod account, it's almost certainly accidental.
Security groups with 0.0.0.0/0 on anything but a public load balancer. The bread-and-butter finding of every CSPM. Default-deny, allow only what's needed, prefer SG-ID references over CIDRs.
VPC peering everything to everything. Becomes O(n²) connections, no transit policy, no central inspection. Hub-and-spoke via TGW / vWAN / NCC by the third VPC.
No flow logs. When IR happens you can't answer "did this host ever talk to this IP?" Turn them on in every prod VPC; cost is small relative to value.
Public RDS / Cosmos DB / Cloud SQL. Some managed-database configurations default to a public endpoint. Always: private-subnet, security-group-scoped to the app tier, no public IP, encryption in transit and at rest.
Bastion on a public IP. "We need SSH access to the VPC" → bastion on public subnet with port 22 open → bastion-credential theft → full VPC pivot. ZTNA (IAP, Cloudflare Access, Tailscale, AWS Verified Access) is the modern answer.
One WAF in front of another. Cloudflare → AWS WAF → ModSecurity in three teams' rule libraries with three sets of false positives. Pick one policy plane.
TLS inspection turned on for everything. Breaks certificate-pinning apps, mobile SDKs, and anything that does its own validation. Inspect by exception, not by default.
Forgetting Kubernetes networking. Cluster security groups are not enough - pods inside a cluster need NetworkPolicy (default-deny + per-namespace allow). Without it, one compromised pod can talk to every other pod in the cluster.
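The O(n²) growth in the VPC-peering pitfall above is easy to quantify: a full mesh of n VPCs needs n(n-1)/2 peering connections, while a hub needs one attachment per VPC. A short sketch of that arithmetic, with illustrative function names:

```python
def full_mesh_peerings(vpc_count: int) -> int:
    """Peering connections needed so every VPC reaches every other directly."""
    return vpc_count * (vpc_count - 1) // 2

def hub_attachments(vpc_count: int) -> int:
    """Attachments needed when every VPC connects once to a central hub."""
    return vpc_count

if __name__ == "__main__":
    for n in (3, 10, 50):
        print(n, full_mesh_peerings(n), hub_attachments(n))
```

At three VPCs the two approaches cost the same number of connections, which is why the hub is worth adopting at that point, before the mesh starts to outgrow it.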

FAQ

What makes cloud network security different from on-prem network security?

Three things. First, the perimeter is gone - every workload has a public-by-default API surface unless you actively block it, so identity and policy become the perimeter. Second, the network is software-defined - a CIDR, a route table, a security group, and a firewall rule are all API calls, which means they change as fast as code deploys and they leave audit logs. Third, egress is the new ingress - the most common modern attack pattern is an exploited workload calling out to an attacker-controlled domain, so outbound filtering matters as much as inbound. Cloud network security is less about firewalls between zones and more about identity-aware policy at every hop, private connectivity to cloud APIs, and continuous flow-log analysis.

Do I need private endpoints (PrivateLink, Private Service Connect) or are NAT Gateway and TLS enough?

TLS protects data in flight; it does not stop a compromised workload from exfiltrating to a third-party S3 bucket, a public DNS-over-HTTPS resolver, or an attacker-owned API. Private endpoints (VPC Endpoints / AWS PrivateLink, Azure Private Link, GCP Private Service Connect) let you reach cloud-provider APIs and selected SaaS services without traversing the internet, and - critically - they let you write VPC endpoint policies that scope which accounts, principals, and resources can be reached. Combined with a NAT Gateway that has Network Firewall in front of it, you get a deny-by-default egress posture. Skip the endpoints and you are relying on IAM alone to stop data movement, which has a worse track record than a defense-in-depth design.
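As a hedged illustration of an endpoint policy that scopes "which accounts, principals, and resources can be reached," the sketch below builds an S3 VPC endpoint policy that only allows requests where both the caller and the bucket belong to one AWS Organization. It uses the `aws:PrincipalOrgID` and `aws:ResourceOrgID` condition keys; the organization ID is a placeholder and the blanket `s3:*` action is a simplification you would likely narrow.

```python
import json

def s3_endpoint_policy(org_id: str) -> dict:
    """Allow S3 access through the endpoint only within a single organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOnlyOrgPrincipalsToOrgResources",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        # Caller must belong to the organization...
                        "aws:PrincipalOrgID": org_id,
                        # ...and so must the bucket being accessed.
                        "aws:ResourceOrgID": org_id,
                    }
                },
            }
        ],
    }

if __name__ == "__main__":
    # "o-exampleorgid" is a placeholder, not a real organization ID.
    print(json.dumps(s3_endpoint_policy("o-exampleorgid"), indent=2))
```

With this in place, a compromised workload holding valid credentials still cannot copy data to a bucket in an outside account through the endpoint. Buckets owned by AWS or third parties that workloads legitimately read (package mirrors, for example) fall outside the organization and would need explicit exceptions.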

Security groups, NACLs, or a network firewall - when do I use which?

Use all three, in layers. Security groups (AWS) and NSGs (Azure) are stateful, applied at the instance/ENI level, and the right tool for default microsegmentation between workloads - they evaluate fast and scale to thousands of rules. NACLs are stateless, applied at the subnet boundary, and only useful for broad deny-listing (block known-bad CIDRs, hard-deny SSH from the internet at the subnet edge). A network firewall (AWS Network Firewall, Azure Firewall, Palo Alto Cloud NGFW, Fortinet, Check Point CloudGuard) sits inline on egress and east-west traffic to do FQDN filtering, deep packet inspection, TLS inspection, and IDS/IPS - none of which security groups can do. The cost gradient runs the same direction as the capability gradient; spend on firewalls only where you need their L7 features.
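The stateful-versus-stateless distinction is easiest to see with return traffic. The toy model below is a simplification written for illustration (the class names and packet shape are invented, and it ignores protocols, timeouts, and rule ordering): the stateful filter admits a reply because it remembers the outbound request, while the stateless filter needs an explicit inbound ephemeral-port range that also admits unsolicited packets.

```python
from typing import NamedTuple

class Packet(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int

class StatefulFilter:
    """Security-group-like: replies to allowed outbound flows are admitted."""
    def __init__(self, allowed_outbound_ports):
        self.allowed = set(allowed_outbound_ports)
        self.tracked = set()

    def outbound(self, p: Packet) -> bool:
        if p.dport in self.allowed:
            # Remember the flow as the reply will present it.
            self.tracked.add((p.dst, p.dport, p.src, p.sport))
            return True
        return False

    def inbound(self, p: Packet) -> bool:
        return (p.src, p.sport, p.dst, p.dport) in self.tracked

class StatelessFilter:
    """NACL-like: each direction is judged on its own rules, with no memory."""
    def __init__(self, outbound_ports, inbound_port_range):
        self.outbound_ports = set(outbound_ports)
        self.inbound_port_range = inbound_port_range

    def outbound(self, p: Packet) -> bool:
        return p.dport in self.outbound_ports

    def inbound(self, p: Packet) -> bool:
        low, high = self.inbound_port_range
        return low <= p.dport <= high

REQUEST = Packet("10.0.1.5", "198.51.100.7", 40000, 443)
REPLY = Packet("198.51.100.7", "10.0.1.5", 443, 40000)
UNSOLICITED = Packet("203.0.113.9", "10.0.1.5", 443, 40001)

if __name__ == "__main__":
    sg = StatefulFilter({443})
    sg.outbound(REQUEST)
    print(sg.inbound(REPLY), sg.inbound(UNSOLICITED))
    nacl = StatelessFilter({443}, (1024, 65535))
    print(nacl.inbound(REPLY), nacl.inbound(UNSOLICITED))
```

This is why stateless ACLs are described above as useful mainly for broad deny-listing: making them permissive enough for normal return traffic leaves little precision for anything finer.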

Is SASE / SSE worth it for a cloud-native company with no offices?

Maybe. SASE was built for the hub-and-spoke MPLS world where users sat in branches and applications sat in a datacenter; for a fully cloud-native, remote-first company, ZTNA (Cloudflare Access, Zscaler ZPA, Tailscale, Twingate, Google IAP, Azure App Proxy) plus an SWG (Zscaler ZIA, Netskope, Cloudflare Gateway) often gives you what you actually need without the full SASE bundle. Where SASE pays off: hybrid workforce with offices, lots of unmanaged SaaS that needs CASB, regulated workloads that need data-loss prevention on egress, and any environment where you need a single policy plane across user-to-app and app-to-internet. If your security model is already "identity is the perimeter, every app is behind an identity-aware proxy, and workloads egress through Network Firewall with FQDN allow-lists," you may have built most of SASE without buying it.

What is the difference between microsegmentation and just using security groups?

Security groups give you network-level segmentation between workloads in a VPC, scoped by IP, port, and (in AWS) other security-group IDs. That is a form of microsegmentation, and for many environments it is enough. True microsegmentation tools (Illumio, Guardicore/Akamai, ColorTokens, vArmour, Cisco Secure Workload) add three things: identity-aware policy (the workload's label, role, or process - not its IP), east-west visibility across hybrid environments (on-prem + multi-cloud + Kubernetes in one map), and policy at the host level (so a host-based agent enforces even if the cloud security group is misconfigured). For a single-cloud, Kubernetes-heavy environment, Cilium NetworkPolicy + cloud security groups often replaces the commercial microsegmentation tool. For a hybrid bank or hospital, the commercial tool earns its keep.

How important are flow logs?

More important than they look. VPC Flow Logs (AWS, GCP), NSG Flow Logs (Azure), and the equivalents are the audit trail of every connection - source, destination, port, bytes, action. They answer "did this workload ever talk to this CIDR?" which is the question you need answered in every incident, and they are the input to behavioral analytics (GuardDuty, Defender for Cloud, Chronicle, Wiz, Vectra, ExtraHop) that detect lateral movement, C2 traffic, and data exfiltration patterns. Costs scale with volume; the practical pattern is enable flow logs at full fidelity in all production VPCs, store in cheap storage (S3 / Blob / GCS) with a 1-2 year retention, and stream a sample (or all of it) to your detection platform. Skip flow logs and your detection coverage collapses to whatever the agents on the hosts can see.
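The incident-response question "did this workload ever talk to this CIDR?" reduces to a membership test over flow records. A minimal sketch using Python's standard `ipaddress` module follows; the `(src, dst)` tuple shape and the sample addresses (documentation ranges) are assumptions for illustration, and at real volumes this query would run in Athena, Log Analytics, or BigQuery instead.

```python
import ipaddress

def talked_to(flows, workload_ip: str, cidr: str):
    """Return flows where workload_ip exchanged traffic with any address in cidr."""
    network = ipaddress.ip_network(cidr)
    hits = []
    for src, dst in flows:
        if src == workload_ip and ipaddress.ip_address(dst) in network:
            hits.append((src, dst))
        elif dst == workload_ip and ipaddress.ip_address(src) in network:
            hits.append((src, dst))
    return hits

# Hypothetical (source, destination) pairs extracted from flow logs.
FLOWS = [
    ("10.0.1.5", "203.0.113.44"),
    ("10.0.1.5", "198.51.100.7"),
    ("203.0.113.90", "10.0.1.5"),
    ("10.0.2.8", "203.0.113.44"),
]

if __name__ == "__main__":
    for hit in talked_to(FLOWS, "10.0.1.5", "203.0.113.0/24"):
        print(hit)
```

Checking both directions matters: an inbound connection from the suspicious range is as relevant to the investigation as an outbound one.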

Should I use a cloud-native WAF or a third-party (Cloudflare, Akamai, Imperva)?

Depends on what sits in front of your app. If everything is already behind Cloudflare or Akamai (because you needed their CDN, DDoS, and bot management), use their WAF - it is the cheapest, most-mature option and you avoid double-decrypting TLS. If you are AWS-native and your apps front directly via ALB or CloudFront, AWS WAF with managed rule groups (AWSManagedRulesCommonRuleSet, AWSManagedRulesKnownBadInputsRuleSet, plus a bot-control rule group) is good enough for most workloads and cheaper than a third-party. The pattern to avoid: stacking three WAFs (Cloudflare → AWS WAF → app-level ModSecurity) and paying three rule-tuning teams. Pick one as the policy plane; let the others pass through.
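For the AWS-native case, the rule list of a WAFv2 web ACL that references managed rule groups looks roughly like the structure below. This is a sketch of the rule entries only (not a complete web ACL definition), the rule names and metric names are arbitrary choices, and `AWSManagedRulesBotControlRuleSet` stands in for "a bot-control rule group", which carries additional cost.

```python
import json

def managed_rule(group_name: str, priority: int) -> dict:
    """Build one web ACL rule entry that references an AWS managed rule group."""
    return {
        "Name": f"AWS-{group_name}",
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {
                "VendorName": "AWS",
                "Name": group_name,
            }
        },
        # "None" keeps the actions defined inside the rule group;
        # switching to {"Count": {}} is the usual way to trial a group first.
        "OverrideAction": {"None": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": group_name,
        },
    }

GROUPS = [
    "AWSManagedRulesCommonRuleSet",
    "AWSManagedRulesKnownBadInputsRuleSet",
    "AWSManagedRulesBotControlRuleSet",
]

RULES = [managed_rule(name, priority) for priority, name in enumerate(GROUPS)]

if __name__ == "__main__":
    print(json.dumps(RULES, indent=2))
```

Priorities must be unique within a web ACL and lower numbers are evaluated first, so the enumeration order here is also the evaluation order.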

Where next

Zero Trust - the architecture pattern these network controls implement.
Landing Zones - the multi-account foundation that puts VPCs and TGW in the right shape from day one.
IAM & Identity - identity-aware policy is the other half of modern network security.
Kubernetes - service mesh, NetworkPolicy, Cilium, and eBPF networking in detail.
Cloud SOC - where flow logs and NDR signals get worked.
Friday Zoom - VPC design, egress, and SASE come up regularly. Drop in.