Multicluster zone and region failover
Configure zone and region-aware failover for global services in a multicluster ambient mesh.
About this guide
In a multicluster ambient mesh, locality (region, zone, network) drives load balancing and failover decisions. This guide shows you how to configure zone and region-aware failover for global services, covering both L4 failover with ztunnel and L7 failover with waypoints using explicit failover priority.
For conceptual information about multicluster load balancing and failover, see the load balancing and failover overview.
Before you begin
To demonstrate zone and region-aware failover, this guide requires the following cluster topology:
- Two clusters deployed in different regions (for example, us-east-1 and us-west-2)
- Multiple nodes per cluster in different zones within each region (for example, us-east-1a and us-east-1b in cluster 1)
This topology allows you to test both cross-zone failover within a region and cross-region failover between clusters.
Ensure that the following locality labels are set on nodes in each cluster. For guidance on setting locality labels, see the Kubernetes topology and Istio locality documentation.
- topology.kubernetes.io/region
- topology.kubernetes.io/zone
Save the kubeconfig contexts of each cluster where you installed the multicluster mesh as environment variables.
export context1=<cluster1_context>
export context2=<cluster2_context>
Set up a multicluster ambient mesh with the Gloo Operator or Helm.
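Managed Kubernetes offerings usually populate the region and zone labels automatically, but local or self-managed test clusters often do not. The following is a minimal sketch for setting the labels by hand. The helper name `label_node_locality` and the node, region, and zone values in the usage line are illustrative assumptions, not part of the mesh tooling.

```shell
# Hypothetical helper: set the region and zone labels on a single node.
# Usage: label_node_locality <kube-context> <node-name> <region> <zone>
label_node_locality() {
  kubectl --context="$1" label node "$2" \
    "topology.kubernetes.io/region=$3" \
    "topology.kubernetes.io/zone=$4" \
    --overwrite
}
```

For example, `label_node_locality "${context1}" node-1 us-east-1 us-east-1a` would label a node named node-1 in cluster 1. Repeat the call for each node that is missing labels.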
Step 1: Verify locality labels
Verify that locality labels are set on nodes and that endpoints inherit locality correctly.
Check node labels in each cluster.
for ctx in ${context1} ${context2}; do
  echo "=== Cluster: $ctx ==="
  kubectl --context=$ctx get nodes -o custom-columns='NAME:.metadata.name,REGION:.metadata.labels.topology\.kubernetes\.io/region,ZONE:.metadata.labels.topology\.kubernetes\.io/zone'
done
Example output:
=== Cluster: cluster1 ===
NAME     REGION      ZONE
node-1   us-east-1   us-east-1a
node-2   us-east-1   us-east-1b
=== Cluster: cluster2 ===
NAME     REGION      ZONE
node-1   us-west-2   us-west-2a
node-2   us-west-2   us-west-2b
Review the remote peering gateways to verify that they have locality labels. In an east-west gateway setup, the mesh uses these gateway labels to determine the locality of all remote endpoints exposed through that gateway.
kubectl --context=${context1} get deploy -n istio-system istio-eastwestgateway -o yaml | grep -A3 "topology.istio.io"
kubectl --context=${context2} get deploy -n istio-system istio-eastwestgateway -o yaml | grep -A3 "topology.istio.io"
Example output showing locality labels on the gateway in cluster 1:
topology.istio.io/network: cluster1-network
topology.kubernetes.io/region: us-east-1
topology.kubernetes.io/zone: us-east-1a
Example output showing locality labels on the gateway in cluster 2:
topology.istio.io/network: cluster2-network
topology.kubernetes.io/region: us-west-2
topology.kubernetes.io/zone: us-west-2a
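To catch unlabeled nodes before you continue, you could filter the same custom-columns output for missing values. This sketch assumes that kubectl prints `<none>` in a column when the label is absent. The function name `nodes_missing_locality` is hypothetical.

```shell
# Hypothetical helper: print the names of nodes that lack a region or zone label.
# Usage: nodes_missing_locality <kube-context>
nodes_missing_locality() {
  kubectl --context="$1" get nodes --no-headers \
    -o custom-columns='NAME:.metadata.name,REGION:.metadata.labels.topology\.kubernetes\.io/region,ZONE:.metadata.labels.topology\.kubernetes\.io/zone' |
    # Column 2 is the region, column 3 is the zone.
    awk '$2 == "<none>" || $3 == "<none>" { print $1 }'
}
```

An empty result from `nodes_missing_locality "${context1}"` suggests that every node in cluster 1 carries both labels.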
Step 2: Deploy global services across clusters
Deploy the httpbin sample app as a global service in both clusters to test multicluster load balancing and failover.
Multicluster topology: The following diagram shows the multicluster setup with global services. Each cluster has local endpoints, and the global service hostname (in-ambient.httpbin.mesh.internal) is accessible from both clusters. Locality labels (region, zone) drive load balancing and failover decisions.
graph LR
subgraph Cluster2["Cluster 2 (us-west-2)"]
direction TB
Client2[client-in-ambient]
Ztunnel2[Client ztunnel]
Backend2["in-ambient pod<br/>(us-west-2a)"]
Client2 -->|"Global hostname<br/>(in-ambient.httpbin.<br/>mesh.internal)"| Ztunnel2
Ztunnel2 --> Backend2
end
subgraph Cluster1["Cluster 1 (us-east-1)"]
direction TB
Client1[client-in-ambient]
Ztunnel1[Client ztunnel]
Backend1["in-ambient pod<br/>(us-east-1a)"]
Client1 -->|"Global hostname<br/>(in-ambient.httpbin.<br/>mesh.internal)"| Ztunnel1
Ztunnel1 --> Backend1
end
Ztunnel2 -.->|Failover when<br/>local unavailable| Backend1
Ztunnel1 -.->|Failover when<br/>local unavailable| Backend2
Deploy the in-ambient httpbin sample app in both clusters. This manifest creates the httpbin namespace with an in-ambient backend service.
for ctx in ${context1} ${context2}; do
  kubectl --context=$ctx apply -f https://raw.githubusercontent.com/solo-io/doc-examples/main/istio/sample-apps/in-ambient.yaml
done
Deploy the client-in-ambient client app in both clusters.
for ctx in ${context1} ${context2}; do
  kubectl --context=$ctx apply -f https://raw.githubusercontent.com/solo-io/doc-examples/main/istio/sample-apps/client-in-ambient.yaml
done
Verify that the pods are running in both clusters.
for ctx in ${context1} ${context2}; do
  echo "=== Cluster: $ctx ==="
  kubectl --context=$ctx get pods -n httpbin
done
Example output:
=== Cluster: cluster1 ===
NAME                                 READY   STATUS    RESTARTS   AGE
client-in-ambient-6b5c96c4f8-x2j9k   1/1     Running   0          30s
in-ambient-7d8f9b6c54-abc12          1/1     Running   0          45s
=== Cluster: cluster2 ===
NAME                                 READY   STATUS    RESTARTS   AGE
client-in-ambient-6b5c96c4f8-y3k0l   1/1     Running   0          30s
in-ambient-7d8f9b6c54-def34          1/1     Running   0          45s
Label the httpbin namespace to add the apps to the ambient mesh.
for ctx in ${context1} ${context2}; do
  kubectl --context=$ctx label ns httpbin istio.io/dataplane-mode=ambient
done
Label the in-ambient service with solo.io/service-scope=global to expose it as a global service across clusters.
for ctx in ${context1} ${context2}; do
  kubectl --context=$ctx label service in-ambient -n httpbin solo.io/service-scope=global
done
Verify that the global ServiceEntry with a hostname in the format in-ambient.httpbin.mesh.internal is created in the istio-system namespace for the labeled services. This default mesh.internal hostname makes the endpoint for your service available across the multicluster mesh.
for ctx in ${context1} ${context2}; do
  echo "=== Cluster: $ctx ==="
  kubectl --context=$ctx get serviceentry -n istio-system | grep in-ambient
done
Example output:
=== Cluster: cluster1 ===
autogen.httpbin.in-ambient   ["in-ambient.httpbin.mesh.internal"]   STATIC   30s
=== Cluster: cluster2 ===
autogen.httpbin.in-ambient   ["in-ambient.httpbin.mesh.internal"]   STATIC   30s
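If the ServiceEntry does not appear, one thing to rule out is a missing or mistyped label from the steps above. The following sketch reads both labels back from the cluster. The function name `show_mesh_labels` is hypothetical; the namespace, service, and label keys are the ones used in this step.

```shell
# Hypothetical helper: print the mesh-related labels set in this step.
# Usage: show_mesh_labels <kube-context>
show_mesh_labels() {
  # Dots inside the label keys are escaped so that jsonpath treats each key as one field.
  printf 'dataplane-mode: %s\n' "$(kubectl --context="$1" get namespace httpbin \
    -o jsonpath='{.metadata.labels.istio\.io/dataplane-mode}')"
  printf 'service-scope: %s\n' "$(kubectl --context="$1" get service in-ambient -n httpbin \
    -o jsonpath='{.metadata.labels.solo\.io/service-scope}')"
}
```

After the labels are applied, running `show_mesh_labels "${context1}"` should report the values ambient and global. An empty value indicates that the corresponding label is not set.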
Step 3: Test default multicluster L4 failover with ztunnel
Test the default load balancing behavior, which uses PreferNetwork mode to prefer local endpoints.
Default failover behavior (PreferNetwork mode): The default PreferNetwork mode prioritizes endpoints in the same network, routing all traffic to local endpoints when they are healthy. The following diagram shows failover behavior when local endpoints in cluster 1 become unavailable: ztunnel automatically fails over to endpoints in cluster 2, even though they are in a different region and network.
graph LR
subgraph Cluster1["Cluster 1 (us-east-1)"]
Client1[client-in-ambient]
Ztunnel1[Client ztunnel<br/>PreferNetwork mode]
Backend1["in-ambient pod<br/>(us-east-1)<br/>✗ Unavailable"]
end
subgraph Cluster2["Cluster 2 (us-west-2)"]
Backend2["in-ambient pod<br/>(us-west-2)<br/>✓ Healthy"]
end
Client1 -->|Request to global hostname| Ztunnel1
Ztunnel1 -.->|Local unavailable| Backend1
Ztunnel1 -->|Failover to remote| Backend2
linkStyle 0 stroke:#2068F3,stroke-width:2px
linkStyle 1 stroke:#999,stroke-width:2px
linkStyle 2 stroke:#2068F3,stroke-width:2px
Send requests from the client in cluster 1 to the global service. With the default PreferNetwork mode, traffic prefers endpoints in the same cluster network before routing to remote cluster networks.
kubectl --context=${context1} exec -n httpbin deploy/client-in-ambient -- sh -c "
for i in \$(seq 1 10); do
  curl -s in-ambient.httpbin.mesh.internal:8000/hostname
done"
Example output, in which the /hostname endpoint returns the pod hostname, showing which cluster handled each request. With PreferNetwork traffic distribution, responses come primarily from the local cluster.
in-ambient-7d8f9b6c54-abc12
in-ambient-7d8f9b6c54-abc12
in-ambient-7d8f9b6c54-abc12
in-ambient-7d8f9b6c54-abc12
...
Simulate a failure by scaling the in-ambient deployment in cluster 1 down to zero replicas.
kubectl --context=${context1} scale deployment in-ambient -n httpbin --replicas=0
Send requests again from the client in cluster 1 to the global service, and verify that traffic now fails over to cluster 2.
kubectl --context=${context1} exec -n httpbin deploy/client-in-ambient -- sh -c "
for i in \$(seq 1 5); do
  curl -s in-ambient.httpbin.mesh.internal:8000/hostname
done"
Example output, in which responses now come from the cluster 2 pod after failover:
in-ambient-8e9f0c7d65-xyz98
in-ambient-8e9f0c7d65-xyz98
in-ambient-8e9f0c7d65-xyz98
in-ambient-8e9f0c7d65-xyz98
in-ambient-8e9f0c7d65-xyz98
Scale the in-ambient deployment back up in cluster 1.
kubectl --context=${context1} scale deployment in-ambient -n httpbin --replicas=1
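After a scale-down, the pods take a moment to terminate, so requests sent immediately afterward can still reach the local backend. To make the failover check above more deterministic, you could wait for the pods to be removed before you send traffic. This is a sketch that assumes the pods carry the app=in-ambient label; the function name `wait_for_backend_gone` and the 60-second timeout are arbitrary choices.

```shell
# Hypothetical helper: block until the in-ambient pods in one cluster are deleted.
# Usage: wait_for_backend_gone <kube-context>
wait_for_backend_gone() {
  kubectl --context="$1" wait --for=delete pod \
    -l app=in-ambient -n httpbin --timeout=60s
}
```

Call `wait_for_backend_gone "${context1}"` directly after the scale-down command and before the curl loop.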
Step 4: Configure zone-aware traffic distribution
Configure traffic distribution to prefer endpoints in the same zone, then same region. To demonstrate zone-aware failover, you need multiple replicas spread across different zones within cluster 1.
Zone-aware traffic distribution (PreferClose mode): The following diagram illustrates how the PreferClose mode prioritizes endpoints. Traffic prefers endpoints in the same zone first, then the same region, and only fails over to other regions when no closer endpoints are available.
graph TB
subgraph Cluster1["Cluster 1 (us-east-1)"]
Client1[client-in-ambient<br/>us-east-1a]
Ztunnel1[Client ztunnel<br/>PreferClose mode]
Backend1a["in-ambient pod<br/>(us-east-1a)<br/>Priority 1: Same zone"]
Backend1b["in-ambient pod<br/>(us-east-1b)<br/>Priority 2: Same region"]
Client1 -->|Request to global hostname| Ztunnel1
Ztunnel1 -->|Prefer| Backend1a
Ztunnel1 -.->|Fallback| Backend1b
end
subgraph Cluster2["Cluster 2 (us-west-2)"]
Backend2["in-ambient pod<br/>(us-west-2a)<br/>Priority 3: Different region"]
end
Ztunnel1 -.->|Last resort| Backend2
style Backend1a fill:#2068F3,color:#fff
Scale the in-ambient deployment in cluster 1 to 2 replicas so that the pods can be scheduled in different zones (us-east-1a and us-east-1b).
kubectl --context=${context1} scale deployment in-ambient -n httpbin --replicas=2
Verify that the pods are running in different zones.
kubectl --context=${context1} get pods -n httpbin -l app=in-ambient -o wide
Example output showing pods in different zones:
NAME                          READY   STATUS    RESTARTS   AGE   IP         NODE
in-ambient-7d8f9b6c54-abc12   1/1     Running   0          30s   10.0.1.5   node-us-east-1a
in-ambient-7d8f9b6c54-def34   1/1     Running   0          30s   10.0.2.8   node-us-east-1b
Annotate the in-ambient service with the networking.istio.io/traffic-distribution=PreferClose annotation. The PreferClose mode prioritizes endpoints in the same zone first, then the same region, and only fails over to other regions when no closer endpoints are available.
for ctx in ${context1} ${context2}; do
  kubectl --context=$ctx annotate service in-ambient -n httpbin \
    networking.istio.io/traffic-distribution=PreferClose --overwrite
done
Send requests and verify that traffic prefers endpoints in the same zone as the client.
kubectl --context=${context1} exec -n httpbin deploy/client-in-ambient -- sh -c "
for i in \$(seq 1 10); do
  curl -s in-ambient.httpbin.mesh.internal:8000/hostname
done" | sort | uniq -c
Example output showing all traffic going to the pod in the same zone (us-east-1a) as the client:
10 in-ambient-7d8f9b6c54-abc12
Delete the pod in the same zone as the client to simulate a zone-level failure. The following command selects the first pod that the API server returns. Confirm against the previous output that this is the pod in the client's zone, and substitute the pod name if it is not. Because the deployment recreates the deleted pod, send the next requests promptly.
POD_SAME_ZONE=$(kubectl --context=${context1} get pod -n httpbin -l app=in-ambient -o jsonpath='{.items[0].metadata.name}')
kubectl --context=${context1} delete pod -n httpbin "$POD_SAME_ZONE"
Send requests and verify that traffic fails over to endpoints in the same region but different zone (us-east-1b).
kubectl --context=${context1} exec -n httpbin deploy/client-in-ambient -- sh -c "
for i in \$(seq 1 10); do
  curl -s in-ambient.httpbin.mesh.internal:8000/hostname
done" | sort | uniq -c
Example output showing traffic now going to the pod in us-east-1b:
10 in-ambient-7d8f9b6c54-def34
Scale down the in-ambient deployment in cluster 1 to simulate all endpoints in the region becoming unavailable.
kubectl --context=${context1} scale deployment in-ambient -n httpbin --replicas=0
Send requests and verify that traffic fails over to endpoints in cluster 2 (different region).
kubectl --context=${context1} exec -n httpbin deploy/client-in-ambient -- sh -c "
for i in \$(seq 1 10); do
  curl -s in-ambient.httpbin.mesh.internal:8000/hostname
done" | sort | uniq -c
Example output showing traffic now going to the pod in cluster 2 (us-west-2):
10 in-ambient-8e9f0c7d65-ghi78
Scale the deployment back to 2 replicas to restore the endpoints.
kubectl --context=${context1} scale deployment in-ambient -n httpbin --replicas=2
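The zone tests in this step depend on knowing which pod runs in which zone, and the pod listing shows only node names. A minimal sketch for mapping each in-ambient pod to its node and zone follows. The function name `pod_zones` is hypothetical, and the sketch assumes that the pods carry the app=in-ambient label and that the nodes carry the zone label from Step 1.

```shell
# Hypothetical helper: print "<pod> <node> <zone>" for every in-ambient pod.
# Usage: pod_zones <kube-context>
pod_zones() {
  ctx="$1"
  # First list each pod together with the node it is scheduled on.
  kubectl --context="$ctx" get pods -n httpbin -l app=in-ambient \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{"\n"}{end}' |
  while read -r pod node; do
    # Then look up the zone label of that node.
    zone=$(kubectl --context="$ctx" get node "$node" \
      -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}' </dev/null)
    echo "$pod $node $zone"
  done
}
```

Running `pod_zones "${context1}"` before you delete a pod lets you pick the pod whose zone matches the client's zone instead of relying on list order.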
Step 5: Add a waypoint for L7 multicluster failover
Create waypoint proxies for L7 policy enforcement and HTTP-aware failover.
Traffic flow with waypoints in multicluster: The following diagram shows how traffic flows when waypoints are deployed in each cluster. The client’s ztunnel routes to the local waypoint, which then performs L7 load balancing to backend endpoints across clusters. The PreferClose setting configured on the service in the previous step continues to apply, but is now enforced at the waypoint instead of the ztunnel.
graph TB
subgraph Cluster1["Cluster 1 (us-east-1)"]
Client1[client-in-ambient]
Ztunnel1[Client ztunnel]
Waypoint1["Waypoint proxy (L7)"]
Backend1["in-ambient pod<br/>(us-east-1a)"]
Client1 -->|Request to global hostname| Ztunnel1
Ztunnel1 -->|HBONE| Waypoint1
Waypoint1 --> Backend1
end
subgraph Cluster2["Cluster 2 (us-west-2)"]
Waypoint2["Waypoint proxy (L7)"]
Backend2["in-ambient pod<br/>(us-west-2a)"]
end
Waypoint1 -.->|L7 failover| Backend2
style Waypoint1 fill:#2068F3,color:#fff
style Waypoint2 fill:#2068F3,color:#fff
Create a waypoint Gateway in both clusters.
for ctx in ${context1} ${context2}; do
kubectl --context=$ctx apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: httpbin-waypoint
  namespace: httpbin
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE
    allowedRoutes:
      namespaces:
        from: Same
EOF
done
Label the httpbin namespaces to use the waypoints.
for ctx in ${context1} ${context2}; do
  kubectl --context=$ctx label namespace httpbin istio.io/use-waypoint=httpbin-waypoint
done
Wait for the waypoints to be deployed.
for ctx in ${context1} ${context2}; do
  kubectl --context=$ctx -n httpbin rollout status deployment/httpbin-waypoint
done
Verify that traffic now flows through the waypoint by sending a request from the client in cluster 1.
kubectl --context=${context1} exec -n httpbin deploy/client-in-ambient -- \
  curl -s in-ambient.httpbin.mesh.internal:8000/hostname
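In addition to the rollout status, you might check the Gateway resource itself. The sketch below assumes that the waypoint controller reports the standard Gateway API Programmed condition on the resource; verify this against your installed version. The function name `wait_for_waypoint` and the timeout value are hypothetical.

```shell
# Hypothetical helper: wait until the waypoint Gateway reports Programmed.
# Usage: wait_for_waypoint <kube-context>
wait_for_waypoint() {
  # The fully qualified resource name avoids ambiguity with other Gateway kinds.
  kubectl --context="$1" wait --for=condition=Programmed \
    gateways.gateway.networking.k8s.io/httpbin-waypoint \
    -n httpbin --timeout=120s
}
```

You could run `wait_for_waypoint "${context1}"` and `wait_for_waypoint "${context2}"` before you send the test request.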
Step 6: Apply DestinationRule for explicit multicluster failover priority
Create a DestinationRule with explicit failover priority and outlier detection for HTTP-aware failover.
DestinationRule failover with explicit priority: The following diagram shows how the DestinationRule with failoverPriority controls multicluster failover. The waypoint routes to endpoints based on the priority order (zone first, then region), and uses HTTP-aware outlier detection to quickly eject unhealthy endpoints.
graph TB
subgraph Cluster1["Cluster 1 (us-east-1)"]
Client1[client-in-ambient]
Ztunnel1[Client ztunnel]
Waypoint1["Waypoint proxy<br/>Enforces L7 DestinationRule<br/>(failoverPriority: zone, region)"]
Backend1["in-ambient pod<br/>(us-east-1a)<br/>✗ Unavailable"]
Client1 -->|Request to global hostname| Ztunnel1
Ztunnel1 --> Waypoint1
Waypoint1 -.->|Unhealthy| Backend1
end
subgraph Cluster2["Cluster 2 (us-west-2)"]
Backend2a["in-ambient pod<br/>(us-west-2a)<br/>✓ Healthy"]
Backend2b["in-ambient pod<br/>(us-west-2b)<br/>✓ Healthy"]
end
Waypoint1 -->|Failover to<br/>next region| Backend2a
Waypoint1 -->|Failover to<br/>next region| Backend2b
style Waypoint1 fill:#2068F3,color:#fff
Apply the following DestinationRule in both clusters, which configures:
- Failover priority: Routes to endpoints in the same zone first, then same region, then other regions.
- Outlier detection: Ejects endpoints after 5 consecutive 5xx errors, with a 3-minute base ejection time.
for ctx in ${context1} ${context2}; do
kubectl --context=$ctx apply -f- <<EOF
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: in-ambient-failover
  namespace: httpbin
spec:
  host: in-ambient.httpbin.mesh.internal
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failoverPriority:
        - topology.kubernetes.io/zone
        - topology.kubernetes.io/region
      simple: ROUND_ROBIN
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 3m
      maxEjectionPercent: 50
EOF
done
Verify that the DestinationRule is applied.
kubectl --context=${context1} get destinationrule -n httpbin
kubectl --context=${context2} get destinationrule -n httpbin
Scale down the in-ambient deployment in cluster 1 to zero replicas to simulate a failure.
kubectl --context=${context1} scale deployment in-ambient -n httpbin --replicas=0
Send requests to in-ambient to verify that traffic fails over to cluster 2, according to the failover priority.
kubectl --context=${context1} exec -n httpbin deploy/client-in-ambient -- sh -c "
for i in \$(seq 1 5); do
  curl -s in-ambient.httpbin.mesh.internal:8000/hostname
done"
Example output, in which responses now come from the cluster 2 pod after failover:
in-ambient-8e9f0c7d65-xyz98
in-ambient-8e9f0c7d65-xyz98
in-ambient-8e9f0c7d65-xyz98
in-ambient-8e9f0c7d65-xyz98
in-ambient-8e9f0c7d65-xyz98
Review the waypoint logs to observe failover events.
kubectl --context=${context1} logs -n httpbin deploy/httpbin-waypoint | tail -20
Example output:
[2025-03-06T17:15:23.456Z] "GET /hostname HTTP/1.1" 200 - via_upstream - "-" 0 32 5 4 "-" "curl/7.88.1" "abc123-def456" "in-ambient.httpbin.mesh.internal:8000" "10.10.0.15:8080" inbound-vip|8000|http|in-ambient.httpbin.mesh.internal 10.10.0.14:45678 10.96.45.123:8000 10.10.0.14:45678 - default
Scale the in-ambient deployment back up in cluster 1.
kubectl --context=${context1} scale deployment in-ambient -n httpbin --replicas=2
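Listing the DestinationRule confirms only that the resource exists. To read back the settings that the API server actually stored, a sketch such as the following could help. The function name `show_failover_settings` is hypothetical; the resource name and field paths match the manifest in this step.

```shell
# Hypothetical helper: print the stored failover priority and outlier detection settings.
# Usage: show_failover_settings <kube-context>
show_failover_settings() {
  kubectl --context="$1" get destinationrule in-ambient-failover -n httpbin \
    -o jsonpath='{.spec.trafficPolicy.loadBalancer.localityLbSetting.failoverPriority}{"\n"}{.spec.trafficPolicy.outlierDetection}{"\n"}'
}
```

Running the helper against both contexts lets you compare the two clusters and spot a rule that was applied to only one of them.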
Cleanup
You can optionally remove the resources that you created in this guide.
for ctx in ${context1} ${context2}; do
kubectl --context=$ctx delete destinationrule in-ambient-failover -n httpbin
kubectl --context=$ctx delete gateway httpbin-waypoint -n httpbin
kubectl --context=$ctx delete -f https://raw.githubusercontent.com/solo-io/doc-examples/main/istio/sample-apps/client-in-ambient.yaml
kubectl --context=$ctx delete -f https://raw.githubusercontent.com/solo-io/doc-examples/main/istio/sample-apps/in-ambient.yaml
done
Next steps
For information about ztunnel outlier detection settings, see ztunnel outlier detection.