Ztunnel
Learn how L4 load balancing, outlier detection, and failover work for in-mesh traffic between services.
Cluster scenarios
The following diagrams give examples of how ztunnel handles load balancing, outlier detection, and failover in single cluster and multicluster setups.
Single-cluster scenario
- The request routes from the client app to the backend app.
- The client and backend services are both enrolled in the ambient mesh and run in the same cluster.
- The backend service has multiple replicas distributed across different zones or regions within the cluster.
- Optionally, the spec.trafficDistribution or networking.istio.io/traffic-distribution setting is configured on the backend service to control locality-based load balancing.
graph LR
Client[Client pod] -->|1. Request| Ztunnel[ztunnel<br/>L4 proxy]
Ztunnel -->|2. Load balancing| Backend1[Backend pod<br/>Zone A<br/>✓ Healthy]
Ztunnel -->|2. Load balancing| Backend2[Backend pod<br/>Zone B<br/>✓ Healthy]
Ztunnel -.->|"Outlier detection: ejected"| Backend3[Backend pod<br/>Zone C<br/>✗ Unhealthy]
Multicluster scenario
- The request routes from the client app to the backend app.
- The client and backend services are both enrolled in the ambient mesh.
- The backend app is exposed as a global service with the solo.io/service-scope=global annotation.
- The networking.istio.io/traffic-distribution annotation is set on the global service. Note that other service-scope annotations, such as solo.io/service-scope=global-only, solo.io/service-scope=segment, solo.io/service-scope=cluster, or solo.io/service-takeover: true, behave in the same way. Only the set of endpoints that can be addressed differs, such as only the endpoints within the same cluster or segment.
graph LR
subgraph Cluster1["Cluster 1"]
Client[Client pod]
Ztunnel[ztunnel<br/>L4 proxy]
Backend1[Backend pod<br/>✓ Healthy]
Client -->|1. Request| Ztunnel
Ztunnel -->|"2. Local endpoint<br/>(global service)"| Backend1
end
subgraph Cluster2["Cluster 2"]
Backend2[Backend pod<br/>✓ Healthy]
Backend3[Backend pod<br/>✗ Unhealthy]
end
Ztunnel -->|"2. Cross-cluster<br/>(global service)"| Backend2
Ztunnel -.->|"Outlier detection: ejected"| Backend3
Locality determination
Istiod determines locality for local endpoints. The ztunnel uses the traffic distribution setting on the backend service to choose endpoints by locality.
Locality for remote endpoints is determined as follows:
- East-west gateway setups with load balancer services: Locality labels on the remote gateways. The locality of the remote gateway is assumed to be the locality of all remote endpoints.
- Flat network setups: Locality labels on the WorkloadEntries, which are exchanged between all clusters.
Locality labels of endpoints are then compared to the locality of the client’s ztunnel.
Load balancing
You can configure load balancing through traffic distribution settings at different levels. If more than one setting is present, the following precedence order applies, from highest to lowest:
- Service spec.trafficDistribution (highest precedence, Kubernetes-native, supports PreferClose only)
- Service networking.istio.io/traffic-distribution annotation (Istio-extended, supports all modes)
- ServiceEntry networking.istio.io/traffic-distribution annotation (Istio-extended, supports all modes). Use this when you use a ServiceEntry.
- UNSPECIFIED_MODE (default, lowest precedence). In this case, the behavior of the PreferNetwork mode is used.
Default: If you do not explicitly set a traffic distribution mode, PreferNetwork is used.
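As a rough illustration of the two Service-level options in the precedence list, the following manifests sketch how each setting might look. The service names, the demo namespace, the selector, and the port are hypothetical placeholders. The PreferClose and PreferNetwork values are taken from the modes named on this page.

```yaml
# Option 1: Kubernetes-native field. Highest precedence; supports PreferClose only.
apiVersion: v1
kind: Service
metadata:
  name: backend-native          # hypothetical name
  namespace: demo               # hypothetical namespace
spec:
  selector:
    app: backend                # hypothetical selector
  ports:
  - name: tcp
    port: 8080                  # hypothetical port
    targetPort: 8080
  trafficDistribution: PreferClose
---
# Option 2: Istio annotation. Lower precedence than the spec field; supports all modes.
apiVersion: v1
kind: Service
metadata:
  name: backend-annotated       # hypothetical name
  namespace: demo
  annotations:
    networking.istio.io/traffic-distribution: PreferNetwork
spec:
  selector:
    app: backend
  ports:
  - name: tcp
    port: 8080
    targetPort: 8080
```

If both settings are present on the same Service, the spec.trafficDistribution field takes precedence according to the order above, so setting only one of them keeps the effective mode easier to reason about.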
To review the available traffic distribution modes and how each mode prioritizes endpoints, see Traffic distribution modes and endpoint priority.
Outlier detection
Ztunnel performs L4 health checks via TCP and eventually detects when an endpoint is no longer available. However, this detection might take some time, and during this period, you might see 500 or 503 errors.
In the Solo distribution of Istio, ztunnel outlier detection is enabled by default and includes mechanisms such as EWMA and circuit breaking. This outlier detection still operates at L4 only; the ztunnel does not use HTTP status codes. For HTTP-based ejection, such as after consecutive 5xx responses, use a waypoint and a DestinationRule.
Ztunnel ignores DestinationRules; they are enforced only if a waypoint is configured. For extended information about ztunnel’s outlier detection behavior, see the ztunnel outlier detection reference.
Failover
Layer 4 (L4) ztunnel failover is determined by the traffic distribution setting. When ztunnel detects an unhealthy endpoint, it closes the connection to that endpoint and opens a new TCP connection to a different endpoint. The new endpoint is selected according to the traffic distribution setting on the backend.
If you need faster or HTTP-aware failover to avoid intermittent HTTP 500 errors, you can define an Istio DestinationRule with outlier detection and failover. However, note that this requires a waypoint proxy for enforcement. For more information, see the outlier detection and failover behavior for waypoint proxies.
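The following DestinationRule is a minimal sketch of HTTP-based outlier detection. It assumes that a waypoint proxy is already deployed and handling traffic for the backend service; otherwise, the rule is not enforced. The resource name, namespace, host, and all threshold values are hypothetical and meant only to show the shape of the configuration, not recommended settings.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-outlier-detection        # hypothetical name
  namespace: demo                        # hypothetical namespace
spec:
  host: backend.demo.svc.cluster.local   # hypothetical backend service
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5            # eject an endpoint after 5 consecutive 5xx responses
      interval: 10s                      # how often endpoints are evaluated
      baseEjectionTime: 30s              # minimum time an ejected endpoint stays out
      maxEjectionPercent: 50             # cap on the share of endpoints ejected at once
```

Lower values for consecutive5xxErrors and interval eject failing endpoints sooner, but they also raise the chance of ejecting an endpoint after a brief, transient error.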
Best practice: Rely on traffic distribution for L4 failover and on the ztunnel outlier detection that is built into the Solo distribution of Istio. Add a waypoint with a DestinationRule if you need faster or HTTP-aware failover.