You are viewing the documentation for Solo Enterprise for Istio, formerly known as Gloo Mesh (OSS APIs). This version of the documentation is currently under development.

Ztunnel

Learn how L4 load balancing, outlier detection, and failover work for in-mesh traffic between services.

Cluster scenarios

The following diagrams give examples of how ztunnel handles load balancing, outlier detection, and failover in single cluster and multicluster setups.

Single-cluster scenario

  • The request routes from the client app to the backend app.
  • The client and backend services are both enrolled in the ambient mesh and run in the same cluster.
  • The backend service has multiple replicas distributed across different zones or regions within the cluster.
  • Optionally, the backend service sets either the spec.trafficDistribution field or the networking.istio.io/traffic-distribution annotation to control locality-based load balancing.
    graph LR
    Client[Client pod] -->|1. Request| Ztunnel[ztunnel<br/>L4 proxy]
    Ztunnel -->|2. Load balancing| Backend1[Backend pod<br/>Zone A<br/>✓ Healthy]
    Ztunnel -->|2. Load balancing| Backend2[Backend pod<br/>Zone B<br/>✓ Healthy]
    Ztunnel -.->|"Outlier detection: ejected"| Backend3[Backend pod<br/>Zone C<br/>✗ Unhealthy]
  

Multicluster scenario

  • The request routes from the client app to the backend app.
  • The client and backend services are both enrolled in the ambient mesh.
  • The backend app is exposed as a global service with the solo.io/service-scope=global annotation.
  • The networking.istio.io/traffic-distribution annotation is set on the global service. Other service-scope settings, such as solo.io/service-scope=global-only, solo.io/service-scope=segment, solo.io/service-scope=cluster, or solo.io/service-takeover: true, behave the same way. They differ only in which endpoints can be addressed, for example only the endpoints within the same cluster or segment.
    graph LR
    subgraph Cluster1["Cluster 1"]
        Client[Client pod]
        Ztunnel[ztunnel<br/>L4 proxy]
        Backend1[Backend pod<br/>✓ Healthy]
        Client -->|1. Request| Ztunnel
        Ztunnel -->|"2. Local endpoint<br/>(global service)"| Backend1
    end

    subgraph Cluster2["Cluster 2"]
        Backend2[Backend pod<br/>✓ Healthy]
        Backend3[Backend pod<br/>✗ Unhealthy]
    end

    Ztunnel -->|"2. Cross-cluster<br/>(global service)"| Backend2
    Ztunnel -.->|"Outlier detection: ejected"| Backend3
  

Locality determination

Istiod determines locality for local endpoints. The ztunnel uses the traffic distribution setting on the backend service to choose endpoints by locality.

Locality for remote endpoints is determined as follows:

  • East-west gateway setups with load balancer services: Locality comes from the labels on the remote gateways. The label on the remote gateway is assumed to be the locality of all remote endpoints.
  • Flat network setups: Locality comes from the locality labels on the WorkloadEntries, which are exchanged between all clusters.

Locality labels of endpoints are then compared to the locality of the client’s ztunnel.

Load balancing

You can configure load balancing through the traffic distribution setting at several levels. If more than one level is set, the following precedence order applies:

  1. Service spec.trafficDistribution (highest precedence, Kubernetes-native, supports PreferClose only)
  2. Service networking.istio.io/traffic-distribution annotation (Istio-extended, supports all modes)
  3. ServiceEntry networking.istio.io/traffic-distribution annotation (Istio-extended, supports all modes). Use this when you use a ServiceEntry.
  4. UNSPECIFIED_MODE (default, lowest precedence). In this case, the behavior of the PreferNetwork mode is used.

Default: If you do not explicitly set a traffic distribution mode, PreferNetwork is used.

To review the available traffic distribution modes and how each mode prioritizes endpoints, see Traffic distribution modes and endpoint priority.
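
The first two precedence levels can be sketched as two Service manifests. This is an illustration under stated assumptions, not a reference configuration: the Service names, the demo namespace, the app: backend selector, and port 8080 are hypothetical placeholders, and the annotation value is assumed to be spelled exactly like the mode name.

```yaml
# Hypothetical names throughout: "backend-native", "backend-annotated",
# the "demo" namespace, and port 8080 are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: backend-native
  namespace: demo
spec:
  selector:
    app: backend
  ports:
  - name: tcp
    port: 8080
  # Level 1: Kubernetes-native field, highest precedence, PreferClose only.
  trafficDistribution: PreferClose
---
apiVersion: v1
kind: Service
metadata:
  name: backend-annotated
  namespace: demo
  annotations:
    # Level 2: Istio annotation, used when spec.trafficDistribution is not set.
    # Assumption: the value matches the mode name as written in this guide.
    networking.istio.io/traffic-distribution: PreferNetwork
spec:
  selector:
    app: backend
  ports:
  - name: tcp
    port: 8080
```

If both the field and the annotation were set on the same Service, the precedence order above means that spec.trafficDistribution would take effect.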

Outlier detection

Ztunnel performs L4 health checks via TCP and eventually detects when an endpoint is no longer available. However, this detection might take some time, and during this period, you might see 500 or 503 errors.

In the Solo distribution of Istio, outlier detection is enabled by default on the ztunnel and uses mechanisms such as EWMA (exponentially weighted moving average) and circuit breaking. This outlier detection is still L4 only; the ztunnel does not use HTTP status codes. For HTTP-based ejection, such as after consecutive 5xx responses, use a waypoint and a DestinationRule.

Ztunnel ignores DestinationRules; they are enforced only when a waypoint is configured. For extended information about ztunnel’s outlier detection behavior, see the ztunnel outlier detection reference.

Failover

Layer 4 (L4) ztunnel failover is determined by the traffic distribution setting. When ztunnel detects an unhealthy endpoint, it closes the connection to that endpoint and opens a new TCP connection to a different endpoint. The new endpoint is selected according to the traffic distribution setting on the backend.

If you need faster or HTTP-aware failover to avoid intermittent HTTP 500 errors, you can define an Istio DestinationRule with outlier detection and failover. This approach requires a waypoint proxy for enforcement. For more information, see the outlier detection and failover behavior for waypoint proxies.
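
A DestinationRule with HTTP-based outlier detection might look like the following minimal sketch. The resource name, namespace, host, and all thresholds are hypothetical values chosen for illustration, and locality failover settings are left out. The rule takes effect only when traffic to the backend passes through a waypoint, because ztunnel ignores DestinationRules.

```yaml
# Hypothetical example: name, namespace, host, and thresholds are placeholders.
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-outlier
  namespace: demo
spec:
  host: backend.demo.svc.cluster.local
  trafficPolicy:
    outlierDetection:
      # Eject an endpoint after 5 consecutive 5xx responses.
      consecutive5xxErrors: 5
      # How often endpoints are evaluated for ejection.
      interval: 10s
      # Minimum time an ejected endpoint stays out of the load balancing pool.
      baseEjectionTime: 30s
      # Never eject more than half of the endpoints at once.
      maxEjectionPercent: 50
```

Tune the thresholds to the service's traffic volume; aggressive values can eject healthy endpoints during short error bursts.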

Best practice: For L4 failover, rely on traffic distribution and the ztunnel outlier detection that is built into the Solo distribution of Istio. Add a waypoint with a DestinationRule if you need faster or HTTP-aware failover.