You are viewing the documentation for Solo Enterprise for Istio, formerly known as Gloo Mesh (OSS APIs). This version of the documentation is currently under development. Select latest from the version drop down or go to the landing page of the latest stable version.

Ztunnel (L4)

Set up and test basic L4 load balancing and failover with ztunnel in an ambient mesh.

About this guide

In an ambient mesh, ztunnel handles Layer 4 (L4) load balancing for in-mesh traffic. When traffic flows between services without a waypoint in the path, ztunnel distributes connections across available backend endpoints using round-robin load balancing.
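
As a rough mental model of round-robin selection, the following sketch rotates connections through a list of endpoints. It is a toy illustration only, not ztunnel's implementation: the `pick_endpoint` function and the `pod-1`, `pod-2`, and `pod-3` names are hypothetical.

```shell
#!/bin/sh
# Toy model of round-robin selection. This is an illustration only,
# not ztunnel's actual implementation.

# pick_endpoint N ENDPOINT...
# Prints the endpoint that the N-th connection (0-based) is sent to.
pick_endpoint() {
  n=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "no healthy endpoints" >&2
    return 1
  fi
  # Skip (N mod endpoint-count) entries, then take the next one.
  skip=$(( n % $# ))
  while [ "$skip" -gt 0 ]; do
    shift
    skip=$(( skip - 1 ))
  done
  printf '%s\n' "$1"
}

# Six connections across three healthy pods (hypothetical names).
for conn in 0 1 2 3 4 5; do
  pick_endpoint "$conn" pod-1 pod-2 pod-3
done

# The same six connections after pod-3 leaves the healthy set.
for conn in 0 1 2 3 4 5; do
  pick_endpoint "$conn" pod-1 pod-2
done
```

The first loop cycles through all three names twice. The second loop alternates between the two remaining names, which is the pattern that the failover step in this guide looks for.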

This guide shows you how to deploy a service with multiple replicas, observe the default load balancing behavior, and test failover when an endpoint becomes unavailable. For conceptual information about how L4 load balancing works with ztunnel, see the load balancing and failover overview.

Before you begin

Set up an ambient mesh in one cluster by using the Gloo Operator or Helm.

Step 1: Deploy sample apps

Deploy a client app and a backend service with multiple replicas to test L4 load balancing.

Traffic flow: The following diagram shows how traffic flows from the client through ztunnel to the backend service replicas. ztunnel performs L4 (TCP) load balancing and distributes connections across all available endpoints using round-robin.

    graph LR
    Client[client-in-ambient] -->|1. Request| Ztunnel[ztunnel<br/>L4 load balancer]
    Ztunnel -->|2. Round-robin| Backend1[in-ambient pod 1]
    Ztunnel -->|2. Round-robin| Backend2[in-ambient pod 2]
    Ztunnel -->|2. Round-robin| Backend3[in-ambient pod 3]
  
  1. Deploy the in-ambient httpbin sample app. This manifest creates the httpbin namespace with an in-ambient backend service. The app is already labeled for inclusion in the ambient mesh with istio.io/dataplane-mode: ambient.

    kubectl apply -f https://raw.githubusercontent.com/solo-io/doc-examples/main/istio/sample-apps/in-ambient.yaml
  2. Deploy the client-in-ambient app in the same namespace. This app is also already labeled for the ambient mesh.

    kubectl apply -f https://raw.githubusercontent.com/solo-io/doc-examples/main/istio/sample-apps/client-in-ambient.yaml
  3. Scale the in-ambient deployment to 3 replicas so that you can observe load balancing across multiple endpoints.

    kubectl scale deployment in-ambient -n httpbin --replicas=3
  4. Verify that all pods are running.

    kubectl get pods -n httpbin

    Example output, in which in-ambient runs as 3 replicas:

    NAME                                 READY   STATUS    RESTARTS   AGE
    client-in-ambient-6b5c96c4f8-x2j9k   1/1     Running   0          30s
    in-ambient-7d8f9b6c54-abc12          1/1     Running   0          45s
    in-ambient-7d8f9b6c54-def34          1/1     Running   0          20s
    in-ambient-7d8f9b6c54-ghi56          1/1     Running   0          20s
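
If you prefer a scripted check over reading the table, one option is a small filter that counts the Running replicas in the `kubectl get pods` output. The `count_running` helper below is a hypothetical name, and the sketch assumes the default table layout shown in the example output, where the pod name is the first column and the status is the third.

```shell
#!/bin/sh
# count_running PREFIX
# Reads `kubectl get pods` table output on stdin and prints how many pods
# whose name starts with PREFIX are in the Running status.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
count_running() {
  awk -v prefix="$1" '
    NR > 1 && index($1, prefix) == 1 && $3 == "Running" { n++ }
    END { print n + 0 }
  '
}

# Example usage against a live cluster (left commented out here):
#   kubectl get pods -n httpbin | count_running in-ambient-
```

Against the example output above, the commented-out usage line would print 3. The trailing hyphen in the prefix keeps the client-in-ambient pod out of the count.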

Step 2: Test L4 load balancing

Send requests from the client to the backend service to observe the default round-robin load balancing behavior.

  1. Send multiple curl requests from the client to the in-ambient service. Because each curl invocation opens a new TCP connection, every request gets its own L4 load balancing decision. The /hostname endpoint returns the hostname of the pod that handled the request. Verify that the responses rotate across all three replicas in a round-robin pattern.

    kubectl exec -n httpbin deploy/client-in-ambient -- sh -c "
    for i in \$(seq 1 12); do
      curl -s http://in-ambient:8000/hostname
    done"

    Example output:

    in-ambient-7d8f9b6c54-abc12
    in-ambient-7d8f9b6c54-def34
    in-ambient-7d8f9b6c54-ghi56
    in-ambient-7d8f9b6c54-abc12
    ...
  2. Review the ztunnel logs to verify traffic flow. You can see connection events showing traffic from the client to the backend pods.

    kubectl logs -n istio-system -l app=ztunnel --tail=20 | grep "in-ambient"

    Example output:

    2025-03-06T16:52:48.095517Z  info  access  connection complete  src.addr=10.10.0.14:40292
    src.workload="client-in-ambient-6b5c96c4f8-x2j9k" src.namespace="httpbin"
    src.identity="spiffe://cluster.local/ns/httpbin/sa/client-in-ambient"
    dst.addr=10.10.0.15:8080 dst.service="in-ambient.httpbin.svc.cluster.local"
    dst.workload="in-ambient-7d8f9b6c54-abc12" dst.namespace="httpbin"
    direction="outbound" bytes_sent=78 bytes_recv=45 duration="12ms"
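
To get a per-pod summary instead of reading individual log entries, you could tally the dst.workload field. The following is a minimal sketch: `connections_per_workload` is a hypothetical helper name, the field names are taken from the example output above, and the sketch assumes that ztunnel writes each access log entry on a single line (the example above is wrapped for readability).

```shell
#!/bin/sh
# connections_per_workload
# Reads ztunnel access log text on stdin and prints the number of outbound
# connection entries per destination workload.
# Assumes one log entry per line, with dst.workload="..." and
# direction="outbound" fields as in the example output.
connections_per_workload() {
  grep 'direction="outbound"' \
    | grep -o 'dst\.workload="[^"]*"' \
    | cut -d '"' -f 2 \
    | LC_ALL=C sort \
    | uniq -c
}

# Example usage against a live cluster (left commented out here):
#   kubectl logs -n istio-system -l app=ztunnel --tail=200 | connections_per_workload
```

Filtering on the outbound direction is a design choice to avoid counting the same connection again if it is also logged on the receiving side.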

Step 3: Test L4 failover

Test how ztunnel handles failover when a backend endpoint becomes unavailable.

Failover behavior: The following diagram shows how ztunnel handles failover when one backend endpoint fails. Traffic automatically redistributes to the remaining healthy endpoints.

    graph LR
    Client[client-in-ambient] -->|Request| Ztunnel[ztunnel<br/>L4 load balancer]
    Ztunnel -->|Traffic| Backend1[in-ambient-...-abc12<br/>✓ Healthy]
    Ztunnel -->|Traffic| Backend2[in-ambient-...-def34<br/>✓ Healthy]
    Ztunnel -.->|No traffic| Backend3[in-ambient-...-ghi56<br/>✗ Failed]
  
  1. Get one of the in-ambient pod names to simulate a failure.

    FAILING_POD=$(kubectl get pods -n httpbin -l app=in-ambient -o jsonpath='{.items[0].metadata.name}')
    echo "Will make pod unavailable: $FAILING_POD"
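  2. Make the pod unavailable by blocking incoming traffic on port 8080, the pod port that appears as dst.addr in the ztunnel logs for requests to the service's port 8000. This simulates a pod that becomes unresponsive due to network issues or a hung process.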

    kubectl exec -n httpbin $FAILING_POD -- sh -c "apt-get update -qq && apt-get install -y -qq iptables > /dev/null 2>&1 && iptables -A INPUT -p tcp --dport 8080 -j DROP"
  3. Immediately send requests to observe the failover behavior. During the initial detection window, you might see some failed requests as ztunnel detects the unhealthy endpoint. After detection completes, traffic is distributed only among the remaining healthy endpoints.

    kubectl exec -n httpbin deploy/client-in-ambient -- sh -c "
    for i in \$(seq 1 12); do
      curl -s --max-time 2 http://in-ambient:8000/hostname || echo 'request failed'
    done"

    Example output showing the failover transition:

    • The pod that you made unavailable (for example, -ghi56) does not appear in any successful response.
    • Some early requests return request failed while ztunnel detects the unhealthy endpoint via TCP health checks.
    • After detection completes, traffic flows only to the 2 healthy pods (for example, -abc12 and -def34).

    in-ambient-7d8f9b6c54-abc12
    request failed
    in-ambient-7d8f9b6c54-def34
    in-ambient-7d8f9b6c54-abc12
    in-ambient-7d8f9b6c54-def34
    request failed
    in-ambient-7d8f9b6c54-abc12
    in-ambient-7d8f9b6c54-def34
    in-ambient-7d8f9b6c54-abc12
    in-ambient-7d8f9b6c54-def34
    in-ambient-7d8f9b6c54-abc12
    in-ambient-7d8f9b6c54-def34
  4. Restore the pod by flushing its iptables filter rules, which removes the DROP rule that you added. This simulates recovery of the unhealthy endpoint.

    kubectl exec -n httpbin $FAILING_POD -- iptables -F
  5. Send requests again to confirm that load balancing is restored across all three replicas.

    kubectl exec -n httpbin deploy/client-in-ambient -- sh -c "
    for i in \$(seq 1 9); do
      curl -s http://in-ambient:8000/hostname
    done" | sort | uniq -c

    Example output, in which the count that uniq -c prints at the start of each line shows how many requests each replica handled:

    3 in-ambient-7d8f9b6c54-abc12
    3 in-ambient-7d8f9b6c54-def34
    3 in-ambient-7d8f9b6c54-ghi56
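
When you repeat the failover test, it can help to summarize a run rather than scan it line by line. The following sketch counts failed requests and successful responses per pod. The `summarize_run` helper is a hypothetical name, and the sketch assumes the loop output format used in this guide: one line per request, containing either a pod hostname or the literal text request failed.

```shell
#!/bin/sh
# summarize_run
# Reads one line per request on stdin (a pod hostname, or the literal text
# "request failed") and prints the failure count followed by the number of
# successful responses per pod.
summarize_run() {
  awk '
    $0 == "request failed" { failed++; next }
    NF > 0 { ok[$1]++ }
    END {
      printf "failed %d\n", failed
      for (pod in ok) printf "%s %d\n", pod, ok[pod]
    }
  ' | LC_ALL=C sort
}

# Example usage against a live cluster (left commented out here):
#   kubectl exec -n httpbin deploy/client-in-ambient -- sh -c "
#   for i in \$(seq 1 12); do
#     curl -s --max-time 2 http://in-ambient:8000/hostname || echo 'request failed'
#   done" | summarize_run
```

For the failover example output in this step, the summary would report 2 failed requests and 5 successful responses each for the -abc12 and -def34 pods.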

Step 4 (optional): Observe ztunnel outlier detection

In the Solo distribution of Istio, ztunnel includes built-in outlier detection that helps identify and deprioritize unhealthy endpoints. This feature uses an exponentially weighted moving average (EWMA) and circuit breaking to improve failover behavior. For more information about ztunnel outlier detection configuration options, including how to adjust EWMA and circuit breaker settings, see ztunnel outlier detection.

Review the ztunnel logs for outlier detection activity.

kubectl logs -n istio-system -l app=ztunnel --tail=50 | grep -i "health\|ewma\|circuit"

Example output showing detection of the unhealthy pod:

  • health_check shows connection failures to the pod that you made unavailable (for example, -ghi56).
  • ewma shows the health score decreasing as failures are detected.
  • circuit_breaker shows the endpoint being marked unhealthy and deprioritized.

2025-03-11T18:32:15.421Z  warn  health_check  connection failed to endpoint
dst.addr=10.10.0.15:8080 dst.workload="in-ambient-7d8f9b6c54-ghi56"
dst.service="in-ambient.httpbin.svc.cluster.local" error="connection timeout"

2025-03-11T18:32:15.422Z  info  ewma  updating endpoint health score
dst.workload="in-ambient-7d8f9b6c54-ghi56" previous_score=1.0 new_score=0.65

2025-03-11T18:32:16.103Z  warn  circuit_breaker  endpoint marked unhealthy
dst.workload="in-ambient-7d8f9b6c54-ghi56" consecutive_failures=3 status="open"
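
To narrow the logs to a single pod, you could filter on its dst.workload value and count entries per log component. This is a sketch under assumptions: `outlier_events` is a hypothetical helper name, the component names and fields are taken from the example output above, and each log entry is assumed to be a single line with the component as the third whitespace-separated field (the example above is wrapped for readability).

```shell
#!/bin/sh
# outlier_events POD
# Reads ztunnel log text on stdin, keeps entries whose dst.workload is POD,
# and prints a count of entries per log component.
# Assumes one entry per line: TIMESTAMP LEVEL COMPONENT MESSAGE...
outlier_events() {
  grep -F "dst.workload=\"$1\"" \
    | awk '{ print $3 }' \
    | LC_ALL=C sort \
    | uniq -c
}

# Example usage against a live cluster (left commented out here):
#   kubectl logs -n istio-system -l app=ztunnel --tail=200 | outlier_events "$FAILING_POD"
```

Because the filter matches any entry for the pod, access log entries are counted alongside the health_check, ewma, and circuit_breaker entries.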

Cleanup

You can optionally remove the resources that you created in this guide. If you plan to continue with the other load balancing and failover guides, keep the namespace and apps so that you can reuse them in those guides.

kubectl delete namespace httpbin

Next steps