
Overview

Review built-in and optional observability tools to monitor the health of your service mesh.

Solo UI

The Solo UI provides an at-a-glance view of the configuration, health, and traffic in your Istio environment. Key features include a visual service graph with live metrics, a resource overview for all Istio and Kubernetes resources across your clusters, and global service visibility in multicluster setups.

To get started, launch and explore the Solo UI.

Request traces

You can configure Istio to send request traces to an external Jaeger instance to observe traffic in your mesh. To get started, enable request traces.

Metrics

The Solo UI includes a built-in telemetry pipeline that scrapes Istio metrics from istiod, ztunnel, and waypoint proxy pods, and stores them in ClickHouse. The Solo UI reads directly from ClickHouse to power the service graph and metrics views. You can also connect Grafana to ClickHouse to build custom dashboards, or query ClickHouse directly.

For the list of default metrics collected, see Metrics. For information on querying ClickHouse or connecting Grafana, see ClickHouse data store.

When reviewing metrics for the ambient data path, keep the following in mind.

  • Metrics are reported per proxy, such as a ztunnel, sidecar, or waypoint. The client is any workload that runs in Istio ambient data plane mode or is sidecar-injected. For example, a ztunnel metric might show that a specific client is failing to reach the workload endpoints behind that ztunnel.
  • Both the source reporter and the destination reporter in ztunnel emit metrics. When a request fails, only the source reporter reports metrics; the destination reporter does not. In failed requests, fields such as destination_app and destination_cluster appear as unknown. To determine the intended destination of a failed request, enable tracing.
  • Endpoint health is per-ztunnel. The health status of all endpoints that the ztunnel serves is available in metrics for that ztunnel. For more information, check out ztunnel outlier detection.
  • The response_flags field in the metric indicates the type of failure (for example, CONNECT).
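As a rough illustration of the points above, the following sketch filters Prometheus text-format metrics for series whose destination_app label is unknown and tallies the response_flags values found on them. The function name is hypothetical, and the sketch assumes that you already captured the metrics text from a ztunnel and that both labels appear on the same series.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints each response_flags value found on series whose destination_app
# label is "unknown", followed by the number of matching series.
unknown_destination_flags() {
  grep 'destination_app="unknown"' \
    | sed -n 's/.*response_flags="\([^"]*\)".*/\1/p' \
    | sort | uniq -c | awk '{print $2 " " $1}'
}
```

The count is the number of matching series, not the number of failed requests. To find the intended destination of those requests, tracing is still required.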

Access logs

Leverage the default Envoy access log collector to record logs for the Istio ingress gateway and Istio-enabled workloads in your service mesh. You can then review these logs to troubleshoot issues as needed, or scrape these logs to view them in your platform logging system.

To get started, enable the default Envoy access log collector in your Istio installation.

Layer 7 telemetry for ztunnels

When you use community Istio for your ambient mesh, ztunnel is configured to generate TCP metrics, logs, and traces for all service traffic in the ambient mesh. In addition, when you use the Solo distribution of Istio with an Enterprise-level Solo Enterprise for Istio license, Layer 7 (L7) HTTP and HTTP/2 telemetry data is enabled by default without using waypoint proxies. For an overview of how this telemetry data is collected, as well as examples of the enriched data you can find in access logs, traces, and metrics, see Layer 7 observability for ztunnels.

Validating multicluster setups

Both before and after you link clusters into a multicluster mesh, you can use the istioctl multicluster check command, along with other observability checks, to verify multiple aspects of multicluster ambient mesh support and status.

istioctl multicluster check

You can use the istioctl multicluster check --precheck command to check the individual readiness of each cluster before running istioctl multicluster link to link them in a multicluster mesh, and run the check again after linking to confirm that the connections were successful. This command performs the checks listed in the following sections, which you can review to understand what each check validates. Additionally, if any of the checks fail, run the command with the --verbose option, and review the troubleshooting recommendations in each section.

istioctl multicluster check --verbose --contexts="$context1,$context2"

For more information about this command, see the CLI reference.
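When the verbose output is long, a small filter can give a quick tally before you read each section. The following is a minimal sketch with a hypothetical function name. It counts only the ✅ and ⚠ markers that appear at the start of a line in the example output in this guide, and it does not interpret any other marker.

```shell
# Hypothetical helper: reads `istioctl multicluster check --verbose` output
# on stdin and counts lines that begin with the ✅ or ⚠ marker.
# Markers in the middle of a line (such as "Status: programmed ✅") are
# not counted, and no other marker is interpreted.
summarize_check() {
  awk '
    index($0, "✅") == 1 { passed++ }
    index($0, "⚠") == 1  { warned++ }
    END { printf "passed=%d warnings=%d\n", passed, warned }
  '
}

# Usage (context variables are placeholders):
#   istioctl multicluster check --verbose --contexts="$context1,$context2" | summarize_check
```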

Incompatible environment variables

Checks whether the ENABLE_PEERING_DISCOVERY=true and optionally K8S_SELECT_WORKLOAD_ENTRIES=true environment variables are set incorrectly or are not supported for multicluster ambient mesh.

Example verbose output:

--- Incompatible Environment Variable Check ---

✅ Incompatible Environment Variable Check: K8S_SELECT_WORKLOAD_ENTRIES is valid ("")
✅ Incompatible Environment Variable Check: ENABLE_PEERING_DISCOVERY is valid ("true")
✅ Incompatible Environment Variable Check: all relevant environment variables are valid

If this check fails, check your environment variables in your istiod configuration, such as by running helm get values --kube-context ${CLUSTER_CONTEXT} istiod -n istio-system -o yaml, and update your configuration.
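To narrow the Helm values down to the two variables that this check examines, you might filter the output by name. This sketch uses a hypothetical function name and does not parse YAML. It only prints matching lines, so where the variables sit in your values file does not matter.

```shell
# Hypothetical helper: reads Helm values YAML on stdin and prints the lines
# that mention either environment variable, with leading indentation removed.
show_peering_env() {
  grep -E 'ENABLE_PEERING_DISCOVERY|K8S_SELECT_WORKLOAD_ENTRIES' \
    | sed 's/^[[:space:]]*//'
}

# Usage:
#   helm get values --kube-context "${CLUSTER_CONTEXT}" istiod -n istio-system -o yaml | show_peering_env
```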

License validity

Checks whether the license in use by istiod is valid for multicluster ambient mesh. Multicluster capabilities require an Enterprise level license for Solo Enterprise for Istio.

Example verbose output:

--- License Check ---

✅ License Check: license is valid for multicluster

If your license does not support multicluster ambient mesh, contact your Solo account representative.

CNI DNS capture

Checks whether ambient DNS capture is enabled in the istio-cni-config ConfigMap. DNS capture is required for workloads to resolve global hostnames (.mesh.internal and custom Segment domains) used to route traffic to services across clusters. The check returns a warning if AMBIENT_DNS_CAPTURE is not set or is not true.

Example verbose output:

--- CNI DNS Capture Check ---

✅ CNI DNS Capture Check: AMBIENT_DNS_CAPTURE is enabled

If this check fails or warns, verify the value in the ConfigMap:

kubectl get configmap istio-cni-config -n istio-system \
  -o jsonpath='{.data.AMBIENT_DNS_CAPTURE}'

To enable DNS capture, set values.cni.ambient.dnsCapture=true in the istiod Helm chart values and upgrade your installation.
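If you want to repeat the ConfigMap lookup across several clusters, a sketch like the following might help. The function name and the context variables are placeholders. The function only restates the rule described above: any value other than true, including an empty value, is reported as a warning.

```shell
# Hypothetical helper: interprets the AMBIENT_DNS_CAPTURE value passed as $1.
dns_capture_status() {
  if [ "${1:-}" = "true" ]; then
    echo "enabled"
  else
    echo "warning: AMBIENT_DNS_CAPTURE is '${1:-}', expected 'true'"
  fi
}

# Usage, repeated for each cluster context (placeholder variable names):
#   for ctx in "$context1" "$context2"; do
#     value=$(kubectl get configmap istio-cni-config -n istio-system \
#       --context "$ctx" -o jsonpath='{.data.AMBIENT_DNS_CAPTURE}')
#     echo "$ctx: $(dns_capture_status "$value")"
#   done
```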

Pod health

Checks the health of the pods in the cluster. All istiod, ztunnel, and east-west gateway pods across the checked clusters must be healthy and running for the multicluster mesh to function correctly.

Example verbose output:

--- Pod Check (istiod) ---

NAME                        READY     STATUS      RESTARTS     AGE
istiod-6d9cdf88cf-l47tf     1/1       Running     0            10m18s

✅ Pod Check (istiod): all pods healthy


--- Pod Check (ztunnel) ---

NAME              READY     STATUS      RESTARTS     AGE
ztunnel-dvlwk     1/1       Running     0            10m6s

✅ Pod Check (ztunnel): all pods healthy


--- Pod Check (eastwest gateway) ---

NAME                                READY     STATUS      RESTARTS     AGE
istio-eastwest-857b77fc5d-qgnrl     1/1       Running     0            9m33s

✅ Pod Check (eastwest gateway): all pods healthy

To check any unhealthy pods, run the following commands. Consider checking the pod logs, and review Debug Istio.

kubectl get po -n istio-system
kubectl get po -n istio-eastwest
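In namespaces with many pods, you can reduce the listing to the pods that need attention. The following sketch, with a hypothetical function name, reads the output of kubectl get po with the --no-headers flag and prints only pods that are not Running or whose READY count is incomplete.

```shell
# Hypothetical helper: reads `kubectl get po --no-headers` output on stdin
# and prints the name, READY, and STATUS of each pod that is not fully ready.
unhealthy_pods() {
  awk 'NF >= 3 {
    split($2, ready, "/")            # READY column, for example 1/1
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage:
#   kubectl get po -n istio-system --no-headers | unhealthy_pods
#   kubectl get po -n istio-eastwest --no-headers | unhealthy_pods
```

No output means that every listed pod is Running with all of its containers ready.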

East-west gateway status

Checks the status of the east-west gateways in the cluster. When an east-west gateway is created, the gateway controller creates a Kubernetes service to expose the gateway. Once this service is correctly attached to the gateway and has an address assigned, the east-west gateway has a Programmed status of true.

Example verbose output:

--- Gateway Check ---

Gateway: istio-eastwest
Addresses:
- 172.18.7.110
Status: programmed ✅

✅ Gateway Check: all eastwest gateways programmed

If the Programmed status is not true, an issue might exist with the address allocation for the service. Check the east-west gateway with a command such as kubectl get svc -n istio-eastwest, and verify that your cloud provider can correctly allocate addresses to the service.

Remote peer gateway status

Checks the status of the remote peer gateways in the cluster, which represent the other peered clusters in the multicluster setup. These remote gateways configure the connection between the local cluster’s istiod control plane and the peered clusters’ remote networks to enable xDS communication between peers. When the initial network connection between istiod and a remote peer is made, the gateway’s gloo.solo.io/PeerConnected status updates to true. Then, when the full xDS sync occurs between peers, the gateway’s gloo.solo.io/PeeringSucceeded status also updates to true. This check ensures that both statuses are true, and that the topology.istio.io/cluster label is set on the gateway.

Example verbose output:

--- Peers Check ---

Cluster: cluster2
Addresses:
- 172.18.7.130
Conditions:
- Accepted: True
- Programmed: True
- gloo.solo.io/PeerConnected: True
- gloo.solo.io/PeeringSucceeded: True
- gloo.solo.io/PeerDataPlaneProgrammed: True
Status: connected ✅

✅ Peers Check: all clusters connected

If the connection is severed between the peers, the gloo.solo.io/PeerConnected status becomes false. A failed connection between peers can be due to either a misconfiguration in the peering setup, or a network issue blocking port 15008 on the remote cluster, which is the cross-network HBONE port that the east-west gateway listens on. Review the steps you took to link clusters together, such as the steps outlined in the Helm default network guide. Additionally, review any firewall rules or network policies that might block access through port 15008 on the remote cluster.
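To spot which peer and which condition is failing in a mesh with many peers, you can filter the Peers Check section of the verbose output. This is a minimal sketch with a hypothetical function name. It assumes the output layout shown in the example above, where each condition appears as a dash, a name with a trailing colon, and a value.

```shell
# Hypothetical helper: reads the "Peers Check" verbose output on stdin and
# prints every condition whose value is not True, prefixed by its cluster.
failed_peer_conditions() {
  awk '
    $1 == "Cluster:" { cluster = $2 }
    # Condition lines have three fields; address lines have only two.
    $1 == "-" && NF == 3 && $3 != "True" {
      name = $2
      sub(/:$/, "", name)            # strip the trailing colon from the name
      print cluster ": " name " is " $3
    }
  '
}
```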

Intermediate certificate compatibility

Confirms the certificate compatibility between peered clusters. This check reads the root-cert.pem from the istio-ca-root-cert configmap in the istio-system namespace, and uses x509 certificate validation to confirm the root cert is compatible with all of the clusters’ ca-cert.pem intermediate certificate chains from the cacerts secret.

Example verbose output:

--- Intermediate Certs Compatibility Check ---

ℹ  Intermediate Certs Compatibility Check: cluster cluster1 root certificate SHA256 sum: 6d18f32e134824c158d97f32618657c45d5a83839f838ada751757139481537e
ℹ  Intermediate Certs Compatibility Check: cluster cluster2 root certificate SHA256 sum: 6d18f32e134824c158d97f32618657c45d5a83839f838ada751757139481537e
✅ Intermediate Certs Compatibility Check: cluster cluster1 has compatible intermediate certificates with cluster cluster2 
✅ Intermediate Certs Compatibility Check: cluster cluster2 has compatible intermediate certificates with cluster cluster1 
✅ Intermediate Certs Compatibility Check: all clusters have compatible intermediate certificates

If this check fails because the root certs are not valid for each peered cluster’s intermediate certificate chain, you can check the istiod logs for TLS errors when attempting to communicate with a peered cluster, such as the following:

2025-12-04T22:09:22.474517Z     warn    deltaadsc       disconnected, retrying in 24.735483751s: delta stream: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: remote error: tls: unknown certificate authority"       target=peering-cluster2

Ensure each cluster has a cacerts secret in the istio-system namespace. To regenerate invalid certificates for each cluster, follow the example steps in Create a shared root of trust.
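As a quick manual signal, you can compare the root certificate SHA256 sums that the verbose output reports for each cluster. The following sketch uses a hypothetical function name and relies on the line format shown in the example output. It is not a substitute for the x509 validation that the check itself performs.

```shell
# Hypothetical helper: reads the verbose check output on stdin, extracts the
# reported root certificate SHA256 sums, and succeeds (exit 0) only if
# exactly one distinct sum is reported.
roots_match() {
  sums=$(awk '/root certificate SHA256 sum:/ { print $NF }' | sort -u)
  count=$(printf '%s\n' "$sums" | grep -c . || true)
  [ "$count" -eq 1 ]
}

# Usage (context variables are placeholders):
#   istioctl multicluster check --verbose --contexts="$context1,$context2" \
#     | roots_match && echo "same root" || echo "roots differ or not reported"
```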

Network configuration

Confirms the network configuration of the multicluster mesh. For multicluster peering setups that do not use a flat network topology, each cluster must occupy a unique network. The network name must be defined with the label topology.istio.io/network and set on both the istio-system namespace and the istio-eastwest gateway resource. The same network name must also be set as the NETWORK environment variable on the ztunnel daemonset. Each remote gateway that represents that cluster must have the topology.istio.io/network label equal to the network of the remote cluster.

Each remote gateway must also have the topology.istio.io/cluster label set to the cluster ID of the remote cluster. This label is required for the peering controller to federate workload entries correctly. If a remote gateway is missing the topology.istio.io/cluster label, the check returns an error. If the label references a cluster ID not included in --contexts, the check returns an informational message.

Example verbose output:

--- Network Configuration Check ---

✅ Cluster cluster1 has network: cluster1
✅ Eastwest gateway istio-eastwest/istio-eastwest has correct network label: cluster1
✅ Cluster cluster2 has network: cluster2
✅ Eastwest gateway istio-eastwest/istio-eastwest has correct network label: cluster2
✅ Remote gateway istio-eastwest/istio-remote-peer-cluster2 references network cluster2 (clusters: [cluster2])
✅ Remote gateway istio-eastwest/istio-remote-peer-cluster1 references network cluster1 (clusters: [cluster1])
✅ Network Configuration Check: all network configurations are valid

Mismatched network identities cause errors in cross-cluster communication, which leads to error logs in ztunnel pods that indicate a network timeout on the outbound communication. Notably, the destination address on these errors is a 240.X.X.X address, instead of the correct remote peer gateway address. You can run kubectl logs -l app=ztunnel -n istio-system --tail=10 --context ${CLUSTER_CONTEXT} | grep -iE "error|warn" to review logs such as the following:

2025-11-18T16:14:53.490573Z     error   access  connection complete     src.addr=240.0.2.27:46802 src.workload="ratings-v1-5dc79b6bcd-zm8v6" src.namespace="bookinfo" src.identity="spiffe://cluster.local/ns/bookinfo/sa/bookinfo-ratings" dst.addr=240.0.9.43:15008 dst.hbone_addr=240.0.9.43:9080 dst.service="productpage.bookinfo.mesh.internal" dst.workload="autogenflat.portfolio1-soloiopoc-cluster1.bookinfo.productpage-v1-54bb874995-hblwp.ee508601917c" dst.namespace="bookinfo" dst.identity="spiffe://cluster.local/ns/bookinfo/sa/bookinfo-productpage" direction="outbound" bytes_sent=0 bytes_recv=0 duration="10001ms" error="connection timed out, maybe a NetworkPolicy is blocking HBONE port 15008: deadline has elapsed"
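Because these log lines are long, it can help to reduce them to the source workload and destination address. The following sketch, with a hypothetical function name, assumes the key=value layout of the log line above and prints only error lines whose dst.addr starts with 240.

```shell
# Hypothetical helper: reads ztunnel log lines on stdin and prints
# "<source workload> -> <destination address>" for each error line whose
# dst.addr is in the 240.x.x.x range.
hbone_240_errors() {
  awk '
    $2 == "error" {
      src = ""; dst = ""
      for (i = 3; i <= NF; i++) {
        if ($i ~ /^src\.workload=/) {
          src = $i; sub(/^src\.workload=/, "", src); gsub(/"/, "", src)
        }
        if ($i ~ /^dst\.addr=/) {
          dst = $i; sub(/^dst\.addr=/, "", dst)
        }
      }
      if (dst ~ /^240\./) print src " -> " dst
    }
  '
}

# Usage:
#   kubectl logs -l app=ztunnel -n istio-system --context "${CLUSTER_CONTEXT}" | hbone_240_errors
```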

To troubleshoot these issues, be sure that you use unique network names to represent each cluster, and that you correctly labeled the cluster’s istio-system namespace with that network name, such as by running kubectl label namespace istio-system --context ${CLUSTER_CONTEXT} topology.istio.io/network=${CLUSTER_NAME}. You can also relabel the east-west gateway in the cluster, and the remote peer gateways in other clusters that represent this cluster.
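To confirm that no two clusters share a network name, you can collect the istio-system namespace label from each cluster and look for duplicates. This is a sketch under stated assumptions: the function name and context variables are placeholders, and the input is one "cluster network" pair per line.

```shell
# Hypothetical helper: reads "cluster network" pairs on stdin and prints each
# network name that more than one cluster uses, with the clusters that use it.
duplicate_networks() {
  awk '
    NF == 2 {
      clusters[$2] = clusters[$2] (clusters[$2] ? "," : "") $1
      n[$2]++
    }
    END { for (net in n) if (n[net] > 1) print net ": " clusters[net] }
  ' | sort
}

# Usage (placeholder context variables):
#   for ctx in "$context1" "$context2"; do
#     net=$(kubectl get namespace istio-system --context "$ctx" \
#       -o jsonpath='{.metadata.labels.topology\.istio\.io/network}')
#     echo "$ctx $net"
#   done | duplicate_networks
```

Clusters with a missing label produce a one-field line and are skipped, so an empty result does not prove that every cluster is labeled.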

Stale workload entries

In flat network setups, checks for any outdated workload entries that must be removed from the multicluster mesh. Stale workload entries might exist from pods that were deleted, but the autogenerated entries for those workloads were not correctly cleaned up. If you do not use a flat network topology, no autogenerated workload entries exist to be validated, and this check can be ignored.

Example verbose output for a non-flat network setup:

--- Stale Workloads Check ---

⚠  Stale Workloads Check: no autogenflat workload entries found

If you use a flat network topology, and this check fails with stale workload entries, run kubectl get workloadentries -n istio-system | grep autogenflat to list the autogenerated workload entries in the remote cluster, and compare the list to the output of kubectl get pods in the source cluster for those workloads. You can safely delete the stale workload entries in the remote cluster manually for pods that no longer exist in the source cluster, such as by running kubectl delete workloadentries -n istio-system <entry_name>.
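The comparison between workload entries and live pods can be sketched as follows. The function name is hypothetical, and the sketch assumes that entry names follow the pattern autogenflat.<cluster>.<namespace>.<pod>.<suffix>, which is inferred from the dst.workload value in the ztunnel log example and might not hold in your environment. The function only prints candidates and deletes nothing.

```shell
# Hypothetical helper: reads workload entry names on stdin (first column) and
# prints those whose pod is not listed in the file given as $1, which holds
# one "namespace/pod" per line.
# ASSUMPTION: names look like autogenflat.<cluster>.<namespace>.<pod>.<suffix>.
stale_entries() {
  awk -v pods="$1" '
    BEGIN { while ((getline line < pods) > 0) live[line] = 1 }
    /^autogenflat\./ {
      n = split($1, part, ".")
      if (n >= 5 && !((part[n-2] "/" part[n-1]) in live)) print $1
    }
  '
}

# Usage (contexts and file name are placeholders):
#   kubectl get pods -A --context "$source_context" \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}' > live-pods.txt
#   kubectl get workloadentries -n istio-system --no-headers --context "$remote_context" \
#     | stale_entries live-pods.txt
```

Review each printed name against the source cluster before you delete it.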

Metrics

You can also check metrics that are built into the Solo distribution of Istio to verify multiple aspects of the multicluster peering status.

Each peering metric has the labels source and peer, which appear as fields in the metric. The source is the local istiod instance in the cluster where the metric is emitted, and peer is the peered remote cluster. The convergence time metrics are important for understanding how quickly configuration propagates to peer clusters. A high convergence time can indicate slow propagation, connectivity issues, or that the peer or network is under load.

  1. Port-forward to the istiod pod in each cluster to access its metrics endpoint.

    kubectl port-forward -n istio-system deploy/istiod 15014:15014 --context ${context1}
  2. In a separate terminal, query the metrics endpoint and filter for peer metrics.

    curl -s http://localhost:15014/metrics | grep '^peer_'
  3. Repeat for each cluster in the mesh, updating the --context flag.

The following peer metrics are available.

| Metric | Description |
| --- | --- |
| peer_connection_state | The connection state of peered remote clusters (1 = connected, 0 = disconnected). |
| peer_convergence_time_bucket | The cumulative count of convergence times, which measures the delay between sending an xDS request to a peer cluster and receiving an ACK or NACK. This metric is captured in seconds for the following intervals (buckets): 0.01, 0.1, 0.5, 1, 3, 5, 10, 20, 30. |
| peer_convergence_time_count | The total number of xDS requests to peer clusters for which an ACK or NACK was received since istiod was last started. |
| peer_convergence_time_sum | The sum of all convergence times in seconds since istiod was last started. |
| peer_xds_config_size_bytes_bucket | The distribution of xDS configuration sizes received from peer clusters. |
| peer_xds_config_size_bytes_count | The number of xDS configurations received from peer clusters. |
| peer_xds_config_size_bytes_sum | The sum of all xDS configuration sizes received from peer clusters since the last start of the Istio proxy. |
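One way to read the convergence metrics is to divide peer_convergence_time_sum by peer_convergence_time_count, which gives the mean convergence time since istiod was last started. The following sketch uses a hypothetical function name and assumes the standard Prometheus text format, with the source and peer labels in the same order on both series.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints, for each label set, the mean convergence time in seconds
# (peer_convergence_time_sum divided by peer_convergence_time_count).
mean_convergence() {
  awk '
    /^peer_convergence_time_sum/ {
      key = $1; sub(/^peer_convergence_time_sum/, "", key); sum[key] = $NF
    }
    /^peer_convergence_time_count/ {
      key = $1; sub(/^peer_convergence_time_count/, "", key); cnt[key] = $NF
    }
    END {
      for (k in cnt) if (cnt[k] > 0) printf "%s %.3f\n", k, sum[k] / cnt[k]
    }
  ' | sort
}

# Usage, with the port-forward from the steps above still running:
#   curl -s http://localhost:15014/metrics | mean_convergence
```

Because the mean covers the whole istiod lifetime, a recent slowdown can be hidden by earlier fast responses. The bucket series give a better picture of the distribution.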

If you use Grafana to monitor Istio performance, you can also check out the Grafana dashboards in the Solo Communities of Practice (COP) repository. For example, you can use the istio-peering-dashboard to monitor and verify the peering connections between clusters, and the istio-global-services-dashboard to monitor locality-aware traffic distribution and endpoint health across clusters, networks, zones, and regions.

Further debugging

For additional guidance around debugging your multicluster ambient mesh, check out the Istio troubleshooting guide.