Gloo provides management tools on top of the open source Istio service mesh. If Istio is in a troubled state, you might notice errors in your Gloo environment. For help using the Istio diagnostic tools, see the Istio documentation.

  1. Check the health of the Istio custom resources that relate to the issue you experience. For example, if you see unexpected routing behavior, you might check the destination rules.

      kubectl get istio-io -A --context ${REMOTE_CONTEXT}
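
      To narrow the check to a specific resource type, or to lint all Istio configuration for common misconfigurations, you can run commands along the following lines. The resource type shown is one example; substitute the type that relates to your issue.

      ```shell
      # List destination rules across all namespaces in the remote cluster.
      kubectl get destinationrules.networking.istio.io -A --context ${REMOTE_CONTEXT}

      # Lint the Istio configuration in every namespace for common misconfigurations.
      istioctl analyze --all-namespaces --context ${REMOTE_CONTEXT}
      ```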
      
  2. Check the health of the Istio pods and services in each cluster.

      kubectl get pods,svc -A --context ${REMOTE_CONTEXT} | grep istio
      
  3. Get the logs for the istiod control plane for the cluster where you notice service mesh issues. You can optionally save the output to a local file so that you can read and analyze it more easily.

      kubectl logs --context ${REMOTE_CONTEXT} -n istio-system deploy/istiod > istiod-logs.txt
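
      After you save the logs, you can scan the file for warning and error entries, for example:

      ```shell
      # Surface warning and error entries from the saved istiod logs.
      grep -iE 'warn|error' istiod-logs.txt
      ```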
      
  4. If the issue is with a sidecar-enabled workload, try debugging the Envoy proxy container. The configuration file for the sidecar in each of your Istio pods can be hundreds of lines long. Follow along with the Solo blog, Navigating Istio Config: A look into Istio’s toolkit, to learn how to use istioctl to focus on the most common configuration areas.
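
      As a starting point, the following istioctl commands show the proxy sync status and the Envoy configuration that a given pod received. The pod and namespace names are placeholders; replace them with your own values.

      ```shell
      # Check whether each proxy is in sync with istiod (SYNCED, STALE, and so on).
      istioctl proxy-status --context ${REMOTE_CONTEXT}

      # Inspect the listeners and clusters that Envoy received for one pod.
      istioctl proxy-config listeners <pod-name> -n <namespace> --context ${REMOTE_CONTEXT}
      istioctl proxy-config clusters <pod-name> -n <namespace> --context ${REMOTE_CONTEXT}
      ```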

  5. If you installed the Solo Enterprise for Istio management plane, review metrics that are collected by default for Istio components, such as istiod, ztunnel, and waypoint proxies, in Prometheus. When reviewing metrics for the ambient data path, keep the following in mind:

    • Metrics are reported per proxy, such as a ztunnel, sidecar, or waypoint. The client is any workload that is enrolled in ambient data plane mode or has a sidecar injected. For example, a metric might show that a specific client is failing to reach the workload endpoints behind a particular ztunnel.
    • In ztunnel, both the source and the destination reporters emit metrics for successful requests. When a request fails, only the source reporter emits metrics, so fields such as destination_app and destination_cluster appear as unknown. To determine the intended destination of a failed request, enable tracing.
    • Endpoint health is per-ztunnel. The health status of all endpoints that the ztunnel serves is available in metrics for that ztunnel. For more information, check out ztunnel outlier detection.
    • The response_flags field in the metric indicates the type of failure (for example, CONNECT).
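
    For example, you might port-forward to Prometheus and query a source-reported metric. The namespace, service name, and metric name below are assumptions; adjust them to match your installation.

    ```shell
    # Forward the Prometheus UI to localhost (the namespace and service name may differ in your setup).
    kubectl port-forward -n monitoring svc/prometheus 9090:9090 --context ${REMOTE_CONTEXT}

    # Then, in the Prometheus UI at http://localhost:9090, a query such as the
    # following shows the rate of TCP connections opened, as reported by the
    # source proxy, grouped by destination workload:
    #   sum(rate(istio_tcp_connections_opened_total{reporter="source"}[5m])) by (destination_workload)
    ```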
  6. In a multicluster ambient mesh setup, you can also use istioctl commands in conjunction with the default metrics to follow a packet across clusters in the ambient data path. For example, the path might follow: client → local ztunnel → east-west gateway → remote ztunnel → server. You can:

    • Run istioctl zc connections on east-west gateways to get a populated table of current connections. For example: istioctl zc connections --node <node-name>.
    • Run istioctl zc endpoints to view remote endpoints (IPs and ports) for multicluster services. For example: istioctl zc endpoints --service my-service --service-namespace default.
    • Run istioctl multicluster check --verbose to get diagnostic details of the connected clusters in the ambient mesh, including east-west gateway endpoints, known peer clusters and remote peer gateway addresses, and service sharing summaries. For example: istioctl multicluster check --verbose --contexts="alpha,beta,gamma".
  7. If you use Grafana to monitor Istio performance, check out the Grafana performance monitoring dashboard in the Solo Communities of Practice (COP) repository.

  8. Review community resources for debugging your service mesh: