Debug Istio
Diagnose issues with your Istio service mesh.
Gloo provides management tools on top of the open source Istio service mesh. If Istio is in a troubled state, you might notice errors in your Gloo environment. For help using the Istio diagnostic tools, see the Istio documentation.
Check the health of the Istio custom resources that relate to the issue you experience. For example, if you see unexpected routing behavior, you might check the destination rules.
kubectl get istio-io -A --context ${REMOTE_CONTEXT}
Check the health of the Istio pods and services in each cluster.
kubectl get pods,svc -A --context ${REMOTE_CONTEXT} | grep istio
Get the logs for the istiod control plane for the cluster where you notice service mesh issues. You can optionally save the output to a local file so that you can read and analyze it more easily.
kubectl logs --context ${REMOTE_CONTEXT} -n istio-system deploy/istiod > istiod-logs.txt
If the issue is with a sidecar-enabled workload, try debugging the Envoy proxy container. The configuration file for the sidecar in each of your Istio pods can be hundreds of lines long. Follow along with the Solo blog, Navigating Istio Config: A look into Istio’s toolkit, to learn how to use istioctl to focus on the most common configuration areas.
If you installed the Solo Enterprise for Istio management plane, review metrics that are collected by default for Istio components, such as istiod, ztunnel, and waypoint proxies, in Prometheus. When reviewing metrics for the ambient data path, keep the following in mind:
- Metrics are per proxy, such as a ztunnel, sidecar, or waypoint. The client is any workload that runs in Istio ambient data plane mode or is sidecar-injected. For example, the metrics for a ztunnel might show that a specific client is failing to reach the workload endpoints behind that ztunnel.
- Both the source reporter and the destination reporter in ztunnel emit metrics. When a request fails, only the source reporter reports metrics; the destination reporter does not. In failed requests, fields such as destination_app and destination_cluster appear as unknown. To determine the intended destination of a failed request, enable tracing.
- Endpoint health is per-ztunnel. The health status of all endpoints that the ztunnel serves is available in metrics for that ztunnel. For more information, check out ztunnel outlier detection.
- The response_flags field in the metric indicates the type of failure (for example, CONNECT).
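The points above about unknown destination fields and response_flags can be sketched as a small shell filter. This is a minimal sketch under stated assumptions, not a product feature: it assumes that you already saved Prometheus text-format metrics from a ztunnel to a local file, and that failed requests carry the unknown label values described above. Both function names and the file name ztunnel-metrics.txt are hypothetical.

```shell
# Read Prometheus text-format metrics on stdin and keep only the sample lines
# whose destination_app or destination_cluster label is reported as "unknown".
unknown_destination_samples() {
  grep -E 'destination_(app|cluster)="unknown"'
}

# Group those sample lines by their response_flags label value.
# Note: this counts matching time series (lines), not the sum of counter values.
count_by_response_flags() {
  unknown_destination_samples |
    sed -n 's/.*response_flags="\([^"]*\)".*/\1/p' |
    sort | uniq -c | sort -rn
}
```

For example, count_by_response_flags < ztunnel-metrics.txt prints one line per response_flags value, most frequent first. Because the destination is unknown in these samples, tracing is still needed to identify where the request was headed.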
In a multicluster ambient mesh setup, you can also use istioctl commands in conjunction with the default metrics to follow a packet across clusters in the ambient data path. For example, the path might follow: client → local ztunnel → east-west gateway → remote ztunnel → server. You can:
- Run istioctl zc connections on east-west gateways to get a populated table of current connections. For example: istioctl zc connections --node <node-name>.
- Run istioctl zc endpoints to view remote endpoints (IPs and ports) for multicluster services. For example: istioctl zc endpoints --service my-service --service-namespace default.
- Run istioctl multicluster check --verbose to get diagnostic details of the connected clusters in the ambient mesh, including east-west gateway endpoints, known peer clusters and remote peer gateway addresses, and service sharing summaries. For example: istioctl multicluster check --verbose --contexts="alpha,beta,gamma".
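To keep the three checks above together when you investigate one service, you might generate them from a single helper. The following is only a sketch: the function name ambient_path_commands and its argument order are hypothetical, and it uses only the commands and flags shown in the list above. It prints the commands instead of running them so that you can review them first.

```shell
# Print (do not run) the istioctl commands from the list above.
# Arguments (all placeholders that you supply):
#   $1 node name, $2 service name, $3 service namespace,
#   $4 comma-separated list of kubeconfig contexts
ambient_path_commands() {
  printf 'istioctl zc connections --node %s\n' "$1"
  printf 'istioctl zc endpoints --service %s --service-namespace %s\n' "$2" "$3"
  printf 'istioctl multicluster check --verbose --contexts="%s"\n' "$4"
}
```

For example, ambient_path_commands <node-name> my-service default "alpha,beta,gamma" prints the three example commands from the list with your values filled in.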
If you use Grafana to monitor Istio performance, check out the Grafana performance monitoring dashboard in the Solo Communities of Practice (COP) repository.
Note that COP tools are provided as helpful starting resources that are maintained by the community. These tools are not guaranteed to work in your environment, and are not part of product SLAs.
Review ways from the community to debug your service mesh:
- Ambient mesh troubleshooting documentation
- Istio diagnostic tools documentation
- Istio debugging video
- Istio CLI cheatsheet for Ops teams, a blog post from a Solo engineer