Metrics
Learn about the Istio metrics that the telemetry pipeline collects by default and how they are enriched.
About
The telemetry collector uses a built-in Prometheus receiver to scrape metrics from Istio components in the cluster: the istiod control plane, ztunnel daemonset, and waypoint proxy pods. The prometheus.io/scrape: "true" and prometheus.io/port: "<port_number>" pod annotations are used to identify scrapeable targets and their ports. Istio-specific workloads are automatically deployed with these annotations.
The k8sattributes processor enriches metrics with Kubernetes metadata such as pod name, deployment name, namespace, and node name. The transform/health_metrics processor then adds reporter_workload, reporter_namespace, and reporter_cluster labels to waypoint and ztunnel health metrics, making it easier to filter by the reporting component in the Solo UI.
When using the Solo distribution of Istio for your ambient mesh, L7 attributes such as source and destination workload IDs, response codes, and HTTP methods are automatically added to ztunnel metrics. This gives you HTTP observability into your ambient mesh even if no waypoint proxies are deployed. For more information, see Layer 7 observability.
To reduce data volume, the filter/istio_metrics processor limits the metrics/istio pipeline to a curated set of Istio metrics. All collected metrics are written to ClickHouse, which the Solo UI reads to power the service graph and metrics views.
Default metrics in the pipeline
The metrics/istio pipeline collects the following metrics from the cluster. To reduce cardinality, only a subset of labels is collected for each metric. For more information, see Metric labels.
Istio proxy metrics
| Metric | Description |
|---|---|
| istio_requests_total | The number of requests that were processed for an Istio proxy. For this metric to be collected at Layer 7 for ztunnels, you must set L7_ENABLED=true. |
| istio_request_duration_milliseconds | The time it takes for a request to reach its destination in milliseconds. For this metric and the bucket, count, and sum submetrics to be collected at Layer 7 for ztunnels, you must set L7_ENABLED=true. |
| istio_request_duration_milliseconds_bucket | The time it takes for a request to reach its destination in milliseconds, captured in histogram buckets. |
| istio_request_duration_milliseconds_count | The total number of Istio requests since the Istio proxy was last started. |
| istio_request_duration_milliseconds_sum | The sum of all request durations since the last start of the Istio proxy. |
| istio_tcp_sent_bytes_total | The total number of bytes sent in responses. |
| istio_tcp_received_bytes_total | The total number of bytes received in requests. |
| istio_tcp_connections_opened_total | The total number of connections opened to an Istio proxy. |
| istio_tcp_connections_closed_total | The total number of connections closed to an Istio proxy. |
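Because the count and sum submetrics are both cumulative since the last proxy start, dividing the sum by the count gives a mean request duration. The following is a minimal sketch, not part of the product: the function name `avg_duration_ms` and the file `metrics.txt` are hypothetical, and it assumes the input is Prometheus text exposition format with the sample value as the last field on each line (no trailing timestamps).

```shell
# avg_duration_ms (hypothetical helper): read Prometheus text-format metrics
# on stdin and print the mean request duration in milliseconds, summed
# across all label sets. Assumes the sample value is the last field.
avg_duration_ms() {
  awk '
    /^istio_request_duration_milliseconds_sum/   { sum   += $NF }
    /^istio_request_duration_milliseconds_count/ { count += $NF }
    END {
      if (count > 0) printf "%.2f\n", sum / count
      else print "no samples"
    }
  '
}
```

For example, if you saved a scrape to a file, you might run `avg_duration_ms < metrics.txt`. The result is an average over the proxy's whole uptime, so it does not reflect recent latency changes the way a rate-based query would.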
Ztunnel metrics
| Metric | Description |
|---|---|
| istio_outlier_detection_endpoints | The total number of backend pod endpoints for a workload. |
| istio_outlier_detection_endpoints_unhealthy | The number of backend pod endpoints that ztunnel detects as unhealthy and does not route to. |
Waypoint and gateway metrics
| Metric | Description |
|---|---|
| envoy_cluster_membership_healthy | The number of healthy endpoints in an Envoy cluster. |
| envoy_cluster_membership_total | The total number of endpoints in an Envoy cluster. |
| envoy_cluster_outlier_detection_ejections_active | The number of currently ejected endpoints detected by outlier detection. |
Istiod metrics
| Metric | Description |
|---|---|
| pilot_info | Static metric that exposes istiod build and version information as labels. |
| pilot_meshconfig_validation_status | Returns 0 when the current mesh configuration is valid and 1 when it is invalid. |
| pilot_meshnetworks_validation_status | Returns 0 when the current mesh networks configuration is valid and 1 when it is invalid. |
| pilot_proxy_convergence_time | The time it takes between applying a configuration change and the Istio proxy receiving the configuration change. |
| pilot_xds_pushes | The number of xDS configuration pushes that istiod has sent to proxies, by type. |
| pilot_xds_push_time | The time istiod takes to push xDS configuration to proxies. |
| peer_connection_state * | The connection state of peered remote clusters (1 = connected, 0 = disconnected). |
| peer_convergence_time_bucket * | The cumulative count of convergence times, which measures the delay between sending an xDS request to a peer cluster and receiving an ACK or NACK. This metric is captured in seconds for the following intervals (buckets): 0.01, 0.1, 0.5, 1, 3, 5, 10, 20, 30. |
| peer_convergence_time_count * | The total number of xDS requests to peer clusters for which an ACK or NACK was received since istiod was last started. |
| peer_convergence_time_sum * | The sum of all convergence times in seconds since istiod was last started. |
| peer_xds_config_size_bytes_bucket * | The distribution of xDS configuration sizes received from peer clusters. |
| peer_xds_config_size_bytes_count * | The number of xDS configurations received from peer clusters. |
| peer_xds_config_size_bytes_sum * | The sum of all xDS configuration sizes received from peer clusters since the last start of istiod. |
* Emitted by the Solo distribution of Istio’s multicluster peering controller. These metrics are not currently displayed in the Solo UI, but can be observed through external observability tools that scrape istiod.
Metric labels
To reduce cardinality in the telemetry pipeline, only the following labels are collected for each metric.
| Metric group | Labels |
|---|---|
| Istio | ["cluster", "collector_pod", "connection_security_policy", "destination_cluster", "destination_principal", "destination_service", "destination_workload", "destination_workload_id", "destination_workload_namespace", "gloo_mesh", "namespace", "pod_name", "reporter", "response_code", "source_cluster", "source_principal", "source_workload", "source_workload_namespace", "version", "workload_id"] |
| Istio outlier detection | ["destination_cluster", "destination_network", "destination_workload", "destination_workload_namespace", "destination_workload_type"] |
| Peering | ["source", "peer"] |
| Telemetry pipeline | ["app", "cluster", "collector_name", "collector_pod", "component", "exporter", "namespace", "pod_template_generation", "processor", "service_version"] |
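To see how the retained labels compare with what a component exposes before filtering, you can list the distinct label names in a raw scrape. The following is a rough sketch under stated assumptions: the function name `list_label_names` and the file `metrics.txt` are hypothetical, the input is assumed to be Prometheus text exposition format, and the pattern can over-match if a label value itself contains text that looks like `name="`.

```shell
# list_label_names (hypothetical helper): read Prometheus text-format metrics
# on stdin and print each distinct label name once, sorted.
list_label_names() {
  grep -v '^#' \
    | grep -o '[A-Za-z_][A-Za-z0-9_]*="' \
    | sed 's/="$//' \
    | sort -u
}
```

For example, `list_label_names < metrics.txt` prints one label name per line, which you can compare against the lists in the table.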
View metrics
Pipeline metrics
Metrics collected by the metrics/istio pipeline are written to ClickHouse and read by the Solo UI. You can also query them directly from ClickHouse for debugging or custom analysis. For steps, see ClickHouse data store.
To collect the full set of Istio metrics beyond the curated pipeline, deploy a standalone Prometheus instance. For steps and scrape configuration, see Prometheus. After Prometheus is set up, you can visualize metrics using pre-built Grafana dashboards. For steps, see Grafana.
Istiod metrics
Metrics that are not in the pipeline, such as peer_* metrics, can be accessed directly from the istiod process via its metrics endpoint on port 15014.
Port-forward to the istiod pod in the cluster.
kubectl port-forward -n istio-system deploy/istiod 15014:15014 --context ${context1}
In a separate terminal, query the metrics endpoint. The following example filters for peer metrics. Replace the prefix to filter for other metrics.
curl -s http://localhost:15014/metrics | grep '^peer_'
To check a different cluster, stop the port-forward and repeat with a different --context.
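Because peer_connection_state reports 1 for a connected peer and 0 for a disconnected one, you can narrow the output to disconnected peers only. The following is a minimal sketch: the function name `disconnected_peers` is hypothetical, and it assumes the sample value is the last field on each line (no trailing timestamps).

```shell
# disconnected_peers (hypothetical helper): read Prometheus text-format
# metrics on stdin and print only the peer_connection_state samples whose
# value is 0, that is, peers reported as disconnected.
disconnected_peers() {
  awk '/^peer_connection_state/ && $NF == 0'
}
```

With the port-forward from the previous steps still running, you might use it as `curl -s http://localhost:15014/metrics | disconnected_peers`. No output means that no peer is currently reported as disconnected.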