You are viewing the documentation for Solo Enterprise for Istio, formerly known as Gloo Mesh (OSS APIs).

Metrics

Learn about the Istio metrics that the telemetry pipeline collects by default and how they are enriched.

About

The telemetry collector uses a built-in Prometheus receiver to scrape metrics from Istio components in the cluster: the istiod control plane, ztunnel daemonset, and waypoint proxy pods. The prometheus.io/scrape: "true" and prometheus.io/port: "<port_number>" pod annotations are used to identify scrapeable targets and their ports. Istio-specific workloads are automatically deployed with these annotations.
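To see which pods currently carry the scrape annotations, a sketch along the following lines can help. The function name `list_scrape_targets` is hypothetical, and the snippet assumes that `kubectl` is pointed at the cluster you want to inspect. It prints each annotated pod with its annotated port.

```shell
# Hypothetical helper: list pods annotated with prometheus.io/scrape: "true"
# together with the port from the prometheus.io/port annotation.
# Extra arguments (for example --context <name>) are passed through to kubectl.
list_scrape_targets() {
  kubectl get pods --all-namespaces "$@" \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.metadata.annotations.prometheus\.io/scrape}{"\t"}{.metadata.annotations.prometheus\.io/port}{"\n"}{end}' |
    # Keep only rows whose scrape annotation is exactly "true".
    awk -F'\t' '$3 == "true" { print $1 "/" $2 "\t" $4 }'
}
```

For example, `list_scrape_targets --context ${context1}` lists the annotated pods in that cluster. Pods without the annotation are skipped.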

The k8sattributes processor enriches metrics with Kubernetes metadata such as pod name, deployment name, namespace, and node name. The transform/health_metrics processor then adds reporter_workload, reporter_namespace, and reporter_cluster labels to waypoint and ztunnel health metrics, making it easier to filter by the reporting component in the Solo UI.

When using the Solo distribution of Istio for your ambient mesh, L7 attributes such as source and destination workload IDs, response codes, and HTTP methods are automatically added to ztunnel metrics. This gives you HTTP observability into your ambient mesh even if no waypoint proxies are deployed. For more information, see Layer 7 observability.

To reduce data volume, the filter/istio_metrics processor limits the metrics/istio pipeline to a curated set of Istio metrics. All collected metrics are written to ClickHouse, which the Solo UI reads to power the service graph and metrics views.

Default metrics in the pipeline

The metrics/istio pipeline collects the following metrics from the cluster. To reduce cardinality, only a subset of labels is collected for each metric. For more information, see Metric labels.

Istio proxy metrics

| Metric | Description |
| --- | --- |
| istio_requests_total | The number of requests that were processed for an Istio proxy. For this metric to be collected at Layer 7 for ztunnels, you must set L7_ENABLED=true. |
| istio_request_duration_milliseconds | The time it takes for a request to reach its destination in milliseconds. For this metric and the bucket, count, and sum submetrics to be collected at Layer 7 for ztunnels, you must set L7_ENABLED=true. |
| istio_request_duration_milliseconds_bucket | The time it takes for a request to reach its destination in milliseconds, captured in histogram buckets. |
| istio_request_duration_milliseconds_count | The total number of Istio requests since the Istio proxy was last started. |
| istio_request_duration_milliseconds_sum | The sum of all request durations since the last start of the Istio proxy. |
| istio_tcp_sent_bytes_total | The total number of bytes sent in responses. |
| istio_tcp_received_bytes_total | The total number of bytes received in requests. |
| istio_tcp_connections_opened_total | The total number of connections opened to an Istio proxy. |
| istio_tcp_connections_closed_total | The total number of connections closed to an Istio proxy. |
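The sum and count submetrics can be combined to derive a mean request duration. The following is a minimal sketch, not part of the product: the function name `mean_request_duration_ms` is hypothetical, and it assumes Prometheus text-format input on stdin in which the sample value is the last field on each line (no trailing timestamp).

```shell
# Hypothetical helper: mean request duration in milliseconds, computed as
# total of all *_sum samples divided by total of all *_count samples.
# The result is aggregated across every label set in the input and covers
# the period since each proxy was last started.
mean_request_duration_ms() {
  awk '
    /^istio_request_duration_milliseconds_sum[{ ]/   { sum += $NF }
    /^istio_request_duration_milliseconds_count[{ ]/ { count += $NF }
    END {
      if (count > 0) printf "%.2f\n", sum / count
      else print "no samples"
    }'
}
```

You can pipe any saved scrape into it, for example `mean_request_duration_ms < scrape.txt`, where `scrape.txt` is a placeholder file name.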

Ztunnel metrics

| Metric | Description |
| --- | --- |
| istio_outlier_detection_endpoints | The total number of backend pod endpoints for a workload. |
| istio_outlier_detection_endpoints_unhealthy | The number of backend pod endpoints that ztunnel detects as unhealthy and does not route to. |

Waypoint and gateway metrics

| Metric | Description |
| --- | --- |
| envoy_cluster_membership_healthy | The number of healthy endpoints in an Envoy cluster. |
| envoy_cluster_membership_total | The total number of endpoints in an Envoy cluster. |
| envoy_cluster_outlier_detection_ejections_active | The number of currently ejected endpoints detected by outlier detection. |

Istiod metrics

| Metric | Description |
| --- | --- |
| pilot_info | Static metric that exposes istiod build and version information as labels. |
| pilot_meshconfig_validation_status | Returns 0 when the current mesh configuration is valid and 1 when it is invalid. |
| pilot_meshnetworks_validation_status | Returns 0 when the current mesh networks configuration is valid and 1 when it is invalid. |
| pilot_proxy_convergence_time | The time it takes between applying a configuration change and the Istio proxy receiving the configuration change. |
| pilot_xds_pushes | The number of xDS configuration pushes that istiod has sent to proxies, by type. |
| pilot_xds_push_time | The time istiod takes to push xDS configuration to proxies. |
| peer_connection_state * | The connection state of peered remote clusters (1 = connected, 0 = disconnected). |
| peer_convergence_time_bucket * | The cumulative count of convergence times, which measures the delay between sending an xDS request to a peer cluster and receiving an ACK or NACK. This metric is captured in seconds for the following intervals (buckets): 0.01, 0.1, 0.5, 1, 3, 5, 10, 20, 30. |
| peer_convergence_time_count * | The total number of xDS requests to peer clusters for which an ACK or NACK was received since istiod was last started. |
| peer_convergence_time_sum * | The sum of all convergence times in seconds since istiod was last started. |
| peer_xds_config_size_bytes_bucket * | The distribution of xDS configuration sizes received from peer clusters. |
| peer_xds_config_size_bytes_count * | The number of xDS configurations received from peer clusters. |
| peer_xds_config_size_bytes_sum * | The sum of all xDS configuration sizes received from peer clusters since the last start of istiod. |

* Emitted by the Solo distribution of Istio’s multicluster peering controller. These metrics are not currently displayed in the Solo UI, but can be observed through external observability tools that scrape istiod.
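Because peer_connection_state reports 0 for a disconnected peer, a small filter over an istiod scrape can surface broken peerings. This is a sketch under assumptions: the function name `disconnected_peers` is hypothetical, input is Prometheus text format on stdin, and the sample value is the last field on each line.

```shell
# Hypothetical helper: print every peer_connection_state sample whose
# value is 0, that is, every peer that istiod reports as disconnected.
disconnected_peers() {
  awk '/^peer_connection_state[{ ]/ && $NF == 0 { print }'
}
```

An empty result means that no peer in the input is reported as disconnected.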

Metric labels

To reduce cardinality in the telemetry pipeline, only the following labels are collected for each metric.

| Metric group | Labels |
| --- | --- |
| Istio | ["cluster", "collector_pod", "connection_security_policy", "destination_cluster", "destination_principal", "destination_service", "destination_workload", "destination_workload_id", "destination_workload_namespace", "gloo_mesh", "namespace", "pod_name", "reporter", "response_code", "source_cluster", "source_principal", "source_workload", "source_workload_namespace", "version", "workload_id"] |
| Istio outlier detection | ["destination_cluster", "destination_network", "destination_workload", "destination_workload_namespace", "destination_workload_type"] |
| Peering | ["source", "peer"] |
| Telemetry pipeline | ["app", "cluster", "collector_name", "collector_pod", "component", "exporter", "namespace", "pod_template_generation", "processor", "service_version"] |
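Retained labels such as response_code make simple breakdowns possible. As an illustration only, the following sketch totals istio_requests_total by response code. The function name `requests_by_response_code` is hypothetical, and it assumes Prometheus text-format input on stdin with the sample value as the last field on each line.

```shell
# Hypothetical helper: sum istio_requests_total samples per response_code
# label value and print "<code> <total>" pairs sorted by code.
requests_by_response_code() {
  awk '
    /^istio_requests_total[{]/ {
      # Locate response_code="..." and strip the label name and quotes.
      if (match($0, /response_code="[^"]*"/)) {
        code = substr($0, RSTART + 15, RLENGTH - 16)
        totals[code] += $NF
      }
    }
    END { for (code in totals) print code, totals[code] }' | sort
}
```

Series without a response_code label are ignored.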

View metrics

Pipeline metrics

Metrics collected by the metrics/istio pipeline are written to ClickHouse and read by the Solo UI. You can also query them directly from ClickHouse for debugging or custom analysis. For steps, see ClickHouse data store.

To collect the full set of Istio metrics beyond the curated pipeline, deploy a standalone Prometheus instance. For steps and scrape configuration, see Prometheus. After Prometheus is set up, you can visualize metrics using pre-built Grafana dashboards. For steps, see Grafana.

Istiod metrics

Metrics that are not in the pipeline or that the Solo UI does not display, such as the peer_* metrics, can be read directly from the istiod process through its metrics endpoint on port 15014.

  1. Port-forward to the istiod pod in the cluster.

    kubectl port-forward -n istio-system deploy/istiod 15014:15014 --context ${context1}
  2. In a separate terminal, query the metrics endpoint. The following example filters for peer metrics. Replace the prefix to filter for other metrics.

    curl -s http://localhost:15014/metrics | grep '^peer_'
  3. To check a different cluster, stop the port-forward and repeat with a different --context.
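
The steps above can be repeated across clusters with a small loop. The following is a sketch rather than a supported tool: the function name `peer_metrics_for_contexts` and the `PF_WAIT_SECONDS` variable are hypothetical, and the two-second default wait for the port-forward to become ready is an assumption that you might need to raise.

```shell
# Hypothetical helper: for each kubeconfig context given as an argument,
# port-forward to istiod, print its peer_* metrics, then stop the port-forward.
peer_metrics_for_contexts() {
  for ctx in "$@"; do
    echo "== ${ctx} =="
    kubectl port-forward -n istio-system deploy/istiod 15014:15014 \
      --context "${ctx}" >/dev/null 2>&1 &
    pf_pid=$!
    # Assumed grace period for the port-forward to start listening.
    sleep "${PF_WAIT_SECONDS:-2}"
    curl -s http://localhost:15014/metrics | grep '^peer_' ||
      echo "(no peer_ metrics found)"
    # Stop the port-forward before moving on to the next context.
    kill "${pf_pid}" 2>/dev/null || true
    wait "${pf_pid}" 2>/dev/null || true
  done
}
```

For example, `peer_metrics_for_contexts "${context1}" "${context2}"`, where `${context2}` stands in for the context of a second cluster.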