About the telemetry pipeline
Learn about the Gloo telemetry pipeline architecture, its components, and default pipelines that you can choose from.
You can gain insights into the health and performance of your cluster components by using the Gloo telemetry pipeline. Built on top of the OpenTelemetry open source project, the Gloo telemetry pipeline helps you to collect and export telemetry data, such as metrics, logs, traces, and Gloo insights, and to visualize this data by using Gloo observability tools.
Review the information on this page to learn more about the Gloo telemetry pipeline and how to use it in your cluster.
Setup
The Gloo telemetry pipeline is set up by default when you install Solo Enterprise for Istio.
To see the receivers, processors, and exporters that are set up by default for you, run the following commands:
kubectl get configmap gloo-telemetry-gateway-config -n gloo-mesh -o yaml
kubectl get configmap gloo-telemetry-collector-config -n gloo-mesh -o yaml
Disable the telemetry pipeline
If you want to disable the Gloo telemetry pipeline, follow the Upgrade guide and add the following configuration to your Helm values file:
telemetryCollector:
  enabled: false
telemetryGateway:
  enabled: false
Customize the pipeline
You can customize the Gloo telemetry pipeline and set up additional receivers, processors, and exporters in your pipeline. The Gloo telemetry pipeline is set up with pre-built pipelines that use a variety of receivers, processors, and exporters to collect and store telemetry data in your cluster. You can enable and disable these pipelines as part of your Helm installation.
Because the Gloo telemetry pipeline is built on top of the OpenTelemetry open source project, you also have the option to add your own custom receivers, processors, and exporters to the pipeline. For more information, see the pipeline architecture information in the OpenTelemetry documentation.
To see the receivers, processors, and exporters that are set up by default for you, run the following commands:
kubectl get configmap gloo-telemetry-gateway-config -n gloo-mesh -o yaml
kubectl get configmap gloo-telemetry-collector-config -n gloo-mesh -o yaml
To add more telemetry data to the Gloo telemetry pipeline, see Customize the pipeline.
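Because the pipeline follows the OpenTelemetry Collector model, a custom addition generally means declaring a component and then referencing it in a pipeline. The following is a minimal, generic sketch of that layout, not the Gloo default configuration. The pipeline name metrics/example, the endpoint, and the choice of the otlp receiver, batch processor, and debug exporter are illustrative assumptions, and the debug exporter is only available in recent Collector releases.

```yaml
# Generic OpenTelemetry Collector layout (illustrative, not the Gloo defaults).
receivers:
  otlp:                        # receives telemetry data over OTLP
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317 # assumed listen address
processors:
  batch: {}                    # groups data before export
exporters:
  debug: {}                    # prints telemetry data to the collector log
service:
  pipelines:
    metrics/example:           # hypothetical pipeline name
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

Compare this layout with the output of the config map commands to see which receivers, processors, and exporters your installation actually defines.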
Shard telemetry collectors
By default, the telemetry collector runs as a daemon set in your Gloo Mesh environment. In some organizations, security or architecture restrictions might prevent you from running the collector pod on every node in the cluster. In this case, you can shard the telemetry collector as a stateful set instead. With sharding, the collector can continually process a high volume of metrics without deploying a collector pod to every node as a daemon set.
To shard the telemetry collector, follow the Upgrade guide and add the following configuration to your Helm values file:
telemetryCollector:
  enabled: true
  mode: statefulset
  replicaCount: 2
telemetryCollectorCustomization:
  sharding:
    enabled: true
Architecture
The Gloo telemetry pipeline is decoupled from the Gloo agents and management server core functionality, and consists of two main components: the Gloo telemetry collector agent and telemetry gateway.
Review the following descriptions to see how these components are set up in single cluster and multicluster environments.
In single cluster setups, only a Gloo telemetry collector agent is deployed to the cluster. The agent is configured to scrape metrics from workloads in the cluster, and to enrich the data, such as by adding workload IDs that you can later use to filter metrics. In addition, it receives other telemetry data, such as traces. Depending on the type of telemetry data, the collector agent then forwards this data to other observability tools, such as Jaeger as shown in the following image.
You also have the option to set up your own exporter to forward telemetry data to an observability tool of your choice. To see an example for how to export data to Datadog, see Forward metrics to Datadog.


In multicluster setups, Gloo telemetry collector agents are deployed to the management cluster and each workload cluster. The agents are configured to scrape metrics from workloads in the cluster, and to enrich the data, such as by adding workload IDs that you can later use to filter metrics. In addition, they receive other telemetry data, such as traces and Gloo analyzer logs.
A Gloo telemetry gateway is also deployed to the management cluster and exposed with a Kubernetes load balancer service. The collector agents in each workload cluster send all their telemetry data to the telemetry gateway’s service endpoint. You can choose from a set of pre-built pipelines to configure how the telemetry gateway forwards telemetry data within the cluster.
You also have the option to forward telemetry data to an observability tool of your choice by adding custom exporters to either the telemetry gateway or to each collector agent. The option that is right for you depends on the size of your environment, the amount of telemetry data that you want to export, and the compute resources that are available to the Gloo telemetry pipeline components. To see an example for how to export data to Datadog, see Forward metrics to Datadog.
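As a rough illustration of what a custom exporter looks like in OpenTelemetry Collector terms, the following sketch declares the datadog exporter from the OpenTelemetry Collector contrib distribution and references it in a pipeline. The pipeline name, the Datadog site, and the DD_API_KEY environment variable are assumptions for illustration. How this configuration is passed to the telemetry gateway or collector agent through Helm is not shown here, so follow the Forward metrics to Datadog guide for the supported steps.

```yaml
# Illustrative OpenTelemetry Collector fragment (names and values are assumptions).
receivers:
  otlp:
    protocols:
      grpc: {}
processors:
  batch: {}
exporters:
  datadog:
    api:
      site: datadoghq.com      # assumed Datadog site
      key: ${env:DD_API_KEY}   # assumed environment variable that holds the API key
service:
  pipelines:
    metrics/datadog-example:   # hypothetical pipeline name
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]
```

Adding the exporter once on the telemetry gateway keeps the configuration in one place, while adding it to each collector agent spreads the export load across clusters.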


Gloo components and Istio workloads that expose metrics are deployed with the prometheus.io/scrape: "true" and prometheus.io/port: "<port_number>" pod annotations. This port is automatically used by the Gloo collector agent, Gloo telemetry gateway, and Prometheus to scrape the metrics from these workloads. You can change the port by changing the pod annotation. However, keep in mind that changing the default scraping ports might lead to unexpected results, because Solo Enterprise for Istio processes might depend on the default setting.
Learn more about the telemetry data that is collected in the Gloo telemetry pipeline.
When you enable the Gloo telemetry pipeline, the collector agents and, if applicable, the telemetry gateway are configured to collect metrics in your Solo Enterprise for Istio environment.
Gloo telemetry collector agents scrape metrics from workloads in your cluster, such as the Gloo agents and management server, the Istio control plane, ztunnels, waypoint proxies, and the workloads’ sidecar proxies. To determine the workloads that need to be scraped and find the port where metrics are exposed, the prometheus.io/scrape: "true" and prometheus.io/port: "<port_number>" pod annotations are used. All Gloo components that expose metrics and all Istio-specific workloads are automatically deployed with these annotations.
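To make the annotation-based discovery concrete, the following sketch shows a pod that opts in to scraping. The pod name, image, and port 9090 are hypothetical values. Only the two prometheus.io annotations come from this page.

```yaml
# Hypothetical workload that exposes metrics on port 9090.
apiVersion: v1
kind: Pod
metadata:
  name: example-app                # hypothetical name
  annotations:
    prometheus.io/scrape: "true"   # marks the pod as a scrape target
    prometheus.io/port: "9090"     # port where metrics are exposed
spec:
  containers:
  - name: app
    image: example.com/app:1.0     # hypothetical image
    ports:
    - containerPort: 9090
```

Both annotation values are quoted because Kubernetes annotations must be strings.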
The telemetry agents then enrich the metrics with Layer 7 attributes. For example, the ID of the source and destination workload is added to the metrics so that you can filter the metrics for the workload that you are interested in. When using the Solo distribution of Istio for your ambient mesh, Layer 7 attributes are also automatically added to metrics that the ztunnel emits. This feature gives you HTTP observability into your ambient mesh, even if no waypoints are used.
The built-in Prometheus server is set up to scrape metrics from the Gloo telemetry collector agents. In multicluster setups, the Prometheus server also scrapes metrics from the Gloo telemetry gateway in the management cluster that receives metrics from all workload clusters.
Observability tools, such as the Gloo UI, read metrics from Prometheus and visualize this data so that you can monitor the health of the Solo Enterprise for Istio components and Istio workloads, and receive alerts if an issue is detected. For more information, see the Prometheus overview.
You can configure the Gloo telemetry pipeline to collect metadata about the compute instances, such as virtual machines, that the workload cluster is deployed to so that you can visualize your Solo Enterprise for Istio setup across your cloud provider infrastructure network. The metadata is added as labels to metrics and exposed on the Gloo telemetry collector agent (single cluster), or sent to the Gloo telemetry gateway (multicluster) where they can be scraped by the built-in Prometheus server. You can then use the Prometheus expression browser to analyze these metrics.
For more information, see Collect compute instance metadata.
You can configure the Gloo telemetry pipeline to collect traces that your workloads emit and to forward these traces to your own Jaeger instance. Note that workloads must be instrumented to emit traces before traces can be collected by the pipeline.
To add traces to the Gloo telemetry pipeline, you must configure the collector agents to pick up the traces and forward them to your Jaeger platform.
For more information, see Add Istio request traces.
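In OpenTelemetry Collector terms, picking up traces and forwarding them to Jaeger amounts to a traces pipeline with an exporter that points at the Jaeger endpoint. The following is a generic sketch under stated assumptions, not the built-in traces/istio or traces/jaeger configuration. The Jaeger service address, the plaintext connection, and the pipeline name are hypothetical, and the sketch relies on Jaeger accepting OTLP over gRPC on port 4317.

```yaml
# Illustrative traces pipeline (endpoint and names are assumptions).
receivers:
  otlp:
    protocols:
      grpc: {}
processors:
  batch: {}
exporters:
  otlp/jaeger:
    endpoint: jaeger-collector.observability.svc:4317  # hypothetical Jaeger service
    tls:
      insecure: true           # assumes plaintext traffic inside the cluster
service:
  pipelines:
    traces/example:            # hypothetical pipeline name
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```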
Solo Enterprise for Istio comes with an insights engine that automatically analyzes your Istio setups for health issues. These issues are displayed in the UI along with recommendations to harden your Istio setups. The insights give you a checklist to address issues that might otherwise be hard to detect across your environment.
If you follow the Get started guide, insights are automatically enabled in your Solo Enterprise for Istio environment. The Gloo analyzer component in the Gloo agent monitors and analyzes Istio workloads and generates analyzer results. The Gloo agent writes these analyzer results as logs to Redis Streams (single cluster), or forwards the logs to the Gloo telemetry gateway (multicluster) where they are written to Redis.
The Gloo UI uses the analyzer results in Redis and executes queries on Prometheus metrics to create Istio insights. These insights are then stored in memory so that the Gloo UI can read and display them to the user.
For more information, see Add Istio insights.
Built-in telemetry pipelines
The Gloo telemetry pipeline is set up with default pipelines that you can enable to collect telemetry data in your cluster.
Single cluster setups
| Telemetry data | Collector agent pipeline | Description |
|---|---|---|
| Compute metadata | metrics/otlp_relay | The metrics/otlp_relay pipeline collects metadata about the compute instances, such as virtual machines, that the workload cluster is deployed to, and adds the metadata as labels on metrics. The metrics are exposed on the Gloo telemetry collector agent where they can be scraped by the built-in Prometheus server. For more information, see Collect compute instance metadata. |
| Insights | logs/analyzer | The logs/analyzer pipeline is enabled by default and collects analyzer results from the Gloo analyzer component. Analyzer results are stored in Redis Streams where they can be picked up by the insights engine in the Gloo management server to generate insights later. If you follow the get started guide, the insights engine is already enabled in your environment, and analyzer results are collected by the Gloo telemetry pipeline. For more information, see Add Istio insights. |
| Logs | logs/ui | The logs/ui pipeline collects logs from Solo Enterprise for Istio components, such as the Gloo management server or agent. These logs are then displayed in the Gloo UI. |
| Metrics | metrics/ui | The metrics/ui pipeline is enabled by default and collects the metrics that are required for the Gloo UI graph. Metrics in the collector agent are then scraped by the built-in Prometheus server so that they can be provided to Gloo observability tools. To view the metrics that are captured with this pipeline, see Default metrics in the pipeline. |
| Gloo Gateway 2.x metrics | metrics/ggv2 | The metrics/ggv2 pipeline collects metrics that are required to render Gloo Gateway 2.x information, such as the control and data plane installation statuses, in the Gloo UI. Metrics in the collector agent are then scraped by the built-in Prometheus server so that they can be provided to Gloo observability tools. To enable the required processor, receiver, and exporter for the metrics/ggv2 pipeline, see Set up the UI in the Gloo Gateway 2.x documentation. |
| Traces | traces/istio | The traces/istio pipeline collects request traces from Istio-enabled workloads and sends them to your custom Jaeger instance. For more information, see Add Istio request traces. |

Multicluster setups
| Telemetry data | Collector agent pipeline | Gateway pipeline | Description |
|---|---|---|---|
| Compute metadata | metrics/otlp_relay | N/A | The metrics/otlp_relay pipeline collects metadata about the compute instances, such as virtual machines, that the cluster is deployed to, and adds the metadata as labels on metrics. The metrics are sent to the Gloo telemetry gateway where they can be scraped by the built-in Prometheus server. For more information, see Collect compute instance metadata. |
| Insights | logs/analyzer | logs/redis_stream | The logs/analyzer pipeline is enabled by default, and collects analyzer results from the Gloo analyzer component and forwards them to the Gloo telemetry gateway. Analyzer results are then stored in Redis by using the logs/redis_stream pipeline so that they can be picked up by the insights engine in the Gloo management server to generate insights. If you follow the get started guide, the insights engine is already enabled in your environment, and analyzer results are collected by the Gloo telemetry pipeline. For more information, see Add Istio insights. |
| Logs | logs/ui | N/A | The logs/ui pipeline collects logs from Solo Enterprise for Istio components, such as the Gloo management server or agent. These logs are then displayed in the Gloo UI. |
| Metrics | metrics/ui | metrics/prometheus | The metrics/ui pipeline is enabled by default and collects the metrics that are required for the Gloo UI graph. Metrics in the collector agent are then scraped by the built-in Prometheus server so that they can be provided to Gloo observability tools. To view the metrics that are captured with this pipeline, see Default metrics in the pipeline. |
| Gloo Gateway 2.x metrics | metrics/ggv2 | metrics/prometheus | The metrics/ggv2 pipeline collects metrics that are required to render Gloo Gateway 2.x information, such as the control and data plane installation statuses, in the Gloo UI. Metrics in the collector agent are then scraped by the built-in Prometheus server so that they can be provided to Gloo observability tools. To enable the required processor, receiver, and exporter for the metrics/ggv2 pipeline, see Set up the UI in the Gloo Gateway 2.x documentation. |
| Traces | traces/istio | traces/jaeger | The traces/istio pipeline collects request traces from Istio-enabled workloads and sends them to your custom Jaeger instance by using the traces/jaeger pipeline. For more information, see Add Istio request traces. |
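
As a sketch of how the built-in pipelines might be toggled in a Helm values file, the following fragment enables two of the pipelines from the tables. The telemetryCollectorCustomization key appears in the sharding example on this page, but the nested pipelines.<name>.enabled layout is an assumption about the Helm chart. Verify it against the Helm value reference for your version before you use it.

```yaml
# Assumed Helm values layout for toggling built-in collector agent pipelines.
telemetryCollector:
  enabled: true
telemetryCollectorCustomization:
  pipelines:
    metrics/otlp_relay:   # compute instance metadata
      enabled: true
    traces/istio:         # Istio request traces
      enabled: true
```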
Default metrics in the pipeline
By default, the Gloo telemetry pipeline is configured to scrape the metrics that are required for the Gloo UI from various workloads in your cluster by using the metrics/ui and metrics/prometheus pipelines. The built-in Prometheus server is configured to scrape metrics from the Gloo collector agent (single cluster), or Gloo telemetry gateway and collector agent (multicluster). To reduce cardinality in the Gloo telemetry pipeline, only a few labels are collected for each metric. For more information, see Metric labels.
Review the metrics that are available in the Gloo telemetry pipeline. You can set up additional receivers to scrape other metrics, or forward the metrics to other observability tools, such as Datadog, by creating your own custom exporter for the Gloo telemetry gateway. To find an example setup, see Forward metrics to Datadog.
Istio proxy, ztunnel, and waypoint proxy metrics
| Metric | Description |
|---|---|
| istio_response_bytes | The number of bytes that are returned in the HTTP response. |
| istio_request_bytes | The number of bytes that were sent in the HTTP request. |
| istio_requests_total | The number of requests that were processed for an Istio proxy. For this metric to be collected at Layer 7 for ztunnels, you must set L7_ENABLED=true. |
| istio_request_duration_milliseconds | The time it takes for a request to reach its destination in milliseconds. For this metric and the bucket, count, and sum submetrics to be collected at Layer 7 for ztunnels, you must set L7_ENABLED=true. |
| istio_request_duration_milliseconds_bucket | The time it takes for a request to reach its destination in milliseconds. |
| istio_request_duration_milliseconds_count | The total number of Istio requests since the Istio proxy was last started. |
| istio_request_duration_milliseconds_sum | The sum of all request durations since the last start of the Istio proxy. |
| istio_tcp_sent_bytes | The number of bytes that are sent in a response at a particular moment in time. |
| istio_tcp_sent_bytes_total | The total number of bytes that are sent in a response. |
| istio_tcp_received_bytes | The number of bytes that are received in a request at a particular moment in time. |
| istio_tcp_received_bytes_total | The total number of bytes that are received in a request. |
| istio_tcp_connections_opened | The number of open connections to an Istio proxy at a particular moment in time. |
| istio_tcp_connections_opened_total | The total number of open connections to an Istio proxy. |
Istiod metrics
| Metric | Description |
|---|---|
| pilot_proxy_convergence_time | The time it takes between applying a configuration change and the Istio proxy receiving the configuration change. |
Solo Enterprise for Istio management server metrics
| Metric | Description |
|---|---|
| gloo_mesh_build_snapshot_metric_time_sec | The time in seconds for the management server to generate an output snapshot for connected agents. |
| gloo_mesh_garbage_collection_time_sec | The time it takes for the garbage collector to clean up unused resources in seconds, such as after the custom resource translation. |
| gloo_mesh_reconciler_time_sec_bucket | The time the management server needs to sync with the agents in the workload clusters to apply the translated resources. This metric is captured in seconds for the following intervals (buckets): 1, 2, 5, 10, 15, 30, 50, 80, 100, 200. |
| gloo_mesh_redis_relation_err_total | The number of errors that occurred during a read or write operation of relationship data to Redis. |
| gloo_mesh_redis_sync_err_total | The number of times the management server could not read from or write to the Redis instance. |
| gloo_mesh_redis_write_time_sec | The time it takes in seconds for the management server to write to the Redis database. |
| gloo_mesh_relay_client_delta_pull_time_sec | The time it takes for an agent to receive a delta output snapshot from the management server in seconds. |
| gloo_mesh_relay_client_delta_pull_err | The number of errors that occurred while sending a delta output snapshot to a connected agent. |
| gloo_mesh_relay_client_delta_push_last_loop_timestamp_seconds | The unix timestamp (in seconds) of the last time the agent created a delta snapshot. This metric is generated, even if the snapshot was empty and not sent to the management server. |
| gloo_mesh_relay_client_delta_push_time_sec | The time it takes for an agent to send a delta input snapshot to the management server in seconds. |
| gloo_mesh_relay_client_delta_push_err | The number of errors that occurred while sending a delta input snapshot from the agent to the management server. |
| gloo_mesh_relay_client_last_delta_pull_received_timestamp_seconds | The unix timestamp (in seconds) of the last time the agent received a delta snapshot from the management server. |
| gloo_mesh_relay_client_last_delta_push_timestamp_seconds | The unix timestamp (in seconds) of the last time the agent pushed a delta snapshot (either non-empty, or the initial snapshot). |
| gloo_mesh_relay_client_last_server_communication_pull_stream_timestamp_seconds | The unix timestamp (in seconds) of the last time the agent received a response from the management server. |
| gloo_mesh_snapshot_upserter_op_time_sec | The time it takes for a snapshot to be updated and/or inserted in the management server local memory in seconds. |
| gloo_mesh_safe_mode_active | Indicates whether safe mode is enabled in the management server. For more information, see Redis safe mode options. |
| gloo_mesh_translation_time_sec_bucket | The time the management server needs to translate Gloo resources into Istio or Envoy resources. This metric is captured in seconds for the following intervals (buckets): 1, 2, 5, 10, 15, 20, 25, 30, 45, 60, and 120. |
| gloo_mesh_translator_concurrency | The number of translation operations that the management server can perform at the same time. |
| object_write_fails_total | The total number of failures that occurred when attempting to write an Istio object to storage. For example, this metric increases if invalid Istio configuration is rejected by the Istio control plane istiod. Write failures can occur during an upsert, delete, or status upsert action. |
| relay_pull_clients_connected | The number of agents that are connected to the management server. |
| relay_push_clients_warmed | The number of agents that are ready to accept updates from the management server. |
| translation_error | The number of translation errors that were reported by the management server. |
| translation_warning | The number of translation warnings that were reported by the management server. |
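
The bucket metrics in this table are Prometheus histograms, so latency percentiles can be derived from them. The following Prometheus recording rule is a minimal sketch that estimates the 99th percentile of the translation time over 5 minutes. The group name, rule name, percentile, and time window are illustrative choices and are not part of the default setup.

```yaml
# Illustrative Prometheus rule file (names and window are assumptions).
groups:
- name: gloo-management-server-example
  rules:
  - record: gloo_mesh:translation_time_sec:p99_5m
    # Estimates the p99 translation time from the histogram buckets.
    expr: histogram_quantile(0.99, sum by (le) (rate(gloo_mesh_translation_time_sec_bucket[5m])))
```

Because the largest finite bucket is 120 seconds, estimates for translations that take longer than that are capped at 120.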
Telemetry pipeline metrics
| Metric | Description |
|---|---|
| otelcol_processor_refused_metric_points | The number of metrics that were refused by the telemetry pipeline processor. For example, metrics might be refused to prevent collector agents from being overloaded in the case of insufficient memory resources. |
| otelcol_receiver_refused_metric_points | The number of metrics that were refused by the telemetry pipeline receiver. For example, metrics might be refused to prevent collector agents from being overloaded in the case of insufficient memory resources. |
| otelcol_processor_refused_spans | The metric spans that were refused by the memory_limiter in the telemetry pipeline to prevent collector agents from being overloaded. |
| otelcol_exporter_queue_capacity | The amount of telemetry data that can be stored in memory while waiting on a worker in the collector agent to become available to send the data. |
| otelcol_exporter_queue_size | The amount of telemetry data that is currently stored in memory. If the size is equal to or larger than otelcol_exporter_queue_capacity, new telemetry data is rejected. |
| otelcol_loadbalancer_backend_latency | The time the collector agents need to export telemetry data. |
| otelcol_exporter_send_failed_spans | The number of telemetry data spans that could not be sent to a backend. |
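
Because new telemetry data is rejected when the exporter queue reaches its capacity, the ratio of the two queue metrics is a useful early warning. The following Prometheus alerting rule is a sketch of that idea. The alert name, the 80% threshold, the 5 minute duration, and the severity label are assumptions, and the rule is not one of the alerts that are set up by default.

```yaml
# Illustrative Prometheus alerting rule (threshold and names are assumptions).
groups:
- name: gloo-telemetry-pipeline-example
  rules:
  - alert: TelemetryExporterQueueNearlyFull
    # Fires when the queue stays above 80% of its capacity for 5 minutes.
    expr: otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > 0.8
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Exporter queue is above 80% of its capacity
```

The division only matches series whose label sets are identical, so the rule assumes that both metrics are reported with the same labels for each exporter.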
Metric labels
To reduce cardinality in the telemetry pipeline, only the following labels are collected for each metric.
| Metric group | Labels |
|---|---|
| Istio | ["cluster", "collector_pod", "connection_security_policy", "destination_cluster", "destination_principal", "destination_service", "destination_workload", "destination_workload_id", "destination_workload_namespace", "gloo_mesh", "namespace", "pod_name", "reporter", "response_code", "source_cluster", "source_principal", "source_workload", "source_workload_namespace", "version", "workload_id"] |
| Telemetry pipeline | ["app", "cluster", "collector_name", "collector_pod", "component", "exporter", "namespace", "pod_template_generation", "processor", "service_version"] |
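
Only the labels in this table can be used to group or filter metrics in queries and rules. As a sketch, the following Prometheus recording rules aggregate istio_requests_total by destination workload and response code, using only labels from the Istio row. The group name, rule names, and 5 minute window are illustrative assumptions.

```yaml
# Illustrative Prometheus recording rules (names and window are assumptions).
groups:
- name: istio-workload-example
  rules:
  - record: workload:istio_requests:rate5m
    # Request rate per destination workload and response code.
    expr: sum by (destination_workload, destination_workload_namespace, response_code) (rate(istio_requests_total[5m]))
  - record: workload:istio_requests_5xx:rate5m
    # Rate of requests that returned a 5xx response code.
    expr: sum by (destination_workload, destination_workload_namespace) (rate(istio_requests_total{response_code=~"5.."}[5m]))
```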
Observability tools
The Gloo observability pipeline comes with several observability tools that help you monitor the health of Gloo and Istio components and the workloads in your cluster.
Solo Enterprise for Istio health
View the configuration and status of Gloo custom resources. You can also view the health of the clusters that are registered with the Gloo management server. To monitor the health of your Solo Enterprise for Istio components, such as the Gloo management server or Gloo telemetry collector agent, use the Gloo UI log viewer to view, filter, search, or download logs for these components.
Use the Prometheus expression browser to run PromQL queries to analyze and aggregate Solo Enterprise for Istio metrics. To view metrics that are collected by default, see Gloo management server metrics. To view the alerts that are automatically set up for you, see Alerts.
Ingress gateway
View the components of your gateway setup. To monitor the traffic to your gateway, you can access the Gloo UI Graph.
Check the Istio insights that the Gloo analyzer collects for your gateways and reports in the Gloo UI. These insights can help determine the security posture of your setup, the gateway health, and production readiness. The insights give you a checklist to address issues that might otherwise be hard to detect across your environment.
The Gloo telemetry pipeline collects Istio metrics from the ingress gateway proxy and exposes those metrics so that the built-in Prometheus server can scrape them. To view the metrics that are collected by default, see Istio proxy metrics. You can access these metrics by running PromQL queries in the Prometheus expression browser. To find example queries that you can run, see Ingress gateway queries.
You can enable request tracing for the ingress gateway and add these traces to the Gloo telemetry pipeline so that they can be forwarded to a custom Jaeger instance. For more information about how to set up tracing, and how to enable Jaeger, see Add Istio request traces.
Use the default Envoy access log collector to record logs for the apps that send requests to the Istio ingress gateway. You can review these logs to troubleshoot issues as needed, or scrape these logs to view them in your larger platform logging system.
Service mesh
View your service mesh workloads. To monitor the traffic to your service mesh workloads, you can access the Gloo UI Graph.
Check the Istio insights that the Gloo analyzer collects for your service mesh workloads and reports in the Gloo UI. These insights can help determine the security posture of your workloads, their health, and production readiness. The insights give you a checklist to address issues that might otherwise be hard to detect across your environment.
The Gloo telemetry pipeline collects Istio metrics from the Istio-enabled workloads and exposes those metrics so that the built-in Prometheus server can scrape them. To view the metrics that are collected by default, see Istio proxy metrics. You can access these metrics by running PromQL queries in the Prometheus expression browser. To find example queries that you can run, see Service mesh workload queries.
You can enable request tracing for Istio-enabled workloads and add these traces to the Gloo telemetry pipeline so that they can be forwarded to a custom Jaeger instance. For more information about how to set up tracing, and how to enable Jaeger, see Add Istio request traces.
Use the default Envoy access log collector to record logs for the apps that send requests to Istio-enabled workloads in your service mesh. You can review these logs to troubleshoot issues as needed, or scrape these logs to view them in your larger platform logging system.