Gloo OpenTelemetry metrics pipeline

When you install Gloo, you can enable the Gloo OpenTelemetry (OTel) pipeline to collect metrics in your Gloo environment. Metrics provide important information about the performance and health of your API Gateway, such as the time it takes to route a request to your app, or the number of successful and failed requests that your API Gateway processed. In addition, you can use the metrics to monitor the health of your Gloo Platform environment, such as the number of failed translations or the number of workload clusters that have trouble connecting to the Gloo management server. You can use these metrics to detect failures and troubleshoot bottlenecks.

After metrics are collected and enriched, the built-in Prometheus server scrapes the metrics and provides them to other observability tools, such as the Gloo UI or the Gloo Platform operations dashboard. You can use these tools to monitor the health of Gloo Gateway and Gloo Platform, and to receive alerts if an issue is detected. For more information, see the Observability tools.

Architecture overview

The Gloo OTel metrics pipeline is decoupled from the core functionality of the Gloo agents and management server. Gloo telemetry collector agents are designed to collect metrics from various sources and to enrich the data so that you can use it in an observability platform of your choice.

The following image shows how metrics are collected in your Gloo environment when using the Gloo OTel telemetry collector pipeline.

Figure: Overview of how metrics are sent from the ingress gateway to the Prometheus server with the Gloo OTel telemetry pipeline.
  1. A Gloo telemetry collector agent is deployed as a DaemonSet in the cluster. The collector agent scrapes metrics from workloads in your cluster, such as the Gloo agents, the Istio control plane istiod, or the gateway proxies. The agent then enriches and converts the metrics. For example, the IDs of the source and destination workloads are added to the metrics so that you can filter the metrics for the workload that you are interested in.
  2. The collector agent pushes the scraped metrics to the Gloo telemetry gateway over gRPC.
  3. The Prometheus server scrapes the metrics from the Gloo telemetry gateway.
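
The three steps above can be sketched as a small in-memory model. This is an illustration of the data flow only, not Gloo or OpenTelemetry code: the `Metric`, `TelemetryGateway`, `enrich`, and `filter_by_workload` names are hypothetical, the label keys are assumed for the example, and the real pipeline pushes over gRPC rather than calling a method.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """A single scraped sample (hypothetical shape, for illustration only)."""
    name: str
    value: float
    labels: dict = field(default_factory=dict)


def enrich(metric, source_workload, destination_workload):
    """Step 1: return a copy of the metric with workload IDs added as labels.

    The label keys used here are an assumption made for this sketch.
    """
    labels = dict(metric.labels)
    labels["source_workload"] = source_workload
    labels["destination_workload"] = destination_workload
    return Metric(metric.name, metric.value, labels)


class TelemetryGateway:
    """Stand-in for the Gloo telemetry gateway: accepts pushes, serves scrapes."""

    def __init__(self):
        self._store = []

    def push(self, metrics):
        # Step 2: the collector agent pushes metrics to the gateway
        # (gRPC in the real pipeline, a plain method call here).
        self._store.extend(metrics)

    def scrape(self):
        # Step 3: the Prometheus server pulls whatever the gateway holds.
        return list(self._store)


def filter_by_workload(metrics, destination_workload):
    """Select the metrics for one destination workload by its enriched label."""
    return [
        m for m in metrics
        if m.labels.get("destination_workload") == destination_workload
    ]
```

Because the agent adds the workload IDs before it pushes, anything that later reads from the gateway can filter on those labels without knowing where the metric was originally scraped.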

Overview of scraped metrics

By default, the Gloo telemetry collector agents are configured to scrape only the metrics that are required to visualize traffic in the Gloo UI Graph and to determine the health of Gloo Platform components.

If you want to scrape more metrics, you must edit the Gloo telemetry collector agent configmap.
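
Conceptually, the default behavior works like an allowlist of metric names, and scraping more metrics means extending that list. The following minimal sketch models the idea in plain Python. It does not reproduce the format of the collector agent configmap, and the `DEFAULT_ALLOWLIST`, `keep_allowed`, and `extend_allowlist` names, as well as the metric names, are assumptions chosen for illustration.

```python
# Hypothetical default allowlist; the real defaults live in the
# Gloo telemetry collector agent configmap and are not reproduced here.
DEFAULT_ALLOWLIST = frozenset({"istio_requests_total"})


def keep_allowed(metric_names, allowlist):
    """Return only the metric names that are on the allowlist, in input order."""
    return [name for name in metric_names if name in allowlist]


def extend_allowlist(allowlist, extra_names):
    """Return a new allowlist that also admits the extra metric names."""
    return frozenset(allowlist) | frozenset(extra_names)
```

With the default list, a metric such as `envoy_cluster_upstream_cx_active` would be dropped. After extending the list with that name, it would be kept alongside the defaults.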

To find an overview of the metrics that are being scraped, choose between the following options.

Setup

Follow the steps in Set up the pipeline to set up the Gloo OpenTelemetry pipeline and explore ways to customize or troubleshoot the pipeline.