Metrics pipeline options

When you install Gloo, you can choose how you want to collect metrics in your Gloo environment. Metrics provide important information about the performance and health of your API Gateway, such as the time a request takes to be routed to your app or the number of successful and failed requests that your API Gateway processed. In addition, you can use metrics to monitor the health of your Gloo Platform environment, such as the number of failed translations or the number of workload clusters that have issues connecting to the Gloo management server. You can use these metrics to detect failures and troubleshoot bottlenecks.

Gloo offers two options to collect metrics in your environment: the legacy metrics pipeline that is built into the Gloo agents and management server (deprecated), and the Gloo OpenTelemetry (OTel) collector pipeline. Both options are described in the following sections.

After metrics are collected and enriched, the built-in Prometheus server scrapes the metrics and provides them to other observability tools, such as the Gloo UI or the Gloo Platform operations dashboard. You can use these tools to monitor the health of Gloo Gateway and Gloo Platform, and to receive alerts if an issue is detected. For more information, see Observability tools.
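For example, you can port-forward the built-in Prometheus server and run a PromQL query to see the metrics that are available to these tools. The following commands assume the common defaults of a prometheus-server deployment in the gloo-mesh namespace and the istio_requests_total metric; the names might differ in your installation.

```sh
# Forward the built-in Prometheus server to your local machine.
# The deployment name and namespace are common defaults; adjust them if your
# installation uses different values.
kubectl -n gloo-mesh port-forward deploy/prometheus-server 9090

# In another terminal, query a request metric that the Gloo UI Graph relies on.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(istio_requests_total[5m])) by (destination_workload)'
```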

Legacy: Gloo agents and management server (Deprecated)

You can collect metrics for the ingress gateway by using the built-in capabilities of the Gloo agents and Gloo management server.

The legacy pipeline is deprecated and is planned to be removed in Gloo Gateway version 2.4. For a highly available and scalable telemetry solution that is decoupled from the Gloo agent and management server core functionality, migrate to the Gloo OpenTelemetry pipeline. See Gloo OpenTelemetry (OTel) collector for more information.
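In most cases, the migration is a Helm values change: you disable the legacy pipeline and enable the telemetry gateway in the management cluster, and enable the telemetry collector agents in the workload clusters. The following sketch assumes the legacyMetricsPipeline, telemetryGateway, and telemetryCollector fields that recent Gloo Platform Helm charts expose; verify the exact keys for your chart version before you apply them.

```yaml
# Sketch of Helm values for switching from the legacy pipeline to the OTel
# pipeline. Field names follow recent Gloo Platform charts and might differ
# in your version.

# Management cluster: disable the legacy pipeline and receive metrics centrally.
legacyMetricsPipeline:
  enabled: false
telemetryGateway:
  enabled: true

# Workload clusters: deploy the collector agents that scrape and forward metrics.
telemetryCollector:
  enabled: true
```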

Architecture overview

The following image shows how metrics are sent from the ingress gateway proxy in your workload clusters to the Prometheus server in the management cluster.

Figure: Overview of how metrics are sent from the ingress gateway proxy to the Prometheus server with the legacy metrics pipeline.
  1. As requests are sent or received by the gateway proxy, metrics are immediately sent to the Gloo agent via gRPC push procedures.
  2. The agent enriches the data. For example, the agent adds the IDs of the source and destination workloads so that you can filter the metrics for the workload that you are interested in. Then, the agent uses gRPC push procedures to forward these metrics to the Gloo management server.
  3. The built-in Prometheus server scrapes the metrics from the Gloo management server. Scraped metrics are available in the Gloo UI and in the Prometheus UI.
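
To confirm that the metrics in step 3 reach Prometheus, you can inspect the metrics endpoint that the management server exposes for scraping. The deployment name gloo-mesh-mgmt-server and the metrics port 9091 are common defaults; confirm them in your installation.

```sh
# Forward the Gloo management server metrics port (commonly 9091).
kubectl -n gloo-mesh port-forward deploy/gloo-mesh-mgmt-server 9091

# In a second terminal, list the metric names that the built-in Prometheus
# server scrapes from the management server.
curl -s http://localhost:9091/metrics | grep -v '^#' | awk '{print $1}' | cut -d'{' -f1 | sort -u | head -20
```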

Pros and cons

Review the pros and cons of this approach to decide whether it is the right one for your environment.

✅ Recommended for small environments or proofs of concept (POCs).
❌ Strongly coupled with Gloo core functionality. Large metric volumes can impact the performance of the Gloo agents and management server.
❌ Must be scaled by increasing the number of Gloo agent and management server pods.

Setup

The legacy metrics pipeline is automatically set up when you follow the get started guide. If you did not follow the guide or disabled the legacy metrics pipeline, see Legacy metrics pipeline for instructions on how to set up the pipeline.

Gloo OpenTelemetry (OTel) collector

With Gloo OpenTelemetry (OTel), you can deploy a metrics pipeline to your Gloo environment that is decoupled from the Gloo agents and management server core functionality. Gloo telemetry collector agents collect metrics from various sources and enrich the data so that you can use it in the observability platform of your choice.
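Because the pipeline is standard OpenTelemetry, you can forward the enriched metrics to a third-party backend by adding an exporter to the pipeline configuration. The following is a generic OTel collector fragment, not Gloo-specific syntax; the endpoint is a placeholder, and the way you merge this configuration into the telemetry pipeline (for example, through Helm customization values or a configmap) depends on your Gloo version.

```yaml
# Generic OpenTelemetry collector fragment: forward metrics to an external
# OTLP-compatible backend. Merge this into the existing collector configuration
# rather than replacing it; the endpoint is a hypothetical placeholder.
exporters:
  otlp/backend:
    endpoint: otel.example.com:4317
service:
  pipelines:
    metrics:
      exporters: [otlp/backend]
```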

Architecture overview

The following image shows how metrics are collected in your Gloo environment when using the Gloo OTel telemetry collector pipeline.

Figure: Overview of how metrics are sent from the ingress gateway to the Prometheus server with the Gloo OTel telemetry pipeline.
  1. A Gloo telemetry collector agent is deployed as a daemonset in the cluster. The collector agent scrapes metrics from workloads in your cluster, such as the Gloo agents, the Istio control plane istiod, or the gateway proxies. The agent then enriches and converts the metrics. For example, the ID of the source and destination workload is added to the metrics so that you can filter the metrics for the workload that you are interested in.
  2. The collector agent sends the scraped metrics to the Gloo telemetry gateway via gRPC push procedures.
  3. The Prometheus server scrapes the metrics from the Gloo telemetry gateway.
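
To verify that each hop of this pipeline is running, you can list the telemetry components in your clusters. The resource names below, such as gloo-telemetry-collector-agent and gloo-telemetry-gateway, are common defaults and might differ in your installation.

```sh
# Workload clusters: the collector agents run as a daemonset (one pod per node).
kubectl -n gloo-mesh get daemonset gloo-telemetry-collector-agent

# Management cluster: the telemetry gateway receives the pushed metrics, and the
# built-in Prometheus server scrapes it.
kubectl -n gloo-mesh get deploy gloo-telemetry-gateway prometheus-server
```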

Overview of scraped metrics

By default, the Gloo telemetry collector agents are configured to scrape only the metrics that are required to visualize traffic in the Gloo UI Graph and to determine the health of Gloo Platform components.

If you want to scrape more metrics, you must edit the Gloo telemetry collector agent configmap.
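The collector agent configuration follows the standard OpenTelemetry collector format. The following fragment is a generic sketch of adding a Prometheus scrape job for an extra workload; the configmap name (commonly gloo-telemetry-collector-config in the gloo-mesh namespace), the existing receiver names, and the recommended way to customize them depend on your Gloo version, so review the current configuration before you change it.

```yaml
# Generic OTel collector fragment: add a Prometheus scrape job so that the
# collector agents pick up metrics from an additional workload. Merge this into
# the existing receiver configuration; the job name and target are illustrative.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: my-app
          scrape_interval: 15s
          static_configs:
            - targets: ["my-app.my-namespace:9090"]
```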

For an overview of the metrics that are scraped by default, review the scrape configuration in the Gloo telemetry collector agent configmap.

Pros and cons

Review the pros and cons of this approach to decide whether it is the right one for your environment.

✅ Recommended for large environments.
✅ Decoupled from the Gloo agents and management server.
✅ Scale individual Gloo metrics components with your workloads.
✅ Built-in metrics transformations, plus retries and multi-layer caching for greater data resilience.
✅ Reuse Gloo telemetry collector agents for other data sources in or outside your Gloo environment.
✅ Monitor the health of your metrics pipeline by using scraped metrics from the Gloo telemetry gateway and collector agents.
❌ Depending on your setup, might require more cluster resources than the legacy metrics pipeline.
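
To monitor the health of the metrics pipeline itself, you can query the standard OpenTelemetry collector self-metrics from the built-in Prometheus server, for example in the Prometheus UI. The metric names below are upstream OTel collector defaults and assume that your pipeline is configured to scrape them; they might carry different suffixes depending on the collector version.

```promql
# Metric data points accepted by the collector agents and the telemetry gateway.
sum(rate(otelcol_receiver_accepted_metric_points[5m])) by (job)

# Export failures, which indicate that metric data points are being dropped.
sum(rate(otelcol_exporter_send_failed_metric_points[5m])) by (job)
```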

Setup

Follow the steps in Set up the pipeline to set up the Gloo OpenTelemetry pipeline and explore ways to customize or troubleshoot the pipeline.