Metrics pipeline options

When you install Gloo, you can choose how you want to collect metrics in your Gloo environment. Metrics provide important information about the performance and health of your service mesh, such as the time a request takes to be routed to your app, or the number of successful and failed requests that were sent to your apps. In addition, you can use the metrics to monitor the health of your Gloo Platform environment, such as the number of failed translations or workload clusters that experience issues with connecting to the Gloo management server. You can use these measures to detect failures and troubleshoot bottlenecks.

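Gloo offers two options to collect metrics in your environment: the default pipeline, which uses the Gloo agents and management server, and the alpha Gloo OpenTelemetry (OTel) metrics collector pipeline. Both options are described in the sections that follow.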

After metrics are collected and enriched, the built-in Prometheus server scrapes the metrics and provides them to other observability tools, such as the Gloo UI or the Gloo Platform operations dashboard. You can use these tools to monitor the health of Gloo Mesh Enterprise and Gloo Platform, and to receive alerts if an issue is detected. For more information, see the Observability tools.
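As a rough illustration of how a tool might read these metrics back from the Prometheus server, the following Python sketch builds a PromQL expression for the share of failed requests to one workload and wraps it in a URL for the Prometheus HTTP query endpoint (`/api/v1/query`). It assumes that the standard Istio metric `istio_requests_total` with its `destination_workload` and `response_code` labels is available in your Prometheus server. The base URL `http://localhost:9090` and the workload name `reviews` are placeholders, not values from your Gloo installation.

```python
from urllib.parse import urlencode


def failed_request_ratio_query(destination_workload, window="5m"):
    """Build a PromQL expression for the ratio of 5xx responses to all responses."""
    selector = f'destination_workload="{destination_workload}"'
    # Assumption: the standard Istio request counter is exposed under this name.
    failed = (
        f'sum(rate(istio_requests_total{{{selector},response_code=~"5.."}}[{window}]))'
    )
    total = f'sum(rate(istio_requests_total{{{selector}}}[{window}]))'
    return f"{failed} / {total}"


def query_url(base_url, promql):
    """Return the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Placeholder values: replace with the address of your Prometheus server
# and the name of a workload in your mesh.
promql = failed_request_ratio_query("reviews")
url = query_url("http://localhost:9090/", promql)
```

The sketch only constructs the query. Sending it requires network access to the Prometheus server, for example through a port-forward.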

Default: Gloo agents and management server

You can collect metrics for the services in your mesh by using the built-in capabilities of the Gloo agents and Gloo management server.

Architecture overview

The following image shows how metrics are sent from the workload's sidecar proxies to the Prometheus server in the management cluster.

Figure: Overview of how metrics are sent from the workload's sidecar proxies to the Prometheus server with the default metrics pipeline.
  1. As requests are sent or received by the workload's sidecar proxy in the mesh, metrics are immediately sent to the Gloo agent via gRPC push procedures.
  2. The agent enriches the data. For example, the agent adds the IDs of the source and destination workloads so that you can filter the metrics for the workload that you are interested in. Then, the agent uses gRPC push procedures to forward these metrics to the Gloo management server in the management cluster.
  3. The built-in Prometheus server scrapes the metrics from the Gloo management server. Scraped metrics are available in the Gloo UI and in the Prometheus UI.
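The three steps above can be modeled with a small in-memory Python sketch. This is not how the Gloo agent or management server is implemented. The class names, the metric name `requests_total`, and the label names `source_workload` and `destination_workload` are hypothetical and only illustrate the push, enrich, and scrape sequence.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One metric sample with Prometheus-style labels."""
    name: str
    value: float
    labels: dict = field(default_factory=dict)


class ManagementServer:
    """Hypothetical stand-in for the Gloo management server."""

    def __init__(self):
        self.samples = []

    def receive(self, sample):
        # Step 2, second half: the agent pushes enriched samples here.
        self.samples.append(sample)

    def scrape(self):
        # Step 3: Prometheus pulls whatever the server currently holds.
        return list(self.samples)


class Agent:
    """Hypothetical stand-in for a Gloo agent in a workload cluster."""

    def __init__(self, server):
        self.server = server

    def receive(self, sample, source_workload, destination_workload):
        # Step 2, first half: add workload IDs as labels, then forward.
        enriched = Sample(
            sample.name,
            sample.value,
            {
                **sample.labels,
                "source_workload": source_workload,
                "destination_workload": destination_workload,
            },
        )
        self.server.receive(enriched)


server = ManagementServer()
agent = Agent(server)

# Step 1: a sidecar proxy pushes a raw sample to the agent.
agent.receive(
    Sample("requests_total", 1.0, {"response_code": "200"}),
    source_workload="frontend",
    destination_workload="reviews",
)

scraped = server.scrape()
```

Because every sample passes through the agent and the management server, both components do work for each metric. This is the coupling that the pros and cons below refer to.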

Pros and cons

Review the pros and cons of this approach to decide whether it is the right one for your environment.

✅ Recommended for small environments or proofs of concept (POCs).
❌ Strongly coupled with Gloo core functionality. Large amounts of metrics can impact the performance of Gloo agents and management server.
❌ Must be scaled by increasing the number of Gloo agent and management server pods.

Alpha: Gloo OpenTelemetry (OTel) metrics collector

With Gloo OpenTelemetry (OTel), you can deploy a metrics pipeline to your Gloo environment that is decoupled from the Gloo agents and management server core functionality. Gloo metrics collector agents are designed to collect metrics from various sources, and enrich the data so that you can use them in an observability platform of your choice.

The OpenTelemetry metrics pipeline is released as an alpha feature. Functionality might change without prior notice in future releases. Do not use this feature in production environments.

Architecture overview

The following image shows how metrics are collected in your Gloo environment when using the Gloo OTel metrics collector pipeline.

Figure: Overview of how metrics are sent from the workload's sidecar proxies to the Prometheus server with the Gloo OTel metrics pipeline.
  1. Gloo metrics collector agents are deployed as a daemonset in all Gloo workload clusters. The collector agents scrape metrics from workloads in your cluster, such as the Gloo agents, the Istio control plane istiod, or the Istio-injected workloads. The agents then enrich and convert the metrics. For example, the IDs of the source and destination workloads are added to the metrics so that you can filter the metrics for the workload that you are interested in.
  2. The collector agents send the scraped metrics to the Gloo metrics gateway in the Gloo management cluster via gRPC push procedures.
  3. The Prometheus server scrapes the metrics from the Gloo metrics gateway.
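In contrast to the default pipeline, the collector agents pull metrics from the workloads before pushing them on. The following Python sketch illustrates that sequence under simplifying assumptions: it parses a small sample of the Prometheus text exposition format, adds labels, and hands the result to a gateway object. The `MetricsGateway` class, the `cluster` label, and the sample values are hypothetical, and the parser skips any line that it cannot read, such as lines with timestamps or escaped quotes.

```python
import re

# Matches lines such as: metric_name{label="value"} 12.5
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_exposition(text):
    """Step 1a: turn scraped text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # simplification: ignore lines this sketch cannot parse
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def enrich(samples, extra_labels):
    """Step 1b: add labels to every sample without mutating the input."""
    return [(name, {**labels, **extra_labels}, value) for name, labels, value in samples]


class MetricsGateway:
    """Hypothetical stand-in for the Gloo metrics gateway."""

    def __init__(self):
        self.received = []

    def push(self, samples):
        # Step 2: collector agents push enriched samples to the gateway.
        self.received.extend(samples)

    def scrape(self):
        # Step 3: Prometheus pulls the samples from the gateway.
        return list(self.received)


workload_output = """
# TYPE istio_requests_total counter
istio_requests_total{response_code="200"} 42
istio_requests_total{response_code="503"} 3
"""

gateway = MetricsGateway()
gateway.push(enrich(parse_exposition(workload_output), {"cluster": "cluster-1"}))
scraped = gateway.scrape()
```

In this model, the Gloo agents and management server are not on the data path, which reflects the decoupling that this pipeline is designed for.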

Overview of scraped metrics

By default, the Gloo metrics collector agents are configured to scrape only the metrics that are required to visualize traffic in the Gloo UI Graph and to determine the health of Gloo Platform components.

If you want to scrape more metrics, you must edit the Gloo metrics collector agent configmap.
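Conceptually, limiting the scraped metrics works like an allowlist on metric names. The following Python sketch shows that idea only. The patterns are illustrative and are not the default list that ships with Gloo, and the real configuration is done in the collector agent configmap rather than in Python.

```python
import re

# Hypothetical allowlist of metric name patterns; not Gloo's actual defaults.
DEFAULT_KEEP = [
    r"istio_requests_total",
    r"istio_request_duration_milliseconds_.*",
]


def keep_metric(name, patterns):
    """Return True if the full metric name matches any allowlist pattern."""
    return any(re.fullmatch(pattern, name) for pattern in patterns)


def filter_names(names, patterns):
    """Keep only the metric names that the allowlist permits."""
    return [name for name in names if keep_metric(name, patterns)]


available = [
    "istio_requests_total",
    "istio_request_duration_milliseconds_bucket",
    "istio_tcp_sent_bytes_total",
]

# With the narrow allowlist, the TCP metric is dropped.
default_result = filter_names(available, DEFAULT_KEEP)

# Extending the allowlist is the analogue of editing the configmap.
extended_result = filter_names(available, DEFAULT_KEEP + [r"istio_tcp_.*"])
```

A narrow allowlist keeps the volume of stored metrics small. Each pattern that you add increases what the collector agents scrape and what the Prometheus server stores.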

To find an overview of the metrics that are being scraped, choose between the following options.

Pros and cons

Review the pros and cons of this approach to decide whether it is the right one for your environment.

✅ Recommended for large environments.
✅ Decoupled from the Gloo agents and management server.
✅ Scale individual Gloo metrics components with your workloads.
✅ Built-in metrics transformations, and better retry and multi-layer caching for more data resilience.
✅ Reuse Gloo metrics collector agents for other data sources in or outside your Gloo environment.
✅ Monitor the health of your metrics pipeline by using scraped metrics from the Gloo metrics gateway and collector agents.
❌ Currently offered as an alpha feature and not yet production-ready.
❌ Depending on your setup, might require more cluster resources than the legacy metrics pipeline.