Metrics pipeline options
When you install Gloo, you can choose how you want to collect metrics in your Gloo environment. Metrics provide important information about the performance and health of your service mesh, such as the time a request takes to be routed to your app, or the number of successful and failed requests that were sent to your apps. In addition, you can use the metrics to monitor the health of your Gloo Platform environment, such as the number of failed translations or workload clusters that experience issues with connecting to the Gloo management server. You can use these measures to detect failures and troubleshoot bottlenecks.
Gloo offers the following options to collect metrics in your environment: the legacy pipeline that is built in to the Gloo agents and management server (deprecated), and the Gloo OpenTelemetry (OTel) collector pipeline.
After metrics are collected and enriched, the built-in Prometheus server scrapes the metrics and provides them to other observability tools, such as the Gloo UI or the Gloo Platform operations dashboard. You can use these tools to monitor the health of Gloo Mesh Enterprise and Gloo Platform, and to receive alerts if an issue is detected. For more information, see Observability tools.
Legacy: Gloo agents and management server (Deprecated)
You can collect metrics for the services in your mesh by using the built-in capabilities of the Gloo agents and Gloo management server.
The legacy pipeline is deprecated and is planned to be removed in Gloo Gateway version 2.4. For a highly available and scalable telemetry solution that is decoupled from the Gloo agent and management server core functionality, migrate to the Gloo OpenTelemetry pipeline. See Gloo OpenTelemetry (OTel) collector for more information.
Architecture overview
The following image shows how metrics are sent from the workload's sidecar proxies to the Prometheus server in the management cluster.
- As requests are sent or received by the workload's sidecar proxy in the mesh, metrics are immediately sent to the Gloo agent via gRPC push procedures.
- The agent enriches the data. For example, the agent adds the IDs of the source and destination workloads so that you can filter the metrics for the workloads that you are interested in. Then, the agent uses gRPC push procedures to forward these metrics to the Gloo management server in the management cluster.
- The built-in Prometheus server scrapes the metrics from the Gloo management server. Scraped metrics are available in the Gloo UI and in the Prometheus UI.
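For example, after the management cluster is set up, you can verify that enriched metrics reach the built-in Prometheus server by port-forwarding it and running a query. This is a sketch: the `prometheus-server` deployment name reflects the default Helm installation, and the `destination_workload_id` label is an example of the enrichment that the agent adds; both might differ in your setup.

```shell
# Forward the built-in Prometheus server in the management cluster
# (deployment name assumes the default Helm installation).
kubectl port-forward deploy/prometheus-server -n gloo-mesh 9090 &

# Query an Istio request metric, grouped by the destination workload ID
# label that the enrichment step adds (label name is an assumption).
curl -s 'localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(istio_requests_total) by (destination_workload_id)'
```

If the query returns time series, the legacy pipeline is forwarding and enriching metrics end to end.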
Pros and cons
Review the following pros and cons to decide whether this approach is the right one for your environment.
✅ Recommended for small environments or proofs of concept (POCs).
❌ Strongly coupled with Gloo core functionality: large amounts of metrics can impact the performance of the Gloo agents and management server.
❌ Must be scaled by increasing the number of Gloo agent and management server pods.
Setup
The legacy metrics pipeline is automatically set up when you follow the get started guide. If you did not follow the guide or disabled the legacy metrics pipeline, see Legacy metrics pipeline for instructions on how to set up the pipeline.
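If you disabled the pipeline during installation, re-enabling it is typically a Helm values change. The following sketch assumes the `legacyMetricsPipeline` value path from the gloo-platform Helm chart; verify the exact path against your chart version before applying.

```yaml
# gloo-platform Helm values (sketch; the value path is an
# assumption and might differ in your chart version).
legacyMetricsPipeline:
  enabled: true
```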
Gloo OpenTelemetry (OTel) collector
With Gloo OpenTelemetry (OTel), you can deploy a metrics pipeline to your Gloo environment that is decoupled from the Gloo agent and management server core functionality. Gloo telemetry collector agents are designed to collect metrics from various sources and enrich the data so that you can use it in the observability platform of your choice.
Architecture overview
The following image shows how metrics are collected in your Gloo environment when using the Gloo OTel telemetry collector pipeline.
- Gloo telemetry collector agents are deployed as a daemonset in all Gloo workload clusters. The collector agents scrape metrics from workloads in your cluster, such as the Gloo agents, the Istio control plane istiod, or the Istio-injected workloads. The agents then enrich and convert the metrics. For example, the IDs of the source and destination workloads are added to the metrics so that you can filter the metrics for the workloads that you are interested in.
- The collector agents send the scraped metrics to the Gloo telemetry gateway in the Gloo management cluster via gRPC push procedures.
- The Prometheus server scrapes the metrics from the Gloo telemetry gateway.
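As a quick sanity check of this pipeline, you can confirm that the collector agents and the telemetry gateway are running. The daemonset and deployment names below reflect common defaults of the Gloo Helm charts and are assumptions that might differ in your setup.

```shell
# Collector agents run as a daemonset in each workload cluster
# (resource name is an assumption based on default chart values).
kubectl get daemonset gloo-telemetry-collector-agent -n gloo-mesh

# The telemetry gateway runs as a deployment in the management cluster.
kubectl get deploy gloo-telemetry-gateway -n gloo-mesh
```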
Overview of scraped metrics
By default, the Gloo telemetry collector agents are configured to scrape only the metrics that are required to visualize traffic in the Gloo UI Graph and to determine the health of Gloo Platform components.
If you want to scrape more metrics, you must edit the Gloo telemetry collector agent configmap.
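For example, to scrape an additional workload, you can add a scrape job under the Prometheus receiver section of the configmap. The job name and target below are hypothetical placeholders; the `receivers.prometheus.config.scrape_configs` structure follows the standard OpenTelemetry Collector Prometheus receiver format.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        # Hypothetical extra scrape job; replace the job name and
        # target with your workload's metrics endpoint.
        - job_name: my-app-metrics
          scrape_interval: 30s
          static_configs:
            - targets: ['my-app.my-namespace:9090']
```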
To find an overview of the metrics that are being scraped, choose between the following options.
- Option 1: Check the Gloo telemetry collector agent configmap.
kubectl get configmap gloo-telemetry-collector-config -n gloo-mesh -o yaml
- Option 2: Open the metrics endpoint of the Gloo telemetry gateway.
kubectl port-forward deploy/gloo-telemetry-gateway -n gloo-mesh 9091
open localhost:9091/metrics
Pros and cons
Review the following pros and cons to decide whether this approach is the right one for your environment.
✅ Recommended for large environments.
✅ Decoupled from the Gloo agents and management server.
✅ Scale individual Gloo metrics components with your workloads.
✅ Built-in metrics transformations, and better retries and multi-layer caching for more data resilience.
✅ Reuse Gloo telemetry collector agents for other data sources in or outside your Gloo environment.
✅ Monitor the health of your metrics pipeline by using scraped metrics from the Gloo telemetry gateway and collector agents.
❌ Depending on your setup, might require more cluster resources than the legacy metrics pipeline.
Setup
Follow the steps in Set up the pipeline to set up the Gloo OpenTelemetry pipeline and explore ways to customize or troubleshoot the pipeline.