Prometheus
During your installation, you have the option to deploy a Gloo Prometheus server that you can use to monitor the health of the Gloo control and data plane. The Prometheus server scrapes metrics from the components in your Gloo environment, such as the Gloo management server and agents, and makes these metrics available to other observability tools, such as the Gloo UI and the Gloo Platform operations dashboard.
Metrics provide important information about the performance and health of your Gloo, Istio, and Cilium resources. For example, you can monitor the time a request takes to be routed from the gateway to your app, the number of successful and failed requests that were processed, or the Gloo custom resources that could not be translated into Istio or Cilium resources. You can use these metrics to detect failures and troubleshoot bottlenecks.
About the built-in Prometheus server
Prometheus is a powerful time series database that you can use to visualize, analyze, and operate on metrics that are collected from your environment. When you install a Gloo product, you can decide to enable the built-in Prometheus server alongside the Gloo OpenTelemetry (OTel) pipeline. The telemetry pipeline collects metrics from various sources and provides them to the telemetry gateway so that the Prometheus server can scrape them.
After metrics are scraped and available to the Prometheus server, you can view these metrics by accessing the Prometheus UI and running PromQL queries. PromQL is a functional query language that lets you select and aggregate time series, and you can visualize the results of your query in a graph or table.
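PromQL queries can also be run outside the UI through the Prometheus HTTP API. The following is a minimal sketch, not a Gloo-specific client. It assumes that the Prometheus server is reachable at `http://localhost:9090` (for example, through a port-forward), which is an assumption and not a value from this page. The helper names `build_query_url`, `parse_vector`, and `run_query` are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Prometheus server is reachable locally, for example through a port-forward.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(promql, base_url=PROMETHEUS_URL):
    """Return the instant-query URL of the Prometheus HTTP API for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload):
    """Flatten an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample is {"metric": {...labels...}, "value": [timestamp, "value-as-string"]}.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


def run_query(promql, base_url=PROMETHEUS_URL):
    """Run an instant query against a live server and return the parsed samples."""
    with urlopen(build_query_url(promql, base_url), timeout=10) as response:
        return parse_vector(json.load(response))
```

For example, `run_query("up")` returns one pair per scrape target, where a value of 1.0 means that the last scrape of that target succeeded. `up` is a standard Prometheus metric and is not specific to Gloo.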
In addition, several Gloo observability tools use the data that is available in Prometheus to help you check and monitor the health of the Gloo control and data plane. For more information, see Observability tools that use Prometheus metrics.
Overview of available metrics
To find an overview of metrics that are available to you, open the Prometheus UI and query the metrics endpoint. For more information, see Open the UI.
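If you prefer a script over the UI, one way to get a similar overview is to list all metric names that Prometheus currently knows about through the label values endpoint of the Prometheus HTTP API. This sketch assumes the server is reachable at `http://localhost:9090`. The helper names and the `prefix` filter are illustrative and not part of Gloo.

```python
import json
from urllib.request import urlopen

# Assumption: the Prometheus server is reachable locally, for example through a port-forward.
PROMETHEUS_URL = "http://localhost:9090"


def metric_names_url(base_url=PROMETHEUS_URL):
    """Return the API URL that lists every value of the __name__ label (the metric names)."""
    return f"{base_url}/api/v1/label/__name__/values"


def filter_metric_names(payload, prefix=""):
    """Return the sorted metric names from a label-values response that start with prefix."""
    if payload.get("status") != "success":
        raise RuntimeError(f"request failed: {payload.get('error', 'unknown error')}")
    return sorted(name for name in payload["data"] if name.startswith(prefix))


def list_metric_names(prefix="", base_url=PROMETHEUS_URL):
    """Fetch the metric names from a live server, optionally filtered by prefix."""
    with urlopen(metric_names_url(base_url), timeout=10) as response:
        return filter_metric_names(json.load(response), prefix)
```

Calling `list_metric_names()` without a prefix returns every metric name, which can be long. Passing a prefix narrows the list to one family of metrics.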
For sample Prometheus queries that you can run to monitor the health of your Gloo components, see the following links.
- Gloo Platform metrics
- Ingress gateway metrics (Gloo Gateway)
- Service mesh metrics (Gloo Mesh Enterprise)
- Cilium metrics (Gloo Network and Cilium clusters)
Overview of available alerts
To monitor the Gloo Platform components more easily, Gloo automatically sets up alerts in Prometheus for certain Gloo Platform metrics and observes these metrics over time. The alerts cover the following areas:
- Latency: The time it takes to translate or reconcile Gloo resources in your environment.
- Gloo agents: Monitors the connection between the Gloo management server and workload clusters.
- Translation errors: Reports the Gloo resources that cannot be correctly translated into Istio or Cilium resources.
- Redis errors: Lists connection failures between the Gloo management server and the Redis database where all of the Gloo configuration is stored.
Alerts are automatically surfaced in the operations dashboard, but can also be accessed by using the Prometheus UI directly. To find a detailed overview of the alerts that are automatically configured in Gloo, see Gloo Platform alerts.
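Besides the dashboard and the Prometheus UI, active alerts can be read from the alerts endpoint of the Prometheus HTTP API. The following sketch assumes the server is reachable at `http://localhost:9090` and uses illustrative helper names. It does not assume any particular Gloo alert names.

```python
import json
from urllib.request import urlopen

# Assumption: the Prometheus server is reachable locally, for example through a port-forward.
PROMETHEUS_URL = "http://localhost:9090"


def firing_alerts(payload):
    """Return (alert name, labels) pairs for alerts that are in the firing state."""
    if payload.get("status") != "success":
        raise RuntimeError(f"request failed: {payload.get('error', 'unknown error')}")
    return [
        (alert["labels"].get("alertname", "<unnamed>"), alert["labels"])
        for alert in payload["data"]["alerts"]
        if alert["state"] == "firing"  # skip alerts that are still pending
    ]


def fetch_firing_alerts(base_url=PROMETHEUS_URL):
    """Fetch the active alerts from a live server and keep only the firing ones."""
    with urlopen(f"{base_url}/api/v1/alerts", timeout=10) as response:
        return firing_alerts(json.load(response))
```

An alert in the pending state has matched its rule expression but has not yet been active for the configured duration, so the sketch reports only alerts that are firing.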
Observability tools that use Prometheus metrics
Several Gloo observability tools read metrics from Prometheus and present them in a more consumable way.
- Gloo UI: The Gloo UI monitors certain workload metrics in Prometheus and how they change over time. This data is shown in the Gloo UI Graph tab. For more information, see Monitored metrics in the Gloo UI.
- Operations dashboard: The Gloo Platform operations dashboard uses the data in Prometheus to visualize key metrics and critical alerts for Gloo Platform components. For more information, see Operations dashboard.