Prometheus
Every Gloo installation comes with a built-in Prometheus server that is automatically set up and configured to scrape metrics from the metrics endpoints in your Gloo environment. Metrics provide important information about the performance and health of both your service mesh and the Gloo Platform components. For example, you can monitor the time a request takes to be routed to your app, the number of successful and failed requests that your app processed, or the Gloo custom resources that could not be translated into Istio or Cilium resources. You can use these metrics to detect failures and troubleshoot bottlenecks.
About the built-in Prometheus server
Prometheus is a powerful time series database that you can use to visualize, analyze, and operate on metrics that are collected from your environment. When you install a Gloo product, you must decide on the metrics pipeline that you want to use to collect metrics in your Gloo environment. Depending on the pipeline that you choose, the built-in Prometheus server is configured to scrape metrics from either the Gloo management server or the Gloo metrics gateway endpoint in the Gloo management cluster. To find an overview of the default Prometheus server configuration, see []().
After metrics are scraped and available to the Prometheus server, you can view these metrics by accessing the Prometheus UI and running PromQL queries. PromQL is a functional query language that lets you select and aggregate time series, and you can visualize the results of your query in a graph or table.
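Besides the UI, PromQL queries can be sent to the Prometheus HTTP API. The following is a minimal sketch, not Gloo-specific tooling. It assumes that the Prometheus server is reachable at `http://localhost:9090` (for example, through a port-forward) and that the standard Istio metric `istio_requests_total` is being scraped. The query, address, and sample response are illustrative assumptions.

```python
import json
from urllib.parse import urlencode

# Assumption: Prometheus is reachable locally, e.g. through a port-forward.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(promql, base_url=PROMETHEUS_URL):
    """Return the instant-query URL of the Prometheus HTTP API for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body):
    """Map each series' label set to its sample value from an instant-query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    results = {}
    for series in payload["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # value arrives as a string
        results[labels] = float(value)
    return results


# Illustrative query: per-second request rate over 5 minutes, grouped by response code.
QUERY = "sum(rate(istio_requests_total[5m])) by (response_code)"

# Hand-written sample response in the instant-query format, for illustration only.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"response_code": "200"}, "value": [1700000000.0, "12.5"]},
            {"metric": {"response_code": "503"}, "value": [1700000000.0, "0.5"]},
        ],
    },
})

if __name__ == "__main__":
    print(build_query_url(QUERY))
    for labels, value in parse_instant_vector(SAMPLE_RESPONSE).items():
        print(dict(labels), value)
```

To run the query against a live server, fetch the URL (for example with `urllib.request.urlopen`) and pass the decoded response body to `parse_instant_vector`.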
In addition, several Gloo observability tools use the data that is available in Prometheus to help you check and monitor the health of your service mesh and the Gloo Platform components more easily. For more information, see Observability tools that use Prometheus metrics.
Overview of available metrics
To find an overview of metrics that are available to you, open the Prometheus UI and query the metrics endpoint.
- Open the built-in Prometheus dashboard.
- Access the metrics endpoint.
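If you save the output of a metrics endpoint, you can also list the available metric names programmatically. The following sketch assumes that the endpoint serves the standard Prometheus text exposition format. The metric names in the sample are hypothetical and are not actual Gloo metrics.

```python
def list_metric_families(exposition):
    """Return {metric name: metric type} from the '# TYPE' lines of the text format."""
    families = {}
    for line in exposition.splitlines():
        parts = line.split()
        # A type line looks like: "# TYPE <metric_name> <type>"
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            families[parts[2]] = parts[3]
    return families


# Hypothetical sample of a metrics endpoint response, for illustration only.
SAMPLE_METRICS = """\
# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total{code="200"} 1027
example_requests_total{code="503"} 3
# HELP example_translation_seconds Hypothetical translation latency.
# TYPE example_translation_seconds histogram
example_translation_seconds_bucket{le="1"} 12
example_translation_seconds_bucket{le="+Inf"} 15
example_translation_seconds_sum 9.5
example_translation_seconds_count 15
"""

if __name__ == "__main__":
    for name, kind in list_metric_families(SAMPLE_METRICS).items():
        print(f"{name} ({kind})")
```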
For a detailed list of service mesh and Gloo Platform-specific metrics, see the following links:
Overview of available alerts
To monitor the Gloo Platform components more easily, Gloo automatically sets up alerts in Prometheus for certain Gloo Platform metrics and observes these metrics over time. These metrics include:
- Latency: The time it takes to translate or reconcile Gloo resources in your environment.
- Gloo agents: Monitors the connection between the Gloo management server and workload clusters.
- Translation errors: Reports the Gloo resources that cannot be correctly translated into Istio or Cilium resources.
- Redis errors: Lists connection failures between the Gloo management server and the Redis database where all of the Gloo configuration is stored.
Alerts are automatically surfaced in the operations dashboard, but you can also access them directly in the Prometheus UI. To find a detailed overview of the alerts that are automatically configured in Gloo, see Explore default alerts.
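Alerts can also be read from the Prometheus HTTP API at `/api/v1/alerts`. The following sketch filters a response down to the alerts that are currently firing. The alert names in the sample are hypothetical placeholders, not the names of the default Gloo alerts.

```python
import json


def firing_alerts(body):
    """Return the sorted names of alerts in the 'firing' state from an alerts response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("alerts request failed")
    return sorted(
        alert["labels"].get("alertname", "<unnamed>")
        for alert in payload["data"]["alerts"]
        if alert.get("state") == "firing"
    )


# Hand-written sample in the /api/v1/alerts response format; alert names are hypothetical.
SAMPLE_ALERTS = json.dumps({
    "status": "success",
    "data": {
        "alerts": [
            {
                "labels": {"alertname": "ExampleTranslationLatencyHigh", "severity": "warning"},
                "annotations": {},
                "state": "firing",
                "activeAt": "2024-01-01T00:00:00Z",
                "value": "1e+00",
            },
            {
                "labels": {"alertname": "ExampleRedisConnectionErrors", "severity": "warning"},
                "annotations": {},
                "state": "pending",
                "activeAt": "2024-01-01T00:05:00Z",
                "value": "1e+00",
            },
        ]
    },
})

if __name__ == "__main__":
    print(firing_alerts(SAMPLE_ALERTS))
```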
Observability tools that use Prometheus metrics
Several Gloo observability tools read metrics from Prometheus and present them in a form that is easier to consume.
- Gloo UI: The Gloo UI monitors certain workload metrics in Prometheus and how they change over time. This data is shown in the Gloo UI Graph tab. For more information, see Monitored metrics in the Gloo UI.
- Operations dashboard: The Gloo Platform operations dashboard uses the data in Prometheus to visualize key metrics and critical alerts for Gloo Platform components. For more information, see Operations dashboard.