Review options to customize the default Prometheus setup.
Bring your own Prometheus
The built-in Prometheus server is the recommended approach for scraping metrics from Gloo components and feeding them to the Gloo UI graph to visualize workload communication. When you enable the built-in Prometheus during your installation, it is set up with a custom scraping configuration that ensures that only a minimum set of metrics and metric labels are collected.
However, the Prometheus pod is not set up with persistent storage, so metrics are lost when the pod restarts or when the deployment is scaled down. Additionally, you might want to replace the built-in Prometheus server with your organization's own Prometheus-compatible solution or time series database that is hardened for production and integrates with other applications that might exist outside the cluster where your API Gateway runs. Review the options that you have for bringing your own Prometheus server.
Replace the built-in Prometheus with your own
If you have an existing Prometheus instance that you want to use in place of the built-in Prometheus server, you configure Gloo Network to disable the built-in Prometheus instance and to use your production Prometheus instance instead. This setup is a reasonable approach if you want to scrape raw Istio metrics to collect them in your production Prometheus instance. However, you cannot control the number of metrics that you collect, or federate and aggregate the metrics before you scrape them with your production Prometheus.
To query the metrics and compute results, you use the compute resources of the cluster where your production Prometheus instance runs. Note that depending on the number and complexity of the queries that you plan to run in your production Prometheus instance, especially if you use the instance to consolidate metrics of other apps as well, your production instance might get overloaded or start to respond more slowly.
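One common way to reduce this query load is to precompute expensive aggregations with Prometheus recording rules so that dashboards read cheap, pre-aggregated series. The following fragment is an illustrative sketch only (the rule name and grouping label are assumptions, not part of any Gloo installation):

```yaml
# Illustrative recording rule: precompute the per-workload Istio request
# rate so that dashboard queries read the precomputed series instead of
# re-aggregating raw istio_requests_total samples on every refresh.
groups:
- name: istio-aggregations
  rules:
  - record: workload:istio_requests:rate5m
    expr: sum(rate(istio_requests_total[5m])) by (destination_workload)
```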
To have more granular control over the metrics that you want to collect, it is recommended to set up additional receivers, processors, and exporters in the Gloo telemetry pipeline to make these metrics available to the Gloo telemetry gateway. Then, forward these metrics to the third-party solution or time series database of your choice, such as your production Prometheus or Datadog instance. For more information, see the Prometheus receiver and Prometheus exporter OpenTelemetry documentation.
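As a rough sketch of what such a pipeline can look like, the following OpenTelemetry collector fragment scrapes metrics with a Prometheus receiver and forwards them with a Prometheus remote-write exporter. The job name, service discovery settings, and endpoint are assumptions for illustration, not the exact keys of the Gloo telemetry pipeline:

```yaml
# Sketch of an OpenTelemetry collector metrics pipeline (assumed names):
# scrape pods with the Prometheus receiver, then forward the collected
# metrics to a Prometheus-compatible backend via remote write.
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: istio-workloads
        kubernetes_sd_configs:
        - role: pod
exporters:
  prometheusremotewrite:
    endpoint: http://prometheus.monitoring:9090/api/v1/write
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]
```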
Get your current installation Helm values, and save them in a file.
```shell
helm get values gloo-platform -n gloo-mesh -o yaml > gloo-network-single.yaml
open gloo-network-single.yaml
```
In your Helm values file, disable the default Prometheus instance and enter the details of your custom Prometheus server instead. Make sure that the instance runs Prometheus version 2.16.0 or later. In the `prometheusUrl` field, enter the URL that your Prometheus instance is exposed on, such as `http://kube-prometheus-stack-prometheus.monitoring:9090`. You can get this value from the `--web.external-url` field in your Prometheus Helm values file or by selecting **Status > Command-Line Flags** in the Prometheus UI. Do not use the FQDN for the Prometheus URL.
```yaml
prometheus:
  enabled: false
common:
  prometheusUrl: <Prometheus_server_URL_and_port>
```
Upgrade your installation by using your updated Helm values file.
```shell
helm upgrade gloo-platform gloo-network/gloo-network \
  --namespace gloo-mesh \
  --version 2.5.0 \
  --values gloo-network-single.yaml
```
Run another Prometheus instance alongside the built-in one
You might want to run multiple Prometheus instances in your cluster that each capture metrics for certain components. For example, you might use the built-in Prometheus server in Gloo Network to capture metrics for the Gloo components, and use a different Prometheus server for your own apps’ metrics. While this setup is supported, make sure that you check the scraping configuration for each of your Prometheus instances to prevent metrics from being scraped multiple times.
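One way to keep the two instances from overlapping is to restrict your own Prometheus server to the namespaces where your apps run, so that it never discovers the Gloo component pods that the built-in server already scrapes. The job name and namespace below are assumptions for illustration:

```yaml
# Sketch of a scrape_configs entry for your own Prometheus instance
# (assumed names): limit pod discovery to your app namespaces so Gloo
# component metrics are not scraped a second time.
scrape_configs:
- job_name: my-apps
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names: [my-app-namespace]
```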