Scrape Gloo management server metrics
Gloo management server metrics are automatically scraped by the built-in Prometheus and can be viewed by accessing the Prometheus expression browser directly or by opening the Gloo operations dashboard in Grafana. However, these metrics are not available to the Gloo telemetry gateway by default.
You can configure your Gloo OTel pipeline to scrape these metrics from the Gloo management server and make these metrics available to the Gloo telemetry gateway so that you can forward them to third-party services.
- Add the following configuration to your values file for the Gloo Network installation Helm chart. This configuration sets up a scraping job for the Gloo management server. In addition, regular expressions are used to drop or manipulate metrics. The data is then forwarded to the default prometheus exporter.

      telemetryGatewayCustomization:
        extraReceivers:
          prometheus/gloo-mgmt:
            config:
              scrape_configs:
              - job_name: gloo-mesh-mgmt-server-otel
                honor_labels: true
                kubernetes_sd_configs:
                - namespaces:
                    names:
                    - gloo-mesh
                  role: pod
                relabel_configs:
                - action: keep
                  regex: gloo-mesh-mgmt-server|gloo-mesh-ui
                  source_labels:
                  - __meta_kubernetes_pod_label_app
                - action: keep
                  regex: true
                  source_labels:
                  - __meta_kubernetes_pod_annotation_prometheus_io_scrape
                - action: drop
                  regex: true
                  source_labels:
                  - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
                - action: replace
                  regex: (https?)
                  source_labels:
                  - __meta_kubernetes_pod_annotation_prometheus_io_scheme
                  target_label: __scheme__
                - action: replace
                  regex: (.+)
                  source_labels:
                  - __meta_kubernetes_pod_annotation_prometheus_io_path
                  target_label: __metrics_path__
                - action: replace
                  regex: ([^:]+)(?::\d+)?;(\d+)
                  replacement: $$1:$$2
                  source_labels:
                  - __address__
                  - __meta_kubernetes_pod_annotation_prometheus_io_port
                  target_label: __address__
                - action: labelmap
                  regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
                  replacement: __param_$$1
                - action: labelmap
                  regex: __meta_kubernetes_pod_label_(.+)
                - action: replace
                  source_labels:
                  - __meta_kubernetes_namespace
                  target_label: namespace
                - action: replace
                  source_labels:
                  - __meta_kubernetes_pod_name
                  target_label: pod
                - action: drop
                  regex: Pending|Succeeded|Failed|Completed
                  source_labels:
                  - __meta_kubernetes_pod_phase
        extraPipelines:
          metrics/gloo-mgmt:
            receivers:
            - prometheus/gloo-mgmt # Prometheus scrape config for mgmt-server
            processors:
            - memory_limiter
            - batch
            exporters:
            - prometheus # Prometheus deployed by Gloo.
- Follow the Upgrade instructions in the Gloo Mesh Enterprise documentation to apply the changes in your environment. For multicluster environments, upgrade only the Gloo management server with your updated values file, as no upgrade of the Gloo agent is required.
- Verify that the configmap for the telemetry gateway is updated with the values that you set in the values file.

      kubectl get configmap gloo-telemetry-gateway-config -n gloo-mesh -o yaml
- Perform a rollout restart of the gateway deployment to force your configmap changes to be applied in the telemetry gateway pods.

      kubectl rollout restart -n gloo-mesh deployment/gloo-telemetry-gateway
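
The configmap check in the steps above can be scripted instead of read by eye. The following is a minimal sketch, not part of Gloo: `has_gloo_mgmt_scrape_config` is a hypothetical helper that reads a configmap dump on stdin and succeeds only if both the scrape job name and the pipeline name from this guide appear in it. It assumes that the rendered configmap contains those names literally.

```shell
#!/bin/sh
# Hypothetical helper (not a Gloo or kubectl command).
# Reads text on stdin and returns 0 only if it mentions both the scrape job
# and the extra pipeline that the Helm values in this guide define.
has_gloo_mgmt_scrape_config() {
  input=$(cat)
  # Assumption: both names appear verbatim in the rendered collector config.
  printf '%s\n' "$input" | grep -q 'gloo-mesh-mgmt-server-otel' &&
    printf '%s\n' "$input" | grep -q 'metrics/gloo-mgmt'
}
```

For example, you might pipe the output of `kubectl get configmap gloo-telemetry-gateway-config -n gloo-mesh -o yaml` into `has_gloo_mgmt_scrape_config` and check its exit code. After the rollout restart, `kubectl rollout status -n gloo-mesh deployment/gloo-telemetry-gateway` waits until the new telemetry gateway pods are ready.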
Gloo management server metrics overview
The following metrics are scraped from the Gloo management server and made available to the Gloo telemetry gateway when you enable the Helm configuration in this guide.
Metric | Description |
---|---|
gloo_mesh_reconciler_time_sec_bucket | The time the Gloo management server needs to sync with the Gloo agents in the workload clusters to apply the translated resources. This metric is captured in seconds for the following intervals (buckets): 1, 2, 5, 10, 15, 30, 50, 80, 100, and 200. |
gloo_mesh_redis_sync_err | The number of times the Gloo management server could not read from or write to the Gloo Redis instance. |
gloo_mesh_translation_time_sec_bucket | The time the Gloo management server needs to translate Gloo resources into Istio, Envoy, or Cilium resources. This metric is captured in seconds for the following intervals (buckets): 1, 2, 5, 10, 15, 20, 25, 30, 45, 60, and 120. |
gloo_mesh_translator_concurrency | The number of translation operations that the Gloo management server can perform at the same time. |
relay_pull_clients_connected | The number of Gloo agents that are connected to the Gloo management server. |
relay_push_clients_warmed | The number of Gloo agents that are connected to the Gloo management server. |
solo_io_gloo_gateway_license | The number of minutes until the Gloo Gateway license expires. To prevent your management server from crashing when the license expires, make sure to upgrade the license before expiration. |
solo_io_gloo_mesh_license | The number of minutes until the Gloo Mesh Enterprise license expires. To prevent your management server from crashing when the license expires, make sure to upgrade the license before expiration. |
solo_io_gloo_network_license | The number of minutes until the Gloo Network license expires. To prevent your management server from crashing when the license expires, make sure to upgrade the license before expiration. |
translation_error | The number of translation errors that were reported by the Gloo management server. |
translation_warning | The number of translation warnings that were reported by the Gloo management server. |
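
Because the license metrics report minutes, it can help to convert them to days when you check how close a license is to expiring. The following is a minimal sketch under stated assumptions: `license_days_left` is a hypothetical helper, the input is Prometheus text-format samples on stdin (one `name{labels} value` sample per line with no trailing timestamp), and the label sets on these series are not specified in this guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gloo).
# Reads Prometheus text-format samples on stdin and prints, for each
# solo_io_gloo_*_license series, the metric name and the whole days left.
license_days_left() {
  awk '
    /^solo_io_gloo_(gateway|mesh|network)_license/ {
      name = $1
      sub(/[{].*/, "", name)              # strip the label set, if any
      printf "%s %d\n", name, $NF / 1440  # value is in minutes; 1440 per day
    }
  '
}
```

For a sample such as `solo_io_gloo_mesh_license{cluster="a"} 43200`, where the `cluster` label is made up for illustration, the helper prints `solo_io_gloo_mesh_license 30`. Comment lines such as `# HELP` are ignored because they do not start with the metric name.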