Scrape Gloo management server metrics

The built-in Prometheus instance automatically scrapes Gloo management server metrics. You can view these metrics in the Prometheus expression browser or in the Gloo operations dashboard in Grafana. However, by default, these metrics are not available to the Gloo telemetry gateway.

To forward these metrics to third-party services, you can configure the Gloo OpenTelemetry (OTel) pipeline to scrape them from the Gloo management server and make them available to the Gloo telemetry gateway.

  1. Add the following configuration to the values file for your Gloo Network installation Helm chart. This configuration adds a Prometheus receiver with a scrape job for the Gloo management server and Gloo UI pods in the gloo-mesh namespace. Relabeling rules, written as regular expressions, keep or drop scrape targets and rewrite their labels. A new metrics pipeline then forwards the scraped data to the default prometheus exporter.

    telemetryGatewayCustomization:
      extraReceivers:
        prometheus/gloo-mgmt:
          config:
            scrape_configs:
            - job_name: gloo-mesh-mgmt-server-otel
              honor_labels: true
              kubernetes_sd_configs:
              - namespaces:
                  names:
                  - gloo-mesh
                role: pod
              relabel_configs:
              - action: keep
                regex: gloo-mesh-mgmt-server|gloo-mesh-ui
                source_labels:
                - __meta_kubernetes_pod_label_app
              - action: keep
                regex: true
                source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_scrape
              - action: drop
                regex: true
                source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
              - action: replace
                regex: (https?)
                source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_scheme
                target_label: __scheme__
              - action: replace
                regex: (.+)
                source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_path
                target_label: __metrics_path__
              - action: replace
                regex: ([^:]+)(?::\d+)?;(\d+)
                # $$ escapes $ for the OTel collector, so Prometheus receives $1:$2.
                replacement: $$1:$$2
                source_labels:
                - __address__
                - __meta_kubernetes_pod_annotation_prometheus_io_port
                target_label: __address__
              - action: labelmap
                regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
                replacement: __param_$$1
              - action: labelmap
                regex: __meta_kubernetes_pod_label_(.+)
              - action: replace
                source_labels:
                - __meta_kubernetes_namespace
                target_label: namespace
              - action: replace
                source_labels:
                - __meta_kubernetes_pod_name
                target_label: pod
              - action: drop
                regex: Pending|Succeeded|Failed|Completed
                source_labels:
                - __meta_kubernetes_pod_phase
      extraPipelines:
        metrics/gloo-mgmt:
          receivers:
          - prometheus/gloo-mgmt # Prometheus scrape config for mgmt-server
          processors:
          - memory_limiter
          - batch
          exporters:
          - prometheus # Prometheus deployed by Gloo.
    
  2. Follow the upgrade instructions in the Gloo Mesh Enterprise documentation to apply the changes to your environment. In multicluster environments, upgrade only the Gloo management server with your updated values file; the Gloo agents do not need to be upgraded.

  3. Verify that the telemetry gateway ConfigMap includes the receiver and pipeline that you added to your values file.

    kubectl get configmap gloo-telemetry-gateway-config -n gloo-mesh -o yaml
    
  4. Restart the telemetry gateway deployment so that the gateway pods load the updated ConfigMap.

    kubectl rollout restart -n gloo-mesh deployment/gloo-telemetry-gateway
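The relabeling rules in step 1 decide which pods become scrape targets and which address is scraped. The following POSIX shell sketch emulates only the filtering rules (app label, prometheus.io/scrape, prometheus.io/scrape_slow, pod phase) and the __address__ rewrite for a single pod, so that their combined effect is easier to reason about. The helper names is_scrape_target and rewrite_address are hypothetical and are not part of Gloo or Prometheus; this is not a full implementation of Prometheus relabeling.

```shell
#!/bin/sh
# Sketch: emulate the keep/drop and __address__ relabeling rules of the
# gloo-mesh-mgmt-server-otel scrape job for a single pod.
# Hypothetical helpers, not part of Gloo, Prometheus, or the OTel collector.

# Usage: is_scrape_target APP_LABEL SCRAPE_ANNOTATION SCRAPE_SLOW_ANNOTATION POD_PHASE
# Prints "keep" or "drop". Prometheus anchors relabel regexes, so a pattern
# must match the whole value; a missing label counts as an empty string.
is_scrape_target() {
  app=$1 scrape=$2 slow=$3 phase=$4
  case $app in
    gloo-mesh-mgmt-server|gloo-mesh-ui) ;;  # keep: app label
    *) echo drop; return ;;
  esac
  if [ "$scrape" != true ]; then echo drop; return; fi  # keep: prometheus.io/scrape
  if [ "$slow" = true ]; then echo drop; return; fi     # drop: prometheus.io/scrape_slow
  case $phase in
    Pending|Succeeded|Failed|Completed) echo drop; return ;;  # drop: pod phase
  esac
  echo keep
}

# Usage: rewrite_address ADDRESS PORT_ANNOTATION
# Joins both source labels with ";" (the default separator) and applies
# ([^:]+)(?::\d+)?;(\d+) with replacement $1:$2. If the regex does not
# match, for example when the port annotation is missing, the address
# stays unchanged.
rewrite_address() {
  joined="$1;$2"
  if printf '%s\n' "$joined" | grep -Eq '^[^:]+(:[0-9]+)?;[0-9]+$'; then
    printf '%s\n' "$joined" | sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
  else
    printf '%s\n' "$1"
  fi
}
```

For example, is_scrape_target gloo-mesh-mgmt-server true '' Running prints keep, and rewrite_address 10.0.0.5:8080 9091 prints 10.0.0.5:9091. The IP address and ports are made-up sample values.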
    

Gloo management server metrics overview

The following metrics are scraped from the Gloo management server and made available to the Gloo telemetry gateway after you apply the Helm configuration in this guide.

Metric | Description
gloo_mesh_reconciler_time_sec_bucket | The time, in seconds, that the Gloo management server needs to sync with the Gloo agents in the workload clusters to apply the translated resources. This histogram uses the following buckets: 1, 2, 5, 10, 15, 30, 50, 80, 100, and 200.
gloo_mesh_redis_sync_err | The number of times the Gloo management server could not read from or write to the Gloo Redis instance.
gloo_mesh_translation_time_sec_bucket | The time, in seconds, that the Gloo management server needs to translate Gloo resources into Istio, Envoy, or Cilium resources. This histogram uses the following buckets: 1, 2, 5, 10, 15, 20, 25, 30, 45, 60, and 120.
gloo_mesh_translator_concurrency | The number of translation operations that the Gloo management server can perform at the same time.
relay_pull_clients_connected | The number of Gloo agents that are connected to the Gloo management server.
relay_push_clients_warmed | The number of Gloo agents that are ready to accept updates from the Gloo management server.
solo_io_gloo_gateway_license | The number of minutes until the Gloo Gateway license expires. To prevent your management server from crashing when the license expires, update the license before it expires.
solo_io_gloo_mesh_license | The number of minutes until the Gloo Mesh Enterprise license expires. To prevent your management server from crashing when the license expires, update the license before it expires.
solo_io_gloo_network_license | The number of minutes until the Gloo Network license expires. To prevent your management server from crashing when the license expires, update the license before it expires.
translation_error | The number of translation errors that were reported by the Gloo management server.
translation_warning | The number of translation warnings that were reported by the Gloo management server.
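The license gauges in the table report minutes until expiry, which is easier to act on when converted to days. The following shell sketch reads Prometheus text-format samples on standard input, picks out the three solo_io_gloo_*_license gauges, converts minutes to days (1,440 minutes per day), and flags any license below a threshold. The function name check_licenses and the default threshold of 30 days are illustrative assumptions, and the parser assumes that each sample line ends with its value (no trailing timestamp).

```shell
#!/bin/sh
# Sketch: report days remaining on the Gloo license gauges.
# Reads Prometheus text-format samples on stdin. Assumes each sample line
# ends with its value (no trailing timestamp).
# Usage: check_licenses [THRESHOLD_DAYS]   (hypothetical helper; default 30)
check_licenses() {
  awk -v threshold_days="${1:-30}" '
    {
      name = $1
      sub(/[{].*/, "", name)       # strip the label set, if any
    }
    name ~ /^solo_io_gloo_(gateway|mesh|network)_license$/ {
      days = $NF / 1440            # gauges report minutes; 1440 minutes per day
      status = (days < threshold_days + 0) ? "RENEW" : "ok"
      printf "%s %.1f %s\n", name, days, status
    }
  '
}
```

To use it, pipe a scrape of whichever endpoint exposes these metrics in your environment into the function, for example curl -s "$METRICS_URL" | check_licenses 14, where METRICS_URL is a placeholder that you set yourself. HELP and TYPE comment lines and unrelated metrics are ignored.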