You are viewing the documentation for Solo Enterprise for Istio, formerly known as Gloo Mesh (OSS APIs).

Prometheus

Learn how to deploy a standalone Prometheus instance and configure it to scrape Istio metrics from your cluster.

The built-in telemetry pipeline writes a curated set of Istio metrics to ClickHouse. If you want access to the full set of Istio metrics, persistent metric storage that survives pod restarts, or integration with an existing Prometheus stack, you can deploy a standalone Prometheus instance alongside the Solo UI.

Deploy Prometheus

If you do not have an existing Prometheus instance, use the Prometheus community Helm chart to deploy Prometheus in your cluster.

  1. Add the Prometheus community Helm chart repository.

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update

  2. Install the kube-prometheus-stack chart.

    helm upgrade --install kube-prometheus-stack \
      prometheus-community/kube-prometheus-stack \
      --namespace monitoring \
      --create-namespace

  3. Verify that the pods are running.

    kubectl get pods -n monitoring

Configure scrape targets

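Add the following scrape jobs to your Prometheus configuration. These jobs target the same Istio components as the built-in telemetry pipeline: the istiod control plane, the ztunnel DaemonSet, and the waypoint and east-west gateway pods. Each job keeps only pods that carry the prometheus.io/scrape: "true" annotation, builds the scrape address from the pod IP and the prometheus.io/port annotation, and drops pods in the Pending, Succeeded, or Failed phase.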

scrape_configs:
  # Scrape istiod control plane metrics
  - job_name: istiod
    honor_labels: true
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: pilot|istiod
        source_labels:
          - __meta_kubernetes_pod_label_istio
      - action: keep
        regex: "true"
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
        replacement: '[$2]:$1'
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_port
          - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: replace
        regex: (\d+);((([0-9]+?)(\.|$)){4})
        replacement: $2:$1
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_port
          - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
        replacement: __param_$1
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: pod_name
      - action: drop
        regex: Pending|Succeeded|Failed|Completed
        source_labels:
          - __meta_kubernetes_pod_phase

  # Scrape ztunnel metrics (ambient mode L4 proxy)
  - job_name: ztunnel
    honor_labels: true
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: ztunnel
        source_labels:
          - __meta_kubernetes_pod_label_app
      - action: keep
        regex: "true"
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
        replacement: '[$2]:$1'
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_port
          - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: replace
        regex: (\d+);((([0-9]+?)(\.|$)){4})
        replacement: $2:$1
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_port
          - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
        replacement: __param_$1
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: pod_name
      - action: drop
        regex: Pending|Succeeded|Failed|Completed
        source_labels:
          - __meta_kubernetes_pod_phase

  # Scrape waypoint and east-west gateway metrics
  - job_name: gateway
    honor_labels: true
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: .+
        source_labels:
          - __meta_kubernetes_pod_label_gateway_networking_k8s_io_gateway_name
      - action: keep
        regex: "true"
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
        replacement: '[$2]:$1'
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_port
          - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: replace
        regex: (\d+);((([0-9]+?)(\.|$)){4})
        replacement: $2:$1
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_port
          - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
        replacement: __param_$1
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: pod_name
      - action: drop
        regex: Pending|Succeeded|Failed|Completed
        source_labels:
          - __meta_kubernetes_pod_phase
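
The two rules that write __address__ are the least obvious part of each job. As a rough illustration only, the following shell sketch emulates the IPv4 rule with sed. It assumes the default Prometheus relabeling behavior: the source_labels values are joined with ";" and the regex is anchored at both ends. Because sed -E does not support \d or the lazy +? quantifier, the pattern is adapted to use [0-9]+. The function name rewrite_address and the sample port and IP are hypothetical.

```shell
# Emulate the IPv4 __address__ relabel rule with sed (an approximation,
# not the Prometheus regex engine). Input is "<port>;<pod IP>", which is
# how Prometheus joins the two source_labels values by default.
rewrite_address() {
  printf '%s\n' "$1" | sed -E 's/^([0-9]+);((([0-9]+)(\.|$)){4})$/\2:\1/'
}

# Port annotation 15020 joined with pod IP 10.244.1.7 (example values).
rewrite_address '15020;10.244.1.7'   # prints 10.244.1.7:15020
```

A joined value that contains an IPv6 pod IP does not match this pattern and passes through unchanged. This mirrors how a non-matching replace rule leaves the label untouched, which is why each job carries a separate rule for IPv6 addresses.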

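To apply this configuration, add the scrape_configs block to the Prometheus configuration file, which is typically stored in a ConfigMap. If you installed with kube-prometheus-stack, the Prometheus Operator generates that file for you, so instead pass the scrape jobs as a list, without the top-level scrape_configs key, under prometheus.prometheusSpec.additionalScrapeConfigs in your Helm values.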
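
For the kube-prometheus-stack case, a values file might be written as in the following sketch. The file name istio-scrape-values.yaml is hypothetical, and the ztunnel job is shortened to three rules to show the nesting. Treat it as a starting point, not a complete configuration.

```shell
# Write a Helm values file (hypothetical name). The quoted 'EOF' delimiter
# stops the shell from expanding $1 and $2 in the relabel replacement.
cat > istio-scrape-values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: ztunnel
        honor_labels: true
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - action: keep
            regex: ztunnel
            source_labels:
              - __meta_kubernetes_pod_label_app
          - action: keep
            regex: "true"
            source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape
          - action: replace
            regex: (\d+);((([0-9]+?)(\.|$)){4})
            replacement: $2:$1
            source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              - __meta_kubernetes_pod_ip
            target_label: __address__
          # Paste the remaining relabel_configs rules for this job here, then
          # add the istiod and gateway jobs as further list items.
EOF

# Apply the values to the existing release (requires cluster access):
# helm upgrade --install kube-prometheus-stack \
#   prometheus-community/kube-prometheus-stack \
#   --namespace monitoring \
#   -f istio-scrape-values.yaml
```

After the upgrade, the new jobs should appear on the Targets page of the Prometheus UI once the configuration reloads.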

Collected metrics

Unlike the built-in pipeline, which filters to a curated set of metrics before writing to ClickHouse, a standalone Prometheus instance collects the full set of metrics exposed by each target. This includes all standard Istio proxy, ztunnel, waypoint, and istiod metrics, as well as any custom metrics exposed by your workloads.

For a description of the metrics that the built-in pipeline curates, see Metrics.

Query metrics

  1. Port-forward to the Prometheus server. If you installed with kube-prometheus-stack, the service is in the monitoring namespace.

    kubectl port-forward svc/kube-prometheus-stack-prometheus -n monitoring 9090:9090

  2. Open http://localhost:9090 in your browser and use the Graph tab to run queries.

    The following examples show common queries for Istio metrics.

    Request rate per destination service over the last 5 minutes.

    sum(rate(istio_requests_total[5m])) by (destination_service_name, namespace)

    Proportion of 5xx responses per destination service.

    sum(rate(istio_requests_total{response_code=~"5.."}[5m])) by (destination_service_name, namespace)
    /
    sum(rate(istio_requests_total[5m])) by (destination_service_name, namespace)

    P99 request latency in milliseconds per destination service.

    histogram_quantile(0.99,
      sum(rate(istio_request_duration_milliseconds_bucket[5m])) by (destination_service_name, namespace, le)
    )

    Istiod version and build information.

    pilot_info

    Ztunnel endpoints currently marked unhealthy.

    istio_outlier_detection_endpoints_unhealthy > 0
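
The same queries can also be run from a terminal through the Prometheus HTTP API. The following is a minimal sketch that assumes the port-forward from step 1 is still running and that curl is installed. The helper name prom_query and the PROM_URL variable are hypothetical conveniences, not part of any product CLI.

```shell
# Base URL of the port-forwarded Prometheus server. Override PROM_URL
# if you forward a different local port.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant query against /api/v1/query and print the JSON response.
# -G sends the URL-encoded query as a GET parameter; -f makes curl exit
# non-zero on HTTP errors.
prom_query() {
  curl -fsS -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Example call (requires the port-forward to be running):
# prom_query 'sum(rate(istio_requests_total[5m])) by (destination_service_name, namespace)'
```

A successful response is a JSON document with a status field set to success and the matching series under data.result.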

Next steps

After Prometheus is collecting Istio metrics, you can visualize them with Grafana. For steps on importing pre-built Istio dashboards, see Grafana.