Customization options

Review options to customize the default Prometheus setup.

Bring your own Prometheus

The built-in Prometheus server is the recommended approach for scraping metrics from Gloo components and feeding them to the Gloo UI Graph to visualize workload communication. When you enable the built-in Prometheus during your installation, it is set up with a custom scraping configuration that ensures that only a minimum set of metrics and metric labels are collected.

However, the Prometheus pod is not set up with persistent storage and metrics are lost when the pod restarts or when the deployment is scaled down. Additionally, you might want to replace the built-in Prometheus server and use your organization’s own Prometheus-compatible solution or time series database that is hardened for production and integrates with other applications that might exist outside the cluster where your API Gateway runs. Review the options that you have for bringing your own Prometheus server.

Forward metrics to the built-in Prometheus in OpenShift

OpenShift comes with built-in Prometheus instances that you can use to monitor metrics for your workloads. Instead of using the built-in Prometheus that Gloo Mesh Enterprise provides, you might want to forward the metrics from the telemetry gateway and collector agents to the OpenShift Prometheus to have a single observability layer for all of your workloads in the cluster.

For more information, see Forward metrics to OpenShift.

Replace the built-in Prometheus with your own

If you have an existing Prometheus instance that you want to use in place of the built-in Prometheus server, you can configure Gloo Mesh Enterprise to disable the built-in Prometheus instance and to use your production Prometheus instance instead. This setup is a reasonable approach if you want to scrape raw Istio metrics and collect them in your production Prometheus instance. However, you cannot control the number of metrics that you collect, or federate and aggregate the metrics before you scrape them with your production Prometheus.

To query the metrics and compute results, you use the compute resources of the cluster where your production Prometheus instance runs. Note that depending on the number and complexity of the queries that you plan to run in your production Prometheus instance, especially if you use the instance to consolidate metrics of other apps as well, your production instance might get overloaded or start to respond more slowly.

To have more granular control over the metrics that you want to collect, it is recommended to set up additional receivers, processors, and exporters in the Gloo telemetry pipeline to make these metrics available to the Gloo telemetry gateway. Then, forward these metrics to the third-party solution or time series database of your choice, such as your production Prometheus or Datadog instance. For more information, see the Prometheus receiver and Prometheus exporter OpenTelemetry documentation.
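As a rough illustration of the receiver, processor, and exporter chain, the following sketch writes a generic OpenTelemetry Collector configuration to a local file. It is not the Gloo Helm values format: how these sections map into the Gloo telemetry pipeline settings is not shown here. The file name, job name, scrape target, and ports are all assumptions for illustration.

```shell
# Write a generic OpenTelemetry Collector configuration to a local file.
# The file name, job name, target, and ports are illustrative assumptions.
cat > otel-collector-sketch.yaml <<'EOF'
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: example-app                      # hypothetical job name
          scrape_interval: 30s
          static_configs:
            - targets: ["example-app.default:8080"]  # hypothetical target
processors:
  batch: {}
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"   # address on which the collector exposes metrics
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheus]
EOF
```

In this sketch, the Prometheus receiver scrapes a target, the batch processor groups the data, and the Prometheus exporter exposes the result for another Prometheus-compatible system to scrape.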

Get your current installation Helm values, and save them in a file.

helm get values gloo-platform -n gloo-mesh -o yaml > gloo-single.yaml
open gloo-single.yaml

In your Helm values file, disable the default Prometheus instance and instead enter the details of your custom Prometheus server. Make sure that the instance runs Prometheus version 2.16.0 or later. In the prometheusUrl field, enter the Prometheus URL that your instance is exposed on, such as http://kube-prometheus-stack-prometheus.monitoring:9090. You can get this value from the --web.external-url field in your Prometheus Helm values file or by selecting Status > Command-Line-Flags from the Prometheus UI. Do not use the FQDN for the Prometheus URL.
```
prometheus:
  enabled: false
common:
  prometheusUrl: <Prometheus_server_URL_and_port>
```
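
To check the minimum version requirement before you upgrade, you might compare the version that your Prometheus instance reports against 2.16.0. The following is a minimal sketch, not an official check. It assumes that `curl` and a `sort` that supports `-V` are available, that your instance serves the `/api/v1/status/buildinfo` endpoint, and that `PROM_URL` (a variable name chosen for this example) holds your Prometheus URL. The `version_at_least` helper is hypothetical.

```shell
# Hypothetical helper (not part of Gloo or Prometheus): succeeds when
# version $1 is equal to or later than version $2. Relies on `sort -V`.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# PROM_URL is an assumed variable, such as the value that you plan to enter
# in the prometheusUrl field. The check is skipped if it is not set.
if [ -n "${PROM_URL:-}" ]; then
  version=$(curl -s "${PROM_URL}/api/v1/status/buildinfo" \
    | sed -n 's/.*"version":"\([^"]*\)".*/\1/p')
  if version_at_least "${version}" 2.16.0; then
    echo "Prometheus ${version} meets the minimum version"
  else
    echo "Prometheus ${version:-unknown} is older than 2.16.0 or could not be determined"
  fi
fi
```

The same comparison applies to the multicluster steps, because the version requirement is identical.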

Upgrade your installation by using your updated values file.

helm upgrade gloo-platform gloo-platform/gloo-platform \
  --namespace gloo-mesh \
  -f gloo-single.yaml \
  --version $GLOO_VERSION

Get your current values for the management cluster.

helm get values gloo-platform -n gloo-mesh -o yaml --kube-context $MGMT_CONTEXT > mgmt-plane.yaml
open mgmt-plane.yaml

In your Helm values file, disable the default Prometheus instance and instead enter the details of your custom Prometheus server. Make sure that the instance runs Prometheus version 2.16.0 or later. In the prometheusUrl field, enter the Prometheus URL that your instance is exposed on, such as http://kube-prometheus-stack-prometheus.monitoring:9090. You can get this value from the --web.external-url field in your Prometheus Helm values file or by selecting Status > Command-Line-Flags from the Prometheus UI. Do not use the FQDN for the Prometheus URL.
```
prometheus:
  enabled: false
common:
  prometheusUrl: <Prometheus_server_URL_and_port>
```

Upgrade the management cluster.

helm upgrade gloo-platform gloo-platform/gloo-platform \
  --kube-context $MGMT_CONTEXT \
  --namespace gloo-mesh \
  -f mgmt-plane.yaml \
  --version $GLOO_VERSION

Run another Prometheus instance alongside the built-in one

You might want to run multiple Prometheus instances in your cluster that each capture metrics for certain components. For example, you might use the built-in Prometheus server in Gloo Mesh Enterprise to capture metrics for the Gloo components, and use a different Prometheus server for your own apps’ metrics. While this setup is supported, make sure that you check the scraping configuration for each of your Prometheus instances to prevent metrics from being scraped multiple times.
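
One way to keep a second Prometheus instance from scraping the Gloo components again is to drop their namespace in that instance's relabeling rules. The following sketch writes such a scrape job to a local file. It assumes that the Gloo components run in the `gloo-mesh` namespace, as in the Helm commands on this page, and that your other Prometheus discovers pods through the `prometheus.io/scrape` annotation. The file name and job name are made up for this example.

```shell
# Write an example scrape job for a second, non-Gloo Prometheus instance.
# The file name and job name are illustrative assumptions.
cat > extra-prometheus-scrape-sketch.yaml <<'EOF'
scrape_configs:
  - job_name: kubernetes-pods-excluding-gloo
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods that opt in through the prometheus.io/scrape annotation.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Drop pods in the assumed Gloo namespace so that they are not scraped twice.
      - source_labels: [__meta_kubernetes_namespace]
        action: drop
        regex: gloo-mesh
EOF
```

If `promtool` is installed, you can validate the file with `promtool check config extra-prometheus-scrape-sketch.yaml` before you merge the job into your own configuration.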

In Gloo version 2.5.0, the prometheus.io/port: "<port_number>" annotation was removed from the Gloo management server and agent. However, the prometheus.io/scrape: true annotation is still present. If you have another Prometheus instance that runs in your cluster, and it is not set up with custom scraping jobs for the Gloo management server and agent, the instance automatically scrapes all ports on the management server and agent pods. This can lead to error messages in the management server and agent logs. Note that this issue is resolved in version 2.5.2. To resolve this issue in earlier patch versions, you can choose between the following options:

Add the prometheus.io/port: "<port_number>" annotation to the management server and agent pods by using the deployment override option in your Helm chart.

glooMgmtServer:
  deploymentOverrides:
    spec:
      template:
        metadata:
          annotations:
            # Annotation values must be strings, so the port is quoted.
            prometheus.io/port: "9091"
glooAgent:
  deploymentOverrides:
    spec:
      template:
        metadata:
          annotations:
            prometheus.io/port: "9093"

Configure your Prometheus server with the same scraping configuration that the built-in Prometheus server uses to capture metrics from the management server and agents. To get the scraping configuration of the built-in Prometheus, see Default Prometheus setup.

Remove high cardinality labels at creation time

To reduce the amount of data that is collected, you can modify how Istio metrics are recorded at creation time. With this setup, you can remove any unwanted cardinality labels before metrics are scraped by the built-in or your own custom Prometheus server.

Make sure to only remove labels that you do not need in any of your production queries, alerts, or dashboards. Removing labels from histograms can significantly reduce cardinality and the amount of data that you collect. For example, you might want to keep all the labels, including the high cardinality labels, of the istio_request_duration_milliseconds metric to monitor request latency for your workloads. However, collecting the same high cardinality labels in histograms such as istio_request_bytes_bucket or istio_response_bytes_bucket might not be important for your environment. After you apply the Envoy filter, high cardinality labels are permanently removed and cannot be recovered later.

Istio 1.18 and later

Use the Istio Telemetry API to customize how metrics are recorded.

Decide on the metrics that you want to remove labels from. To find an overview of the metric selectors that you can modify, see the Istio metric selector reference. You can start by looking at Istio histogram metrics, also referred to as distribution metrics. Histograms show the frequency distribution of data in a certain timeframe. While these metrics provide great insights and detail, they often come with lots of labels that lead to high cardinality. Note that these metric selectors correspond to the list of Istio Prometheus metrics that are collected. For example, the REQUEST_SIZE selector corresponds to the istio_request_bytes metric.
Decide on the labels that you want to remove from the metrics. For an overview of labels that are collected, see Labels. Note that this page lists the labels with their actual names, which you must specify as underscore-separated names in your Telemetry resource. For example, the “Response Flags” label is specified as response_flags.
Decide which mode of the collected metric you want to modify. For each metric, the mode defines how the metric is collected for a workload.
- CLIENT_AND_SERVER: Scenarios in which the workload is either the source or destination of the network traffic.
- CLIENT: Scenarios in which the workload is the source of the network traffic.
- SERVER: Scenarios in which the workload is the destination of the network traffic.

Configure an Istio Telemetry resource to remove specific labels. For example, this resource removes the response_flags label from the istio_request_bytes Prometheus metric by using the REQUEST_SIZE metric selector.

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: remove-labels
  namespace: istio-system
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - match:
            mode: CLIENT_AND_SERVER
            metric: REQUEST_SIZE
          tagOverrides:
            response_flags:
              operation: REMOVE

Istio 1.17 and earlier

Customize the Envoy filter in the Istio proxy deployment to modify how Istio metrics are recorded at creation time.

Decide which context of the Istio Envoy filter you want to modify. Each Istio release includes an Envoy filter that is named stats-filter-<istio_version> and that defines how metrics are collected for a workload. Depending on whether you modify the Envoy filter directly or use the Istio Helm chart to configure the filter, you can choose between the following contexts:
- SIDECAR_INBOUND or inboundSidecar: Used to collect metrics for traffic that is sent to a destination (reporter=destination).
- SIDECAR_OUTBOUND or outboundSidecar: Used to collect metrics for traffic that leaves a microservice (reporter=source).
- GATEWAY or gateway: Used to collect metrics for traffic that passes through the ingress gateway.
Decide on the metric labels you want to remove with your custom Envoy filter. To find an overview of metrics that are collected by default, see the Istio documentation. For an overview of labels that are collected, see Labels. You can start by looking at Istio histogram metrics, also referred to as distribution metrics. Histograms show the frequency distribution of data in a certain timeframe. While these metrics provide great insights and detail, they often come with lots of labels that lead to high cardinality.

Configure your Envoy filter to remove specific labels. To apply the same configuration across all of your Istio microservices, modify the filter in the Istio Helm chart. If you want to update the configuration for a particular workload only, you can patch the Envoy filter instead.

To find the name of the metric that you need to use in your filter configuration, see Metrics. Note that you must remove the istio_ prefix from the metric name before you add it to your filter configuration. For example, if you want to customize the request size metric, use request_bytes. To find an overview of available labels that you can remove, see Labels. Note that this page lists the labels with their actual names and not as the value that you need to provide in the Envoy filter or Helm chart. To find the corresponding label name value, refer to the Istio bootstrap config for your release.

Istio Helm chart: Upgrade your Helm installation and add the Envoy filter configuration.

helm --kube-context=${CLUSTER1} upgrade --install istio ./istio-/manifests/charts/istio-control/istio-discovery -n istio-system --values - <<EOF
global:
  ...
meshConfig:
  ...
pilot:
  ...
telemetry:
  v2:
    prometheus:
      configOverride:
        outboundSidecar:
          metrics:
          - name: request_bytes
            tags_to_remove:
            - destination_service
            - response_flags
          - name: response_bytes
            tags_to_remove:
            - destination_service
            - response_flags
        inboundSidecar:
          disable_host_header_fallback: true
          metrics:
          - name: request_bytes
            tags_to_remove:
            - destination_service
            - response_flags
          - name: response_bytes
            tags_to_remove:
            - destination_service
            - response_flags
        gateway:
          disable_host_header_fallback: true
          metrics:
          - name: request_bytes
            tags_to_remove:
            - destination_service
            - response_flags
          - name: response_bytes
            tags_to_remove:
            - destination_service
            - response_flags
EOF

Manually patch Envoy config: In the following example, the Envoy filter for the productpage service from the Istio Bookinfo app is modified. All other workloads in the cluster continue to use the default Istio Envoy configuration. Note that this example is specific to Istio version 1.14. If you use a different Istio version, refer to the Istio Envoy documentation.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: stats-filter-1.14-productpage
  namespace: bookinfo-frontends
spec:
  workloadSelector:
    labels:
      app: productpage
      version: v1
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
      proxy:
        proxyVersion: ^1\.14.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {"metrics":[{"name":"request_bytes","tags_to_remove":["destination_service","response_flags"]},{"name":"response_bytes","tags_to_remove":["destination_service","response_flags"]}]}
              root_id: stats_outbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: stats_outbound
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
      proxy:
        proxyVersion: ^1\.14.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {"disable_host_header_fallback":true,"metrics":[{"name":"request_bytes","tags_to_remove":["destination_service","response_flags"]},{"name":"response_bytes","tags_to_remove":["destination_service","response_flags"]}]}
              root_id: stats_inbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: stats_inbound
