Customization options
The built-in Prometheus server is a great way to gain insight into the network traffic that enters Gloo Gateway. However, the pod is not set up with persistent storage, so metrics are lost when the pod restarts or when the deployment is scaled down. Many organizations also run their own Prometheus-compatible solution or time series database that is hardened for production use and integrates with other applications that might exist outside the cluster.
To build a production-level Prometheus setup, you can choose between the following options:
- Replace the built-in Prometheus server with your own instance
- Remove high cardinality labels at creation time
Replace the built-in Prometheus server with your own instance
In this setup, you disable the built-in Prometheus server and configure Gloo to use your production Prometheus instance instead.
- Configure Gloo to disable the default Prometheus instance and instead connect to your custom Prometheus server. Make sure that the instance runs Prometheus version 2.16.0 or later. In the `prometheusUrl` field, enter the Prometheus URL that your instance is exposed on, such as `http://kube-prometheus-stack-prometheus.monitoring:9090`. You can get this value from the `--web.external-url` field in your Prometheus Helm values file or by selecting Status > Command-Line Flags from the Prometheus UI. Do not use the FQDN for the Prometheus URL.

  ```sh
  helm upgrade --install gloo-platform gloo-platform/gloo-platform \
    --namespace gloo-mesh \
    --version $GLOO_VERSION \
    --values gloo-gateway-single.yaml \
    --set common.cluster=$CLUSTER_NAME \
    --set licensing.glooGatewayLicenseKey=$GLOO_GATEWAY_LICENSE_KEY \
    --set prometheus.enabled=false \
    --set common.prometheusUrl=<Prometheus_server_URL_and_port>
  ```
  If you installed Gloo Gateway using the `gloo-mesh-enterprise`, `gloo-mesh-agent`, and other included Helm charts, or by using `meshctl` version 2.2 or earlier, these Helm charts are considered legacy. Migrate your legacy installation to the new `gloo-platform` Helm chart.

  ```sh
  helm upgrade --install gloo-mgmt gloo-mesh-enterprise/gloo-mesh-enterprise \
    --namespace gloo-mesh \
    --version $GLOO_VERSION \
    --values values-mgmt-plane-env.yaml \
    --set prometheus.enabled=false \
    --set prometheusUrl=<Prometheus_server_URL_and_port> \
    --set glooGatewayLicenseKey=${GLOO_GATEWAY_LICENSE_KEY} \
    --set global.cluster=$CLUSTER_NAME
  ```
  Make sure to include your Helm values when you upgrade, either as a configuration file in the `--values` flag or with `--set` flags. Otherwise, any previous custom values that you set might be overwritten. In single cluster setups, this might mean that your Gloo agent and ingress gateways are removed. To get your current values, such as for a release named `gloo-platform`, you can run `helm get values gloo-platform -n gloo-mesh > gloo-gateway-single.yaml`. For more information, see Get your Helm chart values in the upgrade guide.
- Configure your Prometheus server to scrape metrics from the Gloo management server endpoint `gloo-mesh-mgmt-server-admin.gloo-mesh:9091`. This setup might vary depending on the Prometheus server that you use. For example, if you use the Prometheus community chart, update the Helm `values.yaml` file as follows to scrape metrics from the Gloo management server.

  ```yaml
  serverFiles:
    prometheus.yml:
      scrape_configs:
      - job_name: gloo-mesh
        scrape_interval: 15s
        scrape_timeout: 10s
        static_configs:
        - targets:
          - gloo-mesh-mgmt-server-admin.gloo-mesh:9091
  ```
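To confirm that your custom instance scrapes the Gloo management server, you can port-forward the Prometheus service and check the health of the `gloo-mesh` job. The following commands are a minimal sketch: the `monitoring` namespace and the `kube-prometheus-stack-prometheus` service name come from the earlier example URL and are assumptions that you must adapt to your own installation.

```sh
# Port-forward your custom Prometheus server locally. The namespace and
# service name are assumptions; adjust them to match your installation.
kubectl -n monitoring port-forward svc/kube-prometheus-stack-prometheus 9090:9090 &

# Query the Prometheus HTTP API for the health of the gloo-mesh scrape job.
# A result value of "1" means that the Gloo management server endpoint is up
# and being scraped.
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=up{job="gloo-mesh"}'
```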
Remove high cardinality labels at creation time
To reduce the amount of data that is collected, you can customize the Envoy filter in the Istio proxy deployment to modify how Istio metrics are recorded at creation time. With this setup, you can remove any unwanted cardinality labels before metrics are scraped by the built-in Prometheus server.
Make sure to only remove labels that you do not need in any of your production queries, alerts, or dashboards. After you apply the Envoy filter, high cardinality labels are permanently removed and cannot be recovered later.
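Before you decide which labels to remove in the following steps, it can help to measure how much a label actually contributes to series cardinality. The following queries are a rough sketch against the built-in Prometheus server; the namespace, deployment name, metric, and label are examples and assumptions, not required values.

```sh
# Port-forward the built-in Prometheus server. The namespace and deployment
# name are assumptions; adjust them to match your Gloo installation.
kubectl -n gloo-mesh port-forward deploy/prometheus-server 9090:9090 &

# Count all series of the request size histogram buckets.
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=count(istio_request_bytes_bucket)'

# Count the series that would remain if the destination_service label were
# removed. The difference between the two counts is the cardinality that
# this label adds to the histogram.
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=count(count without (destination_service) (istio_request_bytes_bucket))'
```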
- Decide which context of the Istio Envoy filter you want to modify. Each Istio release includes an Envoy filter that is named `stats-filter-<istio_version>` and that defines how metrics are collected for a workload. Depending on whether you modify the Envoy filter directly or use the Istio Helm chart to configure the filter, you can choose between the following contexts:
  - `SIDECAR_INBOUND` or `inboundSidecar`: Used to collect metrics for traffic that is sent to a destination (`reporter=destination`).
  - `SIDECAR_OUTBOUND` or `outboundSidecar`: Used to collect metrics for traffic that leaves a microservice (`reporter=source`).
  - `GATEWAY` or `gateway`: Used to collect metrics for traffic that passes through the ingress gateway.
- Decide on the metric labels that you want to remove with your custom Envoy filter. For an overview of the metrics that are collected by default, see the Istio documentation. For an overview of the labels that are collected, see Labels. You can start by looking at Istio histogram metrics, also referred to as distribution metrics. Histograms show the frequency distribution of data in a certain timeframe. While these metrics provide great insight and detail, they often come with many labels that lead to high cardinality.

  Removing labels from histograms can significantly reduce cardinality and the amount of data that you collect. For example, you might want to keep all labels, including the high cardinality labels, of the `istio_request_duration_milliseconds` metric to monitor request latency for your workloads. However, collecting the same high cardinality labels in histograms such as `istio_request_bytes_bucket` or `istio_response_bytes_bucket` might not be important for your environment.
- Configure your Envoy filter to remove specific labels. To apply the same configuration across all of your Istio microservices, modify the filter in the Istio Helm chart. If you want to update the configuration for a particular workload only, you can patch the Envoy filter instead.

  To find the name of the metric that you need to use in your filter configuration, see Metrics. Note that you must remove the `istio_` prefix from the metric name before you add it to your filter configuration. For example, if you want to customize the request size metric, use `request_bytes`. For an overview of the available labels that you can remove, see Labels. Note that this page lists the labels with their actual names, not with the values that you need to provide in the Envoy filter or Helm chart. To find the corresponding label name value, refer to the Istio bootstrap config for your release.

  To apply the configuration across all workloads, upgrade your Helm installation and add the Envoy filter configuration.

  ```sh
  helm --kube-context=${CLUSTER1} upgrade --install istio-1.18.3 ./istio-1.18.3/manifests/charts/istio-control/istio-discovery -n istio-system --values - <<EOF
  global:
    ...
  meshConfig:
    ...
  pilot:
    ...
  telemetry:
    v2:
      prometheus:
        configOverride:
          outboundSidecar:
            metrics:
            - name: request_bytes
              tags_to_remove:
              - destination_service
              - response_flags
            - name: response_bytes
              tags_to_remove:
              - destination_service
              - response_flags
          inboundSidecar:
            disable_host_header_fallback: true
            metrics:
            - name: request_bytes
              tags_to_remove:
              - destination_service
              - response_flags
            - name: response_bytes
              tags_to_remove:
              - destination_service
              - response_flags
          gateway:
            disable_host_header_fallback: true
            metrics:
            - name: request_bytes
              tags_to_remove:
              - destination_service
              - response_flags
            - name: response_bytes
              tags_to_remove:
              - destination_service
              - response_flags
  EOF
  ```
  Alternatively, to change how metrics are recorded for a particular workload only, patch that workload's Envoy filter directly. In the following example, the Envoy filter for the productpage service from the Istio Bookinfo app is modified. All other workloads in the cluster continue to use the default Istio Envoy configuration. Note that this example is specific to Istio version 1.14. If you use a different Istio version, refer to the Istio Envoy documentation.

  ```yaml
  apiVersion: networking.istio.io/v1alpha3
  kind: EnvoyFilter
  metadata:
    name: stats-filter-1.14-productpage
    namespace: bookinfo-frontends
  spec:
    workloadSelector:
      labels:
        app: productpage
        version: v1
    configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_OUTBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
              subFilter:
                name: envoy.filters.http.router
        proxy:
          proxyVersion: ^1\.14.*
      patch:
        operation: INSERT_BEFORE
        value:
          name: istio.stats
          typed_config:
            '@type': type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
            value:
              config:
                configuration:
                  '@type': type.googleapis.com/google.protobuf.StringValue
                  value: |
                    {"metrics":[{"name":"request_bytes","tags_to_remove":["destination_service","response_flags"]},{"name":"response_bytes","tags_to_remove":["destination_service","response_flags"]}]}
                root_id: stats_outbound
                vm_config:
                  code:
                    local:
                      inline_string: envoy.wasm.stats
                  runtime: envoy.wasm.runtime.null
                  vm_id: stats_outbound
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
              subFilter:
                name: envoy.filters.http.router
        proxy:
          proxyVersion: ^1\.14.*
      patch:
        operation: INSERT_BEFORE
        value:
          name: istio.stats
          typed_config:
            '@type': type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
            value:
              config:
                configuration:
                  '@type': type.googleapis.com/google.protobuf.StringValue
                  value: |
                    {"disable_host_header_fallback":true,"metrics":[{"name":"request_bytes","tags_to_remove":["destination_service","response_flags"]},{"name":"response_bytes","tags_to_remove":["destination_service","response_flags"]}]}
                root_id: stats_inbound
                vm_config:
                  code:
                    local:
                      inline_string: envoy.wasm.stats
                  runtime: envoy.wasm.runtime.null
                  vm_id: stats_inbound
  ```
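After you apply the filter configuration and new traffic flows through the mesh, you can verify that newly written series no longer carry the removed labels. The following is a minimal sketch that assumes the built-in Prometheus server is still port-forwarded on `localhost:9090` and that you removed the example labels shown above.

```sh
# Send some traffic through the gateway or workload first so that new series
# are written with the updated filter configuration.

# List the label sets of the request size histogram. Newly created series
# should no longer include the destination_service or response_flags labels.
curl -s -G 'http://localhost:9090/api/v1/series' \
  --data-urlencode 'match[]=istio_request_bytes_bucket'
```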