Forward metrics to OpenShift
Forward the metrics from the Gloo telemetry gateway and collector agents to the OpenShift Prometheus.
OpenShift comes with built-in Prometheus instances that you can use to monitor metrics for your workloads. Instead of using the built-in Prometheus that Gloo Mesh (Gloo Platform APIs) provides, you might want to forward the metrics from the telemetry gateway and collector agents to the OpenShift Prometheus to have a single observability layer for all of your workloads in the cluster.
Single cluster
Get the current values of the Helm release for your Gloo Mesh (Gloo Platform APIs) installation. Note that your Helm release might have a different name.
helm get values gloo-platform -n gloo-mesh -o yaml > gloo-single.yaml
open gloo-single.yaml
In your Helm values file, expose the otlp-metrics and metrics ports on the Gloo telemetry collector agent. The otlp-metrics port is used to expose the metrics that the telemetry collector agent collected from other workloads in the cluster. The metrics port exposes metrics for the Gloo telemetry collector agents themselves.
telemetryCollector:
  enabled: true
  ports:
    otlp-metrics:
      containerPort: 9091
      enabled: true
      protocol: TCP
      servicePort: 9091
    metrics:
      enabled: true
      containerPort: 8888
      servicePort: 8888
      protocol: TCP
Upgrade your Helm release. Change the release name as needed.
helm upgrade gloo-platform gloo-platform/gloo-platform \
  --namespace gloo-mesh \
  -f gloo-single.yaml \
  --version $GLOO_VERSION
Verify that the Gloo telemetry collector deploys successfully.
kubectl get pods -n gloo-mesh | grep telemetry
Verify that the ports are exposed on the telemetry collector service.
kubectl get services -n gloo-mesh | grep telemetry
Create a configmap to enable workload monitoring in the cluster.
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF
Create a service monitor resource to instruct the OpenShift Prometheus to scrape metrics from the Gloo telemetry collector agent. The service monitor scrapes metrics from the otlp-metrics and metrics ports that you exposed earlier.
kubectl apply -f- <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: gloo-telemetry-collector-sm
  namespace: gloo-mesh
spec:
  endpoints:
  - interval: 30s
    port: otlp-metrics
    scheme: http
  - interval: 30s
    port: metrics
    scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: telemetryCollector
EOF
Open the OpenShift web console and select the Administrator view.
Navigate to Observe > Metrics to open the built-in Prometheus expression browser.
Verify that you can see metrics for the telemetry collector container. For example, you can enter otelcol_exporter_sent_metric_points in the expression browser and verify that these metrics were sent. For an overview of metrics that are exposed, see Default metrics in the pipeline.
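To go beyond spot-checking the raw counter, you can also query the export rate in the expression browser. The following PromQL query is a sample sketch only; it builds on the otelcol_exporter_sent_metric_points counter from the previous step.
sum(rate(otelcol_exporter_sent_metric_points[5m]))
A result greater than zero confirms that the telemetry collector actively exported metric data points over the last five minutes.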
Review the Next section for optional steps that might help you use metrics in OpenShift Prometheus.
Multicluster
Management cluster
Get the current values of the Helm release for the management cluster. Note that your Helm release might have a different name.
helm get values gloo-platform -n gloo-mesh -o yaml --kube-context $MGMT_CONTEXT > mgmt-server.yaml
open mgmt-server.yaml
In your Helm values file for the management cluster, expose the otlp-metrics and metrics ports on the Gloo telemetry gateway and on the Gloo telemetry collector agent. The otlp-metrics port is used to expose the metrics that were collected by the telemetry collector agents across workload clusters and sent to the telemetry gateway. The metrics port exposes metrics for the Gloo telemetry gateway and collector agents themselves.
telemetryGateway:
  enabled: true
  service:
    type: LoadBalancer
  ports:
    otlp-metrics:
      containerPort: 9091
      enabled: true
      protocol: TCP
      servicePort: 9091
    metrics:
      enabled: true
      containerPort: 8888
      servicePort: 8888
      protocol: TCP
telemetryCollector:
  enabled: true
  ports:
    otlp-metrics:
      containerPort: 9091
      enabled: true
      protocol: TCP
      servicePort: 9091
    metrics:
      enabled: true
      containerPort: 8888
      servicePort: 8888
      protocol: TCP
Upgrade your Helm release in the management cluster. Change the release name as needed.
helm upgrade gloo-platform gloo-platform/gloo-platform \
  --kube-context $MGMT_CONTEXT \
  --namespace gloo-mesh \
  -f mgmt-server.yaml \
  --version $GLOO_VERSION
Verify that the Gloo telemetry gateway and collector agents deploy successfully.
kubectl get pods --context $MGMT_CONTEXT -n gloo-mesh | grep telemetry
Verify that the ports are exposed on the telemetry collector and gateway services.
kubectl get services --context $MGMT_CONTEXT -n gloo-mesh | grep telemetry
Create a configmap to enable workload monitoring for the management cluster.
kubectl --context $MGMT_CONTEXT apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF
Create a service monitor resource to instruct the OpenShift Prometheus to scrape metrics from the Gloo telemetry gateway. The service monitor scrapes metrics from the otlp-metrics and metrics ports that you exposed earlier.
kubectl --context ${MGMT_CONTEXT} apply -f- <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: gloo-telemetry-gateway-sm
  namespace: gloo-mesh
spec:
  endpoints:
  - interval: 30s
    port: otlp-metrics
    scheme: http
  - interval: 30s
    port: metrics
    scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: telemetryGateway
EOF
Create another service monitor to scrape metrics from the Gloo telemetry collector agent in the management cluster.
kubectl --context ${MGMT_CONTEXT} apply -f- <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: gloo-telemetry-collector-sm
  namespace: gloo-mesh
spec:
  endpoints:
  - interval: 30s
    port: metrics
    scheme: http
  - interval: 30s
    port: otlp-metrics
    scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: telemetryCollector
EOF
Open the OpenShift web console for the management cluster and select the Administrator view.
Navigate to Observe > Metrics to open the built-in Prometheus expression browser.
Verify that you can see metrics for the telemetry gateway and telemetry collector containers. For example, you can enter otelcol_exporter_sent_metric_points in the expression browser and verify that these metrics were sent from both containers. For an overview of metrics that these two components expose, see Default metrics in the pipeline.
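To confirm with a single query that both components report data, you can break the same counter down by pod. This sample query is a sketch only and assumes the standard pod label that the OpenShift Prometheus attaches to scraped targets; you should see series for both the telemetry gateway and the telemetry collector pods.
sum(rate(otelcol_exporter_sent_metric_points[5m])) by (pod)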
Workload cluster
Get the current values of the Helm release for the workload cluster. Note that your Helm release might have a different name.
helm get values gloo-platform -n gloo-mesh -o yaml --kube-context $REMOTE_CONTEXT > data-plane.yaml
open data-plane.yaml
In your Helm values file for the workload cluster, expose the metrics port on the Gloo telemetry collector agent. The metrics port exposes metrics for the Gloo telemetry collector agents themselves, such as otelcol_exporter_enqueue_failed_metric_points, which you can use to determine whether the connection between the collector agents and the telemetry gateway in the management cluster is healthy. A sample query follows the snippet.
telemetryCollector:
  enabled: true
  ports:
    metrics:
      enabled: true
      containerPort: 8888
      servicePort: 8888
      protocol: TCP
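After the OpenShift Prometheus scrapes this port (the service monitor for it is created later in this section), you can check for enqueue failures with a query such as the following. This is a sample PromQL sketch only, built on the otelcol_exporter_enqueue_failed_metric_points counter that is mentioned above; the pod grouping assumes the standard label that the OpenShift Prometheus adds to scraped targets.
sum(rate(otelcol_exporter_enqueue_failed_metric_points[5m])) by (pod) > 0
An empty result means that no enqueue failures were recorded in the last five minutes. Persistently non-zero values suggest that the collector agents cannot reach the telemetry gateway in the management cluster.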
Upgrade your Helm release in each workload cluster. Change the release name as needed. Be sure to update the cluster context for each workload cluster that you repeat this command for.
helm upgrade gloo-platform gloo-platform/gloo-platform \
  --kube-context $REMOTE_CONTEXT \
  --namespace gloo-mesh \
  -f data-plane.yaml \
  --version $GLOO_VERSION
Verify that the Gloo telemetry collector agents deploy successfully.
kubectl get pods --context $REMOTE_CONTEXT -n gloo-mesh | grep telemetry
Verify that the port is exposed on the telemetry collector service.
kubectl get services --context $REMOTE_CONTEXT -n gloo-mesh | grep telemetry
Create a service monitor to scrape metrics from the Gloo telemetry collector agent in the workload cluster.
kubectl --context ${REMOTE_CONTEXT} apply -f- <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: gloo-telemetry-collector-sm
  namespace: gloo-mesh
spec:
  endpoints:
  - interval: 30s
    port: metrics
    scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: telemetryCollector
EOF
Create a configmap to enable workload monitoring for the workload cluster.
kubectl --context $REMOTE_CONTEXT apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF
Create a service monitor to scrape metrics from the istiod service mesh control plane.
kubectl --context ${REMOTE_CONTEXT} apply -f- <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istiod-http-monitoring
  namespace: istio-system
spec:
  endpoints:
  - interval: 30s
    port: http-monitoring
    scheme: http
  selector:
    matchLabels:
      app: istiod
EOF
Open the OpenShift web console for the workload cluster and select the Administrator view.
Navigate to Observe > Metrics to open the built-in Prometheus expression browser.
Verify that you can see metrics for the telemetry collector containers. For example, you can enter otelcol_exporter_sent_metric_points in the expression browser. For an overview of metrics that are exposed, see Default metrics in the pipeline.
Optional: Repeat these steps for each workload cluster.
Review the Next section for optional steps that might help you use metrics in OpenShift Prometheus.
Next
Now that metrics flow from your Gloo Mesh (Gloo Platform APIs) OpenTelemetry pipeline to the OpenShift Prometheus, review the following options to do more with them.
Gloo UI
You can update the Gloo UI to read metrics from the OpenShift Prometheus instance to populate the Gloo UI graph and other metrics. This way, you can remove the built-in Prometheus instance that Gloo Mesh (Gloo Platform APIs) provides.
For more information, see Connect the Gloo UI to OpenShift Prometheus.
Alerts
Create alerts with your OpenShift monitoring tools. The following subsections show configuration files for Gloo, OTel, and Istio alerts.
For steps and more information, review the following resources:
- For more metrics that you might create alerts on, see Default metrics in the pipeline.
- For steps to create and manage alerts, see the OpenShift monitoring docs.
Translation alert
You might create an alert to monitor the translation time based on Gloo Mesh (Gloo Platform APIs) metrics. The following example Prometheus rule creates an alert when the translation time is higher than 10 seconds.
For complex environments with thousands of services and many clusters, this translation time might be normal. You can adjust this threshold based on your environment and expected performance. For expected translation times and other scalability thresholds, see the scalability docs.
Example Prometheus rule:
kubectl --context ${MGMT_CONTEXT} apply -f- <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gloo-platform-translation-latency-is-high
  namespace: gloo-mesh
spec:
  groups:
  - name: GlooPlatformAlerts
    rules:
    - alert: GlooPlatformTranslationLatencyIsHigh
      for: 15m
      expr: histogram_quantile(0.99, sum(rate(gloo_mesh_translation_time_sec_bucket[5m])) by(le)) > 10
      labels:
        severity: warning
      annotations:
        runbook: https://docs.solo.io/gloo-mesh-enterprise/main/troubleshooting/gloo/server-relay/
        summary: The translation time exceeds 10 seconds.
EOF
OTel connectivity alert
You might create alerts to check the connectivity between the OTel gateways in the management cluster and the collectors in the workload clusters. For example, if the otelcol_exporter_queue_size metric increases consistently, this might indicate that the collectors can no longer send traffic to the gateway.
The following example sets up a warning if the OTel collector queue size is predicted to reach 100 within the next 15 minutes, based on the growth over the previous 15 minutes. Adjust this threshold based on what a normal queue size is for your environment, so that you are alerted only when that normal size is exceeded. The default queue size is 1,000.
For more information about setting up monitoring for queues, see the OpenTelemetry docs.
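To pick a threshold that fits your environment, you can first check how the queue behaves today. The following sample PromQL query is a sketch only; it shows the peak value of the otelcol_exporter_queue_size gauge over the last day for each scraped target.
max_over_time(otelcol_exporter_queue_size[1d])
Set the alert threshold comfortably above this observed baseline, but below the configured queue capacity.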
Example Prometheus rule:
kubectl --context ${MGMT_CONTEXT} apply -f- <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gloo-platform-otel-queue-is-high
  namespace: gloo-mesh
spec:
  groups:
  - name: GlooPlatformAlerts
    rules:
    - alert: GlooPlatformOTelQueueIsHigh
      for: 15m
      expr: predict_linear(otelcol_exporter_queue_size[15m], 900) >= 100
      labels:
        severity: warning
      annotations:
        runbook: https://docs.solo.io/gloo-mesh-enterprise/main/troubleshooting/gloo/telemetry/
        summary: The Gloo OTel queue is building up. If the queue size is increasing consistently, this might indicate network issues. Check the connectivity between the gateway and collectors for any issues.
EOF
Istio alert
You might create an alert to monitor the time that it takes for the Istio proxy to get a configuration change. The following example Prometheus rule creates an alert when proxy convergence time is higher than one minute (60 seconds).
For complex environments with thousands of services and many clusters, this convergence time might be normal. You can adjust this threshold based on your environment and expected performance. For expected convergence times and other scalability thresholds, see the scalability docs.
Example Prometheus rule:
kubectl --context ${MGMT_CONTEXT} apply -f- <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gloo-platform-istio-proxy-convergence-is-high
  namespace: gloo-mesh
spec:
  groups:
  - name: GlooPlatformAlerts
    rules:
    - alert: GlooPlatformIstioProxyConvergenceIsHigh
      for: 15m
      expr: histogram_quantile(0.99, sum(rate(pilot_proxy_convergence_time_bucket[1m])) by (le)) > 60
      labels:
        severity: warning
      annotations:
        runbook: https://docs.solo.io/gloo-mesh-enterprise/main/troubleshooting/service-mesh/istio/
        summary: The Istio proxy is taking more than 60 seconds to get configuration changes.
EOF