You are viewing the documentation for Solo Enterprise for Istio, formerly known as Gloo Mesh (OSS APIs).

Forward metrics to OpenShift

Forward the metrics from the telemetry gateway and collector agents to the OpenShift Prometheus.

OpenShift comes with built-in Prometheus instances that you can use to monitor metrics for your workloads. Instead of using the built-in Prometheus that Solo Enterprise for Istio provides, you might want to forward the metrics from the telemetry gateway and collector agents to the OpenShift Prometheus instance to have a single observability layer for all of your workloads in the cluster.

Single cluster

  1. Get the current Helm values of your Solo Enterprise for Istio release.

    helm get values gloo-platform -n gloo-mesh -o yaml > gloo-single.yaml
    open gloo-single.yaml
  2. In your Helm values file, expose the otlp-metrics and metrics ports on the collector agent. The otlp-metrics port is used to expose the metrics that were collected by the telemetry collector agent from other workloads in the cluster. The metrics port exposes metrics for the telemetry collector agents themselves.

    
    telemetryCollector:
      enabled: true
      ports:
        otlp-metrics:
          containerPort: 9091
          enabled: true
          protocol: TCP
          servicePort: 9091
        metrics: 
          enabled: true
          containerPort: 8888
          servicePort: 8888
          protocol: TCP
  3. Upgrade your Helm release.

    helm upgrade gloo-platform gloo-platform/gloo-platform \
      --namespace gloo-mesh \
      -f gloo-single.yaml \
      --version ${MGMT_VERSION}
  4. Verify that the telemetry collector deploys successfully.

    kubectl get pods -n gloo-mesh | grep telemetry
  5. Verify that the ports are exposed on the telemetry collector service.

    kubectl get services -n gloo-mesh | grep telemetry
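  6. Create a configmap to enable user workload monitoring in the cluster. The enableUserWorkload setting lets OpenShift monitoring scrape service monitor targets in user namespaces, such as gloo-mesh.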

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true
    EOF
  7. Create a service monitor resource to instruct the OpenShift Prometheus to scrape metrics from the telemetry collector agent. The service monitor scrapes metrics from the otlp-metrics and metrics ports that you exposed earlier.

    kubectl apply -f- <<EOF
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: gloo-telemetry-collector-sm
      namespace: gloo-mesh
    spec:
      endpoints:
      - interval: 30s
        port: otlp-metrics
        scheme: http
      - interval: 30s
        port: metrics
        scheme: http
      selector:
        matchLabels:
          app.kubernetes.io/name: telemetryCollector
    EOF
  8. Open the OpenShift web console and select the Administrator view.

  9. Navigate to Observe > Metrics to open the built-in Prometheus expression browser.

  10. Verify that you can see metrics for the telemetrycollector container. For example, you can enter otelcol_exporter_sent_metric_points in the expression browser and verify that these metrics were sent. For an overview of metrics that are exposed, see Default metrics in the pipeline.
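The service monitor refers to the collector ports by name, so a scrape target can only appear if the service exposes ports named otlp-metrics and metrics. As a rough sketch, a small helper can report which of those names are missing. The function names `missing_ports` and `check_collector_ports` are hypothetical, and the label selector assumes that the service carries the same app.kubernetes.io/name label that the service monitor selects on.

```shell
# Print each required port name that is absent from a space-separated list
# of Service port names. Prints nothing when every required name is present.
missing_ports() {
  actual=" $1 "
  shift
  for want in "$@"; do
    case "$actual" in
      *" $want "*) ;;          # name found, nothing to report
      *) echo "$want" ;;       # name missing
    esac
  done
}

# Hypothetical wrapper: collect the port names of the collector service and
# compare them with the names that the service monitor scrapes.
check_collector_ports() {
  names=$(kubectl get services -n gloo-mesh \
    -l app.kubernetes.io/name=telemetryCollector \
    -o jsonpath='{.items[*].spec.ports[*].name}')
  missing_ports "$names" otlp-metrics metrics
}
```

Run `check_collector_ports` after the Helm upgrade. No output means that both port names are present on the service.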

Review the Next section for optional steps that might help you use metrics in OpenShift Prometheus.

Multicluster

Management cluster

  1. Get the current Helm values for the management plane release.

    helm get values gloo-platform -n gloo-mesh -o yaml --kube-context ${context1} > mgmt-server.yaml
    open mgmt-server.yaml
  2. In your Helm values file for the management plane, expose the otlp-metrics and metrics ports on both the telemetry gateway and the telemetry collector agent. The otlp-metrics port is used to expose the metrics that were collected by the telemetry collector agents across connected clusters and sent to the telemetry gateway. The metrics port exposes metrics for the telemetry gateway and collector agents themselves.

    
    telemetryGateway:
      enabled: true
      service:
        type: LoadBalancer
      ports:
        otlp-metrics:
          containerPort: 9091
          enabled: true
          protocol: TCP
          servicePort: 9091
        metrics: 
          enabled: true
          containerPort: 8888
          servicePort: 8888
          protocol: TCP
    telemetryCollector:
      enabled: true
      ports:
        otlp-metrics:
          containerPort: 9091
          enabled: true
          protocol: TCP
          servicePort: 9091
        metrics: 
          enabled: true
          containerPort: 8888
          servicePort: 8888
          protocol: TCP
  3. Upgrade your management plane Helm release.

    helm upgrade gloo-platform gloo-platform/gloo-platform \
     --kube-context ${context1} \
     --namespace gloo-mesh \
     -f mgmt-server.yaml \
     --version ${MGMT_VERSION}
  4. Verify that the telemetry gateway and collector agents deploy successfully.

    kubectl get pods --context ${context1} -n gloo-mesh | grep telemetry
  5. Verify that the ports are exposed on the telemetry collector and gateway services.

    kubectl get services --context ${context1} -n gloo-mesh | grep telemetry
  6. Create a configmap to enable user workload monitoring for the cluster where the management plane is deployed.

    kubectl --context ${context1} apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true
    EOF
  7. Create a service monitor resource to instruct the OpenShift Prometheus to scrape metrics from the telemetry gateway. The service monitor scrapes metrics from the otlp-metrics and metrics ports that you exposed earlier.

    kubectl --context ${context1} apply -f- <<EOF
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: gloo-telemetry-gateway-sm
      namespace: gloo-mesh
    spec:
      endpoints:
      - interval: 30s
        port: otlp-metrics
        scheme: http
      - interval: 30s
        port: metrics
        scheme: http
      selector:
        matchLabels:
          app.kubernetes.io/name: telemetryGateway
    EOF
  8. Create another service monitor to scrape metrics from the telemetry collector agent in the cluster where the management plane is deployed.

    kubectl --context ${context1} apply -f- <<EOF
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: gloo-telemetry-collector-sm
      namespace: gloo-mesh
    spec:
      endpoints:
      - interval: 30s
        port: metrics
        scheme: http
      - interval: 30s
        port: otlp-metrics
        scheme: http
      selector:
        matchLabels:
          app.kubernetes.io/name: telemetryCollector
    EOF
  9. Open the OpenShift web console for the management cluster and select the Administrator view.

  10. Navigate to Observe > Metrics to open the built-in Prometheus expression browser.

  11. Verify that you can see metrics for the telemetrygateway and telemetrycollector containers. For example, you can enter otelcol_exporter_sent_metric_points in the expression browser and verify that these metrics were sent from both containers. For an overview of metrics that these two components expose, see Default metrics in the pipeline.
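Applying the configmap from the earlier step sets the entire config.yaml key, so any monitoring settings that were already stored in cluster-monitoring-config would be replaced. The following sketch, which uses hypothetical helper names, reads the existing value first so that you can merge enableUserWorkload into it by hand if needed.

```shell
# Succeed (exit 0) if the given config.yaml text already contains
# "enableUserWorkload: true"; fail (exit 1) otherwise.
enables_user_workload() {
  printf '%s\n' "$1" |
    grep -Eq '^[[:space:]]*enableUserWorkload:[[:space:]]*true[[:space:]]*$'
}

# Hypothetical wrapper: print the current config.yaml from the management
# cluster. Prints nothing if the configmap does not exist yet.
current_monitoring_config() {
  kubectl --context "${context1}" get configmap cluster-monitoring-config \
    -n openshift-monitoring -o jsonpath='{.data.config\.yaml}' 2>/dev/null
}

# Example usage (requires cluster access):
#   cfg=$(current_monitoring_config)
#   if enables_user_workload "$cfg"; then
#     echo "user workload monitoring is already enabled"
#   else
#     printf '%s\n' "$cfg"   # review existing settings before you apply
#   fi
```

If the existing config.yaml contains other settings, consider editing the configmap to add the enableUserWorkload line rather than applying a manifest that contains only that line.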

Workload cluster

  1. Get the current Helm values for the data plane release in the connected workload cluster.

    helm get values gloo-platform -n gloo-mesh -o yaml --kube-context ${context2} > data-plane.yaml
    open data-plane.yaml
  2. In your data plane Helm values file, expose the metrics port on the telemetry collector agent. The metrics port exposes metrics for the telemetry collector agents, such as otelcol_exporter_enqueue_failed_metric_points, that you can use to determine whether the connection between the collector agents and the telemetry gateway in the management cluster is healthy.

    
    telemetryCollector:
      enabled: true
      ports:
        metrics: 
          enabled: true
          containerPort: 8888
          servicePort: 8888
          protocol: TCP
  3. Upgrade your Helm release in each connected workload cluster. When you repeat this command for another workload cluster, update the cluster context to match that cluster.

    helm upgrade gloo-platform gloo-platform/gloo-platform \
      --kube-context ${context2} \
      --namespace gloo-mesh \
      -f data-plane.yaml \
      --version ${MGMT_VERSION}
  4. Verify that the telemetry collector agents deploy successfully.

    kubectl get pods --context ${context2} -n gloo-mesh | grep telemetry
  5. Verify that the port is exposed on the telemetry collector service.

    kubectl get services --context ${context2} -n gloo-mesh | grep telemetry
  6. Create a service monitor to scrape metrics from the telemetry collector agent in the workload cluster.

    kubectl --context ${context2} apply -f- <<EOF
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: gloo-telemetry-collector-sm
      namespace: gloo-mesh
    spec:
      endpoints:
      - interval: 30s
        port: metrics
        scheme: http
      selector:
        matchLabels:
          app.kubernetes.io/name: telemetryCollector
    EOF
  7. Create a configmap to enable user workload monitoring for the workload cluster.

    kubectl --context ${context2} apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true
    EOF
  8. Create a service monitor to scrape metrics from the istiod service mesh control plane.

    kubectl --context ${context2} apply -f- <<EOF
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: istiod-http-monitoring
      namespace: istio-system
    spec:
      endpoints:
      - interval: 30s
        port: http-monitoring
        scheme: http
      selector:
        matchLabels:
          app: istiod
    EOF
  9. Open the OpenShift web console for the workload cluster and select the Administrator view.

  10. Navigate to Observe > Metrics to open the built-in Prometheus expression browser.

  11. Verify that you can see metrics for the telemetrycollector containers. For example, you can enter otelcol_exporter_sent_metric_points in the expression browser. For an overview of metrics that are exposed, see Default metrics in the pipeline.
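If the metrics do not show up in the expression browser, it can help to read the collector's own metrics endpoint directly. The following is a minimal sketch that sums one metric, such as otelcol_exporter_enqueue_failed_metric_points, from Prometheus text-format output. The helper name `sum_metric` is hypothetical, `<collector-pod>` is a placeholder for a pod name from the earlier `kubectl get pods` step, and the sketch assumes that the collector serves its metrics at /metrics on port 8888 without sample timestamps.

```shell
# Sum all samples of one metric from Prometheus text-format input on stdin.
# Assumes that the last field on each sample line is the value.
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                       # skip HELP and TYPE comment lines
    {
      key = $1
      sub(/[{].*/, "", key)             # strip the label set, keep the metric name
      if (key == name) total += $NF     # add the sample value
    }
    END { printf "%d\n", total }
  '
}

# Example usage (requires cluster access; <collector-pod> is a placeholder):
#   kubectl --context ${context2} -n gloo-mesh port-forward pod/<collector-pod> 8888 &
#   curl -s localhost:8888/metrics | sum_metric otelcol_exporter_enqueue_failed_metric_points
```

A sum that keeps growing between two runs suggests that the collector agent cannot hand off metrics to the telemetry gateway.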

Review the Next section for optional steps that might help you use metrics in OpenShift Prometheus.

Next

Now that you set up metrics to flow from your Solo Enterprise for Istio OpenTelemetry pipeline to the OpenShift Prometheus, review the following options to do more with the metrics.

Gloo UI

You can update the Gloo UI to read metrics from the OpenShift Prometheus instance to populate the Gloo UI graph and other metrics. This way, you can remove the built-in Prometheus instance that Solo Enterprise for Istio provides.

For more information, see Connect the Gloo UI to OpenShift Prometheus.

Alerts

Create alerts with your OpenShift monitoring tools. The following subsections show configuration files for OTel and Istio alerts.

For steps and more information about other metrics, review the following resources:

OTel connectivity alert

You might create alerts to check the connectivity between the OTel gateways in the management cluster and the collectors in the workload clusters. For example, if the otelcol_exporter_queue_size metric increases consistently, this might indicate that the collectors can no longer send traffic to the gateway.

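The following example raises a warning when the OTel collector queue size is predicted to reach 100 or more. The rule uses predict_linear to extrapolate the trend of the last 15 minutes to 900 seconds (15 minutes) ahead, and the alert fires only after that prediction holds for 15 minutes. Adjust the threshold of 100 based on what is a normal queue size for your environment, so that you get alerted when the normal queue is exceeded. The default queue size is 1,000.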

For more information about setting up monitoring for queues, see the OpenTelemetry docs.

Example Prometheus rule:

kubectl --context ${context1} apply -f- <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gloo-platform-otel-queue-is-high
  namespace: gloo-mesh
spec:
  groups:
  - name: GlooPlatformAlerts
    rules:
    - alert: GlooPlatformOTelQueueIsHigh
      for: 15m 
      expr: predict_linear(otelcol_exporter_queue_size[15m], 900) >= 100
      labels:
        severity: warning 
      annotations:
        runbook: https://docs.solo.io/gloo-mesh-enterprise/main/troubleshooting/gloo/telemetry/
        summary: The OTel queue is building up. If the queue size is increasing consistently, this might indicate network issues. Check the connectivity between the gateway and collectors for any issues.
EOF
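
To get a feel for the threshold in the rule, the arithmetic behind the prediction can be sketched in a few lines. This is a simplification: Prometheus fits a least-squares line through all samples in the range, whereas the hypothetical `predict_queue_size` helper below extrapolates from only the first and last sample of the window.

```shell
# Two-point linear extrapolation: given the queue size at the start and end
# of a window (in seconds), estimate the size "horizon" seconds after the
# window ends. Simplified stand-in for predict_linear, not the real function.
predict_queue_size() {
  awk -v first="$1" -v last="$2" -v window="$3" -v horizon="$4" \
    'BEGIN { printf "%.0f\n", last + (last - first) * horizon / window }'
}

# The queue grew from 10 to 55 over the last 900 seconds; look 900 seconds ahead.
predict_queue_size 10 55 900 900   # prints 100
```

In this example the current size of 55 is still below the threshold, but the projected value of 100 meets the `>= 100` condition, so the rule would enter its 15-minute pending period.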

Istio alert

You might create an alert to monitor the time that it takes for Istio proxies to receive a configuration change. The following example Prometheus rule creates an alert when the 99th percentile of proxy convergence time stays above one minute (60 seconds) for 15 minutes.

For complex environments with thousands of services and many clusters, a convergence time of this length might be normal. You can adjust the threshold based on your environment and expected performance.

Example Prometheus rule:

kubectl --context ${context1} apply -f- <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gloo-platform-istio-proxy-convergence-is-high
  namespace: gloo-mesh
spec:
  groups:
  - name: GlooPlatformAlerts
    rules:
    - alert: GlooPlatformIstioProxyConvergenceIsHigh
      for: 15m 
      expr: histogram_quantile(0.99, sum(rate(pilot_proxy_convergence_time_bucket[1m])) by (le)) > 60
      labels:
        severity: warning 
      annotations:
        runbook: https://docs.solo.io/gloo-mesh-enterprise/main/troubleshooting/service-mesh/istio/
        summary: The Istio proxy is taking more than 60 seconds to get configuration changes.
EOF