Troubleshoot issues

If you run into an issue with the Gloo OTel pipeline, you can use the following sections to start debugging the issue.

In addition, check out the troubleshooting resources from the upstream OpenTelemetry project.

Change the default log level

A good first step when troubleshooting issues in your pipeline is to change the pipeline log level to debug mode. The debug log level provides detailed information about the data that is received, processed, and exported by your pipeline, so that you can inspect the data that your collectors handle.

  1. Add the following log level settings to your Helm values file.

    • gloo-mesh-enterprise Helm chart:
      metricsgatewayCustomization:
        service:
          telemetry:
            logs:
              level: "debug"
      
    • gloo-mesh-agent Helm chart:
      metricscollectorCustomization:
        service:
          telemetry:
            logs:
              level: "debug"
      
  2. Follow the Upgrade Gloo Mesh Enterprise guide to apply the changes to your environment. To apply the change to the Gloo metrics gateway, upgrade your management cluster with the gloo-mesh-enterprise Helm chart. To apply the change to the Gloo OTel collector agent pods, upgrade the Gloo agents in your workload clusters with the gloo-mesh-agent Helm chart.

  3. Verify that the configmap for the metrics gateway or collector agent pods is updated with the values you set in the values file.

    kubectl get configmap gloo-metrics-gateway-config -n gloo-mesh --context $MGMT_CONTEXT -o yaml
    kubectl get configmap gloo-metrics-collector-config -n gloo-mesh --context $REMOTE_CONTEXT -o yaml
    
  4. Perform a rollout restart of the gateway deployment or the collector daemon set to force your configmap changes to be applied in the metrics gateway or collector agent pods.

    kubectl rollout restart -n gloo-mesh deployment/gloo-metrics-gateway --context $MGMT_CONTEXT
    
    kubectl rollout restart -n gloo-mesh daemonset/gloo-metrics-collector-agent --context $REMOTE_CONTEXT
    
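Rather than reading the full configmap output from step 3 by eye, you can grep for the log level directly. A minimal sketch; the sample variable below stands in for the real `kubectl get configmap ... -o yaml` output, and you can pipe the actual command output into the same grep:

```shell
# Stand-in for the output of:
#   kubectl get configmap gloo-metrics-collector-config -n gloo-mesh \
#     --context $REMOTE_CONTEXT -o yaml
sample_output='
service:
  telemetry:
    logs:
      level: "debug"
'

# Pipe the real kubectl output into the same grep to confirm the new level.
printf '%s' "$sample_output" | grep 'level: "debug"'
```

If the grep returns no match, the configmap was not updated and you can re-check the values file and upgrade from steps 1 and 2.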

Changes in the metrics gateway or collector agent configmap are not applied

The upstream OpenTelemetry project currently does not support dynamically reloading configuration changes in the gateway or collector agent pods. If you updated the configmap of the metrics gateway or the collector agents, and the changes are not applied in the respective pods, perform a rollout restart of the gateway deployment or collector daemon set to apply the changes.

kubectl rollout restart -n gloo-mesh deployment/gloo-metrics-gateway --context ${MGMT_CONTEXT}
kubectl rollout restart -n gloo-mesh daemonset/gloo-metrics-collector-agent --context ${REMOTE_CONTEXT}
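To confirm that the restart completed and the pods came back up with the new configuration, you can wait on the rollout. A sketch, assuming the same resource names and contexts as the commands above:

```shell
# Wait for the metrics gateway deployment to finish rolling out.
kubectl rollout status -n gloo-mesh deployment/gloo-metrics-gateway --context ${MGMT_CONTEXT}

# Wait for the collector daemon set to finish rolling out on the workload cluster.
kubectl rollout status -n gloo-mesh daemonset/gloo-metrics-collector-agent --context ${REMOTE_CONTEXT}
```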

Monitor the health of receivers, exporters, and processors in the Gloo operations dashboard

The Gloo OpenTelemetry pipeline comes with built-in metrics that you can use to monitor the health of your pipeline. Pipeline metrics are automatically populated in the Gloo operations dashboard.

  1. Set up and open the Gloo operations dashboard.
  2. In the Gloo Telemetry Pipeline card, look for increasing timeouts, failures, or other errors for any of the pipeline receivers, processors, or exporters.

For an overview of recommended metrics and alerts to monitor the pipeline's health, see the OpenTelemetry documentation.
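The same health signals are exposed on each collector's own metrics endpoint. A sketch of checking them directly, assuming the OpenTelemetry collector's default self-telemetry port 8888 and the resource names from the earlier sections:

```shell
# Port-forward the collector's self-telemetry endpoint (default port 8888).
kubectl port-forward -n gloo-mesh daemonset/gloo-metrics-collector-agent 8888:8888 \
  --context ${REMOTE_CONTEXT} &

# Look for refused, dropped, or failed data points in the collector's own metrics,
# such as otelcol_receiver_refused_metric_points or
# otelcol_exporter_send_failed_metric_points.
curl -s localhost:8888/metrics | grep -E 'otelcol_(receiver_refused|processor_dropped|exporter_send_failed)'

# Stop the port-forward when you are done.
kill %1
```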