Troubleshoot issues
If you run into an issue with the Gloo OTel pipeline, you can use the following resources to start debugging:
- Change the default log level
- Changes in the metrics or collector agent configmap are not applied
- Monitor the health of receivers, exporters, and processors in the Gloo operations dashboard
In addition, check out the following resources from the upstream OpenTelemetry project:
- Upstream OpenTelemetry troubleshooting guide
- Recommended metrics and alerts to monitor the pipeline's health
Change the default log level
A good way to start troubleshooting issues in your pipeline and to inspect the data that your collectors process is to change the pipeline log level to debug. The debug log level provides detailed information about the data that your pipeline receives, processes, and exports.
- Add the following settings to your values file for the gloo-mesh-enterprise Helm chart.
metricscollectorCustomization:
  service:
    telemetry:
      logs:
        level: "debug"
- Follow the Upgrade Gloo Mesh Enterprise guide to apply the changes in your environment. If you want to apply this change to the Gloo metrics gateway, upgrade your management cluster. To apply the changes in your Gloo OTel collector agent pods, you must upgrade the Gloo agents in your workload clusters.
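For example, a minimal sketch of the upgrade command for the management cluster, assuming the chart is installed from a repo aliased gloo-mesh-enterprise and the Helm release is named gloo-mgmt in the gloo-mesh namespace, with your settings saved in values.yaml (adjust the release name, chart reference, and chart version to your installation):
helm upgrade gloo-mgmt gloo-mesh-enterprise/gloo-mesh-enterprise -n gloo-mesh --kube-context $MGMT_CONTEXT -f values.yaml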
- Verify that the configmap for the metrics gateway or collector agent pods is updated with the values you set in the values file.
kubectl get configmap gloo-metrics-gateway-config -n gloo-mesh --context $MGMT_CONTEXT -o yaml
kubectl get configmap gloo-metrics-collector-config -n gloo-mesh --context $REMOTE_CONTEXT -o yaml
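To quickly confirm the new log level in the collector agent configmap, you can also filter the output, for example:
kubectl get configmap gloo-metrics-collector-config -n gloo-mesh --context $REMOTE_CONTEXT -o yaml | grep "level"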
- Perform a rollout restart of the gateway deployment or the collector daemon set to force your configmap changes to be applied in the metrics gateway or collector agent pods.
kubectl rollout restart -n gloo-mesh deployment/gloo-metrics-gateway --context $MGMT_CONTEXT
kubectl rollout restart -n gloo-mesh daemonset/gloo-metrics-collector-agent --context $REMOTE_CONTEXT
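After the restart, you can inspect the debug output in the pod logs. For example, using the same resource names as in the previous commands:
kubectl logs -n gloo-mesh daemonset/gloo-metrics-collector-agent --context $REMOTE_CONTEXT --tail=100
kubectl logs -n gloo-mesh deployment/gloo-metrics-gateway --context $MGMT_CONTEXT --tail=100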
Changes in the metrics gateway or collector agent configmap are not applied
The upstream OpenTelemetry project currently does not support reloading configuration changes dynamically and applying them in the gateway or collector agent pods. If you updated the configmap of the metrics gateway or the collector agents, and the changes are not applied in the respective pods, you must perform a rollout restart of the gateway deployment or collector daemon set to apply the new changes.
kubectl rollout restart -n gloo-mesh deployment/gloo-metrics-gateway --context ${MGMT_CONTEXT}
kubectl rollout restart -n gloo-mesh daemonset/gloo-metrics-collector-agent --context ${REMOTE_CONTEXT}
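To verify that the restart completed and the pods picked up the new configuration, you can wait for the rollout to finish:
kubectl rollout status -n gloo-mesh deployment/gloo-metrics-gateway --context ${MGMT_CONTEXT}
kubectl rollout status -n gloo-mesh daemonset/gloo-metrics-collector-agent --context ${REMOTE_CONTEXT}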
Monitor the health of receivers, exporters, and processors in the Gloo operations dashboard
The Gloo OpenTelemetry pipeline comes with built-in metrics that you can use to monitor the health of your pipeline. Pipeline metrics are automatically populated in the Gloo operations dashboard.
- Set up and open the Gloo operations dashboard.
- In the Gloo Telemetry Pipeline card, look for increasing timeouts, failures, or other errors for any of the pipeline receivers, processors, or exporters.
For an overview of recommended metrics and alerts to monitor the pipeline's health, see the OpenTelemetry documentation.
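For example, a Prometheus alert expression along the following lines can surface exporter failures. This is only a sketch: the otelcol_* metric names follow the upstream collector's self-monitoring conventions and can vary by collector version, so confirm the exact names that your pipeline exposes.
rate(otelcol_exporter_send_failed_metric_points[5m]) > 0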