Telemetry pipeline
Diagnose and resolve common issues with the Solo UI telemetry pipeline.
If metrics or traces are missing from the Solo UI, the issue is often in the telemetry pipeline. Use the steps on this page to diagnose problems with the solo-enterprise-telemetry-collector pod.
Collector health
Verify that the telemetry collector pod is running and healthy.
kubectl get po -n solo-enterprise
The solo-enterprise-telemetry-collector-0 pod should have a status of Running. If the pod is in a crash loop or not ready, check its logs for errors.
kubectl logs -n solo-enterprise solo-enterprise-telemetry-collector-0
Collector self-metrics
The telemetry collector exposes its own operational metrics on port 8888. These metrics are not ingested into ClickHouse or displayed in the Solo UI, but you can scrape them directly to diagnose pipeline health.
kubectl port-forward -n solo-enterprise solo-enterprise-telemetry-collector-0 8888:8888
Then query the metrics endpoint.
curl http://localhost:8888/metrics
The following metrics are useful for diagnosing common issues.
| Metric | Description |
|---|---|
| otelcol_processor_refused_metric_points | The number of metrics refused by a pipeline processor. High values can indicate the memory_limiter is dropping data due to memory pressure. |
| otelcol_receiver_refused_metric_points | The number of metrics refused at the receiver. High values can indicate the collector is overloaded. |
| otelcol_processor_refused_spans | The number of trace spans refused by the memory_limiter. |
| otelcol_exporter_queue_size | The number of telemetry items currently queued for export. |
| otelcol_exporter_queue_capacity | The maximum queue size. If otelcol_exporter_queue_size equals or exceeds this value, new data is dropped. |
| otelcol_exporter_send_failed_spans | The number of spans that failed to export to ClickHouse. |
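To compare otelcol_exporter_queue_size against otelcol_exporter_queue_capacity without reading the raw output, you can pipe the metrics endpoint through a small filter. The following is a minimal sketch, not a shipped tool: the queue_utilization function name is hypothetical, and the filter assumes that both gauges carry identical label sets and that label values contain no spaces.

```shell
# Hypothetical helper: reads Prometheus text format on stdin and prints
# "<labels> <size>/<capacity> (<percent>%)" for each exporter queue.
queue_utilization() {
  awk '
    /^#/ { next }                                   # skip HELP and TYPE comments
    /^otelcol_exporter_queue_(size|capacity)/ {
      name = $1; labels = ""
      brace = index($1, "{")
      if (brace > 0) {                              # split name{labels} into two parts
        name = substr($1, 1, brace - 1)
        labels = substr($1, brace)
      }
      if (name == "otelcol_exporter_queue_size")     size[labels] = $NF
      if (name == "otelcol_exporter_queue_capacity") cap[labels]  = $NF
    }
    END {
      for (l in size)
        if ((l in cap) && cap[l] + 0 > 0)
          printf "%s %d/%d (%.0f%%)\n", l, size[l], cap[l], 100 * size[l] / cap[l]
    }
  '
}

# Usage, with the port-forward from this section still running:
#   curl -s http://localhost:8888/metrics | queue_utilization
```

A utilization that stays close to 100% suggests that the exporter cannot drain the queue as fast as data arrives.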
Metrics scraping
The metrics/istio pipeline uses a Prometheus receiver to scrape metrics from istiod, ztunnel, and waypoint proxy pods. Confirm that the target pods are running and have the required annotations.
kubectl get po -A -o json | jq '.items[] | select(.metadata.annotations["prometheus.io/scrape"] == "true") | {name: .metadata.name, namespace: .metadata.namespace, port: .metadata.annotations["prometheus.io/port"]}'
If istiod, ztunnel, or waypoint pods are missing from the output, check that the pods are deployed correctly and that the annotations are present.
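To find the pods that the Prometheus receiver skips, you can invert the filter and list pods that lack the scrape annotation. This is a sketch that assumes jq is installed; the unannotated_pods function name is hypothetical, and the istio-system namespace in the usage comment is an assumption about where your Istio components run.

```shell
# Hypothetical helper: reads a pod list (kubectl ... -o json) on stdin and prints
# "namespace/name" for every pod without prometheus.io/scrape set to "true".
unannotated_pods() {
  jq -r '.items[]
    | select(.metadata.annotations["prometheus.io/scrape"] != "true")
    | "\(.metadata.namespace)/\(.metadata.name)"'
}

# Usage (namespace is an assumption; adjust to your installation):
#   kubectl get po -n istio-system -o json | unannotated_pods
```

Pods with no annotations at all are also reported, because the missing annotation evaluates to null.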
Data in ClickHouse
If the collector is running but metrics are not appearing in the Solo UI, verify that ClickHouse is reachable from the collector and that data is being written.
Check that the ClickHouse pod is running.
kubectl get po -n solo-enterprise
The solo-management-clickhouse-shard0-0 pod should have a status of Running.
Check the collector logs for ClickHouse export errors.
kubectl logs -n solo-enterprise solo-enterprise-telemetry-collector-0 | grep -i "clickhouse\|export\|error"
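To confirm that data is leaving the collector, you can also compare the exporter's sent and failed counters from the self-metrics endpoint on port 8888. The following is a minimal sketch: export_counters is a hypothetical helper name, and the sent counters (otelcol_exporter_sent_spans, otelcol_exporter_sent_metric_points) are standard OpenTelemetry Collector metrics that are assumed to be exposed by this collector. The pattern is left open at the end so that names with a suffix such as _total still match.

```shell
# Hypothetical helper: reads Prometheus text format on stdin and keeps only
# the exporter sent and send_failed counters for spans and metric points.
export_counters() {
  grep -E '^otelcol_exporter_(sent|send_failed)_(spans|metric_points)' | LC_ALL=C sort
}

# Usage, with a port-forward to port 8888 running:
#   curl -s http://localhost:8888/metrics | export_counters
```

Run the command twice, a minute or so apart. If the sent counters do not increase, or the send_failed counters grow, the collector is likely not writing to ClickHouse.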
Multicluster telemetry
In multicluster setups, workload cluster collectors send data to the telemetry gateway in the management cluster on port 4316. If data from a workload cluster is missing from the Solo UI, check the relay collector in that cluster.
Verify the relay pod is running in the workload cluster.
kubectl get po -n solo-enterprise --context ${context2}
The solo-enterprise-telemetry-collector-0 pod should have a status of Running.
Check that the relay collector can reach the telemetry gateway.
kubectl logs -n solo-enterprise solo-enterprise-telemetry-collector-0 --context ${context2} | grep -i "error\|refused\|failed"
Verify the telemetry gateway service is globally exposed in the management cluster.
kubectl get svc solo-enterprise-telemetry-gateway -n solo-enterprise --context ${context1} --show-labels
The service should have the solo.io/service-scope=global label. If not, apply it.
kubectl label svc solo-enterprise-telemetry-gateway -n solo-enterprise solo.io/service-scope=global --overwrite --context ${context1}
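The label check and the fix can be combined so that the label is applied only when it is missing. This is a sketch that assumes jq is installed; has_global_scope is a hypothetical helper name.

```shell
# Hypothetical helper: reads a Service (kubectl get svc ... -o json) on stdin and
# exits 0 only if the solo.io/service-scope label is set to "global".
has_global_scope() {
  jq -e '.metadata.labels["solo.io/service-scope"] == "global"' >/dev/null
}

# Usage: label the service only when the check fails.
#   kubectl get svc solo-enterprise-telemetry-gateway -n solo-enterprise \
#       --context ${context1} -o json | has_global_scope \
#     || kubectl label svc solo-enterprise-telemetry-gateway -n solo-enterprise \
#       solo.io/service-scope=global --overwrite --context ${context1}
```

Because jq -e returns a non-zero exit code for false or null, a service with no labels at all is treated the same as one with the wrong scope.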