You are viewing the documentation for Solo Enterprise for Istio, formerly known as Gloo Mesh (OSS APIs).

Telemetry pipeline

Diagnose and resolve common issues with the Solo UI telemetry pipeline.

If metrics or traces are missing from the Solo UI, the issue is often in the telemetry pipeline. Use the steps on this page to diagnose problems with the solo-enterprise-telemetry-collector pod.

Collector health

Verify that the telemetry collector pod is running and healthy.

kubectl get po -n solo-enterprise

The solo-enterprise-telemetry-collector-0 pod should have a status of Running. If the pod is in a crash loop or not ready, check its logs for errors.

kubectl logs -n solo-enterprise solo-enterprise-telemetry-collector-0
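When the pod is crash looping, the logs of the current container can be empty or cut short, and the useful error is often in the previous container instance. The following is a minimal sketch, assuming the collector pod runs a single container. The helper function names are hypothetical; `--previous` and `--tail` are standard `kubectl logs` flags.

```shell
# Namespace and pod name as used on this page.
NS=solo-enterprise
POD=solo-enterprise-telemetry-collector-0

# Print the restart count of each container in the collector pod.
collector_restarts() {
  kubectl get pod "$POD" -n "$NS" \
    -o jsonpath='{.status.containerStatuses[*].restartCount}'
}

# If any container has restarted, print the tail of the previous
# instance's logs; otherwise report that no restarts were recorded.
collector_crash_logs() {
  restarts=$(collector_restarts)
  case "$restarts" in
    *[1-9]*) kubectl logs "$POD" -n "$NS" --previous --tail=50 ;;
    *)       echo "no restarts recorded" ;;
  esac
}

# Usage: collector_crash_logs
```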

Collector self-metrics

The telemetry collector exposes its own operational metrics on port 8888. These metrics are not ingested into ClickHouse or displayed in the Solo UI, but you can scrape them directly to diagnose pipeline health.

kubectl port-forward -n solo-enterprise solo-enterprise-telemetry-collector-0 8888:8888

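Then, in a separate terminal, query the metrics endpoint. The port-forward command keeps running in the foreground until you stop it.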

curl http://localhost:8888/metrics

The following metrics are useful for diagnosing common issues.

Metric | Description
--- | ---
otelcol_processor_refused_metric_points | The number of metrics refused by a pipeline processor. High values can indicate the memory_limiter is dropping data due to memory pressure.
otelcol_receiver_refused_metric_points | The number of metrics refused at the receiver. High values can indicate the collector is overloaded.
otelcol_processor_refused_spans | The number of trace spans refused by the memory_limiter.
otelcol_exporter_queue_size | The number of telemetry items currently queued for export.
otelcol_exporter_queue_capacity | The maximum queue size. If otelcol_exporter_queue_size equals or exceeds this value, new data is dropped.
otelcol_exporter_send_failed_spans | The number of spans that failed to export to ClickHouse.
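The full metrics endpoint output is long, so it can help to narrow it down to the metrics listed above and to compare the queue size against its capacity. The following is a rough sketch with hypothetical helper names. It reads Prometheus text format from standard input, matches metric names by prefix (so a suffix such as `_total` still matches), and assumes one queue size and one queue capacity sample per label set.

```shell
# Keep only the collector self-metrics listed above.
filter_pipeline_metrics() {
  grep -E '^otelcol_(processor_refused_(metric_points|spans)|receiver_refused_metric_points|exporter_(queue_size|queue_capacity|send_failed_spans))'
}

# Print "<labels> <size>/<capacity> <status>" for each exporter queue.
# Status is FULL when size >= capacity, which is when new data is dropped.
queue_usage() {
  awk '
    /^otelcol_exporter_queue_(size|capacity)/ {
      key = $0
      sub(/^[a-zA-Z_]+/, "", key)   # drop the metric name
      sub(/ [^ ]+$/, "", key)       # drop the sample value
      if ($0 ~ /^otelcol_exporter_queue_size/) size[key] = $NF
      else cap[key] = $NF
    }
    END {
      for (k in cap) {
        status = (size[k] + 0 >= cap[k] + 0) ? "FULL" : "ok"
        printf "%s %s/%s %s\n", k, size[k] + 0, cap[k] + 0, status
      }
    }'
}

# Usage, with the port-forward from the previous step still running:
#   curl -s http://localhost:8888/metrics | filter_pipeline_metrics
#   curl -s http://localhost:8888/metrics | queue_usage
```

The refused and failed metrics are counters, so a single value matters less than whether it keeps increasing between two queries taken a minute or so apart.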

Metrics scraping

The metrics/istio pipeline uses a Prometheus receiver to scrape metrics from istiod, ztunnel, and waypoint proxy pods. Confirm that the target pods are running and have the required annotations.

kubectl get po -A -o json | jq '.items[] | select(.metadata.annotations["prometheus.io/scrape"] == "true") | {name: .metadata.name, namespace: .metadata.namespace, port: .metadata.annotations["prometheus.io/port"]}'

If istiod, ztunnel, or waypoint pods are missing from the output, check that the pods are deployed correctly and that the annotations are present.
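The jq query lists only pods that already have the annotation, so a pod that lacks it is simply absent from the output. As an alternative sketch, the following lists every pod and flags the Istio ones that lack `prometheus.io/scrape=true`. The function names are hypothetical, and matching pods by the substrings `istiod`, `ztunnel`, and `waypoint` in the pod name is an assumption that might not hold for custom waypoint names. Unset annotations should print as `<none>` in kubectl custom-columns output.

```shell
# List namespace, name, and Prometheus annotations for every pod.
list_scrape_annotations() {
  kubectl get po -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,SCRAPE:.metadata.annotations.prometheus\.io/scrape,PORT:.metadata.annotations.prometheus\.io/port'
}

# Read "NAMESPACE NAME SCRAPE PORT" rows (header first) and report, for each
# pod whose name suggests istiod, ztunnel, or a waypoint, whether scraping
# is enabled.
flag_unscraped() {
  awk 'NR > 1 && $2 ~ /istiod|ztunnel|waypoint/ {
    print $1 "/" $2, ($3 == "true" ? "scrape=on port=" $4 : "MISSING prometheus.io/scrape")
  }'
}

# Usage: list_scrape_annotations | flag_unscraped
```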

Data in ClickHouse

If the collector is running but metrics are not appearing in the Solo UI, verify that ClickHouse is reachable from the collector and that data is being written.

  1. Check that the ClickHouse pod is running.

    kubectl get po -n solo-enterprise

    The solo-management-clickhouse-shard0-0 pod should have a status of Running.

  2. Check the collector logs for ClickHouse export errors.

    kubectl logs -n solo-enterprise solo-enterprise-telemetry-collector-0 | grep -i "clickhouse\|export\|error"
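The grep pattern in step 2 matches any line that mentions ClickHouse, an export, or an error, which can be noisy. A small sketch, assuming ClickHouse export failures are logged on lines that contain both "error" and "clickhouse" (the exact log format is not specified here), counts only those lines. The function name is hypothetical; `--since` is a standard `kubectl logs` flag.

```shell
# Count log lines on stdin that mention both an error and ClickHouse.
count_clickhouse_errors() {
  grep -i 'error' | grep -ic 'clickhouse' || true
}

# Usage: count matching lines from the last 10 minutes.
#   kubectl logs -n solo-enterprise solo-enterprise-telemetry-collector-0 --since=10m \
#     | count_clickhouse_errors
```

A count that stays above zero across repeated runs suggests an ongoing export problem and not a one-time startup error.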

Multicluster telemetry

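In multicluster setups, workload cluster collectors send data to the telemetry gateway in the management cluster on port 4316. If data from a workload cluster is missing from the Solo UI, check the relay collector in that cluster. The commands in this section use ${context1} for the kubeconfig context of the management cluster and ${context2} for the context of the workload cluster.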

  1. Verify the relay pod is running in the workload cluster.

    kubectl get po -n solo-enterprise --context ${context2}

    The solo-enterprise-telemetry-collector-0 pod should have a status of Running.

  2. Check the relay collector logs for errors connecting to the telemetry gateway.

    kubectl logs -n solo-enterprise solo-enterprise-telemetry-collector-0 --context ${context2} | grep -i "error\|refused\|failed"

  3. Verify the telemetry gateway service is globally exposed in the management cluster.

    kubectl get svc solo-enterprise-telemetry-gateway -n solo-enterprise --context ${context1} --show-labels

    The service should have the solo.io/service-scope=global label. If not, apply it.

    kubectl label svc solo-enterprise-telemetry-gateway -n solo-enterprise solo.io/service-scope=global --overwrite --context ${context1}
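The check and the fix in step 3 can be combined so that the label is applied only when it is missing. This is a minimal sketch with a hypothetical function name. It takes the management cluster context as its only argument and reads the label with a kubectl JSONPath expression, in which the dot in the label key is escaped.

```shell
# Ensure the telemetry gateway service carries solo.io/service-scope=global.
# $1: kubeconfig context of the management cluster.
ensure_global_scope() {
  ctx=$1
  scope=$(kubectl get svc solo-enterprise-telemetry-gateway -n solo-enterprise \
    --context "$ctx" -o jsonpath='{.metadata.labels.solo\.io/service-scope}')
  if [ "$scope" = "global" ]; then
    echo "already global"
  else
    kubectl label svc solo-enterprise-telemetry-gateway -n solo-enterprise \
      solo.io/service-scope=global --overwrite --context "$ctx"
  fi
}

# Usage: ensure_global_scope "${context1}"
```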