Metrics

Review default metrics that are available in Prometheus so that you can monitor the health of Gloo Mesh Enterprise components and Istio workloads.

View metrics

To view all metrics that are available in Prometheus, follow these steps:

  1. Port-forward the Prometheus pod in your cluster by using one of the following options.

    Use meshctl.

    meshctl proxy prometheus

    Or use kubectl to port-forward the prometheus-server deployment on port 9091.

    kubectl -n gloo-mesh port-forward deploy/prometheus-server 9091

  2. Open the Prometheus expression browser to run PromQL queries on metrics.
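If you prefer the command line over the expression browser, the same data can be read through the Prometheus HTTP API. The following is a minimal sketch that assumes Prometheus is reachable at http://localhost:9091, as it would be after the kubectl port-forward in step 1. The function names and the PROM_URL variable are hypothetical helpers, not part of meshctl or Gloo Mesh.

```shell
# Base URL of the port-forwarded Prometheus server.
# Assumption: local port 9091, taken from the kubectl command in step 1.
PROM_URL="${PROM_URL:-http://localhost:9091}"

# Print the instant-query endpoint of the Prometheus HTTP API.
prom_endpoint() {
  printf '%s/api/v1/query\n' "${PROM_URL%/}"
}

# Run an instant PromQL query and print the raw JSON response.
# --get with --data-urlencode lets curl encode characters such as { } and ".
prom_query() {
  curl -sS --get "$(prom_endpoint)" --data-urlencode "query=$1"
}

# List the names of all metrics that Prometheus currently knows about.
prom_metric_names() {
  curl -sS "${PROM_URL%/}/api/v1/label/__name__/values"
}
```

With the port-forward running, `prom_metric_names` returns every metric name as JSON, and `prom_query 'istio_requests_total'` returns the current samples for that metric.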

Default metrics

If you followed the Get started guide to install Gloo Mesh Enterprise, the Gloo telemetry pipeline is automatically set up for you. The Gloo telemetry pipeline collects the following metrics that are required for the Gloo UI graph. Prometheus scrapes the Gloo telemetry gateway and collector agent to feed other Gloo observability tools. You can also use the Prometheus expression browser to run PromQL queries on these metrics.

To reduce cardinality in the Gloo telemetry pipeline, only a few labels are collected for each metric. For more information, see Metric labels.

Istio proxy metrics

Metric Description
istio_requests_total The number of requests that were processed for an Istio proxy.
istio_request_duration_milliseconds The time it takes for a request to reach its destination in milliseconds.
istio_request_duration_milliseconds_bucket The time it takes for a request to reach its destination in milliseconds, recorded as cumulative histogram buckets.
istio_request_duration_milliseconds_count The total number of Istio requests since the Istio proxy was last started.
istio_request_duration_milliseconds_sum The sum of all request durations since the last start of the Istio proxy.
istio_tcp_sent_bytes_total The total number of bytes that are sent in a response.
istio_tcp_received_bytes_total The total number of bytes that are received in a request.
istio_tcp_connections_opened_total The total number of connections that were opened to an Istio proxy.
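The Istio proxy metrics are typically combined with PromQL functions rather than read raw. The following sketch prints three common queries for one destination workload: request rate, 5xx error ratio, and 99th percentile latency. The function names, the 5-minute window, and the example workload name are illustrative assumptions. The latency query also assumes that the histogram bucket label `le` is kept on the `_bucket` series.

```shell
# Requests per second to a destination workload, averaged over 5 minutes.
istio_request_rate() {
  printf 'sum(rate(istio_requests_total{destination_workload="%s"}[5m]))\n' "$1"
}

# Fraction of requests to the workload that returned a 5xx response code.
istio_error_ratio() {
  printf 'sum(rate(istio_requests_total{destination_workload="%s",response_code=~"5.."}[5m])) / sum(rate(istio_requests_total{destination_workload="%s"}[5m]))\n' "$1" "$1"
}

# 99th percentile request duration in milliseconds, estimated from the buckets.
# Assumption: the "le" label is present on the bucket series.
istio_p99_latency() {
  printf 'histogram_quantile(0.99, sum by (le) (rate(istio_request_duration_milliseconds_bucket{destination_workload="%s"}[5m])))\n' "$1"
}
```

For example, `istio_request_rate reviews-v1` prints a query that you can paste into the Prometheus expression browser. The workload name `reviews-v1` is a placeholder.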

Istiod metrics

Metric Description
pilot_proxy_convergence_time The time it takes between applying a configuration change and the Istio proxy receiving the configuration change.

Cilium metrics

Metric Description
hubble_flows_processed_total The total number of network flows that were processed by the Cilium agent.
hubble_drop_total The total number of packets that were dropped by the Cilium agent.
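The two Cilium metrics can be related to each other to estimate how much traffic is dropped. The following sketch prints a PromQL query for the ratio of dropped packets to processed flows. The function name and the default 5-minute window are assumptions for illustration. Because packets and flows are different units, treat the result as a rough trend indicator rather than an exact drop percentage.

```shell
# Print a query for dropped packets per processed flow over a time window.
# The window defaults to 5m if no argument is given.
hubble_drop_ratio_query() {
  window="${1:-5m}"
  printf 'sum(rate(hubble_drop_total[%s])) / sum(rate(hubble_flows_processed_total[%s]))\n' "$window" "$window"
}
```

Call `hubble_drop_ratio_query 15m` to smooth the ratio over a longer window.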

Gloo management server metrics

Metric Description
gloo_mesh_reconciler_time_sec_bucket The time the Gloo management server needs to sync with the Gloo agents in the workload clusters to apply the translated resources. This metric is captured in seconds for the following intervals (buckets): 1, 2, 5, 10, 15, 30, 50, 80, 100, and 200.
gloo_mesh_redis_sync_err The number of times the Gloo management server could not read from or write to the Gloo Redis instance.
gloo_mesh_redis_write_time_sec The time it takes in seconds for the Gloo management server to write to the Redis database.
gloo_mesh_translation_time_sec_bucket The time the Gloo management server needs to translate Gloo resources into Istio, Envoy, or Cilium resources. This metric is captured in seconds for the following intervals (buckets): 1, 2, 5, 10, 15, 20, 25, 30, 45, 60, and 120.
gloo_mesh_translator_concurrency The number of translation operations that the Gloo management server can perform at the same time.
relay_pull_clients_connected The number of Gloo agents that are connected to the Gloo management server.
relay_push_clients_warmed The number of Gloo agents that are ready to accept updates from the Gloo management server.
solo_io_gloo_gateway_license The number of minutes until the Gloo Gateway license expires. To prevent your management server from crashing when the license expires, make sure to upgrade the license before expiration.
solo_io_gloo_mesh_license The number of minutes until the Gloo Mesh Enterprise license expires. To prevent your management server from crashing when the license expires, make sure to upgrade the license before expiration.
solo_io_gloo_network_license The number of minutes until the Gloo Network for Cilium license expires. To prevent your management server from crashing when the license expires, make sure to upgrade the license before expiration.
translation_error The number of translation errors that were reported by the Gloo management server.
translation_warning The number of translation warnings that were reported by the Gloo management server.
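Because the license metrics report minutes until expiration, a threshold that is expressed in days must be converted before it is compared. The following sketch prints a query that matches when a license has fewer than a given number of days left, and a second query for the 95th percentile reconcile time. The function names, the 30-day example, and the 5-minute rate window are assumptions for illustration.

```shell
# Minutes in one day, used to convert a threshold in days to the metric's unit.
MINUTES_PER_DAY=$(( 24 * 60 ))

# Print a query that matches when the given license metric reports
# fewer than <days> days until expiration.
# Usage: license_expiring_query <metric_name> <days>
license_expiring_query() {
  printf '%s < %d\n' "$1" "$(( $2 * MINUTES_PER_DAY ))"
}

# Print a query for the 95th percentile reconcile time in seconds,
# estimated from the histogram buckets.
reconciler_p95_query() {
  printf 'histogram_quantile(0.95, sum by (le) (rate(gloo_mesh_reconciler_time_sec_bucket[5m])))\n'
}
```

For example, `license_expiring_query solo_io_gloo_mesh_license 30` prints `solo_io_gloo_mesh_license < 43200`, which you could use as the expression of an alert.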

Gloo telemetry pipeline metrics

Metric Description
otelcol_processor_refused_metric_points The number of metric data points that were refused by the Gloo telemetry pipeline. For example, metric data points might be refused to prevent collector agents from being overloaded in the case of insufficient memory resources.
otelcol_processor_refused_spans The number of spans that were refused by the memory_limiter in the Gloo telemetry pipeline to prevent collector agents from being overloaded.
otelcol_exporter_queue_capacity The amount of telemetry data that can be stored in memory while waiting on a worker in the collector agent to become available to send the data.
otelcol_exporter_queue_size The amount of telemetry data that is currently stored in memory. If the size is equal to or larger than otelcol_exporter_queue_capacity, new telemetry data is rejected.
otelcol_loadbalancer_backend_latency The time the collector agents need to export telemetry data.
otelcol_exporter_send_failed_spans The number of telemetry data spans that could not be sent to a backend.
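Comparing otelcol_exporter_queue_size with otelcol_exporter_queue_capacity shows how close an exporter is to rejecting data. The following sketch prints a query that matches exporters whose queue is fuller than a given ratio, and computes the used percentage for two sampled values. The function names and the 0.8 example threshold are assumptions for illustration.

```shell
# Print a query that matches exporters whose queue usage exceeds
# the given ratio (a number between 0 and 1).
queue_saturation_query() {
  printf 'otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > %s\n' "$1"
}

# Print the integer percentage of queue capacity that is in use.
# Usage: queue_used_percent <size> <capacity>
queue_used_percent() {
  if [ "$2" -le 0 ]; then
    echo "capacity must be positive" >&2
    return 1
  fi
  echo $(( $1 * 100 / $2 ))
}
```

For example, `queue_used_percent 750 1000` prints `75`, and `queue_saturation_query 0.8` prints a query that returns only the exporters above 80% usage.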

Metric labels

To reduce cardinality in the Gloo telemetry pipeline, only the following labels are collected for each metric.

Metric group Labels
Istio ["cluster", "collector_pod", "connection_security_policy", "destination_cluster", "destination_principal", "destination_service", "destination_workload", "destination_workload_id", "destination_workload_namespace", "gloo_mesh", "namespace", "pod_name", "reporter", "response_code", "source_cluster", "source_principal", "source_workload", "source_workload_namespace", "version", "workload_id"]
Telemetry pipeline ["app", "cluster", "collector_name", "collector_pod", "component", "exporter", "namespace", "pod_template_generation", "processor", "service_version"]
Hubble ["app", "cluster", "collector_pod", "component", "destination", "destination_cluster", "destination_pod", "destination_workload", "destination_workload_id", "destination_workload_namespace", "k8s_app", "namespace", "pod", "protocol", "source", "source_cluster", "source_pod", "source_workload", "source_workload_namespace", "subtype", "type", "verdict", "workload_id"]
Cilium (if enabled in Gloo telemetry pipeline) ["action", "address_type", "api_call", "app", "arch", "area", "cluster", "collector_pod", "component", "direction", "endpoint_state", "enforcement", "equal", "error", "event_type", "family", "k8s_app", "le", "level", "map_name", "method", "name", "namespace", "operation", "outcome", "path", "pod", "pod_template_generation", "protocol", "reason", "return_code", "revision", "scope", "source", "source_cluster", "source_node_name", "status", "subsystem", "target_cluster", "target_node_ip", "target_node_name", "target_node_type", "type", "valid", "value", "version"]
eBPF (if enabled in Gloo telemetry pipeline) ["app", "client_addr", "cluster", "code", "collector_pod", "component", "destination", "local_addr", "namespace", "pod", "pod_template_generation", "remote_identity", "server_identity", "source"]
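Because only the listed labels are collected, a query that groups by any other label returns a single series with that label empty. The following sketch checks grouping labels against the Istio row of the table above before it prints a request-rate query. The function names and the 5-minute window are assumptions for illustration.

```shell
# Labels from the Istio row of the table above, separated by spaces.
ISTIO_LABELS="cluster collector_pod connection_security_policy destination_cluster destination_principal destination_service destination_workload destination_workload_id destination_workload_namespace gloo_mesh namespace pod_name reporter response_code source_cluster source_principal source_workload source_workload_namespace version workload_id"

# Return 0 if the given label is in the collected Istio label set.
is_istio_label() {
  case " $ISTIO_LABELS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a request-rate query that is grouped by the given labels.
# Fails if any label is not in the collected set.
istio_rate_by() {
  for label in "$@"; do
    if ! is_istio_label "$label"; then
      echo "label not collected: $label" >&2
      return 1
    fi
  done
  # Join the arguments with commas inside a subshell so IFS is not changed globally.
  grouping=$(IFS=,; printf '%s' "$*")
  printf 'sum by (%s) (rate(istio_requests_total[5m]))\n' "$grouping"
}
```

For example, `istio_rate_by source_workload destination_workload` prints a query for the request rate between each pair of workloads, while `istio_rate_by destination_pod` fails because destination_pod is listed for Hubble but not for Istio.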