View metrics

To view all metrics that are available in Prometheus, follow these steps:

  1. Port-forward the Prometheus pod in your cluster.

  2. Open the Prometheus expression browser to run PromQL queries on metrics.
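Once the port-forward is running, the same data that the expression browser shows can also be read through the Prometheus HTTP API. The following is a minimal sketch, not an official client. It assumes that Prometheus is reachable at http://localhost:9091, which is a hypothetical local port that depends on how you set up the port-forward. The helper names are illustrative only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Prometheus is port-forwarded to this local address.
PROMETHEUS_URL = "http://localhost:9091"


def metric_names_url(base_url: str = PROMETHEUS_URL) -> str:
    """Build the Prometheus HTTP API URL that lists all metric names."""
    return f"{base_url}/api/v1/label/__name__/values"


def instant_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the URL of an instant query for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, calling fetch_json(metric_names_url())["data"] while the port-forward is active returns the list of metric names that Prometheus currently knows about.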

Default metrics

If you followed the Get started guide to install Gloo Mesh Enterprise, the Gloo telemetry pipeline is automatically set up for you. The Gloo telemetry pipeline collects the following metrics that are required for the Gloo UI Graph. Prometheus scrapes the Gloo telemetry gateway and collector agent to feed other Gloo observability tools. You can also use the Prometheus expression browser to run PromQL queries on these metrics.

Istio proxy metrics

Metric | Description
istio_requests_total | The number of requests that were processed for an Istio proxy.
istio_request_duration_milliseconds | The time it takes for a request to reach its destination in milliseconds.
istio_request_duration_milliseconds_bucket | The number of requests whose duration was less than or equal to the upper bound of each bucket, in milliseconds.
istio_request_duration_milliseconds_count | The total number of Istio requests since the Istio proxy was last started.
istio_request_duration_milliseconds_sum | The sum of all request durations since the last start of the Istio proxy.
istio_tcp_sent_bytes | The number of bytes that are sent in a response at a particular moment in time.
istio_tcp_sent_bytes_total | The total number of bytes that are sent in a response.
istio_tcp_received_bytes | The number of bytes that are received in a request at a particular moment in time.
istio_tcp_received_bytes_total | The total number of bytes that are received in a request.
istio_tcp_connections_opened | The number of open connections to an Istio proxy at a particular moment in time.
istio_tcp_connections_opened_total | The total number of connections that were opened to an Istio proxy.
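The request duration metric is a Prometheus histogram, so its _bucket, _sum, and _count series are meant to be combined. The sketch below shows how an average and an approximate percentile can be derived from them. It follows the linear-interpolation idea behind the PromQL histogram_quantile function, but it is a simplified illustration and not the Prometheus implementation. The bucket boundaries and counts in the usage note are made-up sample values.

```python
import math


def average_duration_ms(duration_sum: float, duration_count: float) -> float:
    """Average request duration: the _sum series divided by the _count series."""
    return duration_sum / duration_count if duration_count else 0.0


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative histogram buckets.

    Each bucket is (upper_bound_ms, cumulative_count); the last bucket is
    expected to have an upper bound of +Inf, as in Prometheus histograms.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # The quantile lies in the +Inf bucket: report the highest finite bound.
                return lower_bound
            in_bucket = cumulative - lower_count
            if in_bucket == 0:
                return upper_bound
            # Interpolate linearly between the bounds of the matching bucket.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return lower_bound
```

With the hypothetical buckets [(10, 50), (50, 90), (100, 100), (math.inf, 100)], half of the requests finished within the first bucket, so estimate_quantile(0.5, ...) lands on that bucket's upper bound of 10 ms.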

Istiod metrics

Metric | Description
pilot_proxy_convergence_time | The time it takes between applying a configuration change and the Istio proxy receiving the configuration change.

Cilium metrics

Metric | Description
cilium_bpf_map_pressure | The ratio of the required map size compared to its configured size. Values that are greater than or equal to 1.0 indicate that the map is full.
cilium_drop_count_total | The total number of dropped packets.
cilium_endpoint_regeneration_time_stats_seconds | The total time in seconds that the Cilium agent needed to generate Cilium endpoints.
cilium_identity | The number of identities that are currently allocated.
cilium_node_connectivity_status | The connectivity status of each node in the cluster.
cilium_operator_ipam_ips | The number of IP addresses that are currently in use.
cilium_policy_endpoint_enforcement_status | The number of endpoints that are labeled by the policy enforcement status.
cilium_unreachable_nodes | The number of nodes that are not reachable.
hubble_flows_processed_total | The total number of network flows that were processed by the Cilium agent.
hubble_drop_total | The total number of packets that were dropped by the Cilium agent.
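As a small illustration of how the cilium_bpf_map_pressure threshold can be read, the following sketch flags maps whose pressure is at or above 1.0. The function name and the map names in the usage note are hypothetical, and the input is assumed to be a mapping from the map_name label to the current metric value.

```python
def full_maps(pressure_by_map: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Return the names of BPF maps whose pressure is at or above the threshold."""
    return sorted(
        name for name, pressure in pressure_by_map.items() if pressure >= threshold
    )
```

For example, full_maps({"map_a": 0.42, "map_b": 1.0}) reports only map_b as full. A lower threshold, such as 0.9, could be passed to get an earlier warning.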

Gloo management server metrics

The built-in Prometheus automatically scrapes Gloo management server metrics. You can view them in the Prometheus expression browser or in the Gloo operations dashboard in Grafana. However, because these metrics are not collected by the metrics/ui pipeline, they are not available to the Gloo telemetry gateway by default, so you cannot forward them to third-party solutions, such as Datadog.

You can configure your Gloo telemetry pipeline to scrape these metrics from the Gloo management server and make these metrics available to the Gloo telemetry gateway. For more information, see Add Gloo management server metrics.

Metric | Description
gloo_mesh_build_snapshot_metric_time_sec | The time in seconds for the Gloo management server to generate an output snapshot for connected Gloo agents.
gloo_mesh_garbage_collection_time_sec | The time in seconds that the garbage collector takes to clean up unused resources, such as after the custom resource translation.
gloo_mesh_reconciler_time_sec_bucket | The time the Gloo management server needs to sync with the Gloo agents in the workload clusters to apply the translated resources. This metric is captured in seconds for the following intervals (buckets): 1, 2, 5, 10, 15, 30, 50, 80, 100, and 200.
gloo_mesh_redis_relation_err | The number of errors that occurred during a read or write operation of relationship data to Redis.
gloo_mesh_redis_sync_err | The number of times the Gloo management server could not read from or write to the Gloo Redis instance.
gloo_mesh_redis_write_time_sec | The time in seconds that it takes for the Gloo management server to write to the Redis database.
gloo_mesh_relay_client_delta_pull_time_sec | The time in seconds that it takes for a Gloo agent to receive a delta output snapshot from the Gloo management server.
gloo_mesh_relay_client_delta_pull_err | The number of errors that occurred while sending a delta output snapshot to a connected Gloo agent.
gloo_mesh_relay_client_delta_push_time_sec | The time in seconds that it takes for a Gloo agent to send a delta input snapshot to the Gloo management server.
gloo_mesh_relay_client_delta_push_err | The number of errors that occurred while sending a delta input snapshot from the Gloo agent to the Gloo management server.
gloo_mesh_snapshot_upserter_op_time_sec | The time in seconds that it takes for a snapshot to be updated or inserted in the Gloo management server local memory.
gloo_mesh_safe_mode_active | Indicates whether safe mode is enabled in the Gloo management server. Safe mode is available in versions 2.4.12, 2.5.5, 2.6.0, and later.
gloo_mesh_translation_time_sec_bucket | The time the Gloo management server needs to translate Gloo resources into Istio, Envoy, or Cilium resources. This metric is captured in seconds for the following intervals (buckets): 1, 2, 5, 10, 15, 20, 25, 30, 45, 60, and 120.
gloo_mesh_translator_concurrency | The number of translation operations that the Gloo management server can perform at the same time.
object_write_fails_total | The number of times the Gloo agent tried to write invalid Istio configuration to the cluster and the configuration was rejected by the Istio control plane, istiod.
relay_pull_clients_connected | The number of Gloo agents that are connected to the Gloo management server.
relay_push_clients_warmed | The number of Gloo agents that are ready to accept updates from the Gloo management server.
solo_io_gloo_gateway_license | The number of minutes until the Gloo Mesh Gateway license expires. To prevent your management server from crashing when the license expires, make sure to upgrade the license before expiration.
solo_io_gloo_mesh_license | The number of minutes until the Gloo Mesh Enterprise license expires. To prevent your management server from crashing when the license expires, make sure to upgrade the license before expiration.
solo_io_gloo_network_license | The number of minutes until the Gloo Network for Cilium license expires. To prevent your management server from crashing when the license expires, make sure to upgrade the license before expiration.
translation_error | The number of translation errors that were reported by the Gloo management server.
translation_warning | The number of translation warnings that were reported by the Gloo management server.
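Because the license metrics report the remaining time in minutes, it can help to convert the value to days before you compare it against a renewal window. The following sketch shows that arithmetic. The 30-day warning window and the function names are assumptions chosen for illustration, not values from the product.

```python
MINUTES_PER_DAY = 24 * 60


def days_until_expiry(minutes_remaining: float) -> float:
    """Convert a license metric value (minutes until expiry) to days."""
    return minutes_remaining / MINUTES_PER_DAY


def needs_renewal(minutes_remaining: float, warn_days: float = 30.0) -> bool:
    """Return True if the license expires within the warning window."""
    return days_until_expiry(minutes_remaining) <= warn_days
```

For example, a metric value of 43200 corresponds to exactly 30 days, which falls inside the assumed 30-day warning window.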

Gloo telemetry pipeline metrics

Metric | Description
otelcol_processor_refused_metric_points | The number of metric data points that were refused by the Gloo telemetry pipeline processor. For example, metrics might be refused to prevent collector agents from being overloaded in the case of insufficient memory resources.
otelcol_receiver_refused_metric_points | The number of metric data points that were refused by the Gloo telemetry pipeline receiver. For example, metrics might be refused to prevent collector agents from being overloaded in the case of insufficient memory resources.
otelcol_processor_refused_spans | The number of spans that were refused by the memory_limiter in the Gloo telemetry pipeline to prevent collector agents from being overloaded.
otelcol_exporter_queue_capacity | The amount of telemetry data that can be stored in memory while waiting on a worker in the collector agent to become available to send the data.
otelcol_exporter_queue_size | The amount of telemetry data that is currently stored in memory. If the size is equal to or larger than otelcol_exporter_queue_capacity, new telemetry data is rejected.
otelcol_loadbalancer_backend_latency | The time the collector agents need to export telemetry data.
otelcol_exporter_send_failed_spans | The number of spans that could not be sent to a backend.
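The relationship between otelcol_exporter_queue_size and otelcol_exporter_queue_capacity can be expressed as a utilization ratio. The sketch below assumes that you already have the two metric values as numbers. The function names are hypothetical.

```python
def queue_utilization(queue_size: float, queue_capacity: float) -> float:
    """Fraction of the exporter queue that is currently in use."""
    if queue_capacity <= 0:
        raise ValueError("queue_capacity must be greater than 0")
    return queue_size / queue_capacity


def rejects_new_data(queue_size: float, queue_capacity: float) -> bool:
    """New telemetry data is rejected once the size reaches the capacity."""
    return queue_size >= queue_capacity
```

A utilization that stays close to 1.0 suggests that the exporter cannot send data as fast as it arrives, so new telemetry data is at risk of being rejected.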

Metric labels

To reduce cardinality in the Gloo telemetry pipeline, only the following labels are collected for each metric.

Metric group | Labels
Istio | ["cluster", "collector_pod", "connection_security_policy", "destination_cluster", "destination_principal", "destination_service", "destination_workload", "destination_workload_id", "destination_workload_namespace", "gloo_mesh", "namespace", "pod_name", "reporter", "response_code", "source_cluster", "source_principal", "source_workload", "source_workload_namespace", "version", "workload_id"]
Telemetry pipeline | ["app", "cluster", "collector_name", "collector_pod", "component", "exporter", "namespace", "pod_template_generation", "processor", "service_version"]
Hubble | ["app", "cluster", "collector_pod", "component", "destination", "destination_cluster", "destination_pod", "destination_workload", "destination_workload_id", "destination_workload_namespace", "k8s_app", "namespace", "pod", "protocol", "source", "source_cluster", "source_pod", "source_workload", "source_workload_namespace", "subtype", "type", "verdict", "workload_id"]
Cilium* | ["action", "address_type", "api_call", "app", "arch", "area", "cluster", "collector_pod", "component", "direction", "endpoint_state", "enforcement", "equal", "error", "event_type", "family", "k8s_app", "le", "level", "map_name", "method", "name", "namespace", "operation", "outcome", "path", "pod", "pod_template_generation", "protocol", "reason", "return_code", "revision", "scope", "source", "source_cluster", "source_node_name", "status", "subsystem", "target_cluster", "target_node_ip", "target_node_name", "target_node_type", "type", "valid", "value", "version"]
eBPF* | ["app", "client_addr", "cluster", "code", "collector_pod", "component", "destination", "local_addr", "namespace", "pod", "pod_template_generation", "remote_identity", "server_identity", "source"]

* Collected only if enabled in the Gloo telemetry pipeline.
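To illustrate the effect of such a label allowlist, the following sketch drops every label that is not in the Telemetry pipeline group. It only demonstrates the outcome and makes no claim about how the Gloo telemetry pipeline implements the filtering. The function name and the sample labels in the usage note are hypothetical.

```python
# Labels of the "Telemetry pipeline" metric group from the table above.
TELEMETRY_PIPELINE_LABELS = frozenset({
    "app", "cluster", "collector_name", "collector_pod", "component",
    "exporter", "namespace", "pod_template_generation", "processor",
    "service_version",
})


def keep_allowed_labels(
    labels: dict[str, str],
    allowed: frozenset[str] = TELEMETRY_PIPELINE_LABELS,
) -> dict[str, str]:
    """Return only the labels whose names are in the allowlist."""
    return {name: value for name, value in labels.items() if name in allowed}
```

For example, a series with the labels app, cluster, and a high-cardinality instance label keeps only app and cluster, which reduces the number of distinct series that are stored.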