Metrics

This guide describes how to get started with Gloo Mesh Enterprise's out-of-the-box metrics suite.

This feature currently only supports Istio meshes.

Before you begin

This guide assumes the following:

Environment Prerequisites

Istio

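Each managed Istio control plane must be installed with the following configuration in the IstioOperator manifest. Set CLUSTER_NAME to the name the cluster is registered under in Gloo Mesh, and run the installation once for each managed cluster.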


CLUSTER_NAME=cluster-1
ISTIO_VERSION=1.10.5
cat << EOF | istioctl manifest install -y -f -
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: gloo-mesh-istio
  namespace: istio-system
spec:
  # only the control plane components are installed (https://istio.io/latest/docs/setup/additional-setup/config-profiles/)
  profile: minimal
  # Solo.io Istio distribution repository
  hub: gcr.io/istio-enterprise
  # Solo.io Gloo Mesh Istio tag
  tag: ${ISTIO_VERSION}

  meshConfig:
    # enable access logging to standard output
    accessLogFile: /dev/stdout

    defaultConfig:
      # wait for the istio-proxy to start before application pods
      holdApplicationUntilProxyStarts: true
      # enable Gloo Mesh metrics service (required for Gloo Mesh Dashboard)
      envoyMetricsService:
        address: enterprise-agent.gloo-mesh:9977
      # enable Gloo Mesh access log service (required for Gloo Mesh Access Logging)
      envoyAccessLogService:
        address: enterprise-agent.gloo-mesh:9977
      proxyMetadata:
        # Enable Istio agent to handle DNS requests for known hosts
        # Unknown hosts will automatically be resolved using upstream dns servers in resolv.conf
        # (for proxy-dns)
        ISTIO_META_DNS_CAPTURE: "true"
        # Enable automatic address allocation (for proxy-dns)
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
        # Used for gloo mesh metrics aggregation
        # should match trustDomain (required for Gloo Mesh Dashboard)
        GLOO_MESH_CLUSTER_NAME: ${CLUSTER_NAME}

    # Set the default behavior of the sidecar for handling outbound traffic from the application.
    outboundTrafficPolicy:
      mode: ALLOW_ANY
    # The trust domain corresponds to the trust root of a system.
    # For Gloo Mesh this should be the name of the cluster that corresponds to the CA certificate's CommonName identity
    trustDomain: ${CLUSTER_NAME}
  components:
    ingressGateways:
    # enable the default ingress gateway
    - name: istio-ingressgateway
      enabled: true
      k8s:
        service:
          type: LoadBalancer
          ports:
            # main http ingress port
            - port: 80
              targetPort: 8080
              name: http2
            # main https ingress port
            - port: 443
              targetPort: 8443
              name: https
            # Port for gloo-mesh multi-cluster mTLS passthrough (Required for Gloo Mesh east/west routing)
            - port: 15443
              targetPort: 15443
              # Gloo Mesh looks for this default name 'tls' on an ingress gateway
              name: tls
    pilot:
      k8s:
        env:
          # Allow multiple trust domains (Required for Gloo Mesh east/west routing)
          - name: PILOT_SKIP_VALIDATE_TRUST_DOMAIN
            value: "true"
  values:
    # https://istio.io/v1.5/docs/reference/config/installation-options/#global-options
    global:
      # needed for connecting VirtualMachines to the mesh
      network: ${CLUSTER_NAME}
      # needed for annotating istio metrics with cluster (should match trust domain and GLOO_MESH_CLUSTER_NAME)
      multiCluster:
        clusterName: ${CLUSTER_NAME}
EOF

The envoyMetricsService config ensures that all Envoy proxies are configured to emit their metrics to the Enterprise Agent, which acts as an Envoy metrics service sink. The Enterprise Agents then forward all received metrics to Enterprise Networking, where metrics across all managed clusters are centralized.

The multiCluster config enables Istio-collected metrics to be annotated with the cluster name registered in Gloo Mesh. This allows metrics to be attributed correctly in multi-cluster environments, which is particularly important for requests that cross cluster boundaries.
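The manifest comments require trustDomain, GLOO_MESH_CLUSTER_NAME, and multiCluster.clusterName to carry the same cluster name. If you write the rendered manifest to a file instead of piping it straight to istioctl, a quick check like the following can catch a mismatch before installing. This is a minimal sketch: yaml_value and check_cluster_names are hypothetical helpers, the file name in the usage comment is an assumption, and the parsing is line-based rather than a real YAML parser.

```shell
# Print the value of the first "key: value" line whose key is $1 in file $2.
# Line-based and quote-stripping only; this is not a YAML parser.
yaml_value() {
  awk -v k="$1:" '$1 == k { print $2; exit }' "$2" | tr -d '"'
}

# Fail unless trustDomain, GLOO_MESH_CLUSTER_NAME and clusterName are set and equal.
check_cluster_names() {
  file="$1"
  td=$(yaml_value trustDomain "$file")
  gm=$(yaml_value GLOO_MESH_CLUSTER_NAME "$file")
  mc=$(yaml_value clusterName "$file")
  if [ -n "$td" ] && [ "$td" = "$gm" ] && [ "$td" = "$mc" ]; then
    echo "ok: $td"
  else
    echo "mismatch: trustDomain=$td GLOO_MESH_CLUSTER_NAME=$gm clusterName=$mc" >&2
    return 1
  fi
}

# Usage: redirect the heredoc to a file (cat << EOF > istio-operator.yaml ... EOF), then:
#   check_cluster_names istio-operator.yaml && istioctl manifest install -y -f istio-operator.yaml
```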

Gloo Mesh Enterprise

When installing Gloo Mesh Enterprise, you must set the enterprise-networking.metricsBackend.prometheus.enabled Helm value to true. To do so, pass the following argument to the helm install command:

--set enterprise-networking.metricsBackend.prometheus.enabled=true

This configures Gloo Mesh to install a Prometheus server which comes preconfigured to scrape the centralized metrics from the Enterprise Networking metrics endpoint.

After installing the Gloo Mesh management plane into cluster-1, you should see pods similar to the following running in the gloo-mesh namespace (pod name suffixes and ages will differ):

gloo-mesh      enterprise-networking-69d74c9744-8nlkd               1/1     Running   0          23m
gloo-mesh      prometheus-server-68b58c79f8-rlq54                   2/2     Running   0          23m
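To run the same check from a script (for example in CI), you can filter kubectl output with awk. This is a minimal sketch: check_metrics_pods is a hypothetical helper, it assumes kubectl's default single-namespace column layout (NAME READY STATUS RESTARTS AGE) printed with --no-headers, and the name prefixes it looks for are taken from the listing above.

```shell
# Read `kubectl -n gloo-mesh get pods --no-headers` output (NAME READY STATUS ...)
# on stdin. Report pods that are not fully ready or not Running, and fail if
# either expected component is missing.
check_metrics_pods() {
  awk '
    /^enterprise-networking-/ { seen_en = 1 }
    /^prometheus-server-/     { seen_prom = 1 }
    {
      split($2, ready, "/")
      if (ready[1] != ready[2] || $3 != "Running") { print "not ready: " $1; bad = 1 }
    }
    END {
      if (!seen_en)   { print "missing: enterprise-networking"; bad = 1 }
      if (!seen_prom) { print "missing: prometheus-server"; bad = 1 }
      exit bad
    }'
}

# Usage (requires cluster access):
#   kubectl -n gloo-mesh get pods --no-headers | check_metrics_pods
```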

Functionality

Generate Traffic

Before any meaningful metrics are collected, traffic has to be generated in the system.

Port forward the productpage deployment (the productpage workload is convenient because it makes requests to the other workloads, but any workload of your choice will suffice).

kubectl -n bookinfo port-forward deploy/productpage-v1 9080

Then, using a load generator such as hey, send requests to the forwarded port:

# send 1 request per second (1 worker, rate-limited to 1 QPS) for 1 hour
hey -z 1h -c 1 -q 1 'http://localhost:9080/productpage?u=normal'
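If hey is not installed, a plain curl loop generates similar steady, single-client traffic. This is a minimal sketch: send_requests is a hypothetical helper, and because each iteration waits for the response before sleeping, the rate is slightly under one request per second.

```shell
# Send COUNT sequential GET requests to URL, sleeping DELAY seconds after each,
# and print each HTTP status code (curl reports 000 when no response was received).
send_requests() {
  url="$1"
  count="${2:-60}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' "$url" || true   # keep going on errors
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage, with the productpage port-forward running (3600 requests, about one hour):
#   send_requests 'http://localhost:9080/productpage?u=normal' 3600 1
```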

Note that it can take a few minutes before the queries below return any metrics. The metrics must first propagate from the Envoy proxies through the Enterprise Agents to Enterprise Networking, and the Prometheus server must then scrape them from Enterprise Networking.
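Rather than retrying by hand, you can poll the Prometheus HTTP query API (/api/v1/query) until the first istio_requests_total series appear. This is a minimal sketch: has_series and wait_for_metrics are hypothetical helpers, it assumes the Prometheus server is reachable at localhost:9090 (for example through the port-forward in the Prometheus UI section), PROM_URL is a hypothetical override variable, and it detects a non-empty result with a text match on the JSON response rather than a JSON parser.

```shell
# Succeed if a Prometheus instant-query JSON response on stdin has at least one series.
has_series() {
  grep -q '"result":\[{'
}

# Poll /api/v1/query for istio_requests_total until a series appears or attempts run out.
wait_for_metrics() {
  url="${PROM_URL:-http://localhost:9090}"   # hypothetical override variable
  attempts="${1:-30}"                        # number of tries
  delay="${2:-10}"                           # seconds between tries
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -sG "$url/api/v1/query" --data-urlencode 'query=istio_requests_total' | has_series; then
      echo "istio_requests_total is available"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no istio_requests_total series after $attempts attempts" >&2
  return 1
}

# Usage (up to 30 tries, 10 seconds apart):
#   wait_for_metrics 30 10
```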

Prometheus UI

The Prometheus server comes with a built-in UI suitable for basic metrics querying. You can view it with the following command:

# port forward prometheus server
kubectl -n gloo-mesh port-forward deploy/prometheus-server 9090

Then open localhost:9090 in your browser. The following PromQL query is a simple starting point for navigating the collected metrics: it sums the increase over the last 2 minutes of the istio_requests_total metric (which counts the total number of requests) reported by the productpage-v1.bookinfo.cluster-1 workload's Envoy proxy, grouped by workload, destination workload, and response code. You can read more about PromQL in the official documentation.

sum(
  increase(
    istio_requests_total{
      gm_workload_ref="productpage-v1.bookinfo.cluster-1",
    }[2m]
  )
) by (
  gm_workload_ref,
  gm_destination_workload_ref,
  response_code,
)
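To run the same query for a different workload or time window, for example from a script against the Prometheus /api/v1/query HTTP endpoint, it can help to generate the query string. This is a minimal sketch: requests_query is a hypothetical helper that only builds the query above on a single line, and the usage comment assumes the Prometheus port-forward is running.

```shell
# Build the query above as a single line for a given workload ref and
# increase() window (default 2m, as in the query above).
requests_query() {
  workload_ref="$1"   # e.g. productpage-v1.bookinfo.cluster-1
  window="${2:-2m}"
  printf 'sum(increase(istio_requests_total{gm_workload_ref="%s"}[%s])) by (gm_workload_ref, gm_destination_workload_ref, response_code)' \
    "$workload_ref" "$window"
}

# Example (requires the Prometheus port-forward):
#   curl -sG http://localhost:9090/api/v1/query \
#     --data-urlencode "query=$(requests_query productpage-v1.bookinfo.cluster-1 5m)"
```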

Using a Custom Prometheus Instance

To integrate Gloo Mesh with an existing Prometheus server or other Prometheus-compatible solution, you must disable the default Prometheus server. Then, configure Gloo Mesh and your Prometheus server to communicate with each other.

1. Set up Gloo Mesh Enterprise to disable the default Prometheus instance and instead read from your custom Prometheus instance's full URL, including the port number. You can include the following --set flags in a helm upgrade command, or update these fields in your Helm values configuration file when you install Gloo Mesh Enterprise.

--set enterprise-networking.metricsBackend.prometheus.enabled=false
--set enterprise-networking.metricsBackend.prometheus.url=<URL (with port) to Prometheus server>

2. Configure your Prometheus server to scrape metrics from Gloo Mesh. Although the setup differs between solutions, configure yours to scrape the enterprise-networking.gloo-mesh:9091 endpoint and to respect the Prometheus scraping annotations in the Gloo Mesh deployment.

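For example, if you use the Prometheus community Helm chart, update its values.yaml file as follows to scrape metrics from Gloo Mesh. Because Helm replaces lists instead of merging them, this scrape_configs list replaces the chart's default scrape jobs, so add any other jobs you still need to the same list.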

serverFiles:
  prometheus.yml:
    scrape_configs:
    - job_name: gloo-mesh
      scrape_interval: 15s
      scrape_timeout: 10s
      static_configs:
      - targets:
        - enterprise-networking.gloo-mesh:9091

3. Optional: Scrape metrics from the agents on data plane clusters. You might collect these metrics for operational awareness of the system, such as for troubleshooting purposes. Note that these metrics are not rendered in the service graph of the Gloo Mesh UI. To collect these metrics, configure your Prometheus instance to scrape the enterprise-agent.gloo-mesh:9091/metrics endpoint on the data plane clusters.
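The three steps above can be sketched as a few shell helpers. This is a sketch under assumptions, not part of Gloo Mesh: the function names, the output file name, the example URL, and the PROM_URL variable are hypothetical; check_gloo_mesh_scrape assumes your Prometheus HTTP API is reachable (for example through a port-forward) and uses the gloo-mesh job_name from the scrape config above; and the JSON and metrics handling is plain text matching, not a real parser.

```shell
# Step 1: write a Helm values file equivalent to the two --set flags.
# Pass it to helm install or helm upgrade with -f alongside your other values.
write_custom_prometheus_values() {
  prom_url="$1"                              # full URL, including the port
  out="${2:-values-custom-prometheus.yaml}"  # hypothetical file name
  cat > "$out" <<EOF
enterprise-networking:
  metricsBackend:
    prometheus:
      enabled: false
      url: ${prom_url}
EOF
}

# Step 2: succeed if a Prometheus instant-query JSON response on stdin contains
# at least one sample with value "1" and none with value "0".
all_up() {
  body=$(cat)
  printf '%s' "$body" | grep -q ',"1"\]' || return 1
  if printf '%s' "$body" | grep -q ',"0"\]'; then
    return 1
  fi
}

# Step 2: the built-in "up" metric is 1 when the last scrape of a target succeeded.
check_gloo_mesh_scrape() {
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode 'query=up{job="gloo-mesh"}' | all_up
}

# Step 3: list distinct metric names in Prometheus text exposition format on stdin,
# skipping comment (# HELP / # TYPE) and blank lines.
metric_names() {
  grep -v '^#' | sed 's/[{ ].*//' | grep -v '^$' | LC_ALL=C sort -u
}

# Example usage (requires cluster access; the URL is hypothetical):
#   write_custom_prometheus_values http://my-prometheus.monitoring:9090
#   check_gloo_mesh_scrape && echo "gloo-mesh target is up"
#   kubectl -n gloo-mesh port-forward svc/enterprise-agent 9091 &   # on a data plane cluster
#   curl -s http://localhost:9091/metrics | metric_names
```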