View service mesh metrics

Use the Prometheus server that is built into Gloo Mesh to monitor the health and performance of your service mesh.

Prometheus records and collects multi-dimensional data over time. With this data, you can easily see important health-related information for your service mesh, such as which routes perform well, where you might have a bottleneck, how fast your services respond to requests, or how many requests per second your services process. All data is stored in the Prometheus database, and you can use the Prometheus Querying Language (PromQL) to perform complex queries and monitor how metrics change over time. In addition, you can easily set up alerts for when metrics reach a certain threshold.
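
For example, the following PromQL query is an illustrative sketch of the kind of question you can answer with this data. It uses the istio_requests_total metric, which is covered later in this guide, to show the per-second request rate over the last five minutes, broken down by destination workload and response code.

    sum(rate(istio_requests_total[5m])) by (destination_workload_id, response_code)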

For more information about the built-in Prometheus server, the default setup and an overview of metrics that are available to you, see Service mesh metrics.

Before you begin

The information in this guide assumes that you followed the getting started tutorial for Kubernetes or OpenShift to install Gloo Mesh, install Istio with the IstioOperator manifest in the workload clusters, and deploy the Bookinfo app.

Open the built-in Prometheus dashboard

If you used the demo setup or the default Helm chart values when you installed Gloo Mesh Enterprise, the Prometheus server is automatically set up for you in the Gloo Mesh management cluster.

  1. Check if the Prometheus server is running in your Gloo Mesh management cluster.

    kubectl get pods -n gloo-mesh --context $MGMT_CONTEXT | grep prometheus 
    

    Example output:

    prometheus-server-647b488bb-wxlzh        2/2     Running   0          66m
    

    If no Prometheus server is set up in your management cluster, you can enable the Prometheus server by upgrading your Helm chart with the following command.

    helm upgrade gloo-mesh gloo-mesh-enterprise/gloo-mesh-enterprise \
    --namespace gloo-mesh --kube-context $MGMT_CONTEXT \
    --version=$VERSION \
    --set prometheus.enabled=true --reuse-values
    
  2. Set up port forwarding on your local machine to access the Prometheus dashboard.

    kubectl -n gloo-mesh port-forward deploy/prometheus-server 9090 --context $MGMT_CONTEXT
    
  3. In your web browser, enter localhost:9090/ to open the Prometheus dashboard.

View service mesh metrics in Prometheus

This guide assumes that you installed Istio and deployed the Bookinfo app as described in Before you begin.

Use the product page microservice of the Bookinfo app to generate requests and view the metrics that the Gloo Mesh agent collects from the Envoy proxies in your service mesh.

  1. Set up port forwarding for the product page of the Istio bookinfo app.

    kubectl -n bookinfo port-forward deploy/productpage-v1 9080 --context $REMOTE_CONTEXT1
    
  2. Send multiple requests to the product page. Each request to the product page requires the product page to collect data from other bookinfo microservices. For each request, metrics are automatically sent to the Prometheus server.

    for ((i=1;i<=10;i++)); do curl -I -k "http://localhost:9080/productpage"; done
    
  3. From the Prometheus dashboard, enter the following PromQL query to see how many requests the product page's Envoy proxy in cluster1 sent to other bookinfo microservices.

    sum(istio_requests_total{workload_id="productpage-v1.bookinfo.cluster1"}) by (workload_id,destination_workload_id, response_code)
    
  4. Explore other queries that you can run in the Prometheus dashboard to gain insight into your service mesh.

Sample queries to monitor apps in your service mesh

You can use the following metrics in your Prometheus UI to gain insight into the performance and health of the apps that run in your service mesh.

| Metric | PromQL query |
| --- | --- |
| Request rate for a given service | `rate(istio_requests_total{destination_app="<service_ID>"}[5m])` |
| Request rate between source and destination workload | `rate(istio_requests_total{source_workload="istio-ingressgateway", destination_workload="<workload_ID>"}[5m])` |
| Successful request rate to a destination workload | `rate(istio_requests_total{response_code=~"[2-3].*", destination_workload="<workload_ID>"}[5m])` |
| Rate of failing requests to a destination workload | `rate(istio_requests_total{response_code=~"[4-5].*", destination_workload="<workload_ID>"}[5m])` |
| Number of new requests within a certain timeframe | `sum(increase(istio_requests_total{}[5m])) by (workload_id, destination_workload_id)` |
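
To turn one of these queries into an alert, you can add a Prometheus alerting rule. The following is a minimal sketch and not part of the default setup: the workload name comes from the Bookinfo app, and the 5% threshold and labels are placeholders that you would adapt to your environment.

    groups:
    - name: istio-request-errors
      rules:
      - alert: HighRequestErrorRate
        # Fire when more than 5% of requests to the reviews-v1 workload fail with a 4xx or 5xx response.
        expr: |
          sum(rate(istio_requests_total{response_code=~"[4-5].*", destination_workload="reviews-v1"}[5m]))
          /
          sum(rate(istio_requests_total{destination_workload="reviews-v1"}[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: More than 5% of requests to reviews-v1 are failing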

Set up Prometheus for production

The built-in Prometheus server is a great way to gain insight into the performance of your service mesh. However, the pod is not set up with persistent storage, so metrics are lost when the pod restarts or when the deployment is scaled down. Many organizations also run their own Prometheus-compatible solution or time series database that is hardened for production and integrates with other applications that might exist outside of the service mesh.

To build a production-level Prometheus setup, you can choose between the following options:

  • Replace the built-in Prometheus server with your own instance.
  • Recommended: Locally federate metrics and provide them to your production monitoring instance.

To read more about each option, see Best practices for collecting metrics in production.

Replace the built-in Prometheus server with your own instance

In this setup, you disable the built-in Prometheus server and configure Gloo Mesh to use your production Prometheus instance instead.

  1. Configure Gloo Mesh to disable the default Prometheus instance and instead connect to your custom Prometheus server. If Gloo Mesh is already installed, use the helm upgrade command to update your installation with this configuration and include the URL and port number of your Prometheus server instance. If Gloo Mesh is not yet installed, you can set the same values as part of the Helm installation.

    To use your own Prometheus server instance, make sure that the instance runs Prometheus version 2.16.0 or later.

    • Gloo Mesh is installed:

      helm upgrade gloo-mesh gloo-mesh-enterprise/gloo-mesh-enterprise \
       --namespace gloo-mesh \
       --kube-context $MGMT_CONTEXT \
       --set prometheus.enabled=false \
       --set prometheusUrl=<Prometheus_server_URL_and_port>
      
    • Gloo Mesh is not yet installed:

      helm install gloo-mesh-enterprise gloo-mesh-enterprise/gloo-mesh-enterprise \
       --namespace gloo-mesh \
       --kube-context ${MGMT_CONTEXT} \
       --set licenseKey=${GLOO_MESH_LICENSE_KEY} \
       --set prometheus.enabled=false \
       --set prometheusUrl=<Prometheus_server_URL_and_port>
      

      You can change the Prometheus settings in your Helm chart values file and use this file to install Gloo Mesh. For more information, see Modifying Helm chart values.
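
      As a minimal sketch, the settings that correspond to the --set flags in the previous commands might look like the following in a values file. The URL is a placeholder for your own Prometheus instance.

      # Disable the built-in Prometheus server.
      prometheus:
        enabled: false
      # Placeholder URL and port for your own Prometheus instance.
      prometheusUrl: http://prometheus.monitoring:9090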

  2. Configure your Prometheus server to scrape metrics from the Gloo Mesh management server endpoint gloo-mesh-mgmt-server-admin.gloo-mesh:9091. This setup might vary depending on the Prometheus server that you use. For example, if you use the Prometheus Community chart, update the Helm values.yaml file as follows to scrape metrics from the Gloo Mesh management server.

    serverFiles:
      prometheus.yml:
        scrape_configs:
        - job_name: gloo-mesh
          scrape_interval: 15s
          scrape_timeout: 10s
          static_configs:
          - targets:
            - gloo-mesh-mgmt-server-admin.gloo-mesh:9091
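
    To apply this change, you can upgrade your Prometheus release with the updated values file. The release name and namespace in this sketch are placeholders for your own setup.

    helm upgrade <release-name> prometheus-community/prometheus \
     --namespace <namespace> \
     -f values.yaml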
    

Recommended: Locally federate metrics and provide them to your production monitoring instance

In this setup, you federate the metrics that you need and set up another Prometheus instance in the Gloo Mesh management cluster to scrape the federated metrics. Then, you can optionally forward the federated metrics to a Prometheus-compatible solution or a time series database that is hardened for production.

Before you begin, make sure that you followed the demo setup or installed Gloo Mesh with the default Helm values to set up the built-in Prometheus server. If you did not set up the built-in Prometheus server, upgrade your existing installation and set the prometheus.enabled Helm value to true.
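
For example, you can reuse the upgrade command from earlier in this guide to turn the built-in Prometheus server back on.

    helm upgrade gloo-mesh gloo-mesh-enterprise/gloo-mesh-enterprise \
    --namespace gloo-mesh --kube-context $MGMT_CONTEXT \
    --version=$VERSION \
    --set prometheus.enabled=true --reuse-values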

  1. Get the configuration of the built-in Prometheus server in Gloo Mesh and save it to a local file on your machine.

    kubectl get configmap prometheus-server -n gloo-mesh --context $MGMT_CONTEXT -o yaml > config.yaml
    
  2. Review the metrics that are sent to the built-in Prometheus server by default.

    1. Set up port forwarding for the metrics endpoint of your Gloo Mesh management server to your local host.

      kubectl port-forward -n gloo-mesh --context $MGMT_CONTEXT deploy/gloo-mesh-mgmt-server 9091
      
    2. View the metrics that are collected by default.

      open http://localhost:9091/metrics
      
    3. Decide on the subset of metrics that you want to federate.
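
      For example, to review the labels that the istio_requests_total metric carries before you decide what to federate, you can filter the endpoint output on the command line (assuming the port-forward that you set up earlier in this step is still running):

      curl -s http://localhost:9091/metrics | grep '^istio_requests_total' | head -n 5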

  3. Add a recording rule to the configmap of your Gloo Mesh Prometheus instance that you retrieved earlier to define how you want to aggregate the metrics. Recording rules let you precompute frequently needed or computationally expensive expressions. For example, you can remove high cardinality labels and federate only the labels that you need in future dashboards or alert queries. The results are saved in a new set of time series that you can later scrape or send to an external monitoring instance that is hardened for production. With this setup, you can protect your production instance as you send only the metrics that you need. In addition, you use the compute resources in the Gloo Mesh management cluster to prepare and aggregate the metrics.

    In this example, you use the istio_requests_total metric to record the total number of requests at the workload level in your service mesh. As part of this aggregation, pod labels are removed as they might lead to cardinality issues in certain environments. The result is saved as the workload:istio_requests_total metric to make sure that you can distinguish the original istio_requests_total metric from the aggregated one.

    apiVersion: v1
    data:
      alerting_rules.yml: |
        {}
      alerts: |
        {}
      prometheus.yml: |
      ...
      recording_rules.yml: |
        groups:
        - name: istio.workload.istio_requests_total
          interval: 10s
          rules:
          - record: workload:istio_requests_total
            expr: |
              sum(istio_requests_total{source_workload!=""})
              by (
                source_workload,
                source_workload_namespace,
                destination_service,
                source_app,
                destination_app,
                destination_workload,
                destination_workload_namespace,
                response_code,
                response_flags,
                reporter
              )
      rules: |
        {}
    kind: ConfigMap
    ...
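
    One way to apply the updated configuration, assuming that you added the recording rule to the config.yaml file that you saved in the first step, is to re-apply that file to the management cluster.

    kubectl apply -f config.yaml --context $MGMT_CONTEXT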
       

  4. Deploy another Prometheus instance in the Gloo Mesh management cluster to scrape the federated metrics from the Gloo Mesh Prometheus instance.

    1. Create the monitoring namespace in the Gloo Mesh management cluster.
      kubectl create namespace monitoring --context $MGMT_CONTEXT
      
    2. Add the Prometheus community Helm repository.
      helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
      
    3. Install the Prometheus community chart with your own values.yaml file for the chart (a minimal sketch of such a values file follows this list).
      helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 30.0.1 -f values.yaml -n monitoring --kube-context $MGMT_CONTEXT --debug
      
    4. Verify that the Prometheus pods are running.
      kubectl get pods -n monitoring --context $MGMT_CONTEXT
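
    The values.yaml file in the install command is your own configuration for the kube-prometheus-stack chart. As a minimal sketch, and assuming that you otherwise keep the chart defaults, you can allow the Prometheus operator to select service monitors that are not created by the Helm release itself, which the service monitor in the next step relies on.

    prometheus:
      prometheusSpec:
        # Pick up ServiceMonitors that do not carry the Helm release label,
        # such as the gloo-metrics-federation ServiceMonitor in the next step.
        serviceMonitorSelectorNilUsesHelmValues: false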
      
  5. Add a service monitor to the Prometheus instance that you just created to scrape the aggregated metrics that the Gloo Mesh Prometheus instance exposes on its /federate endpoint.

    In the following example, metrics from the Gloo Mesh Prometheus instance that match the 'workload:(.*)' regular expression are scraped. With the recording rule that you defined earlier, workload:istio_requests_total is the only metric that matches this expression. The service monitor configuration also removes the workload: prefix from the metric name so that the metric is displayed as istio_requests_total in Prometheus queries. To access the aggregated metrics that you scraped, you send a request to the /federate endpoint and provide match[]={__name__=<metric>} as a request parameter.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: gloo-metrics-federation
      namespace: monitoring
      labels:
        app.kubernetes.io/name: gloo-prometheus
    spec:
      namespaceSelector:
        matchNames:
        - gloo-mesh
      selector:
        matchLabels:
          app: prometheus
      endpoints:
      - interval: 30s
        scrapeTimeout: 30s
        params:
          'match[]':
          - '{__name__=~"workload:(.*)"}'
        path: /federate
        targetPort: 9090
        honorLabels: true
        metricRelabelings:
        - sourceLabels: ["__name__"]
          regex: 'workload:(.*)'
          targetLabel: "__name__"
          action: replace
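
    To create the service monitor, save the resource to a file and apply it to the management cluster. The file name in this example is a placeholder.

    kubectl apply -f gloo-metrics-federation.yaml --context $MGMT_CONTEXT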
    
  6. Access the /federate endpoint to see the scraped metrics. Note that you must include the match[]={__name__=<metric>} request parameter to successfully see the aggregated metrics.

    1. Port forward the Prometheus service so that you can access the Prometheus UI on your local machine.

      kubectl port-forward service/kube-prometheus-stack-prometheus --context $MGMT_CONTEXT -n monitoring 9090
      
    2. Open the targets that are configured for your Prometheus instance.

      open http://localhost:9090/targets
      
    3. Select the gloo-metrics-federation target that you configured and verify that the endpoint address and match condition are correct, and that the State displays as UP.

      Gloo federation target

    4. Optional: Access the aggregated metrics on the /federate endpoint.

      open 'http://localhost:9090/federate?match[]={__name__="istio_requests_total"}'
      

      Example output:

      # TYPE istio_requests_total untyped
      istio_requests_total{container="prometheus-server",destination_app="ratings",destination_service="ratings.bookinfo.svc.cluster.local",destination_workload="ratings-v1",destination_workload_namespace="bookinfo",endpoint="9090",job="prometheus-server",namespace="gloo-mesh",pod="prometheus-server-647b488bb-ns748",reporter="destination",response_code="200",response_flags="-",service="prometheus-server",source_app="istio-ingressgateway",source_workload="istio-ingressgateway",source_workload_namespace="istio-system",instance="",prometheus="monitoring/kube-prometheus-stack-prometheus",prometheus_replica="prometheus-kube-prometheus-stack-prometheus-0"} 11 1654888576995
      istio_requests_total{container="prometheus-server",destination_app="ratings",destination_service="ratings.bookinfo.svc.cluster.local",destination_workload="ratings-v1",destination_workload_namespace="bookinfo",endpoint="9090",job="prometheus-server",namespace="gloo-mesh",pod="prometheus-server-647b488bb-ns748",reporter="source",response_code="200",response_flags="-",service="prometheus-server",source_app="istio-ingressgateway",source_workload="istio-ingressgateway",source_workload_namespace="istio-system",instance="",prometheus="monitoring/kube-prometheus-stack-prometheus",prometheus_replica="prometheus-kube-prometheus-stack-prometheus-0"} 11 1654888576995
      
  7. Forward the federated metrics to your external Prometheus-compatible solution or time series database that is hardened for production. Refer to the Prometheus documentation to explore your forwarding options or try out the Prometheus agent mode.
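
    As a sketch of one forwarding option, and assuming that you use the kube-prometheus-stack chart from the earlier step, you can configure remote write through the chart's Helm values. The endpoint URL is a placeholder for your production instance.

    prometheus:
      prometheusSpec:
        remoteWrite:
        # Placeholder endpoint for your Prometheus-compatible production backend.
        - url: https://metrics.example.com/api/v1/write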