Debugging the Gloo UI

You can use the Gloo UI and other observability tools to help debug your service mesh environment. But what happens when those observability tools go wrong?

If something is broken in the UI, you can check several places. The Gloo metrics pod observes traffic from the Istio sidecar in your pods to the Gloo apiserver in the management cluster, as shown in the following figure.

Figure: Model of places to check for Gloo observability issues.

Check the Prometheus server metrics

The following example uses the Prometheus server that is deployed with Gloo by default.

  1. Enable port-forwarding of the Prometheus server deployment.

    kubectl port-forward -n gloo-mesh deploy/prometheus-server 9090
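
     To verify that the port-forward is active before you continue, you can optionally check the Prometheus readiness endpoint. This is a quick sanity check that assumes the default port from the previous command.

     curl -s http://localhost:9090/-/ready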
    
  2. From your local host, open the Prometheus targets page.

    open http://localhost:9090/targets
    
  3. Check that the targets show a green UP state, as in the following figure.

    Figure: Example of green UP state.
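
     If you prefer to check target health from the command line, you can query the Prometheus HTTP API instead of the UI. This sketch assumes the port-forward from step 1 is still active and that `jq` is installed.

     curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'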

  4. Check the following query: http://localhost:9090/graph?g0.expr=istio_requests_total&g0.tab=1&g0.stacked=0&g0.range_input=1h.
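
     To run the same check without a browser, you can send the query to the Prometheus HTTP API. This is a minimal sketch that assumes the port-forward from step 1 is still active and that `jq` is installed; it prints how many series the query returns.

     curl -s 'http://localhost:9090/api/v1/query?query=istio_requests_total' | jq '.data.result | length'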

  5. If you get an `Empty query result` message in the previous step, check whether your Gloo agents send metrics to the management server.

    1. Open the management server dashboard. For more information, see the [CLI documentation](/gloo-mesh-enterprise/main/reference/cli/meshctl_proxy/).

        meshctl proxy

        Alternatively, forward port 9091 of the `gloo-mesh-mgmt-server` pod to your localhost.

        kubectl port-forward -n gloo-mesh deploy/gloo-mesh-mgmt-server 9091
      
    2. In your browser, connect to http://localhost:9091/metrics.
    3. Search for istio_requests_total.
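
        You can run the same search from the command line instead of the browser. This sketch assumes the port-forward to the `gloo-mesh-mgmt-server` pod from the previous step is still active.

        curl -s http://localhost:9091/metrics | grep istio_requests_total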
    4. If the search returns no results, debug the Gloo agents.
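
        As a starting point, you can review the agent logs in each workload cluster. The deployment name `gloo-mesh-agent`, the `gloo-mesh` namespace, and the `$REMOTE_CONTEXT` variable are assumptions; adjust them to match your installation.

        kubectl logs -n gloo-mesh deploy/gloo-mesh-agent --context $REMOTE_CONTEXT --since=5m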
  6. If you see data returned, check the Prometheus queries.

    1. Get your current Helm values for the management cluster. Change the release name as needed.

      helm get values gloo-platform -n gloo-mesh --kube-context $MGMT_CONTEXT > mgmt-server.yaml
      open mgmt-server.yaml
      
    2. Delete the first line that contains USER-SUPPLIED VALUES:, and save the file.
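
      If you prefer to remove the line from the command line, an edit such as the following works on Linux. Note that on macOS, `sed -i` requires an explicit backup suffix, such as `sed -i ''`.

      sed -i '/^USER-SUPPLIED VALUES:/d' mgmt-server.yaml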

    3. Upgrade your Helm release in the management cluster. Include the --set common.verbose=true flag.

      helm upgrade gloo-platform gloo-platform/gloo-platform \
         --kube-context $MGMT_CONTEXT \
         --namespace gloo-mesh \
         --version $UPGRADE_VERSION \
         -f mgmt-server.yaml \
         --set common.cluster=$MGMT_CLUSTER \
         --set licensing.glooGatewayLicenseKey=$GLOO_GATEWAY_LICENSE_KEY \
         --set common.verbose=true
      
    4. Check the logs of the `gloo-mesh-ui` pod. To view logs recorded since a relative duration such as `5s`, `2m`, or `3h`, you can specify the `--since <duration>` flag.

      meshctl logs ui --kubecontext $MGMT_CONTEXT [--since DURATION]
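
      If you do not have meshctl available, you can pull the pod logs directly with kubectl. Because the container layout of the `gloo-mesh-ui` pod can vary by version, this sketch uses the `--all-containers` flag.

      kubectl logs -n gloo-mesh deploy/gloo-mesh-ui --context $MGMT_CONTEXT --all-containers --since=5m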
      

      The logs resemble the following example.

      {"level":"debug","ts":1650923640.6589358,"logger":"prometheus-source","caller":"prom/range.go:59","msg":"executing query 
      sum(
        increase(
          istio_requests_total{
            workload_id=~\".+.bookinfo-experiments.cluster1|.+.gloo-mesh.cluster1|.+.bookinfo.cluster1|.+.bookinfo-demo.cluster1|.+.bookinfo.cluster2|.+.bookinfo-security.cluster2\",
            destination_workload_id=~\".+..+..+\",
            response_code=~\"[2-3].*\",
            reporter=\"source\",
            # Exclude data from outside the mesh
            workload_id!=\"unknown.unknown.unknown\",
          }[15m]
        )
        ) by (
        workload_id,
        destination_workload_id,
        )
      : "}
      
    5. Review the Prometheus queries for the Gloo Graph.

      open http://localhost:9090/graph
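
      You can paste the query from the debug log into the graph page, or send a simplified version of it to the Prometheus HTTP API. The following sketch assumes the Prometheus port-forward from the first step is still active; the label selectors are trimmed to the ones shown in the log output above.

      curl -s http://localhost:9090/api/v1/query \
        --data-urlencode 'query=sum(increase(istio_requests_total{reporter="source",response_code=~"[2-3].*",workload_id!="unknown.unknown.unknown"}[15m])) by (workload_id, destination_workload_id)' | jq .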