Debugging Gloo Mesh

Debug the Gloo Mesh management server

Debug the Gloo Mesh management server to verify your relay connection between the server and agent.

Debug Gloo Mesh agents

Debug the Gloo Mesh agent to make sure that your custom resource configurations can be applied.

Debug Gloo Mesh resources

Check your Gloo Mesh custom resources for issues that might impact your network traffic.

Debug routes

Check for issues if you cannot reach your app on a certain host and port.

Debug Redis

Restart Redis if you encounter persistent state issues for your Gloo Mesh custom resources.

Debug the management server and relay connection

  1. Check that the relay connection between the management server and workload agents is healthy.

    1. Forward port 9091 of the gloo-mesh-mgmt-server pod to your localhost.
      kubectl port-forward -n gloo-mesh --context $MGMT_CONTEXT deploy/gloo-mesh-mgmt-server 9091
      
    2. In your browser, connect to http://localhost:9091/metrics.
    3. In the metrics output, look for the following lines. If the values are 1, the agents in the workload clusters are successfully registered with the management server. If the values are 0, the agents are not connected. The relay_push_clients_warmed metric indicates that the management server can push configuration to the agents. To check these metrics from the command line instead, see the example after these steps.
      relay_pull_clients_connected{cluster="cluster-1"} 1
      relay_pull_clients_connected{cluster="cluster-2"} 1
      relay_push_clients_connected{cluster="cluster-1"} 1
      relay_push_clients_connected{cluster="cluster-2"} 1
      relay_push_clients_warmed{cluster="cluster-1"} 1
      relay_push_clients_warmed{cluster="cluster-2"} 1
      
    4. Take snapshots of the current state in case you want to refer to them later, such as to open a Support issue.
      curl localhost:9091/snapshots/input -o input_snapshot.json 
      curl localhost:9091/snapshots/output -o output_snapshot.json
      
  2. Check that the Gloo Mesh management services are running.

    1. Send a gRPC request to the Gloo Mesh management server. In the following example command, replace 35.184.21.94:9900 with the address and port that your management server's relay endpoint is externally exposed on.

      kubectl get secret --context $MGMT_CONTEXT -n gloo-mesh relay-root-tls-secret -o json | jq -r '.data["ca.crt"]' | base64 -d  > ca.crt
      grpcurl -authority enterprise-networking.gloo-mesh --cacert=./ca.crt 35.184.21.94:9900 list
      
    2. Verify that the following services are listed.

      envoy.service.accesslog.v3.AccessLogService
      envoy.service.metrics.v2.MetricsService
      envoy.service.metrics.v3.MetricsService
      grpc.reflection.v1alpha.ServerReflection
      relay.multicluster.skv2.solo.io.RelayCertificateService
      relay.multicluster.skv2.solo.io.RelayPullServer
      relay.multicluster.skv2.solo.io.RelayPushServer
      
  3. Check the logs on the gloo-mesh-mgmt-server pod on the management cluster for communication from the workload cluster.

    kubectl -n gloo-mesh --context $MGMT_CONTEXT logs deployment/gloo-mesh-mgmt-server | grep $REMOTE_CLUSTER
    

    Example output:

    {"level":"debug","ts":1616160185.5505846,"logger":"pull-resource-deltas","msg":"recieved request for delta: response_nonce:\"1\"","metadata":{":authority":["gloo-mesh-mgmt-server.gloo-mesh.svc.cluster.local:11100"],"content-type":["application/grpc"],"user-agent":["grpc-go/1.34.0"],"x-cluster-id":["remote.cluster"]},"peer":"10.244.0.17:40074"}
    

    To increase the verbosity of the logs, you can patch the management server deployment.

    kubectl patch deploy -n gloo-mesh gloo-mesh-mgmt-server --context $MGMT_CONTEXT --type "json" -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--verbose=true"}]'
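
If you prefer the command line over the browser, you can also pull the relay metrics through the port-forward that you opened earlier. The following is a minimal sketch that filters the metrics endpoint for the relay gauges listed in step 1; the grep pattern is only an example.

    # Requires the port-forward to the gloo-mesh-mgmt-server pod to be active.
    # Print only the relay connection and warming gauges for each workload cluster.
    curl -s http://localhost:9091/metrics | grep -E 'relay_(pull|push)_clients_(connected|warmed)'

A value of 1 for each cluster indicates a healthy relay connection, as described in step 1.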
    


Debug the agent

The Gloo Mesh agent keeps track of changes to Gloo Mesh custom resources in the cluster. It reports these changes back to the management cluster.

  1. Verify that the Gloo Mesh agent pod is running.

    kubectl get pods -n gloo-mesh --context ${REMOTE_CONTEXT1}
    

    If not, describe the pod and look for error messages.

    kubectl describe pod -n gloo-mesh -l app=gloo-mesh-agent --context ${REMOTE_CONTEXT1}
    
  2. Check the logs of the Gloo Mesh agent in your workload cluster. Optionally, you can format the output with jq or save it in a local file so that you can read and analyze the output more easily.

    kubectl logs --context ${REMOTE_CONTEXT1} -n gloo-mesh deploy/gloo-mesh-agent | jq
    
  3. In the logs, look for "err" messages. For example, you might see messages similar to the following. To filter the logs for error entries directly, see the example after these steps.

    Message: "err": "RouteTable.networking.gloo.solo.io \"bookinfo\" not found",
    Description: Gloo Mesh expected to find a resource such as a route table named bookinfo. You can check the resource field to see which namespace the resource was expected in.
    Steps to resolve: If you recently deleted the resource, wait to see if the error resolves itself. If not, try debugging the resource.

    Message: "err": "Operation cannot be fulfilled on virtualgateways.networking.gloo.solo.io \"north-south-gw\": the object has been modified; please apply your changes to the latest version and try again"
    Description: Gloo Mesh is trying to reconcile your changes to the resource, such as updating a virtual gateway to listen on a different port.
    Steps to resolve: If you recently updated the resource, wait to see if the error resolves itself. If not, try debugging the resource.
  4. If you see many error messages and are unsure of the timeline of events, you can try to reset the logs. To do so, you delete the agent pod. You cannot make changes to your Gloo Mesh resources while the pod is not running.

    kubectl delete pod -n gloo-mesh -l app=gloo-mesh-agent --context ${REMOTE_CONTEXT1}
    
  5. Get the logs of your new Gloo Mesh agent pod by re-running the logs command from step 2. Note that the agent might take a few minutes to produce logs.

  6. If you continue to see error messages that indicate state reconciliation issues, try debugging the Gloo Mesh resource.
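
To filter the agent logs for error entries directly, you can combine the logs command from step 2 with a jq filter. This is a minimal sketch; it assumes that the agent writes JSON log lines with ts, msg, and err fields, as in the example messages above, and that jq is installed.

    # Print only agent log entries that contain an "err" field.
    # Non-JSON lines are skipped by fromjson?.
    kubectl logs --context ${REMOTE_CONTEXT1} -n gloo-mesh deploy/gloo-mesh-agent \
      | jq -rR 'fromjson? | select(.err != null) | [.ts, .msg, .err] | @tsv'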


Debug Gloo Mesh resources

Sometimes, you might create Gloo Mesh resources and expect a certain result, such as a policy applying to traffic or a gateway listening for traffic. If you do not get the expected result, try the following general debugging steps.

  1. Check the Gloo Mesh resources in your management and remote clusters. You might find the following commands useful.
    kubectl get workspaces,workspacesettings,roottrustpolicies,dashboards,kubernetesclusters -A --context $MGMT_CONTEXT
    
    kubectl get ratelimitserverconfigs,ratelimitconfigs,ratelimitserversettings,ratelimitclientconfigs,ratelimitpolicies,wasmdeploymentpolicies,externalendpoints,externalservices,accesslogpolicies,failoverpolicies,faultinjectionpolicies,outlierdetectionpolicies -A --context $REMOTE_CONTEXT
    
    kubectl get virtualgateways,routetables,virtualdestinations,virtualservices,workspacesettings,retrytimeoutpolicies,accesspolicies,corspolicies,csrfpolicies,extauthpolicies,mirrorpolicies,transformationpolicies -A --context $REMOTE_CONTEXT
    
  2. Describe the resource, and look at the status, events, and other areas for more information.
    kubectl describe virtualgateway $GATEWAY_NAME -n $NAMESPACE --context $REMOTE_CONTEXT
    
  3. Verify that the resource is in the workspace that you expect it to be in. You can check the resource's namespace against the namespaces that are included in the workspace resource on the management cluster.
  4. Verify that related resources are in the same workspace, or are exported and imported appropriately. For example, your virtual gateway must be in the same workspace as the route table and policy that you want it to work with.
  5. If applicable, verify that the ports are set up appropriately on the resource and the backing service. For example, your virtual gateway might listen on port 80, which matches the port of the Kubernetes service for the gateway deployment.
  6. Check the logs of the management server in the management cluster for accepted or translated resource messages, such as by filtering the logs for the name of your resource as shown in the example after these steps.
  7. Check the logs of agent pods on the remote cluster that the resource is created in.
  8. If you upgraded Gloo Mesh versions recently, make sure that you applied the CRDs as part of the upgrade.
  9. If you continue to see error messages that indicate state reconciliation issues, try restarting the Redis pod.
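
For steps 6 and 7, it can help to filter the logs by the name of the resource that you are debugging. The following sketch uses a hypothetical route table named www-example-com; replace the name and contexts to match your resource.

    # Search the management server logs for translation or acceptance messages
    # that mention the resource (hypothetical name: www-example-com).
    kubectl -n gloo-mesh --context $MGMT_CONTEXT logs deployment/gloo-mesh-mgmt-server \
      | grep "www-example-com"

    # Search the agent logs in the workload cluster where the resource is created.
    kubectl -n gloo-mesh --context $REMOTE_CONTEXT logs deploy/gloo-mesh-agent \
      | grep "www-example-com"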


Debug routes

Sometimes, you might notice errors when you try to access your services over a route.

  1. Try to send a request to your destination with verbose mode and look for error messages. For example, you might curl the path to your app on the ingress gateway.

    curl -vik https://${INGRESS_GW_IP}:443/ratings/1 -H "Host: www.example.com"
    

    In the output, look for error messages such as the following.

    Message: Connection reset by peer
    Description: The destination sent a reset (RST) packet and dropped the connection. No TLS handshake was established. This issue often indicates a TLS issue.
    Steps to resolve: Skip to Step 5 to check the virtual gateway configuration for the tls section. Make sure that the secret exists with valid credentials.

    Message: Connection refused
    Description: Your gateway might not be listening on the right port.
    Steps to resolve: Skip to Step 5 to verify that the gateway listens on the correct port for the correct host.

    Message: successfully set certificate...SSL_ERROR_SYSCALL in connection
    Description: Your TLS secrets are set up, but the hostname might not match the hosts in your virtual gateway configuration. For example, the secret might be for a wildcard *.example.com, but the virtual gateway configuration specifies only www.example.com.
    Steps to resolve: Make sure that the TLS secrets for the gateway are in the same namespace as the ingress gateway. Then, make sure that the hostname that the secret is for matches the hostnames in the virtual gateway configuration.
  2. Verify that your Gloo Mesh setup is working correctly.

    1. Check the management server.
    2. Check the agent for the clusters where your resources run.
    3. Make sure that your management server and agent run the same minor version of Gloo Mesh.
      meshctl version --kubecontext ${MGMT_CONTEXT}
      meshctl version --kubecontext ${REMOTE_CONTEXT1}
      meshctl version --kubecontext ${REMOTE_CONTEXT2}
      
  3. Verify that your Istio gateway pods are in a Running state.

    kubectl get pods -A --context ${REMOTE_CONTEXT1}
    

    Example output:

    NAMESPACE      NAME                                     READY   STATUS    RESTARTS   AGE
    istio-system   istio-eastwestgateway-6559bbc949-xrfz6   1/1     Running   0          4d14h
    istio-system   istio-ingressgateway-64c544cfff-jgnkp    1/1     Running   0          4d16h
    

    If not, describe the pods, get the logs, and look for error messages.

    kubectl describe pod -n istio-system --context ${REMOTE_CONTEXT1} -l istio=ingressgateway
    
    kubectl logs --context ${REMOTE_CONTEXT1} -n istio-system deploy/istio-ingressgateway > logs-ingressgateway.txt
    

    For example, a filter_chain_not_found message indicates that the request does not have a matching SNI in the gateway. Make sure that the TLS secrets for the gateway are in the same namespace as the ingress gateway. Then, make sure that the hostname that the secret is for matches the hostnames in the virtual gateway configuration.

  4. If you are sending a request directly to the gateway and not to a host, check the service details for the IP address that the gateway is exposed on. Verify that you use this IP address in your request to the route. For an example of capturing this address and repeating the verbose request, see the example after these steps.

    kubectl get svc -n istio-system --context ${REMOTE_CONTEXT1}
    

    In the following example, the external IP addresses are as follows:

    • istio-eastwestgateway gateway for traffic within the service mesh: 35.xxx.xx.x1
    • istio-ingressgateway gateway for traffic from outside the service mesh: 35.xxx.x.xx9
    NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                                                      AGE
    istio-eastwestgateway   LoadBalancer   10.xxx.x.xx8    35.xxx.xx.x1   15021:30974/TCP,15443:30522/TCP                              5d18h
    istio-ingressgateway    LoadBalancer   10.xxx.xx.xx3   35.xxx.x.xx9   15021:31225/TCP,80:30381/TCP,443:31078/TCP,15443:30649/TCP   5d18h
    
  5. If you are sending the request to a host, check your Gloo Mesh virtual gateway resource. Review the settings that are described after the example output.

    kubectl get virtualgateways -A -o yaml --context ${REMOTE_CONTEXT1}
    

    Example output:

    apiVersion: networking.gloo.solo.io/v2
    kind: VirtualGateway
    metadata:
      name: north-south-gw
      namespace: bookinfo
    spec:
      listeners:
      - allowedRouteTables:
        - host: www.example.com
        http: {}
        port:
          number: 443
        tls:
          mode: SIMPLE
          secretName: gw-ssl-1-secret
      workloads:
      - selector:
          labels:
            istio: ingressgateway
    
    Review the following settings to understand this configuration.

    namespace: Make sure that the gateway's namespace is in the same workspace as the route's app. Or, the gateway workspace must import the route's app.
    allowedRouteTables: Make sure that the host that you call is allowed.
    port: Check that the port matches the port that you call. For example, if you send a request to an https:// URL, but the port is 80 (for HTTP), the request fails.
    tls: Check the TLS details. If the gateway listens on port 443 for HTTPS traffic, this section is required.
    workloads: Make sure that the virtual gateway selects the gateway workload that you checked earlier.
  6. If you are sending the request to a host, check your Gloo Mesh route table resource. Review the settings that are described after the example output.

    kubectl get routetables -A -o yaml --context ${REMOTE_CONTEXT1}
    

    Example output:

    apiVersion: networking.gloo.solo.io/v2
    kind: RouteTable
    metadata:
      name: www-example-com
      namespace: bookinfo
    spec:
      defaultDestination:
        port:
          number: 9080
        ref:
          name: ratings
          namespace: bookinfo
      hosts:
      - www.example.com
      http:
      - forwardTo: {}
        labels:
          "no": auth
        matchers:
        - headers:
          - name: noauth
            value: "true"
        name: ratings-ingress-no-auth
      - forwardTo: {}
        labels:
          route: ratings
        name: ratings-ingress
      virtualGateways:
      - name: north-south-gw
    
    Review the following settings to understand this configuration.

    namespace: Make sure that the route table's namespace is in the same workspace as the route's app. Or, the route table's workspace must import the route's app.
    defaultDestination: Make sure that the details for your app are correct, such as the port and reference.
    hosts: Verify that the host that you call is included in the route table.
    forwardTo: Check the forwardTo details. For example, the route table might be set up to forward requests with certain headers or labels to a different service.
    virtualGateways: Make sure that the route table selects the virtual gateway that you checked earlier.
  7. Check that the underlying Istio resources are working properly. For example, your gateway might be missing a secret.

    istioctl analyze --context ${REMOTE_CONTEXT1} -n istio-system
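
If you want to repeat the verbose request from step 1 against the ingress gateway's external address, you can capture the address in a variable first. This is a minimal sketch; it assumes a LoadBalancer service named istio-ingressgateway in the istio-system namespace, and www.example.com is only an example host.

    # Capture the external IP of the ingress gateway service.
    # On clouds that assign a hostname instead of an IP, use .hostname in the jsonpath.
    export INGRESS_GW_IP=$(kubectl get svc -n istio-system istio-ingressgateway \
      --context ${REMOTE_CONTEXT1} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    echo $INGRESS_GW_IP

    # Repeat the verbose request from step 1 with the captured address.
    curl -vik https://${INGRESS_GW_IP}:443/ratings/1 -H "Host: www.example.com"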
    


Restart Redis

After debugging the management server and agent pods, you might still see error messages related to reconciling state. Gloo Mesh stores the state of its resources in a Redis pod. You can try to restart the pod to resolve these reconciliation issues.

  1. Enable port forwarding on port 9091 of the gloo-mesh-mgmt-server pod to your localhost.

    kubectl port-forward -n gloo-mesh --context $MGMT_CONTEXT deploy/gloo-mesh-mgmt-server 9091
    
  2. Take snapshots of your current state in case you want to refer to the logs later, such as to open a Support issue.

    curl localhost:9091/snapshots/input -o input_snapshot.json 
    curl localhost:9091/snapshots/output -o output_snapshot.json
    
  3. Get the name of the gloo-mesh-redis-* pod.

    kubectl get pods -n gloo-mesh --context $MGMT_CONTEXT
    

    Example output:

    NAME                                    READY   STATUS    RESTARTS   AGE
    gloo-mesh-mgmt-server-c7cc4dd77-8shdw   1/1     Running   0          4d19h
    gloo-mesh-redis-794d79b7df-28mcr        1/1     Running   0          4d19h
    gloo-mesh-ui-c8cfd5fdd-mdscf            3/3     Running   0          4d19h
    prometheus-server-647b488bb-ns748       2/2     Running   0          4d19h
    
  4. Delete the Redis pod. In the following command, replace $POD with the name of the gloo-mesh-redis pod from the previous step. To look up and delete the pod in one step, see the example after these steps.

    kubectl delete pod -n gloo-mesh --context $MGMT_CONTEXT $POD
    
  5. Try checking your Gloo Mesh management server or agent logs to see if the reconciliation errors are resolved.
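
If you do not want to copy the pod name manually, you can look up and delete the Redis pod in one step. This is a minimal sketch that assumes the pod name starts with gloo-mesh-redis, as in the example output in step 3.

    # Look up the Redis pod by name and delete it so that it restarts with a clean state.
    POD=$(kubectl get pods -n gloo-mesh --context $MGMT_CONTEXT -o name | grep gloo-mesh-redis)
    kubectl delete -n gloo-mesh --context $MGMT_CONTEXT $POD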
