Management server and relay connection

The Gloo management server configures the Gloo agents to maintain the desired state of your environment. When you create Gloo custom resources, the server translates these to the appropriate open source custom resources that your Gloo product is based on, such as Istio, Envoy, or Cilium. Then, the server pushes config changes to the agents to apply in the workload clusters.

Debug the management server

Debug the Gloo management server.

  1. Verify that the Gloo management server pod is running.

    kubectl get pods -n gloo-mesh -l app=gloo-mesh-mgmt-server --context ${MGMT_CONTEXT}
    

    If a pod is not running, describe it and look for error messages in the pod events. If you have multiple replicas, check each pod.

    kubectl describe pod -n gloo-mesh -l app=gloo-mesh-mgmt-server --context ${MGMT_CONTEXT}
    
  2. Optional: To increase the verbosity of the logs, you can patch the management server deployment.

    kubectl patch deploy -n gloo-mesh gloo-mesh-mgmt-server --context $MGMT_CONTEXT --type "json" -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--verbose=true"}]'
    
  3. Check the logs of the Gloo management server pod. If you have multiple replicas, check each pod.

    kubectl -n gloo-mesh --context ${MGMT_CONTEXT} logs deployment/gloo-mesh-mgmt-server
    

    Optionally, you can format the output with jq or save it to a local file so that you can read and analyze the output more easily.

    kubectl -n gloo-mesh --context ${MGMT_CONTEXT} logs deployment/gloo-mesh-mgmt-server | jq > mgmt-server-logs.json
    
  4. In the logs, look for error messages. For example, you might see a message similar to the following.

    Message: json: cannot unmarshal array into Go struct field
    Description: The Gloo configuration of the resource does not match the expected configuration in the Gloo custom resource definition. Gloo cannot translate the resource, and dependent resources such as policies do not work.
    Steps to resolve: Review the configuration of the resource against the API reference, and try debugging the resource. For example, a field might be missing or have an incorrect value such as the wrong cluster name. If you recently upgraded the management server version, make sure that you reapply the CRDs.

    Message: License is invalid or expired, crashing - license expired
    Description: The Gloo license is expired. Your Gloo management server is in a crash loop, and no Gloo resources can be modified until you update the license.
    Steps to resolve: See Updating your Gloo license.
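    To surface error-level entries like these without scanning the whole stream, you can filter the structured log lines with jq. The following is a minimal sketch that uses sample log lines in place of the real output; in practice, you would pipe the logs command from the earlier step into the same jq filter.

    ```shell
    # Sample structured log lines stand in for the real management server output.
    # In practice, replace the printf with:
    #   kubectl -n gloo-mesh --context ${MGMT_CONTEXT} logs deployment/gloo-mesh-mgmt-server
    printf '%s\n' \
      '{"level":"error","msg":"json: cannot unmarshal array into Go struct field"}' \
      '{"level":"info","msg":"object synced"}' \
      '{"level":"error","msg":"json: cannot unmarshal array into Go struct field"}' \
      | jq -r 'select(.level == "error") | .msg' \
      | sort | uniq -c | sort -rn
    ```

    The count next to each message helps you see which errors recur most often.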
  5. If you see many error messages and are unsure of the timeline of events, you can reset the logs by deleting the management server pod. Note that you cannot make changes to your Gloo resources while the pod is restarting.

    kubectl delete pod -n gloo-mesh -l app=gloo-mesh-mgmt-server --context ${MGMT_CONTEXT}
    
  6. Re-run the logs command from the earlier step to get the logs of your new Gloo management server pod. Note that the management server might take a few minutes to start producing logs.

  7. If you continue to see error messages that indicate state reconciliation issues, try debugging Gloo resources.

Debug the relay connection

Verify the relay connection between the Gloo management server and agent.

  1. Verify that the Gloo management server and agent pods are running. If not, try troubleshooting the management server or agent.

    kubectl get pods -n gloo-mesh --context $MGMT_CONTEXT
    kubectl get pods -n gloo-mesh --context $REMOTE_CONTEXT
    
  2. Check that the relay connection between the management server and workload agents is healthy.

    1. Forward port 9091 of the gloo-mesh-mgmt-server pod to your localhost.
      kubectl port-forward -n gloo-mesh --context $MGMT_CONTEXT deploy/gloo-mesh-mgmt-server 9091
      
    2. In your browser, connect to http://localhost:9091/metrics.
    3. In the metrics UI, look for the following lines. If a value is 1, the agent in that workload cluster is successfully registered with the management server. If a value is 0, the agent is not connected. The relay_push_clients_warmed metric indicates whether the management server can push configuration to that agent.
      relay_pull_clients_connected{cluster="cluster-1"} 1
      relay_pull_clients_connected{cluster="cluster-2"} 1
      relay_push_clients_connected{cluster="cluster-1"} 1
      relay_push_clients_connected{cluster="cluster-2"} 1
      relay_push_clients_warmed{cluster="cluster-1"} 1
      relay_push_clients_warmed{cluster="cluster-2"} 1
      
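    If you prefer the command line over the browser, you can flag any relay metric whose value is 0. This is a hedged sketch that runs the check against sample metric lines; with the port-forward from the first substep still active, you could instead pipe curl -s localhost:9091/metrics through the same awk filter.

    ```shell
    # Sample metric lines stand in for the /metrics output; with the port-forward
    # active, pipe `curl -s localhost:9091/metrics` through the same awk filter.
    printf '%s\n' \
      'relay_pull_clients_connected{cluster="cluster-1"} 1' \
      'relay_push_clients_warmed{cluster="cluster-2"} 0' \
      | awk '/^relay_(pull|push)_clients/ && $2 == "0" {print "NOT READY:", $1}'
    # prints: NOT READY: relay_push_clients_warmed{cluster="cluster-2"}
    ```

    An empty result means that every relay metric in the input reports a healthy value of 1.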
    4. Take snapshots of the management server's input and output state in case you want to refer to them later, such as when you open a Support issue.
      curl localhost:9091/snapshots/input -o input_snapshot.json 
      curl localhost:9091/snapshots/output -o output_snapshot.json
      
  3. Check that the Gloo Gateway management services are running.

    1. Send a gRPC request to the Gloo Gateway management server.

      kubectl get secret --context $MGMT_CONTEXT -n gloo-mesh relay-root-tls-secret -o json | jq -r '.data["ca.crt"]' | base64 -d  > ca.crt
      grpcurl -authority enterprise-networking.gloo-mesh --cacert=./ca.crt $INGRESS_GW_IP:9900 list
      
    2. Verify that the following services are listed.

      envoy.service.accesslog.v3.AccessLogService
      envoy.service.metrics.v2.MetricsService
      envoy.service.metrics.v3.MetricsService
      grpc.reflection.v1alpha.ServerReflection
      relay.multicluster.skv2.solo.io.RelayCertificateService
      relay.multicluster.skv2.solo.io.RelayPullServer
      relay.multicluster.skv2.solo.io.RelayPushServer
      
  4. Check the logs on the gloo-mesh-mgmt-server pod on the management cluster for communication from the workload cluster.

    kubectl -n gloo-mesh --context $MGMT_CONTEXT logs deployment/gloo-mesh-mgmt-server | grep $REMOTE_CLUSTER
    

    Example output:

    {"level":"debug","ts":1616160185.5505846,"logger":"pull-resource-deltas","msg":"recieved request for delta: response_nonce:\"1\"","metadata":{":authority":["gloo-mesh-mgmt-server.gloo-mesh.svc.cluster.local:11100"],"content-type":["application/grpc"],"user-agent":["grpc-go/1.34.0"],"x-cluster-id":["remote.cluster"]},"peer":"10.244.0.17:40074"}
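    Each relay log entry records the reporting agent in its x-cluster-id metadata. As a small sketch, you can pull that field out of a log line with jq; the sample line below is abbreviated from the example output above.

    ```shell
    # Abbreviated sample entry; real log lines carry the same metadata shape.
    echo '{"level":"debug","logger":"pull-resource-deltas","metadata":{"x-cluster-id":["remote.cluster"]}}' \
      | jq -r '.metadata["x-cluster-id"][0]'
    # prints: remote.cluster
    ```

    If the cluster ID in the log entries does not match the name you used to register the workload cluster, the agent is reporting under an unexpected identity, which can explain missing relay metrics for that cluster.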