Debugging Gloo Mesh
This guide covers the following debugging areas:

- Debug the management server and relay connection: Verify the relay connection between the management server and agents.
- Debug the agent: Make sure that your custom resource configurations can be applied.
- Debug Gloo Mesh resources: Check your Gloo Mesh custom resources for issues that might impact your network traffic.
- Debug routes: Check for issues if you cannot reach your app on a certain host and port.
- Restart Redis: Resolve persistent state issues for your Gloo Mesh custom resources.
Debug the management server and relay connection
1. Check that the relay connection between the management server and workload agents is healthy.
   1. Forward port 9091 of the `gloo-mesh-mgmt-server` pod to your localhost.
      ```sh
      kubectl port-forward -n gloo-mesh --context $MGMT_CONTEXT deploy/gloo-mesh-mgmt-server 9091
      ```
   2. In your browser, connect to http://localhost:9091/metrics.
   3. In the metrics UI, look for the following lines. If the values are `1`, the agents in the workload clusters are successfully registered with the management server. If the values are `0`, the agents are not successfully connected. A `warmed` value of `1` indicates that the management server can successfully push configuration to the agents.
      ```
      relay_pull_clients_connected{cluster="cluster-1"} 1
      relay_pull_clients_connected{cluster="cluster-2"} 1
      relay_push_clients_connected{cluster="cluster-1"} 1
      relay_push_clients_connected{cluster="cluster-2"} 1
      relay_push_clients_warmed{cluster="cluster-1"} 1
      relay_push_clients_warmed{cluster="cluster-2"} 1
      ```
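      If you prefer the command line over the browser, you can check the same metrics with `curl` while the port-forward from the first step is still running. A minimal sketch; adjust the `grep` pattern to the metrics that you care about.
      ```sh
      # Fetch the metrics page and keep only the relay connection metrics.
      curl -s localhost:9091/metrics | grep '^relay_'
      ```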
   4. Take snapshots in case you want to refer to the logs later, such as to open a Support issue.
      ```sh
      curl localhost:9091/snapshots/input -o input_snapshot.json
      curl localhost:9091/snapshots/output -o output_snapshot.json
      ```
2. Check that the Gloo Mesh management services are running.
   1. Send a gRPC request to the Gloo Mesh management server. In this example, `35.184.21.94:9900` is the address of the management server; replace it with the address in your setup.
      ```sh
      kubectl get secret --context $MGMT_CONTEXT -n gloo-mesh relay-root-tls-secret -o json | jq -r '.data["ca.crt"]' | base64 -d > ca.crt
      grpcurl -authority enterprise-networking.gloo-mesh --cacert=./ca.crt 35.184.21.94:9900 list
      ```
   2. Verify that the following services are listed.
      ```
      envoy.service.accesslog.v3.AccessLogService
      envoy.service.metrics.v2.MetricsService
      envoy.service.metrics.v3.MetricsService
      grpc.reflection.v1alpha.ServerReflection
      relay.multicluster.skv2.solo.io.RelayCertificateService
      relay.multicluster.skv2.solo.io.RelayPullServer
      relay.multicluster.skv2.solo.io.RelayPushServer
      ```
3. Check the logs on the `gloo-mesh-mgmt-server` pod on the management cluster for communication from the workload cluster.
   ```sh
   kubectl -n gloo-mesh --context $MGMT_CONTEXT logs deployment/gloo-mesh-mgmt-server | grep $REMOTE_CLUSTER
   ```
   Example output:
   ```
   {"level":"debug","ts":1616160185.5505846,"logger":"pull-resource-deltas","msg":"recieved request for delta: response_nonce:\"1\"","metadata":{":authority":["gloo-mesh-mgmt-server.gloo-mesh.svc.cluster.local:11100"],"content-type":["application/grpc"],"user-agent":["grpc-go/1.34.0"],"x-cluster-id":["remote.cluster"]},"peer":"10.244.0.17:40074"}
   ```
   To increase the verbosity of the logs, you can patch the management server deployment.
   ```sh
   kubectl patch deploy -n gloo-mesh gloo-mesh-mgmt-server --context $MGMT_CONTEXT --type "json" -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--verbose=true"}]'
   ```
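   Because the management server writes JSON-structured logs, as in the example output above, you can post-process them with `jq`. The following sketch streams only debug-level entries; the `level` field name is taken from the example output, so adjust it if your log format differs.
   ```sh
   # Stream the management server logs and keep only debug-level entries.
   # fromjson? silently skips any lines that are not valid JSON.
   kubectl logs -f -n gloo-mesh --context $MGMT_CONTEXT deploy/gloo-mesh-mgmt-server \
     | jq -R 'fromjson? | select(.level == "debug")'
   ```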
Debug the agent
The Gloo Mesh agent keeps track of changes to Gloo Mesh custom resources in the cluster. It reports these changes back to the management cluster.
1. Verify that the Gloo Mesh agent pod is running.
   ```sh
   kubectl get pods -n gloo-mesh --context ${REMOTE_CONTEXT1}
   ```
   If not, describe the pod and look for error messages.
   ```sh
   kubectl describe pod -n gloo-mesh -l app=gloo-mesh-agent --context ${REMOTE_CONTEXT1}
   ```
2. Check the logs of the Gloo Mesh agent in your workload cluster. Optionally, you can format the output with `jq` or save it in a local file so that you can read and analyze the output more easily.
   ```sh
   kubectl logs --context ${REMOTE_CONTEXT1} -n gloo-mesh deploy/gloo-mesh-agent | jq
   ```
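   For example, the following sketch saves the logs to a local file and then shows only the entries that contain an `err` field, which is the format described in the next step. The file name is an arbitrary choice.
   ```sh
   # Save the agent logs locally for later analysis.
   kubectl logs --context ${REMOTE_CONTEXT1} -n gloo-mesh deploy/gloo-mesh-agent > agent-logs.json
   # Show only the log entries that report an error; fromjson? skips non-JSON lines.
   jq -R 'fromjson? | select(.err != null)' agent-logs.json
   ```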
3. In the logs, look for `"err"` messages. For example, you might see messages similar to the following.

   | Message | Description | Steps to resolve |
   | --- | --- | --- |
   | `"err": "RouteTable.networking.gloo.solo.io \"bookinfo\" not found"` | Gloo Mesh expected to find a resource, such as a route table named `bookinfo`. You can check the `resource` field to see which namespace the resource was expected in. | If you recently deleted the resource, wait to see if the error resolves itself. If not, try debugging the resource. |
   | `"err": "Operation cannot be fulfilled on virtualgateways.networking.gloo.solo.io \"north-south-gw\": the object has been modified; please apply your changes to the latest version and try again"` | Gloo Mesh is trying to reconcile your changes to the resource, such as updating a virtual gateway to listen on a different port. | If you recently updated the resource, wait to see if the error resolves itself. If not, try debugging the resource. |
4. If you see many error messages and are unsure of the timeline of events, you can reset the logs by deleting the agent pod. Note that you cannot make changes to your Gloo Mesh resources while the pod is not running.
   ```sh
   kubectl delete pod -n gloo-mesh -l app=gloo-mesh-agent --context ${REMOTE_CONTEXT1}
   ```
5. Re-run the command from step 2 to get the logs of your new Gloo Mesh agent. Note that the agent might take a few minutes to produce logs.
6. If you continue to see error messages that indicate state reconciliation issues, try debugging the Gloo Mesh resource.
Debug Gloo Mesh resources
Sometimes, you might create Gloo Mesh resources and expect a certain result, such as a policy applying to traffic or a gateway listening for traffic. If you do not get the expected result, try the following general debugging steps.
- Check the Gloo Mesh resources in your management and remote clusters. You might find the following commands useful.
  ```sh
  kubectl get workspaces,workspacesettings,roottrustpolicies,dashboards,kubernetesclusters -A --context $MGMT_CONTEXT
  kubectl get ratelimitserverconfigs,ratelimitconfigs,ratelimitserversettings,ratelimitclientconfigs,ratelimitpolicies,wasmdeploymentpolicies,externalendpoints,externalservices,accesslogpolicies,failoverpolicies,faultinjectionpolicies,outlierdetectionpolicies -A --context $REMOTE_CONTEXT
  kubectl get virtualgateways,routetables,virtualdestinations,virtualservices,workspacesettings,retrytimeoutpolicies,accesspolicies,corspolicies,csrfpolicies,extauthpolicies,mirrorpolicies,transformationpolicies -A --context $REMOTE_CONTEXT
  ```
- Describe the resource, and look at the status, events, and other areas for more information.
  ```sh
  kubectl describe virtualgateway $GATEWAY_NAME -n $NAMESPACE --context $REMOTE_CONTEXT
  ```
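  To review just the status stanza, you can also use a `jsonpath` query. This sketch uses the same `$GATEWAY_NAME` and `$NAMESPACE` placeholders as the previous command.
  ```sh
  # Print only the status field of the virtual gateway.
  kubectl get virtualgateway $GATEWAY_NAME -n $NAMESPACE --context $REMOTE_CONTEXT -o jsonpath='{.status}'
  ```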
- Verify that the resource is in the workspace that you expect it to be in. You can check the resource's namespace against the namespaces that are included in the workspace resource on the management cluster.
- Verify that related resources are in the same workspace, or are exported and imported appropriately. For example, your virtual gateway must be in the same workspace as the route table and policy that you want it to work with.
- If applicable, verify that the ports are set up appropriately on the resource and the backing service. For example, your virtual gateway might listen on port 80, which matches the port of the Kubernetes service for the gateway deployment.
- Check the logs of the management server in the management cluster for accepted or translated resource messages.
- Check the logs of agent pods on the remote cluster that the resource is created in.
- If you upgraded Gloo Mesh versions recently, make sure that you applied the CRDs as part of the upgrade, as shown in the sketch after this list.
- If you continue to see error messages that indicate state reconciliation issues, try restarting the Redis pod.
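To spot-check that the Gloo Mesh CRDs are present after an upgrade, you can list the CRDs in the cluster. This is a minimal sketch; it assumes that the Gloo Mesh CRD names end in a `solo.io` suffix, as the resource groups in the earlier commands do.
```sh
# List the Gloo Mesh CRDs that are installed in the management cluster.
kubectl get crds --context $MGMT_CONTEXT | grep 'solo.io'
```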
Debug routes
Sometimes, you might notice errors when you try to access your services over a route.
1. Try to send a request to your destination with verbose mode and look for error messages. For example, you might curl the path to your app on the ingress gateway.
   ```sh
   curl -vik https://${INGRESS_GW_IP}:443/ratings/1 -H "Host: www.example.com"
   ```
   The following table describes common error messages in the output.

   | Message | Description | Steps to resolve |
   | --- | --- | --- |
   | `Connection reset by peer` | The destination sent a reset (`RST`) packet and dropped the connection. No TLS handshake was established. This often indicates a TLS issue. | Skip to step 5 to check the virtual gateway configuration for the `tls` section. Make sure that the secret exists with valid credentials. |
   | `Connection refused` | Your gateway might not be listening on the right port. | Skip to step 5 to verify that the gateway listens on the correct port for the correct host. |
   | `successfully set certificate...SSL_ERROR_SYSCALL in connection` | Your TLS secrets are set up, but the hostname might not match the hosts in your virtual gateway configuration. For example, the secret might be for a wildcard `*.example.com`, but the virtual gateway configuration specifies only `www.example.com`. | Make sure that the TLS secrets for the gateway are in the same namespace as the ingress gateway. Then, make sure that the hostname that the secret is for matches the hostnames in the virtual gateway configuration. |
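   For TLS-related errors, it can also help to inspect the certificate that the gateway actually serves for a given SNI hostname. The following sketch assumes OpenSSL 1.1.1 or later and reuses the `${INGRESS_GW_IP}` placeholder from the request above.
   ```sh
   # Connect with the www.example.com SNI and print the served certificate's subject and SANs.
   openssl s_client -connect ${INGRESS_GW_IP}:443 -servername www.example.com </dev/null 2>/dev/null \
     | openssl x509 -noout -subject -ext subjectAltName
   ```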
2. Verify that your Gloo Mesh setup is working correctly.
   - Check the management server.
   - Check the agent for the clusters where your resources run.
   - Make sure that your management server and agents run the same minor version of Gloo Mesh.
     ```sh
     meshctl version --kubecontext ${MGMT_CONTEXT}
     meshctl version --kubecontext ${REMOTE_CONTEXT1}
     meshctl version --kubecontext ${REMOTE_CONTEXT2}
     ```
3. Verify that your Istio gateway pods are in a `Running` state.
   ```sh
   kubectl get pods -A --context ${REMOTE_CONTEXT1}
   ```
   Example output:
   ```
   NAMESPACE      NAME                                     READY   STATUS    RESTARTS   AGE
   istio-system   istio-eastwestgateway-6559bbc949-xrfz6   1/1     Running   0          4d14h
   istio-system   istio-ingressgateway-64c544cfff-jgnkp    1/1     Running   0          4d16h
   ```
   If not, describe the pods, get the logs, and look for error messages.
   ```sh
   kubectl describe pod -n istio-system --context ${REMOTE_CONTEXT1} -l istio=ingressgateway
   kubectl logs --context ${REMOTE_CONTEXT1} -n istio-system deploy/istio-ingressgateway > logs-ingressgateway.txt
   ```
   For example, a `filter_chain_not_found` message indicates that the request does not have a matching SNI in the gateway. Make sure that the TLS secrets for the gateway are in the same namespace as the ingress gateway. Then, make sure that the hostname that the secret is for matches the hostnames in the virtual gateway configuration.
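   To confirm that the gateway's TLS secrets exist in the same namespace as the ingress gateway, you can list the secrets in that namespace. A minimal sketch, assuming the gateway runs in `istio-system` as in the example above:
   ```sh
   # List the secrets in the ingress gateway's namespace and check for the gateway's TLS secret.
   kubectl get secrets -n istio-system --context ${REMOTE_CONTEXT1}
   ```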
4. If you are sending a request directly to the gateway and not to a host, check the service details for the IP address that the gateway is exposed on. Verify that you use this IP address in your request to the route.
   ```sh
   kubectl get svc -n istio-system --context ${REMOTE_CONTEXT1}
   ```
   In the following example, the external IP addresses are as follows:
   - `istio-eastwestgateway`, the gateway for traffic within the service mesh: `35.xxx.xx.x1`
   - `istio-ingressgateway`, the gateway for traffic from outside the service mesh: `35.xxx.x.xx9`
   ```
   NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                                                      AGE
   istio-eastwestgateway   LoadBalancer   10.xxx.x.xx8    35.xxx.xx.x1   15021:30974/TCP,15443:30522/TCP                              5d18h
   istio-ingressgateway    LoadBalancer   10.xxx.xx.xx3   35.xxx.x.xx9   15021:31225/TCP,80:30381/TCP,443:31078/TCP,15443:30649/TCP   5d18h
   ```
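   To capture the ingress gateway's external IP address into the `${INGRESS_GW_IP}` variable that is used in the curl commands, you can use a `jsonpath` query. This sketch assumes that the load balancer reports an `ip`, rather than a `hostname`, in its ingress status.
   ```sh
   # Extract the external IP address of the ingress gateway service.
   export INGRESS_GW_IP=$(kubectl get svc istio-ingressgateway -n istio-system --context ${REMOTE_CONTEXT1} \
     -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
   echo $INGRESS_GW_IP
   ```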
5. If you are sending the request to a host, check your Gloo Mesh virtual gateway resource. Look for the information in the following table.
   ```sh
   kubectl get virtualgateways -A -o yaml --context ${REMOTE_CONTEXT1}
   ```
   Example output:
   ```yaml
   apiVersion: networking.gloo.solo.io/v2
   kind: VirtualGateway
   metadata:
     name: north-south-gw
     namespace: bookinfo
   spec:
     listeners:
       - allowedRouteTables:
           - host: www.example.com
         http: {}
         port:
           number: 443
         tls:
           mode: SIMPLE
           secretName: gw-ssl-1-secret
     workloads:
       - selector:
           labels:
             istio: ingressgateway
   ```
   Review the following table to understand this configuration.

   | Setting | Description |
   | --- | --- |
   | `namespace` | Make sure that the gateway's namespace is in the same workspace as the route's app. Or, the gateway workspace must import the route's app. |
   | `allowedRouteTables` | Make sure that the host that you call is allowed. |
   | `port` | Check that the port matches the port that you call. For example, if you sent a request along an `https://` URL, but the port is `80` (for HTTP), the request fails. |
   | `tls` | Check the TLS details. If the gateway listens on port `443` for HTTPS traffic, this section is required. |
   | `workloads` | Make sure that the virtual gateway selects the gateway workload that you checked earlier. |
6. If you are sending the request to a host, check your Gloo Mesh route table resource. Look for the information in the following table.
   ```sh
   kubectl get routetables -A -o yaml --context ${REMOTE_CONTEXT1}
   ```
   Example output:
   ```yaml
   apiVersion: networking.gloo.solo.io/v2
   kind: RouteTable
   metadata:
     name: www-example-com
     namespace: bookinfo
   spec:
     defaultDestination:
       port:
         number: 9080
       ref:
         name: ratings
         namespace: bookinfo
     hosts:
       - www.example.com
     http:
       - forwardTo: {}
         labels:
           "no": auth
         matchers:
           - headers:
               - name: noauth
                 value: "true"
         name: ratings-ingress-no-auth
       - forwardTo: {}
         labels:
           route: ratings
         name: ratings-ingress
     virtualGateways:
       - name: north-south-gw
   ```
   Review the following table to understand this configuration.

   | Setting | Description |
   | --- | --- |
   | `namespace` | Make sure that the route table's namespace is in the same workspace as the route's app. Or, the route table's workspace must import the route's app. |
   | `defaultDestination` | Make sure that the details for your app are correct, such as the port and reference. |
   | `hosts` | Verify that the host you call is included in the route table. |
   | `forwardTo` | Check the forwarding details. For example, the route table might be set up to forward requests with certain headers or labels to a different service. |
   | `virtualGateways` | Make sure that the route table selects the virtual gateway that you checked earlier. |
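   Based on the matchers in this example route table, requests with a `noauth: true` header take the `ratings-ingress-no-auth` route, and other requests take the `ratings-ingress` route. As a sketch, you can exercise both routes by toggling that header.
   ```sh
   # Request that matches the ratings-ingress-no-auth route (noauth header set).
   curl -vik https://${INGRESS_GW_IP}:443/ratings/1 -H "Host: www.example.com" -H "noauth: true"
   # Request that falls through to the ratings-ingress route.
   curl -vik https://${INGRESS_GW_IP}:443/ratings/1 -H "Host: www.example.com"
   ```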
7. Check that the underlying Istio resources are working properly. For example, your gateway might be missing a secret.
   ```sh
   istioctl analyze --context ${REMOTE_CONTEXT1} -n istio-system
   ```
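   To dig deeper into issues such as `filter_chain_not_found`, you can inspect the listener configuration that the gateway's Envoy proxy actually received. A sketch; replace `<ingress-pod-name>` with the pod name from step 3.
   ```sh
   # Find the ingress gateway pod name.
   kubectl get pods -n istio-system --context ${REMOTE_CONTEXT1} -l istio=ingressgateway
   # Show the listeners that are configured on the gateway's Envoy proxy.
   istioctl proxy-config listeners <ingress-pod-name> -n istio-system --context ${REMOTE_CONTEXT1}
   ```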
Restart Redis
After debugging the management server and agent pods, you might still see error messages related to reconciling state. Gloo Mesh stores the state of its resources in a Redis pod. You can try to restart the pod to resolve these reconciliation issues.
1. Enable port forwarding on port 9091 of the `gloo-mesh-mgmt-server` pod to your localhost.
   ```sh
   kubectl port-forward -n gloo-mesh --context $MGMT_CONTEXT deploy/gloo-mesh-mgmt-server 9091
   ```
2. Take snapshots of your current state in case you want to refer to the logs later, such as to open a Support issue.
   ```sh
   curl localhost:9091/snapshots/input -o input_snapshot.json
   curl localhost:9091/snapshots/output -o output_snapshot.json
   ```
3. Get the `gloo-mesh-redis-*` pod.
   ```sh
   kubectl get pods -n gloo-mesh --context $MGMT_CONTEXT
   ```
   Example output:
   ```
   NAME                                    READY   STATUS    RESTARTS   AGE
   gloo-mesh-mgmt-server-c7cc4dd77-8shdw   1/1     Running   0          4d19h
   gloo-mesh-redis-794d79b7df-28mcr        1/1     Running   0          4d19h
   gloo-mesh-ui-c8cfd5fdd-mdscf            3/3     Running   0          4d19h
   prometheus-server-647b488bb-ns748       2/2     Running   0          4d19h
   ```
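   As a sketch, you can capture the Redis pod name into the `$POD` variable that the next step uses, assuming that the pod name keeps the `gloo-mesh-redis-` prefix shown in the example output.
   ```sh
   # Store the name of the Redis pod for the delete command in the next step.
   export POD=$(kubectl get pods -n gloo-mesh --context $MGMT_CONTEXT --no-headers | awk '/^gloo-mesh-redis-/ {print $1}')
   echo $POD
   ```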
4. Delete the Redis pod. The deployment automatically creates a new pod to replace it.
   ```sh
   kubectl delete pod -n gloo-mesh --context $MGMT_CONTEXT $POD
   ```
5. Try checking your Gloo Mesh management server or agent logs to see if the reconciliation errors are resolved.