Managed Istio installations
Troubleshoot the installation, upgrade, or uninstallation of Istio components.
If your installation settings are configured incorrectly, the Gloo agent might not start or you might experience issues with other Istio components such as gateways.
Before reviewing these specific Istio operator topics, try Debugging Istio. The following topics include troubleshooting information for either the IstioLifecycleManager and GatewayLifecycleManager resources, or the Gloo Operator. Be sure to follow the topics for the method of Istio management that you used.
IstioLifecycleManager does not fully uninstall or namespaces stuck in terminating
What’s happening
You installed the `istiod` control plane and gateways by using the Gloo Istio lifecycle manager. When you follow the steps to uninstall Istio with the Istio lifecycle manager, the operator namespace, such as `gm-iop-1-25`, in one or more workload clusters is stuck in `Terminating`.
```sh
kubectl get ns --context ${REMOTE_CONTEXT}
```

Example output:

```
NAME              STATUS        AGE
bookinfo          Active        132d
default           Active        132d
global            Active        123d
gloo-mesh         Active        59d
gm-iop-1-25       Terminating   59d
helloworld        Active        132d
httpbin           Active        132d
kube-node-lease   Active        132d
kube-public       Active        132d
kube-system       Active        132d
```
Why it’s happening
This error can happen when the `istiod` pod is deleted before the Istio or gateway lifecycle manager is able to finish uninstalling all Istio resources. This error might also happen if you repeatedly overwrite the same `IstioLifecycleManager` or `GatewayLifecycleManager` resource with different revisions and configuration.
How to fix it
1. In one workload cluster, get the `IstioOperator` resources in the operator namespace, such as with the following command.

   ```sh
   kubectl get istiooperator.install.istio.io -n gm-iop-1-25 --context ${REMOTE_CONTEXT}
   ```

   Example output, in which operators still exist for an east-west gateway and the `istiod` control plane:

   ```
   NAME                         REVISION   STATUS    AGE
   istio-eastwestgateway-1-25   1-25       HEALTHY   59d
   istiod-control-plane         1-25       HEALTHY   59d
   ```
2. For each `IstioOperator` resource, repeat these steps to remove the finalizer. Alternatively, you can patch each resource non-interactively, as shown in the sketch after these substeps.
   - Edit the resource, such as with the following command. You might have a different revision or name for the operator resource.

     ```sh
     kubectl edit istiooperator.install.istio.io -n gm-iop-1-25 istiod-control-plane --context ${REMOTE_CONTEXT}
     ```

   - Delete all lines in the `finalizers` section, including the `finalizers:` line.

     ```yaml
     apiVersion: install.istio.io/v1alpha1
     kind: IstioOperator
     metadata:
       creationTimestamp: "2023-07-27T15:41:55Z"
       finalizers:
       - istio-finalizer.install.istio.io
       generation: 3
       ...
     ```

   - Save and close the file. The resource is now deleted.
   - Repeat these steps for any other `IstioOperator` resources in the namespace.
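   Instead of editing each resource interactively, the following sketch clears the finalizers with a single `kubectl patch` command. It assumes the same example namespace, resource name, and context as the previous steps; adjust them to match your resources.

   ```sh
   # Remove all finalizers from the stuck IstioOperator resource so that
   # the pending deletion can complete. Adjust the namespace, name, and
   # context to match your environment.
   kubectl patch istiooperator.install.istio.io istiod-control-plane \
     -n gm-iop-1-25 --context ${REMOTE_CONTEXT} \
     --type merge -p '{"metadata":{"finalizers":null}}'
   ```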
3. After all resources in the operator namespace are deleted, verify that the operator namespace no longer exists.

   ```sh
   kubectl get ns --context ${REMOTE_CONTEXT}
   ```

   Example output:

   ```
   NAME              STATUS   AGE
   bookinfo          Active   132d
   default           Active   132d
   global            Active   123d
   gloo-mesh         Active   59d
   helloworld        Active   132d
   httpbin           Active   132d
   kube-node-lease   Active   132d
   kube-public       Active   132d
   kube-system       Active   132d
   ```
4. Repeat steps 1 - 3 for each workload cluster where you used the Istio lifecycle manager to install Istio. To check all of your clusters at once, see the sketch that follows.
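To quickly confirm that the operator namespace is removed in every workload cluster, you can loop over your cluster contexts. This is a minimal sketch; the `REMOTE_CONTEXTS` variable is a hypothetical list of your workload cluster contexts.

```sh
# Hypothetical list of workload cluster contexts; replace with your own.
REMOTE_CONTEXTS="cluster-1 cluster-2"

# Check each cluster for leftover Istio operator namespaces.
for ctx in ${REMOTE_CONTEXTS}; do
  echo "--- ${ctx} ---"
  kubectl get ns --context "${ctx}" | grep gm-iop || echo "No operator namespaces found"
done
```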
Conflicting IstioLifecycleManager errors
What’s happening
- When you check the Gloo management server logs, you see an Istio lifecycle manager error similar to the following:

  ```
  failed to upsert snapshot for istio lifecycle manager","parent":"gloo-platform~gloo-mesh~cluster-1~admin.gloo.solo.io/v2, Kind=IstioLifecycleManager","err":"conflicting IOPs have been created from a different parent Istio lifecycle manager ...
  ```
- When you check the Istio lifecycle manager status, you see a conflicting message similar to the following:

  ```sh
  kubectl get IstioLifecycleManager -A -o yaml
  ```

  Example output:

  ```yaml
  status:
    clusters:
      cluster-1:
        installations:
          auto:
            message: 'Another conflicting IstioLifecycleManager has previously been used
              to install a IstioOperators in this cluster, please check on uninstall of
              : gm-iop-1-25.gloo-platform'
            observedRevision: 1-25
            state: FAILED
  ```
Why it’s happening
You might have a conflicting `IstioLifecycleManager` or `GatewayLifecycleManager` resource. For example, you might have uninstalled a previous `IstioLifecycleManager` resource that did not completely delete. This error can happen when the namespace is deleted before the Istio lifecycle manager is able to finish uninstalling all Istio resources.
How to fix it
- List the `IstioLifecycleManager` or `GatewayLifecycleManager` resources. If you have multiple in the same namespace for the same purpose, try uninstalling the one that you no longer need.

  ```sh
  kubectl get IstioLifecycleManager -A
  ```
- If you already uninstalled the `istiod` control plane or gateways by using the Istio lifecycle manager, try to manually replace the `IstioLifecycleManager` or `GatewayLifecycleManager` CR by deleting the existing resource, such as with the following example command. After the deletion completes, you can re-create the resource, as sketched after this list.

  ```sh
  kubectl delete GatewayLifecycleManager istio-eastwestgateway -n gloo-mesh --context $MGMT_CONTEXT
  ```
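After you delete the conflicting resource, you can re-create it with the configuration that you want to keep. The following is a minimal sketch of a `GatewayLifecycleManager` for an east-west gateway; the cluster name, revision, and operator settings are illustrative assumptions, so check the GatewayLifecycleManager reference for the fields that your version supports.

```yaml
# Minimal sketch only: adjust the cluster name, revision, and operator
# settings to match your environment before you apply this resource.
apiVersion: admin.gloo.solo.io/v2
kind: GatewayLifecycleManager
metadata:
  name: istio-eastwestgateway
  namespace: gloo-mesh
spec:
  installations:
    - clusters:
        - name: cluster-1            # assumed cluster name
          activeGateway: true
      gatewayRevision: 1-25          # assumed Istio revision
      istioOperatorSpec:
        profile: empty
        components:
          ingressGateways:
            - name: istio-eastwestgateway
              namespace: gloo-mesh-gateways
              enabled: true
```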
Agent crashes with IstioOperator error
What’s happening
Your Gloo agent does not start and is in a `CrashLoopBackOff` state. When you check the agent logs, you see an error similar to the following:

```
failed to list *v1alpha1.IstioOperator: unknown field \"target\" in v1alpha1.ResourceMetricSource
...
{"level":"error","ts":1678198470.502656,"logger":"controller.input-ConfigMap-cache","caller":"controller/controller.go:208","msg":"Could not wait for Cache to sync","error":"failed to wait for input-ConfigMap-cache caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:234\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/manager/runnable_group.go:218"}
```
Why it’s happening
You might have an error in your Istio operator configuration, such as using a field that is deprecated or no longer supported.
By default, Gloo expects the Istio ingress gateway to have the name `istio-ingressgateway`. If you use a custom name for the ingress gateway, you cannot set up horizontal pod autoscaling (HPA) for the Istio gateways.
How to fix it
- Review your Istio operator configuration file for any deprecated or unsupported fields. For example, review the upstream Istio `operator.proto` file for unsupported fields in your version of Istio.
- If you use a custom Istio ingress gateway name, remove the HPA section from your Istio operator gateway configuration file. The following example shows the `hpaSpec` in the `istio-eastwestgateway` and `istio-ingressgateway` sections.

  ```yaml
  ...
  components:
    ingressGateways:
      - name: <custom-name>
        namespace: gloo-mesh-gateways
        enabled: true
        k8s:
          hpaSpec:
            maxReplicas: 5
            metrics:
              - resource:
                  name: cpu
                  targetAverageUtilization: 60
                type: Resource
      ...
      - name: istio-eastwestgateway
        namespace: gloo-mesh-gateways
        enabled: true
        k8s:
          hpaSpec:
            maxReplicas: 5
            metrics:
              - resource:
                  name: cpu
                  targetAverageUtilization: 60
                type: Resource
  ```

- To add back similar HPA functionality, set autoscaling minimum and maximum values in the gateway configuration. The following example shows both the `istio-eastwestgateway` and `istio-ingressgateway` sections. Note that the `targetAverageUtilization` field is also removed, because that field is deprecated in Istio 1.14 and later.

  ```yaml
  ...
  spec:
    values:
      gateways:
        istio-ingressgateway:
          autoscaleMin: 2
          autoscaleMax: 5
        ...
        istio-eastwestgateway:
          autoscaleMin: 2
          autoscaleMax: 5
  ```
- Update your Istio operator configuration. For ways to apply the updated configuration, see the sketch after this list.
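How you apply the updated configuration depends on how you manage Istio. The following is a minimal sketch of two common options; the file names are placeholders for your own configuration files.

```sh
# If the operator spec is embedded in a lifecycle manager resource,
# re-apply the updated CR (file name is a placeholder).
kubectl apply -f istio-lifecycle-manager.yaml --context $MGMT_CONTEXT

# If you instead manage Istio directly with istioctl, apply the
# updated operator file (file name is a placeholder).
istioctl install -y -f operator.yaml
```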
Check Gloo Operator and ServiceMeshController installations
If you used the Gloo Operator to install a service mesh, use the `ServiceMeshController` resource to monitor the status of the Gloo Operator installation for each Istio component, such as `istiod` or ztunnel.
1. Verify that the Gloo Operator pod is running and has no errors.

   ```sh
   kubectl get pods -n gloo-mesh -l app.kubernetes.io/name=gloo-operator
   ```
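   If the pod is not running, check its logs for errors. This sketch assumes the same namespace and label selector as the previous command.

   ```sh
   # Review the most recent operator log entries for installation errors.
   kubectl logs -n gloo-mesh -l app.kubernetes.io/name=gloo-operator --tail=50
   ```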
2. Describe the `ServiceMeshController` resource in your cluster.

   ```sh
   kubectl describe servicemeshcontroller -n gloo-mesh managed-istio
   ```
3. In the `Status` section of this example output, all statuses are `True` and the phase is `SUCCEEDED`, which indicates that the installation values are valid and the installation process successfully completed.

   ```
   ...
   Status:
     Conditions:
       Last Transition Time:  2024-12-27T20:47:01Z
       Message:               Manifests initialized
       Observed Generation:   1
       Reason:                ManifestsInitialized
       Status:                True
       Type:                  Initialized
       Last Transition Time:  2024-12-27T20:47:02Z
       Message:               CRDs installed
       Observed Generation:   1
       Reason:                CRDInstalled
       Status:                True
       Type:                  CRDInstalled
       Last Transition Time:  2024-12-27T20:47:02Z
       Message:               Deployment succeeded
       Observed Generation:   1
       Reason:                DeploymentSucceeded
       Status:                True
       Type:                  ControlPlaneDeployed
       Last Transition Time:  2024-12-27T20:47:02Z
       Message:               Deployment succeeded
       Observed Generation:   1
       Reason:                DeploymentSucceeded
       Status:                True
       Type:                  CNIDeployed
       Last Transition Time:  2024-12-27T20:47:02Z
       Message:               Deployment succeeded
       Observed Generation:   1
       Reason:                DeploymentSucceeded
       Status:                True
       Type:                  WebhookDeployed
       Last Transition Time:  2024-12-27T20:47:02Z
       Message:               All conditions are met
       Observed Generation:   1
       Reason:                SystemReady
       Status:                True
       Type:                  Ready
     Phase:                   SUCCEEDED
   Events:                    <none>
   ```
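   To retrieve only the phase, such as in a script, you can query the resource with JSONPath. The `.status.phase` path is inferred from the example output above.

   ```sh
   # Print only the installation phase, such as SUCCEEDED.
   kubectl get servicemeshcontroller managed-istio -n gloo-mesh -o jsonpath='{.status.phase}'
   ```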
4. To see the Helm values that were used during the installation or upgrade attempt, check the `Spec` section. Be sure that you set all required fields and that the values are valid by referring to the ServiceMeshController reference.
   - If you set the value of `installNamespace` to a namespace other than `gloo-system`, `gloo-mesh`, or `istio-system`, you must include the `--set manager.env.WATCH_NAMESPACES=<namespace>` setting, as shown in the sketch after this list.
   - If the values that you want to set are not available in the ServiceMeshController, review the available settings provided by the `gloo-extensions-config` configmap.
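For example, the following sketch shows how you might pass the `WATCH_NAMESPACES` setting when you upgrade the operator with Helm. The release name, chart reference, version, and namespace are placeholders; substitute the values from your own installation.

```sh
# The chart reference, version, and namespace are placeholders:
# replace them with the values from your own Gloo Operator installation.
helm upgrade --install gloo-operator <operator-chart> \
  --namespace <namespace> \
  --version <version> \
  --set manager.env.WATCH_NAMESPACES=<namespace>
```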
If you use the `gloo-extensions-config` configmap for advanced settings, be sure that the configmap is explicitly named `gloo-extensions-config`, and that it exists in the same namespace as the `gloo-operator`.
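To confirm that the configmap is named and located correctly, list it in the operator namespace. This sketch assumes the `gloo-mesh` namespace that is used in the earlier steps.

```sh
# Verify that the configmap exists in the same namespace as the operator.
kubectl get configmap gloo-extensions-config -n gloo-mesh
```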