Istio and gateway lifecycle manager
Troubleshoot the installation, upgrade, or uninstallation of Istio components.
If your installation settings are configured incorrectly, the Gloo agent might not start or you might experience issues with other Istio components such as gateways.
Before reviewing these specific Istio operator topics, try Debugging Istio.
Check Istio installation status and values
Use the ClusterIstioInstallation resource to monitor the status of the Helm installation for each Istio component, such as istiod and gateways. You can use this information alongside the status of the Istio and Gateway Lifecycle Manager resources to troubleshoot installation or upgrade issues in your environment.
List the ClusterIstioInstallation resources in your environment, and find the resource for the Istio installation you want to investigate. Gloo Mesh Core creates these internal resources to represent the state of your Istio installations in each workload cluster.
kubectl get ClusterIstioInstallation -A --context ${REMOTE_CONTEXT}
Describe the resource.
kubectl get ClusterIstioInstallation <name> -n <namespace> --context ${REMOTE_CONTEXT} -o yaml
In the status.helm.upgradeInstall.releases section, check the state of each component’s Helm chart installation. The state can help you determine whether your Istio installations are completed.

| State | Description |
| --- | --- |
| PENDING | Istio installation settings are present on the cluster, but are not yet installed. |
| INSTALLING | Istio components are currently installing in the cluster. |
| UPGRADING | Istio components are currently upgrading to use updated settings. |
| UNINSTALLING | Istio components are uninstalling from the cluster. |
| FINISHED | The installation or upgrade process is finished. |
| FAILED | The installation or upgrade process has failed. Check the statuses of the IstioLifecycleManager and GatewayLifecycleManager resources, or if you manually installed with Helm, the status of the Helm release. You can also check the insights for any Istio-related errors. |

In this example output, the Helm charts for the base, istiod, and cni Istio components all have a state of FINISHED.
status:
  helm:
    upgradeInstall:
    - releases:
      - chartUrl: <Solo_Helm_repo>
        name: istio-base-default
        namespace: gloo-mesh
        state: FINISHED
        version: 1.22.3-solo
      - chartUrl: <Solo_Helm_repo>
        name: istio-istiod-default
        namespace: gloo-mesh
        state: FINISHED
        version: 1.22.3-solo
      - chartUrl: <Solo_Helm_repo>
        name: istio-cni-default
        namespace: gloo-mesh
        state: FINISHED
        version: 1.22.3-solo
To see the Helm values that were used during the installation or upgrade attempt, check the status.helm.upgradeInstall.values section. For example, during an Istio upgrade, you can verify that any updated settings you provide were correctly translated by the management cluster and sent to the workload clusters.
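The status check above can be scripted. The following is a minimal sketch, assuming that the status layout matches the example output (a status.helm.upgradeInstall list whose items contain a releases list). The function names release_states and unfinished_releases are hypothetical helpers, not part of Gloo Mesh Core.

```shell
# Print "<release name><TAB><state>" for each Helm release recorded in the
# status of one ClusterIstioInstallation resource.
# Assumption: the status layout matches the example output above
# (status.helm.upgradeInstall[].releases[]).
# $1 = resource name, $2 = namespace
release_states() {
  kubectl get ClusterIstioInstallation "$1" -n "$2" \
    --context "${REMOTE_CONTEXT}" \
    -o jsonpath='{range .status.helm.upgradeInstall[*].releases[*]}{.name}{"\t"}{.state}{"\n"}{end}'
}

# Print only the releases whose state is not FINISHED.
# Blank lines are skipped (NF is 0 for an empty line).
unfinished_releases() {
  release_states "$1" "$2" | awk -F'\t' 'NF && $2 != "FINISHED"'
}
```

For example, unfinished_releases <name> <namespace> prints nothing when every release is FINISHED, and otherwise prints the releases that might need attention together with their current state.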
IstioLifecycleManager does not fully uninstall or namespaces stuck in terminating
What’s happening
You installed the istiod control plane and gateways by using the Gloo Istio lifecycle manager. When you follow the steps to uninstall Istio with the Istio lifecycle manager, the operator namespace, such as gm-iop-1-22, in one or more workload clusters is stuck in Terminating.
kubectl get ns --context ${REMOTE_CONTEXT}
Example output:
NAME STATUS AGE
bookinfo Active 132d
default Active 132d
global Active 123d
gloo-mesh Active 59d
gm-iop-1-22 Terminating 59d
helloworld Active 132d
httpbin Active 132d
kube-node-lease Active 132d
kube-public Active 132d
kube-system Active 132d
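To spot stuck namespaces without reading the whole list, you can filter the same command output. This is a minimal sketch that only relies on the column layout shown in the example output (NAME, STATUS, AGE). The function name terminating_namespaces is a hypothetical helper.

```shell
# Print the name of every namespace whose STATUS column is Terminating.
# Relies on the default `kubectl get ns` column order: NAME STATUS AGE.
terminating_namespaces() {
  kubectl get ns --context "${REMOTE_CONTEXT}" --no-headers \
    | awk '$2 == "Terminating" { print $1 }'
}
```

A namespace that shows up here only briefly during an uninstall is expected. The issue described in this section applies when the namespace stays in this state.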
Why it’s happening
This error can happen when the istiod pod is deleted before the Istio or gateway lifecycle manager is able to finish uninstalling all Istio resources. This error might also happen if you repeatedly overwrite the same IstioLifecycleManager or GatewayLifecycleManager resource with different revisions and configuration.
How to fix it
In one workload cluster, get the IstioOperator resources in the operator namespace, such as with the following command.
kubectl get istiooperator.install.istio.io -n gm-iop-1-22 --context ${REMOTE_CONTEXT}
Example output, in which operators still exist for the istiod control plane:
NAME REVISION STATUS AGE
istiod-control-plane 1-22 HEALTHY 59d
For each IstioOperator resource, repeat these steps to remove the finalizer.
- Edit the resource, such as with the following command. You might have a different revision or name for the operator resource.
kubectl edit istiooperator.install.istio.io -n gm-iop-1-22 istiod-control-plane --context ${REMOTE_CONTEXT}
- Delete all lines in the finalizers section, including the finalizers: line.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  annotations:
    cluster.solo.io/cluster: cluster1
  creationTimestamp: "2023-07-27T15:41:55Z"
  finalizers:
  - istio-finalizer.install.istio.io
  generation: 3
...
- Save and close the file. The resource is now deleted.
- Repeat these steps for any other IstioOperator resources in the namespace.
After all resources in the operator namespace are deleted, verify that the operator namespace no longer exists.
kubectl get ns --context ${REMOTE_CONTEXT}
Example output:
NAME STATUS AGE
bookinfo Active 132d
default Active 132d
global Active 123d
gloo-mesh Active 59d
helloworld Active 132d
httpbin Active 132d
kube-node-lease Active 132d
kube-public Active 132d
kube-system Active 132d
Repeat steps 1 - 3 for each workload cluster where you used the Istio lifecycle manager to install Istio.
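If many IstioOperator resources are stuck, editing each one by hand is tedious. The following is a minimal sketch of a non-interactive alternative, assuming that a merge patch that sets metadata.finalizers to null has the same effect as deleting the finalizers section in kubectl edit. The function name remove_operator_finalizers is a hypothetical helper, and the namespace gm-iop-1-22 is taken from the example above.

```shell
# Remove the finalizers from every IstioOperator resource in one namespace.
# Assumption: patching metadata.finalizers to null is equivalent to deleting
# the finalizers section by hand with `kubectl edit`.
# $1 = operator namespace, such as gm-iop-1-22
remove_operator_finalizers() {
  # `-o name` prints one "<resource type>/<name>" entry per line.
  for res in $(kubectl get istiooperator.install.istio.io -n "$1" \
      --context "${REMOTE_CONTEXT}" -o name); do
    kubectl patch "$res" -n "$1" --context "${REMOTE_CONTEXT}" \
      --type merge -p '{"metadata":{"finalizers":null}}'
  done
}
```

Run remove_operator_finalizers gm-iop-1-22 once per workload cluster, with REMOTE_CONTEXT set to that cluster's context, and then verify the namespace list as described in the steps above. Removing finalizers skips the cleanup that the finalizer was waiting for, so use this only for resources that are already stuck in deletion.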
Conflicting IstioLifecycleManager errors
What’s happening
- When you check the management logs, you see an Istio lifecycle manager error similar to the following:
failed to upsert snapshot for istio lifecycle manager","parent":"gloo-platform~gloo-mesh~cluster-1~admin.gloo.solo.io/v2, Kind=IstioLifecycleManager","err":"conflicting IOPs have been created from a different parent Istio lifecycle manager ...
- When you check the Istio lifecycle manager status, you see a conflicting message similar to the following:
kubectl get IstioLifecycleManager -A -o yaml
Example output:
status:
  clusters:
    cluster-1:
      installations:
        auto:
          message: 'Another conflicting IstioLifecycleManager has previously been used to install a IstioOperators in this cluster, please check on uninstall of : gm-iop-1-22.gloo-platform'
          observedRevision: 1-22
          state: FAILED
Why it’s happening
You might have a conflicting IstioLifecycleManager or GatewayLifecycleManager resource. For example, you might have uninstalled a previous IstioLifecycleManager resource that was not completely deleted. This error can happen when the namespace is deleted before the Istio lifecycle manager is able to finish uninstalling all Istio resources.
How to fix it
- List the IstioLifecycleManager or GatewayLifecycleManager resources. If you have multiple resources in the same namespace for the same purpose, try uninstalling the one that you no longer need.
kubectl get IstioLifecycleManager -A
- If you already uninstalled the istiod control plane or gateways by using the Istio lifecycle manager, try to manually replace the IstioLifecycleManager or GatewayLifecycleManager CR by deleting the existing resource, such as with the following example command.
kubectl delete GatewayLifecycleManager istio-eastwestgateway -n gloo-mesh --context $MGMT_CONTEXT
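To review both resource kinds in one pass, the list step can be sketched as follows. This is a minimal sketch: the function names lifecycle_managers and duplicate_lifecycle_managers are hypothetical helpers. Having more than one resource of the same kind in a namespace is only a hint, not proof of a conflict, because the resources might serve different purposes.

```shell
# Print "<kind> <namespace> <name>" for every IstioLifecycleManager and
# GatewayLifecycleManager resource in the management cluster.
lifecycle_managers() {
  kubectl get IstioLifecycleManager,GatewayLifecycleManager -A \
    --context "${MGMT_CONTEXT}" --no-headers \
    -o custom-columns='KIND:.kind,NAMESPACE:.metadata.namespace,NAME:.metadata.name'
}

# Print "<kind> <namespace> <count>" for each kind that appears more than
# once in the same namespace, as candidates to review for conflicts.
duplicate_lifecycle_managers() {
  lifecycle_managers | awk '
    NF { count[$1 " " $2]++ }
    END { for (key in count) if (count[key] > 1) print key, count[key] }'
}
```

Any line that duplicate_lifecycle_managers prints is a starting point for the first step above. Compare the listed resources and uninstall the one that you no longer need.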
Agent crashes with IstioOperator error
What’s happening
Your Gloo agent does not start and is in a CrashLoopBackOff state.
When you check the agent logs, you see an error similar to the following:
failed to list *v1alpha1.IstioOperator: unknown field \"target\" in v1alpha1.ResourceMetricSource
...
{"level":"error","ts":1678198470.502656,"logger":"controller.input-ConfigMap-cache","caller":"controller/controller.go:208","msg":"Could not wait for Cache to sync","error":"failed to wait for input-ConfigMap-cache caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:234\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/manager/runnable_group.go:218"}
Why it’s happening
You might have an error in your Istio operator configuration, such as using a field that is deprecated or no longer supported.
By default, Gloo expects the Istio ingress gateway to have the name istio-ingressgateway. If you use a custom name for the ingress gateway, you cannot set up horizontal pod autoscaling (HPA) for the Istio gateways.
How to fix it
- Review your Istio operator configuration file for any deprecated or unsupported fields. For example, review the upstream Istio operator.proto file for unsupported fields in your version of Istio.
- If you use a custom Istio ingress gateway name, remove the HPA section from your Istio operator gateway configuration file. The following example shows the hpaSpec section for an ingress gateway with a custom name.
...
components:
  ingressGateways:
  - name: <custom-name>
    namespace: gloo-mesh-gateways
    enabled: true
    k8s:
      hpaSpec:
        maxReplicas: 5
        metrics:
        - resource:
            name: cpu
            targetAverageUtilization: 60
          type: Resource
- To add back similar HPA functionality, add autoscaling minimum and maximum values to the gateway configuration. The following example shows the istio-ingressgateway section. Note that the targetAverageUtilization field is also removed, because that field is deprecated in Istio 1.14 and later.
...
spec:
  values:
    gateways:
      istio-ingressgateway:
        autoscaleMin: 2
        autoscaleMax: 5
- Update your Istio operator configuration.
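Before you update the configuration, a quick search can confirm whether the fields named in the steps above are still present. This is a minimal sketch: the function name find_hpa_fields is a hypothetical helper, and it only covers the hpaSpec and targetAverageUtilization fields from this section, not every deprecated Istio field.

```shell
# Print each line (with its line number) of an Istio operator configuration
# file that mentions hpaSpec or targetAverageUtilization.
# $1 = path to the configuration file
find_hpa_fields() {
  grep -n -E 'hpaSpec|targetAverageUtilization' "$1"
}
```

For example, find_hpa_fields istio-operator.yaml (a hypothetical file name) prints nothing and returns a non-zero exit code when neither field is found.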