If your installation settings are configured incorrectly, the Gloo agent might not start or you might experience issues with other Istio components such as gateways.

Check Istio installation status and values

Use the ClusterIstioInstallation resource to monitor the status of the Helm installation for each Istio component, such as istiod and gateways. You can use this information alongside the status of the Istio and Gateway Lifecycle Manager resources to troubleshoot installation or upgrade issues in your environment.

  1. List the ClusterIstioInstallation resources in your environment, and find the resource for the Istio installation you want to investigate. Gloo Mesh Core creates these internal resources to represent the state of your Istio installations in each workload cluster.

      kubectl get ClusterIstioInstallation -A --context ${REMOTE_CONTEXT}
      
  2. Get the full details of the resource in YAML format.

      kubectl get ClusterIstioInstallation <name> -n <namespace> --context ${REMOTE_CONTEXT} -o yaml
      
  3. In the status.helm.upgradeInstall.releases section, check the state of each component’s Helm chart installation. The state can help you determine whether your Istio installations are completed.

    State          Description
    PENDING        Istio installation settings are present on the cluster, but are not yet installed.
    INSTALLING     Istio components are currently installing in the cluster.
    UPGRADING      Istio components are currently upgrading to use updated settings.
    UNINSTALLING   Istio components are uninstalling from the cluster.
    FINISHED       The installation or upgrade process is finished.
    FAILED         The installation or upgrade process has failed. Check the statuses of the IstioLifecycleManager and GatewayLifecycleManager resources, or if you manually installed with Helm, the status of the Helm release. You can also check the insights for any Istio-related errors.

    In this example output, the Helm charts for the base, istiod, and cni Istio components all have a state of FINISHED.

      
    status:
      helm:
        upgradeInstall:
        - releases:
          - chartUrl: <Solo_Helm_repo>
            name: istio-base-default
            namespace: gloo-mesh
            state: FINISHED
            version: 1.22.3-solo
          - chartUrl: <Solo_Helm_repo>
            name: istio-istiod-default
            namespace: gloo-mesh
            state: FINISHED
            version: 1.22.3-solo
          - chartUrl: <Solo_Helm_repo>
            name: istio-cni-default
            namespace: gloo-mesh
            state: FINISHED
            version: 1.22.3-solo
      
  4. To see the Helm values that were used during the installation or upgrade attempt, check the status.helm.upgradeInstall.values section. For example, during an Istio upgrade, you can verify that any updated settings you provide were correctly translated by the management cluster and sent to the workload clusters.
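
To scan the release states from step 3 without reading the full YAML, a small filter such as the following sketch might help. It assumes that each release entry lists its name before its state, as in the example output. The function name release_states is made up for illustration and is not part of Gloo or kubectl.

```shell
# Hypothetical helper: reads ClusterIstioInstallation YAML on stdin and
# prints "<release name> <state>" for each entry that has a state field.
# Assumes that name: appears before state: within each release entry.
release_states() {
  awk '
    $1 == "name:"  { name = $2 }       # remember the most recent name
    $1 == "state:" { print name, $2 }  # pair it with the state that follows
  '
}

# Example usage (requires access to a workload cluster):
#   kubectl get ClusterIstioInstallation <name> -n <namespace> \
#     --context ${REMOTE_CONTEXT} -o yaml | release_states
```

Any release that does not report FINISHED is a candidate for the follow-up checks that are described in the state table.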

IstioLifecycleManager does not fully uninstall or namespaces stuck in terminating

What’s happening

You installed the istiod control plane and gateways by using the Gloo Istio lifecycle manager. When you follow the steps to uninstall Istio with the Istio lifecycle manager, the operator namespace, such as gm-iop-1-22, in one or more workload clusters is stuck in Terminating.

  kubectl get ns --context ${REMOTE_CONTEXT}
  

Example output:

  NAME                 STATUS        AGE
  bookinfo             Active        132d
  default              Active        132d
  global               Active        123d
  gloo-mesh            Active        59d
  gm-iop-1-22          Terminating   59d
  helloworld           Active        132d
  httpbin              Active        132d
  kube-node-lease      Active        132d
  kube-public          Active        132d
  kube-system          Active        132d
  

Why it’s happening

This error can happen when the istiod pod is deleted before the Istio or gateway lifecycle manager is able to finish uninstalling all Istio resources. This error might also happen if you repeatedly overwrite the same IstioLifecycleManager or GatewayLifecycleManager resource with different revisions and configuration.

How to fix it

  1. In one workload cluster, get the IstioOperator resources in the operator namespace, such as with the following command.

      kubectl get istiooperator.install.istio.io -n gm-iop-1-22 --context ${REMOTE_CONTEXT}
      

    Example output, in which operators still exist for the istiod control plane:

      NAME                           REVISION   STATUS    AGE
      istiod-control-plane           1-22       HEALTHY   59d
      
  2. For each IstioOperator resource, repeat these steps to remove the finalizer.

    1. Edit the resource, such as with the following command. You might have a different revision or name for the operator resource.
        kubectl edit istiooperator.install.istio.io -n gm-iop-1-22 istiod-control-plane --context ${REMOTE_CONTEXT}
        
    2. Delete all lines in the finalizers section, including the finalizers: line.
        apiVersion: install.istio.io/v1alpha1
        kind: IstioOperator
        metadata:
          annotations:
            cluster.solo.io/cluster: cluster1
          creationTimestamp: "2023-07-27T15:41:55Z"
          finalizers:
          - istio-finalizer.install.istio.io
          generation: 3
          ...
        
    3. Save and close the file. The resource is now deleted.
    4. Repeat these steps for any other IstioOperator resources in the namespace.
  3. After all resources in the operator namespace are deleted, verify that the operator namespace no longer exists.

      kubectl get ns --context ${REMOTE_CONTEXT}
      

    Example output:

      NAME                 STATUS        AGE
      bookinfo             Active        132d
      default              Active        132d
      global               Active        123d
      gloo-mesh            Active        59d
      helloworld           Active        132d
      httpbin              Active        132d
      kube-node-lease      Active        132d
      kube-public          Active        132d
      kube-system          Active        132d
      
  4. Repeat steps 1 - 3 for each workload cluster where you used the Istio lifecycle manager to install Istio.
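
If several IstioOperator resources are stuck, editing each one by hand can get tedious. The following sketch removes the finalizers with kubectl patch and a JSON merge patch instead of kubectl edit. The function name remove_iop_finalizers and its arguments are assumptions for illustration. Review the list of resources before you run anything like this, because each resource is deleted as soon as its finalizers are cleared.

```shell
# Hypothetical helper: clear the finalizers on every IstioOperator in a namespace.
# Usage: remove_iop_finalizers <operator-namespace> <kube-context>
remove_iop_finalizers() {
  ns="$1"
  ctx="$2"
  # -o name prints one <resource>/<name> entry per line
  kubectl get istiooperator.install.istio.io -n "$ns" --context "$ctx" -o name |
  while read -r iop; do
    # A merge patch that sets finalizers to null removes the whole section
    kubectl patch "$iop" -n "$ns" --context "$ctx" \
      --type merge -p '{"metadata":{"finalizers":null}}'
  done
}

# Example: remove_iop_finalizers gm-iop-1-22 "${REMOTE_CONTEXT}"
```

Afterward, the namespace check in step 3 still applies to confirm that the operator namespace is gone.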

Conflicting IstioLifecycleManager errors

What’s happening

  1. When you check the management logs, you see an Istio lifecycle manager error similar to the following:
      failed to upsert snapshot for istio lifecycle manager","parent":"gloo-platform~gloo-mesh~cluster-1~admin.gloo.solo.io/v2, Kind=IstioLifecycleManager","err":"conflicting IOPs have been created from a different parent Istio lifecycle manager
      ...
      
  2. When you check the Istio lifecycle manager status, you see a conflicting message similar to the following:
      kubectl get IstioLifecycleManager -A -o yaml
      
    Example output:
      
    status:
      clusters:
        cluster-1:
          installations:
            auto:
              message: 'Another conflicting IstioLifecycleManager has previously been
                used to install a IstioOperators in this cluster, please check on uninstall
                of : gm-iop-1-22.gloo-platform'
              observedRevision: 1-22
              state: FAILED
      

Why it’s happening

You might have a conflicting IstioLifecycleManager or GatewayLifecycleManager resource. For example, you might have uninstalled a previous IstioLifecycleManager resource that was not completely deleted. This error can happen when the namespace is deleted before the Istio lifecycle manager is able to finish uninstalling all Istio resources.

How to fix it

  1. List the IstioLifecycleManager or GatewayLifecycleManager resources. If you have multiple resources in the same namespace for the same purpose, try uninstalling the one that you no longer need.
      kubectl get IstioLifecycleManager -A
      
  2. If you already uninstalled the istiod control plane or gateways by using the Istio lifecycle manager, try to manually replace the IstioLifecycleManager or GatewayLifecycleManager custom resource by deleting the existing resource, such as with the following example command.
      kubectl delete GatewayLifecycleManager istio-eastwestgateway -n gloo-mesh --context $MGMT_CONTEXT
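
To spot possible duplicates from step 1 more quickly, you can count the lifecycle manager resources per namespace. The following sketch assumes the default table output of kubectl get with the -A flag, in which the first column is the namespace. The helper name count_by_namespace is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get <kind> -A --no-headers` output on stdin
# and prints "<namespace> <count>" for each namespace, sorted by namespace.
count_by_namespace() {
  awk 'NF { n[$1]++ } END { for (ns in n) print ns, n[ns] }' | sort
}

# Example usage (requires access to the management cluster):
#   kubectl get IstioLifecycleManager -A --no-headers | count_by_namespace
```

A count greater than one does not prove a conflict, but it points to the namespaces that are worth a closer look.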
      

Agent crashes with IstioOperator error

What’s happening

Your Gloo agent does not start and is in a CrashLoopBackOff state.

When you check the agent logs, you see an error similar to the following:

  failed to list *v1alpha1.IstioOperator: unknown field \"target\" in v1alpha1.ResourceMetricSource
  ...
  {"level":"error","ts":1678198470.502656,"logger":"controller.input-ConfigMap-cache","caller":"controller/controller.go:208","msg":"Could not wait for Cache to sync","error":"failed to wait for input-ConfigMap-cache caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:234\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/manager/runnable_group.go:218"}
  

Why it’s happening

You might have an error in your Istio operator configuration, such as using a field that is deprecated or no longer supported.

By default, Gloo expects the Istio ingress gateway to have the name istio-ingressgateway. If you use a custom name for the ingress gateway, you cannot set up horizontal pod autoscaling (HPA) for the Istio gateways.

How to fix it

  1. Review your Istio operator configuration file for any deprecated or unsupported fields. For example, review the upstream Istio operator.proto file for unsupported fields in your version of Istio.
  2. If you use a custom Istio ingress gateway name, remove the HPA section from your Istio operator gateway configuration file. The following example shows the hpaSpec section for an ingress gateway that has a custom name.
      ...
      components:
        ingressGateways:
          - name: <custom-name>
            namespace: gloo-mesh-gateways
            enabled: true
            k8s:
              hpaSpec:
                maxReplicas: 5
                metrics:
                  - resource:
                      name: cpu
                      targetAverageUtilization: 60
                    type: Resource
      
  3. To add back similar HPA functionality, set the autoscaling minimum and maximum values in the gateway configuration. The following example shows the istio-ingressgateway section. Note that the targetAverageUtilization field is also removed, because that field is deprecated in Istio 1.14 and later.
      ...
      spec:
        values:
          gateways:
            istio-ingressgateway:
              autoscaleMin: 2
              autoscaleMax: 5
      
  4. Update your Istio operator configuration.
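
Before you apply the updated configuration, a quick text search can confirm that no HPA settings remain in the file. This is a minimal sketch. The helper name find_hpa_fields and the file name in the usage example are assumptions for illustration, and the search matches only the two field names that are discussed in this section.

```shell
# Hypothetical helper: print the numbered lines in an IstioOperator file that
# still mention hpaSpec or targetAverageUtilization.
# Returns 0 if matches are found and 1 if the file is clean (grep semantics).
find_hpa_fields() {
  grep -nE 'hpaSpec|targetAverageUtilization' "$1"
}

# Example: find_hpa_fields istio-operator.yaml || echo "no HPA fields found"
```

A text search does not validate the file against the Istio operator schema, so the review in step 1 is still needed for other deprecated fields.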