Istio operator

Istio operators are used to install Istio, either as part of Gloo Platform's managed installation with the Istio lifecycle manager or as part of your own manual installation. If the Istio operator is configured incorrectly, the Gloo agent might not start or you might experience issues with other Istio components such as gateways.

Before reviewing these specific Istio operator topics, try Debugging Istio.

IstioLifecycleManager does not fully uninstall or namespace is stuck in Terminating

What's happening

You installed the istiod control plane and gateways by using the Gloo Platform Istio lifecycle manager. When you follow the steps to uninstall Istio with the Istio lifecycle manager, the operator namespace, such as gm-iop-1-20, in one or more workload clusters is stuck in Terminating.

kubectl get ns --context $REMOTE_CONTEXT

Example output:

NAME                 STATUS        AGE
bookinfo             Active        132d
default              Active        132d
global               Active        123d
gloo-mesh            Active        59d
gm-iop-1-20          Terminating   59d
helloworld           Active        132d
httpbin              Active        132d
kube-node-lease      Active        132d
kube-public          Active        132d
kube-system          Active        132d

Why it's happening

This error can happen when the istiod pod is deleted before the Istio or gateway lifecycle manager is able to finish uninstalling all Istio resources. This error might also happen if you repeatedly overwrite the same IstioLifecycleManager or GatewayLifecycleManager resource with different revisions and configuration.
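
To confirm what is blocking the deletion, you can read the status conditions that Kubernetes records on a terminating namespace. The following is a minimal sketch, not part of the product tooling: the function name `ns_termination_conditions` is hypothetical, and the namespace and context are whatever you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each status condition of a namespace as "type: message".
# Usage: ns_termination_conditions <namespace> <kube-context>
ns_termination_conditions() {
  local ns="$1" ctx="$2"
  # Terminating namespaces typically report conditions that name the
  # remaining resources or finalizers.
  kubectl get namespace "$ns" --context "$ctx" \
    -o jsonpath='{range .status.conditions[*]}{.type}{": "}{.message}{"\n"}{end}'
}

# Example: ns_termination_conditions gm-iop-1-20 "$REMOTE_CONTEXT"
```

If the conditions mention remaining IstioOperator resources or their finalizers, the steps in the next section apply.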

How to fix it

  1. In one workload cluster, get the IstioOperator resources in the operator namespace, such as with the following command.

    kubectl get istiooperator.install.istio.io -n gm-iop-1-20 --context $REMOTE_CONTEXT
    

    Example output, in which operators still exist for an east-west gateway and the istiod control plane:

    NAME                           REVISION   STATUS    AGE
    istio-eastwestgateway-1-20     1-20       HEALTHY   59d
    istiod-control-plane           1-20       HEALTHY   59d
    
  2. For each IstioOperator resource, repeat these steps to remove the finalizer.

    1. Edit the resource, such as with the following command. You might have a different revision or name for the operator resource.
      kubectl edit istiooperator.install.istio.io -n gm-iop-1-20 istiod-control-plane --context $REMOTE_CONTEXT
      
    2. Delete all lines in the finalizers section, including the finalizers: line.
            apiVersion: install.istio.io/v1alpha1
            kind: IstioOperator
            metadata:
              annotations:
                cluster.solo.io/cluster: cluster1
              creationTimestamp: "2023-07-27T15:41:55Z"
              finalizers:
              - istio-finalizer.install.istio.io
              generation: 3
              ...
            
    3. Save and close the file. Because the resource was already marked for deletion, Kubernetes deletes it as soon as the finalizer is removed.
    4. Repeat these steps for any other IstioOperator resources in the namespace.
  3. After all resources in the operator namespace are deleted, verify that the operator namespace no longer exists.

    kubectl get ns --context $REMOTE_CONTEXT
    

    Example output:

    NAME                 STATUS        AGE
    bookinfo             Active        132d
    default              Active        132d
    global               Active        123d
    gloo-mesh            Active        59d
    helloworld           Active        132d
    httpbin              Active        132d
    kube-node-lease      Active        132d
    kube-public          Active        132d
    kube-system          Active        132d
    
  4. Repeat steps 1 - 3 for each workload cluster where you used the Istio lifecycle manager to install Istio.
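
If a namespace holds several IstioOperator resources, editing each one by hand can be tedious. As a non-interactive alternative, the finalizers can be cleared with `kubectl patch`. The following is a minimal sketch under stated assumptions: the function name `remove_iop_finalizers` is hypothetical, and it clears the finalizers on every IstioOperator in the namespace, so run it only after you confirm that the namespace is being uninstalled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the finalizers from every IstioOperator
# resource in one namespace.
# Usage: remove_iop_finalizers <namespace> <kube-context>
remove_iop_finalizers() {
  local ns="$1" ctx="$2" name
  # "-o name" prints one resource per line, such as
  # istiooperator.install.istio.io/istiod-control-plane
  kubectl get istiooperator.install.istio.io -n "$ns" --context "$ctx" -o name |
  while read -r name; do
    # A merge patch that sets finalizers to null clears the whole list.
    kubectl patch "$name" -n "$ns" --context "$ctx" \
      --type merge -p '{"metadata":{"finalizers":null}}'
  done
}

# Example: remove_iop_finalizers gm-iop-1-20 "$REMOTE_CONTEXT"
```

The patch has the same effect as deleting the finalizers section in the editor, so verify the namespace afterward in the same way as in step 3.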

Conflicting IstioLifecycleManager errors

What's happening

  1. When you check the Gloo management server logs, you see an Istio lifecycle manager error similar to the following:
    failed to upsert snapshot for istio lifecycle manager","parent":"gloo-platform~gloo-mesh~cluster-1~admin.gloo.solo.io/v2, Kind=IstioLifecycleManager","err":"conflicting IOPs have been created from a different parent Istio lifecycle manager
    ...
    
  2. When you check the Istio lifecycle manager status, you see a conflict message similar to the following:
    kubectl get IstioLifecycleManager -A -o yaml
    

    Example output:

    status:
      clusters:
        cluster-1:
          installations:
            auto:
              message: 'Another conflicting IstioLifecycleManager has previously been
                used to install a IstioOperators in this cluster, please check on uninstall
                of : gm-iop-1-20.gloo-platform'
              observedRevision: 1-20
              state: FAILED
    

Why it's happening

You might have a conflicting IstioLifecycleManager or GatewayLifecycleManager resource. For example, you might have uninstalled a previous IstioLifecycleManager resource that was not completely deleted. This error can happen when the namespace is deleted before the Istio lifecycle manager is able to finish uninstalling all Istio resources.

How to fix it

  1. List the IstioLifecycleManager or GatewayLifecycleManager resources. If you have multiple resources in the same namespace for the same purpose, try uninstalling the one that you no longer need.
    kubectl get IstioLifecycleManager -A
    
  2. If you already uninstalled the istiod control plane or gateways by using the Istio lifecycle manager, try to manually replace the IstioLifecycleManager or GatewayLifecycleManager CR.
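
Because the conflict can involve either resource kind, it can help to list both kinds in one call. The following is a small sketch: the function name `list_lifecycle_managers` is hypothetical, and any extra arguments, such as a `--context` flag for your management cluster, are passed through to `kubectl` unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list IstioLifecycleManager and GatewayLifecycleManager
# resources in all namespaces. Extra arguments are forwarded to kubectl.
list_lifecycle_managers() {
  kubectl get IstioLifecycleManager,GatewayLifecycleManager -A "$@"
}

# Example: list_lifecycle_managers -o wide
```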

Agent crashes with IstioOperator error

What's happening

Your Gloo agent does not start and is in a CrashLoopBackOff state.

When you check the agent logs, you see an error similar to the following:

failed to list *v1alpha1.IstioOperator: unknown field \"target\" in v1alpha1.ResourceMetricSource
...
{"level":"error","ts":1678198470.502656,"logger":"controller.input-ConfigMap-cache","caller":"controller/controller.go:208","msg":"Could not wait for Cache to sync","error":"failed to wait for input-ConfigMap-cache caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:234\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/manager/runnable_group.go:218"}

Why it's happening

You might have an error in your Istio operator configuration, such as using a field that is deprecated or no longer supported.

By default, Gloo Platform expects the Istio ingress gateway to have the name istio-ingressgateway. If you use a custom name for the ingress gateway, you cannot set up horizontal pod autoscaling (HPA) for the Istio gateways.

How to fix it

  1. Review your Istio operator configuration file for any deprecated or unsupported fields. For example, review the upstream Istio operator.proto file for unsupported fields in your version of Istio.
  2. If you use a custom Istio ingress gateway name, remove the HPA section from your Istio operator gateway configuration file. The following example shows the hpaSpec in the istio-eastwestgateway and istio-ingressgateway sections.
       ...
       components:
        ingressGateways:
        # Enable the default east-west gateway
          - name: <custom-name>
            # Deploy to the gloo-mesh-gateways namespace
            namespace: gloo-mesh-gateways
            enabled: true
            k8s:
              hpaSpec:
                maxReplicas: 5
                metrics:
                  - resource:
                      name: cpu
                      targetAverageUtilization: 60
                    type: Resource
          ...
          - name: <custom-name>
            # Deploy to the gloo-mesh-gateways namespace
            namespace: gloo-mesh-gateways
            enabled: true
            k8s:
              hpaSpec:
                maxReplicas: 5
                metrics:
                  - resource:
                      name: cpu
                      targetAverageUtilization: 60
                    type: Resource
       
  3. To add back similar HPA functionality, set the autoscaling minimum and maximum values in the gateway configuration. The following example shows both the istio-eastwestgateway and istio-ingressgateway sections. Note that the targetAverageUtilization field is also removed, because that field is deprecated in Istio 1.14 and later.
       ...
       spec:
         values:
           gateways:
             istio-ingressgateway:
               autoscaleMin: 2
               autoscaleMax: 5
          ...
             istio-eastwestgateway:
               autoscaleMin: 2
               autoscaleMax: 5
       
  4. Update your Istio operator configuration.
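
To locate the HPA settings from step 2 before you edit the file, a plain text search is usually enough. The following is a minimal sketch: the function name `find_hpa_fields` is hypothetical, and it only matches the two field names that this topic mentions, so it does not validate the rest of the configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the line numbers of HPA-related fields
# in an Istio operator configuration file.
# Usage: find_hpa_fields <path-to-operator-file>
find_hpa_fields() {
  # -n prefixes each match with its line number; -E enables alternation.
  grep -nE 'hpaSpec|targetAverageUtilization' "$1"
}

# Example: find_hpa_fields <your-operator-file>.yaml
```

If the search prints nothing, the file does not contain either field, and the crash is more likely caused by a different deprecated or unsupported field.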