Release notes

Review summaries of the main changes in the Gloo 2.4 release.

Introduction

The release notes include important installation changes and known issues. They also highlight ways that you can take advantage of new features or enhancements to improve your product usage.

For more information, see the following related resources:

Changelog: A full list of changes, including the ability to compare previous patch and minor versions.
Upgrade guide: Steps to upgrade from the previous minor version to the current version.
Version reference: Information about Solo’s version support.

Breaking changes

Review details about the following breaking changes. To review when breaking changes were released, you can use the comparison feature of the changelog.

Upstream Prometheus upgrade

Gloo Mesh Enterprise includes a built-in Prometheus server to help monitor the health of your Gloo components. This release of Gloo upgrades the Prometheus community Helm chart from version 19.7.2 to 25.11.0. As part of this upgrade, upstream Prometheus changed the selector labels for the deployment, which requires recreating the deployment. To help with this process, the Gloo Helm chart includes a pre-upgrade hook that automatically recreates the Prometheus deployment during a Helm upgrade. This breaking change impacts upgrades from previous versions to version 2.4.10, 2.5.1, or 2.6.0 and later.

If you do not want the redeployment to happen automatically, you can disable this process by setting the prometheus.skipAutoMigration Helm value to true. For example, you might disable it if you use Argo CD, which converts Helm pre-upgrade hooks to Argo PreSync hooks and can cause issues. In that case, follow these steps to ensure that the Prometheus server is deployed with the right version:

Confirm that you have an existing deployment of Prometheus at the old Helm chart version of chart: prometheus-19.7.2.
```
  kubectl get deploy -n gloo-mesh prometheus-server -o yaml | grep chart
  
```
Delete the Prometheus deployment. Note that while Prometheus is deleted, you cannot observe Gloo performance metrics.
```
  kubectl delete deploy -n gloo-mesh prometheus-server
  
```
In your Helm values file, set the prometheus.skipAutoMigration field to true.
Continue with the Helm upgrade of Gloo Mesh Enterprise. The upgrade recreates the Prometheus server deployment at the new version.
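
The manual steps above can be wrapped in a small guard so that the deployment is deleted only when it still reports the old chart version. The following is a minimal sketch and not part of the Gloo tooling: the function names is_old_prometheus_chart and delete_old_prometheus are hypothetical, and the sketch assumes the gloo-mesh namespace and prometheus-server deployment name from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given text mentions the old
# Prometheus chart version (prometheus-19.7.2).
is_old_prometheus_chart() {
  case "$1" in
    *prometheus-19.7.2*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper around the two kubectl commands from the steps above.
# It deletes the deployment only if the old chart label is still present.
delete_old_prometheus() {
  chart_lines=$(kubectl get deploy -n gloo-mesh prometheus-server -o yaml | grep chart)
  if is_old_prometheus_chart "$chart_lines"; then
    kubectl delete deploy -n gloo-mesh prometheus-server
  else
    echo "No Prometheus deployment at chart prometheus-19.7.2 found; nothing to delete."
  fi
}
```

In this sketch, you would call delete_old_prometheus once before you run the Helm upgrade with prometheus.skipAutoMigration set to true.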

New volume mount for Cilium flow logs

To improve Cilium flow log collection, a new volume cilium-run was introduced and added to the configuration of the Gloo telemetry pipeline. This volume is automatically mounted on the host where the telemetry collectors run when using the default self-signed TLS certificates for the telemetry gateway and collectors. The steps to work around this change depend on your Gloo telemetry pipeline setup:

Default self-signed TLS certificates (OpenShift only): When you use the default self-signed TLS certificates for the Gloo telemetry pipeline, the volume is automatically added to the pipeline configuration and mounted on the host during the upgrade. No additional steps are required. However, if you run on OpenShift, elevated permissions for the gloo-mesh service account are required to allow Gloo Mesh Enterprise to mount the volume on the host.
To elevate the permissions in OpenShift, run the following command on the management cluster and all workload clusters:
```
  oc adm policy add-scc-to-group hostmount-anyuid system:serviceaccounts:gloo-mesh
  
```

Custom TLS certificates: If you previously configured your telemetry pipeline to use custom TLS certificates, or if you attempt to use the default Gloo telemetry pipeline settings from a previous release to customize certificate settings in 2.4, you must add the new volume and volume mount to the Gloo agent Helm chart as shown in the following example. For steps on how to set up the Gloo telemetry pipeline with custom certificates, see Set up OTel with a custom certificate.

Required in OpenShift: Elevate the permissions of the gloo-mesh service account to allow mounting of volumes on the host.
```
  oc adm policy add-scc-to-group hostmount-anyuid system:serviceaccounts:gloo-mesh
  
```

Add the cilium-run volume mount to the Helm chart for the Gloo agent.

  
```
telemetryCollector:
  config:
    exporters:
      otlp:
        # Domain for gateway's DNS entry
        # The default port is 4317.
        # If you set up an external load balancer between the telemetry gateway and collector agents, and you configured TLS passthrough to forward data to the telemetry gateway on port 4317, use port 443 instead.
        endpoint: [domain]:4317
        tls:
          ca_file: /etc/otel-certs/ca.crt
  enabled: true
  resources:
    limits:
      cpu: 2
      memory: 2Gi
    requests:
      cpu: 500m
      memory: 1Gi
  extraVolumes:
    # Include this section if you created a custom root CA cert secret
    - name: root-ca  # customers modify this list entry for BYO SSL certs
      secret:
        # Add your root CA cert secret name
        secretName: telemetry-root-secret
        defaultMode: 420
    - name: telemetry-configmap
      configMap:
        name: gloo-telemetry-collector-config
        items:
          - key: relay
            path: relay.yaml
    - hostPath:
        path: /var/run/cilium
        type: DirectoryOrCreate
      name: cilium-run
  extraVolumeMounts:
    - name: root-ca
      readOnly: true
      mountPath: /etc/otel-certs
    - name: telemetry-configmap
      mountPath: /conf
    - name: cilium-run
      mountPath: /var/run/cilium
telemetryCollectorCustomization:
  # Domain for gateway's DNS entry
  serverName: [domain]
```
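
In OpenShift, both setups above require running the same oc command on the management cluster and on every workload cluster. The following sketch loops over a list of kubeconfig contexts to do that. The function name elevate_gloo_mesh_scc, the OC_BIN variable, and the context names in the usage note are hypothetical; the sketch also assumes that each cluster is reachable through a kubeconfig context that you pass with the --context flag.

```shell
#!/bin/sh
# Hypothetical helper: grant the hostmount-anyuid SCC to the gloo-mesh
# service accounts in every cluster context passed as an argument.
# OC_BIN is an assumption of this sketch; set OC_BIN=echo for a dry run
# that only prints the arguments instead of calling oc.
elevate_gloo_mesh_scc() {
  for ctx in "$@"; do
    "${OC_BIN:-oc}" --context "$ctx" adm policy add-scc-to-group \
      hostmount-anyuid system:serviceaccounts:gloo-mesh
  done
}
```

For example, with hypothetical context names, you might run elevate_gloo_mesh_scc mgmt-cluster cluster-1 cluster-2.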

Installation changes

In addition to comparing differences across versions in the changelog, review the following installation changes from the previous minor version to version 2.4.

Global workspace during installation

Previously, single cluster installation profiles included a global workspace and workspace settings by default. In version 2.4, to create the global workspace during installation, set glooMgmtServer.createGlobalWorkspace=true in the Helm chart. Alternatively, create a workspace manually after installation.

OTel collector installation

Previously, to set the endpoint during the OTel collector installation, you might have escaped quotations such as endpoint: "\"${ENDPOINT_TELEMETRY_GATEWAY}\"". Now, the syntax is simplified so that you can enter endpoint: "${ENDPOINT_TELEMETRY_GATEWAY}", such as in the following example.

  
```
telemetryCollector:
  enabled: true
  config:
    exporters:
      otlp:
        endpoint: "${ENDPOINT_TELEMETRY_GATEWAY}"
```

New features

Review the following new features that are introduced in version 2.4 and that you can enable in your environment.

Redis safe mode

In versions 2.4.11 and earlier, a race condition was identified that can be triggered when the management plane and Redis restart at the same time, such as during an upgrade to a newer Gloo version. If hit, this failure mode can lead to partial translations on the Gloo management server, which can result in Istio resources being temporarily deleted from the output snapshots that are sent to the Gloo agents. For more information about this failure scenario, see Redis and Gloo management server restart. To resolve this issue, a new safe mode feature was added that you can enable by setting the glooMgmtServer.safeMode Helm chart option to true.

If safe mode is enabled, translation of input snapshots halts until the input snapshots of all registered Gloo agents are present in the Redis cache. This feature improves management plane stability during disaster scenarios and upgrades. For more information, see Safe mode. The safe mode feature is disabled by default.

To enable safe mode, follow these general steps:

Scale down the number of Gloo management server pods to 0.

```
  kubectl scale deployment gloo-mesh-mgmt-server --replicas=0 -n gloo-mesh
```

Upgrade your Gloo Mesh Enterprise installation. Add the following settings in the Helm values file for the Gloo management plane.
```
  
glooMgmtServer:
  safeMode: true
  
```
Scale the Gloo management server back up to the number of desired replicas. The following example uses 1 replica.
```
  kubectl scale deployment gloo-mesh-mgmt-server --replicas=1 -n gloo-mesh
  
```

Redis safe start window

With safe mode, the Gloo management server halts translation until the input snapshots of all workload clusters are present in the Redis cache. However, if clusters have connectivity issues, translation might be halted for a long time, even for healthy clusters. You might want translation to resume after a certain period of time, even if some input snapshots are missing in the Redis cache. To do so, you must use the glooMgmtServer.safeStartWindow field in your Gloo management server Helm values file to specify the time in seconds to halt translation. Note that this setting is ignored if glooMgmtServer.safeMode is set to true. The default value is 180 seconds. You can disable the wait time by setting this field to 0 (zero). For more information, see Option 2: Safe start window.
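
To summarize how the two settings interact as described above, the following sketch models the decision in a small shell function. It only illustrates the documented precedence and is not the management server's implementation; the function name translation_gate and its output strings are hypothetical.

```shell
#!/bin/sh
# Hypothetical model of how glooMgmtServer.safeMode and
# glooMgmtServer.safeStartWindow interact, based on the description above.
#   $1: safeMode (true|false)
#   $2: safeStartWindow in seconds (defaults to 180)
translation_gate() {
  safe_mode="$1"
  window="${2:-180}"
  if [ "$safe_mode" = "true" ]; then
    # safeStartWindow is ignored when safe mode is enabled.
    echo "wait for all input snapshots"
  elif [ "$window" -eq 0 ]; then
    # A window of 0 disables the wait time.
    echo "no wait"
  else
    echo "wait up to ${window}s"
  fi
}
```

For example, translation_gate false 90 corresponds to a configuration with safe mode disabled and a 90-second safe start window.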

To set a safe start window, follow these general steps:

Scale down the number of Gloo management server pods to 0.

```
  kubectl scale deployment gloo-mesh-mgmt-server --replicas=0 -n gloo-mesh
```

Upgrade your Gloo Mesh Enterprise installation. Add the following settings in the Helm values file for the Gloo management plane.
```
  
glooMgmtServer:
  safeMode: false
  safeStartWindow: 90
  
```
Scale the Gloo management server back up to the number of desired replicas. The following example uses 1 replica.
```
  kubectl scale deployment gloo-mesh-mgmt-server --replicas=1 -n gloo-mesh
  
```

Break up large Envoy filters

Some Gloo policies, such as JWT or other external auth policies, are translated into Envoy filters during the Gloo translation process. These Envoy filters are stored in the Kubernetes data store etcd alongside other Gloo configurations and applied to the ingress gateway or sidecar proxy to enforce the policies. In environments where you apply policies to many apps and routes, the size of the Envoy filter can become very large and exceed the maximum file size limit in etcd. When the maximum file size limit is reached, new configuration is rejected in etcd and Istio, which leads to policies not being applied and enforced properly.

To prevent this issue in your environment, it is recommended to set the new EXPERIMENTAL_SEGMENT_ENVOY_FILTERS_BY_MATCHER environment variable on the Gloo management server to instruct the server to break up large Envoy filters into multiple smaller Envoy filters. In your Helm values file for the Gloo management server, add the following snippet:

  
glooMgmtServer:
  extraEnvs:
    EXPERIMENTAL_SEGMENT_ENVOY_FILTERS_BY_MATCHER:
      value: "true"

Important: To safely upgrade and ensure that existing Envoy filters are correctly re-created, the Gloo management server and the Istio control plane istiod must temporarily be scaled down to 0 replicas. This upgrade procedure can have the following implications for your environment:

Delayed configuration updates: During the upgrade, the Gloo management server and istiod control plane are temporarily scaled down. Because of that, the propagation of configuration changes to the sidecar or gateway proxy, such as new routing rules or security policies, is delayed. This can cause inconsistencies in traffic management and policy enforcement.
New pods cannot be added to the mesh: The Istio control plane istiod implements the sidecar injection webhook. When the control plane is scaled down, sidecar injection does not work and new pods cannot be added to the service mesh. You can manually inject sidecars into your pods. However, keep in mind that these pods do not receive traffic as endpoint discovery is also disabled when the Istio control plane is scaled down. After the control plane is scaled back up, pods are automatically injected with sidecars and added to the mesh.
mTLS certificate issues: If certificates expire while the Istio control plane is not available, mutual TLS between services in the mesh might be impacted.

Note that the EXPERIMENTAL_SEGMENT_ENVOY_FILTERS_BY_MATCHER environment variable is removed in Gloo Mesh Enterprise version 2.5.0, because Envoy filter segmentation is promoted to standard behavior and enabled by default. In 2.5.0 and later, you no longer need to set the environment variable. If you want to enable this feature in version 2.3.x or 2.4.x, use the upgrade steps in version 2.5 as general guidance for how to safely scale down the Gloo management server, Gloo agent, and istiod, and re-create the Envoy filters in your environment.

I/O threads for Redis in 2.4.15

A new Helm value redis.deployment.ioThreads was introduced to specify the number of I/O threads to use for the built-in Redis instance. Redis is mostly single threaded. However, some operations, such as UNLINK or slow I/O accesses, can be performed on side threads. Increasing the number of side threads can help improve Redis performance, because these operations can run in parallel.

The default and minimum valid value for this setting is 1. If you plan to increase the number of I/O side threads, make sure that you also change the CPU requests and CPU limits for the Redis pod. Set the CPU requests and limits to the same number that you use for the I/O side threads plus 1. That way, you can ensure that each side thread has an available CPU core, and that an additional CPU core is left for the main Redis thread. For example, if you want to set I/O threads to 2, make sure to add 3 CPU cores to the resource requests and limits for the Redis pod. You can find further recommendations regarding I/O threads in this Redis configuration example.
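
The sizing rule above is simple arithmetic: one CPU core per I/O side thread, plus one core for the main Redis thread. The following sketch expresses that rule as a shell function for the case where you increase the number of I/O threads. The function name redis_cpu_for_io_threads is hypothetical and is not part of the Gloo Helm chart or CLI.

```shell
#!/bin/sh
# Hypothetical helper: CPU cores to set for both the requests and limits of
# the Redis pod, given the desired redis.deployment.ioThreads value.
redis_cpu_for_io_threads() {
  threads="$1"
  if [ "$threads" -lt 1 ]; then
    # 1 is the minimum valid value for ioThreads.
    echo "ioThreads must be at least 1" >&2
    return 1
  fi
  # One core per I/O side thread, plus one for the main Redis thread.
  echo $((threads + 1))
}
```

For example, redis_cpu_for_io_threads 2 prints 3, which matches the example of 2 I/O threads and 3 CPU cores in the preceding paragraph.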

If you set I/O threads, the Redis pod must be restarted during the upgrade so that the changes can be applied. During the restart, the input snapshots from all connected Gloo agents are removed from the Redis cache. If you also update settings in the Gloo management server that require the management server pod to restart, the management server’s local memory is cleared and all Gloo agents are disconnected. Although the Gloo agents attempt to reconnect to send their input snapshots and re-populate the Redis cache, some agents might take longer to connect or fail to connect at all. To ensure that the Gloo management server halts translation until the input snapshots of all workload cluster agents are present in Redis, it is recommended to enable safe mode on the management server alongside updating the I/O threads for the Redis pod. For more information, see Safe mode. Note that in version 2.6.0 and later, safe mode is enabled by default.

To update I/O side threads in Redis as part of your Gloo Mesh Enterprise upgrade:

Scale down the number of Gloo management server pods to 0.

```
  kubectl scale deployment gloo-mesh-mgmt-server --replicas=0 -n gloo-mesh
```

Upgrade Gloo Mesh Enterprise and use the following settings in your Helm values file for the management server. Make sure to also increase the number of CPU cores to one core per thread, and add an additional CPU core for the main Redis thread. The following example also enables safe mode on the Gloo management server to ensure translation is done with the complete context of all workload clusters.
```
  
glooMgmtServer:
  safeMode: true
redis: 
  deployment: 
    ioThreads: 2
    resources: 
      requests: 
        cpu: 3
      limits: 
        cpu: 3
  
```
Scale the Gloo management server back up to the number of desired replicas. The following example uses 1 replica.
```
  kubectl scale deployment gloo-mesh-mgmt-server --replicas=1 -n gloo-mesh
  
```

Feature changes

Review the following changes that might impact how you use certain features in your Gloo environment.

Sidecar acceleration

Support for the eBPF-based acceleration alpha feature is removed.

Known issues

The Solo team fixes bugs, delivers new features, and makes changes on a regular basis as described in the changelog. Some issues, however, might impact many users for common use cases. These known issues are as follows:

Cluster names: Do not use underscores (_) in the names of your clusters or in the kubeconfig context for your clusters.
Istio:
- Due to a lack of support for the Istio CNI and iptables for the Istio proxy, you cannot run Istio (and therefore Gloo Mesh Enterprise) on AWS Fargate. For more information, see the Amazon EKS issue.
- The WasmDeploymentPolicy Gloo CR is currently unsupported in Istio versions 1.18 and later.
- For FIPS-compliant Solo distributions of Istio 1.17.2 and 1.16.4, you must use the -patch1 versions of the latest Istio builds published by Solo, such as 1.17.2-patch1-solo-fips for Solo distribution of Istio 1.17. These patch versions fix a FIPS-related issue introduced in the upstream Envoy code. In 1.17.3 and later, FIPS compliance is available in the -fips tags of regular Solo distributions of Istio, such as 1.17.3-solo-fips.
OTel pipeline: FIPS-compliant builds are not currently supported for the OTel collector agent image.
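
To check the cluster name restriction listed above before you register clusters, you might scan your kubeconfig contexts for underscores. The following is a minimal sketch; the function names has_underscore and list_contexts_with_underscores are hypothetical, and the sketch assumes that kubectl is configured with the kubeconfig that you use for your Gloo clusters.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given name contains an underscore.
has_underscore() {
  case "$1" in
    *_*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical check: print every kubeconfig context name that contains an
# underscore. Prints nothing if no context name contains one.
list_contexts_with_underscores() {
  kubectl config get-contexts -o name | while IFS= read -r ctx; do
    if has_underscore "$ctx"; then
      echo "$ctx"
    fi
  done
}
```

If list_contexts_with_underscores prints any names, consider renaming those contexts, and check the corresponding cluster names, before you continue.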
