Release notes
Review summaries of the main changes in the Gloo 2.5 release.
Make sure that you review the breaking changes 🔥 that were introduced in this release and the impact that they have on your current environment.
Introduction
The release notes include important installation changes and known issues. They also highlight ways that you can take advantage of new features or enhancements to improve your product usage.
For more information, see the following related resources:
- Changelog: A full list of changes, including the ability to compare previous patch and minor versions.
- Upgrade guide: Steps to upgrade from the previous minor version to the current version.
- Version reference: Information about Solo’s version support.
🔥 Breaking changes
Review details about the following breaking changes. To review when breaking changes were released, you can use the comparison feature of the changelog. The severity is intended as a guide to help you assess how much attention to pay to this area during the upgrade, but can vary depending on your environment.
🚨 High
Review severe changes that can impact production and require manual intervention.
- Envoy filter changes: The naming convention for EnvoyFilters changed. In addition, the functionality of the EXPERIMENTAL_SEGMENT_ENVOY_FILTERS_BY_MATCHER environment variable was promoted to standard behavior. The Gloo management server must re-create existing EnvoyFilters, which requires a careful upgrade procedure to avoid disruptions in your environment and to prevent policies from failing open.
- Default add-ons namespace removed: The default namespace for add-ons, which was previously gloo-mesh-addons, is removed. If you do not explicitly set the namespace in the Helm chart, add-ons are automatically deployed to the namespace that Gloo Mesh Gateway is installed to, which can lead to disruptions and downtime for your add-on components.
Medium
Review changes that might impact production and require manual intervention, but possibly not until the next version is released.
- Prometheus annotations removed: The prometheus.io/port: "<port_number>" annotation was removed from the Gloo management server and agent in 2.5.0. However, the prometheus.io/scrape: true annotation is still present. When you use your own Prometheus instance to scrape metrics, the instance might try to scrape metrics from port numbers that require TLS authentication. This can lead to error messages in the logs. Note that this issue is resolved in 2.5.2.
- Gloo UI auth secret: If you set up external authentication for the Gloo UI and you use the glooUi.auth.oidc.clientSecret Helm setting to reference the Kubernetes secret that stores your credentials, the secret might get deleted during the upgrade. To resolve this issue, review the steps in the release note.
ℹ️ Low
Review informational updates that you might want to implement but that are unlikely to materially impact production.
- Known Portal issues in 2.5.2: A known issue exists in 2.5.2 that causes interruption during translation. This issue is resolved in version 2.5.3.
- Upstream Prometheus upgrade: The Prometheus Helm chart version is upgraded to a newer version, which requires the Prometheus deployment to be re-created. Gloo Mesh Gateway uses Helm pre-upgrade hooks to re-create the deployment, which can cause issues in automated environments such as Argo CD.
Envoy filter changes
The following Envoy filter changes were introduced in version 2.5.0. Both changes require the Envoy filters in your environment to be re-created.
Make sure that you follow the Upgrade steps to safely upgrade your environment and re-create the Envoy filters.
New naming convention for Envoy filters: The naming convention for Envoy filters changed to ensure stable and consistent behavior during upgrades and policy changes. Before, the workload labels and a generated ID were used to build the Envoy filter name. Starting in 2.5.0, the name is built by using a combination of the Envoy filter namespace, cluster, workload labels, and a hash for the Envoy filter matchers. Note that Envoy filter names can still change if you update your workload selectors.
To safely apply this change, make sure to carefully follow the Upgrade steps.
Break up large Envoy filters promoted to standard behavior: Some Gloo policies, such as JWT or other external auth policies, are translated into Envoy filters during the Gloo translation process. These Envoy filters are created per proxy and are then applied to the ingress gateway or sidecar proxy to enforce the policies. In environments where you apply policies to a lot of apps and routes, the Envoy filter can become very large and exceed the maximum file size limit in etcd. When the maximum file size limit is reached, new configuration is rejected in etcd and Istio, which leads to policies not being applied and enforced properly.
To prevent this issue, the experimental environment variable EXPERIMENTAL_SEGMENT_ENVOY_FILTERS_BY_MATCHER was introduced in Gloo Mesh Gateway versions 2.3 and 2.4. If enabled on the Gloo management server, the server automatically breaks up large Envoy filters and creates an Envoy filter per matcher. If the environment variable is not set, Envoy filters are created per proxy.
Starting in version 2.5.0, the experimental environment variable is removed and its functionality is promoted to standard behavior. The Gloo management server now automatically creates Envoy filters for each matcher. If you did not previously enable the EXPERIMENTAL_SEGMENT_ENVOY_FILTERS_BY_MATCHER environment variable to create Envoy filters per matcher, your Envoy filters must be re-created as part of the upgrade to version 2.5.0.
To safely upgrade and ensure that existing Envoy filters are correctly re-created, the Gloo management server and the Istio control plane istiod must temporarily be scaled down to 0 replicas. This upgrade procedure can have the following implications for your environment:
- Delayed configuration updates: During the upgrade, the Gloo management server and istiod control plane are temporarily scaled down. Because of that, the propagation of configuration changes to the sidecar or gateway proxy, such as new routing rules or security policies, is delayed. This can cause inconsistencies in traffic management and policy enforcement.
- Complex environments with long translation times: If you have a complex environment and translation regularly takes more than 60 seconds on average, scaling down istiod might have unexpected impacts and delay the time for your traffic to continue as normal.
- New pods cannot be added to the mesh: The Istio control plane istiod implements the sidecar injection webhook. When the control plane is scaled down, sidecar injection does not work and new pods cannot be added to the service mesh. You can manually inject sidecars into your pods. However, keep in mind that these pods do not receive traffic because endpoint discovery is also disabled when the Istio control plane is scaled down. After the control plane is scaled back up, pods are automatically injected with sidecars and added to the mesh.
- mTLS certificate issues: If certificates expire while the Istio control plane is not available, mutual TLS between services in the mesh might be impacted.
Follow the Upgrade steps to safely upgrade your environment and re-create Envoy filters.
Default add-ons namespace removed
In previous releases, all add-ons were automatically installed to the gloo-mesh-addons namespace unless you specified a different namespace during the Gloo Mesh Gateway installation. Starting with release 2.5.0, this default value is removed. If no value is set in the common.addonNamespace Helm field, your add-ons are now deployed to the namespace that the Helm release is installed to, which defaults to gloo-mesh. To avoid disruptions or downtime for your add-on components, such as a rate limit server, set the namespace you want your add-ons to be installed to in the common.addonNamespace field of your Helm values file. However, note that all guides now assume gloo-mesh as the add-ons namespace.
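As a minimal sketch, a Helm values fragment that keeps add-ons in the previous default namespace might look like the following. The nesting is inferred from the dotted common.addonNamespace field name, and the gloo-mesh-addons value is only the former default. Substitute the namespace that your add-ons currently run in.

```yaml
# Assumption: key nesting follows the dotted Helm path common.addonNamespace.
common:
  # Pin the add-ons namespace explicitly so that the upgrade does not move
  # add-ons into the namespace of the Helm release.
  addonNamespace: gloo-mesh-addons
```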
Gloo UI auth secret
If you set up external authentication for the Gloo UI and you use the glooUi.auth.oidc.clientSecret Helm setting to reference the Kubernetes secret that stores your credentials, the secret might get deleted during the upgrade because of a bug.
To resolve this issue, you can manually create the secret before the upgrade. Note that during this process, the Gloo UI temporarily cannot be accessed.
- Get the client secret from your OIDC provider.
- Manually delete the existing Gloo UI OIDC client secret that your installation automatically created for you. You can check the name of the secret in the glooUi.auth.oidc.clientSecret Helm setting. The following example uses dashboard in the gloo-mesh namespace.
kubectl delete secret -n gloo-mesh dashboard
- Re-create the OIDC client secret without any Helm labels or annotations.
apiVersion: v1
kind: Secret
metadata:
  name: dashboard
  namespace: gloo-mesh
type: Opaque
stringData:
  oidc-client-secret: $OIDC_CLIENT_SECRET
- Update your Helm configuration file to remove the glooUi.auth.oidc.clientSecret setting and to refer to the secret that you just re-created in the glooUi.auth.oidc.clientSecretName setting.
...
glooUi:
  enabled: true
  auth:
    enabled: true
    backend: oidc
    oidc:
      appUrl: # The URL that the UI for the OIDC app is available at, from the DNS and other ingress settings that expose the OIDC app UI service.
      clientId: # From the OIDC provider
      clientSecretName: dashboard # The Kubernetes secret with your OIDC client secret that you previously created.
      issuerUrl: # The URL to connect to the OpenID Connect identity provider, often in the format 'https://<domain>.<provider_url>/'.
      appUrl: # The URL that the Gloo UI is exposed at, such as 'https://localhost:8090'.
- Continue with the upgrade.
Known Portal issues in 2.5.2
Gloo Mesh Gateway version 2.5.2 has a known issue in Portal that causes interruption during translation. This issue is resolved in version 2.5.3. Portal users are advised to skip version 2.5.2, and to directly upgrade to 2.5.3 instead.
Upstream Prometheus upgrade
Gloo Mesh Gateway includes a built-in Prometheus server to help monitor the health of your Gloo components. This release of Gloo upgrades the Prometheus community Helm chart from version 19.7.2 to 25.11.0. As part of this upgrade, upstream Prometheus changed the selector labels for the deployment, which requires recreating the deployment. To help with this process, the Gloo Helm chart includes a pre-upgrade hook that automatically recreates the Prometheus deployment during a Helm upgrade. This breaking change impacts upgrades from previous versions to version 2.4.10, 2.5.1, or 2.6.0 and later.
If you do not want the redeployment to happen automatically, you can disable this process by setting the prometheus.skipAutoMigration Helm value to true. For example, you might use Argo CD, which converts Helm pre-upgrade hooks to Argo PreSync hooks and causes issues. To ensure that the Prometheus server is deployed with the right version, follow these steps:
- Confirm that you have an existing deployment of Prometheus at the old Helm chart version of chart: prometheus-19.7.2.
kubectl get deploy -n gloo-mesh prometheus-server -o yaml | grep chart
- Delete the Prometheus deployment. Note that while Prometheus is deleted, you cannot observe Gloo performance metrics.
kubectl delete deploy -n gloo-mesh prometheus-server
- In your Helm values file, set the prometheus.skipAutoMigration field to true.
- Continue with the Helm upgrade of Gloo Mesh Gateway. The upgrade recreates the Prometheus server deployment at the new version.
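The Helm values change in the steps above can be sketched as the following fragment. The nesting is an assumption derived from the dotted prometheus.skipAutoMigration field name. Merge it into your existing values file rather than replacing other prometheus settings.

```yaml
# Assumption: key nesting follows the dotted Helm path prometheus.skipAutoMigration.
prometheus:
  # Skip the pre-upgrade hook that automatically re-creates the Prometheus
  # deployment, for example when Argo CD manages the release.
  skipAutoMigration: true
```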
Prometheus annotations removed
In Gloo version 2.5.0, the prometheus.io/port: "<port_number>" annotation was removed from the Gloo management server and agent. However, the prometheus.io/scrape: true annotation is still present. If you have another Prometheus instance that runs in your cluster, and it is not set up with custom scraping jobs for the Gloo management server and agent, the instance automatically scrapes all ports on the management server and agent pods. This can lead to error messages in the management server and agent logs. Note that this issue is resolved in version 2.5.2. To resolve this issue in Gloo version 2.5.0 or 2.5.1, see Run another Prometheus instance alongside the built-in one.
⚙️ Installation changes
In addition to comparing differences across versions in the changelog, review the following installation changes from the previous minor version to version 2.5.
Gloo agent health check port
Because you can now run the Gloo agent as a sidecar container in the management server pod, the default Gloo agent health check port is changed from 8090 to 8091.
Gloo UI
To use the Gloo UI and visualize the network traffic in your environment with the Gloo UI graph, you must set the telemetryCollector.enabled Helm setting to true in each cluster in your environment, including the management cluster.
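For illustration, the setting corresponds to a Helm values fragment along these lines, applied to the release in every cluster. The nesting is assumed from the dotted telemetryCollector.enabled setting name.

```yaml
# Assumption: key nesting follows the dotted Helm path telemetryCollector.enabled.
telemetryCollector:
  # Required in each cluster, including the management cluster, so that the
  # Gloo UI graph can show network traffic.
  enabled: true
```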
Portal logs pipeline
The Gloo telemetry pipeline telemetryCollectorCustomization.pipelines.logs/istio_access_logs is renamed to telemetryCollectorCustomization.pipelines.logs/portal. For more information, see Monitor Portal analytics in the Gloo Mesh Gateway docs.
New default values for Gloo UI auth sessions
Some of the default Helm values changed for configuring the Gloo UI auth session storage:
- glooUi.auth.oidc.session.backend: The default value changed from "" (empty) to cookie to ensure auth sessions are stored in browser cookies by default.
- glooUi.auth.oidc.session.redis.host: The default value changed from "" (empty) to gloo-mesh-redis.gloo-mesh:6379 to ensure a valid Redis host is set when glooUi.auth.oidc.session.backend is changed to redis.
Note that if you previously set values for glooUi.auth.oidc.session.backend and glooUi.auth.oidc.session.redis.host, these values are not overwritten. The new default values are only set if these two fields are currently set to "" (empty).
To learn how to set up Gloo UI auth session storage, see Store UI sessions.
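As a hedged sketch, switching session storage from the new cookie default to Redis might look like the following Helm values fragment. The nesting is inferred from the dotted field names above, and the host shown is the new default value. Replace it if your Redis instance runs elsewhere.

```yaml
# Assumption: key nesting follows the dotted Helm paths
# glooUi.auth.oidc.session.backend and glooUi.auth.oidc.session.redis.host.
glooUi:
  auth:
    oidc:
      session:
        # Store auth sessions in Redis instead of browser cookies.
        backend: redis
        redis:
          # New default host in 2.5; shown explicitly for clarity.
          host: gloo-mesh-redis.gloo-mesh:6379
```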
Bug fixes
Multiple Istio revisions in the same cluster
If you run multiple revisions of Istio in your cluster and use discoverySelectors in each revision to discover the resources in specific namespaces, enable the glooMgmtServer.extraEnvs.IGNORE_REVISIONS_FOR_VIRTUAL_DESTINATION_TRANSLATION environment variable on the Gloo management server. This setting allows virtual destinations to be translated correctly if the east-west gateway and the backing services belong to different namespaces.
This feature is available in version 2.5.5 and later.
To enable this feature, add the following values to your Helm values file.
glooMgmtServer:
  extraEnvs:
  - name: IGNORE_REVISIONS_FOR_VIRTUAL_DESTINATION_TRANSLATION
    value: "true"
To check if you use discoverySelectors in your Istio revision:
Istio lifecycle manager installations:
Get the details of your Istio lifecycle manager resources.
kubectl get istiolifecyclemanagers -A -o yaml
In your Istio lifecycle manager resource, check if you use discoverySelectors in your spec.installations.istioOperatorSpec.meshConfig settings.
...
spec:
  installations:
  - clusters:
    - defaultRevision: true
      name: mycluster
    istioOperatorSpec:
      components:
        pilot:
          k8s:
            env:
            - name: PILOT_SKIP_VALIDATE_TRUST_DOMAIN
              value: "true"
      meshConfig:
        discoverySelectors:
        - matchLabels:
            istio-discovery: enabled
Manual installations:
Get the details of your Istio operator.
kubectl get istiooperator -A -o yaml
In your Istio operator configuration, check if you use discoverySelectors in your meshConfig settings.
...
meshConfig:
  discoverySelectors:
  - matchLabels:
      istio-discovery: enabled
Indefinite reconciliation of IssuedCertificates
If you install the gloo-mesh-mgmt-server management plane or gloo-mesh-agent data plane in a custom namespace other than gloo-mesh, skip upgrades to 2.5.7 and 2.5.8. These patch versions have a bug that causes IssuedCertificates to reconcile indefinitely. Instead, upgrade directly to patch version 2.5.9 or later, in which the issue is resolved.
New features
Review the following new features that are introduced in version 2.5 and that you can enable in your environment.
Delimiters in JWT token claims
As of version 2.5.12, you can configure custom delimiters when you extract claims from JWT tokens. This way, you can append the claim information in a header in a different format than the default comma-delimited format. For example steps, see Extract claims to headers.
Failover priority
You can use the new priorityLabels field in the FailoverPolicy to prioritize destinations that get traffic in case of failure. Previously, you could only use localityMappings to set up basic failover from one location to another, not a prioritized order of multiple locations. For more information, see the Failover guide.
Redis safe mode
In versions 2.5.3 and earlier, a race condition was identified that can be triggered during simultaneous restarts of the management plane and Redis, including during an upgrade to a newer Gloo version. If hit, this failure mode can lead to partial translations on the Gloo management server, which can result in Istio resources being temporarily deleted from the output snapshots that are sent to the Gloo agents. For more information about this failure scenario, see Redis and Gloo management server restart. To resolve this issue, a new safe mode feature was added that you can enable by setting the glooMgmtServer.safeMode Helm chart option to true.
If safe mode is enabled, translation of input snapshots halts until the input snapshots of all registered Gloo agents are present in the Redis cache. This feature improves management plane stability during disaster scenarios and upgrades. For more information, see Safe mode. The safe mode feature is disabled by default.
To enable safe mode, follow these general steps:
Scale down the number of Gloo management server pods to 0.
kubectl scale deployment gloo-mesh-mgmt-server --replicas=0 -n gloo-mesh
Upgrade your Gloo Mesh Gateway installation. Add the following settings in the Helm values file for the Gloo management plane.
glooMgmtServer:
  safeMode: true
Scale the Gloo management server back up to the number of desired replicas. The following example uses 1 replica.
kubectl scale deployment gloo-mesh-mgmt-server --replicas=1 -n gloo-mesh
Redis safe start window
With safe mode, the Gloo management server halts translation until the input snapshots of all workload clusters are present in the Redis cache. However, if clusters have connectivity issues, translation might be halted for a long time, even for healthy clusters. You might want translation to resume after a certain period of time, even if some input snapshots are missing in the Redis cache. To do so, use the glooMgmtServer.safeStartWindow field in your Gloo management server Helm values file to specify the time in seconds to halt translation. Note that this setting is ignored if glooMgmtServer.safeMode is set to true. The default value is 180 seconds. You can disable the wait time by setting this field to 0 (zero). For more information, see Option 2: Safe start window.
To set a safe start window, follow these general steps:
Scale down the number of Gloo management server pods to 0.
kubectl scale deployment gloo-mesh-mgmt-server --replicas=0 -n gloo-mesh
Upgrade your Gloo Mesh Gateway installation. Add the following settings in the Helm values file for the Gloo management plane.
glooMgmtServer:
  safeMode: false
  safeStartWindow: 90
Scale the Gloo management server back up to the number of desired replicas. The following example uses 1 replica.
kubectl scale deployment gloo-mesh-mgmt-server --replicas=1 -n gloo-mesh
I/O threads for Redis in 2.5.6
A new Helm value, redis.deployment.ioThreads, was introduced to specify the number of I/O threads to use for the built-in Redis instance. Redis is mostly single-threaded. However, some operations, such as UNLINK or slow I/O accesses, can be performed on side threads. Increasing the number of side threads can help improve the performance of Redis because these operations can run in parallel.
The default and minimum valid value for this setting is 1. If you plan to increase the number of I/O side threads, make sure that you also change the CPU requests and CPU limits for the Redis pod. Set the CPU requests and limits to the same number that you use for the I/O side threads plus 1. That way, you can ensure that each side thread has an available CPU core, and that an additional CPU core is left for the main Redis thread. For example, if you want to set I/O threads to 2, make sure to add 3 CPU cores to the resource requests and limits for the Redis pod. You can find further recommendations regarding I/O threads in this Redis configuration example.
If you set I/O threads, the Redis pod must be restarted during the upgrade so that the changes can be applied. During the restart, the input snapshots from all connected Gloo agents are removed from the Redis cache. If you also update settings in the Gloo management server that require the management server pod to restart, the management server’s local memory is cleared and all Gloo agents are disconnected. Although the Gloo agents attempt to reconnect to send their input snapshots and re-populate the Redis cache, some agents might take longer to connect or fail to connect at all. To ensure that the Gloo management server halts translation until the input snapshots of all workload cluster agents are present in Redis, it is recommended to enable safe mode on the management server alongside updating the I/O threads for the Redis pod. For more information, see Safe mode. Note that in version 2.6.0 and later, safe mode is enabled by default.
To update I/O side threads in Redis as part of your Gloo Mesh Gateway upgrade:
Scale down the number of Gloo management server pods to 0.
kubectl scale deployment gloo-mesh-mgmt-server --replicas=0 -n gloo-mesh
Upgrade Gloo Mesh Gateway and use the following settings in your Helm values file for the management server. Make sure to also increase the number of CPU cores to one core per thread, and add an additional CPU core for the main Redis thread. The following example also enables safe mode on the Gloo management server to ensure translation is done with the complete context of all workload clusters.
glooMgmtServer:
  safeMode: true
redis:
  deployment:
    ioThreads: 2
    resources:
      requests:
        cpu: 3
      limits:
        cpu: 3
Scale the Gloo management server back up to the number of desired replicas. The following example uses 1 replica.
kubectl scale deployment gloo-mesh-mgmt-server --replicas=1 -n gloo-mesh
Annotations for virtual services via route tables
As of version 2.5.9, you can add annotations to route tables that are propagated to the Istio virtual service that the route table is translated to. A common use for annotations is to exclude the virtual service from being managed by ExternalDNS so that you can manually manage the DNS record or integrate it with a different DNS provider.
For more information, see Route table annotations for Istio virtual services.
🚧 Known issues
The Solo team fixes bugs, delivers new features, and makes changes on a regular basis as described in the changelog. Some issues, however, might impact many users for common use cases. These known issues are as follows:
Cluster names: Do not use underscores (_) in the names of your clusters or in the kubeconfig context for your clusters.
External auth: In version 2.5.8, a regression was introduced that impacts the translation of external auth policies when a virtual destination is used for the external auth server. This issue is fixed in version 2.5.9. If you use a virtual destination for the external auth server, skip the 2.5.8 release and upgrade to 2.5.9 or later instead.
Istio:
- Due to a lack of support for the Istio CNI and iptables for the Istio proxy, you cannot run Istio (and therefore Gloo Mesh Gateway) on AWS Fargate. For more information, see the Amazon EKS issue.
- Istio 1.21 is not supported in Gloo Mesh Gateway version 2.5.
- Istio 1.20 is supported only as patch version 1.20.1-patch1 and later. Do not use patch versions 1.20.0 and 1.20.1, which contain bugs that impact several Gloo Mesh Gateway features that rely on Istio ServiceEntries.
- If you have multiple external services that use the same host and plan to use Istio 1.20, you must use patch version 1.20.7 or later to ensure that the Istio service entry that is created for those external services is correct.
- The WasmDeploymentPolicy Gloo CR is currently unsupported in Istio versions 1.18 and later.
- For FIPS-compliant Solo distributions of Istio 1.17.2 and 1.16.4, you must use the -patch1 versions of the latest Istio builds published by Solo, such as 1.17.2-patch1-solo-fips for the Solo distribution of Istio 1.17. These patch versions fix a FIPS-related issue introduced in the upstream Envoy code. In 1.17.3 and later, FIPS compliance is available in the -fips tags of regular Solo distributions of Istio, such as 1.17.3-solo-fips.
OTel pipeline: FIPS-compliant builds are not currently supported for the OTel collector agent image.
Portal:
- Gloo Mesh Gateway version 2.5.2 has a known issue in Portal that causes interruption during translation. This issue is resolved in version 2.5.3. Portal users are advised to skip version 2.5.2, and to directly upgrade to 2.5.3 instead.
- For other known issues regarding the developer portal, see the Portal documentation.
Workspaces: In Istio version 1.21 or earlier, when you reconfigure your Gloo workspaces, such as by moving from one workspace to multiple workspaces, routing to services that are exposed with a virtual destination might fail. You must re-apply the virtual destination to fix routing for these services. Note that this issue is fixed in Istio version 1.22 and later.