Release notes

Review the following summaries of the main changes in the 2.5.5 release. The release notes include important installation changes and known issues. They also highlight ways that you can take advantage of new features or enhancements to improve your product usage.


Breaking changes

Review details about the following breaking changes. To review when breaking changes were released, you can use the comparison feature of the changelog.

Default Gloo Platform add-ons namespace removed

In previous releases, all add-ons were automatically installed to the gloo-mesh-addons namespace unless you specified a different namespace during the Gloo Mesh Enterprise installation. Starting with release v2.5.0, this default value is removed. If no value is set in the common.addonsNamespace Helm field, your add-ons are now deployed to the namespace that the Helm release is installed to. To avoid disruptions or downtime for your add-on components, such as a rate limit server, set the namespace you want your add-ons to be installed to in the common.addonsNamespace field of your Helm values file.
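
For example, to continue deploying your add-ons to a dedicated namespace, you might set values similar to the following sketch in your Helm values file. The gloo-mesh-addons namespace is used here only as an example; replace it with the namespace that your add-ons currently run in.

common:
  addonsNamespace: gloo-mesh-addons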

Known Portal issues in 2.5.2

Gloo Mesh Gateway version 2.5.2 has a known issue in Portal that causes interruption during translation. This issue is resolved in version 2.5.3. Portal users are advised to skip version 2.5.2, and to directly upgrade to 2.5.3 instead.

New feature gate for east-west routes in JWT policies

Now, you can use the applyToRoutes selector in JWT policies to select east-west service mesh routes.

Previously, you could only select ingress routes that were attached to a virtual gateway. The use of a virtual gateway for ingress routes required a Gloo Mesh Gateway license in addition to your Gloo Mesh Enterprise license. For a Mesh-only scenario, you previously had to use the applyToDestinations selector. This meant that the same JWT policy applied to the destinations no matter how traffic reached them.

Now, you can use applyToRoutes for east-west routes. This way, you have more flexibility in how a destination is protected. For example, you might have several external and internal routes that go to the same backing destination. To secure these routes with different JWT policies, you can use applyToRoutes instead of applyToDestinations.
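
As an illustration only, the following sketch shows how a JWT policy might select an east-west route by its route label. The policy name, namespace, and route label are hypothetical, and the JWT provider configuration is omitted; refer to the JWT policy documentation for the full specification.

apiVersion: security.policy.gloo.solo.io/v2
kind: JWTPolicy
metadata:
  name: ratings-east-west-jwt   # hypothetical policy name
  namespace: bookinfo           # hypothetical namespace
spec:
  applyToRoutes:
  - route:
      labels:
        route: ratings-east-west   # hypothetical label on an east-west route in your route table
  config:
    # JWT provider settings, such as the issuer and JWKS details, go here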

Depending on your existing JWT policy setup, this new feature can cause unexpected results. For example, you might have east-west routes that are selected by a JWT policy. However, because JWT policies did not work for east-west routes in version 2.5 and earlier, the JWT policy did not take effect. Your workloads within the service mesh could communicate with each other without including valid JWTs in the requests. Now, with this feature enabled, those same requests require valid JWTs. As such, you might notice service mesh traffic stop working until you update your JWT policies or east-west routes in your route tables. To prepare, review the following steps.

Before you upgrade to 2.6:

  1. Check your existing JWT policies that use applyToRoutes selectors and note the routes that they apply to.
  2. Check your existing route tables with the routes that you previously noted.
  3. Determine whether JWT policies apply to east-west service mesh routes.
    • If a route table includes a workload selector, or if a route table omits both the virtual gateway and workload selector fields: The JWT policies apply to the east-west service mesh routes. This might conflict with other JWT policies that already select the backing destinations of these routes.
    • If a route table includes a virtual gateway but does not include a workload selector: The JWT policies do not apply to the east-west service mesh routes.
  4. Decide how to address the potential impact of updating the behavior of the JWT policy applyToRoutes selector.
    • To prevent JWT policies from applying to east-west service mesh routes, choose from the following options:
      • Update your configuration. For example, you might use a different label to select routes. Or, you might make separate route tables with separate route labels for your ingress routes vs. east-west routes.
      • Disable the feature gate by upgrading your Gloo Helm release with the featureGates.EnableJWTPolicyEastWestRoute value set to false, as shown in the example after these steps.
    • To start applying JWT policies to east-west service mesh routes: Continue to upgrade to version 2.6. In version 2.6 and later, the feature gate is enabled by default.
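
If you choose to disable the feature gate, the following sketch shows how you might set the value in your Helm values file.

featureGates:
  EnableJWTPolicyEastWestRoute: false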


Upstream Prometheus upgrade

Gloo Mesh Enterprise includes a built-in Prometheus server to help monitor the health of your Gloo components. This release of Gloo upgrades the Prometheus community Helm chart from version 19.7.2 to 25.11.0. As part of this upgrade, upstream Prometheus changed the selector labels for the deployment, which requires recreating the deployment. To help with this process, the Gloo Helm chart includes a pre-upgrade hook that automatically recreates the Prometheus deployment during a Helm upgrade. This breaking change impacts upgrades from previous versions to version 2.4.10, 2.5.1, or 2.6.0 and later.

If you do not want the redeployment to happen automatically, you can disable this process by setting the prometheus.skipAutoMigration Helm value to true. For example, you might want to skip the automatic migration if you use Argo CD, which converts Helm pre-upgrade hooks to Argo PreSync hooks and can cause issues with the migration. To ensure that the Prometheus server is deployed with the right version, follow these steps:

  1. Confirm that you have an existing Prometheus deployment at the old Helm chart version, chart: prometheus-19.7.2.
    kubectl get deploy -n gloo-mesh prometheus-server -o yaml | grep chart
    
  2. Delete the Prometheus deployment. Note that while Prometheus is deleted, you cannot observe Gloo performance metrics.
    kubectl delete deploy -n gloo-mesh prometheus-server
    
  3. In your Helm values file, set the prometheus.skipAutoMigration field to true, as shown in the example after these steps.
  4. Continue with the Helm upgrade of Gloo Mesh Enterprise. The upgrade recreates the Prometheus server deployment at the new version.
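
The following sketch shows the Helm value from step 3 that skips the automatic Prometheus migration.

prometheus:
  skipAutoMigration: true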

Prometheus annotations removed

In Gloo Mesh Gateway version 2.5.0, the prometheus.io/port: "<port_number>" annotation was removed from the Gloo management server and agent. However, the prometheus.io/scrape: true annotation is still present. If you have another Prometheus instance that runs in your cluster, and it is not set up with custom scraping jobs for the Gloo management server and agent, the instance automatically scrapes all ports on the management server and agent pods. This can lead to error messages in the management server and agent logs. To resolve this issue, see Run another Prometheus instance alongside the built-in one. Note that this issue is resolved in version 2.5.2.

Installation changes

In addition to comparing differences across versions in the changelog, review the following installation changes from the previous minor version to version 2.5.

Bug fixes

Multiple Istio revisions in the same cluster

If you run multiple revisions of Istio in your cluster and use discoverySelectors in each revision to discover the resources in specific namespaces, enable the glooMgmtServer.extraEnvs.IGNORE_REVISIONS_FOR_VIRTUAL_DESTINATION_TRANSLATION environment variable on the Gloo management server. This setting allows virtual destinations to be translated correctly if the east-west gateway and the backing services belong to different namespaces.

This feature is available in version 2.5.5 and later.

To enable this feature, add the following values to your Helm values file.

glooMgmtServer:
  extraEnvs:
  - name: IGNORE_REVISIONS_FOR_VIRTUAL_DESTINATION_TRANSLATION
    value: "true"

To check if you use discoverySelectors in your Istio revision:

  1. Get the details of your Istio lifecycle manager resources.

    kubectl get istiolifecyclemanagers -A -o yaml
    
  2. In your Istio lifecycle manager resource, check if you use discoverySelectors in your spec.installations.istioOperatorSpec.meshConfig settings.

    ...
    spec:
     installations:
     - clusters:
       - defaultRevision: true
         name: mycluster
       istioOperatorSpec:
         components:
           pilot:
             k8s:
               env:
               - name: PILOT_SKIP_VALIDATE_TRUST_DOMAIN
                 value: "true"
         meshConfig:
           discoverySelectors:
             - matchLabels:
                 istio-discovery: enabled
    
  3. Get the details of your Istio operator.

    kubectl get istiooperator -A -o yaml
    
  4. In your Istio operator configuration, check if you use discoverySelectors in your meshConfig settings.

    ...
    meshConfig:
      discoverySelectors:
        - matchLabels:
            istio-discovery: enabled
    

Feature changes

Review the following changes that might impact how you use certain features in your Gloo environment.

Route table delegation

Route table delegation behavior around label inheritance changed in version 2.5. For more information about how route table delegation works, see the Concept docs.

Previously, the precedence for delegated route labels was as follows:

Now, the precedence for delegated routes is as follows:

This change can impact how routes are selected, as well as the policies that are attached to the route. In particular, you might have labels on the child route tables that now overwrite the labels that otherwise are inherited from the parent route tables. For example:

To review the labels on your routes and route tables, you can run commands similar to the following. Be sure to update the namespace with the location of your route tables.

kubectl get rt -n gloo-mesh-gateways -o=jsonpath='{range .items[*]}[{.metadata.name}, {.spec.http[*].name}, {.spec.http[*].labels}]{"\n"}{end}'

Example output:

[api-example-com-rt, , ]
[petstore-rt, pets-api users-api store-api, ]
[tracks-rt, tracks-api, {"usagePlans":"dev-portal"}]

In this example output:

  • The tracks-rt route table has a usagePlans: dev-portal label on its tracks-api route. If this is a child route table, note that the route-level label takes precedence over any label the route might otherwise inherit.
  • The other route tables do not have route-level labels.

To review the labels on the route tables themselves, run the following command.

kubectl get rt -n gloo-mesh-gateways -o=jsonpath='{range .items[*]}[{.metadata.name}, {.metadata.labels}]{"\n"}{end}'

Example output:

[api-example-com-rt, ]
[petstore-rt, {"api":"petstore","portal":"dev-portal"}]
[tracks-rt, {"api":"tracks","portal":"dev-portal"}]

In this example output:

  • The api-example-com-rt route table does not have any labels.
  • The petstore-rt route table has two labels, api: petstore and portal: dev-portal. If this is a child route table, confirm that these labels do not overwrite any labels that are inherited from the parent route table.
  • The tracks-rt route table has two labels, api: tracks and portal: dev-portal. If this is a child route table, confirm that these labels do not overwrite any labels that are inherited from the parent route table.

Improved error logging

The Gloo management server translates your Gloo custom resources into many underlying Istio resources. When the management server cannot translate a resource, it emits log messages that vary in severity from errors to warnings or informational messages.

In this release, the management server logs are improved in the following ways:

For example, you might have a service that does not select any existing workloads. This scenario might be intentional, such as if you use a CI/CD tool like ArgoCD to deploy your environment in phases. Translation does not complete until you update the service's selector or create the workload. Previously, the translation error would show up many times in the management server logs, even though the situation is intentional and the management server is healthy and can translate other objects. Now, the translation error is logged less verbosely at the debug level.
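
As a purely hypothetical illustration, a service such as the following selects a label that no running pods carry yet, for example because the workload is deployed in a later phase. The names, namespace, and port are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: reviews            # placeholder service name
  namespace: bookinfo      # placeholder namespace
spec:
  selector:
    app: reviews           # no pods have this label until a later deployment phase
  ports:
  - port: 9080
    targetPort: 9080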

You can still review translation errors in the following ways:

New features

Redis safe mode

In versions 2.5.3 and lower, a race condition was identified that can be triggered during simultaneous restarts of the management plane and Redis, including an upgrade to a newer Gloo Mesh Gateway version. If hit, this failure mode can lead to partial translations on the Gloo management server, which can result in Istio resources being temporarily deleted from the output snapshots that are sent to the Gloo agents. For more information about this failure scenario, see Redis and Gloo management server restart. To resolve this issue, a new safe mode feature was added that you can enable by setting the glooMgmtServer.safeMode Helm chart option to true.

If safe mode is enabled, translation of input snapshots halts until the input snapshots of all registered Gloo agents are present in the Redis cache. This feature improves management plane stability during disaster scenarios and upgrades. For more information, see Safe mode. The safe mode feature is disabled by default.
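
For example, you might enable safe mode with Helm values similar to the following sketch.

glooMgmtServer:
  safeMode: true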

Redis safe start window

With safe mode, the Gloo management server halts translation until the input snapshots of all workload clusters are present in the Redis cache. However, if clusters have connectivity issues, translation might be halted for a long time, even for healthy clusters. You might want translation to resume after a certain period of time, even if some input snapshots are missing from the Redis cache. To do so, use the glooMgmtServer.safeStartWindow field in your Gloo management server Helm values file to specify the number of seconds to halt translation for. Note that this setting is ignored if glooMgmtServer.safeMode is set to true. The default value is 180 seconds. You can disable the wait time by setting this field to 0 (zero). For more information, see Option 2: Safe start window.
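
For example, to let translation resume after 60 seconds (an arbitrary value for illustration), you might set values similar to the following sketch.

glooMgmtServer:
  safeStartWindow: 60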

I/O threads for Redis

A new Helm value, redis.deployment.ioThreads, was introduced to specify the number of I/O threads to use for the built-in Redis instance. Redis is mostly single-threaded; however, some operations, such as UNLINK or slow I/O accesses, can be performed on side threads. Increasing the number of side threads can help improve and maximize the performance of Redis, because these operations can run in parallel.

The default and minimum valid value for this setting is 1. If you plan to increase the number of I/O side threads, make sure that you also change the CPU requests and CPU limits for the Redis pod. Set the CPU requests and limits to the same number that you use for the I/O side threads plus 1. That way, you can ensure that each side thread has an available CPU core, and that an additional CPU core is left for the main Redis thread. For example, if you want to set I/O threads to 2, make sure to add 3 CPU cores to the resource requests and limits for the Redis pod. You can find further recommendations regarding I/O threads in this Redis configuration example.

If you set I/O threads, the Redis pod must be restarted during the upgrade so that the changes can be applied. During the restart, the input snapshots from all connected Gloo agents are removed from the Redis cache. If you also update settings in the Gloo management server that require the management server pod to restart, the management server's local memory is cleared and all Gloo agents are disconnected. Although the Gloo agents attempt to reconnect to send their input snapshots and re-populate the Redis cache, some agents might take longer to connect or fail to connect at all. To ensure that the Gloo management server halts translation until the input snapshots of all workload cluster agents are present in Redis, it is recommended to enable safe mode on the management server alongside updating the I/O threads for the Redis pod. For more information, see Safe mode. Note that in version 2.6.0 and later, safe mode is enabled by default.

To update I/O side threads in Redis as part of your Gloo Mesh Gateway upgrade:

  1. Scale down the number of Gloo management server pods to 0.

    kubectl scale deployment gloo-mesh-mgmt-server --replicas=0 -n gloo-mesh
    
  2. Upgrade Gloo Mesh Gateway and use the following settings in your Helm values file for the management server. Make sure to also increase the number of CPU cores to one core per thread, and add an additional CPU core for the main Redis thread. The following example also enables safe mode on the Gloo management server to ensure translation is done with the complete context of all workload clusters.

    glooMgmtServer:
      safeMode: true
    redis: 
      deployment: 
        ioThreads: 2
        resources: 
          requests: 
            cpu: 3
          limits: 
            cpu: 3
    

Known issues

The Solo team fixes bugs, delivers new features, and makes changes on a regular basis as described in the changelog. Some issues, however, might impact many users for common use cases. These known issues are as follows: