Envoy filter policies too large
Several Gloo policies are implemented through Envoy filters in the gateway proxy. If the Envoy filter has an error, your traffic can be affected.
What’s happening
Some policies that depend on Envoy filters no longer take effect. For example, a request that previously had an external auth policy might stop requiring authentication. Even if you did not modify any policies or route tables, you might notice this behavior.
When you check the Gloo agent logs, you notice an error similar to the following:
"msg":"failed upserting resource"
...
"err":"etcdserver: request is too large"
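If you have many log lines to search, you can filter them with a short script. The following Python sketch assumes that each agent log line is a single JSON object with msg and err fields, as the fragments above suggest. The extra level field and the sample lines are made up for illustration, and collecting the logs (for example, by saving kubectl logs output to a file) is not shown.

```python
import json


def find_oversized_upserts(log_lines):
    """Return parsed log entries that report the etcd 'request is too large' error."""
    hits = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not valid JSON, such as plain-text output.
            continue
        if not isinstance(entry, dict):
            continue
        # Match the message and error text shown in the log excerpt above.
        if (entry.get("msg") == "failed upserting resource"
                and "request is too large" in str(entry.get("err", ""))):
            hits.append(entry)
    return hits


# Hypothetical sample lines; only the first one matches.
sample = [
    '{"level":"error","msg":"failed upserting resource","err":"etcdserver: request is too large"}',
    '{"level":"info","msg":"translation complete"}',
    "not json",
]
print(len(find_oversized_upserts(sample)))  # 1
```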
You might notice this behavior with one or more of the following policies, which depend on Envoy filters:
- CORS
- CSRF
- DLP
- External auth
- Fault injection
- JWT
- Rate limiting
- Transformation
- WAF
Why it’s happening
Some Gloo policies, such as JWT or other external auth policies, are translated into Envoy filters during the Gloo translation process. These Envoy filters are created per proxy and are then applied to the ingress gateway or sidecar proxy to enforce the policies. In environments where you apply policies to many apps and routes, an Envoy filter can become very large and exceed the maximum request size that etcd accepts. When this limit is reached, new configuration is rejected by etcd and Istio. As a result, policies are not applied and enforced properly.
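To get a rough sense of when a single resource becomes too large, you can compare its encoded size against the etcd limit. The following Python sketch assumes etcd's default maximum request size of 1.5 MiB (1,572,864 bytes); your etcd or managed Kubernetes service might be configured differently. It approximates the stored size by the compact JSON encoding of the manifest, which ignores server-side additions such as managed fields. The EnvoyFilter manifest is hypothetical and uses placeholder padding instead of real Envoy configuration, so it is only useful for size estimates.

```python
import json

# Assumption: etcd's default maximum request size is 1.5 MiB (1,572,864 bytes).
# Your cluster might use a different limit.
ETCD_DEFAULT_MAX_REQUEST_BYTES = 1_572_864


def manifest_size_bytes(manifest: dict) -> int:
    """Approximate a resource's stored size by its compact JSON encoding."""
    return len(json.dumps(manifest, separators=(",", ":")).encode("utf-8"))


def exceeds_etcd_limit(manifest: dict, limit: int = ETCD_DEFAULT_MAX_REQUEST_BYTES) -> bool:
    """Return True if the encoded manifest is larger than the given limit."""
    return manifest_size_bytes(manifest) > limit


def hypothetical_envoy_filter(num_routes: int, bytes_per_route: int) -> dict:
    """Build a hypothetical per-proxy EnvoyFilter with one placeholder patch per route."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "EnvoyFilter",
        "metadata": {"name": "example-per-proxy-filter", "namespace": "example-ns"},
        "spec": {
            # Placeholder padding for size estimates, not valid Envoy configuration.
            "configPatches": [
                {"placeholder": "x" * bytes_per_route} for _ in range(num_routes)
            ]
        },
    }


print(exceeds_etcd_limit(hypothetical_envoy_filter(10, 100)))     # False
print(exceeds_etcd_limit(hypothetical_envoy_filter(2000, 1000)))  # True
```

Because a per-proxy filter collects configuration for every route on that proxy, its size grows with the number of routes and policies, which is why large environments hit the limit first.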
How to fix it
To prevent this issue, the experimental environment variable EXPERIMENTAL_SEGMENT_ENVOY_FILTERS_BY_MATCHER was introduced in Gloo Mesh Gateway versions 2.3 and 2.4. If this variable is enabled on the Gloo management server, the server automatically breaks up large Envoy filters and creates one Envoy filter per matcher. If the environment variable is not set, Envoy filters are created per proxy.
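The difference between the two modes can be illustrated with a simplified model. The following Python sketch represents policy configuration as hypothetical (proxy, matcher, config) tuples and groups them either per proxy or per matcher. It is not Gloo's translation code; the tuple layout, names, and sizes are assumptions chosen only to show how segmentation affects the size of each resulting object.

```python
import json
from collections import defaultdict


def segment_filters(policies, per_matcher):
    """Group hypothetical (proxy, matcher, config) tuples into filter objects.

    Returns a dict that maps a filter name to the list of configs it contains.
    """
    filters = defaultdict(list)
    for proxy, matcher, config in policies:
        # Per-proxy mode puts every matcher's config into one object per proxy.
        key = f"{proxy}-{matcher}" if per_matcher else proxy
        filters[key].append(config)
    return dict(filters)


def largest_filter_bytes(filters):
    """Return the encoded size of the largest filter object."""
    return max(len(json.dumps(configs).encode("utf-8")) for configs in filters.values())


# Hypothetical input: 100 routes on one gateway, each with a ~1 KB policy config.
policies = [("ingress-gateway", f"route-{i}", {"jwt": "x" * 1000}) for i in range(100)]
by_proxy = segment_filters(policies, per_matcher=False)
by_matcher = segment_filters(policies, per_matcher=True)
print(len(by_proxy), len(by_matcher))  # 1 100
print(largest_filter_bytes(by_proxy), largest_filter_bytes(by_matcher))
```

The total amount of configuration is the same in both modes. Segmenting per matcher bounds the size of each individual object by one matcher's configuration instead of letting a single object grow with every route on the proxy.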
Starting in version 2.5.0, the experimental environment variable is deprecated and its functionality is promoted to standard behavior. The Gloo management server now automatically creates an Envoy filter for each matcher. If you did not previously enable the EXPERIMENTAL_SEGMENT_ENVOY_FILTERS_BY_MATCHER environment variable to create Envoy filters per matcher, your Envoy filters must be re-created as part of the upgrade to version 2.5.0.
To safely upgrade and ensure that existing Envoy filters are correctly re-created, the Gloo management server and the Istio control plane, istiod, must temporarily be scaled down to 0 replicas. This upgrade procedure can have the following implications for your environment:
- Delayed configuration updates: During the upgrade, the Gloo management server and istiod control plane are temporarily scaled down. Because of that, the propagation of configuration changes to the sidecar or gateway proxy, such as new routing rules or security policies, is delayed. This can cause inconsistencies in traffic management and policy enforcement.
- Complex environments with long translation times: If you have a complex environment and your average translation time regularly exceeds 60 seconds, scaling down istiod might have unexpected impacts and increase the time it takes for your traffic to return to normal.
- New pods cannot be added to the mesh: The Istio control plane, istiod, implements the sidecar injection webhook. When the control plane is scaled down, sidecar injection does not work and new pods cannot be added to the service mesh. You can manually inject sidecars into your pods. However, keep in mind that these pods do not receive traffic, because endpoint discovery is also disabled while the Istio control plane is scaled down. After the control plane is scaled back up, pods are automatically injected with sidecars and added to the mesh.
- mTLS certificate issues: If certificates expire while the Istio control plane is not available, mutual TLS between services in the mesh might be impacted.
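Two of the risks above can be checked before you start: long translation times and certificates that expire during the maintenance window. The following Python sketch flags both. The 60-second threshold comes from the list above; the certificate names, expiry times, and maintenance window are hypothetical, and how you obtain the average translation time and certificate expiry dates from your environment is not shown.

```python
from datetime import datetime, timedelta, timezone


def preupgrade_warnings(avg_translation_seconds, cert_expiries, window_start, window_length):
    """Return warnings for risks that can be checked before scaling down istiod.

    cert_expiries maps a hypothetical certificate name to its expiry time
    (a timezone-aware datetime).
    """
    warnings = []
    # Long translation times can delay recovery after the control plane scales back up.
    if avg_translation_seconds > 60:
        warnings.append(
            f"average translation time is {avg_translation_seconds}s (more than 60s)"
        )
    window_end = window_start + window_length
    # Flag certificates that expire before the window ends, because they might
    # not be rotated while the Istio control plane is unavailable.
    for name, not_after in sorted(cert_expiries.items()):
        if not_after <= window_end:
            warnings.append(f"certificate {name} expires at {not_after.isoformat()}")
    return warnings


start = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
certs = {
    "workload-a": start + timedelta(minutes=10),  # expires during the window
    "workload-b": start + timedelta(days=1),      # expires after the window
}
for warning in preupgrade_warnings(75, certs, start, timedelta(minutes=30)):
    print(warning)
```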
If you want to enable this feature in version 2.3 or 2.4, use the upgrade steps in version 2.5 as general guidance for how to safely scale down the Gloo management server, Gloo agent, and istiod, and re-create the Envoy filters in your environment.
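When you plan the scale-down, it can help to record the exact commands and original replica counts in advance. The following Python sketch only builds kubectl scale command strings; it does not run them and is not the documented upgrade procedure. The deployment names and namespaces are assumptions that may differ in your installation, the sketch assumes a single kubectl context (in multicluster setups, the agent and istiod typically run on workload clusters, so you would add --context for each cluster), and restoring components in reverse order is an assumption rather than the documented order.

```python
# Hypothetical deployment names and namespaces; replace them with your own.
COMPONENTS = [
    ("gloo-mesh-mgmt-server", "gloo-mesh"),
    ("gloo-mesh-agent", "gloo-mesh"),
    ("istiod", "istio-system"),
]


def scale_down_commands(components):
    """Build kubectl commands that scale each deployment to 0 replicas."""
    return [
        f"kubectl scale deployment/{name} -n {namespace} --replicas=0"
        for name, namespace in components
    ]


def scale_up_commands(components, original_replicas):
    """Build kubectl commands that restore each deployment's original replica count.

    Reversing the scale-down order is an assumption, not the documented order.
    """
    return [
        f"kubectl scale deployment/{name} -n {namespace} --replicas={original_replicas[name]}"
        for name, namespace in reversed(components)
    ]


for command in scale_down_commands(COMPONENTS):
    print(command)
```

Recording the original replica counts before scaling down means you can restore the same capacity afterward instead of relying on defaults.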