Local rate limiting
Use the upstream APIs to implement local rate limiting.
Limit the number of requests that are allowed to enter the cluster before global rate limiting and external auth policies are applied.
The steps in this section use the Envoy-based kgateway data plane. The steps do not work with the agentgateway data plane.
About
Local rate limiting is a coarse-grained rate limiting capability. It is primarily used as a first line of defense to limit the number of requests that are forwarded to your rate limit servers.
Without local rate limiting, all requests are forwarded directly to the rate limit server that you set up, which allows or denies each request based on the global rate limiting settings that you configured. However, during an attack, so many requests might be forwarded to your rate limit servers that they become overloaded or even fail.
To protect your rate limit servers from being overloaded and to optimize their resource utilization, you can set up local rate limiting in conjunction with global rate limiting. Because local rate limiting is enforced in each Envoy instance that makes up your gateway, no rate limit server is required in this setup. For example, if you have 5 Envoy instances that together represent your gateway, each instance is configured with the limit that you set.
For more information about local rate limiting, see the Envoy documentation.
Architecture
The following image shows how local rate limiting works in Gloo Gateway. As clients send requests to a backend destination, they first reach the Envoy instance that represents your gateway. Local rate limiting settings are applied to each Envoy pod or process individually. For example, if you have 5 Envoy instances that are configured with a local rate limit of 10 requests per second, the total number of allowed requests per second is 50 (5 x 10). In a global rate limiting setup, this limit is shared between all Envoy instances, so the total number of allowed requests per second is 10.
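The arithmetic in the previous example can be expressed as a short shell sketch. The function names `local_total_rps` and `global_total_rps` are hypothetical; they only show that a local limit is multiplied by the number of Envoy instances, while a global limit is shared by all of them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that compare the cluster-wide request budget of
# local and global rate limiting. Arguments: <envoy-replicas> <limit-rps>.

# Local limits are enforced by each Envoy instance independently,
# so the total budget grows with the number of replicas.
local_total_rps() {
  local replicas=$1 limit_rps=$2
  echo $(( replicas * limit_rps ))
}

# Global limits are shared by all Envoy instances through the rate limit
# server, so the total budget is the configured limit itself.
global_total_rps() {
  local limit_rps=$2
  echo "$limit_rps"
}

echo "local:  $(local_total_rps 5 10) requests per second"   # local:  50 requests per second
echo "global: $(global_total_rps 5 10) requests per second"  # global: 10 requests per second
```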
Depending on your setup, each Envoy instance or pod is configured with a number of tokens in a token bucket. To allow a request, a token must be available in the bucket so that it can be assigned to the request. The token bucket is refilled periodically, as defined by the `fillInterval` and `tokensPerFill` settings of the local rate limiting configuration. If no token is available, the request is rejected immediately and a 429 HTTP response code is returned to the client.
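The token bucket behavior described above can be modeled with a short shell simulation. This is a simplified sketch, not Envoy's implementation: it assumes whole-second timestamps and applies any due refills when a request arrives, whereas Envoy refills the bucket on a timer. The variables `MAX_TOKENS`, `TOKENS_PER_FILL`, and `FILL_INTERVAL` are hypothetical stand-ins for the `maxTokens`, `tokensPerFill`, and `fillInterval` settings that are used later in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical simulation of the token bucket behind local rate limiting.
# The three settings mirror maxTokens, tokensPerFill, and fillInterval
# (in whole seconds). This is a model, not Envoy's implementation.
MAX_TOKENS=1
TOKENS_PER_FILL=1
FILL_INTERVAL=100

tokens=$MAX_TOKENS  # the bucket starts full
last_fill=0         # time (in seconds) of the most recent refill
last_status=""      # HTTP status code of the most recent simulated request

# Simulate a request that arrives at second $1 and store the result in
# last_status. Requests must be simulated in chronological order.
request_at() {
  local now=$1
  # Apply every refill that is due since the last one, capped at MAX_TOKENS.
  local fills=$(( (now - last_fill) / FILL_INTERVAL ))
  if (( fills > 0 )); then
    tokens=$(( tokens + fills * TOKENS_PER_FILL ))
    if (( tokens > MAX_TOKENS )); then
      tokens=$MAX_TOKENS
    fi
    last_fill=$(( last_fill + fills * FILL_INTERVAL ))
  fi
  # Take a token if one is available; otherwise reject the request.
  if (( tokens > 0 )); then
    tokens=$(( tokens - 1 ))
    last_status=200
  else
    last_status=429
  fi
}

for t in 0 1 99 100 101; do
  request_at "$t"
  echo "t=${t}s -> ${last_status}"
done
```

With 1 token that is refilled by 1 every 100 seconds, the loop prints 200 for the requests at 0 and 100 seconds and 429 for the requests at 1, 99, and 101 seconds.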
When a token is available in the token bucket, it is assigned to the incoming request. If you also set up global rate limiting, the request is then forwarded to your rate limit server to enforce the global rate limiting settings. For example, the request might be further rate limited based on headers or query parameters. Only requests that are within both the local and global rate limits are forwarded to the backend destination in the cluster.
Local rate limiting
In kgateway, you use a GlooTrafficPolicy to set up local rate limiting for your routes. You can choose between the following attachment options:
- A particular route in an HTTPRoute resource: Use the `extensionRef` filter in the HTTPRoute to attach the GlooTrafficPolicy to the route that you want to rate limit. For an example, see Route configuration.
- All routes in an HTTPRoute: Use the `targetRefs` section in the GlooTrafficPolicy to attach the policy to a particular HTTPRoute resource.
- All routes that the Gateway serves: Use the `targetRefs` section in the GlooTrafficPolicy to attach the policy to a Gateway. For an example, see Gateway configuration.
Note that if you apply a GlooTrafficPolicy to an HTTPRoute and to a Gateway at the same time, the HTTPRoute policy takes precedence. For more information, see Multiple targetRefs GlooTrafficPolicy.
Before you begin
Follow the Get started guide to install Gloo Gateway.
Follow the Sample app guide to create a gateway proxy with an HTTP listener and deploy the httpbin sample app.
Get the external address of the gateway and save it in an environment variable.
Route configuration
Set up local rate limiting for a particular route.
Create a GlooTrafficPolicy with your local rate limiting settings.
```shell
kubectl apply -f- <<EOF
apiVersion: gloo.solo.io/v1alpha1
kind: GlooTrafficPolicy
metadata:
  name: local-ratelimit
  namespace: httpbin
spec:
  rateLimit:
    local:
      tokenBucket:
        maxTokens: 1
        tokensPerFill: 1
        fillInterval: 100s
EOF
```

| Setting | Description |
| --- | --- |
| `maxTokens` | The maximum number of tokens that the bucket can hold. |
| `tokensPerFill` | The number of tokens that are added to the bucket during each refill. |
| `fillInterval` | The interval after which the token bucket is refilled, such as `100s`. |

Create an HTTPRoute that limits requests to the httpbin app along the
`ratelimit.example` domain. To apply the GlooTrafficPolicy that you created earlier, use the `extensionRef` filter.

```shell
kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbin-ratelimit
  namespace: httpbin
spec:
  parentRefs:
  - name: http
    namespace: gloo-system
  hostnames:
  - ratelimit.example
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    filters:
    - type: ExtensionRef
      extensionRef:
        name: local-ratelimit
        group: gloo.solo.io
        kind: GlooTrafficPolicy
    backendRefs:
    - name: httpbin
      port: 8000
EOF
```

Send a request to the httpbin app. Verify that you get back a 200 HTTP response code.
Example output:
```
* Request completely sent off
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< access-control-allow-credentials: true
access-control-allow-credentials: true
< access-control-allow-origin: *
access-control-allow-origin: *
< content-length: 0
content-length: 0
< x-envoy-upstream-service-time: 1
x-envoy-upstream-service-time: 1
< server: envoy
server: envoy
```

Send another request to the httpbin app. Note that this time the request is denied with a 429 HTTP response code and a
`local_rate_limited` message in your CLI output. Because the route is configured with only 1 token that is refilled every 100 seconds, the token was assigned to the first request, and no tokens were available for the second request. After 100 seconds, the token bucket is refilled and the route can accept a new request.
Example output:
```
...
* Mark bundle as not supporting multiuse
< HTTP/1.1 429 Too Many Requests
HTTP/1.1 429 Too Many Requests
< x-ratelimit-limit: 1
x-ratelimit-limit: 1
< x-ratelimit-remaining: 0
x-ratelimit-remaining: 0
< x-ratelimit-reset: 79
x-ratelimit-reset: 79
...
Connection #0 to host 34.XXX.XX.XXX left intact
local_rate_limited
```
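To reason about other token bucket values before you apply them, the following hypothetical `bucket_summary` function turns the three settings into a burst size, a sustained rate, and the time that an empty bucket needs to fill up again. It assumes that `fillInterval` is given in whole seconds and that every refill adds exactly `tokensPerFill` tokens until `maxTokens` is reached.

```shell
#!/usr/bin/env bash
# Hypothetical helper that explains what a tokenBucket configuration allows.
# Arguments: <maxTokens> <tokensPerFill> <fillInterval-in-seconds>
bucket_summary() {
  local max_tokens=$1 tokens_per_fill=$2 fill_interval=$3
  # Number of refills needed to go from an empty bucket to a full one (rounded up).
  local fills_to_full=$(( (max_tokens + tokens_per_fill - 1) / tokens_per_fill ))
  echo "burst: up to ${max_tokens} request(s) at once"
  echo "sustained: ${tokens_per_fill} request(s) every ${fill_interval}s"
  echo "empty to full: $(( fills_to_full * fill_interval ))s"
}

bucket_summary 1 1 100
```

For the values in this guide, the function reports a burst of 1 request, 1 request every 100s, and 100s from an empty to a full bucket.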
Gateway configuration
Instead of applying local rate limiting to a particular route, you can also apply it to an entire gateway. This way, the local rate limiting settings are applied to all the routes that the gateway serves.
Create a GlooTrafficPolicy with your local rate limiting settings. Use the `targetRefs` section to apply the policy to a specific Gateway. The policy automatically applies to all the routes that the Gateway serves.

```shell
kubectl apply -f- <<EOF
apiVersion: gloo.solo.io/v1alpha1
kind: GlooTrafficPolicy
metadata:
  name: local-ratelimit
  namespace: gloo-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: http
  rateLimit:
    local:
      tokenBucket:
        maxTokens: 1
        tokensPerFill: 1
        fillInterval: 100s
EOF
```

| Setting | Description |
| --- | --- |
| `targetRefs` | Select the Gateway that you want to apply your local rate limiting configuration to. In this example, the policy is applied to all the routes that the `http` gateway serves. |
| `maxTokens` | The maximum number of tokens that the bucket can hold. |
| `tokensPerFill` | The number of tokens that are added to the bucket during each refill. |
| `fillInterval` | The interval after which the token bucket is refilled, such as `100s`. |

Send a request to the httpbin app along the
`www.example.com` domain that you set up as part of the getting started tutorial. Verify that the request succeeds.
Example output:
```
* Request completely sent off
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< access-control-allow-credentials: true
access-control-allow-credentials: true
< access-control-allow-origin: *
access-control-allow-origin: *
< content-length: 0
content-length: 0
< x-envoy-upstream-service-time: 1
x-envoy-upstream-service-time: 1
< server: envoy
server: envoy
```

Send another request to the httpbin app. Note that this time the request is denied with a 429 HTTP response code and a
`local_rate_limited` message in your CLI output. Because the gateway is configured with only 1 token that is refilled every 100 seconds, the token was assigned to the first request, and no tokens were available for the second request. After 100 seconds, the token bucket is refilled and the gateway can accept a new request.
Example output:
```
...
* Mark bundle as not supporting multiuse
< HTTP/1.1 429 Too Many Requests
HTTP/1.1 429 Too Many Requests
< x-ratelimit-limit: 1
x-ratelimit-limit: 1
< x-ratelimit-remaining: 0
x-ratelimit-remaining: 0
< x-ratelimit-reset: 79
x-ratelimit-reset: 79
...
Connection #0 to host 34.XXX.XX.XXX left intact
local_rate_limited
```
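To read the status code and rate limit headers without scanning the full verbose output, you might pipe the output of your request command into a small filter such as the following hypothetical `ratelimit_summary` function. It assumes output in the shape of the example above: verbose lines that start with `< `, and lowercase `x-ratelimit-*` header names.

```shell
#!/usr/bin/env bash
# Hypothetical filter that reads `curl -vi` output on stdin and prints the
# status code plus any x-ratelimit-* headers on one line.
ratelimit_summary() {
  awk '
    { sub(/\r$/, "") }                    # drop CRLF line endings
    /^< HTTP\/[0-9.]+ / { code = $3 }     # verbose status line, for example "< HTTP/1.1 429 ..."
    /^< x-ratelimit-[a-z]+:/ {            # verbose rate limit headers
      sub(/^< /, "")
      headers = headers " " $0
    }
    END { print "status=" code headers }
  '
}

printf '%s\n' \
  '< HTTP/1.1 429 Too Many Requests' \
  'HTTP/1.1 429 Too Many Requests' \
  '< x-ratelimit-limit: 1' \
  '< x-ratelimit-remaining: 0' \
  '< x-ratelimit-reset: 79' \
  'local_rate_limited' | ratelimit_summary
# status=429 x-ratelimit-limit: 1 x-ratelimit-remaining: 0 x-ratelimit-reset: 79
```

Because curl writes its verbose lines to stderr, redirect stderr into the pipe (for example, `2>&1 | ratelimit_summary`) when you use it with your own request command.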
Disable rate limiting for a route
Sometimes, you might want to disable rate limiting for a route. For example, you might have system-critical routes, such as health check or admin endpoints, that must stay accessible even under high traffic conditions. You can exclude a route from rate limiting by setting `rateLimit.local` to `{}` in a GlooTrafficPolicy that targets the route.
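The precedence rules that this section relies on can be summarized in a simplified shell model. The function `effective_local_limit` and its `maxTokens/tokensPerFill/fillInterval` string notation (for example, `1/1/100s`) are hypothetical. The model only covers a Gateway-level policy and an HTTPRoute-level policy, where the HTTPRoute-level policy takes precedence and `{}` disables local rate limiting; it does not describe any other merge behavior.

```shell
#!/usr/bin/env bash
# Hypothetical model of which local rate limit applies to a route.
# Arguments: <gateway-policy> <route-policy>
#   Each argument is a token bucket description such as "1/1/100s",
#   the string "{}" for rateLimit.local: {}, or "" when no policy is attached.
effective_local_limit() {
  local gateway_policy=$1 route_policy=$2
  if [[ "$route_policy" == "{}" ]]; then
    echo "disabled"          # a route-level {} turns local rate limiting off
  elif [[ -n "$route_policy" ]]; then
    echo "$route_policy"     # a route-level policy takes precedence
  elif [[ -n "$gateway_policy" && "$gateway_policy" != "{}" ]]; then
    echo "$gateway_policy"   # otherwise the Gateway-level policy applies
  else
    echo "disabled"          # no policy enforces a limit
  fi
}

effective_local_limit "1/1/100s" ""    # 1/1/100s (the /anything route before it is excluded)
effective_local_limit "1/1/100s" "{}"  # disabled (after the route-level {} policy is applied)
```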
Create a Gateway-level GlooTrafficPolicy to enforce local rate limiting on all routes. For more information, refer to the Gateway configuration.
```shell
kubectl apply -f- <<EOF
apiVersion: gloo.solo.io/v1alpha1
kind: GlooTrafficPolicy
metadata:
  name: local-ratelimit
  namespace: gloo-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: http
  rateLimit:
    local:
      tokenBucket:
        maxTokens: 1
        tokensPerFill: 1
        fillInterval: 100s
EOF
```

Create an HTTPRoute for the route that you want to exclude from rate limiting, such as
`/anything` on the `httpbin` app. Note that because no GlooTrafficPolicy applies to this HTTPRoute yet, the Gateway-level rate limit policy is enforced for the `/anything` route.

```shell
kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbin-anything
  namespace: httpbin
spec:
  parentRefs:
  - name: http
    namespace: gloo-system
  hostnames:
  - www.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /anything
    backendRefs:
    - name: httpbin
      port: 8000
EOF
```

Send two requests to verify that the route is rate limited by the Gateway-level GlooTrafficPolicy, which allows only 1 request per 100 seconds.
Verify that the first request succeeds and the second request is rate limited. Example output:
Request 1:
```
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
...
```

Request 2:
```
< HTTP/1.1 429 Too Many Requests
HTTP/1.1 429 Too Many Requests
< x-ratelimit-limit: 1
x-ratelimit-limit: 1
< x-ratelimit-remaining: 0
x-ratelimit-remaining: 0
< x-ratelimit-reset: 79
x-ratelimit-reset: 79
...
Connection #0 to host 34.XXX.XX.XXX left intact
local_rate_limited
```

Create a GlooTrafficPolicy to disable rate limiting for the HTTPRoute.
```shell
kubectl apply -f- <<EOF
apiVersion: gloo.solo.io/v1alpha1
kind: GlooTrafficPolicy
metadata:
  name: disable-ratelimit
  namespace: httpbin
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: httpbin-anything
  rateLimit:
    local: {}
EOF
```

Repeat the requests. This time, the requests succeed because the HTTPRoute is excluded from rate limiting.
Example output:
Request 1:
```
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
...
```

Request 2:
```
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
...
```
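If you need to exclude several routes, or want to try different token bucket values on other Gateways, a hypothetical shell function like the following renders the same GlooTrafficPolicy shape that this guide uses. The function name and argument order are assumptions; review the rendered manifest before you pipe it to `kubectl apply -f-`.

```shell
#!/usr/bin/env bash
# Hypothetical helper that renders a GlooTrafficPolicy for local rate limiting.
# Arguments: <policy-name> <namespace> <target-kind> <target-name> [maxTokens tokensPerFill fillInterval]
# Without token bucket arguments, it renders rateLimit.local: {} to disable local rate limiting.
render_local_ratelimit_policy() {
  local name=$1 namespace=$2 kind=$3 target=$4
  cat <<EOF
apiVersion: gloo.solo.io/v1alpha1
kind: GlooTrafficPolicy
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: ${kind}
    name: ${target}
  rateLimit:
EOF
  if (( $# >= 7 )); then
    cat <<EOF
    local:
      tokenBucket:
        maxTokens: $5
        tokensPerFill: $6
        fillInterval: $7
EOF
  else
    echo "    local: {}"
  fi
}

# Print the disable-ratelimit policy from this section.
render_local_ratelimit_policy disable-ratelimit httpbin HTTPRoute httpbin-anything
```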
Cleanup
You can remove the resources that you created in this guide.
```shell
kubectl delete GlooTrafficPolicy local-ratelimit -n gloo-system
kubectl delete GlooTrafficPolicy local-ratelimit -n httpbin
kubectl delete GlooTrafficPolicy disable-ratelimit -n httpbin
kubectl delete httproute httpbin-ratelimit -n httpbin
kubectl delete httproute httpbin-anything -n httpbin
```