Envoy API
Apply global rate limits in the Envoy API style.
Gloo Gateway provides an enterprise rate limiting service that you can use to configure Envoy API global rate limiting rules. For more information, see the About topic.
Before you begin
Follow the Get started guide to install Gloo Gateway.
Follow the Sample app guide to create a gateway proxy with an HTTP listener and deploy the httpbin sample app.
Get the external address of the gateway and save it in an environment variable.
Step 1: Create a RateLimitConfig
Prepare a RateLimitConfig resource that defines the descriptors and actions for your rate limiting rules.
The following subsections provide examples for different types of rate limiting rules, as well as ways to use the rules in combination with each other for more complex scenarios. For more information on specific fields, see the Ratelimit API in the Gloo Mesh Enterprise docs.
Generic key
A generic key is a specific string literal that is used to match an action to a descriptor.
In the following example, you create a policy that rate limits requests to one request per minute.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: gloo-system
spec:
  raw:
    descriptors:
    - key: generic_key
      value: counter
      rateLimit:
        requestsPerUnit: 1
        unit: MINUTE
    rateLimits:
    - actions:
      - genericKey:
          descriptorValue: counter
EOF
Request headers
Limit requests based on a specific header that is present in your request.
In the following example, you create a policy that rate limits requests that include an x-type request header to one request per minute.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: gloo-system
spec:
  raw:
    descriptors:
    - key: type
      value:
      rateLimit:
        requestsPerUnit: 1
        unit: MINUTE
    rateLimits:
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
EOF
Remote address
Limit requests based on the remote address that sends the request. The remote address is populated from the x-forwarded-for request header.
In the following example, you create a policy that rate limits requests based on the remote address to one request per minute.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: gloo-system
spec:
  raw:
    descriptors:
    - key: remote_address
      value:
      rateLimit:
        requestsPerUnit: 1
        unit: MINUTE
    rateLimits:
    - actions:
      - remoteAddress: {}
EOF
Multiple limits per remote address
As shown in the previous example, you can use the remote_address descriptor to rate limit based on the downstream client address. In practice, you might want to express multiple rules, such as a per-second and a per-minute limit.
To do so, you can make remote_address a nested descriptor, with distinct generic keys.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: gloo-system
spec:
  raw:
    descriptors:
    - key: generic_key
      value: "per-minute"
      descriptors:
      - key: remote_address
        rateLimit:
          requestsPerUnit: 20
          unit: MINUTE
    - key: generic_key
      value: "per-second"
      descriptors:
      - key: remote_address
        rateLimit:
          requestsPerUnit: 2
          unit: SECOND
    rateLimits:
    - actions:
      - genericKey:
          descriptorValue: "per-minute"
      - remoteAddress: {}
    - actions:
      - genericKey:
          descriptorValue: "per-second"
      - remoteAddress: {}
EOF
Tuples in headers
The following example nests descriptors to express rules based on tuples instead of a single value. This rule enforces a limit of 1 request per minute for any unique combination of type and number values in the request header.
If a request has both the x-type and x-number headers, it is counted towards the limit. If the request does not have one or both headers, then no rate limit is enforced.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: gloo-system
spec:
  raw:
    descriptors:
    - key: type
      descriptors:
      - key: number
        rateLimit:
          requestsPerUnit: 1
          unit: MINUTE
    rateLimits:
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
      - requestHeaders:
          descriptorKey: number
          headerName: x-number
EOF
Nested descriptors
Building off the tuples in headers example, you might want to enforce a limit if the type is provided but the number is not.
You can nest the number descriptor within the type descriptor.
Then, define actions for two separate rate limits:
- One to increment the counter for the type limit.
- One to increment the counter for the type and number pair, when both are present.
The request results in a 429 rate limit error response if either limit is reached.
Matching is attempted against the key and value pair before matching against only the key.
Note that in the rate limit configuration “tree,” only the leaf values serve as wildcards that set up a unique limit. The nested, non-leaf descriptors that do not have values serve as a catch-all.
If you use nested descriptors and the descriptor has no value, the cache key does not append the value for the nested, non-leaf configuration. In the nested descriptors example, no value is set for type or number. In this case, the same limit is used regardless of the x-type header value that is sent. However, the x-number header value has a different limit per value, because this field is the leaf node in the descriptor tree.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: gloo-system
spec:
  raw:
    descriptors:
    - key: type
      rateLimit:
        requestsPerUnit: 3
        unit: MINUTE
      descriptors:
      - key: number
        rateLimit:
          requestsPerUnit: 1
          unit: MINUTE
    rateLimits:
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
      - requestHeaders:
          descriptorKey: number
          headerName: x-number
EOF
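The matching order described in this section (key and value pair first, then key only) can be illustrated with a small variation on the request header example. The following manifest is a sketch, not part of the original guide: the header value premium and both limits are hypothetical and chosen only for illustration.

```yaml
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: gloo-system
spec:
  raw:
    descriptors:
    # Key and value pair: tried first.
    # "premium" is a hypothetical x-type header value.
    - key: type
      value: premium
      rateLimit:
        requestsPerUnit: 10
        unit: MINUTE
    # Key only: used when no key and value pair matches.
    - key: type
      rateLimit:
        requestsPerUnit: 3
        unit: MINUTE
    rateLimits:
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
```

Under these assumptions, a request with the header x-type: premium would count against the 10 per minute limit, and a request with any other x-type value would fall through to the key-only descriptor and its 3 per minute limit.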
Priority and weights
You can specify weights on descriptors. For a particular request that has multiple sets of matching actions, the server evaluates each and then increments only the matching rules with the highest weight. By default, the weight is 0.
The following example adds a weight: 1 field to the server config. When a request has both the x-type and x-number headers, then the server evaluates both limits: the limit on type alone, and the limit on the combination of type and number.
Because the number has a higher weight, the server increments only that counter. In this setup, requests with a unique type and number are allowed 10 requests per minute, but requests that have only a type are limited to 1 per minute.
To make sure a rule is always applied, you can add the alwaysApply option to the descriptor.
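The weighted setup described in this section might look like the following sketch. It is reconstructed from the description above (1 request per minute for type alone, 10 per minute for a type and number pair, and weight: 1 on the number descriptor) and reuses the x-type and x-number headers from the earlier examples. Treat the exact layout as an assumption and confirm the weight and alwaysApply fields against the Ratelimit API reference.

```yaml
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: gloo-system
spec:
  raw:
    descriptors:
    - key: type
      # Default weight is 0, so this rule is skipped when a
      # higher-weight rule also matches the request.
      rateLimit:
        requestsPerUnit: 1
        unit: MINUTE
      # alwaysApply: true  # optional: count this rule regardless of weight
      descriptors:
      - key: number
        # Higher weight: only this counter is incremented when both
        # the x-type and x-number headers are present.
        weight: 1
        rateLimit:
          requestsPerUnit: 10
          unit: MINUTE
    rateLimits:
    # Action set for the limit on type alone.
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
    # Action set for the limit on the type and number pair.
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
      - requestHeaders:
          descriptorKey: number
          headerName: x-number
```

If the alwaysApply line is uncommented, the limit on type alone would also be counted for requests that carry both headers, which brings back the behavior of the nested descriptors example.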
Priority based on HTTP method
A useful tactic for building resilient, distributed systems is to implement different rate limits for different “priorities” or “classes” of traffic. This practice is related to the concept of load shedding.
Suppose you have exposed an API that supports both GET and POST methods for listing data and creating resources. Although both functions are important, ultimately the POST action is more important to your business. Therefore, you want to protect the availability of the POST function at the expense of the less important GET function.
- In the server config, all requests from a unique remote address are limited to 5 per minute, and GET requests are limited to 2 per minute.
- In the client config, the actions are configured to extract the method from the request headers.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: gloo-system
spec:
  raw:
    descriptors:
    # allow 5 calls per minute for any unique host
    - key: remote_address
      rateLimit:
        requestsPerUnit: 5
        unit: MINUTE
    # specifically limit GET requests from unique hosts to 2 per min
    - key: method
      value: GET
      descriptors:
      - key: remote_address
        rateLimit:
          requestsPerUnit: 2
          unit: MINUTE
    rateLimits:
    - actions:
      - remoteAddress: {}
    - actions:
      - requestHeaders:
          descriptorKey: method
          headerName: :method
      - remoteAddress: {}
EOF
Step 2: Create the policy
Now that you have a RateLimitConfig, create a GlooTrafficPolicy to apply the policy to the routes that you want to rate limit.
The following policy targets the Gateway, but for more options, see Policy attachment.
kubectl apply -f- <<EOF
apiVersion: gloo.solo.io/v1alpha1
kind: GlooTrafficPolicy
metadata:
  name: ratelimit
  namespace: gloo-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: http
  glooRateLimit:
    global:
      rateLimitConfigRef:
      - name: ratelimit-config
EOF
Step 3: Verify the rate limit
Test the rate limit on a sample route.
Create an HTTPRoute resource for the httpbin app on the ratelimit.example domain, whose parent refers to the same Gateway that the GlooTrafficPolicy applies to.
kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbin-ratelimit
  namespace: httpbin
spec:
  parentRefs:
  - name: http
    namespace: gloo-system
  hostnames:
  - ratelimit.example
  rules:
  - backendRefs:
    - name: httpbin
      port: 8000
EOF
Send a few requests to the httpbin app on the ratelimit.example domain. Verify that your first request succeeds and you get back a 200 HTTP response code. Because you limited requests to one request per minute, subsequent requests within the same minute fail with a 429 HTTP response code.
The format of the request varies depending on the type of rate limit that you configured.
Example output for a successful response:
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< access-control-allow-credentials: true
< access-control-allow-origin: *
< date: Mon, 22 Apr 2024 18:36:31 GMT
< content-length: 0
< x-envoy-upstream-service-time: 0
< server: envoy
Example output when rate limited:
* Mark bundle as not supporting multiuse
< HTTP/1.1 429 Too Many Requests
< x-envoy-ratelimited: true
< date: Mon, 22 Apr 2024 18:33:09 GMT
< server: envoy
< content-length: 0
Cleanup
You can remove the resources that you created in this guide.
kubectl delete RateLimitConfig ratelimit-config -n gloo-system
kubectl delete HTTPRoute httpbin-ratelimit -n httpbin
kubectl delete GlooTrafficPolicy ratelimit -n gloo-system