Envoy API
Apply global rate limits in the Envoy API style.
Solo Enterprise for kgateway provides an enterprise rate limiting service that you can use to configure Envoy API global rate limiting rules. For more information, see the About topic.
Before you begin
Follow the Get started guide to install Solo Enterprise for kgateway.
Follow the Sample app guide to create a gateway proxy with an HTTP listener and deploy the httpbin sample app.
Get the external address of the gateway and save it in an environment variable. For local testing, you can port-forward the gateway deployment instead.
- LoadBalancer IP address or hostname:
export INGRESS_GW_ADDRESS=$(kubectl get svc -n kgateway-system http -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
echo $INGRESS_GW_ADDRESS
- Port-forward for local testing:
kubectl port-forward deployment/http -n kgateway-system 8080:8080
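The test requests later in this guide differ mainly in the base URL, depending on whether you use the LoadBalancer address or the port-forward. The following is a minimal sketch of a helper that picks the base URL once, assuming the gateway listens on port 8080 as in this guide. The function name `gateway_base_url` is hypothetical and not part of kgateway.

```shell
# Hypothetical helper (not part of kgateway): print the base URL for test requests.
# Uses the LoadBalancer address when INGRESS_GW_ADDRESS is set and non-empty,
# and otherwise falls back to the local port-forward on localhost.
gateway_base_url() {
  if [ -n "${INGRESS_GW_ADDRESS:-}" ]; then
    printf 'http://%s:8080\n' "$INGRESS_GW_ADDRESS"
  else
    printf 'http://localhost:8080\n'
  fi
}
```

You might then send a request with `curl -v "$(gateway_base_url)/status/200"` and add the host header that matches your setup.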
Step 1: Create a RateLimitConfig
Prepare a RateLimitConfig resource that defines the descriptors and actions for your rate limiting rules.
The following subsections provide examples for different types of rate limiting rules, as well as ways to use the rules in combination with each other for more complex scenarios. For more information on specific fields, see the Ratelimit API in the Gloo Mesh Enterprise docs.
Generic key
A generic key is a specific string literal that is used to match an action to a descriptor.
In the following example, you create a policy that rate limits requests to one request per minute.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: kgateway-system
spec:
  raw:
    descriptors:
    - key: generic_key
      value: counter
      rateLimit:
        requestsPerUnit: 1
        unit: MINUTE
    rateLimits:
    - actions:
      - genericKey:
          descriptorValue: counter
EOF
Request headers
Limit requests based on a specific header that is present in your request.
In the following example, you create a policy that rate limits requests that include an x-type request header to one request per minute.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: kgateway-system
spec:
  raw:
    descriptors:
    - key: type
      value:
      rateLimit:
        requestsPerUnit: 1
        unit: MINUTE
    rateLimits:
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
EOF
Remote address
Limit requests based on the remote address that sends the request. The remote address is populated from the x-forwarded-for request header.
In the following example, you create a policy that rate limits requests based on the remote address to one request per minute.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: kgateway-system
spec:
  raw:
    descriptors:
    - key: remote_address
      value:
      rateLimit:
        requestsPerUnit: 1
        unit: MINUTE
    rateLimits:
    - actions:
      - remoteAddress: {}
EOF
Multiple limits per remote address
As shown in the previous example, you can use the remote_address descriptor to rate limit based on the downstream client address. In practice, you might want to express multiple rules, such as a per-second and a per-minute limit.
To do so, you can make remote_address a nested descriptor, with distinct generic keys.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: kgateway-system
spec:
  raw:
    descriptors:
    - key: generic_key
      value: "per-minute"
      descriptors:
      - key: remote_address
        rateLimit:
          requestsPerUnit: 20
          unit: MINUTE
    - key: generic_key
      value: "per-second"
      descriptors:
      - key: remote_address
        rateLimit:
          requestsPerUnit: 2
          unit: SECOND
    rateLimits:
    - actions:
      - genericKey:
          descriptorValue: "per-minute"
      - remoteAddress: {}
    - actions:
      - genericKey:
          descriptorValue: "per-second"
      - remoteAddress: {}
EOF
Tuples in headers
The following example nests descriptors to express rules based on tuples instead of a single value. This rule enforces a limit of 1 request per minute for any unique combination of type and number values in the request header.
If a request has both the x-type and x-number headers, it is counted towards the limit. If the request does not have one or both headers, then no rate limit is enforced.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: kgateway-system
spec:
  raw:
    descriptors:
    - key: type
      descriptors:
      - key: number
        rateLimit:
          requestsPerUnit: 1
          unit: MINUTE
    rateLimits:
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
      - requestHeaders:
          descriptorKey: number
          headerName: x-number
EOF
Nested descriptors
Building off the tuples in headers example, you might want to enforce a limit if the type is provided but the number is not.
You can nest the number descriptor within the type descriptor.
Then, define actions for two separate rate limits:
- One to increment the counter for the type limit.
- One to increment the counter for the type and number pair, when both are present.
The request results in a 429 rate limit error response if either limit is reached.
Matching is attempted against the key and value pair before matching against only the key.
Note that in the rate limit configuration “tree,” only the leaf values serve as wildcards that set up a unique limit. The nested, non-leaf descriptors that do not have values serve as a catch-all.
If you use nested descriptors and the descriptor has no value, the cache key does not append the value for the nested, non-leaf configuration. In the nested descriptors example, no value is set for type or number. In this case, the same limit is used regardless of the x-type header value that is sent. However, the x-number header value has a different limit per value, because this field is the leaf node in the descriptor tree.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: kgateway-system
spec:
  raw:
    descriptors:
    - key: type
      rateLimit:
        requestsPerUnit: 3
        unit: MINUTE
      descriptors:
      - key: number
        rateLimit:
          requestsPerUnit: 1
          unit: MINUTE
    rateLimits:
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
      - requestHeaders:
          descriptorKey: number
          headerName: x-number
EOF
Priority and weights
You can specify weights on descriptors. For a particular request that has multiple sets of matching actions, the server evaluates each and then increments only the matching rules with the highest weight. By default, the weight is 0.
The following example adds a weight: 1 field to the server config. When a request has both the x-type and x-number headers, then the server evaluates both limits: the limit on type alone, and the limit on the combination of type and number.
Because the number has a higher weight, the server increments only that counter. In this setup, requests with a unique type and number are allowed 10 requests per minute, but requests that have only a type are limited to 1 per minute.
To make sure a rule is always applied, you can add the alwaysApply option to the descriptor.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: kgateway-system
spec:
  raw:
    descriptors:
    - key: type
      rateLimit:
        requestsPerUnit: 1
        unit: MINUTE
      descriptors:
      - key: number
        weight: 1
        rateLimit:
          requestsPerUnit: 10
          unit: MINUTE
    rateLimits:
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
      - requestHeaders:
          descriptorKey: number
          headerName: x-number
EOF
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: kgateway-system
spec:
  raw:
    descriptors:
    - key: type
      alwaysApply: true
      rateLimit:
        requestsPerUnit: 1
        unit: MINUTE
      descriptors:
      - key: number
        weight: 1
        rateLimit:
          requestsPerUnit: 10
          unit: MINUTE
    rateLimits:
    - actions:
      - requestHeaders:
          descriptorKey: type
          headerName: x-type
      - requestHeaders:
          descriptorKey: number
          headerName: x-number
EOF
Priority based on HTTP method
A useful tactic for building resilient, distributed systems is to implement different rate limits for different “priorities” or “classes” of traffic. This practice is related to the concept of load shedding.
Suppose you have exposed an API that supports both GET and POST methods for listing data and creating resources. Although both functions are important, ultimately the POST action is more important to your business. Therefore, you want to protect the availability of the POST function at the expense of the less important GET function.
- In the server config, GET requests are limited to 2 per minute.
- In the client config, the actions are configured to extract the method from the request headers.
kubectl apply -f - <<EOF
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: ratelimit-config
  namespace: kgateway-system
spec:
  raw:
    descriptors:
    # allow 5 calls per minute for any unique host
    - key: remote_address
      rateLimit:
        requestsPerUnit: 5
        unit: MINUTE
    # specifically limit GET requests from unique hosts to 2 per min
    - key: method
      value: GET
      descriptors:
      - key: remote_address
        rateLimit:
          requestsPerUnit: 2
          unit: MINUTE
    rateLimits:
    - actions:
      - remoteAddress: {}
    - actions:
      - requestHeaders:
          descriptorKey: method
          headerName: :method
      - remoteAddress: {}
EOF
Step 2: Create the policy
Now that you have a RateLimitConfig, create an EnterpriseKgatewayTrafficPolicy to apply the policy to the routes that you want to rate limit.
The following policy targets the Gateway, but for more options, see Policy attachment.
kubectl apply -f- <<EOF
apiVersion: enterprisekgateway.solo.io/v1alpha1
kind: EnterpriseKgatewayTrafficPolicy
metadata:
  name: ratelimit
  namespace: kgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: http
  entRateLimit:
    global:
      rateLimitConfigRefs:
      - name: ratelimit-config
EOF
Step 3: Verify the rate limit
Test the rate limit on a sample route.
Create an HTTPRoute resource for the httpbin app on the ratelimit.example domain, whose parent refers to the same Gateway that the EnterpriseKgatewayTrafficPolicy applies to.
kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbin-ratelimit
  namespace: httpbin
spec:
  parentRefs:
  - name: http
    namespace: kgateway-system
  hostnames:
  - ratelimit.example
  rules:
  - backendRefs:
    - name: httpbin
      port: 8000
EOF
Send a few requests to the httpbin app on the ratelimit.example domain. Verify that your first request succeeds and you get back a 200 HTTP response code. Because you limited requests to one request per minute, subsequent requests within the same minute fail with a 429 HTTP response code.
The format of the request varies depending on the type of rate limit that you configured.
For the generic key example, the rate limit is 1 request per minute.
- LoadBalancer IP address or hostname:
curl -v http://$INGRESS_GW_ADDRESS:8080/status/200 -H "host: ratelimit.example:8080"
- Port-forward for local testing:
curl -v localhost:8080/status/200 -H "host: ratelimit.example"
Include the x-type request header to rate limit the request.
- LoadBalancer IP address or hostname:
curl -v http://$INGRESS_GW_ADDRESS:8080/status/200 -H "host: ratelimit.example:8080" -H "x-type: mytype"
- Port-forward for local testing:
curl -v localhost:8080/status/200 -H "host: ratelimit.example" -H "x-type: mytype"
Include the x-forwarded-for request header to rate limit the request.
- LoadBalancer IP address or hostname:
curl -v http://$INGRESS_GW_ADDRESS:8080/status/200 -H "host: ratelimit.example:8080" -H "x-forwarded-for: my-remote-address.com"
- Port-forward for local testing:
curl -v localhost:8080/status/200 -H "host: ratelimit.example" -H "x-forwarded-for: my-remote-address.com"
Include both the x-type and x-number request headers to rate limit the request.
- LoadBalancer IP address or hostname:
curl -v http://$INGRESS_GW_ADDRESS:8080/status/200 -H "host: ratelimit.example:8080" -H "x-type: mytype" -H "x-number: 123"
- Port-forward for local testing:
curl -v localhost:8080/status/200 -H "host: ratelimit.example" -H "x-type: mytype" -H "x-number: 123"
Include the -X GET option to rate limit the request. On the third request, you get a 429 response code because the GET method is limited to 2 requests per minute.
- LoadBalancer IP address or hostname:
curl -v http://$INGRESS_GW_ADDRESS:8080/status/200 -H "host: ratelimit.example:8080" -X GET
- Port-forward for local testing:
curl -v localhost:8080/status/200 -H "host: ratelimit.example" -X GET
Example output for a successful response:
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< access-control-allow-credentials: true
< access-control-allow-origin: *
< date: Mon, 22 Apr 2024 18:36:31 GMT
< content-length: 0
< x-envoy-upstream-service-time: 0
< server: envoy
Example output when rate limited:
* Mark bundle as not supporting multiuse
< HTTP/1.1 429 Too Many Requests
< x-envoy-ratelimited: true
< date: Mon, 22 Apr 2024 18:33:09 GMT
< server: envoy
< content-length: 0
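Sending requests one at a time and reading verbose output makes it easy to lose track of when the limit starts to apply. As a minimal sketch, the following helper repeats a request and tallies the returned status codes. The function name `tally_status_codes` is hypothetical and not part of kgateway. It relies only on standard curl options (`-s`, `-o`, `-w`).

```shell
# Hypothetical helper (not part of kgateway): send COUNT requests with curl and
# print how many times each HTTP status code was returned.
# Usage: tally_status_codes COUNT [curl arguments...]
tally_status_codes() {
  count="$1"
  shift
  i=0
  while [ "$i" -lt "$count" ]; do
    # -s silences progress output, -o /dev/null drops the response body,
    # and -w prints only the response status code on its own line.
    curl -s -o /dev/null -w '%{http_code}\n' "$@"
    i=$((i + 1))
  done | sort | uniq -c
}
```

For example, with the generic key configuration and a port-forward, `tally_status_codes 3 localhost:8080/status/200 -H "host: ratelimit.example"` sends three requests in quick succession. Going by the limits described in this guide, you would expect one 200 and two 429 responses if all three land within the same minute.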
Cleanup
You can remove the resources that you created in this guide.
kubectl delete RateLimitConfig ratelimit-config -n kgateway-system
kubectl delete HTTPRoute httpbin-ratelimit -n httpbin
kubectl delete EnterpriseKgatewayTrafficPolicy ratelimit -n kgateway-system
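To confirm that the cleanup worked, you can check that each resource no longer exists. The following is a minimal sketch. The function name `check_removed` is hypothetical, and it relies only on the standard `kubectl get` flags `--ignore-not-found` and `-o name`.

```shell
# Hypothetical helper (not part of kgateway): report whether a resource still exists.
# Usage: check_removed KIND NAME NAMESPACE
check_removed() {
  # --ignore-not-found makes kubectl print nothing (instead of an error)
  # when the resource is absent; -o name prints "kind/name" when it exists.
  found="$(kubectl get "$1" "$2" -n "$3" --ignore-not-found -o name)"
  if [ -z "$found" ]; then
    echo "removed: $1/$2"
  else
    echo "still present: $found"
  fi
}
```

For example, run `check_removed RateLimitConfig ratelimit-config kgateway-system`, and repeat the call for the HTTPRoute and EnterpriseKgatewayTrafficPolicy resources.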