BYO global rate limit service
Bring your own rate limit service for global rate limiting.
The following guide shows you how to bring your own rate limit service. Your rate limit service must implement the Envoy rate limit gRPC protocol to integrate with the upstream kgateway project that powers Gloo Gateway. To use the built-in global rate limiting service in Gloo Gateway, see the enterprise global rate limit guides.
Configure global rate limiting rules across all of your Gateways to protect the backing services in your cluster.
About
Global rate limiting in Gloo Gateway is powered by Envoy’s rate limiting service protocol. With global rate limiting, you can apply distributed, consistent rate limits across multiple Gateways. Unlike local rate limiting, which operates per Gateway instance, global rate limiting uses a central service to coordinate rate limits. Therefore, to use global rate limiting, you must bring your own rate limit service that implements the Envoy protocol.
With your own rate limit service in place, you get benefits such as:
- Coordinated rate limiting across multiple Gateways.
- Centralized rate limit management with shared counters.
- Dynamic descriptor-based rate limits that can consider multiple request attributes.
- Consistent user experience, regardless of which Gateway receives the request.
Request flow
Review the following sequence diagram to understand the request flow with global rate limiting.
sequenceDiagram
participant Client
participant Gateway
participant RateLimiter
participant App
Client->>Gateway: 1. Send request to protected App
Gateway->>Gateway: 2. Receive request
Gateway->>RateLimiter: 3. Extract descriptors & send to Rate Limit Service
RateLimiter->>RateLimiter: 4. Apply configured limits for descriptors
RateLimiter->>Gateway: Return decision
alt Request allowed
Gateway->>App: 5. Forward request to App
App->>Gateway: Return response
Gateway->>Client: Return response to Client
else Rate limit reached
Gateway->>Client: 6. Deny request & return rate limit message
end
1. The Client sends a request to an App that is protected by the Gateway.
2. The Gateway receives the request.
3. The Gateway extracts descriptors and sends them to the Rate Limit Service.
4. The Rate Limit Service applies configured limits for those descriptors and returns a decision to the Gateway.
5. If allowed, the Gateway forwards the request to the App.
6. If the rate limit is reached, the Gateway denies the request and returns a message to the Client.
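The decision in steps 4 through 6 can be pictured as a counter that is compared against a limit. The following shell snippet is only an illustration and not how Envoy or any particular rate limit service is implemented. The `limit` and `count` variables and the `handle_request` function are hypothetical, and a real service also resets its counters at the end of each time window.

```shell
limit=1   # hypothetical requests_per_unit configured for one descriptor
count=0   # hypothetical shared counter kept by the rate limit service

# Count one request for the descriptor and decide how the Gateway responds.
handle_request() {
  count=$((count + 1))
  if [ "$count" -le "$limit" ]; then
    status=200   # within the limit: forward the request to the App
  else
    status=429   # limit reached: deny the request
  fi
  echo "request $count -> HTTP $status"
}

handle_request   # first request in the window is allowed
handle_request   # second request exceeds the limit of 1
```

Running the snippet prints `request 1 -> HTTP 200` followed by `request 2 -> HTTP 429`. Because the counter lives in one place, every Gateway that consults it sees the same count.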
Architecture
The global rate limiting feature consists of three components:
- GlooTrafficPolicy with rateLimit.global: Configure your rate limit policy in a GlooTrafficPolicy. The rate limit policy includes the descriptors to extract from requests for the Gateway to send to the Rate Limit Service.
- GatewayExtension: Connect Gloo Gateway with the Rate Limit Service by using a kgateway GatewayExtension.
- Rate Limit Service: An external service that you set up to implement the Envoy Rate Limit protocol. The Rate Limit Service has the actual rate limit values to enforce on requests, based on the descriptors that the GlooTrafficPolicy includes.
Response headers
When rate limiting is enabled, Gloo Gateway adds the following headers to responses. These headers help clients understand their current rate limit status and adapt their behavior accordingly.
| Header | Description | Example |
|---|---|---|
| x-ratelimit-limit | The rate limit ceiling for the given request | 10, 10;w=60 (10 requests per 60 seconds) |
| x-ratelimit-remaining | The number of requests left for the time window | 5 (5 requests remaining) |
| x-ratelimit-reset | The time in seconds until the rate limit resets | 30 (rate limit resets in 30 seconds) |
| x-envoy-ratelimited | Present when the request is rate limited | true |
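As a rough sketch of how a client script might use these headers, the following snippet reads them from a response and reports how long to wait. The response text is a made-up sample modeled on the examples in the table, and the `header_value` helper is a hypothetical name, not part of Gloo Gateway.

```shell
# Made-up sample response, modeled on the header examples in the table.
response='HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
x-ratelimit-limit: 1, 1;w=60
x-ratelimit-remaining: 0
x-ratelimit-reset: 30'

# Print the value of the named header (case-insensitive) read from stdin.
header_value() {
  awk -F': ' -v name="$1" 'tolower($1) == tolower(name) { print $2; exit }' | tr -d '\r'
}

remaining=$(printf '%s\n' "$response" | header_value x-ratelimit-remaining)
reset=$(printf '%s\n' "$response" | header_value x-ratelimit-reset)

# Back off when no requests are left in the current window.
if [ "$remaining" -eq 0 ]; then
  echo "Rate limit exhausted; retry in ${reset}s"
fi
```

In practice, you would pipe the headers of a real response into `header_value` instead of the sample text.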
Before you begin
Follow the Get started guide to install Gloo Gateway.
Follow the Sample app guide to create a gateway proxy with an HTTP listener and deploy the httpbin sample app.
Get the external address of the gateway and save it in an environment variable.
Step 1: Deploy your Rate Limit Service
You can bring your own rate limit service that implements the Envoy Rate Limit gRPC protocol.
To get started, you can try out a demo rate limit service from the kgateway project. For more information, see the GitHub repo.
Create the kgateway-test-extensions namespace.
kubectl create namespace kgateway-test-extensions
Deploy the rate limit service.
kubectl apply -f https://raw.githubusercontent.com/kgateway-dev/kgateway/refs/heads/main/test/e2e/features/rate_limit/global/testdata/rate-limit-server.yaml
Step 2: Define the rate limits
Define the actual rate limit values (requests per unit time) in your Rate Limit Service. For example, using the Envoy Rate Limit service, you configure the rate limits in its configuration file.
The kgateway example that you deployed in the previous step includes the following rate limit configuration as a Kubernetes ConfigMap.
kubectl describe configmap ratelimit-config -n kgateway-test-extensions
| Field | Description |
|---|---|
| domain | Required. A globally unique identifier to group together a set of rate limit rules. This way, different teams can have their own set of rate limits that don’t conflict with each other. Later, you set the domain to use in the kgateway GatewayExtension. If you have different domains for different teams, each team can create their own GatewayExtension that their GlooTrafficPolicy can reference. |
| descriptors | A list of key-value pairs that the Rate Limit Service uses to select which rate limit to use on matching requests. Descriptors are case-sensitive. |
| key | Required. The name for the descriptor to use when matching requests. Later, you use the descriptor key in the GlooTrafficPolicy to decide which rate limits to apply to requests. The Rate Limit Service expects one of the following values for the descriptor key: remote_address for a client IP address (RemoteAddress in the GlooTrafficPolicy), path for path matching (Path in the GlooTrafficPolicy), a key name for header matching (Header in the GlooTrafficPolicy), or a custom key name for matching on a generic key-value pair (Generic in the GlooTrafficPolicy). |
| value | Optional. Each descriptor can have a value for more specific matching. For example, you might have two descriptor keys that are both for an X-User-ID header, but one also has a value of user1. This way, you can apply different rate limits to the specific value, such as to further restrict or permit a particular user. Similarly, you can take this approach for remote addresses, paths, and generic key-value pairs such as for service plans. |
| rate_limit | Optional. The actual rate limit rule to apply. The example sets different rate limits for each descriptor key. If a descriptor key does not have a rate limit, the GlooTrafficPolicy cannot apply a rate limit to requests, and the requests that match the descriptor are allowed. |
| unit | The unit of time for the rate limit, such as second, minute, hour, or day. |
| requests_per_unit | The number of requests to allow per unit of time. |
Example output:
Data
====
config.yaml:
----
domain: api-gateway
descriptors:
  - key: remote_address
    rate_limit:
      unit: minute
      requests_per_unit: 1
  - key: path
    value: "/path1"
    rate_limit:
      unit: minute
      requests_per_unit: 1
  - key: path
    value: "/path2"
    rate_limit:
      unit: minute
      requests_per_unit: 2
  - key: X-User-ID
    rate_limit:
      unit: minute
      requests_per_unit: 1
  - key: X-User-ID
    value: user1
    rate_limit:
      unit: minute
      requests_per_unit: 1
  - key: service
    value: premium-api
    rate_limit:
      unit: minute
      requests_per_unit: 2
BinaryData
====
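To make the key and value matching more concrete, the following sketch hard-codes the example configuration as a lookup. It assumes that a rule with both a key and a value applies only to that exact value, and that a rule with only a key applies to any other value of that key. Check the documentation of your rate limit service for its actual matching behavior. The `limit_for` function is a hypothetical helper for illustration only.

```shell
# Print the limit that the example configuration assigns to a descriptor
# entry, or "unlimited" when no rule matches. Specific key=value rules are
# listed before key-only rules so that they win.
limit_for() {
  key=$1
  value=$2
  case "$key=$value" in
    path=/path1)                  echo "1/minute" ;;
    path=/path2)                  echo "2/minute" ;;
    X-User-ID=user1)              echo "1/minute" ;;
    service=premium-api)          echo "2/minute" ;;
    remote_address=*|X-User-ID=*) echo "1/minute" ;;  # key-only rules
    *)                            echo "unlimited" ;; # no rule, request allowed
  esac
}

limit_for path /path2       # matches the /path2 rule
limit_for path /other       # no path rule for this value
limit_for X-User-ID alice   # falls back to the key-only X-User-ID rule
```

The three calls print `2/minute`, `unlimited`, and `1/minute`. The second result reflects the note in the table: a descriptor without a matching rate limit does not restrict requests.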
Step 3: Create a GatewayExtension
Create a GatewayExtension resource that points to your Rate Limit Service.
Create a GatewayExtension.
kubectl apply -f - <<EOF
apiVersion: gateway.kgateway.dev/v1alpha1
kind: GatewayExtension
metadata:
  namespace: gloo-system
  name: global-ratelimit
spec:
  type: RateLimit
  rateLimit:
    grpcService:
      backendRef:
        name: ratelimit
        namespace: kgateway-test-extensions
        port: 8081
    domain: "api-gateway"
    timeout: "100ms"
    failOpen: false
EOF
Review the following table to understand this configuration. For more information, see the API docs.
| Field | Description | Required |
|---|---|---|
| grpcService | Configuration for connecting to the gRPC rate limit service. | Yes |
| domain | Domain identity for the rate limit service. If you have different domains for different teams, each team can create their own GatewayExtension that their own GlooTrafficPolicy can reference. | Yes |
| timeout | Timeout for rate limit service calls, such as 100ms. | No |
| failOpen | When true, requests continue even if the rate limit service is unavailable. | No (defaults to false) |
Create a Kubernetes ReferenceGrant to allow the GatewayExtension to access the Rate Limit Service. Otherwise, you can create the GatewayExtension and GlooTrafficPolicy in the same namespace as the Rate Limit Service.
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: global-ratelimit
  namespace: kgateway-test-extensions
spec:
  from:
  - group: gateway.kgateway.dev
    kind: GatewayExtension
    namespace: gloo-system
  to:
  - group: ""
    kind: Service
EOF
Step 4: Create a GlooTrafficPolicy
Create a GlooTrafficPolicy resource that applies rate limits to your routes. Note that the GlooTrafficPolicy must be in the same namespace as the GatewayExtension to select it.
The GlooTrafficPolicy configures the descriptors that define the dimensions for rate limiting. Each descriptor consists of one or more entries that help categorize and count requests. The descriptor entries match on the descriptor keys that you defined previously in the Rate Limit Service.
Entries can be of one of the following types: RemoteAddress, Path, Header, or Generic. You can combine different entry types so that they are applied together as a rate limit, such as RemoteAddress and Generic or Header and Path. The following table describes the different descriptor entry types. For more information, see the API docs.
| Type | Description | Additional Fields |
|---|---|---|
| Header | Extract the descriptor value from a request header. The header name must match a descriptor key in the Rate Limit Service. | header: The name of the header to extract. |
| Generic | Use a static key-value pair that you define as the descriptor. | generic.key: The descriptor key that matches the descriptor key in the Rate Limit Service. generic.value: The static value for more specific matching. |
| Path | Use the request path as the descriptor value. The Path entry type is mapped to the path descriptor key in the Rate Limit Service. | None |
| RemoteAddress | Use the client’s IP address as the descriptor value. The RemoteAddress entry type is mapped to the remote_address descriptor key in the Rate Limit Service. | None |
The example rate limit policies in this guide apply to the Gateway that you created before you began, but you can also apply a GlooTrafficPolicy to an HTTPRoute or a specific route.
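As a starting point, a client IP address policy might look like the following sketch. Treat it as an assumption-laden illustration and not the definitive manifest: the `gloo.solo.io/v1alpha1` API version, the exact field layout under `rateLimit.global`, and the Gateway name `http` are assumptions that you should verify against the API docs and your own environment. The policy name, namespace, entry type, and GatewayExtension name come from this guide.

```yaml
# Sketch only: verify the apiVersion and field names against the API docs.
apiVersion: gloo.solo.io/v1alpha1   # assumed API group and version
kind: GlooTrafficPolicy
metadata:
  name: ip-rate-limit
  namespace: gloo-system            # same namespace as the GatewayExtension
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: http                      # assumed name of the Gateway from the sample app guide
  rateLimit:
    global:
      descriptors:
      - entries:
        - type: RemoteAddress       # sent as the remote_address descriptor key
      extensionRef:
        name: global-ratelimit      # the GatewayExtension from the previous step
```

With the example rate limit configuration, the `remote_address` descriptor is limited to 1 request per minute, so each client IP address gets one request per minute through this Gateway.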
Step 5: Test the rate limits
Test the rate limits by sending requests to the Gateway. The following steps assume that you created the client IP address example GlooTrafficPolicy, which limits requests to 1 request per minute for a particular client IP address.
Send a test request to the httpbin sample app. The request succeeds because you did not exceed the rate limit of 1 request per minute.
Example output:
HTTP/1.1 200 OK ...
Repeat the request. The request fails because you exceeded the rate limit of 1 request per minute.
Example output:
HTTP/1.1 429 Too Many Requests ...
Cleanup
You can remove the resources that you created in this guide.
kubectl delete -f https://raw.githubusercontent.com/kgateway-dev/kgateway/refs/heads/main/test/e2e/features/rate_limit/global/testdata/rate-limit-server.yaml
kubectl delete gatewayextension global-ratelimit -n gloo-system
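kubectl delete GlooTrafficPolicy ip-rate-limit user-rate-limit combined-rate-limit local-global-rate-limit -n gloo-system --ignore-not-found
kubectl delete referencegrant global-ratelimit -n kgateway-test-extensions
kubectl delete namespace kgateway-test-extensions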