Configure global rate limiting rules across all of your Gateways to protect the backing services in your cluster.

About

Global rate limiting in Gloo Gateway is powered by Envoy’s rate limiting service protocol. With global rate limiting, you can apply distributed, consistent rate limits across multiple Gateways. Unlike local rate limiting, which operates per Gateway instance, global rate limiting uses a central service to coordinate rate limits. Therefore, to use global rate limiting, you must bring your own rate limit service that implements the Envoy protocol.

With your own rate limit service in place, you get benefits such as:

  • Coordinated rate limiting across multiple Gateways.
  • Centralized rate limit management with shared counters.
  • Dynamic descriptor-based rate limits that can consider multiple request attributes.
  • Consistent user experience, regardless of which Gateway receives the request.

Request flow

Review the following sequence diagram to understand the request flow with global rate limiting.

sequenceDiagram
    participant Client
    participant Gateway
    participant RateLimiter
    participant App

    Client->>Gateway: 1. Send request to protected App
    Gateway->>Gateway: 2. Receive request
    Gateway->>RateLimiter: 3. Extract descriptors & send to Rate Limit Service
    RateLimiter->>RateLimiter: 4. Apply configured limits for descriptors
    RateLimiter->>Gateway: Return decision
    alt Request allowed
        Gateway->>App: 5. Forward request to App
        App->>Gateway: Return response
        Gateway->>Client: Return response to Client
    else Rate limit reached
        Gateway->>Client: 6. Deny request & return rate limit message
    end

  1. The Client sends a request to an App that is protected by the Gateway.
  2. The Gateway receives the request.
  3. The Gateway extracts descriptors and sends them to the Rate Limit Service.
  4. The Rate Limit Service applies configured limits for those descriptors and returns a decision to the Gateway.
  5. If allowed, the Gateway forwards the request to the App.
  6. If the rate limit is reached, the Gateway denies the request and returns a message to the Client.

Architecture

The global rate limiting feature consists of three components:

  1. GlooTrafficPolicy with rateLimit.global: Configure your rate limit policy in a GlooTrafficPolicy. The rate limit policy includes the descriptors to extract from requests for the Gateway to send to the Rate Limit Service.
  2. GatewayExtension: Connect Gloo Gateway with the Rate Limit Service by using a kgateway GatewayExtension.
  3. Rate Limit Service: An external service that you set up to implement the Envoy Rate Limit protocol. The Rate Limit Service holds the actual rate limit values to enforce on requests, based on the descriptors that the GlooTrafficPolicy includes.

Response headers

When rate limiting is enabled, Gloo Gateway adds the following headers to responses. These headers help clients understand their current rate limit status and adapt their behavior accordingly.

| Header | Description | Example |
| --- | --- | --- |
| x-ratelimit-limit | The rate limit ceiling for the given request | 10, 10;w=60 (10 requests per 60 seconds) |
| x-ratelimit-remaining | The number of requests left for the time window | 5 (5 requests remaining) |
| x-ratelimit-reset | The time in seconds until the rate limit resets | 30 (rate limit resets in 30 seconds) |
| x-envoy-ratelimited | Present when the request is rate limited | true |
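
A client can read these headers to decide when to retry. The following is a minimal sketch, not part of Gloo Gateway: `header_value` is a hypothetical helper, and the sample headers are illustrative values modeled on the table, not captured output.

```sh
# Print the value of one response header (case-insensitive name match).
# Reads raw HTTP headers on stdin, for example from: curl -s -D - -o /dev/null <url>
header_value() {
  awk -v name="$1" '
    {
      i = index($0, ":")
      if (i > 0 && tolower(substr($0, 1, i - 1)) == tolower(name)) {
        value = substr($0, i + 1)
        sub(/^[ \t]+/, "", value)   # drop leading whitespace
        sub(/\r$/, "", value)       # drop a trailing carriage return
        print value
        exit
      }
    }'
}

# Illustrative headers for a rate-limited response (assumed values).
sample_headers='HTTP/1.1 429 Too Many Requests
x-ratelimit-limit: 10, 10;w=60
x-ratelimit-remaining: 0
x-ratelimit-reset: 30
x-envoy-ratelimited: true'

remaining=$(printf '%s\n' "$sample_headers" | header_value x-ratelimit-remaining)
reset=$(printf '%s\n' "$sample_headers" | header_value x-ratelimit-reset)

# Back off when no requests are left in the current window.
if [ "$remaining" -eq 0 ]; then
  echo "Rate limit exhausted; retry in ${reset}s"
fi
```

With the sample headers, the snippet prints `Rate limit exhausted; retry in 30s`. Against a live gateway, pipe the headers of a real response into `header_value` instead of the sample text.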

Before you begin

  1. Follow the Get started guide to install Gloo Gateway.

  2. Follow the Sample app guide to create a gateway proxy with an HTTP listener and deploy the httpbin sample app.

  3. Get the external address of the gateway and save it in an environment variable.
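
The exact command for the last step depends on how your gateway is exposed. A minimal sketch, assuming that the Sample app guide created a gateway proxy that is exposed by a LoadBalancer Service named `http` in the `gloo-system` namespace. The Service name and the variable name `INGRESS_GW_ADDRESS` are assumptions, so adjust them to your setup.

```sh
# Assumptions (not stated in this guide): Service "http" in namespace "gloo-system",
# exposed by a load balancer that reports either a hostname or an IP address.
export INGRESS_GW_ADDRESS=$(kubectl get svc -n gloo-system http \
  -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
echo "$INGRESS_GW_ADDRESS"
```

If the output is empty, the load balancer might not have an address assigned yet, or the Service might have a different name.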

Step 1: Deploy your Rate Limit Service

You can bring your own rate limit service that implements the Envoy Rate Limit gRPC protocol.

To get started, you can try out a demo rate limit service from the kgateway project. For more information, see the GitHub repo.

  1. Create the kgateway-test-extensions namespace.

      kubectl create namespace kgateway-test-extensions
      
  2. Deploy the rate limit service.

      kubectl apply -f https://raw.githubusercontent.com/kgateway-dev/kgateway/refs/heads/main/test/e2e/features/rate_limit/global/testdata/rate-limit-server.yaml
      

Step 2: Define the rate limits

Define the actual rate limit values (requests per unit of time) in your Rate Limit Service. For example, if you use the Envoy Rate Limit service, you configure the rate limits in its configuration file.

The kgateway example that you deployed in the previous step includes the following rate limit configuration as a Kubernetes ConfigMap.

  kubectl describe configmap ratelimit-config -n kgateway-test-extensions
  
Review the following table to understand this configuration.

| Field | Description |
| --- | --- |
| domain | Required. A globally unique identifier to group together a set of rate limit rules. This way, different teams can have their own set of rate limits that don’t conflict with each other. Later, you set the domain to use in the kgateway GatewayExtension. If you have different domains for different teams, each team can create their own GatewayExtension that their GlooTrafficPolicy can reference. |
| descriptors | A list of key-value pairs that the Rate Limit Service uses to select which rate limit to use on matching requests. Descriptors are case-sensitive. |
| key | Required. The name for the descriptor to use when matching requests. Later, you use the descriptor key in the GlooTrafficPolicy to decide which rate limits to apply to requests. The Rate Limit Service expects one of the following values for the descriptor key: remote_address for a client IP address (RemoteAddress in the GlooTrafficPolicy), path for path matching (Path in the GlooTrafficPolicy), a key name for header matching (Header in the GlooTrafficPolicy), or a custom key name for matching on a generic key-value pair (Generic in the GlooTrafficPolicy). |
| value | Optional. Each descriptor can have a value for more specific matching. For example, you might have two descriptor keys that are both for an X-User-ID header, but one also has a value of user1. This way, you can apply different rate limits to the specific value, such as to further restrict or permit a particular user. Similarly, you can take this approach for remote addresses, paths, and generic key-value pairs such as for service plans. |
| rate_limit | Optional. The actual rate limit rule to apply. The example sets different rate limits for each descriptor key. If a descriptor key does not have a rate limit, the GlooTrafficPolicy cannot apply a rate limit to requests, and the requests that match the descriptor are allowed. |
| unit | The unit of time for the rate limit, such as second, minute, hour, or day. |
| requests_per_unit | The number of requests to allow per unit of time. |

Example output:

  Data
  ====
  config.yaml:
  ----
  domain: api-gateway
  descriptors:
    - key: remote_address
      rate_limit:
        unit: minute
        requests_per_unit: 1
    - key: path
      value: "/path1"
      rate_limit:
        unit: minute
        requests_per_unit: 1
    - key: path
      value: "/path2"
      rate_limit:
        unit: minute
        requests_per_unit: 2
    - key: X-User-ID
      rate_limit:
        unit: minute
        requests_per_unit: 1
    - key: X-User-ID
      value: user1
      rate_limit:
        unit: minute
        requests_per_unit: 1
    - key: service
      value: premium-api
      rate_limit:
        unit: minute
        requests_per_unit: 2


  BinaryData
  ====
  

Step 3: Create a GatewayExtension

Create a GatewayExtension resource that points to your Rate Limit Service.

  1. Create a GatewayExtension.

      kubectl apply -f - <<EOF
      apiVersion: gateway.kgateway.dev/v1alpha1
      kind: GatewayExtension
      metadata:
        namespace: gloo-system
        name: global-ratelimit
      spec:
        type: RateLimit
        rateLimit:
          grpcService:
            backendRef:
              name: ratelimit
              namespace: kgateway-test-extensions
              port: 8081
          domain: "api-gateway"
          timeout: "100ms"
          failOpen: false
      EOF
      

    Review the following table to understand this configuration. For more information, see the API docs.

    | Field | Description | Required |
    | --- | --- | --- |
    | grpcService | Configuration for connecting to the gRPC rate limit service. | Yes |
    | domain | Domain identity for the rate limit service. The value must match the domain in the Rate Limit Service configuration, such as api-gateway in this guide. If you have different domains for different teams, each team can create their own GatewayExtension that their own GlooTrafficPolicy can reference. | Yes |
    | timeout | Timeout for rate limit service calls, such as 100ms. | No |
    | failOpen | When true, requests continue even if the rate limit service is unavailable. | No (defaults to false) |

  2. Create a Kubernetes ReferenceGrant to allow the GatewayExtension in the gloo-system namespace to reference the Rate Limit Service in the kgateway-test-extensions namespace. Alternatively, you can skip this step if you create the GatewayExtension and GlooTrafficPolicy in the same namespace as the Rate Limit Service.

      kubectl apply -f - <<EOF
      apiVersion: gateway.networking.k8s.io/v1beta1
      kind: ReferenceGrant
      metadata:
        name: global-ratelimit
        namespace: kgateway-test-extensions
      spec:
        from:
        - group: gateway.kgateway.dev
          kind: GatewayExtension
          namespace: gloo-system
        to:
        - group: ""
          kind: Service
      EOF
      

Step 4: Create a GlooTrafficPolicy

Create a GlooTrafficPolicy resource that applies rate limits to your routes. Note that the GlooTrafficPolicy must be in the same namespace as the GatewayExtension to select it.

The GlooTrafficPolicy configures the descriptors that define the dimensions for rate limiting. Each descriptor consists of one or more entries that help categorize and count requests. The descriptor entries match on the descriptor keys that you defined previously in the Rate Limit Service.

Entries can be of one of the following types: RemoteAddress, Path, Header, or Generic. You can combine different entry types so that they are applied together as a rate limit, such as RemoteAddress and Generic or Header and Path. The following table describes the different descriptor entry types. For more information, see the API docs.

| Type | Description | Additional Fields |
| --- | --- | --- |
| Header | Extract the descriptor value from a request header. The header name must match a descriptor key in the Rate Limit Service. | header: The name of the header to extract. |
| Generic | Use a static key-value pair that you define as the descriptor. | generic.key: The descriptor key that matches the descriptor key in the Rate Limit Service. generic.value: The static value for more specific matching. |
| Path | Use the request path as the descriptor value. The Path entry type is mapped to the path descriptor key in the Rate Limit Service. | None |
| RemoteAddress | Use the client’s IP address as the descriptor value. The RemoteAddress entry type is mapped to the remote_address descriptor key in the Rate Limit Service. | None |

The example policies in this guide apply to the Gateway that you created before you began, but you can also apply a GlooTrafficPolicy to an HTTPRoute or to a specific route.
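
To make the entry types concrete, the following is a sketch of what the client IP address example (`ip-rate-limit`) might look like. The `gloo.solo.io/v1alpha1` API version, the Gateway name `http`, and the field layout under `rateLimit.global` are assumptions based on the table above and the GatewayExtension from the previous step, so verify them against the API docs before you apply the policy.

```yaml
# Sketch of a client IP address rate limit policy.
# Assumptions: API version, Gateway name "http", and the exact field names.
apiVersion: gloo.solo.io/v1alpha1
kind: GlooTrafficPolicy
metadata:
  name: ip-rate-limit
  namespace: gloo-system        # same namespace as the GatewayExtension
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: http
  rateLimit:
    global:
      descriptors:
      - entries:
        - type: RemoteAddress   # maps to the remote_address descriptor key
        # To use a different entry type, replace the entry above with one of:
        # - type: Path
        # - type: Header
        #   header: "X-User-ID"
        # - type: Generic
        #   generic:
        #     key: service
        #     value: premium-api
      extensionRef:
        name: global-ratelimit  # the GatewayExtension from Step 3
```

With the sample Rate Limit Service configuration, the `remote_address` descriptor allows 1 request per minute for each client IP address. You can save the manifest to a file and apply it with `kubectl apply -f`.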

Step 5: Test the rate limits

Test the rate limits by sending requests to the Gateway. The following steps assume that you created the client IP address example GlooTrafficPolicy, which limits requests to 1 request per minute for a particular client IP address.

  1. Send a test request to the httpbin sample app. The request succeeds because you did not exceed the rate limit of 1 request per minute.

    Example output:

      HTTP/1.1 200 OK
    ...
      
  2. Repeat the request. The request fails because you exceeded the rate limit of 1 request per minute.

    Example output:

      HTTP/1.1 429 Too Many Requests
    ...
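
The request commands depend on your gateway setup. A minimal sketch, assuming that the environment variable from the Before you begin section is named `INGRESS_GW_ADDRESS`, that the HTTP listener uses port 8080, and that the httpbin route matches the host `www.example.com`. None of these values is stated in this guide, and `classify_status` and `send_test_request` are hypothetical helper names.

```sh
# Map an HTTP status code to the outcome that this guide expects.
classify_status() {
  case "$1" in
    200) echo "allowed" ;;
    429) echo "rate limited" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Send one request through the gateway and print the outcome.
# Assumptions: INGRESS_GW_ADDRESS is set, the listener port is 8080,
# and the httpbin route matches the www.example.com host.
send_test_request() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    "http://${INGRESS_GW_ADDRESS}:8080/headers" -H "host: www.example.com")
  classify_status "$code"
}
```

Run `send_test_request` twice within one minute. With the 1 request per minute limit on the client IP address, the first call is expected to report an allowed request and the second a rate-limited one.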
      

Cleanup

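You can remove the resources that you created in this guide. The `--ignore-not-found` flag skips any example GlooTrafficPolicy that you did not create.

  kubectl delete GlooTrafficPolicy ip-rate-limit user-rate-limit combined-rate-limit local-global-rate-limit -n gloo-system --ignore-not-found
  kubectl delete gatewayextension global-ratelimit -n gloo-system
  kubectl delete referencegrant global-ratelimit -n kgateway-test-extensions
  kubectl delete -f https://raw.githubusercontent.com/kgateway-dev/kgateway/refs/heads/main/test/e2e/features/rate_limit/global/testdata/rate-limit-server.yaml
  kubectl delete namespace kgateway-test-extensions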