Adaptive request concurrency

Dynamically adjust the maximum number of concurrent requests that can be sent to a destination.

About adaptive concurrency

The adaptive concurrency filter automatically adjusts the number of concurrent requests that are sent to a backend service based on the observed latency of the responses from that service. It calculates the concurrency limit by sampling the latency of completed requests in a time window and comparing the measured samples against the expected latency for the service.

For example, consider a web service that receives incoming user requests. Under normal load conditions, it comfortably handles about 100 requests in flight at a time. During peak times, the number of requests might increase significantly. With adaptive concurrency, the listener starts by allowing a certain number of concurrent requests, such as 100. If the response times are within acceptable limits, the listener maintains this limit. However, if the response times begin to increase, indicating potential strain on the service, the listener automatically reduces the number of concurrent requests that it allows, such as to 80. This reduction helps prevent the service from becoming overwhelmed and keeps response times in check.

In this way, the policy can help you protect apps from being overloaded by too many requests, optimize response times, and balance resource utilization. Keep in mind that with this policy enabled, clients can start to notice degraded service before your backend apps actually become unavailable. You can continue to fine-tune the policy settings based on usage to find the right balance between service availability and concurrency protection for your apps.

How is the filter configured?

The filter uses a gradient controller to calculate forwarding decisions based on an ideal, minimum round-trip time (minRTT) for a service. The controller periodically measures the minRTT by allowing only a very low outstanding request count to a service, and measuring the latency under these ideal conditions. Most of the settings in the policy configure how the gradient controller measures the minRTT. The controller then uses the minRTT value to calculate the concurrency limit. For more information about the formulas that the controller uses, see the Envoy adaptive concurrency docs.
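
As a rough illustration of how the minRTT feeds into the limit, the following sketch restates the gradient formulas as they are described in the Envoy adaptive concurrency docs, to the best of our understanding. The symbol names are chosen here for readability, and bufferPercentile is treated as a fraction (for example, 0.25 for 25%). Treat this as a summary under those assumptions, and refer to the Envoy docs for the authoritative definitions.

```latex
\begin{align*}
B &= \text{minRTT} \times \text{bufferPercentile} \\
\text{gradient} &= \frac{\text{minRTT} + B}{\text{sampleRTT}} \\
\text{limit}_{\text{new}} &= \text{gradient} \times \text{limit}_{\text{old}} \\
\text{headroom} &= \sqrt{\text{limit}_{\text{new}}} \\
\text{concurrency limit} &= \text{limit}_{\text{new}} + \text{headroom}
\end{align*}
```

For example, with a minRTT of 10ms, no buffer, a sampled latency of 20ms, and a previous limit of 100, the gradient is 0.5, the new limit is 50, the headroom is about 7.07, and the resulting concurrency limit is about 57. The result is still subject to the configured upper bound (maxConcurrencyLimit).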

How is this different from Envoy circuit breaking?

The goal of adaptive concurrency is to optimize throughput without overloading backend services. By dynamically adjusting the concurrency of traffic, the filter continually finds the ideal number of concurrent requests to balance between underutilization and overloading.

Circuit breaking monitors for certain fixed thresholds, such as the number of failed requests, timeouts, or system errors. When a threshold is exceeded, the circuit trips, and further requests are blocked until the condition that tripped the circuit breaker no longer applies. Unlike adaptive concurrency, circuit breaking does not continuously adapt these thresholds to fluctuating network conditions.

More information

For more information, see the following resources.

If you import or export resources across workspaces, your policies might not apply. For more information, see Import and export policies.

Before you begin

This guide assumes that you use the same names for components like clusters, workspaces, and namespaces as in the getting started guide. If you have different names, make sure to update the sample configuration files in this guide.
  1. Complete the multicluster getting started guide to set up the following testing environment.
    • Three clusters along with environment variables for the clusters and their Kubernetes contexts.
    • The Gloo Platform CLI, meshctl, along with other CLI tools such as kubectl and istioctl.
    • The Gloo management server in the management cluster, and the Gloo agents in the workload clusters.
    • Istio installed in the workload clusters.
    • A simple Gloo workspace setup.
  2. Install Bookinfo and other sample apps.

Configure adaptive request concurrency policies

You can apply an adaptive request concurrency policy at the destination level. For more information, see Applying policies.

Review the following sample configuration file.

apiVersion: resilience.policy.gloo.solo.io/v2
kind: AdaptiveRequestConcurrencyPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: simple-adaptive-request-concurrency-policy
  namespace: httpbin
spec:
  applyToDestinations:
  - port:
      number: 8000
    selector:
      labels:
        app: httpbin
  config:
    concurrencyLimitExceededStatus: 503
    concurrencyUpdateIntervalMillis: 3000
    maxConcurrencyLimit: 6
    minRttCalcParams:
      bufferPercentile: 0
      intervalMillis: 1000
      jitterPercentile: 0
      minConcurrency: 6
      requestCount: 1

Review the following table to understand this configuration. For more information, see the API docs.

| Setting | Description |
|---|---|
| spec.applyToDestinations | Configure which destinations to apply the policy to, by using labels. Destinations can be a Kubernetes service, VirtualDestination, or ExternalService. If you do not specify any destinations or routes, the policy applies to all destinations in the workspace by default. If you do not specify any destinations but you do specify a route, the policy applies to the route but to no destinations. |
| spec.config.concurrencyLimitExceededStatus | The custom HTTP response status code to return to the client when the concurrency limit is exceeded. In this example, new client requests that exceed the current limit receive a 503 HTTP response code. |
| spec.config.concurrencyUpdateIntervalMillis | The period of time during which request latency samples are taken to recalculate the concurrency limit. This example gathers request latency samples in a time frame of 3000ms. |
| spec.config.maxConcurrencyLimit | The upper bound on the calculated concurrency limit. This example caps the concurrency limit at 6 concurrent requests, even if the calculated concurrency limit exceeds this value. |
| spec.config.minRttCalcParams.bufferPercentile | Add a buffer to the measured minRTT to stabilize natural variability in latency. The buffer is represented as a percentage of the measured value, and can be adjusted to allow more or less tolerance to the sampled latency values. This example does not add a buffer. |
| spec.config.minRttCalcParams.intervalMillis | The amount of time between each minRTT remeasurement. This example recalculates the minRTT every 1000ms. |
| spec.config.minRttCalcParams.jitterPercentile | Add a random delay to the start of each minRTT measurement, represented as a percentage of the interval between each remeasurement (spec.config.minRttCalcParams.intervalMillis). For example, if the interval is 1000ms and the jitter is 15%, the next minRTT measurement begins in the range of 1000ms - 1150ms, because a delay between 0ms - 150ms is added to the 1000ms interval. This example does not add a jitter. |
| spec.config.minRttCalcParams.minConcurrency | The concurrency limit that is temporarily in effect while the minRTT is being measured. This example temporarily sets the concurrency limit to 6 concurrent requests. |
| spec.config.minRttCalcParams.requestCount | The number of requests to sample during each minRTT measurement window. This example gathers 1 request latency sample. |
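
The jitter arithmetic described for jitterPercentile can be checked with a small shell sketch. The function name `jitter_window` is hypothetical and is not part of any Gloo or Envoy tooling. It only reproduces the range calculation from the description, assuming whole-number millisecond and percentage inputs.

```shell
# Hypothetical helper (not part of Gloo or Envoy): prints the earliest and
# latest start time, in ms, of the next minRTT measurement.
# $1 = intervalMillis, $2 = jitterPercentile (whole-number percent)
jitter_window() {
  jw_interval_ms=$1
  jw_jitter_pct=$2
  # The maximum random delay is a percentage of the remeasurement interval.
  jw_max_delay_ms=$(( jw_interval_ms * jw_jitter_pct / 100 ))
  echo "${jw_interval_ms} $(( jw_interval_ms + jw_max_delay_ms ))"
}

jitter_window 1000 15   # prints "1000 1150"
jitter_window 1000 0    # prints "1000 1000" (the sample policy: no jitter)
```

With the sample policy values (an interval of 1000ms and a jitter of 0), both bounds are the same, so each remeasurement starts exactly one interval after the previous one.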

Verify adaptive request concurrency policies

Verify that the policy configures adaptive concurrency in the Envoy filter of the selected destination.

  1. Create a virtual destination for the httpbin sample app.

    kubectl apply --context $REMOTE_CONTEXT1 -f- <<EOF
    apiVersion: networking.gloo.solo.io/v2
    kind: VirtualDestination
    metadata:
      name: httpbin
      namespace: httpbin
    spec:
      hosts:
      - httpbin.httpbin
      ports:
      - number: 80
        protocol: HTTP
        targetPort:
          name: http
      services:
      - labels:
          app: httpbin
    EOF
    
  2. Apply the example adaptive request concurrency policy for the httpbin app.

    kubectl apply --context $REMOTE_CONTEXT1 -f- <<EOF
    apiVersion: resilience.policy.gloo.solo.io/v2
    kind: AdaptiveRequestConcurrencyPolicy
    metadata:
      annotations:
        cluster.solo.io/cluster: ""
      name: simple-adaptive-request-concurrency-policy
      namespace: httpbin
    spec:
      applyToDestinations:
      - port:
          number: 8000
        selector:
          labels:
            app: httpbin
      config:
        concurrencyLimitExceededStatus: 503
        concurrencyUpdateIntervalMillis: 3000
        maxConcurrencyLimit: 6
        minRttCalcParams:
          bufferPercentile: 0
          intervalMillis: 1000
          jitterPercentile: 0
          minConcurrency: 6
          requestCount: 1
    EOF
    
  3. Verify that the configuration is applied in the Envoy filter.

    kubectl get envoyfilter -n httpbin --context $REMOTE_CONTEXT1 -o yaml
    

    Example output:

    ...
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.adaptive_concurrency
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.filters.http.adaptive_concurrency.v3.AdaptiveConcurrency
            concurrencyLimitExceededStatus:
              code: ServiceUnavailable
            gradientControllerConfig:
              concurrencyLimitParams:
                concurrencyUpdateInterval: 3s
                maxConcurrencyLimit: 6
              minRttCalcParams:
                buffer: {}
                interval: 1s
                jitter: {}
                minConcurrency: 6
                requestCount: 1
              sampleAggregatePercentile:
                value: 50
    workloadSelector:
      labels:
        app: httpbin
    ...
    
  4. Optional: Clean up the resources that you created.

    kubectl -n httpbin delete virtualdestination httpbin --context $REMOTE_CONTEXT1
    kubectl -n httpbin delete AdaptiveRequestConcurrencyPolicy simple-adaptive-request-concurrency-policy --context $REMOTE_CONTEXT1
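
If you prefer not to scan the full YAML output in the verification step by eye, a small filter along the following lines might help. This is a minimal sketch: the function name `check_adaptive_concurrency` is hypothetical, and it assumes only that the Envoy filter output contains the filter name and field names shown in the example output above. It does not parse YAML; it searches the text.

```shell
# Hypothetical helper: reads EnvoyFilter YAML on stdin and reports whether the
# adaptive concurrency filter is present, along with its main limit settings.
check_adaptive_concurrency() {
  cac_yaml=$(cat)
  if printf '%s\n' "$cac_yaml" | grep -qF 'envoy.filters.http.adaptive_concurrency'; then
    echo "adaptive concurrency filter found"
    # Print the matching settings lines with their leading indentation removed.
    printf '%s\n' "$cac_yaml" \
      | grep -E 'maxConcurrencyLimit|minConcurrency|concurrencyUpdateInterval' \
      | sed 's/^ *//'
  else
    echo "adaptive concurrency filter not found"
    return 1
  fi
}

# Usage, assuming the REMOTE_CONTEXT1 environment variable from this guide:
# kubectl get envoyfilter -n httpbin --context "$REMOTE_CONTEXT1" -o yaml | check_adaptive_concurrency
```

Run the helper before you clean up the resources, because the Envoy filter is removed along with the policy.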