Adaptive request concurrency
Dynamically adjust the maximum number of concurrent requests that can be sent to a destination.
About adaptive concurrency
The adaptive concurrency filter automatically adjusts the number of concurrent requests that can be sent to a backend service, based on the observed latency of the responses from that service. It calculates the concurrency limit by sampling the latency of completed requests in a time window, and comparing the measured samples against the expected latency for the service.
For example, consider a web service that receives incoming user requests. Under normal load conditions, it can handle 100 requests per second. During peak times, the number of requests might increase significantly. With adaptive concurrency, the listener starts by allowing a certain number of concurrent requests, such as 100. If the response times are within acceptable limits, the listener keeps this limit. However, if the response times begin to increase, indicating potential strain on the service, the listener automatically reduces the number of concurrent requests that it allows, such as to 80. This reduction helps prevent the service from becoming overwhelmed, and keeps response times in check.
In this way, the policy can help you protect apps from being overloaded by too many requests, optimize response times, and balance resource utilization. Keep in mind that with this policy enabled, clients can start to notice degraded service before your backend apps are actually unavailable. You can continue to fine-tune the policy settings based on usage to find the right balance between service availability and concurrency protection for your apps.
How is the filter configured?
The filter uses a gradient controller to calculate forwarding decisions based on an ideal, minimum round-trip time (minRTT) for a service. The controller periodically measures the minRTT by allowing only a very low outstanding request count to a service, and measuring the latency under these ideal conditions. Most of the settings in the policy configure how the gradient controller measures the minRTT. The controller then uses the minRTT value to calculate the concurrency limit. For more information about the formulas that the controller calculates, see the Envoy adaptive concurrency docs.
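The following is a minimal sketch of how a gradient controller of this kind might update the limit, loosely modeled on the formulas described in the Envoy adaptive concurrency docs. The function name, the clamping of the gradient to the range 0.5 to 2.0, and the square-root headroom are assumptions made for illustration. They are not taken from this guide, so check the Envoy docs for the exact behavior.

```python
import math


def next_concurrency_limit(current_limit, min_rtt_ms, sample_rtt_ms,
                           buffer_pct=0.0, max_limit=1000, min_limit=1):
    """Illustrative gradient update (hypothetical helper, not the Envoy code)."""
    # The buffer is a percentage of the measured minRTT.
    buffer_ms = min_rtt_ms * buffer_pct / 100.0

    # The gradient shrinks as sampled latency grows beyond the ideal minRTT.
    gradient = (min_rtt_ms + buffer_ms) / sample_rtt_ms

    # Assumption: clamp the gradient so that one update cannot swing too far.
    gradient = max(0.5, min(2.0, gradient))

    # Assumption: add square-root headroom so the limit can grow when
    # latency is healthy.
    scaled = current_limit * gradient
    new_limit = int(scaled + math.sqrt(scaled))

    # Keep the result within the configured bounds.
    return max(min_limit, min(max_limit, new_limit))


# Sampled latency equals the ideal latency, so the limit drifts upward.
healthy = next_concurrency_limit(100, min_rtt_ms=10, sample_rtt_ms=10)

# Sampled latency is double the ideal latency, so the limit is cut back.
strained = next_concurrency_limit(100, min_rtt_ms=10, sample_rtt_ms=20)

# With a cap of 6, the calculated value is bounded by the cap.
capped = next_concurrency_limit(6, min_rtt_ms=10, sample_rtt_ms=10, max_limit=6)
```

In this sketch, `max_limit` plays the role of the maxConcurrencyLimit setting and `buffer_pct` the role of bufferPercentile.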
How is this different than Envoy circuit breaking?
The goal of adaptive concurrency is to optimize throughput without overloading backend services. By dynamically adjusting the concurrency of traffic, the filter continually finds the ideal number of concurrent requests to balance between underutilization and overloading.
Circuit breaking monitors for certain thresholds, such as the number of failed requests, timeouts, or system errors. When these thresholds are exceeded, the circuit trips, and further requests are blocked until the condition that tripped the circuit breaker clears. Unlike adaptive concurrency, circuit breaking does not continuously adapt these thresholds based on fluctuating network conditions.
More information
For more information, see the following resources.
If you import or export resources across workspaces, your policies might not apply. For more information, see Import and export policies.
Before you begin
This guide assumes that you use the same names for components like clusters, workspaces, and namespaces as in the getting started guide. If you have different names, make sure to update the sample configuration files in this guide.
Complete the multicluster getting started guide to set up the following testing environment.
- Three clusters along with environment variables for the clusters and their Kubernetes contexts.
- The Gloo meshctl CLI, along with other CLI tools such as kubectl and istioctl.
- The Gloo management server in the management cluster, and the Gloo agents in the workload clusters.
- Istio installed in the workload clusters.
- A simple Gloo workspace setup.
- Install Bookinfo and other sample apps.
Configure adaptive request concurrency policies
You can apply an adaptive request concurrency policy at the destination level. For more information, see Applying policies.
Review the following sample configuration file.
apiVersion: resilience.policy.gloo.solo.io/v2
kind: AdaptiveRequestConcurrencyPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: simple-adaptive-request-concurrency-policy
  namespace: httpbin
spec:
  applyToDestinations:
  - port:
      number: 8000
    selector:
      labels:
        app: httpbin
  config:
    concurrencyLimitExceededStatus: 503
    concurrencyUpdateIntervalMillis: 3000
    maxConcurrencyLimit: 6
    minRttCalcParams:
      bufferPercentile: 0
      intervalMillis: 1000
      jitterPercentile: 0
      minConcurrency: 6
      requestCount: 1
Review the following table to understand this configuration. For more information, see the API docs.
Setting | Description |
---|---|
spec.applyToDestinations | Use labels to apply the policy to destinations. Destinations might be a Kubernetes service, VirtualDestination, or ExternalService (if supported by the policy). If you do not specify any destinations or routes, the policy applies to all destinations in the workspace by default. If you do not specify any destinations but you do specify a route, the policy applies to the route but to no destinations. |
spec.config.concurrencyLimitExceededStatus | Return a custom HTTP response status code to the client when the concurrency limit is exceeded. In this example, any new client requests that exceed the current concurrency limit receive a 503 HTTP response code. |
spec.config.concurrencyUpdateIntervalMillis | The period of time during which request latency samples are taken to recalculate the concurrency limit. This example gathers request latency samples in a timeframe of 3000ms. |
spec.config.maxConcurrencyLimit | The allowed upper bound on the calculated concurrency limit. This example caps the concurrency limit at a maximum of 6 concurrent requests, in the case that the calculated concurrency limit exceeds this value. |
spec.config.minRttCalcParams.bufferPercentile | Add a buffer to the measured minRTT to stabilize natural variability in latency. The buffer is represented as a percentage of the measured value, and can be adjusted to allow more or less tolerance to the sampled latency values. This example does not add a buffer. |
spec.config.minRttCalcParams.intervalMillis | The amount of time between each minRTT remeasurement. This example recalculates the minRTT every 1000ms. |
spec.config.minRttCalcParams.jitterPercentile | Add a random delay to the start of each minRTT measurement, represented as a percentage of the interval between each remeasurement (spec.config.minRttCalcParams.intervalMillis). For example, if the interval is 1000ms and the jitter is 15%, the next minRTT measurement begins in the range of 1000ms - 1150ms, because a delay between 0ms - 150ms is added to the 1000ms interval. This example does not add a jitter. |
spec.config.minRttCalcParams.minConcurrency | Temporarily set the concurrency limit until the latest minRTT measurement is complete. This example temporarily sets the concurrency limit to 6 concurrent requests. |
spec.config.minRttCalcParams.requestCount | The number of requests to sample during each minRTT measurement window before the minRTT value is updated. This example gathers 1 request latency sample. |
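
The arithmetic behind the jitter and buffer settings in the table can be sketched as follows. This is an illustration only, under the assumption that the jitter is drawn uniformly from the allowed range. The function names are hypothetical and are not part of the Gloo or Envoy APIs.

```python
import random


def next_min_rtt_delay_ms(interval_ms, jitter_pct):
    """Delay until the next minRTT measurement, including random jitter."""
    # The jitter is a percentage of the remeasurement interval.
    max_jitter_ms = interval_ms * jitter_pct / 100.0
    # Assumption: the added delay is uniform between 0 and the maximum jitter.
    return interval_ms + random.uniform(0.0, max_jitter_ms)


def buffered_min_rtt_ms(min_rtt_ms, buffer_pct):
    """The minRTT plus a tolerance buffer, as a percentage of the measured value."""
    return min_rtt_ms * (1 + buffer_pct / 100.0)


# An interval of 1000ms with 15% jitter gives a delay between 1000ms and 1150ms.
delay = next_min_rtt_delay_ms(1000, 15)

# The sample policy sets both values to 0, so no delay or tolerance is added.
no_jitter = next_min_rtt_delay_ms(1000, 0)
no_buffer = buffered_min_rtt_ms(10, 0)
```

With the values in the sample policy, each minRTT measurement starts exactly 1000ms after the previous one, and sampled latencies are compared against the measured minRTT without any added tolerance.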
Verify adaptive request concurrency policies
Verify that the policy configures adaptive concurrency in the Envoy filter of the selected destination.
Create a virtual destination for the httpbin sample app.
kubectl apply --context $REMOTE_CONTEXT1 -f- <<EOF
apiVersion: networking.gloo.solo.io/v2
kind: VirtualDestination
metadata:
  name: httpbin
  namespace: httpbin
spec:
  hosts:
  - httpbin.httpbin
  ports:
  - number: 80
    protocol: HTTP
    targetPort:
      name: http
  services:
  - labels:
      app: httpbin
EOF
Apply the example adaptive request concurrency policy for the httpbin app.
kubectl apply --context $REMOTE_CONTEXT1 -f- <<EOF
apiVersion: resilience.policy.gloo.solo.io/v2
kind: AdaptiveRequestConcurrencyPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: simple-adaptive-request-concurrency-policy
  namespace: httpbin
spec:
  applyToDestinations:
  - port:
      number: 8000
    selector:
      labels:
        app: httpbin
  config:
    concurrencyLimitExceededStatus: 503
    concurrencyUpdateIntervalMillis: 3000
    maxConcurrencyLimit: 6
    minRttCalcParams:
      bufferPercentile: 0
      intervalMillis: 1000
      jitterPercentile: 0
      minConcurrency: 6
      requestCount: 1
EOF
Verify that the configuration is applied in the Envoy filter.
kubectl get envoyfilter -n httpbin --context $REMOTE_CONTEXT1 -o yaml
Example output:
...
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.adaptive_concurrency
        typedConfig:
          '@type': type.googleapis.com/envoy.extensions.filters.http.adaptive_concurrency.v3.AdaptiveConcurrency
          concurrencyLimitExceededStatus:
            code: ServiceUnavailable
          gradientControllerConfig:
            concurrencyLimitParams:
              concurrencyUpdateInterval: 3s
              maxConcurrencyLimit: 6
            minRttCalcParams:
              buffer: {}
              interval: 1s
              jitter: {}
              minConcurrency: 6
              requestCount: 1
            sampleAggregatePercentile:
              value: 50
  workloadSelector:
    labels:
      app: httpbin
...
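If you prefer to check the result programmatically, a small helper along these lines could pick the adaptive concurrency settings out of the same command run with -o json instead of -o yaml. This is a sketch, not part of the product: the function name is hypothetical, and it assumes that the JSON output is a list with an items field and that each patch uses the patch.value.typedConfig layout shown in the example output.

```python
FILTER_NAME = "envoy.filters.http.adaptive_concurrency"


def find_adaptive_concurrency(envoyfilter_list):
    """Return the typedConfig of each adaptive concurrency patch that is found."""
    configs = []
    for item in envoyfilter_list.get("items", []):
        # Each EnvoyFilter can carry several config patches.
        for patch in item.get("spec", {}).get("configPatches", []):
            value = patch.get("patch", {}).get("value", {})
            if value.get("name") == FILTER_NAME:
                configs.append(value.get("typedConfig", {}))
    return configs


# A trimmed, hand-written stand-in for the parsed JSON output (illustrative only).
sample = {
    "items": [{
        "spec": {
            "configPatches": [{
                "patch": {
                    "operation": "INSERT_BEFORE",
                    "value": {
                        "name": FILTER_NAME,
                        "typedConfig": {
                            "gradientControllerConfig": {
                                "concurrencyLimitParams": {
                                    "concurrencyUpdateInterval": "3s",
                                    "maxConcurrencyLimit": 6,
                                },
                            },
                        },
                    },
                },
            }],
        },
    }],
}

found = find_adaptive_concurrency(sample)
limit_params = found[0]["gradientControllerConfig"]["concurrencyLimitParams"]
```

To use the helper against a real cluster, parse the command output with json.loads and pass the resulting dictionary to the function. An empty list means that no adaptive concurrency filter was found in the namespace.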
Cleanup
You can optionally remove the resources that you set up as part of this guide.
kubectl -n httpbin delete virtualdestination httpbin --context $REMOTE_CONTEXT1
kubectl -n httpbin delete AdaptiveRequestConcurrencyPolicy simple-adaptive-request-concurrency-policy --context $REMOTE_CONTEXT1