Retry and timeout

Reduce transient failures and hanging systems by setting retries and timeouts. For more information, see the API docs.

If you import or export resources across workspaces, your policies might not apply. For more information, see Import and export policies.

About timeouts

A timeout is the amount of time that Gloo Gateway waits for replies from an upstream service before the service is considered unavailable. This setting helps prevent your apps from hanging or failing when no response is returned within a specific timeframe. With timeouts, calls either succeed or fail within a predictable timeframe.

The time that an app needs to process a request can vary widely, so applying the same timeout across all services can cause problems. For example, a timeout that is too long can result in excessive latency while callers wait for replies from failing services. A timeout that is too short can cause calls to fail unnecessarily while they wait for an operation that needs responses from multiple services.

You can use the timeout settings in this policy to set timeouts for each route that your Gateway serves.

By default, request timeouts are disabled in Gloo Gateway, which means that no timeouts are enforced for your routes.

About retries

A retry specifies the maximum number of times Gloo Gateway attempts to connect to an upstream service if the initial call fails. Retries can enhance service availability and application performance by making sure that calls don’t fail permanently because of transient problems such as a temporarily overloaded service or network.

The interval between retries (25ms+) is variable and determined automatically by Gloo Gateway, to prevent the called service from being overwhelmed with requests. The default retry behavior for HTTP requests is to retry twice before returning the error.

Like timeouts, Gloo Gateway's default retry behavior might not suit your application's needs for latency or availability. For example, too many retries to a failed service add latency for the caller and extra load on the service. Also like timeouts, you can adjust your retry settings on a per-route basis.

Before you begin

This guide assumes that you use the same names for components like clusters, workspaces, and namespaces as in the getting started guide, and that your Kubernetes context is set to the cluster where you store your Gloo config (typically the management cluster). If you use different names, make sure to update the sample configuration files in this guide.

Follow the getting started instructions to:

  1. Set up Gloo Gateway in a single cluster.
  2. Deploy sample apps.
  3. Configure an HTTP listener on your gateway and set up basic routing for the sample apps.

Configure retry and timeout policies

You can apply a retry or timeout policy at the route level. For more information, see Applying policies.

Review the following two sample configuration files. The first policy configures retries only. The second policy configures a request timeout only.

apiVersion: resilience.policy.gloo.solo.io/v2
kind: RetryTimeoutPolicy
metadata:
  name: retry-only
  namespace: bookinfo
spec:
  applyToRoutes:
    - route:
        labels:
          route: ratings # matches on route table route's labels
  config:
    retries:
      attempts: 5 # optional (default is 2)
      perTryTimeout: 2s
      # retryOn specifies the conditions under which a retry takes place. Specify one or more conditions as a comma-delimited list.
      retryOn: "connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes"
      # retryRemoteLocalities specifies whether retries can go to other localities. Defaults to false.
      retryRemoteLocalities: true
---
apiVersion: resilience.policy.gloo.solo.io/v2
kind: RetryTimeoutPolicy
metadata:
  name: retry-timeout
  namespace: bookinfo
spec:
  applyToRoutes:
    - route:
        labels:
          route: ratings # matches on route table route's labels
  config:
    requestTimeout: 2s
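
The two samples above set retries and the request timeout in separate policies. As a minimal sketch, and assuming that the `retries` and `requestTimeout` fields can be set together under `config` in a single policy (check the API docs to confirm), a combined policy might look like the following. The policy name `retry-and-timeout` and the specific values are illustrative assumptions, not recommendations from this guide.

```yaml
apiVersion: resilience.policy.gloo.solo.io/v2
kind: RetryTimeoutPolicy
metadata:
  name: retry-and-timeout # hypothetical name, not used elsewhere in this guide
  namespace: bookinfo
spec:
  applyToRoutes:
    - route:
        labels:
          route: ratings # matches on route table route's labels
  config:
    # Assumed overall budget for the request, including any retries.
    requestTimeout: 10s
    retries:
      attempts: 3 # illustrative value
      perTryTimeout: 2s # each individual try gets its own, shorter timeout
      retryOn: "connect-failure,refused-stream"
```

In this sketch, the per-try timeout is kept well below the request timeout so that several tries can fit inside the overall budget.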

Verify retry and timeout policies

  1. Apply the previous example retry policy to the cluster that contains the Bookinfo workspace in your example setup.

    kubectl apply -f - <<EOF
    apiVersion: resilience.policy.gloo.solo.io/v2
    kind: RetryTimeoutPolicy
    metadata:
      name: retry-only
      namespace: bookinfo
    spec:
      applyToRoutes:
        - route:
            labels:
              route: ratings # matches on route table route's labels
      config:
        retries:
          attempts: 5 # optional (default is 2)
          perTryTimeout: 2s
          # retryOn specifies the conditions under which a retry takes place. Specify one or more conditions as a comma-delimited list.
          retryOn: "connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes"
          # retryRemoteLocalities specifies whether retries can go to other localities. Defaults to false.
          retryRemoteLocalities: true
    EOF
    
  2. Verify that you can still send requests to the ratings app. Use the first command for an HTTP listener or the second command for an HTTPS listener.

    curl -vik --resolve www.example.com:80:${INGRESS_GW_IP} http://www.example.com:80/ratings/1
    
    curl -vik --resolve www.example.com:443:${INGRESS_GW_IP} https://www.example.com:443/ratings/1
    

  3. Send the ratings app to sleep to mimic an unresponsive app.

    kubectl -n bookinfo patch deploy ratings-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"ratings","command":["sleep","20h"]}]}}}}' 
    
  4. Verify that requests to the ratings app now fail. Use the first command for an HTTP listener or the second command for an HTTPS listener.

    curl -vik --resolve www.example.com:80:${INGRESS_GW_IP} http://www.example.com:80/ratings/1
    
    curl -vik --resolve www.example.com:443:${INGRESS_GW_IP} https://www.example.com:443/ratings/1
    

  5. Optional: Clean up the retry policy.

    kubectl delete RetryTimeoutPolicy retry-only -n bookinfo
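
The curl commands in the verification steps print full verbose output, which can make it hard to compare the result before and after the ratings app is put to sleep. As a rough sketch, a small shell helper can reduce each check to the HTTP status code and the total response time. The function name `probe_ratings` is hypothetical, and the 60-second `--max-time` value is an assumed client-side cap so that the command itself cannot hang. The helper relies on the same `INGRESS_GW_IP` environment variable as the commands in this guide.

```shell
# probe_ratings SCHEME PORT
# Sends one request to the ratings route and prints "<http_status> <total_time>s".
# Hypothetical helper; assumes INGRESS_GW_IP is set as in the steps above.
probe_ratings() {
  scheme="$1"
  port="$2"
  # -s silences progress output, -k skips TLS verification (as in the guide),
  # -o /dev/null discards the response body, -w prints the summary line.
  curl -sk -o /dev/null \
    --max-time 60 \
    --resolve "www.example.com:${port}:${INGRESS_GW_IP}" \
    -w '%{http_code} %{time_total}s\n' \
    "${scheme}://www.example.com:${port}/ratings/1"
}
```

For example, run `probe_ratings http 80` (or `probe_ratings https 443`) once before and once after you patch the ratings deployment, and compare the two lines. After the patch, expect the status code to change from a success code to an error code.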