Retry and timeout

Configure how long the proxy waits for a reply and how many times it retries a failed request.

Reduce transient failures and hanging systems by setting retries and timeouts. For more information, see the API docs.

If you import or export resources across workspaces, your policies might not apply. For more information, see Import and export policies.

About

You can use failover, outlier detection, and retry timeout policies together to build a more resilient application network. For example, an outlier detection policy can remove unhealthy destinations, a failover policy can redirect traffic to healthy destinations, and a retry policy can retry requests in case of failure. Review the following table to understand what each policy does.

Policy	Purpose
Failover	Choose destinations to re-route traffic to, based on the closest locality.
Outlier detection	Determine when and for how long to remove unhealthy destinations from the pool of healthy destinations.
Retry timeout	Decide how many times to retry requests before the outlier detection policy considers the request as failing and removes the service from the pool of healthy destinations.

About timeouts

A timeout is the amount of time that an Envoy proxy waits for replies from a service, ensuring that services don’t hang around waiting for replies forever. This allows calls to succeed or fail within a predictable timeframe.

By default, the Envoy timeout for HTTP requests is disabled in Istio. This impacts the default timeouts depending on the type of gateway as follows:

- For north-south traffic through the ingress gateway, no default timeout is applied.
- For service mesh traffic through the Istio east-west gateway, the Istio default timeout applies.

For some applications and services, Istio’s default timeout might not be appropriate. For example, a timeout that is too long can result in excessive latency from waiting for replies from failing services. On the other hand, a timeout that is too short can result in calls failing unnecessarily while waiting for an operation that needs responses from multiple services.

To find and use your optimal timeout settings, you can set timeouts dynamically per route with Gloo’s retry timeout policy.

For more information, see the Istio documentation.

About retries

A retry specifies the maximum number of times a gateway’s Envoy proxy attempts to connect to an upstream service if the initial call fails. Retries can enhance service availability and application performance by making sure that calls don’t fail permanently because of transient problems such as a temporarily overloaded service or network.

The interval between retries (25ms+) is variable and determined automatically by the gateway, to prevent the called service from being overwhelmed with requests. The default retry behavior for HTTP requests is to retry twice before returning the error.
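
As a rough illustration of how a variable interval like this can grow, the following sketch computes the upper bound of a jittered exponential back-off. It assumes Envoy's documented defaults (a 25ms base interval, capped at 10 times the base), which the gateway's actual settings might override. The `backoff_upper_ms` function is a hypothetical helper, not part of any Gloo or Istio tooling.

```shell
# Exclusive upper bound, in ms, of the jittered back-off before retry N.
# Assumption: 25ms base interval and a cap of 10x the base (Envoy's documented defaults).
backoff_upper_ms() (
  n=$1
  base=${2:-25}
  cap=$((base * 10))
  pow=1
  i=0
  # Compute 2^n with a loop to stay within POSIX shell arithmetic.
  while [ "$i" -lt "$n" ]; do
    pow=$((pow * 2))
    i=$((i + 1))
  done
  upper=$(( (pow - 1) * base ))
  if [ "$upper" -gt "$cap" ]; then
    upper=$cap
  fi
  echo "$upper"
)

# Print the range that each retry's delay is drawn from.
for retry in 1 2 3 4 5; do
  echo "retry $retry: 0 to $(backoff_upper_ms "$retry")ms"
done
```

Under these assumptions, the first retry waits a random interval below 25ms, and later retries are capped at 250ms.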

Like timeouts, the gateway’s default retry behavior might not suit your application needs in terms of latency or availability. For example, too many retries to a failed service can slow things down. Also like timeouts, you can adjust your retry settings on a per-route basis.

For more information, see the Istio documentation.

Before you begin

This guide assumes that you use the same names for components like clusters, workspaces, and namespaces as in the getting started. If you have different names, make sure to update the sample configuration files in this guide.

Complete the multicluster getting started guide to set up the following testing environment.
- Three clusters along with environment variables for the clusters and their Kubernetes contexts.
- The Gloo meshctl CLI, along with other CLI tools such as kubectl and istioctl.
- The Gloo management server in the management cluster, and the Gloo agents in the workload clusters.
- Istio installed in the workload clusters.
- A simple Gloo workspace setup.
Install Bookinfo and other sample apps.

Configure retry and timeout policies

You can apply a retry or timeout policy at the route level. For more information, see Applying policies.

When you apply this custom resource to your cluster, Gloo Mesh Enterprise automatically checks the configuration against validation rules and value constraints. You can also run a pre-admission validation check by using the meshctl x validate resources command. For more information, see the resource validation overview and the CLI command reference.

You cannot apply this policy to a route that already has a redirect, rewrite, or direct response action. Keep in mind that these actions might not be explicitly defined in the route configuration. For example, invalid routes are automatically replaced with a direct response action, such as when the backing destination is wrong. First, verify that your route configuration is correct. Then, decide whether to apply the policy. To apply the policy, remove any redirect, rewrite, or direct response actions. To keep the actions and not apply the policy, change the route labels of either the policy or the route.

Review the following sample configuration files.

apiVersion: resilience.policy.gloo.solo.io/v2
kind: RetryTimeoutPolicy
metadata:
  name: retry-only
  namespace: bookinfo
  annotations:
    cluster.solo.io/cluster: $REMOTE_CLUSTER1
spec:
  applyToRoutes:
    - route:
        labels:
          route: ratings # matches on route table route's labels
  config:
    retries:
      attempts: 5 # optional (default is 2)
      perTryTimeout: 2s
      # retryOn specifies the conditions that trigger a retry. Specify one or more conditions as a comma-delimited list.
      retryOn: "connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes"
      # retryRemoteLocalities specifies whether retries can go to other localities. Defaults to false.
      retryRemoteLocalities: true

apiVersion: resilience.policy.gloo.solo.io/v2
kind: RetryTimeoutPolicy
metadata:
  name: retry-timeout
  namespace: bookinfo
  annotations:
    cluster.solo.io/cluster: $REMOTE_CLUSTER1
spec:
  applyToRoutes:
    - route:
        labels:
          route: ratings # matches on route table route's labels
  config:
    requestTimeout: 2s
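
To reason about how these settings interact, the following sketch estimates the worst-case time that a request can spend on tries alone. It assumes that `attempts` counts retries after the initial call, as in Istio's retry settings, and it ignores the back-off between retries. The `worst_case_seconds` function is a hypothetical helper for illustration only.

```shell
# Worst-case seconds spent on tries alone, ignoring back-off between retries.
# Assumption: `attempts` counts retries made after the initial call.
worst_case_seconds() (
  attempts=$1
  per_try=$2
  echo $(( (attempts + 1) * per_try ))
)

# Values from the retry-only sample: attempts: 5, perTryTimeout: 2s
worst_case_seconds 5 2
```

With the `retry-only` sample values, the estimate is 12 seconds, so you might pair retries with an overall `requestTimeout` to keep total latency predictable.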

Verify retry and timeout policies

Apply the example retry policy in the cluster with the Bookinfo workspace in your example setup.

kubectl apply --context ${REMOTE_CONTEXT1} -f - << EOF
apiVersion: resilience.policy.gloo.solo.io/v2
kind: RetryTimeoutPolicy
metadata:
  name: retry-only
  namespace: bookinfo
  annotations:
    cluster.solo.io/cluster: $REMOTE_CLUSTER1
spec:
  applyToRoutes:
    - route:
        labels:
          route: reviews # matches on route table route's labels
  config:
    retries:
      attempts: 5 # optional (default is 2)
      perTryTimeout: 2s
      # retryOn specifies the conditions that trigger a retry. Specify one or more conditions as a comma-delimited list.
      retryOn: "connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes,5xx"
      # retryRemoteLocalities specifies whether retries can go to other localities. Defaults to false.
      retryRemoteLocalities: true
EOF

Create a route table for the reviews app. Because retry and timeout policies apply at the route level, Gloo checks for the route in a route table resource.

kubectl apply --context ${REMOTE_CONTEXT1} -f - << EOF
apiVersion: networking.gloo.solo.io/v2
kind: RouteTable
metadata:
  name: reviews-rt
  namespace: bookinfo
spec:
  hosts:
  - reviews
  http:
  - forwardTo:
      destinations:
      - ref:
          name: reviews
          namespace: bookinfo
          cluster: ${REMOTE_CLUSTER1}
    labels:
      route: reviews
  workloadSelectors:
  - {}
EOF

Review the following table to understand this configuration. For more information, see the API docs.

Setting	Description
`hosts`	The host that the route table routes traffic for. In this example, the `reviews` host matches the reviews service within the mesh.
`http.forwardTo.destinations`	The destination to forward requests that come in along the host route. In this example, the reviews service is selected.
`http.labels`	The label for the route. This label must match the label that the policy selects.
`workloadSelectors`	The source workloads within the mesh that this route table routes traffic for. In the example, all workloads are selected. This way, the curl container that you create in subsequent steps can send a request along the reviews route.

Send the reviews v1 and v2 apps to sleep, to mimic an app failure.

kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":["sleep","20h"]}]}}}}'
kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v2 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":["sleep","20h"]}]}}}}'

Enable Istio debug logging on the reviews v1 app.

  istioctl pc log --level debug deploy/reviews-v1 -n bookinfo --context $REMOTE_CONTEXT1

Send a request to the reviews app from within the mesh. Create a temporary curl pod in the bookinfo namespace, so that you can test the app setup. You can also use this method in Kubernetes 1.23 or later, but an ephemeral container might be simpler.
1. Create the curl pod.
```
  kubectl run -it -n bookinfo --context ${REMOTE_CONTEXT1} curl --image=curlimages/curl:7.73.0 --rm  -- sh
  
```
2. Send a request to the reviews app.
```
  curl -v http://reviews:9080/reviews/1
  
```
3. Exit the temporary pod. The pod deletes itself.
```
  exit
  
```

Verify that the retries appear in the logs for the reviews v1 app.

  kubectl logs deploy/reviews-v1 -c istio-proxy -n bookinfo --context $REMOTE_CONTEXT1

Example output:

  'x-envoy-attempt-count', '5'
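
If the debug logs are long, a small filter can help you find the value. The following sketch assumes that the attempt count appears in the log format shown in the example output. The `max_attempt_count` function is a hypothetical helper, not a Gloo or Istio command.

```shell
# Print the highest x-envoy-attempt-count value found on standard input.
# Assumption: log lines contain the header as 'x-envoy-attempt-count', '<number>'
max_attempt_count() {
  sed -n "s/.*'x-envoy-attempt-count', '\([0-9][0-9]*\)'.*/\1/p" | sort -n | tail -n 1
}

# Usage, with the same log command as in the previous step:
# kubectl logs deploy/reviews-v1 -c istio-proxy -n bookinfo --context $REMOTE_CONTEXT1 | max_attempt_count
```

The function prints nothing if no matching header is logged, which can indicate that the request did not reach the proxy or that debug logging is not enabled.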

Cleanup

You can optionally remove the resources that you set up as part of this guide.

istioctl pc log -n bookinfo --context $REMOTE_CONTEXT1 --level off deploy/reviews-v1
kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":[]}]}}}}'
kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v2 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":[]}]}}}}'
kubectl --context ${REMOTE_CONTEXT1} -n bookinfo delete routetable reviews-rt
kubectl --context ${REMOTE_CONTEXT1} -n bookinfo delete RetryTimeoutPolicy retry-only
