About

Failover is an important part of building resilient apps in multicluster environments. You set up locality-aware failover by specifying the regions, zones, and subzones to reroute traffic to. If the closest locality fails, responses can be served from the next closest locality.

Failover with other policies

You can use failover, outlier detection, and retry timeout policies together to build a more resilient application network. For example, an outlier detection policy can remove unhealthy destinations, a failover policy can redirect traffic to healthy destinations, and a retry policy can retry requests in case of failure. Review the following table to understand what each policy does.

| Policy | Purpose |
|---|---|
| Failover | Choose destinations to reroute traffic to, based on the closest locality. |
| Outlier detection | Determine when and for how long to remove unhealthy destinations from the pool of healthy destinations. |
| Retry timeout | Decide how many times to retry failed requests before the outlier detection policy considers the destination unhealthy and removes it from the pool of healthy destinations. |
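
For example, a retry timeout policy for the reviews route might look like the following minimal sketch. The policy name, route label, and timing values are illustrative assumptions, and the exact RetryTimeoutPolicy fields can vary by version, so compare against the API docs before you apply it. Outlier detection and failover policies are shown later in this guide.

apiVersion: resilience.policy.gloo.solo.io/v2
kind: RetryTimeoutPolicy
metadata:
  name: reviews-retry
  namespace: bookinfo
spec:
  # Retry policies select routes rather than destinations. The route label is an
  # assumption about how your route table labels the reviews route.
  applyToRoutes:
  - route:
      labels:
        route: reviews
  config:
    retries:
      # Retry a failed request up to 2 more times, waiting at most 2 seconds per try.
      attempts: 2
      perTryTimeout: 2s
      # Conditions that trigger a retry, such as connection failures and 5xx responses.
      retryOn: connect-failure,refused-stream,5xx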

Locality settings

Istio has several locality settings that you can use to fine-tune traffic flow according to your operational needs. Review the following table to understand these settings and how to use them. The settings are mutually exclusive, so choose the one that best fits your use case. For more information, see the Istio docs for LocalityLoadBalancerSetting and locality failover.

| Istio locality setting | Description | Use case | Configuration in Gloo FailoverPolicy |
|---|---|---|---|
| distribute | Controls the percentage of traffic sent to endpoints in specific localities. You can set specific load balancing weights across different zones and regions. | Distribute traffic based on locality to enhance user experience in terms of performance, or to manage the load across resources more efficiently. Note that this setting does not fail over traffic, but instead distributes traffic. | In the localityMappings.to section, specify a region or zone to distribute traffic to, and set a weight field. The weights should add up to 100 across all the localities included in the mapping. For more information, see the distribute example. |
| failover | Specifies an alternative region to redirect traffic to when local endpoints become unhealthy. | Increase high availability and resilience by ensuring that traffic is served from the next best locality on failure. Note that this setting does not distribute traffic, but instead redirects traffic in case of failure. | In the localityMappings section, specify the regions to fail over traffic to. Do not include a weight field, because a weight changes the setting to the Istio distribute load balancing setting. Apply an OutlierDetectionPolicy along with the FailoverPolicy so that unhealthy destinations are removed from the pool of available destinations. For more information, see the failover example. |

Before you begin

  1. Complete the multicluster getting started guide to set up the following testing environment.

    • Three clusters along with environment variables for the clusters and their Kubernetes contexts.
    • The Gloo meshctl CLI, along with other CLI tools such as kubectl and istioctl.
    • The Gloo management server in the management cluster, and the Gloo agents in the workload clusters.
    • Istio installed in the workload clusters.
    • A simple Gloo workspace setup.
  2. Install Bookinfo and other sample apps.

Configure failover policies

You can apply a failover policy at the destination level. For more information, see Applying policies. Note that for a single destination, you cannot apply both a failover policy that specifies zones and subzones and a failover policy that specifies only regions. You can, however, apply multiple policies that all specify zones and subzones, or multiple policies that all specify only regions. In that case, ensure that the configurations of the policies do not overlap.

For example, if one failover policy reroutes traffic from us-east-1 to us-east-2, and another reroutes traffic from us-east-2 to eu-west-1, the configurations do not overlap. But if one failover policy reroutes traffic from us-east-1 to us-east-2, and another reroutes traffic from us-east-1 to eu-west-1, then the configurations overlap, and traffic might not be correctly rerouted.
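
The following sketch illustrates the non-overlapping case as two separate policies. The policy names are illustrative, and the region values match the previous paragraph:

# Reroutes traffic for the us-east-1 region to us-east-2.
apiVersion: resilience.policy.gloo.solo.io/v2
kind: FailoverPolicy
metadata:
  name: failover-from-us-east-1
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    localityMappings:
    - from:
        region: us-east-1
      to:
      - region: us-east-2
---
# Reroutes traffic for the us-east-2 region to eu-west-1. Because the from regions
# differ across the two policies, the configurations do not overlap.
apiVersion: resilience.policy.gloo.solo.io/v2
kind: FailoverPolicy
metadata:
  name: failover-from-us-east-2
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    localityMappings:
    - from:
        region: us-east-2
      to:
      - region: eu-west-1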

Distribute with weight example

Review the following example, which distributes traffic for the us-east region evenly across the us-east and us-west regions.

apiVersion: resilience.policy.gloo.solo.io/v2
kind: FailoverPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: locality-based-failover
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    localityMappings:
    - from:
        region: us-east
      to:
      - region: us-west
        weight: 50
      - region: us-east
        weight: 50
  

Review the following table to understand this configuration. For more information, see the API docs and the Istio docs.

| Setting | Description |
|---|---|
| applyToDestinations | Use labels to apply the policy to destinations. Destinations might be a Kubernetes service, VirtualDestination, or ExternalService (if supported by the policy). If you do not specify any destinations or routes, the policy applies to all destinations in the workspace by default. If you do not specify any destinations but you do specify a route, the policy applies to the route but to no destinations. |
| localityMappings | Map the localities to fail over traffic from one region, zone, or subzone to another in case of failure. The locality is determined by the Kubernetes labels on the node where the destination's app runs. For more information, see the Istio docs. |
| from | The locality of the destination where Gloo Mesh Enterprise originally tried to fulfill the request. In this example, the policy distributes traffic for the us-east region. |
| to | The localities of the destination where Gloo Mesh Enterprise can distribute requests. You must specify the region, and optionally the zone and subzone. Include the original region to keep the region in the distribution. In this example, the policy distributes traffic in an equal 50/50 split between the us-east and us-west regions. |
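
The split does not have to be even. The following sketch keeps most traffic in the original region and sends smaller shares elsewhere; the policy name, the eu-west region, and the specific weights are illustrative, and the weights still add up to 100:

apiVersion: resilience.policy.gloo.solo.io/v2
kind: FailoverPolicy
metadata:
  name: locality-based-distribute
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    localityMappings:
    - from:
        region: us-east
      to:
      # Keep most traffic in the original region and spill smaller shares to the others.
      - region: us-east
        weight: 80
      - region: us-west
        weight: 15
      - region: eu-west
        weight: 5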

Failover example

Review the following example, which fails over traffic from the us-east region to the us-west region in case of a failure.

apiVersion: resilience.policy.gloo.solo.io/v2
kind: FailoverPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: locality-based-failover
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    localityMappings:
    - from:
        region: us-east
      to:
      - region: us-west
  

Review the following table to understand this configuration. For more information, see the API docs and the Istio docs.

| Setting | Description |
|---|---|
| applyToDestinations | Use labels to apply the policy to destinations. Destinations might be a Kubernetes service, VirtualDestination, or ExternalService (if supported by the policy). If you do not specify any destinations or routes, the policy applies to all destinations in the workspace by default. If you do not specify any destinations but you do specify a route, the policy applies to the route but to no destinations. |
| localityMappings | Map the localities to fail over traffic from one region, zone, or subzone to another in case of failure. The locality is determined by the Kubernetes labels on the node where the destination's app runs. For more information, see the Istio docs. |
| from | The locality of the destination where Gloo Mesh Enterprise originally tried to fulfill the request. In this example, the policy fails over traffic from any destinations served in the us-east region. |
| to | The localities of the destination where Gloo Mesh Enterprise can reroute requests. You must specify the region, and optionally the zone and subzone. In this example, the policy reroutes traffic to any matching destinations only in the us-west region. |
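
Mappings can also be more granular than regions. The following sketch fails over traffic from one zone to another zone in the same region; the policy name and zone names are illustrative. Remember that you cannot combine a region-only policy with a zone or subzone policy for the same destination:

apiVersion: resilience.policy.gloo.solo.io/v2
kind: FailoverPolicy
metadata:
  name: zone-based-failover
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    localityMappings:
    # Fail over traffic from one zone to another zone within the us-east region.
    - from:
        region: us-east
        zone: us-east-az1
      to:
      - region: us-east
        zone: us-east-az2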

Verify failover policies

You can test how failover works by sending requests to the reviews app and observing its behavior as you apply the various resources.

  1. Verify that the nodes in your workload clusters have topology.kubernetes.io/region locality labels. If they do not, see Configure the locality labels for nodes.

      kubectl get nodes --context $REMOTE_CONTEXT1 -o jsonpath='{.items[*].metadata.labels}'
    kubectl get nodes --context $REMOTE_CONTEXT2 -o jsonpath='{.items[*].metadata.labels}'
      
  2. Create a virtual destination for the reviews app. The virtual destination enables multicluster traffic routing.

      kubectl --context ${REMOTE_CONTEXT1} apply -f - <<EOF
    apiVersion: networking.gloo.solo.io/v2
    kind: VirtualDestination
    metadata:
      annotations:
        cluster.solo.io/cluster: ""
      name: reviews-global
      namespace: bookinfo
    spec:
      hosts:
      - reviews.global
      ports:
      - number: 80
        protocol: HTTP
        targetPort:
          name: http
      services:
      - labels:
          app: reviews
    EOF
      
  3. Create an outlier detection policy to use with the failover policy so that unhealthy destinations are removed.

      kubectl --context ${REMOTE_CONTEXT1} apply -f - <<EOF
    apiVersion: resilience.policy.gloo.solo.io/v2
    kind: OutlierDetectionPolicy
    metadata:
      annotations:
        cluster.solo.io/cluster: ""
      name: outlier-detection
      namespace: bookinfo
    spec:
      applyToDestinations:
      - kind: VIRTUAL_DESTINATION
        selector: {}
      config:
        baseEjectionTime: 30s
        consecutiveErrors: 2
        interval: 1s
        maxEjectionPercent: 100
    EOF
      
  4. Create your failover policy. The following policy uses the failover example to redirect traffic from us-east to us-west in case of failure.

      kubectl --context ${REMOTE_CONTEXT1} apply -f - <<EOF
    apiVersion: resilience.policy.gloo.solo.io/v2
    kind: FailoverPolicy
    metadata:
      annotations:
        cluster.solo.io/cluster: ""
      name: locality-based-failover
      namespace: bookinfo
    spec:
      applyToDestinations:
      - kind: VIRTUAL_DESTINATION
        selector: {}
      config:
        localityMappings:
        - from:
            region: us-east
          to:
          - region: us-west
    EOF
      
  5. Send a request to the reviews app from the ratings app several times. Notice that you get responses with no stars (v1), black stars (v2), and red stars (v3) from all three reviews apps across clusters.

      kubectl exec $(kubectl get pod -l app=ratings -n bookinfo -o jsonpath='{.items[].metadata.name}' --context ${REMOTE_CONTEXT1}) -n bookinfo -c ratings --context ${REMOTE_CONTEXT1} -- curl -sS reviews.global:80/reviews/1 -v
      
  6. Send the reviews v1 and v2 apps in cluster-1 to sleep, to mimic an app failure.

      kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":["sleep","20h"]}]}}}}'
    kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v2 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":["sleep","20h"]}]}}}}'
      
  7. Repeat the request to the reviews app. Notice that you now get responses with only red stars (v3). The unhealthy reviews v1 and v2 apps are removed, and traffic fails over to v3 in the locality that the failover policy specifies. If you do not see this behavior, you can check the endpoints that the client sidecar considers healthy by using the sketch after these steps.

      kubectl exec $(kubectl get pod -l app=ratings -n bookinfo -o jsonpath='{.items[].metadata.name}' --context ${REMOTE_CONTEXT1}) -n bookinfo -c ratings --context ${REMOTE_CONTEXT1} -- curl -sS reviews.global:80/reviews/1 -v
      
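
If you do not see the failover behavior, you can check which reviews endpoints the ratings app's sidecar still considers healthy. The following sketch reuses the ratings pod lookup from the previous steps and filters the istioctl proxy configuration output for reviews endpoints; endpoints that the outlier detection policy ejected typically show a failed outlier check.

  istioctl --context ${REMOTE_CONTEXT1} proxy-config endpoints \
    $(kubectl get pod -l app=ratings -n bookinfo -o jsonpath='{.items[].metadata.name}' --context ${REMOTE_CONTEXT1}).bookinfo \
    | grep reviews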

Cleanup

You can optionally remove the resources that you set up as part of this guide.
  1. Remove the sleep command from the reviews apps to restore normal behavior.
      kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":[]}]}}}}'
    kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v2 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":[]}]}}}}'
      
  2. Clean up the Gloo resources that you created.
      kubectl --context $REMOTE_CONTEXT1 -n bookinfo delete VirtualDestination reviews-global
    kubectl --context $REMOTE_CONTEXT1 -n bookinfo delete OutlierDetectionPolicy outlier-detection
    kubectl --context $REMOTE_CONTEXT1 -n bookinfo delete FailoverPolicy locality-based-failover