Failover

Use a failover policy to reroute traffic to a different service in case of failure.

Failover is an important part of building resilient apps. You set up locality-aware failover by specifying the regions, zones, and subzones that traffic can be rerouted to. If the closest locality fails, responses can be served from the next closest locality.

You can use failover policies in combination with other policies, such as outlier detection or retry policies. Each policy plays a different role when a failure occurs. The failover policy tells Gloo Mesh which healthy destinations to reroute traffic to, based on the closest locality. The outlier detection policy tells Gloo Mesh when to remove unhealthy services from the pool of healthy destinations, and for how long. The retry policy tells Gloo Mesh how many times to retry a request before the outlier detection policy considers the request failed and removes the service from the pool.
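As a rough illustration of pairing failover with outlier detection, the following sketch shows an outlier detection policy that selects the same virtual destinations as a failover policy. The policy name and all threshold values are hypothetical, and the config field names are assumptions. Check them against the API docs for your Gloo Mesh version before you use them.

```yaml
# Minimal sketch, not a tested configuration.
# Assumption: the OutlierDetectionPolicy config field names shown here.
# The name and the threshold values are placeholders.
apiVersion: resilience.policy.gloo.solo.io/v2
kind: OutlierDetectionPolicy
metadata:
  name: outlier-detection        # hypothetical name
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    consecutiveErrors: 2         # errors before a host is removed from the pool
    interval: 1s                 # how often hosts are checked
    baseEjectionTime: 30s        # how long a removed host stays out of the pool
    maxEjectionPercent: 100      # upper bound on the share of hosts that can be removed
```

With a policy like this in place, the outlier detection settings decide when a destination counts as unhealthy, and the failover policy decides where the traffic goes next.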

Before you begin

  1. Complete the demo setup to install Gloo Mesh, Istio, and Bookinfo in your cluster.

  2. Create the Gloo Mesh resources for this policy in the management and workload clusters.

    The following files are examples only for testing purposes. Your actual setup might vary. You can use the files as a reference for creating your own tests.

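    1. Download the following Gloo Mesh resources for the management cluster: kubernetes-cluster_gloo-mesh_cluster-1.yaml, kubernetes-cluster_gloo-mesh_cluster-2.yaml, and workspace_gloo-mesh_anything.yaml.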
    2. Apply the files to your management cluster.
      kubectl apply -f kubernetes-cluster_gloo-mesh_cluster-1.yaml --context ${MGMT_CONTEXT}
      kubectl apply -f kubernetes-cluster_gloo-mesh_cluster-2.yaml --context ${MGMT_CONTEXT}
      kubectl apply -f workspace_gloo-mesh_anything.yaml --context ${MGMT_CONTEXT}
      
    3. Download the following Gloo Mesh resource for the workload cluster: workspace-settings_bookinfo_federated-anything.yaml.
    4. Apply the file to your workload cluster.
      kubectl apply -f workspace-settings_bookinfo_federated-anything.yaml --context ${REMOTE_CONTEXT1}
      

Configure failover policies

You can apply a failover policy at the destination level. For more information, see Applying policies.

Review the following sample configuration file.

apiVersion: resilience.policy.gloo.solo.io/v2
kind: FailoverPolicy
metadata:
  name: locality-based-failover
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    localityMappings:
    - from:
        region: us-east
      to:
      - region: us-west

Review the following table to understand this configuration. For more information, see the API docs.

Setting | Description
applyToDestinations | Configure which destinations to apply the policy to, by using labels. Destinations can be a Kubernetes service, VirtualDestination, or ExternalService. If you do not specify any destinations, the failover policy applies to all destinations in the workspace by default. This example selects all virtual destinations in the workspace, including the reviews-global virtual destination that you create when you verify the policy.
localityMappings | Map the localities to fail over traffic from one region, zone, or subzone to another in case of failure. The locality is determined by the Kubernetes labels on the node where the destination's app runs. For more information, see the Istio docs.
from | The locality of the destination where Gloo Mesh originally tried to fulfill the request. In this example, the policy fails over traffic from any destinations served in the us-east region.
to | The localities of the destinations where Gloo Mesh can reroute requests. You must specify the region, and can optionally specify the zone and subzone. If you list multiple to localities, you can optionally set a weight for each. In this example, the policy reroutes traffic only to matching destinations in the us-west region.
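The from and to settings can be narrowed to zones and spread across several target localities. The following sketch is one way that might look. The policy name, the zone names, the eu-west region, and the weight values are all placeholders, and the subzone setting is left out. The way weights are interpreted is not described here, so the values are chosen to sum to 100.

```yaml
# Sketch only: the name, zones, extra region, and weights are hypothetical.
# Istio derives locality from node labels such as
# topology.kubernetes.io/region and topology.kubernetes.io/zone.
apiVersion: resilience.policy.gloo.solo.io/v2
kind: FailoverPolicy
metadata:
  name: locality-based-failover-weighted   # hypothetical name
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    localityMappings:
    - from:
        region: us-east
        zone: us-east-1a          # placeholder zone
      to:
      - region: us-west           # region is required for each "to" entry
        zone: us-west-1a          # placeholder zone
        weight: 70                # placeholder weight
      - region: eu-west           # placeholder second locality
        weight: 30                # placeholder weight
```

Replace the region and zone values with the ones that your own nodes are labeled with.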

Verify failover policies

You can test how failover works by opening the Bookinfo app in your browser and observing the reviews app behavior after applying various resources.

  1. Verify that your clusters have topology.kubernetes.io/region locality labels. If not, see the demo setup for an example of how to apply labels.
    kubectl get nodes --context ${REMOTE_CONTEXT1} -o jsonpath='{.items[*].metadata.labels}'
    kubectl get nodes --context ${REMOTE_CONTEXT2} -o jsonpath='{.items[*].metadata.labels}'
    
  2. Open the Bookinfo product page from your local host.
    1. Enable port-forwarding on the product page deployment.
      kubectl --context ${REMOTE_CONTEXT1} -n bookinfo port-forward deployment/productpage-v1 9080:9080
      
    2. Open your browser to http://localhost:9080/. You might need to click Normal user to open the app.
    3. Refresh your page a few times to see the reviews change from no stars to black stars, depending on which version of the reviews service is accessed. Currently, requests to the reviews app are served from the same cluster as the product page app, cluster-1.
  3. In another tab in your terminal, create a virtual destination for the reviews app. The virtual destination enables multicluster traffic routing.
    kubectl --context ${REMOTE_CONTEXT1} apply -f - <<EOF
    apiVersion: networking.gloo.solo.io/v2
    kind: VirtualDestination
    metadata:
      name: reviews-global
      namespace: bookinfo
    spec:
      hosts:
      - reviews.global
      ports:
      - number: 80
        protocol: HTTP
        targetPort:
          name: http
      services:
      - labels:
          app: reviews
    EOF
    
  4. Create the failover policy that you previously reviewed.
    If your clusters have different region labels than us-east and us-west, update those values accordingly.
    kubectl --context ${REMOTE_CONTEXT1} apply -f - <<EOF
    apiVersion: resilience.policy.gloo.solo.io/v2
    kind: FailoverPolicy
    metadata:
      name: locality-based-failover
      namespace: bookinfo
    spec:
      applyToDestinations:
      - kind: VIRTUAL_DESTINATION
        selector: {}
      config:
        localityMappings:
        - from:
            region: us-east
          to:
          - region: us-west
    EOF
    
  5. Send the reviews v1 and v2 apps in cluster-1 to sleep, to mimic an app failure.
    kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":["sleep","20h"]}]}}}}'
    kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v2 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":["sleep","20h"]}]}}}}'
    
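  6. In your browser, refresh the Bookinfo product page a few times. Because reviews v1 and v2 in cluster-1 are no longer responding, requests fail over to the other region. Notice that you eventually see only the red star reviews from v3 in cluster-2.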
  7. Optional: Remove the sleep command from the reviews apps to restore normal behavior.
    kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":[]}]}}}}'
    kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v2 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":[]}]}}}}'