Failover
Use a failover policy to determine where to reroute traffic in case of failure.

Failover is an important part of building resilient apps in multicluster environments. You set up locality-aware failover by specifying the regions, zones, and subzones to reroute traffic between. In the event of a failure in the closest locality, responses can be served from the next closest locality.

You can use failover policies in combination with other policies, such as outlier detection or retry policies. In case of a failure, the failover policy tells Gloo Mesh which healthy destinations to reroute traffic to, based on the closest locality. The outlier detection policy tells Gloo Mesh when and for how long to remove unhealthy services. The retry policy tells Gloo Mesh how many times to retry a request before it counts as a failure, which in turn feeds into the outlier detection policy's decision to remove the service from the pool of healthy destinations.
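For the retry piece of such a combination, the following is a minimal sketch of a Gloo retry policy, using the RetryTimeoutPolicy kind from the resilience API. The policy name, the route label route: ratings, and the retry values are assumed examples only; check the retry policy API docs for the exact fields that your Gloo version supports.

```yaml
apiVersion: resilience.policy.gloo.solo.io/v2
kind: RetryTimeoutPolicy
metadata:
  name: retries-example        # assumed example name
  namespace: bookinfo
spec:
  # Retry policies select routes by label rather than destinations.
  applyToRoutes:
  - route:
      labels:
        route: ratings         # assumed example label on a route table route
  config:
    retries:
      # Retry a failed request up to 3 times, waiting at most 2s per attempt.
      attempts: 3
      perTryTimeout: 2s
      # Envoy retry conditions that count as retryable failures.
      retryOn: "connect-failure,refused-stream,unavailable,cancelled"
```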
Before you begin
- Complete the multicluster getting started guide to set up the following testing environment.
  - Three clusters along with environment variables for the clusters and their Kubernetes contexts.
  - The Gloo Platform CLI, meshctl, along with other CLI tools such as kubectl and istioctl.
  - The Gloo management server in the management cluster, and the Gloo agents in the workload clusters.
  - Istio installed in the workload clusters.
  - A simple Gloo workspace setup.
- Install Bookinfo and other sample apps.
Configure failover policies
You can apply a failover policy at the destination level. For more information, see Applying policies.
The failover policy currently supports selecting Gloo virtual destinations only. Selecting Kubernetes services or Gloo external services is not supported.
Review the following sample configuration file.
apiVersion: resilience.policy.gloo.solo.io/v2
kind: FailoverPolicy
metadata:
  name: locality-based-failover
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    localityMappings:
    - from:
        region: us-east
      to:
      - region: us-west
Review the following table to understand this configuration. For more information, see the API docs.
| Setting | Description |
|---|---|
| applyToDestinations | Configure which destinations to apply the policy to, by using labels. Although destinations can generally be a Kubernetes service, VirtualDestination, or ExternalService, the failover policy currently supports virtual destinations only. If you do not specify any destinations, the failover policy applies to all destinations in the workspace by default. This example selects all virtual destinations in the workspace, including the one that you previously created. |
| localityMappings | Map the localities to fail over traffic from one region, zone, or subzone to another in case of failure. The locality is determined by the Kubernetes labels on the node where the destination's app runs. For more information, see the Istio docs. |
| from | The locality of the destination where Gloo Mesh originally tried to fulfill the request. In this example, the policy fails over traffic from any destinations served in the us-east region. |
| to | The localities of the destination where Gloo Mesh can reroute requests. You must specify the region, and optionally the zone and subzone. If you have multiple to destinations, you can optionally set a weight. In this example, the policy reroutes traffic to matching destinations in the us-west region only. |
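For example, a mapping can also target zones and weight multiple fallback localities. The following is a minimal sketch only; the policy name, zone names, and weights are placeholder values, and you should confirm the exact field names and weighting behavior against the FailoverPolicy API docs.

```yaml
apiVersion: resilience.policy.gloo.solo.io/v2
kind: FailoverPolicy
metadata:
  name: zone-weighted-failover   # placeholder name
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    localityMappings:
    # Fail over traffic from destinations in the us-east region, us-east-1 zone.
    - from:
        region: us-east
        zone: us-east-1
      to:
      # Prefer us-west-1, but allow some rerouted traffic to reach us-west-2
      # (assumed weighting behavior; see the API docs).
      - region: us-west
        zone: us-west-1
        weight: 80
      - region: us-west
        zone: us-west-2
        weight: 20
```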
Verify failover policies
You can test how failover works by opening the Bookinfo app in your browser and observing the reviews app behavior after applying various resources.
- Verify that your clusters have topology.kubernetes.io/region locality labels. If not, see Configure the locality labels for nodes, or use the labeling sketch after this procedure.
kubectl get nodes --context $REMOTE_CONTEXT1 -o jsonpath='{.items[*].metadata.labels}'
kubectl get nodes --context $REMOTE_CONTEXT2 -o jsonpath='{.items[*].metadata.labels}'
- Create a virtual destination for the reviews app. The virtual destination enables multicluster traffic routing.
kubectl --context ${REMOTE_CONTEXT1} apply -f - <<EOF
apiVersion: networking.gloo.solo.io/v2
kind: VirtualDestination
metadata:
  name: reviews-global
  namespace: bookinfo
spec:
  hosts:
  - reviews.global
  ports:
  - number: 80
    protocol: HTTP
    targetPort:
      name: http
  services:
  - labels:
      app: reviews
EOF
- Create an outlier detection policy to use with the failover policy so that unhealthy destinations are removed.
kubectl --context ${REMOTE_CONTEXT1} apply -f - <<EOF
apiVersion: resilience.policy.gloo.solo.io/v2
kind: OutlierDetectionPolicy
metadata:
  name: outlier-detection
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    baseEjectionTime: 30s
    consecutiveErrors: 2
    interval: 1s
    maxEjectionPercent: 100
EOF
- Create the failover policy that you previously reviewed. If your clusters have different region labels than us-east and us-west, update those values accordingly.
kubectl --context ${REMOTE_CONTEXT1} apply -f - <<EOF
apiVersion: resilience.policy.gloo.solo.io/v2
kind: FailoverPolicy
metadata:
  name: locality-based-failover
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    localityMappings:
    - from:
        region: us-east
      to:
      - region: us-west
EOF
- Send a request to the reviews app from the ratings app several times. Notice that you get responses with no stars (v1), black stars (v2), and red stars (v3) from all three reviews apps across clusters.
kubectl exec $(kubectl get pod -l app=ratings -n bookinfo -o jsonpath='{.items[].metadata.name}' --context ${REMOTE_CONTEXT1}) -n bookinfo -c ratings --context ${REMOTE_CONTEXT1} -- curl -sS reviews.global:80/reviews/1 -v
- Send the reviews v1 and v2 apps in cluster-1 to sleep, to mimic an app failure.
kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":["sleep","20h"]}]}}}}'
kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v2 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":["sleep","20h"]}]}}}}'
- Repeat the request to the reviews app. Notice that you get responses with only red stars (v3). The unhealthy reviews v1 and v2 apps are removed, and the traffic fails over to v3 in the locality that the failover policy specifies.
kubectl exec $(kubectl get pod -l app=ratings -n bookinfo -o jsonpath='{.items[].metadata.name}' --context ${REMOTE_CONTEXT1}) -n bookinfo -c ratings --context ${REMOTE_CONTEXT1} -- curl -sS reviews.global:80/reviews/1 -v
- Optional: Remove the sleep command from the reviews apps to restore normal behavior.
kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":[]}]}}}}'
kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v2 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":[]}]}}}}'
- Optional: Clean up the Gloo resources that you created.
kubectl --context $REMOTE_CONTEXT1 -n bookinfo delete VirtualDestination reviews-global
kubectl --context $REMOTE_CONTEXT1 -n bookinfo delete OutlierDetectionPolicy outlier-detection
kubectl --context $REMOTE_CONTEXT1 -n bookinfo delete FailoverPolicy locality-based-failover
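If your nodes are missing the locality labels that step 1 checks for, the following hypothetical commands show one way to add a region label with kubectl. The node names are placeholders, and managed Kubernetes offerings usually set these labels automatically; see Configure the locality labels for nodes for the supported approach.

```sh
# Placeholder node names and example regions: adjust for your environment.
kubectl label node <node-in-cluster-1> topology.kubernetes.io/region=us-east --context $REMOTE_CONTEXT1
kubectl label node <node-in-cluster-2> topology.kubernetes.io/region=us-west --context $REMOTE_CONTEXT2

# Re-check the labels afterward.
kubectl get nodes --context $REMOTE_CONTEXT1 -o jsonpath='{.items[*].metadata.labels}'
```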