Failover Service
Gloo Mesh provides the ability to configure a FailoverService. A FailoverService is a virtual traffic destination that is composed of a list of services ordered in decreasing priority. The composing services are configured with outlier detection, the ability of the system to detect unresponsive services, read more here.Traffic will automatically be shifted over to services next in the priority order. Currently this feature is only supported for Istio meshes.
In this guide we will first enable outlier detection so the service mesh knows when a failure has ocurred. Then we will create the failover configuration to indicate which services are part of the failover process. Finally, we will test the failover configuration by generating errors on one of the instances of the service.
Before you begin
To illustrate these concepts, we will assume that:
- Gloo Mesh is installed and running on the
mgmt-cluster
- Istio is installed on both the
mgmt-cluster
andremote-cluster
clusters - Both the
mgmt-cluster
andremote-cluster
clusters are registered with Gloo Mesh under the namesmgmt-cluster
andremote-cluster
respectively - The
bookinfo
app is installed into both clusters - You have run through the guides for Federated Trust and Identity and Access Control.
Be sure to review the assumptions and satisfy the pre-requisites from the Guides top-level document.
Istio versions 1.8.0, 1.8.1, and 1.8.2 have a known issue where sidecar proxies may fail to start under specific circumstances. This bug may surface in sidecars configured by Failover Services. This issue is resolved in Istio 1.8.3.
Configure Outlier Detection
The services composing a FailoverService must be configured with outlier detection, which is done through a TrafficPolicy custom resource. We are going to apply the following config on the mgmt-cluster
cluster:
kubectl apply -f - << EOF
apiVersion: networking.mesh.gloo.solo.io/v1alpha2
kind: TrafficPolicy
metadata:
namespace: gloo-mesh
name: mgmt-reviews-outlier
spec:
destinationSelector:
- kubeServiceRefs:
services:
- name: reviews
namespace: bookinfo
clusterName: mgmt-cluster
- name: reviews
namespace: bookinfo
clusterName: remote-cluster
outlierDetection:
consecutiveErrors: 1
EOF
For demonstration purposes, we're setting consecutiveErrors
to 1 to more easily trigger the failover. Once applied, run the following:
kubectl -n gloo-mesh get trafficpolicy/mgmt-reviews-outlier -oyaml
You should see the following status indicating that the TrafficPolicy is valid and has been translated into mesh-specific config:
status:
observedGeneration: "1"
state: ACCEPTED
trafficTargets:
reviews-bookinfo-mgmt-cluster.gloo-mesh.:
acceptanceOrder: 1
state: ACCEPTED
reviews-bookinfo-remote-cluster.gloo-mesh.:
state: ACCEPTED
Create the FailoverService
Now we will create the FailoverService for the reviews
service, composed of the reviews service on the mgmt-cluster
cluster in first priority and on remote-cluster
in second priority. If the reviews
service on the mgmt-cluster
cluster is unhealthy, requests will automatically be shifted over to the service on remote-cluster
.
Apply the following config to the mgmt-cluster
cluster:
kubectl apply -f - << EOF
apiVersion: networking.mesh.gloo.solo.io/v1alpha2
kind: FailoverService
metadata:
name: reviews-failover
namespace: gloo-mesh
spec:
hostname: reviews-failover.bookinfo.global
port:
number: 9080
protocol: http
meshes:
- name: istiod-istio-system-mgmt-cluster
namespace: gloo-mesh
backingServices:
- kubeService:
name: reviews
namespace: bookinfo
clusterName: mgmt-cluster
- kubeService:
name: reviews
namespace: bookinfo
clusterName: remote-cluster
EOF
The .global
suffix is needed if you want the failoverService's hostname to be resolvable in the Kubernetes context (e.g. with commands like curl
), assuming your Istio installation has istiocoredns enabled. This leverages Istio's default DNS setup, which creates DNS configuration for hostnames with .global
suffix.
Notice that the services referenced under failoverServices
matches the services we just configured for outlier detection. This is a requirement for all services composing aFailoverService.
The meshes
field indicates the control planes that the FailoverService is visible to. If multiple meshes are listed, they must be grouped under a common VirtualMesh.
Once applied, run the following:
kubectl -n gloo-mesh get failoverservice/reviews-failover -oyaml
and you should see the following status:
status:
meshes:
istiod-istio-system-mgmt-cluster.gloo-mesh.:
state: ACCEPTED
observedGeneration: "1"
state: ACCEPTED
Demonstrating Failover Functionality
To demonstrate failover functionality, we configure a traffic shift such that all requests targeting the reviews
service will instead be routed to the reviews FailoverService we created above.
Apply the following TrafficPolicy:
kubectl apply -f - << EOF
apiVersion: networking.mesh.gloo.solo.io/v1alpha2
kind: TrafficPolicy
metadata:
name: reviews-shift-failover
namespace: bookinfo
spec:
destinationSelector:
- kubeServiceRefs:
services:
- clusterName: mgmt-cluster
name: reviews
namespace: bookinfo
trafficShift:
destinations:
- failoverService:
name: reviews-failover
namespace: gloo-mesh
EOF
Port forward the productpage
pod with the following command and open your web browser to localhost:9080.
kubectl -n bookinfo port-forward deployments/productpage-v1 9080
Reloading the page a few times should show the “Book Reviews” section showing either no stars (for requests routed to the reviews-v1
pod) and black stars (for requests routed to the reviews-v2
pod). This shows that the productpage
is routing to the first service listed in the reviews-failover FailoverService. Recall from the multicluster setup guide that reviews-v1
and reviews-v2
only exist on the mgmt-cluster
and reviews-v3
only exists on the remote-cluster
, which we'll use to distinguish requests routing to either cluster.
Now, to trigger the failover, we'll modify the reviews-v1
and reviews-v2
deployment to disable the web servers.
Run the following commands:
kubectl -n bookinfo patch deploy reviews-v1 --patch '{"spec": {"template": {"spec": {"containers": [{"name": "reviews","command": ["sleep", "20h"]}]}}}}'
kubectl -n bookinfo patch deploy reviews-v2 --patch '{"spec": {"template": {"spec": {"containers": [{"name": "reviews","command": ["sleep", "20h"]}]}}}}'
Once the modified deployment has rolled out, refresh the productpage
and you should see reviews with red stars, corresponding to reviews-v3
, which only exists on remote-cluster
, demonstrating that the requests are indeed failing over to the second service in the list.
To restore the disabled reviews-v1
and reviews-v2
, run the following:
kubectl -n bookinfo patch deployment reviews-v1 --type json -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/command"}]'
kubectl -n bookinfo patch deployment reviews-v2 --type json -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/command"}]'
Once the deployment has rolled out, reloading the productpage
should show reviews with no stars or black stars, indicating that the failover service is routing requests to the first listed service in the mgmt-cluster
.
Next Steps
In this guide, you successfully configured cross-cluster failover using a VirtualMesh and Traffic Policies. To explore more about Gloo Mesh, we recommend checking out the concepts section of the docs.