Outlier detection
Configure Gloo to remove unhealthy destinations from the connection pool, and add the destinations back when they become healthy again.
About
Outlier detection is an important part of building resilient apps. An outlier detection policy sets up several conditions, such as retries and ejection percentages, that Gloo Mesh Gateway uses to determine if a service is unhealthy. In case an unhealthy service is detected, the outlier detection policy defines how Gloo Mesh Gateway removes (ejects) services from the pool of healthy destinations to send traffic to. Your apps then have time to recover before they are added back to the load-balancing pool and checked again for consecutive errors.
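The consecutive-error check described above can be sketched as a small shell function. This is an illustrative model only, not Gloo Mesh Gateway's implementation. The helper name `ejected_at` is hypothetical, and the sketch assumes that any HTTP 5xx status counts as an error and that a successful response resets the counter; the real error classification might differ.

```shell
# Hypothetical helper, for illustration only.
# Usage: ejected_at THRESHOLD STATUS...
# Prints the 1-based position of the response that would trigger ejection,
# or 0 if the threshold is never reached.
# Assumptions: any 5xx status counts as an error; a non-5xx resets the streak.
ejected_at() {
  threshold=$1
  shift
  streak=0
  position=0
  for status in "$@"; do
    position=$((position + 1))
    if [ "$status" -ge 500 ]; then
      streak=$((streak + 1))
    else
      streak=0
    fi
    if [ "$streak" -ge "$threshold" ]; then
      echo "$position"
      return 0
    fi
  done
  echo 0
}

# With a threshold of 2, isolated errors do not eject the destination,
# but two errors in a row do.
ejected_at 2 200 503 200 503 503   # prints 5
```

The threshold in this sketch plays the role of the `consecutiveErrors` setting that is described later in this guide.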
You can use failover, outlier detection, and retry timeout policies together to build a more resilient application network. For example, an outlier detection policy can remove unhealthy destinations, a failover policy can redirect traffic to healthy destinations, and a retry policy can retry requests in case of failure. Review the following table to understand what each policy does.
Policy | Purpose |
---|---|
Failover | Choose destinations to re-route traffic to, based on the closest locality. |
Outlier detection | Determine when and for how long to remove unhealthy destinations from the pool of healthy destinations. |
Retry timeout | Decide how many times to retry requests before the outlier detection policy considers the request as failing and removes the service from the pool of healthy destinations. |
For more information, see the following resources.
- Gloo Mesh Gateway outlier detection API docs
- Istio docs for circuit breaking
- Envoy docs for outlier detection
If you import or export resources across workspaces, your policies might not apply. For more information, see Import and export policies.
Before you begin
This guide assumes that you use the same names for components like clusters, workspaces, and namespaces as in the getting started. If you have different names, make sure to update the sample configuration files in this guide.
- Set up Gloo Mesh Gateway in a single cluster.
- Install Bookinfo and other sample apps.
- Configure an HTTP listener on your gateway and set up basic routing for the sample apps.
Configure outlier detection policies
You can apply an outlier detection policy at the destination level. For more information, see Applying policies.
The outlier detection policy currently supports selecting virtual destinations only. Selecting Kubernetes services or external services is not supported.
Review the following sample configuration file.
apiVersion: resilience.policy.gloo.solo.io/v2
kind: OutlierDetectionPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: outlier-detection
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    baseEjectionTime: 30s
    consecutiveErrors: 2
    interval: 1s
    maxEjectionPercent: 100
Review the following table to understand this configuration. For more information, see the API docs.
Setting | Description |
---|---|
applyToDestinations | Configure which destinations to apply the policy to, by using labels. The destination must be a virtual destination, not a Kubernetes service or external service. This example selects all virtual destinations in the workspace, including the one that you previously created. |
baseEjectionTime | The minimum time duration for ejection, or the time when a destination is considered unhealthy and not used for load balancing. Set this value as an integer plus a unit of time, in the format 1h, 1m, 1s, or 1ms. The value must be at least 1ms, and defaults to 30s. |
consecutiveErrors | The number of errors before a destination is removed from the healthy connection pool. The default is 5. |
interval | The amount of time between analyzing destinations for ejection. Set this value as an integer plus a unit of time, in the format 1h, 1m, 1s, or 1ms. The value must be at least 1ms, and defaults to 10s. |
maxEjectionPercent | The maximum percentage of destinations that can be removed from the healthy connection pool at a time. For example, if you have 10 total destinations that the policy selects, and set this value to 50 percent, 5 destinations can be removed at once. At least 1 destination can always be removed, regardless of the value you set. You can set this value between 0 and 100, with a default of 100. |
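The `maxEjectionPercent` arithmetic from the table can be checked with a short helper. This is a minimal sketch: the function name `max_ejectable` is hypothetical, and rounding the percentage down is an assumption that the table does not state. The floor of one destination follows the table's note that at least 1 destination can always be removed.

```shell
# Hypothetical helper, for illustration only.
# Usage: max_ejectable TOTAL_DESTINATIONS MAX_EJECTION_PERCENT
# Prints how many destinations can be ejected at the same time.
# Assumption: the percentage is rounded down, with a floor of 1.
max_ejectable() {
  total=$1
  percent=$2
  count=$((total * percent / 100))
  if [ "$total" -gt 0 ] && [ "$count" -lt 1 ]; then
    count=1
  fi
  echo "$count"
}

max_ejectable 10 50    # prints 5, the example from the table
max_ejectable 3 10     # prints 1, because at least 1 destination can be removed
max_ejectable 3 100    # prints 3, the setting used in the sample policy
```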
Verify outlier detection policies
You can test how outlier detection works by opening the Bookinfo app in your browser and observing the reviews app behavior after applying various resources.
Create a virtual destination for the reviews app. The virtual destination allows for multicluster routing across clusters for the three different reviews apps in your Bookinfo setup.
kubectl apply -f - <<EOF
apiVersion: networking.gloo.solo.io/v2
kind: VirtualDestination
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: reviews-global
  namespace: bookinfo
spec:
  hosts:
  - reviews.vd
  ports:
  - number: 80
    protocol: HTTP
    targetPort:
      name: http
  services:
  - labels:
      app: reviews
EOF
Modify the existing route table to point to the Gloo virtual destination that you created for the reviews app, instead of the Kubernetes service.
kubectl apply -f- <<EOF
apiVersion: networking.gloo.solo.io/v2
kind: RouteTable
metadata:
  name: www-example-com
  namespace: bookinfo
spec:
  hosts:
    - www.example.com
  # Selects the virtual gateway you previously created
  virtualGateways:
    - name: istio-ingressgateway
      namespace: bookinfo
  http:
    # Route for the main productpage app
    - name: productpage
      matchers:
      - uri:
          prefix: /productpage
      forwardTo:
        destinations:
          - ref:
              name: productpage
              namespace: bookinfo
              cluster: $CLUSTER_NAME
            port:
              number: 9080
    # Routes all /reviews requests to the reviews-v1, reviews-v2, and reviews-v3 apps
    - name: reviews
      labels:
        route: reviews
      matchers:
      - uri:
          prefix: /reviews
      forwardTo:
        destinations:
          - ref:
              name: reviews-global
              namespace: bookinfo
            kind: VIRTUAL_DESTINATION
            port:
              number: 80
    # Routes all /ratings requests to the ratings-v1 app
    - name: ratings-ingress
      labels:
        route: ratings
      matchers:
      - uri:
          prefix: /ratings
      forwardTo:
        destinations:
          - ref:
              name: ratings
              namespace: bookinfo
              cluster: $CLUSTER_NAME
            port:
              number: 9080
    # Route for the httpbin app
    - name: httpbin-ingress
      labels:
        route: httpbin
      matchers:
      - headers:
        - name: X-httpbin
      forwardTo:
        destinations:
          - ref:
              name: httpbin
              namespace: httpbin
              cluster: $CLUSTER_NAME
            port:
              number: 8000
EOF
Send a request to the reviews app from the ratings app several times. Notice that you get responses with no stars (v1), black stars (v2), and red stars (v3) from all three reviews apps across clusters.
- HTTP:
curl -vik --resolve www.example.com:80:${INGRESS_GW_IP} http://www.example.com:80/reviews/1
- HTTPS:
curl -vik --resolve www.example.com:443:${INGRESS_GW_IP} https://www.example.com:443/reviews/1
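Because the step asks you to send the request several times, it can help to tally which reviews version answered. The following is a rough sketch, not part of the official guide: the helper name `tally_stars` is hypothetical, and it assumes that each response body arrives on a single line and that reviews v2 and v3 include a `"color"` field with the value `black` or `red` while v1 omits it. Inspect a real response first to confirm that these assumptions hold in your environment.

```shell
# Hypothetical helper, for illustration only.
# Reads one response body per line on stdin and counts responses by star color.
# Assumption: bodies contain "color": "red" (v3) or "color": "black" (v2),
# and bodies without a color field come from v1 (no stars).
tally_stars() {
  none=0
  black=0
  red=0
  while IFS= read -r body; do
    case "$body" in
      '') continue ;;
      *'"color": "red"'* | *'"color":"red"'*) red=$((red + 1)) ;;
      *'"color": "black"'* | *'"color":"black"'*) black=$((black + 1)) ;;
      *) none=$((none + 1)) ;;
    esac
  done
  echo "none=$none black=$black red=$red"
}

# Self-contained demo with two made-up response bodies.
printf '%s\n' \
  '{"reviews":[{"rating":{"stars":5,"color":"red"}}]}' \
  '{"reviews":[{"text":"no rating here"}]}' | tally_stars
# prints: none=1 black=0 red=1

# Against the gateway over HTTP (uncomment to run in your environment):
# for i in $(seq 1 20); do
#   curl -s --resolve www.example.com:80:${INGRESS_GW_IP} http://www.example.com:80/reviews/1
#   echo
# done | tally_stars
```

You can rerun the same loop after each later step in this guide to compare the distribution of responses.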
Create the outlier detection policy that you previously reviewed.
kubectl apply -f - <<EOF
apiVersion: resilience.policy.gloo.solo.io/v2
kind: OutlierDetectionPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: outlier-detection
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    baseEjectionTime: 30s
    consecutiveErrors: 2
    interval: 1s
    maxEjectionPercent: 100
EOF
Repeat the request to the reviews app. Notice that you get responses with no stars (v1) and black stars (v2), but no longer any red stars (v3 in the second cluster). When you apply an outlier detection policy, Gloo enforces locality preference. The request is fulfilled by the reviews apps local to the requesting ratings app when the reviews apps are healthy.
- HTTP:
curl -vik --resolve www.example.com:80:${INGRESS_GW_IP} http://www.example.com:80/reviews/1
- HTTPS:
curl -vik --resolve www.example.com:443:${INGRESS_GW_IP} https://www.example.com:443/reviews/1
Send the reviews v1 app to sleep, to mimic an app failure.
kubectl -n bookinfo patch deploy reviews-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":["sleep","20h"]}]}}}}'
Repeat the request to the reviews app. Now, you get responses with black stars (v2) and red stars (v3), because reviews v1 is unhealthy and the virtual destination permits load balancing across the remaining reviews apps.
- HTTP:
curl -vik --resolve www.example.com:80:${INGRESS_GW_IP} http://www.example.com:80/reviews/1
- HTTPS:
curl -vik --resolve www.example.com:443:${INGRESS_GW_IP} https://www.example.com:443/reviews/1
Remove the sleep command from the reviews v1 app to restore normal behavior.
kubectl -n bookinfo patch deploy reviews-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":[]}]}}}}'
Keep sending requests to the reviews app. Notice that for the first 30 seconds, you only get responses with black stars (v2) and red stars (v3). After 30 seconds, you also get responses with no stars (v1). The outlier detection policy sets the baseEjectionTime to 30s, so reviews v1 is not returned to the healthy connection pool until after this time elapses.
- HTTP:
curl -vik --resolve www.example.com:80:${INGRESS_GW_IP} http://www.example.com:80/reviews/1
- HTTPS:
curl -vik --resolve www.example.com:443:${INGRESS_GW_IP} https://www.example.com:443/reviews/1
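The 30-second wait in this step is the minimum ejection time. In Envoy's documented outlier detection behavior, the ejection time is the base ejection time multiplied by the number of times the host has been ejected in a row, up to a maximum ejection time that defaults to 300 seconds. The sketch below models that rule. It assumes that Gloo Mesh Gateway passes this behavior through unchanged, which this guide does not confirm, and the helper name `ejection_seconds` is hypothetical.

```shell
# Hypothetical helper, for illustration only.
# Usage: ejection_seconds BASE_SECONDS CONSECUTIVE_EJECTIONS [MAX_SECONDS]
# Prints how long a destination stays ejected, in seconds.
# Assumption: Envoy's rule applies (base time multiplied by the number of
# consecutive ejections, capped at a maximum that defaults to 300 seconds).
ejection_seconds() {
  base=$1
  ejections=$2
  max=${3:-300}
  duration=$((base * ejections))
  if [ "$duration" -gt "$max" ]; then
    duration=$max
  fi
  echo "$duration"
}

ejection_seconds 30 1    # prints 30, the first ejection in this guide
ejection_seconds 30 3    # prints 90, the third ejection in a row
ejection_seconds 30 20   # prints 300, capped at the assumed maximum
```

If reviews v1 keeps failing after it is returned to the pool, the wait before you see v1 responses again can therefore be longer than 30 seconds.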
Cleanup
You can optionally remove the resources that you set up as part of this guide.
- Delete the virtual destination and outlier detection policy.
kubectl -n bookinfo delete virtualdestination reviews-global
kubectl -n bookinfo delete outlierdetectionpolicy outlier-detection
- Revert the route table back to using the Kubernetes service for the reviews app.
kubectl apply -f- <<EOF
apiVersion: networking.gloo.solo.io/v2
kind: RouteTable
metadata:
  name: www-example-com
  namespace: bookinfo
spec:
  hosts:
    - www.example.com
  # Selects the virtual gateway you previously created
  virtualGateways:
    - name: istio-ingressgateway
      namespace: bookinfo
  http:
    # Route for the main productpage app
    - name: productpage
      matchers:
      - uri:
          prefix: /productpage
      forwardTo:
        destinations:
          - ref:
              name: productpage
              namespace: bookinfo
            port:
              number: 9080
    # Routes all /reviews requests to the reviews-v1 or reviews-v2 apps
    - name: reviews
      labels:
        route: reviews
      matchers:
      - uri:
          prefix: /reviews
      forwardTo:
        destinations:
          - ref:
              name: reviews
              namespace: bookinfo
            port:
              number: 9080
    # Routes all /ratings requests to the ratings-v1 app
    - name: ratings-ingress
      labels:
        route: ratings
      matchers:
      - uri:
          prefix: /ratings
      forwardTo:
        destinations:
          - ref:
              name: ratings
              namespace: bookinfo
            port:
              number: 9080
    # Route for the httpbin app
    - name: httpbin-ingress
      labels:
        route: httpbin
      matchers:
      - headers:
        - name: X-httpbin
      forwardTo:
        destinations:
          - ref:
              name: httpbin
              namespace: httpbin
            port:
              number: 8000
EOF