Outlier detection

Track the status of each upstream destination so that you can temporarily remove unhealthy destinations.

Outlier detection is an important part of building resilient apps. An outlier detection policy sets several conditions, such as the number of consecutive errors and the maximum ejection percentage, that Gloo Mesh uses to determine whether a service is unhealthy. When an unhealthy service is detected, the policy defines how Gloo Mesh removes (ejects) that service from the pool of healthy destinations that receive traffic. Your apps then have time to recover before they are added back to the load-balancing pool and checked again for consecutive errors.
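
The ejection cycle can be illustrated with a small Python model. This is a toy sketch for building intuition, not Gloo Mesh or Envoy code: the Host and OutlierDetector classes are hypothetical, the ejection time is fixed rather than growing with repeated ejections, and the analysis interval is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    consecutive_errors: int = 0
    ejected_until: Optional[float] = None  # None means the host is in the pool


@dataclass
class OutlierDetector:
    """Hypothetical model of consecutive-error ejection (illustration only)."""
    consecutive_errors: int = 5        # errors in a row before ejection
    base_ejection_time: float = 30.0   # seconds a host stays ejected
    max_ejection_percent: int = 100    # cap on the share of ejected hosts
    hosts: Dict[str, Host] = field(default_factory=dict)

    def add_host(self, name: str) -> None:
        self.hosts[name] = Host(name)

    def healthy(self, now: float) -> List[str]:
        """Return hosts in the pool, re-admitting any whose ejection expired."""
        for host in self.hosts.values():
            if host.ejected_until is not None and now >= host.ejected_until:
                host.ejected_until = None
                host.consecutive_errors = 0
        return [h.name for h in self.hosts.values() if h.ejected_until is None]

    def record(self, name: str, ok: bool, now: float) -> None:
        """Record one request result and eject the host if it hits the threshold."""
        self.healthy(now)  # refresh expired ejections first
        host = self.hosts[name]
        if ok:
            host.consecutive_errors = 0  # any success resets the streak
            return
        host.consecutive_errors += 1
        if host.consecutive_errors < self.consecutive_errors:
            return
        ejected = sum(1 for h in self.hosts.values() if h.ejected_until is not None)
        # Assumption: the cap rounds down, but at least one host can be ejected.
        allowed = max(1, len(self.hosts) * self.max_ejection_percent // 100)
        if ejected < allowed:
            host.ejected_until = now + self.base_ejection_time


detector = OutlierDetector(consecutive_errors=2, base_ejection_time=30.0)
detector.add_host("reviews-v1")
detector.add_host("reviews-v2")
detector.record("reviews-v1", ok=False, now=0.0)
detector.record("reviews-v1", ok=False, now=1.0)  # second error in a row
print(detector.healthy(now=2.0))   # ['reviews-v2']
print(detector.healthy(now=31.0))  # ['reviews-v1', 'reviews-v2']
```

In this model, two consecutive errors eject reviews-v1 for 30 seconds, after which it rejoins the pool with its error count reset.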

You can use outlier detection policies in combination with other policies, such as failover or retry policies. In case of a failure, the outlier detection policy tells Gloo Mesh when and for how long to remove unhealthy services. The retry policy tells Gloo Mesh how many times to retry requests before the outlier detection policy considers the request as failing and removes the service from the pool of healthy destinations. The failover policy tells Gloo Mesh which healthy destinations to reroute traffic to, based on the closest locality.

For more information, see the following resources.

Before you begin

  1. Complete the demo setup to install Gloo Mesh, Istio, and Bookinfo in your cluster.

  2. Create the Gloo Mesh resources for this policy in the management and workload clusters.

    The following files are examples for testing purposes only. Your actual setup might vary. You can use the files as a reference for creating your own tests.

    1. Download the following Gloo Mesh resources:
    2. Apply the files to your management cluster.
      kubectl apply -f kubernetes-cluster_gloo-mesh_cluster-1.yaml --context ${MGMT_CONTEXT}
      kubectl apply -f kubernetes-cluster_gloo-mesh_cluster-2.yaml --context ${MGMT_CONTEXT}
      kubectl apply -f workspace_gloo-mesh_anything.yaml --context ${MGMT_CONTEXT}
      
    3. Download the following Gloo Mesh resources:
    4. Apply the files to your workload cluster.
      kubectl apply -f workspace-settings_bookinfo_federated-anything.yaml --context ${REMOTE_CONTEXT1}
      

Configure outlier detection policies

You can apply an outlier detection policy at the destination level. For more information, see Applying policies.

Review the following sample configuration file.

apiVersion: resilience.policy.gloo.solo.io/v2
kind: OutlierDetectionPolicy
metadata:
  name: outlier-detection
  namespace: bookinfo
spec:
  applyToDestinations:
  - kind: VIRTUAL_DESTINATION
    selector: {}
  config:
    baseEjectionTime: 30s
    consecutiveErrors: 2
    interval: 1s
    maxEjectionPercent: 100

Review the following table to understand this configuration. For more information, see the API docs.

Setting Description
applyToDestinations Configure which destinations to apply the policy to, by using labels. Destinations can be a Kubernetes service, VirtualDestination, or ExternalService. If you do not specify any destinations, the outlier detection policy applies to all destinations in the workspace by default. This example selects all virtual destinations in the workspace, including the reviews-global virtual destination that you create when you verify the policy.
baseEjectionTime The minimum duration of an ejection, that is, how long a destination is considered unhealthy and excluded from load balancing. Set this value as an integer plus a unit of time, in the format 1h, 1m, 1s, or 1ms. The value must be at least 1ms, and defaults to 30s.
consecutiveErrors The number of consecutive errors before a destination is removed from the healthy connection pool. The default is 5.
interval The amount of time between analyzing destinations for ejection. Set this value as an integer plus a unit of time, in the format 1h, 1m, 1s, or 1ms. The value must be at least 1ms, and defaults to 10s.
maxEjectionPercent The maximum percentage of destinations that can be removed from the healthy connection pool at a time. For example, if you have 10 total destinations that the policy selects, and set this value to 50 percent, 5 destinations can be removed at once. At least 1 destination can always be removed, regardless of the value you set. You can set this value between 0 and 100, with a default of 100.
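
The value rules in the table can be checked with a short Python sketch. The helper names parse_duration_ms and max_ejectable are hypothetical and are not part of any Gloo Mesh API. The sketch assumes that the ejection cap rounds down, which matches the 10 destinations at 50 percent example but is not confirmed for other values.

```python
import re

# Milliseconds per supported unit (1h, 1m, 1s, 1ms).
_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}


def parse_duration_ms(value: str) -> int:
    """Parse an 'integer plus unit' duration such as 30s into milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|h|m|s)", value)
    if match is None:
        raise ValueError(f"expected a format like 1h, 1m, 1s, or 1ms, got {value!r}")
    millis = int(match.group(1)) * _UNIT_MS[match.group(2)]
    if millis < 1:
        raise ValueError("the value must be at least 1ms")
    return millis


def max_ejectable(total: int, max_ejection_percent: int) -> int:
    """Return how many of `total` destinations can be ejected at once."""
    if not 0 <= max_ejection_percent <= 100:
        raise ValueError("maxEjectionPercent must be between 0 and 100")
    if total == 0:
        return 0
    # Assumption: round down, but always allow at least one ejection.
    return max(1, total * max_ejection_percent // 100)


print(parse_duration_ms("30s"))  # 30000
print(max_ejectable(10, 50))     # 5
print(max_ejectable(10, 0))      # 1
```

The last call shows the "at least 1 destination" rule: even with a cap of 0 percent, one destination can still be ejected.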

Verify outlier detection policies

You can test how outlier detection works by opening the Bookinfo app in your browser and observing the reviews app behavior after applying various resources.

  1. Open the Bookinfo product page from your local host.
    1. Enable port-forwarding on the product page deployment.
      kubectl --context ${REMOTE_CONTEXT1} -n bookinfo port-forward deployment/productpage-v1 9080:9080
      
    2. Open your browser to http://localhost:9080/. You might need to click Normal user to open the app.
    3. Refresh your page a few times to see the reviews change from no stars to black stars, depending on which version of the reviews service is accessed.
  2. In another tab in your terminal, create a virtual destination for the reviews app.
    kubectl --context ${REMOTE_CONTEXT1} apply -f - <<EOF
    apiVersion: networking.gloo.solo.io/v2
    kind: VirtualDestination
    metadata:
      name: reviews-global
      namespace: bookinfo
    spec:
      hosts:
      - reviews.global
      ports:
      - number: 80
        protocol: HTTP
        targetPort:
          name: http
      services:
      - labels:
          app: reviews
    EOF
    
  3. Send the reviews v1 app to sleep, to mimic an app failure.
    kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":["sleep","20h"]}]}}}}'
    
  4. In your browser, refresh the Bookinfo product page a few times. Notice that you see alternating error messages (v1) and black star reviews (v2).
  5. In your terminal, create the outlier detection policy that you previously reviewed.
    kubectl --context ${REMOTE_CONTEXT1} apply -f - <<EOF
    apiVersion: resilience.policy.gloo.solo.io/v2
    kind: OutlierDetectionPolicy
    metadata:
      name: outlier-detection
      namespace: bookinfo
    spec:
      applyToDestinations:
      - kind: VIRTUAL_DESTINATION
        selector: {}
      config:
        baseEjectionTime: 30s
        consecutiveErrors: 2
        interval: 1s
        maxEjectionPercent: 100
    EOF
    
  6. In your browser, refresh the Bookinfo product page a few times. Notice that you eventually only see the black star reviews (v2), because the policy takes the v1 destinations out of the healthy connection pool.
  7. Optional: Remove the sleep command from the reviews v1 app to restore normal behavior.
    kubectl --context ${REMOTE_CONTEXT1} -n bookinfo patch deploy reviews-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"reviews","command":[]}]}}}}'
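
Instead of refreshing the browser by hand, you can tally the responses with a short script. The following Python sketch makes several assumptions that you should check against your own setup: the /productpage path on the forwarded port, and the two HTML marker strings used to tell error pages from star reviews, which depend on the Bookinfo version that you deployed. The function names are hypothetical.

```python
from collections import Counter
from typing import Callable
import urllib.request

# Assumed markers in the product page HTML; inspect your page source to confirm.
ERROR_MARKER = "Error fetching product reviews"
STAR_MARKER = "glyphicon-star"


def classify(html: str) -> str:
    """Label one product page response by what the reviews section shows."""
    if ERROR_MARKER in html:
        return "error"
    if STAR_MARKER in html:
        return "stars"
    return "no stars"


def tally(fetch: Callable[[], str], attempts: int = 20) -> Counter:
    """Call fetch() repeatedly and count how often each outcome appears."""
    return Counter(classify(fetch()) for _ in range(attempts))


def fetch_productpage(url: str = "http://localhost:9080/productpage") -> str:
    """Fetch the product page through the port-forward (assumed URL)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8", errors="replace")


# With the port-forward from the first step still running:
# print(tally(fetch_productpage))
```

Before you apply the policy, you would expect a mix of "error" and "stars" results. After the policy ejects reviews v1, the "error" count should drop toward zero for the duration of the ejection.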