Active healthcheck

Use the ingress gateway to periodically check the health of an upstream service in your cluster. If an upstream service is unavailable, the service is removed from the load balancing pool until health is re-established.

Before you begin

This guide assumes that you use the same names for components such as clusters, workspaces, and namespaces as in the getting started guide, and that your Kubernetes context is set to the cluster where you store your Gloo configuration (typically the management cluster). If you use different names, make sure to update the sample configuration files in this guide accordingly.

Follow the getting started instructions to:

  1. Set up Gloo Gateway in a single cluster.
  2. Deploy sample apps.
  3. Configure an HTTP listener on your gateway and set up basic routing for the sample apps.

Configure active healthcheck policies

You can apply an active healthcheck policy at the destination level. For more information, see Applying policies.

apiVersion: resilience.policy.gloo.solo.io/v2
kind: ActiveHealthCheckPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: active-health-check-policy-httpbin
  namespace: httpbin
spec:
  applyToDestinations:
  - port:
      number: 8000
    selector:
      name: httpbin
      namespace: httpbin
  config:
    healthCheck:
      alwaysLogHealthCheckFailures: true
      eventLogPath: /dev/stdout
      healthyThreshold: 1
      httpHealthCheck:
        host: httpbin.httpbin.svc.cluster.local
        path: /anything
      interval: 1s
      noTrafficInterval: 1s
      timeout: 5s
      unhealthyThreshold: 1
    virtualGateways:
    - cluster: $CLUSTER_NAME
      name: istio-ingressgateway
      namespace: bookinfo
Review the following settings to understand this configuration.

alwaysLogHealthCheckFailures: Whether to log healthcheck failure events. If set to true, every healthcheck failure event is logged. If set to false, only the initial healthcheck failure event is logged, and subsequent failure events are not. The default value is false.
eventLogPath: The path of the healthcheck event log.
healthyThreshold: The number of consecutive successful healthchecks that are required before an upstream server is marked as healthy.
httpHealthCheck.host: The value of the host header in the HTTP healthcheck request. If you use this policy to check the health of a Kubernetes service, make sure to set this field to the hostname of your upstream server to avoid unexpected results. If this field is left empty, the hostname is set to the name of the cluster where the healthcheck is performed, which in Gloo Gateway follows this format: outbound|8080||my-svc.my-ns.svc.cluster.local. Depending on your upstream server, a host header value in this format might not be supported.
httpHealthCheck.path: The path on the upstream server that is used to perform the healthcheck.
interval: The time interval between healthchecks.
noTrafficInterval: The time interval between healthchecks while the cluster receives no traffic. This setting can be useful if you want to keep the health information for your upstreams up-to-date without sending a large number of active healthcheck requests to idle upstreams. After traffic is received, healthchecks are performed by using the interval setting. The default value is 60s.
timeout: The time to wait for a healthcheck response. If the timeout is reached, the healthcheck attempt is considered a failure.
unhealthyThreshold: The number of consecutive failed healthchecks that are required before an upstream server is marked as unhealthy. Note that if you check the health of an HTTP server, the response code must be in the expected_statuses or retriable_statuses list. If a response code is returned that is not part of either list, the upstream server is marked as unhealthy immediately.
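If your healthcheck endpoint returns response codes other than 200, you can typically list the acceptable codes explicitly. The following fragment is a sketch only: the expectedStatuses field and its start/end range shape are assumed from Envoy's underlying HealthCheck API, so verify the exact field names against the API reference for your Gloo Gateway version before you use them.

```yaml
# Hypothetical fragment: accept any 2xx response code as healthy.
# Field names are assumed from Envoy's HealthCheck API
# (expected_statuses with inclusive start, exclusive end).
httpHealthCheck:
  host: httpbin.httpbin.svc.cluster.local
  path: /anything
  expectedStatuses:
  - start: 200   # inclusive
    end: 300     # exclusive
```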

Verify active healthcheck policies

  1. Deploy an active healthcheck policy to your cluster.

    kubectl apply -f- <<EOF
    apiVersion: resilience.policy.gloo.solo.io/v2
    kind: ActiveHealthCheckPolicy
    metadata:
      annotations:
        cluster.solo.io/cluster: ""
      name: active-health-check-policy-httpbin
      namespace: httpbin
    spec:
      applyToDestinations:
      - port:
          number: 8000
        selector:
          name: httpbin
          namespace: httpbin
      config:
        healthCheck:
          alwaysLogHealthCheckFailures: true
          eventLogPath: /dev/stdout
          healthyThreshold: 1
          httpHealthCheck:
            host: httpbin.httpbin.svc.cluster.local
            path: /anything
          interval: 1s
          noTrafficInterval: 1s
          timeout: 5s
          unhealthyThreshold: 1
        virtualGateways:
        - cluster: $CLUSTER_NAME
          name: istio-ingressgateway
          namespace: bookinfo
    EOF
    
  2. Wait a few seconds. Then, get the logs of the ingress gateway to see the successful healthcheck event.

    kubectl logs -n gloo-mesh-gateways $(kubectl get pod -l app=istio-ingressgateway -n gloo-mesh-gateways -o jsonpath='{.items[0].metadata.name}')
    

    Example output:

    {"health_checker_type":"HTTP","host":{"socket_address":{"protocol":"TCP","address":"10.16.0.20","resolver_name":"","ipv4_compat":false,"port_value":80}},"cluster_name":"outbound|8000||httpbin.httpbin.svc.cluster.local","add_healthy_event":{"first_check":true},"timestamp":"2023-05-23T18:29:37.285Z"}
    
  3. Modify the active healthcheck policy to use the /status/500 path to perform the healthcheck on the httpbin app. Because this endpoint returns a 500 HTTP response code, the healthcheck fails and the upstream server is marked as unhealthy.

    kubectl apply -f- <<EOF
    apiVersion: resilience.policy.gloo.solo.io/v2
    kind: ActiveHealthCheckPolicy
    metadata:
      annotations:
        cluster.solo.io/cluster: ""
      name: active-health-check-policy-httpbin
      namespace: httpbin
    spec:
      applyToDestinations:
      - port:
          number: 8000
        selector:
          name: httpbin
          namespace: httpbin
      config:
        healthCheck:
          alwaysLogHealthCheckFailures: true
          eventLogPath: /dev/stdout
          healthyThreshold: 1
          httpHealthCheck:
            host: httpbin.httpbin.svc.cluster.local
            path: /status/500
          interval: 1s
          noTrafficInterval: 1s
          timeout: 5s
          unhealthyThreshold: 1
        virtualGateways:
        - cluster: $CLUSTER_NAME
          name: istio-ingressgateway
          namespace: bookinfo
    EOF
    
  4. Wait a few seconds. Then, get the logs again and verify that you see the health_check_failure_event.

    kubectl logs -n gloo-mesh-gateways $(kubectl get pod -l app=istio-ingressgateway -n gloo-mesh-gateways -o jsonpath='{.items[0].metadata.name}')
    

    Example output:

    {"health_checker_type":"HTTP","host":{"socket_address":{"protocol":"TCP","address":"10.16.0.20","resolver_name":"","ipv4_compat":false,"port_value":80}},"cluster_name":"outbound|8000||httpbin.httpbin.svc.cluster.local","health_check_failure_event":{"failure_type":"ACTIVE","first_check":false},"timestamp":"2023-05-23T18:41:00.410Z"}
    
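The cluster_name value in these log entries follows Envoy's outbound|<port>||<hostname> naming convention. The following shell sketch builds that name for the httpbin destination in this guide, which you can use to filter the gateway logs:

```shell
# Build the Envoy cluster name for the httpbin destination (port 8000),
# following the outbound|<port>||<hostname> convention seen in the logs.
SVC_HOST="httpbin.httpbin.svc.cluster.local"
PORT=8000
CLUSTER_NAME="outbound|${PORT}||${SVC_HOST}"
echo "${CLUSTER_NAME}"
```

For example, pipe the gateway logs through `grep "$CLUSTER_NAME"` to show only the healthcheck events for this destination.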

Cleanup

You can optionally remove the resources that you set up as part of this guide.

    kubectl delete activehealthcheckpolicy active-health-check-policy-httpbin -n httpbin