Active healthcheck
Periodically check the health of an upstream service.
If an upstream service is unavailable, the service is removed from the load balancing pool until health is re-established.
Before you begin
This guide assumes that you use the same names for components like clusters, workspaces, and namespaces as in the getting started guide. If you use different names, make sure to update the sample configuration files in this guide accordingly.
- Set up Gloo Mesh Gateway in a single cluster.
- Install Bookinfo and other sample apps.
- Configure an HTTP listener on your gateway and set up basic routing for the sample apps.
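Before you apply a healthcheck policy, you can confirm that the sample apps are up and running. For example, assuming that httpbin was installed into the httpbin namespace as in the setup guides:
kubectl get pods -n httpbin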
Configure active healthcheck policies
You can apply an active healthcheck policy at the destination level. For more information, see Applying policies.
apiVersion: resilience.policy.gloo.solo.io/v2
kind: ActiveHealthCheckPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: active-health-check-policy-httpbin
  namespace: httpbin
spec:
  applyToDestinations:
  - port:
      number: 8000
    selector:
      name: httpbin
      namespace: httpbin
  config:
    healthCheck:
      alwaysLogHealthCheckFailures: true
      eventLogPath: /dev/stdout
      healthyThreshold: 1
      httpHealthCheck:
        host: httpbin.httpbin.svc.cluster.local
        path: /anything
      interval: 1s
      noTrafficInterval: 1s
      timeout: 5s
      unhealthyThreshold: 1
    virtualGateways:
    - cluster: $CLUSTER_NAME
      name: istio-ingressgateway
      namespace: bookinfo
| Setting | Description |
|---|---|
| `alwaysLogHealthCheckFailures` | Log healthcheck failure events. If set to `true`, all healthcheck failure events are logged. If set to `false`, only the initial healthcheck failure event is logged; subsequent failure events are not logged. The default value is `false`. |
| `eventLogPath` | The path of the healthcheck event log. |
| `healthyThreshold` | The number of successful healthchecks that are required before an upstream server is marked as healthy. |
| `httpHealthCheck.host` | The value of the host header in the HTTP healthcheck request. If you use this policy to check the health of a Kubernetes service, make sure to set this field to the hostname of your upstream server to avoid unexpected results. If this field is left empty, the hostname is set to the name of the cluster where the healthcheck is performed, which in Gloo Gateway follows the format `outbound|<port>||<hostname>`, as in the example log output later in this guide. |
| `httpHealthCheck.path` | The path on the upstream server that is used to perform the healthcheck. |
| `interval` | The time between healthchecks. |
| `noTrafficInterval` | The interval between healthchecks before any traffic is received in the cluster. This setting can be useful if you want to keep the health information for your upstreams up to date while avoiding a large number of unnecessary active healthcheck requests. After traffic is received, healthchecks are performed by using the `interval` setting. The default value is `60s`. |
| `timeout` | The time to wait for a healthcheck response. If the timeout is reached, the healthcheck attempt is considered a failure. |
| `unhealthyThreshold` | The number of failed healthchecks that are required before an upstream server is marked as unhealthy. Note that if you check the health of an HTTP server, the response code must be in the `expected_statuses` or `retriable_statuses` list. If a response code is returned that is not part of these lists, the upstream server is marked as unhealthy immediately. |
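The 1-second intervals and thresholds of 1 that are used in this guide make state changes easy to observe in a demo, but are aggressive for real workloads. The following healthCheck fragment is a sketch only, with illustrative values rather than recommendations:
config:
  healthCheck:
    eventLogPath: /dev/stdout
    healthyThreshold: 2          # require 2 consecutive successes before marking healthy
    httpHealthCheck:
      host: httpbin.httpbin.svc.cluster.local
      path: /anything
    interval: 10s                # probe every 10 seconds once traffic flows
    noTrafficInterval: 60s       # the default; probe less often before traffic arrives
    timeout: 3s
    unhealthyThreshold: 3        # tolerate transient failures before marking unhealthy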
Verify active healthcheck policies
Deploy an active healthcheck policy to your cluster.
kubectl apply -f- <<EOF
apiVersion: resilience.policy.gloo.solo.io/v2
kind: ActiveHealthCheckPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: active-health-check-policy-httpbin
  namespace: httpbin
spec:
  applyToDestinations:
  - port:
      number: 8000
    selector:
      name: httpbin
      namespace: httpbin
  config:
    healthCheck:
      alwaysLogHealthCheckFailures: true
      eventLogPath: /dev/stdout
      healthyThreshold: 1
      httpHealthCheck:
        host: httpbin.httpbin.svc.cluster.local
        path: /anything
      interval: 1s
      noTrafficInterval: 1s
      timeout: 5s
      unhealthyThreshold: 1
    virtualGateways:
    - cluster: $CLUSTER_NAME
      name: istio-ingressgateway
      namespace: bookinfo
EOF
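Optionally, verify that the policy resource was created in your cluster. The exact columns in the output can vary by Gloo version:
kubectl get activehealthcheckpolicy active-health-check-policy-httpbin -n httpbin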
Wait a few seconds. Then, get the logs of the gateway to verify that you see a successful healthcheck event.
kubectl logs $(kubectl get pod -l app=istio-ingressgateway -A -o jsonpath='{.items[0].metadata.name}') -n gloo-mesh-gateways
Example output:
{"health_checker_type":"HTTP","host":{"socket_address":{"protocol":"TCP","address":"10.16.0.20","resolver_name":"","ipv4_compat":false,"port_value":80}},"cluster_name":"outbound|8000||httpbin.httpbin.svc.cluster.local","add_healthy_event":{"first_check":true},"timestamp":"2023-05-23T18:29:37.285Z"}
Modify the active healthcheck policy to use the /status/500 path to perform the healthcheck on the httpbin app. Because this endpoint returns a 500 HTTP response code, the healthcheck fails and the upstream server is marked as unhealthy.
kubectl apply -f- <<EOF
apiVersion: resilience.policy.gloo.solo.io/v2
kind: ActiveHealthCheckPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: active-health-check-policy-httpbin
  namespace: httpbin
spec:
  applyToDestinations:
  - port:
      number: 8000
    selector:
      name: httpbin
      namespace: httpbin
  config:
    healthCheck:
      alwaysLogHealthCheckFailures: true
      eventLogPath: /dev/stdout
      healthyThreshold: 1
      httpHealthCheck:
        host: httpbin.httpbin.svc.cluster.local
        path: /status/500
      interval: 1s
      noTrafficInterval: 1s
      timeout: 5s
      unhealthyThreshold: 1
    virtualGateways:
    - cluster: $CLUSTER_NAME
      name: istio-ingressgateway
      namespace: bookinfo
EOF
Wait a few seconds. Then, get the logs again and verify that you see the health_check_failure_event.
kubectl logs $(kubectl get pod -l app=istio-ingressgateway -A -o jsonpath='{.items[0].metadata.name}') -n gloo-mesh-gateways
Example output:
{"health_checker_type":"HTTP","host":{"socket_address":{"protocol":"TCP","address":"10.16.0.20","resolver_name":"","ipv4_compat":false,"port_value":80}},"cluster_name":"outbound|8000||httpbin.httpbin.svc.cluster.local","health_check_failure_event":{"failure_type":"ACTIVE","first_check":false},"timestamp":"2023-05-23T18:41:00.410Z"}
Cleanup
You can optionally remove the resources that you set up as part of this guide.
kubectl delete activehealthcheckpolicy active-health-check-policy-httpbin -n httpbin