Fault injection

Test the resilience of your apps by injecting delays and connection failures.

Inject faults into a percentage of your requests to test how your app handles the errors. By using a fault injection policy, you can simulate these failures without deleting pods, delaying packets, or corrupting packets.

You can set two types of fault injection:

- Delays: Delays are timing failures, such as network latency or overloaded upstreams.
- Aborts: Aborts are crash failures, such as HTTP error codes or TCP connection failures.

For more information, see the following resources.

If you import or export resources across workspaces, your policies might not apply. For more information, see Import and export policies.

Before you begin

This guide assumes that you use the same names for components like clusters, workspaces, and namespaces as in the getting started. If you have different names, make sure to update the sample configuration files in this guide.

Set up Gloo Mesh Gateway in a single cluster.
Install Bookinfo and other sample apps.
Configure an HTTP listener on your gateway and set up basic routing for the sample apps.
Get the external address of your ingress gateway. The steps vary depending on the type of load balancer that backs the ingress gateway.

export INGRESS_GW_ADDRESS=$(kubectl get svc -n gloo-mesh-gateways istio-ingressgateway -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
echo $INGRESS_GW_ADDRESS

Note: Depending on your environment, you might see <pending> instead of an external IP address, for example if you are testing locally in kind or minikube, or if you have insufficient permissions in your cloud platform. In that case, you can port-forward the service port of the ingress gateway instead:

  kubectl -n gloo-mesh-gateways port-forward deploy/istio-ingressgateway-1-18 8081

Configure fault injection policies

You can apply a fault injection policy at the route level. For more information, see Applying policies.

When you apply this custom resource to your cluster, Gloo Mesh Gateway automatically checks the configuration against validation rules and value constraints. You can also run a pre-admission validation check by using the meshctl x validate resources command. For more information, see the resource validation overview and the CLI command reference.

Abort example

The following example is for a simple fault injection abort policy that returns a 418 HTTP response. No delay is configured.

apiVersion: resilience.policy.gloo.solo.io/v2
kind: FaultInjectionPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: faultinjection-basic
  namespace: bookinfo
spec:
  applyToRoutes:
  - route:
      labels:
        route: ratings
  config:
    abort:
      httpStatus: 418

Review the following table to understand this configuration. For more information, see the API docs.

Setting	Description
`spec.applyToRoutes`	Use labels to configure which routes to apply the policy to. This example label matches the app and route from the example route table that you apply separately. If omitted and you do not have another selector such as `applyToDestinations`, the policy applies to all routes in the workspace.
`spec.config.abort`	Because no `percentage` field is set, the policy defaults to aborting 100% of requests. The `httpStatus` field sets the HTTP response code to return. This example sets the `418` HTTP response code. The value must be an integer in the range `[200, 600]`. For HTTP response status codes, see the MDN Web Docs.
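
To abort only a share of the requests, the `percentage` field that appears in the abort and delay example can be set next to `httpStatus`. The following shell sketch renders such a policy and rejects status codes outside the `[200, 600]` range from the table. The function name `render_abort_policy` and the policy name `faultinjection-abort-partial` are hypothetical and chosen only for illustration.

```shell
# render_abort_policy STATUS PERCENTAGE
# Prints a FaultInjectionPolicy manifest that aborts PERCENTAGE% of the
# requests on the ratings route with the given HTTP status code.
# Hypothetical helper; the policy name below is an assumption.
render_abort_policy() {
  status=$1
  percentage=$2

  # httpStatus must be an integer (digits only).
  case $status in
    ''|*[!0-9]*) echo "httpStatus must be an integer" >&2; return 1 ;;
  esac

  # Range as documented in the table for spec.config.abort.
  if [ "$status" -lt 200 ] || [ "$status" -gt 600 ]; then
    echo "httpStatus must be in the range [200, 600]" >&2
    return 1
  fi

  cat <<EOF
apiVersion: resilience.policy.gloo.solo.io/v2
kind: FaultInjectionPolicy
metadata:
  name: faultinjection-abort-partial
  namespace: bookinfo
spec:
  applyToRoutes:
  - route:
      labels:
        route: ratings
  config:
    abort:
      httpStatus: ${status}
      percentage: ${percentage}
EOF
}
```

For example, `render_abort_policy 503 25 | kubectl apply --context ${REMOTE_CONTEXT1} -f -` would apply a policy that aborts roughly 25% of the requests with a 503 response, assuming the same cluster context that is used elsewhere in this guide.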

Delay example

The following example is for a simple fault injection delay policy with a default value for the percentage. No abort is configured.

apiVersion: resilience.policy.gloo.solo.io/v2
kind: FaultInjectionPolicy
metadata:
  name: faultinjection-basic-delay
  namespace: bookinfo
spec:
  applyToRoutes:
    - route:
        labels:
          route: ratings
  config:
    delay:
      fixedDelay: 5s

Review the following table to understand this configuration. For more information, see the API docs.

Setting	Description
`spec.applyToRoutes`	Use labels to configure which routes to apply the policy to. This example label matches the app and route from the example route table that you apply separately. If omitted and you do not have another selector such as `applyToDestinations`, the policy applies to all routes in the workspace.
`spec.config.delay`	Because no `percentage` field is set, the policy defaults to delaying 100% of requests. The `fixedDelay` field is required, and sets the duration in seconds to delay the request.

Abort and delay example

The following example is for a fault injection policy that both delays and aborts requests. Delays and aborts are independent of one another. When both values are set, a request is delayed only, aborted only, or delayed and then aborted. If the percentages are below 100, the remaining requests pass through unchanged.

apiVersion: resilience.policy.gloo.solo.io/v2
kind: FaultInjectionPolicy
metadata:
  name: faultinjection-basic-abort-and-delay
  namespace: bookinfo
  annotations:
    cluster.solo.io/cluster: $REMOTE_CLUSTER1
spec:
  applyToRoutes:
    - route:
        labels:
          route: ratings
  config:
    abort:
      httpStatus: 418
      percentage: 10
    delay:
      percentage: 40
      fixedDelay: 5s

Review the following table to understand this configuration. For more information, see the API docs.

Setting	Description
`spec.applyToRoutes`	Use labels to configure which routes to apply the policy to. This example label matches the app and route from the example route table that you apply separately. If omitted and you do not have another selector such as `applyToDestinations`, the policy applies to all routes in the workspace.
`spec.config.abort`	The `httpStatus` field sets the HTTP response code to return. This example sets the `418` HTTP response code. The value must be an integer in the range `[200, 600]`. For HTTP response status codes, see the MDN Web Docs. The `percentage` field is set to `10`, so 10% of the requests are aborted. If the request is also chosen for a delay, the delay happens before the request is aborted.
`spec.config.delay`	The `fixedDelay` field is required, and sets the duration in seconds to delay the request. The `percentage` field is set to `40`, so 40% of the requests are delayed. If the request is also chosen to be aborted, the delay happens before the request is aborted.
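
Because the two faults are selected independently, the expected share of each outcome can be estimated by multiplying the probabilities. The following sketch assumes that "independent" means statistically independent selection per request. The `fault_mix` helper is a hypothetical name, not part of any Gloo tooling.

```shell
# fault_mix ABORT_PERCENT DELAY_PERCENT
# Prints the expected share of requests per outcome, assuming that the
# abort and delay decisions are made independently for each request.
fault_mix() {
  awk -v a="$1" -v d="$2" 'BEGIN {
    pa = a / 100   # probability that a request is aborted
    pd = d / 100   # probability that a request is delayed
    printf "delayed_only=%.1f%% aborted_only=%.1f%% both=%.1f%% neither=%.1f%%\n",
      pd * (1 - pa) * 100, pa * (1 - pd) * 100, pa * pd * 100, (1 - pa) * (1 - pd) * 100
  }'
}

# Values from the example policy: 10% abort, 40% delay.
fault_mix 10 40
# prints: delayed_only=36.0% aborted_only=6.0% both=4.0% neither=54.0%
```

Under this assumption, the 10% of aborted requests splits into 6% that fail immediately and 4% that fail after the five-second delay, and a little over half of the requests are not affected at all.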

Verify fault injection policies

Create the example fault injection policy for the ratings app.

kubectl apply --context ${REMOTE_CONTEXT1} -f - << EOF
apiVersion: resilience.policy.gloo.solo.io/v2
kind: FaultInjectionPolicy
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: faultinjection-basic
  namespace: bookinfo
spec:
  applyToRoutes:
  - route:
      labels:
        route: ratings
  config:
    abort:
      httpStatus: 418
EOF

Send a request to the app.

HTTP:

  curl -vik --resolve www.example.com:80:${INGRESS_GW_ADDRESS} http://www.example.com:80/ratings/1

HTTPS:

  curl -vik --resolve www.example.com:443:${INGRESS_GW_ADDRESS} https://www.example.com:443/ratings/1

Verify that you see the fault that matches the policy that you applied.
- Abort: All inbound requests to the ratings service result in a 418 Unknown HTTP status code.
- Delay: All inbound requests to the ratings service have a five-second delay.
- Both abort and delay: 10% of the requests return a 418 Unknown HTTP status code, and 40% are delayed by five seconds before a response is returned.
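
A single request cannot confirm a percentage, so it can help to send a batch of requests and count the outcomes. The following is a minimal sketch that reuses the HTTP curl command from the previous step. The helper names `sample_requests` and `summarize_faults` are hypothetical, and the five-second threshold is an assumption based on the `fixedDelay: 5s` value in the examples.

```shell
# summarize_faults [CODE] [SLOW_SECONDS]
# Reads "<http_code> <time_total>" lines on stdin and counts how many
# requests returned CODE and how many took at least SLOW_SECONDS.
summarize_faults() {
  awk -v code="${1:-418}" -v slow="${2:-5}" '
    { total++ }
    $1 == code { aborted++ }
    $2 >= slow { delayed++ }
    END { printf "total=%d aborted=%d delayed=%d\n", total, aborted, delayed }
  '
}

# sample_requests COUNT
# Sends COUNT sequential requests to the ratings route and prints one
# "<http_code> <time_total>" line per request.
sample_requests() {
  i=0
  while [ "$i" -lt "$1" ]; do
    curl -sk -o /dev/null -w '%{http_code} %{time_total}\n' \
      --resolve "www.example.com:80:${INGRESS_GW_ADDRESS}" \
      http://www.example.com:80/ratings/1
    i=$((i + 1))
  done
}
```

For example, `sample_requests 100 | summarize_faults 418 5` prints one summary line. With the abort and delay policy, the counts should land near 10 aborted and 40 delayed, although the exact numbers vary from run to run. The requests are sent one at a time, so a batch with delayed requests can take a few minutes to finish.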

Cleanup

You can optionally remove the resources that you set up as part of this guide.

kubectl -n bookinfo delete RouteTable ratings-rt
kubectl -n bookinfo delete FaultInjectionPolicy faultinjection-basic
