Prioritize the failover of requests across different models from an LLM provider.

About failover

Failover is a way to keep services running smoothly by automatically switching to a backup system when the main one fails or becomes unavailable.

For Solo Enterprise for agentgateway, you can set up failover for the models of the LLM providers that you want to prioritize. If the main model from one provider goes down, slows, or has any issue, the system quickly switches to a backup model from that same provider. This keeps the service running without interruptions.

This approach increases the resiliency of your network environment by ensuring that apps that call LLMs can keep working without problems, even if one model has issues.

Before you begin

  1. Set up an agentgateway proxy.
  2. Set up API access to each LLM provider that you want to use. The example in this guide uses OpenAI.

Fail over to other models

You can configure failover across multiple models and providers by using priority groups. Each priority group represents a set of providers that share the same priority level. Failover priority is determined by the order in which the priority groups are listed in the Backend. The priority group that is listed first is assigned the highest priority. Models within the same priority group are load balanced (round-robin), not prioritized.

  1. Create or update the Backend for your LLM providers.

  2. Create an HTTPRoute resource that routes incoming traffic on the /model path to the Backend that you created in the previous step. In this example, the URLRewrite filter rewrites the path from /model to the path of the API in the LLM provider that you want to use, such as /v1/chat/completions for OpenAI.

      kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: model-failover
      namespace: gloo-system
    spec:
      parentRefs:
        - name: agentgateway-proxy
          namespace: gloo-system
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /model
        filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplaceFullPath
              replaceFullPath: /v1/chat/completions
        backendRefs:
        - name: model-failover
          namespace: gloo-system
          group: gateway.kgateway.dev
          kind: Backend
    EOF
      

  3. Send a request to observe the failover. In your request, do not specify a model. Instead, the Backend automatically uses the model from the first priority group (highest priority).

    Example output:

Cleanup

You can remove the resources that you created in this guide.
  kubectl delete Backend model-failover -n gloo-system
kubectl delete httproute model-failover -n gloo-system
  

Next

Explore other agentgateway features.