About observability

Observability helps you understand how your system is performing, identify issues, and troubleshoot problems. Gloo AI Gateway provides a rich set of observability features that help you monitor and analyze the performance of your AI Gateway and the LLM providers that it interacts with.

In the following tutorial, you extract claims from the JWT tokens for Alice and Bob that you created in the Control access tutorial. Then, you learn how to gather and observe key metrics related to LLM provider usage. Note that all observability features in this tutorial are built on existing Gloo Gateway monitoring capabilities. For more information about traditional observability features, see the observability guide.

Dynamic metadata

Dynamic metadata is a powerful feature of Envoy that allows you to attach arbitrary key-value pairs to requests and responses as they flow through the Envoy proxy. Gloo AI Gateway uses dynamic metadata to expose key metrics related to LLM provider usage. These metrics can be used to monitor and analyze the performance of your AI Gateway and the LLM providers that it interacts with.

The sections in this tutorial use the following dynamic metadata fields that the AI Gateway exposes, listed here for reference. An example of how to reference these fields in an Envoy format string follows the list.

  • ai.gloo.solo.io:total_tokens: The total number of tokens used in the request.
  • ai.gloo.solo.io:prompt_tokens: The number of tokens used in the prompt.
  • ai.gloo.solo.io:completion_tokens: The number of tokens used in the completion.
  • envoy.ratelimit:hits_addend: The number of tokens that were counted toward the rate limit for the request.
  • ai.gloo.solo.io:model: The model that the user specified in the request.
  • ai.gloo.solo.io:provider_model: The model that the LLM provider actually used and returned in the response.
  • ai.gloo.solo.io:provider: The LLM provider that served the request, such as OpenAI or Anthropic.
  • ai.gloo.solo.io:streaming: A boolean indicating whether the request was streamed.
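
In Envoy format strings, you reference a dynamic metadata field with the %DYNAMIC_METADATA(namespace:key)% specifier. For example, the access log configuration later in this tutorial records token counts like this:

  total_tokens: '%DYNAMIC_METADATA(ai.gloo.solo.io:total_tokens)%'
  prompt_tokens: '%DYNAMIC_METADATA(ai.gloo.solo.io:prompt_tokens)%'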

Before you begin

Complete the Control access tutorial.

Access logging

Access logs, sometimes referred to as audit logs, record all traffic requests that pass through the AI Gateway proxy. You can use the default Envoy access log collector to record logs for the AI Gateway. You can then review these logs to identify and troubleshoot issues as needed, or scrape them into your platform-wide logging system. Auditors in your organization can use this information to better understand how users interact with your system, and to detect malicious activity or unusually high request volumes to your gateway.

  1. Define the access logging configuration for the gateway in a ListenerOption resource. This resource configures Envoy to write access logs to stdout in JSON format by using DYNAMIC_METADATA fields that are specifically exposed for the AI Gateway.

      kubectl apply -f- <<EOF
    apiVersion: gateway.solo.io/v1
    kind: ListenerOption
    metadata:
      name: log-provider
      namespace: gloo-system
    spec:
      targetRefs:
      - group: gateway.networking.k8s.io
        kind: Gateway
        name: ai-gateway
      options:
        accessLoggingService:
          accessLog:
          - fileSink:
              jsonFormat:
                http_method: '%REQ(:METHOD)%'
                path: '%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%'
                user: '%DYNAMIC_METADATA(envoy.filters.http.jwt_authn:principal:sub)%'
                team: '%DYNAMIC_METADATA(envoy.filters.http.jwt_authn:principal:team)%'
                request_id: '%REQ(X-REQUEST-ID)%'
                response_code: '%RESPONSE_CODE%'
                system_time: '%START_TIME%'
                target_duration: '%RESPONSE_DURATION%'
                upstream_name: '%UPSTREAM_CLUSTER%'
                total_tokens: '%DYNAMIC_METADATA(ai.gloo.solo.io:total_tokens)%'
                prompt_tokens: '%DYNAMIC_METADATA(ai.gloo.solo.io:prompt_tokens)%'
                completion_tokens: '%DYNAMIC_METADATA(ai.gloo.solo.io:completion_tokens)%'
                rate_limited_tokens: '%DYNAMIC_METADATA(envoy.ratelimit:hits_addend)%'
                streaming: '%DYNAMIC_METADATA(ai.gloo.solo.io:streaming)%'
              path: /dev/stdout
    EOF
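
    Optionally, verify that the ListenerOption resource was created. This quick sanity check assumes that the standard Gloo Gateway CRDs are installed in your cluster.

      kubectl get listeneroption log-provider -n gloo-system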
      
  2. Send a curl request with the JWT token for Alice to review access logs in action.

      curl "$INGRESS_GW_ADDRESS:8080/openai" -H "Authorization: Bearer $ALICE_TOKEN" -H content-type:application/json -d '{
     "model": "gpt-3.5-turbo",
     "messages": [
       {
         "role": "user",
         "content": "Please explain the movie Dr. Strangelove in 1 sentence."
       }
     ]
    }'
      
  3. After the request completes, view the access log for this request by getting the logs from the AI Gateway pod.

      kubectl logs -n gloo-system deploy/gloo-proxy-ai-gateway | tail -1 | jq --sort-keys
      

    Verify that a log for your request is returned and looks similar to the following. If a log with these fields doesn’t show up immediately, run the kubectl logs command again, as the logs are flushed asynchronously.

      {
      "completion_tokens":22,
      "http_method":"POST",
      "path":"/v1/chat/completions",
      "prompt_tokens":21,
      "rate_limited_tokens":23,
      "request_id":"ee53553a-ca2f-4e49-b426-325d0cfc05f5",
      "response_code":200,
      "streaming":false,
      "system_time":"2025-01-02T17:32:53.596Z",
      "target_duration":544,
      "team":"dev",
      "total_tokens":43,
      "upstream_name":"openai_gloo-system",
      "user":"alice"
    }
      
  4. To review a log for a streamed response, send a curl request that uses streaming with the JWT token for Bob.

      curl "$INGRESS_GW_ADDRESS:8080/openai" -H "Authorization: Bearer $BOB_TOKEN" -H content-type:application/json -d '{
     "model": "gpt-3.5-turbo",
     "messages": [
       {
         "role": "user",
         "content": "Please explain the movie Dr. Strangelove in 1 sentence."
       }
     ],
     "stream_options": {
       "include_usage": true
     },
     "stream": true
    }'
      
  5. Check the most recent access log again.

      kubectl logs -n gloo-system deploy/gloo-proxy-ai-gateway | tail -1 | jq --sort-keys
      

    This time, the streaming field is recorded as true and the user is bob, but all other token information is still available.

      {
      "completion_tokens":40,
      "http_method":"POST",
      "path":"/v1/chat/completions",
      "prompt_tokens":21,
      "rate_limited_tokens":23,
      "request_id":"8fa7950a-e609-4e3b-83f1-2cd38cf3c591",
      "response_code":200,
      "streaming":true,
      "system_time":"2025-01-02T17:34:13.213Z",
      "target_duration":297,
      "team":"ops",
      "total_tokens":61,
      "upstream_name":"openai_gloo-system",
      "user":"bob"
    }
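
    To audit a single user's traffic, you can filter the JSON access logs by a field such as user. The following sketch uses jq with fromjson? so that any non-JSON log lines are skipped rather than causing errors.

      kubectl logs -n gloo-system deploy/gloo-proxy-ai-gateway | jq -R -c 'fromjson? | select(.user == "bob")'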
      

Metrics

While access logs are great for understanding individual requests, metrics are better for understanding the overall health and performance of your system. Gloo AI Gateway provides a rich set of metrics that help you monitor and analyze the performance of your AI Gateway and the LLM providers that it interacts with. In addition, you can add custom labels to these metrics to help you better understand the context of the requests.

Default metrics

Before you modify the labels, take a look at the default metrics that the system outputs.

  1. In another tab in your terminal, port-forward the ai-gateway container of the gateway proxy.

      kubectl port-forward -n gloo-system deploy/gloo-proxy-ai-gateway 9092
      
  2. In the previous tab, run the following command to view the metrics.

      curl localhost:9092/metrics
      
  3. In the output, search for the ai_completion_tokens_total and ai_prompt_tokens_total metrics. These counters track the total number of tokens used in prompts and completions for the openai provider's gpt-3.5-turbo model.

      # HELP ai_completion_tokens_total Completion tokens
    # TYPE ai_completion_tokens_total counter
    ai_completion_tokens_total{llm="openai",model="gpt-3.5-turbo"} 539.0
    ...
    # HELP ai_prompt_tokens_total Prompt tokens
    # TYPE ai_prompt_tokens_total counter
    ai_prompt_tokens_total{llm="openai",model="gpt-3.5-turbo"} 204.0
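
    Because the endpoint serves standard Prometheus text format, you can also filter for the token counters directly instead of searching manually, for example:

      curl -s localhost:9092/metrics | grep -E '^ai_(completion|prompt)_tokens_total'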
      
  4. Stop port-forwarding the ai-gateway container.

Customized metrics

Default metrics are useful for gauging LLM usage over time, but they don't help you understand usage by each team. You can add that context by creating custom labels.

  1. To add custom labels to the metrics, update the GatewayParameters resource. In the stats.customLabels section, add a list of labels, each with a name and the dynamic metadata key to source the label value from. In this resource, the team label is sourced from the team field of the JWT token. The metadata namespace defaults to the namespace where you defined the JWT provider, but you can specify a different namespace if you have a different source of metadata. When you apply this resource, the gateway proxy restarts to pick up the new stats configuration.

      kubectl apply -f- <<EOF
    apiVersion: gateway.gloo.solo.io/v1alpha1
    kind: GatewayParameters
    metadata:
      name: gloo-gateway-override
      namespace: gloo-system
    spec:
      kube:
        aiExtension:
          enabled: true
          stats:
            customLabels:
              - name: "team"
                metadataKey: "principal:team"
    EOF
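
    Because the gateway proxy restarts, wait for the rollout to complete before you send test requests, for example:

      kubectl rollout status -n gloo-system deploy/gloo-proxy-ai-gateway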
      
  2. Send one request as Alice and one request as Bob to test out the metrics labels. Recall that Alice works on the dev team, and Bob works on the ops team.

      curl "$INGRESS_GW_ADDRESS:8080/openai" -H "Authorization: Bearer $ALICE_TOKEN" -H content-type:application/json -d '{
     "model": "gpt-3.5-turbo",
     "messages": [
       {
         "role": "user",
         "content": "Please explain the movie Dr. Strangelove in 1 sentence."
       }
     ]
    }'
    curl "$INGRESS_GW_ADDRESS:8080/openai" -H "Authorization: Bearer $BOB_TOKEN" -H content-type:application/json -d '{
     "model": "gpt-3.5-turbo",
     "messages": [
       {
         "role": "user",
         "content": "Please explain the movie Dr. Strangelove in 1 sentence."
       }
     ]
    }'
      
  3. In the port-forwarding tab, run the port-forward command again for the ai-gateway container of the gateway proxy.

      kubectl port-forward -n gloo-system deploy/gloo-proxy-ai-gateway 9092
      
  4. In the previous tab, view the metrics again.

      curl localhost:9092/metrics
      
  5. In the output, search for the ai_completion_tokens_total and ai_prompt_tokens_total metrics again, and verify that the token total metrics are now separated according to team.

      # HELP ai_completion_tokens_total Completion tokens
    # TYPE ai_completion_tokens_total counter
    ai_completion_tokens_total{llm="openai",model="gpt-3.5-turbo",team="dev"} 18.0
    ai_completion_tokens_total{llm="openai",model="gpt-3.5-turbo",team="ops"} 30.0
    ...
    # HELP ai_prompt_tokens_total Prompt tokens
    # TYPE ai_prompt_tokens_total counter
    ai_prompt_tokens_total{llm="openai",model="gpt-3.5-turbo",team="dev"} 21.0
    ai_prompt_tokens_total{llm="openai",model="gpt-3.5-turbo",team="ops"} 21.0
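
    To focus on a single team's usage, you can filter the metrics output by the new label, for example:

      curl -s localhost:9092/metrics | grep 'team="dev"'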
      

Tracing

Tracing helps you follow the path of a request as it is forwarded through your system. You can use tracing data to identify bottlenecks, troubleshoot problems, and optimize performance. For this tutorial, you use the all-in-one deployment of the Jaeger open source tool, which runs all components of a tracing system that you need to get started. However, because the traces are formatted for OpenTelemetry, you can configure any system that supports OTel gRPC traces.

Note that the tracing functionality of the AI Gateway integrates seamlessly with existing tracing functionality in Gloo Gateway, so any existing tracing setups continue to work.

  1. Add the Helm repo for Jaeger.

      helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
    helm repo update
      
  2. Deploy Jaeger into the observability namespace.

      helm upgrade --install jaeger jaegertracing/jaeger \
    -n observability \
    --create-namespace \
    -f - <<EOF
    provisionDataStore:
      cassandra: false
    allInOne:
      enabled: true
    storage:
      type: memory
    agent:
      enabled: false
    collector:
      enabled: false
    query:
      enabled: false
    EOF
      
  3. Verify that the Jaeger all-in-one pod is running.

      kubectl get pods -n observability
      

    Example output:

      NAME                      READY   STATUS    RESTARTS   AGE
    jaeger-5d459f9f94-b4ckv   1/1     Running   0          26s
      
  4. Update the GatewayParameters resource to configure the AI Gateway to send traces to the Jaeger collector.

      kubectl apply -f- <<EOF
    apiVersion: gateway.gloo.solo.io/v1alpha1
    kind: GatewayParameters
    metadata:
      name: gloo-gateway-override
      namespace: gloo-system
    spec:
      kube:
        aiExtension:
          enabled: true
          stats:
            customLabels:
              - name: "team"
                metadataKey: "principal:team"
          tracing:
            insecure: true
            grpc:
              host: "jaeger-collector.observability.svc.cluster.local"
              port: 4317
    EOF
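
    If the gateway proxy restarts to pick up the change, as it does for the stats configuration, you can wait for the rollout before you continue:

      kubectl rollout status -n gloo-system deploy/gloo-proxy-ai-gateway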
      
  5. To enrich the tracing data with more details, create an additional tracing configuration for Envoy.

    1. Create an Upstream resource that represents the Jaeger collector.
        kubectl apply -f- <<EOF
      apiVersion: gloo.solo.io/v1
      kind: Upstream
      metadata:
        labels:
          app: gloo
        name: jaeger
        namespace: gloo-system
      spec:
        useHttp2: true
        static:
          hosts:
          - addr: "jaeger-collector.observability.svc.cluster.local"
            port: 4317
      EOF
        
    2. Create an HttpListenerOption resource that references the Gateway and Upstream. Notice that the tags use the same dynamic metadata values as the access logging and metric configurations in the previous sections.
        kubectl apply -f- <<EOF
      apiVersion: gateway.solo.io/v1
      kind: HttpListenerOption
      metadata:
        name: log-provider
        namespace: gloo-system
      spec:
        targetRefs:
        - group: gateway.networking.k8s.io
          kind: Gateway
          name: ai-gateway
        options:
          httpConnectionManagerSettings:
            tracing:
              spawnUpstreamSpan: true
              metadataForTags:
              - tag: ai.model
                value:
                  namespace: ai.gloo.solo.io
                  key: model
              - tag: ai.provider_model
                value:
                  namespace: ai.gloo.solo.io
                  key: provider_model
              - tag: ai.streaming
                value:
                  namespace: ai.gloo.solo.io
                  key: streaming
              - tag: ai.prompt_tokens
                value:
                  namespace: ai.gloo.solo.io
                  key: prompt_tokens
              - tag: ai.completion_tokens
                value:
                  namespace: ai.gloo.solo.io
                  key: completion_tokens
              - tag: user.name
                value:
                  namespace: envoy.filters.http.jwt_authn
                  key: principal.name
              - tag: user.team
                value:
                  namespace: envoy.filters.http.jwt_authn
                  key: principal.team
              openTelemetryConfig:
                serviceName: gloo-proxy
                collectorUpstreamRef:
                  namespace: gloo-system
                  name: jaeger
      EOF
        
  6. Send one request as Alice and one request as Bob to test out the tracing configuration.

      curl "$INGRESS_GW_ADDRESS:8080/openai" -H "Authorization: Bearer $ALICE_TOKEN" -H content-type:application/json -d '{
     "model": "gpt-3.5-turbo",
     "messages": [
       {
         "role": "user",
         "content": "Please explain the movie Dr. Strangelove in 1 sentence."
       }
     ]
    }'
    curl "$INGRESS_GW_ADDRESS:8080/openai" -H "Authorization: Bearer $BOB_TOKEN" -H content-type:application/json -d '{
     "model": "gpt-3.5-turbo",
     "messages": [
       {
         "role": "user",
         "content": "Please explain the movie Dr. Strangelove in 1 sentence."
       }
     ]
    }'
      
  7. In the port-forwarding tab of your terminal, port-forward the Jaeger UI.

      kubectl port-forward -n observability deployments/jaeger 16686
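
    Alternatively, you can query the Jaeger HTTP API through the same port-forward to confirm that the gateway reported traces. This sketch assumes the default Jaeger query API endpoint.

      curl -s "http://localhost:16686/api/services"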
      
  8. In your browser, navigate to http://localhost:16686 to view the Jaeger UI. Look for the traces for the non-streaming requests that you made, which look like the following:

    Figure: Traces for requests to the AI Gateway in the Jaeger UI.

Next

You can now explore how to add further protection to your LLM provider by setting up rate limiting based on claims in a JWT token.

Note that if you do not want to complete the rate limiting tutorial and instead want to try a different tutorial, first clean up the JWT authentication resources that you created in the previous tutorial. If you do not remove these resources, you must include JWT tokens with the correct access in all subsequent curl requests to the AI API.

  kubectl delete VirtualHostOption jwt-provider -n gloo-system
  kubectl delete RouteOption openai-opt -n gloo-system