Observability
Gather and observe key metrics related to LLM provider usage.
About observability
Observability helps you understand how your system is performing, identify issues, and troubleshoot problems. Gloo AI Gateway provides a rich set of observability features that help you monitor and analyze the performance of your AI Gateway and the LLM providers that it interacts with.
In the following tutorial, you extract claims from the JWT tokens for Alice and Bob that you created in the Control access tutorial. Then, you learn how to gather and observe key metrics related to LLM provider usage. Note that all observability features in this tutorial are built on existing Gloo Gateway monitoring capabilities. For more information about traditional observability features, see the observability guide.
Dynamic metadata
Dynamic metadata is a powerful feature of Envoy that allows you to attach arbitrary key-value pairs to requests and responses as they flow through the Envoy proxy. Gloo AI Gateway uses dynamic metadata to expose key metrics related to LLM provider usage. These metrics can be used to monitor and analyze the performance of your AI Gateway and the LLM providers that it interacts with.
The sections in this tutorial leverage the following dynamic metadata fields that are exposed by the AI Gateway, which are listed here for reference.
- ai.gloo.solo.io:total_tokens: The total number of tokens used in the request.
- ai.gloo.solo.io:prompt_tokens: The number of tokens used in the prompt.
- ai.gloo.solo.io:completion_tokens: The number of tokens used in the completion.
- envoy.ratelimit:hits_addend: The number of tokens that were calculated to be rate limited.
- ai.gloo.solo.io:model: The model that the user specified in the request.
- ai.gloo.solo.io:provider_model: The model that the LLM provider used and returned in the response.
- ai.gloo.solo.io:provider: The LLM provider being used, such as OpenAI or Anthropic.
- ai.gloo.solo.io:streaming: A boolean that indicates whether the request was streamed.
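Each field is referenced by its metadata namespace and key wherever Envoy accepts command operators, such as access log format strings. As a minimal sketch, two of these fields could be pulled into a JSON access log format like this; the full configuration that this tutorial uses appears in the Access logging section below.

```
jsonFormat:
  total_tokens: '%DYNAMIC_METADATA(ai.gloo.solo.io:total_tokens)%'
  prompt_tokens: '%DYNAMIC_METADATA(ai.gloo.solo.io:prompt_tokens)%'
```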
Before you begin
Complete the Control access tutorial.
Access logging
Access logs, sometimes referred to as audit logs, represent all traffic requests that pass through the AI Gateway proxy. You can leverage the default Envoy access log collector to record logs for the AI Gateway. You can then review these logs to identify and troubleshoot issues as needed, or scrape these logs to view them in your larger platform logging system. Auditors in your organization can use this information to better understand how users interact with your system, and to detect malicious activity or unusual request volumes to your gateway.
Define access logging configuration for the gateway in a ListenerOption resource. This resource configures Envoy to log access logs to stdout in JSON format by using DYNAMIC_METADATA fields that are specifically exposed for the AI Gateway.

```
kubectl apply -f- <<EOF
apiVersion: gateway.solo.io/v1
kind: ListenerOption
metadata:
  name: log-provider
  namespace: gloo-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: ai-gateway
  options:
    accessLoggingService:
      accessLog:
      - fileSink:
          jsonFormat:
            http_method: '%REQ(:METHOD)%'
            path: '%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%'
            user: '%DYNAMIC_METADATA(envoy.filters.http.jwt_authn:principal:sub)%'
            team: '%DYNAMIC_METADATA(envoy.filters.http.jwt_authn:principal:team)%'
            request_id: '%REQ(X-REQUEST-ID)%'
            response_code: '%RESPONSE_CODE%'
            system_time: '%START_TIME%'
            target_duration: '%RESPONSE_DURATION%'
            upstream_name: '%UPSTREAM_CLUSTER%'
            total_tokens: '%DYNAMIC_METADATA(ai.gloo.solo.io:total_tokens)%'
            prompt_tokens: '%DYNAMIC_METADATA(ai.gloo.solo.io:prompt_tokens)%'
            completion_tokens: '%DYNAMIC_METADATA(ai.gloo.solo.io:completion_tokens)%'
            rate_limited_tokens: '%DYNAMIC_METADATA(envoy.ratelimit:hits_addend)%'
            streaming: '%DYNAMIC_METADATA(ai.gloo.solo.io:streaming)%'
          path: /dev/stdout
EOF
```
Send a curl request with the JWT token for Alice to review access logs in action.
curl "$INGRESS_GW_ADDRESS:8080/openai" -H "Authorization: Bearer $ALICE_TOKEN" -H content-type:application/json -d '{ "model": "gpt-3.5-turbo", "messages": [ { "role": "user", "content": "Please explain the movie Dr. Strangelove in 1 sentence." } ] }'
After the request completes, view the access log for this request by getting the logs from the AI Gateway pod.
```
kubectl logs -n gloo-system deploy/gloo-proxy-ai-gateway | tail -1 | jq --sort-keys
```
Verify that a log for your request is returned and looks similar to the following. If a log with these fields doesn't show up immediately, run the kubectl logs command again, as the logs are flushed asynchronously.

```
{
  "completion_tokens": 22,
  "http_method": "POST",
  "path": "/v1/chat/completions",
  "prompt_tokens": 21,
  "rate_limited_tokens": 23,
  "request_id": "ee53553a-ca2f-4e49-b426-325d0cfc05f5",
  "response_code": 200,
  "streaming": false,
  "system_time": "2025-01-02T17:32:53.596Z",
  "target_duration": 544,
  "team": "dev",
  "total_tokens": 43,
  "upstream_name": "openai_gloo-system",
  "user": "alice"
}
```
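Because the access log entries are structured JSON, you can slice them with standard tooling. For example, the following sketch filters the gateway logs down to Alice's requests and their token counts; it assumes that jq is installed locally and skips any non-JSON log lines.

```
kubectl logs -n gloo-system deploy/gloo-proxy-ai-gateway \
  | jq -R 'fromjson? | select(.user == "alice") | {user, team, total_tokens}'
```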
To review a log for a streamed response, send a curl request that uses streaming with the JWT token for Bob.
curl "$INGRESS_GW_ADDRESS:8080/openai" -H "Authorization: Bearer $BOB_TOKEN" -H content-type:application/json -d '{ "model": "gpt-3.5-turbo", "messages": [ { "role": "user", "content": "Please explain the movie Dr. Strangelove in 1 sentence." } ], "stream_options": { "include_usage": true }, "stream": true }'
Check the most recent access log again.
```
kubectl logs -n gloo-system deploy/gloo-proxy-ai-gateway | tail -1 | jq --sort-keys
```
This time, the streaming field is recorded as true and the user is bob, but all other token information is still available.

```
{
  "completion_tokens": 40,
  "http_method": "POST",
  "path": "/v1/chat/completions",
  "prompt_tokens": 21,
  "rate_limited_tokens": 23,
  "request_id": "8fa7950a-e609-4e3b-83f1-2cd38cf3c591",
  "response_code": 200,
  "streaming": true,
  "system_time": "2025-01-02T17:34:13.213Z",
  "target_duration": 297,
  "team": "ops",
  "total_tokens": 61,
  "upstream_name": "openai_gloo-system",
  "user": "bob"
}
```
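From these logs you can already derive simple usage summaries. For example, the following sketch sums the total tokens recorded per user across all access log entries, again assuming jq and skipping non-JSON lines. The Metrics section that follows shows how the gateway can produce similar per-team aggregation automatically.

```
kubectl logs -n gloo-system deploy/gloo-proxy-ai-gateway \
  | jq -n -R '[inputs | fromjson?] | group_by(.user) | map({user: .[0].user, total_tokens: (map(.total_tokens) | add)})'
```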
Metrics
While access logs are great for understanding individual requests, metrics are better for understanding the overall health and performance of your system. Gloo AI Gateway provides a rich set of metrics that help you monitor and analyze the performance of your AI Gateway and the LLM providers that it interacts with. In addition, you can add custom labels to these metrics to help you better understand the context of the requests.
Default metrics
Before you modify the labels, first take a look at the default metrics that the system outputs.
In another tab in your terminal, port-forward the ai-gateway container of the gateway proxy.

```
kubectl port-forward -n gloo-system deploy/gloo-proxy-ai-gateway 9092
```
In the previous tab, run the following command to view the metrics.
```
curl localhost:9092/metrics
```
In the output, search for the ai_completion_tokens_total and ai_prompt_tokens_total metrics. These metrics count the total number of prompt and completion tokens used for the openai model gpt-3.5-turbo.

```
# HELP ai_completion_tokens_total Completion tokens
# TYPE ai_completion_tokens_total counter
ai_completion_tokens_total{llm="openai",model="gpt-3.5-turbo"} 539.0
...
# HELP ai_prompt_tokens_total Prompt tokens
# TYPE ai_prompt_tokens_total counter
ai_prompt_tokens_total{llm="openai",model="gpt-3.5-turbo"} 204.0
```
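Because these counters are exposed in Prometheus format, you can also query token usage over time if you scrape the endpoint with Prometheus or a compatible system. For example, a sketch of a PromQL query that shows the per-model rate of completion tokens over the last five minutes, assuming a Prometheus instance is already scraping port 9092 of the gateway proxy:

```
sum by (model) (rate(ai_completion_tokens_total[5m]))
```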
Stop port-forwarding the ai-gateway container.
Customized metrics
Default metrics are useful for gauging LLM usage over time, but don't help you understand usage by each team. You can add that context by creating custom labels.
To add custom labels to the metrics, update the GatewayParameters resource. In the stats.customLabels section, add a list of labels, each with the name of the label and the dynamic metadata field to source the label value from. In this resource, the team label is sourced from the team field of the JWT token. The metadata namespace defaults to the namespace where you defined the JWT provider, but you can specify a different namespace if you have a different source of metadata. When you apply this resource, the gateway proxy restarts to pick up the new stats configuration.

```
kubectl apply -f- <<EOF
apiVersion: gateway.gloo.solo.io/v1alpha1
kind: GatewayParameters
metadata:
  name: gloo-gateway-override
  namespace: gloo-system
spec:
  kube:
    aiExtension:
      enabled: true
      stats:
        customLabels:
          - name: "team"
            metadataKey: "principal:team"
EOF
```
Send one request as Alice and one request as Bob to test out the metrics labels. Recall that Alice works on the dev team, and Bob works on the ops team.

```
curl "$INGRESS_GW_ADDRESS:8080/openai" \
  -H "Authorization: Bearer $ALICE_TOKEN" \
  -H content-type:application/json \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Please explain the movie Dr. Strangelove in 1 sentence."
      }
    ]
  }'

curl "$INGRESS_GW_ADDRESS:8080/openai" \
  -H "Authorization: Bearer $BOB_TOKEN" \
  -H content-type:application/json \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Please explain the movie Dr. Strangelove in 1 sentence."
      }
    ]
  }'
```
In the port-forwarding tab, run the port-forward command again for the ai-gateway container of the gateway proxy.

```
kubectl port-forward -n gloo-system deploy/gloo-proxy-ai-gateway 9092
```
In the previous tab, view the metrics again.
```
curl localhost:9092/metrics
```
In the output, search for the ai_completion_tokens_total and ai_prompt_tokens_total metrics again, and verify that the token totals are now separated by team.

```
# HELP ai_completion_tokens_total Completion tokens
# TYPE ai_completion_tokens_total counter
ai_completion_tokens_total{llm="openai",model="gpt-3.5-turbo",team="dev"} 18.0
ai_completion_tokens_total{llm="openai",model="gpt-3.5-turbo",team="ops"} 30.0
...
# HELP ai_prompt_tokens_total Prompt tokens
# TYPE ai_prompt_tokens_total counter
ai_prompt_tokens_total{llm="openai",model="gpt-3.5-turbo",team="dev"} 21.0
ai_prompt_tokens_total{llm="openai",model="gpt-3.5-turbo",team="ops"} 21.0
```
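With the team label in place, per-team usage becomes easy to chart or alert on. For example, a sketch of a PromQL query that sums total token consumption (prompt plus completion) by team, assuming the metrics are scraped into Prometheus:

```
sum by (team) (ai_prompt_tokens_total + ai_completion_tokens_total)
```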
Tracing
Tracing helps you follow the path of a request as it is forwarded through your system. You can use tracing data to identify bottlenecks, troubleshoot problems, and optimize performance. For this tutorial, you use the all-in-one deployment of the Jaeger open source tool, which runs all components of a tracing system that you need to get started. However, because the traces are formatted for OpenTelemetry, you can configure any system that supports OTel gRPC traces.
Note that the tracing functionality of the AI Gateway integrates seamlessly with existing tracing functionality in Gloo Gateway, so any existing tracing setups continue to work.
Add the Helm repo for Jaeger.
```
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update
```
Deploy Jaeger into the observability namespace.

```
helm upgrade --install jaeger jaegertracing/jaeger \
  -n observability \
  --create-namespace \
  -f - <<EOF
provisionDataStore:
  cassandra: false
allInOne:
  enabled: true
storage:
  type: memory
agent:
  enabled: false
collector:
  enabled: false
query:
  enabled: false
EOF
```
Verify that the Jaeger all-in-one pod is running.
```
kubectl get pods -n observability
```
Example output:
```
NAME                      READY   STATUS    RESTARTS   AGE
jaeger-5d459f9f94-b4ckv   1/1     Running   0          26s
```
Update the GatewayParameters resource to configure the AI Gateway to send traces to the Jaeger collector.

```
kubectl apply -f- <<EOF
apiVersion: gateway.gloo.solo.io/v1alpha1
kind: GatewayParameters
metadata:
  name: gloo-gateway-override
  namespace: gloo-system
spec:
  kube:
    aiExtension:
      enabled: true
      stats:
        customLabels:
          - name: "team"
            metadataKey: "principal:team"
      tracing:
        insecure: true
        grpc:
          host: "jaeger-collector.observability.svc.cluster.local"
          port: 4317
EOF
```
To enrich the tracing data with more details, create an additional tracing configuration for Envoy.
- Create an Upstream resource that represents the Jaeger collector.
```
kubectl apply -f- <<EOF
apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  labels:
    app: gloo
  name: jaeger
  namespace: gloo-system
spec:
  useHttp2: true
  static:
    hosts:
      - addr: "jaeger-collector.observability.svc.cluster.local"
        port: 4317
EOF
```
- Create an HttpListenerOption resource that references the Gateway and Upstream. Notice that the tags use the same dynamic metadata values as the access logging and metric configurations in the previous sections.

```
kubectl apply -f- <<EOF
apiVersion: gateway.solo.io/v1
kind: HttpListenerOption
metadata:
  name: log-provider
  namespace: gloo-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: ai-gateway
  options:
    httpConnectionManagerSettings:
      tracing:
        spawnUpstreamSpan: true
        metadataForTags:
        - tag: ai.model
          value:
            namespace: ai.gloo.solo.io
            key: model
        - tag: ai.provider_model
          value:
            namespace: ai.gloo.solo.io
            key: provider_model
        - tag: ai.streaming
          value:
            namespace: ai.gloo.solo.io
            key: streaming
        - tag: ai.prompt_tokens
          value:
            namespace: ai.gloo.solo.io
            key: prompt_tokens
        - tag: ai.completion_tokens
          value:
            namespace: ai.gloo.solo.io
            key: completion_tokens
        - tag: user.name
          value:
            namespace: envoy.filters.http.jwt_authn
            key: principal.name
        - tag: user.team
          value:
            namespace: envoy.filters.http.jwt_authn
            key: principal.team
        openTelemetryConfig:
          serviceName: gloo-proxy
          collectorUpstreamRef:
            namespace: gloo-system
            name: jaeger
EOF
```
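After you apply both resources, you can optionally confirm that they exist in the cluster before sending traffic. For example:

```
kubectl get upstream jaeger -n gloo-system
kubectl get httplisteneroption log-provider -n gloo-system
```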
Send one request as Alice and one request as Bob to test out the tracing configuration.
curl "$INGRESS_GW_ADDRESS:8080/openai" -H "Authorization: Bearer $ALICE_TOKEN" -H content-type:application/json -d '{ "model": "gpt-3.5-turbo", "messages": [ { "role": "user", "content": "Please explain the movie Dr. Strangelove in 1 sentence." } ] }' curl "$INGRESS_GW_ADDRESS:8080/openai" -H "Authorization: Bearer $BOB_TOKEN" -H content-type:application/json -d '{ "model": "gpt-3.5-turbo", "messages": [ { "role": "user", "content": "Please explain the movie Dr. Strangelove in 1 sentence." } ] }'
In the port-forwarding tab of your terminal, port-forward the Jaeger UI.
```
kubectl port-forward -n observability deployments/jaeger 16686
```
In your browser, navigate to http://localhost:16686 to view the Jaeger UI. Look for the traces for the non-streaming requests that you made, which look like the following:
Figure: Traces for requests to the AI Gateway in the Jaeger UI.
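If you prefer the command line, you can also fetch recent traces through the same port-forward by calling the Jaeger query API that the UI itself uses. This sketch lists span names and durations for the gloo-proxy service name that you configured in the HttpListenerOption; note that this endpoint is internal to Jaeger and not a stable public API.

```
curl -s "http://localhost:16686/api/traces?service=gloo-proxy&limit=5" \
  | jq '.data[].spans[] | {operationName, duration}'
```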
Next
You can now explore how to add further protection to your LLM provider by setting up rate limiting based on claims in a JWT token.
Note that if you do not want to complete the rate limiting tutorial and instead want to try a different tutorial, first clean up the JWT authentication resources that you created in the previous tutorial. If you do not remove these resources, you must include JWT tokens with the correct access in all subsequent curl requests to the AI API.
```
kubectl delete VirtualHostOption jwt-provider -n gloo-system
kubectl delete RouteOption openai-opt -n gloo-system
```