About streaming

Models return a response in two main ways: all at once in a single chunk, or in a stream of chunks.
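
To make the difference concrete, the following Python sketch contrasts the two response shapes for an OpenAI-style chat completion. The field names match the sample output in the streaming example on this page and the non-streaming chat completion format. The payloads are trimmed, made-up values for illustration only, and the helper names text_from_single and text_from_chunks are hypothetical.

```python
import json
from typing import Iterable

# Non-streaming: one JSON body, full answer in choices[0].message.content.
# (Illustrative payload; real responses carry more fields.)
single_body = (
    '{"object":"chat.completion","choices":[{"index":0,'
    '"message":{"role":"assistant","content":"Hello there"},'
    '"finish_reason":"stop"}]}'
)

# Streaming: several chunks, each with a fragment in choices[0].delta.content.
stream_chunks = [
    '{"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    '{"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    '{"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}',
    '{"object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
]


def text_from_single(body: str) -> str:
    """Read the complete answer from a single-chunk response."""
    return json.loads(body)["choices"][0]["message"]["content"]


def text_from_chunks(chunks: Iterable[str]) -> str:
    """Concatenate the content fragments from a sequence of stream chunks."""
    parts = []
    for chunk in chunks:
        delta = json.loads(chunk)["choices"][0]["delta"]
        # The final chunk has an empty delta, so default to an empty string.
        parts.append(delta.get("content") or "")
    return "".join(parts)


single_text = text_from_single(single_body)
streamed_text = text_from_chunks(stream_chunks)
print(single_text == streamed_text)
```

Both paths end with the same text. The difference is when the client can start using it: after the whole body arrives, or as soon as the first chunk does.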


Streaming benefits

Streaming is useful for:

  • Large responses that take a long time to generate. Streaming avoids a long wait that could degrade the user experience or trigger a timeout that interrupts response generation.
  • Responses that are easier to handle in smaller chunks, such as output that you log incrementally so that you can troubleshoot or diagnose issues later.
  • Interactive, chat-style applications where users see the response in real time as it is generated.

With Gloo AI Gateway, you can still apply policies to your streaming responses, such as prompt guards, JWT auth, and rate limiting.

Provider differences

The streaming process differs for each LLM provider.

Streaming example

The following steps show how to stream a response from OpenAI.

Before you begin

  1. Complete the Set up Gloo AI Gateway tutorial.
  2. Complete the Authenticate with API keys tutorial, which includes creating a secret with your OpenAI credentials and an HTTPRoute that routes requests to the OpenAI provider.
  3. Save the external address of the AI Gateway in an environment variable. The example request in this guide reads the address from the INGRESS_GW_ADDRESS variable.

Stream a response from OpenAI

  1. Send a request to the OpenAI provider that sets the streaming parameter "stream": true in the request body. Note that the value is a JSON boolean, not the string "true". For other providers, see Provider differences.

      curl "$INGRESS_GW_ADDRESS:8080/openai" -H content-type:application/json -d '{
        "stream": true,
        "model": "gpt-3.5-turbo",
        "messages": [
          {
            "role": "system",
            "content": "You are a skilled developer who is good at explaining basic programming concepts to beginners."
          },
          {
            "role": "user",
            "content": "In a couple words, tell me what I should call my first GitHub repo."
          }
        ]
      }'
      
  2. In the output, verify that the request succeeds and that you get back a streamed response from the chat completion API.

      data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"You"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" can"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" call"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" it"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" something"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" simple"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" and"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" descriptive"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" like"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" \""},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"my"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"-first"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"-re"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"po"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"\""},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" or"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" \""},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"hello"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"-world"},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"\"."},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-BKq9oMHlT4t6jvtIZ3BA28JIfbZg3","object":"chat.completion.chunk","created":1744306752,"model":"gpt-3.5-turbo-0125","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
    
    data: [DONE]
      

    If you string together the {"content":...} chunks, you get the complete response: You can call it something simple and descriptive like "my-first-repo" or "hello-world".

    The [DONE] message indicates that the streaming process is complete.
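
The reassembly described above can be sketched in Python. This is a minimal illustration, not part of Gloo AI Gateway. The function name collect_stream_text is hypothetical, and the sketch assumes that each event arrives as a single line of the form data: <json>, as in the sample output. The sample events below are abbreviated versions of that output, with most fields omitted.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the delta.content fragments from OpenAI-style `data:` lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not an event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker: stop reading
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:  # the first and last chunks carry no text
                parts.append(content)
    return "".join(parts)


# Abbreviated events in the same shape as the sample output.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"You"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" can"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

result = collect_stream_text(sample)
print(result)
```

In a live client, you could pass the decoded lines of the HTTP response body to the function instead of the sample list. To display text in real time, print each fragment as it is parsed rather than joining the fragments at the end.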