About Large Language Models (LLMs)

A Large Language Model (LLM) is a type of artificial intelligence (AI) model that is designed to understand, generate, and manipulate human language in a way that is coherent and contextually relevant. In recent years, the number of LLM providers and open source LLM projects, such as OpenAI, Llama 2, and Mistral, has increased significantly. These providers distribute LLMs in various ways, such as through APIs and cloud-based platforms.

Because the AI technology landscape is fragmented, access to LLMs varies widely. Developers must learn each provider's API and platform, and implement provider-specific code so that the app can consume the AI services of each LLM provider. This redundant work can significantly decrease developer efficiency, and it makes the app difficult to scale, upgrade, or integrate with other platforms.

About Gloo AI Gateway

Gloo AI Gateway unleashes developer productivity and accelerates AI innovation by providing a unified API interface that developers can use to access and consume AI services from multiple LLM providers. Because the API is part of the gateway proxy, you can apply additional traffic management, security, and resiliency policies to the requests to your LLM provider. With this set of policies, you can centrally govern, secure, control, and audit access to your LLM providers.
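For example, you might expose a provider behind a single path on the gateway with a Kubernetes Gateway API HTTPRoute. The following is a minimal sketch: the `ai-gateway` Gateway and `openai` Upstream names are assumptions for illustration, so check the Gloo AI Gateway setup guides for the exact resources in your environment.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-route
  namespace: gloo-system
spec:
  parentRefs:
    - name: ai-gateway            # hypothetical Gateway served by the AI Gateway proxy
      namespace: gloo-system
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /openai        # clients call this path instead of the provider's own endpoint
      backendRefs:
        - name: openai            # hypothetical Upstream that represents the LLM provider
          group: gloo.solo.io
          kind: Upstream
```

Clients then send their requests to the gateway address, and the gateway forwards them to the provider with the configured policies applied.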

Key capabilities

Learn more about the key capabilities of Gloo AI Gateway.

Centralized credential management

With Gloo AI Gateway, you can centrally secure and store the API keys for accessing your AI provider in a Kubernetes secret in the cluster. The gateway proxy uses these credentials to authenticate with the AI provider and consume AI services. To further secure access to the AI credentials, you can apply fine-grained Kubernetes RBAC controls.
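For example, an API key for a provider can be stored in a standard Kubernetes secret similar to the following sketch. The secret name and the `Authorization` key are assumptions for illustration; the Gloo AI Gateway setup guides describe the exact format that each provider integration expects.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openai-secret            # hypothetical secret name, referenced by the provider config
  namespace: gloo-system
type: Opaque
stringData:
  Authorization: "Bearer <your-openai-api-key>"   # replace with your provider API key
```

Because the key lives only in the cluster, app developers never need to handle or embed provider credentials in their code.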

Access control

Controlling access is crucial to prevent unauthorized access to your LLM provider, protect sensitive data, maintain model integrity, and keep audit trails. With Gloo AI Gateway, you can leverage security policies, such as access logging, JSON Web Tokens (JWTs), or external auth, to ensure that only authenticated and authorized users can access the AI API. For example, you can integrate a JWT provider to authenticate users. In addition, you can extract claims from the JWT to enforce fine-grained access controls and restrict access to the AI API based on those claims. That way, you can ensure that access to the LLM provider is granted only if the user is allowed to use the LLM or belongs to a specific role, group, or organization.
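A JWT policy attached to the gateway might look similar to the following sketch. Note that the resource kind and field names here are assumptions for illustration only, not the verified Gloo policy API; refer to the Gloo AI Gateway security docs for the authoritative configuration.

```yaml
# Illustrative sketch only: the resource kind and fields are assumptions,
# not the verified Gloo policy API.
apiVersion: gateway.solo.io/v1
kind: VirtualHostOption
metadata:
  name: jwt-auth
  namespace: gloo-system
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: ai-gateway                              # hypothetical Gateway name
  options:
    jwt:
      providers:
        example-provider:
          issuer: https://issuer.example.com        # your identity provider
          jwks:
            remote:
              url: https://issuer.example.com/.well-known/jwks.json
```

With a policy like this in place, requests without a valid token from the configured issuer are rejected before they ever reach the LLM provider.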

Prompt enrichment

Prompts are the basic building blocks for guiding LLMs to produce relevant and accurate responses. By effectively managing both system prompts, which set initial guidelines, and user prompts, which provide specific context, you can significantly enhance the quality and coherence of the model's outputs. Gloo AI Gateway allows you to pre-configure and refactor system and user prompts, extract common AI provider settings so that you can reuse them across requests, dynamically append or prepend prompts where you need them, and overwrite default settings at the per-route level.
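For example, a route-level policy might prepend a system prompt to every request, similar to the following sketch. The `promptEnrichment` field names follow the route-level policy pattern but are assumptions for illustration; check the Gloo AI Gateway prompt enrichment docs for the exact schema.

```yaml
# Illustrative sketch: field names under `ai` are assumptions for illustration.
apiVersion: gateway.solo.io/v1
kind: RouteOption
metadata:
  name: prompt-enrichment
  namespace: gloo-system
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: openai-route                  # hypothetical route to the LLM provider
  options:
    ai:
      promptEnrichment:
        prepend:
          - role: SYSTEM
            content: "You are a helpful assistant. Keep answers short and factual."
```

Because the enrichment happens at the gateway, every app that uses the route gets the same baseline prompt without duplicating it in client code.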

Prompt guards

Prompt guards are mechanisms that ensure that prompt-based interactions with a language model are secure, appropriate, and aligned with the intended use. These mechanisms help to monitor, filter, and block LLM inputs and outputs to screen out offensive content, prevent misuse, and ensure ethical and responsible AI usage. With Gloo AI Gateway, you can set up prompt guards to block unwanted requests to the LLM provider and to mask sensitive data.
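For example, you might block requests that match a pattern and mask sensitive data in responses with a route-level guard similar to the following sketch. The `promptGuard` field names are assumptions for illustration; consult the Gloo AI Gateway prompt guard docs for the exact schema.

```yaml
# Illustrative sketch: field names under `promptGuard` are assumptions.
apiVersion: gateway.solo.io/v1
kind: RouteOption
metadata:
  name: prompt-guard
  namespace: gloo-system
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: openai-route                  # hypothetical route to the LLM provider
  options:
    ai:
      promptGuard:
        request:
          customResponse:
            message: "Request blocked by policy."
          regex:
            matches:
              - pattern: "credit card"    # reject prompts that mention credit cards
        response:
          regex:
            builtins:
              - CREDIT_CARD               # mask credit card numbers in model output
            action: MASK
```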

Rate limiting

Rate limiting on LLM provider token usage is primarily a matter of cost management, security, and service stability. LLM providers charge based on the number of input tokens (user prompts and system prompts) and output tokens (responses from the model), which can make uncontrolled usage very expensive. With Gloo AI Gateway, you can configure rate limiting based on LLM token usage so that organizations can enforce budget constraints across groups, teams, departments, and individuals, and ensure that their usage remains within predictable bounds. That way, you can avoid unexpected costs and prevent malicious attacks against your LLM provider.
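For example, a per-user token budget might be expressed with a rate limit configuration similar to the following sketch. The descriptor wiring for counting LLM tokens rather than requests varies, and the `userId` key is an assumption, such as a value extracted from a JWT claim; consult the Gloo AI Gateway rate limiting docs for the exact configuration.

```yaml
# Illustrative sketch: how token counts feed the limiter is configured
# separately; the descriptor below is an assumption for illustration.
apiVersion: ratelimit.solo.io/v1alpha1
kind: RateLimitConfig
metadata:
  name: per-user-token-limit
  namespace: gloo-system
spec:
  raw:
    descriptors:
      - key: userId                  # hypothetical key, e.g. populated from a JWT claim
        rateLimit:
          unit: HOUR
          requestsPerUnit: 10000     # interpreted as an hourly LLM token budget per user
```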