Retrieval augmented generation (RAG)
Provide relevant context to the LLM provider by retrieving data from one or more datasets.
About retrieval augmented generation (RAG)
Retrieval augmented generation (RAG) is a technique that retrieves relevant data from one or more datasets and augments the prompt with the retrieved information. This approach helps LLMs generate more accurate and relevant responses and, to a certain extent, reduces hallucinations.
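The retrieve-then-augment flow described above can be sketched in a few lines of Python. This is a minimal illustration, not how Gloo AI Gateway implements RAG: the `DOCUMENTS` list, the bag-of-words `embed` function, and the prompt wording are all hypothetical stand-ins for a real vector datastore, embedding model, and prompt template.

```python
import math
import re
from collections import Counter

# Hypothetical toy dataset. A real datastore holds precomputed embeddings.
DOCUMENTS = [
    "France has between 1,000 and 1,600 varieties of cheese.",
    "Roquefort is a blue cheese made from sheep milk.",
    "Kubernetes schedules containers across a cluster of nodes.",
]


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vector, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def augment(query, context):
    """Build a prompt that places the retrieved context before the question."""
    bullets = "\n".join(f"- {item}" for item in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{bullets}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    question = "How many varieties of cheeses are in France?"
    print(augment(question, retrieve(question, DOCUMENTS)))
```

The augmented prompt, rather than the bare question, is what would be sent to the LLM, which is why the model can answer from the dataset instead of from its general training data.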
In the following tutorial, you configure the vector datastore used for RAG and see how it helps LLMs to generate more accurate responses.
Before you begin
Complete the Authenticate with API keys tutorial.
Set up a RAG datastore
Deploy a vector database that includes data and embeddings from a website that talks about French cheeses.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vector-db
  labels:
    app: vector-db
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vector-db
  template:
    metadata:
      labels:
        app: vector-db
    spec:
      containers:
      - name: db
        image: gcr.io/solo-public/docs/vector-db
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_DB
          value: gloo
        - name: POSTGRES_USER
          value: gloo
        - name: POSTGRES_PASSWORD
          value: gloo
---
apiVersion: v1
kind: Service
metadata:
  name: vector-db
spec:
  selector:
    app: vector-db
  ports:
  - protocol: TCP
    port: 5432
    targetPort: 5432
EOF
Send a request without using RAG. Note that the response is verbose and not as accurate as expected.
curl "$INGRESS_GW_ADDRESS:8080/openai" -H content-type:application/json -d '{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "How many varieties of cheeses are in France?"
    }
  ]
}'
Example output:
... "France is renowned for its rich cheese-making tradition, and the exact number of cheese varieties can vary depending on how one counts them. Generally, it is often cited that France boasts around 1,000 distinct varieties of cheese. This includes a wide range of types categorized by factors such as their region of origin, milk type (cow, goat, sheep), and production methods. Some of the most famous French cheeses include Brie, Camembert, Roquefort, and Comté, but the diversity extends far beyond these well-known examples." ...
Configure the OpenAI route to use the vector database for RAG. Add the ai.rag.datastore.postgres section to the spec.options section of the RouteOption resource.
kubectl apply -f - <<EOF
apiVersion: gateway.solo.io/v1
kind: RouteOption
metadata:
  name: openai-opt
  namespace: gloo-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  options:
    ai:
      rag:
        datastore:
          postgres:
            connectionString: postgresql+psycopg://gloo:gloo@vector-db.default.svc.cluster.local:5432/gloo
            collectionName: default
        embedding:
          openai:
            authTokenRef: openai-secret
EOF
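To see how the connectionString value relates to the vector database deployed earlier, the following sketch splits it into its parts with Python's standard library. The `describe_connection` helper is hypothetical and only for illustration. It assumes that the host follows the Kubernetes `<service>.<namespace>.svc.cluster.local` DNS pattern and that the `dialect+driver` scheme follows the SQLAlchemy-style URL convention.

```python
from urllib.parse import urlsplit

# Connection string copied from the RouteOption in this tutorial.
CONNECTION_STRING = (
    "postgresql+psycopg://gloo:gloo@vector-db.default.svc.cluster.local:5432/gloo"
)


def describe_connection(connection_string):
    """Break a database URL into the pieces that must match the Deployment and Service."""
    parts = urlsplit(connection_string)
    # Assumption: the scheme is "dialect+driver", as in SQLAlchemy-style URLs.
    dialect, _, driver = parts.scheme.partition("+")
    # Assumption: the host is "<service>.<namespace>.svc.cluster.local".
    service, namespace = parts.hostname.split(".")[:2]
    return {
        "dialect": dialect,
        "driver": driver,
        "user": parts.username,          # matches POSTGRES_USER
        "password": parts.password,      # matches POSTGRES_PASSWORD
        "service": service,              # matches the Service name
        "namespace": namespace,          # namespace the Service runs in
        "port": parts.port,              # matches the Service port
        "database": parts.path.lstrip("/"),  # matches POSTGRES_DB
    }


if __name__ == "__main__":
    for key, value in describe_connection(CONNECTION_STRING).items():
        print(f"{key}: {value}")
```

Because the Deployment and Service manifests do not set a namespace, the `default` namespace in the host name only resolves if they were applied to the `default` namespace. If you deployed the database elsewhere, adjust the host name accordingly.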
Repeat the request and verify that the response is now concise and accurate. This time, Gloo AI Gateway uses the RAG options that you set up to automatically augment the query with additional context, which improves the response.
curl "$INGRESS_GW_ADDRESS:8080/openai" -H content-type:application/json -d '{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "How many varieties of cheeses are in France?"
    }
  ]
}'
Example output:
... "France has between 1,000 and 1,600 varieties of cheese." ...
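If you want to compare the responses before and after enabling RAG from a script instead of curl, a sketch along the following lines might help. It assumes that the route returns a body in the OpenAI chat completions shape (`choices[0].message.content`), that the gateway is reachable over plain HTTP as in the curl command, and that `INGRESS_GW_ADDRESS` is set in the environment. The helper names are hypothetical.

```python
import json
import os
import urllib.request


def build_request(question, model="gpt-4o"):
    """Build the same JSON body that the curl command sends."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }


def extract_answer(response_body):
    """Pull the assistant text out of a chat completions style response.

    Assumption: the body has the shape {"choices": [{"message": {"content": ...}}]}.
    """
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def ask(question):
    """POST the question to the /openai route and return the answer text."""
    address = os.environ["INGRESS_GW_ADDRESS"]
    request = urllib.request.Request(
        f"http://{address}:8080/openai",  # assumption: plain HTTP, as with curl
        data=json.dumps(build_request(question)).encode("utf-8"),
        headers={"content-type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_answer(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(ask("How many varieties of cheeses are in France?"))
```

Running the script once before and once after applying the RouteOption gives you the two answers to compare side by side.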
Next
Reduce the number of requests sent to the LLM provider, improve the response time, and reduce costs by using semantic caching.