Retrieval Augmented Generation - RAG
Retrieval Augmented Generation augments the AIFS Foundation models with knowledge from outside its training data, such as proprietary product information or documents provided by the users. AIFS-RAG automatically parses and chunks the documents, creates and stores the embeddings, and uses vector search to retrieve relevant content to answer user queries.
Quickstart
In this example, we'll create an assistant that can help answer questions about Open Telekom Cloud (OTC).
Step 1: Create a new session for the user Create a new session for user with some unique session identifier.
- Python
import requests
url = "http://localhost:8000/v3/session"
headers = {
'accept': 'application/json',
'Content-Type': 'application/json',
}
json_data = {
'userId': 'test_1',
'sessionId': 'session_1',
}
response = requests.post(url, headers=headers, json=json_data)
You could use any identity manager to authenticate user anget userID.
For better management, generate sessionID for each session create request.
Step 2: Upload files and add them to the session context
- Python
# Upload File
headers = {
'accept': 'application/json',
# requests won't add a boundary if this header is set when you pass files=
# 'Content-Type': 'multipart/form-data',
}
files = {
'file': ('Service.pdf', open('Service.pdf', 'rb'), 'application/pdf'),
}
response = requests.post(url, headers=headers, files=files)
# Add file to context / indexing file
headers = {
'accept': 'application/json',
'Content-Type': 'application/json',
}
json_data = {
'userId': 'test_1',
'sessionId': 'session_1',
'files': [
{
'name': 'Service.pdf',
'metadata': {
'Author': 'Open Telekom Cloud',
},
},
],
}
response = requests.post(url, headers=headers, json=json_data)
Step 3: Update user configuration to set model and RAG parameters
- Python
headers = {
'accept': 'application/json',
'Content-Type': 'application/json',
}
json_data = {
'userId': 'test_1',
'sessionId': 'session_1',
'modelSettings': {
'chatModel': 'Llama-2',
},
}
response = requests.post(url, headers=headers, json=json_data)
Step 4: Create a chat streaming run and check output
- Python
import requests
headers = {
'accept': 'application/json',
'Content-Type': 'application/json',
}
json_data = {
'userId': 'test_1',
'sessionId': 'session_1',
'prompt': 'What is OTC?',
}
response = requests.post(url, headers=headers, json=json_data)
How it works
The AIFS-RAG
tool implements several retrieval best practices out of the box to help you extract the right data from your files and augment the model's responses. The AIFS-RAG
tool:
- Runs semantic searches across both assistant and vector stores.
- Filters results based on similarity threshold and chunk similarity top k.
By default, the AIFS-RAG
tool uses the following settings:
- Chunk size: 1000 sentences
- Chunk overlap: 20 sentences
- Embedding model:
text-embedding-ada-002
at 1536 dimensions - Maximum number of chunks added to context: 5 (could be more)
Supported File Formats
.pdf
.msg