
LlamaIndex Integration

Use AI Foundation Services with LlamaIndex to build RAG applications, index documents, and create chat engines.


Setup

pip install llama-index llama-index-llms-azure-openai llama-index-embeddings-openai

Initialize LLM

import os
from llama_index.llms.azure_openai import AzureOpenAI

llm = AzureOpenAI(
    deployment_name="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),
    azure_endpoint=os.getenv("OPENAI_BASE_URL"),
    api_version="2023-07-01-preview",
)

# Test
response_iter = llm.stream_complete("Tell me a joke.")
for response in response_iter:
    print(response.delta, end="", flush=True)
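
The same llm object also supports LlamaIndex's message-based chat interface if you prefer it over raw completions; a minimal sketch (the system prompt and question are placeholders):

from llama_index.core.llms import ChatMessage

# Non-streaming chat call through the same AzureOpenAI client
messages = [
    ChatMessage(role="system", content="You are a concise assistant."),
    ChatMessage(role="user", content="What is LlamaIndex?"),
]
chat_response = llm.chat(messages)
print(chat_response.message.content)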

Initialize Embeddings

import os
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    model_name="jina-embeddings-v2-base-de",
    api_key=os.getenv("OPENAI_API_KEY"),
    api_base=os.getenv("OPENAI_BASE_URL"),
)

# Test
query_embedding = embed_model.get_query_embedding("Hello world")
print(f"Embedding dimension: {len(query_embedding)}")

Simple RAG Example

1. Prepare Documents

mkdir example_data
# Place your PDF documents in the example_data directory
cp /path/to/your-documents.pdf example_data/

2. Index Documents

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader(
    input_dir="./example_data", filename_as_id=True
).load_data()
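
# Sanity check: confirm the reader actually picked up your files
print(f"Loaded {len(documents)} document objects")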

index = VectorStoreIndex.from_documents(
    documents=documents,
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=20)],
    embed_model=embed_model,
)
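
The index above lives in memory, so documents are re-embedded on every run. To avoid that, you can persist it and reload it later; a sketch using the default local storage backend ("./storage" is an arbitrary path):

from llama_index.core import StorageContext, load_index_from_storage

# Write the index (vectors, docstore, metadata) to disk
index.storage_context.persist(persist_dir="./storage")

# Reload it in a later session; reuse the same embed_model
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, embed_model=embed_model)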

3. Create Chat Engine

from llama_index.core.postprocessor import LongContextReorder
from llama_index.core.memory import ChatMemoryBuffer

CONTEXT_PROMPT = """\
You are a helpful AI assistant. Answer based on the context provided.
If the context doesn't help, say: I can't find that in the given context.

Context:
{context_str}

Answer in the same language as the question.
"""

chat_engine = index.as_chat_engine(
    llm=llm,
    streaming=True,
    chat_mode="context",
    context_template=CONTEXT_PROMPT,
    node_postprocessors=[LongContextReorder()],
    memory=ChatMemoryBuffer.from_defaults(token_limit=6000),
    similarity_top_k=10,
)
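
If you don't need conversation memory, a stateless query engine over the same index is an alternative; a minimal sketch (the question is a placeholder):

# One-shot Q&A without chat history, reusing the same retrieval settings
query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=10,
    node_postprocessors=[LongContextReorder()],
)
answer = query_engine.query("What are the key findings?")
print(answer)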

4. Ask Questions

response = chat_engine.stream_chat("How much revenue did Alphabet generate?")
for token in response.response_gen:
    print(token, end="")

Example output:

According to the context, Alphabet generated $69,787 million in revenue
in the quarter ended March 31, 2023.
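
To see which chunks grounded the answer, the streamed chat response also exposes the retrieved nodes; a sketch (file_name is metadata that SimpleDirectoryReader attaches by default):

# Inspect the retrieved source chunks and their similarity scores
for node in response.source_nodes:
    print(node.node.metadata.get("file_name"), node.score)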
