LlamaIndex

Use AI Foundation Services with LlamaIndex to build RAG applications, index documents, and set up chat engines.

Setup

pip install llama-index llama-index-llms-azure-openai llama-index-embeddings-openai
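
The snippets on this page read the API key and endpoint from the `OPENAI_API_KEY` and `OPENAI_BASE_URL` environment variables. As a minimal sketch, the check below fails early with a readable message when a variable is unset; `require_env` is a hypothetical helper for illustration and not part of LlamaIndex.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, raising if any are unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Calling `require_env("OPENAI_API_KEY", "OPENAI_BASE_URL")` before constructing the LLM would surface a configuration problem before any request is sent.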

Initialize the LLM

import os

from llama_index.llms.azure_openai import AzureOpenAI

llm = AzureOpenAI(
    deployment_name="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),
    azure_endpoint=os.getenv("OPENAI_BASE_URL"),
    api_version="2023-07-01-preview",
)

# Smoke test: stream a completion and print each chunk as it arrives
response_iter = llm.stream_complete("Tell me a joke.")
for response in response_iter:
    print(response.delta, end="", flush=True)

Initialize embeddings

import os
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    model_name="jina-embeddings-v2-base-de",
    api_key=os.getenv("OPENAI_API_KEY"),
    api_base=os.getenv("OPENAI_BASE_URL"),
)

# Smoke test: embed a query and report the vector length
query_embedding = embed_model.get_query_embedding("Hello world")
print(f"Embedding dimension: {len(query_embedding)}")
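
Retrieval compares the query embedding with the stored document embeddings, commonly by cosine similarity. The following pure-Python sketch shows that comparison on toy vectors. The function name and the example vectors are illustrative assumptions; in the RAG example the vector store performs this step internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
docs = {"doc_a": [1.0, 0.0, 1.0], "doc_b": [0.0, 1.0, 0.0]}
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked)
```

Documents whose vectors point in a similar direction to the query rank first; an orthogonal vector scores zero.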

Simple RAG example

1. Prepare documents

mkdir example_data
# Place your PDF documents in the example_data directory
cp /path/to/your-documents.pdf example_data/

2. Index documents

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader(
    input_dir="./example_data", filename_as_id=True
).load_data()

index = VectorStoreIndex.from_documents(
    documents=documents,
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=20)],
    embed_model=embed_model,
)
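
The `chunk_size` and `chunk_overlap` arguments control how documents are cut into nodes before embedding. The sketch below illustrates the overlap idea on a list of whitespace-separated words. It is a simplification under that assumption, since `SentenceSplitter` counts tokens and prefers sentence boundaries; `chunk_words` is a hypothetical helper.

```python
def chunk_words(words: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split words into windows of chunk_size that share chunk_overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the input
    return chunks


words = "a b c d e f g h i j".split()
for chunk in chunk_words(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each window repeats the last word of the previous one, so text that straddles a boundary still appears intact in at least one chunk.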

3. Create the chat engine

from llama_index.core.postprocessor import LongContextReorder
from llama_index.core.memory import ChatMemoryBuffer

CONTEXT_PROMPT = """\
You are a helpful AI assistant. Answer based on the context provided.
If the context doesn't help, say: I can't find that in the given context.

Context:
{context_str}

Answer in the same language as the question.
"""

chat_engine = index.as_chat_engine(
    llm=llm,
    streaming=True,
    chat_mode="context",
    context_template=CONTEXT_PROMPT,
    node_postprocessors=[LongContextReorder()],
    memory=ChatMemoryBuffer.from_defaults(token_limit=6000),
    similarity_top_k=10,
)
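
`LongContextReorder` rearranges the retrieved nodes so that the most relevant ones sit at the beginning and end of the context rather than in the middle. A minimal sketch of that idea on plain `(text, score)` pairs follows. It is an illustration, not the library's code, and `reorder_for_long_context` is a hypothetical name.

```python
def reorder_for_long_context(scored: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Place high-scoring items at both ends and low-scoring items in the middle."""
    reordered: list[tuple[str, float]] = []
    # Walk from lowest to highest score, alternating between front and back,
    # so that later (higher-scoring) items end up on the outside.
    for i, item in enumerate(sorted(scored, key=lambda pair: pair[1])):
        if i % 2 == 0:
            reordered.insert(0, item)
        else:
            reordered.append(item)
    return reordered


nodes = [("n1", 0.1), ("n2", 0.2), ("n3", 0.3), ("n4", 0.4), ("n5", 0.5)]
print([name for name, _ in reorder_for_long_context(nodes)])
```

With `similarity_top_k=10`, a reordering of this kind keeps the strongest matches away from the middle of a long prompt.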

4. Ask questions

response = chat_engine.stream_chat("How much revenue did Alphabet generate?")
for token in response.response_gen:
    print(token, end="", flush=True)

Example output:

According to the context, Alphabet generated $69,787 million in revenue
in the quarter ended March 31, 2023.
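
When the full answer is needed after streaming, for logging or further processing, the tokens can be accumulated while they are printed. This is a small sketch, assuming only that the response exposes an iterable of string tokens as `response_gen` does above. `print_and_collect` is a hypothetical helper, and a plain list stands in for a live response.

```python
from typing import Iterable


def print_and_collect(tokens: Iterable[str]) -> str:
    """Print tokens as they arrive and return the concatenated text."""
    parts = []
    for token in tokens:
        print(token, end="", flush=True)
        parts.append(token)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


# A list of strings stands in for chat_engine.stream_chat(...).response_gen.
answer = print_and_collect(["Hello", ", ", "world"])
```

In the example above, the same helper could be called as `print_and_collect(response.response_gen)`.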

Next steps

Embeddings guide: learn more about embedding models
LangChain: an alternative RAG framework