API Endpoints

AI Foundation Services provides an OpenAI-compatible REST API. The default base URL is:

https://llm-server.llmhub.t-systems.net/v2

For the complete OpenAPI specification, see the interactive API docs (Redoc).


| Method | Endpoint | Description |
| ------ | -------- | ----------- |
| GET | /models | List all available models |
| GET | /models/{model_id} | Get model details and metadata |
| POST | /chat/completions | Create a chat completion |
| POST | /completions | Create a text completion |
| POST | /embeddings | Create embeddings |
| POST | /audio/transcriptions | Transcribe audio to text |
| POST | /audio/translations | Translate audio to English |
| GET | /audio/models | List available audio models |
| POST | /images/generations | Generate images from text |
| POST | /responses | Create a response (Responses API) |
| POST | /fine_tuning/jobs | Create a fine-tuning job |
| GET | /fine_tuning/jobs | List fine-tuning jobs |
| POST | /fine_tuning/jobs/{id}/cancel | Cancel a fine-tuning job |
| GET | /fine_tuning/jobs/{id}/events | List fine-tuning events |
| POST | /files | Upload a file |
| GET | /files | List uploaded files |
| DELETE | /files/{id} | Delete a file |
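
For example, listing the models available to your API key needs only a GET request:

# List every model the key can access
curl "https://llm-server.llmhub.t-systems.net/v2/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"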

All major endpoints have /queue variants for asynchronous processing; a minimal example follows the table below. See the Asynchronous Requests guide for details.

| Method | Endpoint | Description |
| ------ | -------- | ----------- |
| POST | /queue/chat/completions | Async chat completion |
| POST | /queue/completions | Async text completion |
| POST | /queue/embeddings | Async embeddings |
| POST | /queue/audio/transcriptions | Async audio transcription |
| POST | /queue/audio/translations | Async audio translation |
| POST | /queue/images/generations | Async image generation |
| POST | /queue/images/edits | Async image editing |
| GET | /queue/models | List models (queue) |
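
Submitting to a queue endpoint looks exactly like the synchronous call, with /queue replacing /v2 in the path (a sketch; how the queued result is retrieved afterwards is covered in the Asynchronous Requests guide):

# Enqueue a chat completion instead of waiting for it inline
curl -X POST "https://llm-server.llmhub.t-systems.net/queue/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-3.3-70B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'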

The API uses versioned paths:

| Path Prefix | Purpose | Examples |
| ----------- | ------- | -------- |
| /v2 | Default: LLM inference, embeddings, audio, images | /v2/chat/completions, /v2/embeddings |
| /v1 | Visual RAG, vector stores, file management | /v1/vector_stores, /v1/files |
| /queue | Asynchronous processing (mirrors /v2 endpoints) | /queue/chat/completions |
  • Use /v2 for all standard LLM API calls (this is the default when using OpenAI SDKs with our base URL)
  • Use /v1 for Visual RAG and file management operations
  • Use /queue for long-running or batch workloads
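
For example, listing uploaded files goes through /v1 rather than /v2 (a sketch, assuming the /v1 prefix lives on the same host as the /v2 base URL above):

# File management sits under the /v1 prefix
curl "https://llm-server.llmhub.t-systems.net/v1/files" \
  -H "Authorization: Bearer $OPENAI_API_KEY"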

All requests require an API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

See Authentication for setup details.


POST requests take a JSON body (file-upload endpoints such as /audio/transcriptions expect multipart form data instead):

curl -X POST "https://llm-server.llmhub.t-systems.net/v2/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-3.3-70B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'

Responses follow the OpenAI format. A chat completion response looks like this:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}
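
Other endpoints follow the same request and response conventions. For example, a minimal embeddings call (the model name below is a placeholder; query GET /models to see what your deployment actually serves):

# Embed a single input string; replace the model with one from GET /models
curl -X POST "https://llm-server.llmhub.t-systems.net/v2/embeddings" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "YOUR_EMBEDDING_MODEL", "input": "Hello"}'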