
API Endpoints

AI Foundation Services provides an OpenAI-compatible REST API. All endpoints use the base URL:

https://llm-server.llmhub.t-systems.net/v2

For the complete OpenAPI specification, see the interactive API docs (Redoc).


Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET    | /models | List all available models |
| GET    | /models/{model_id} | Get model details and metadata |
| POST   | /chat/completions | Create a chat completion |
| POST   | /completions | Create a text completion |
| POST   | /embeddings | Create embeddings |
| POST   | /audio/transcriptions | Transcribe audio to text |
| POST   | /audio/translations | Translate audio to English |
| GET    | /audio/models | List available audio models |
| POST   | /images/generations | Generate images from text |
| POST   | /responses | Create a response (Responses API) |
| POST   | /fine_tuning/jobs | Create a fine-tuning job |
| GET    | /fine_tuning/jobs | List fine-tuning jobs |
| POST   | /fine_tuning/jobs/{id}/cancel | Cancel a fine-tuning job |
| GET    | /fine_tuning/jobs/{id}/events | List fine-tuning events |
| POST   | /files | Upload a file |
| GET    | /files | List uploaded files |
| DELETE | /files/{id} | Delete a file |

Queue Endpoints (Asynchronous)

All major endpoints have /queue variants for asynchronous processing. See the Asynchronous Requests guide for details.
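As a minimal sketch of how the queue variants mirror the synchronous paths (the actual submit-and-poll protocol is covered in the Asynchronous Requests guide), the async URL is simply the same endpoint path under the /queue prefix instead of /v2:

```python
BASE = "https://llm-server.llmhub.t-systems.net"

def queue_url(endpoint: str) -> str:
    """Map a synchronous endpoint path to its /queue variant."""
    return f"{BASE}/queue{endpoint}"

# The async counterpart of POST /v2/chat/completions:
print(queue_url("/chat/completions"))
# https://llm-server.llmhub.t-systems.net/queue/chat/completions
```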

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /queue/chat/completions | Async chat completion |
| POST | /queue/completions | Async text completion |
| POST | /queue/embeddings | Async embeddings |
| POST | /queue/audio/transcriptions | Async audio transcription |
| POST | /queue/audio/translations | Async audio translation |
| POST | /queue/images/generations | Async image generation |
| POST | /queue/images/edits | Async image editing |
| GET  | /queue/models | List models (queue) |

API Versioning

The API uses versioned paths:

| Path Prefix | Purpose | Examples |
|-------------|---------|----------|
| /v2 | Default — LLM inference, embeddings, audio, images | /v2/chat/completions, /v2/embeddings |
| /v1 | Visual RAG, vector stores, file management | /v1/vector_stores, /v1/files |
| /queue | Asynchronous processing (mirrors /v2 endpoints) | /queue/chat/completions |
  • Use /v2 for all standard LLM API calls (this is the default when using OpenAI SDKs with our base URL)
  • Use /v1 for Visual RAG and file management operations
  • Use /queue for long-running or batch workloads
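The prefix rules above can be sketched as a small helper. The workload names here are illustrative labels, not part of the API:

```python
BASE = "https://llm-server.llmhub.t-systems.net"

# One prefix per workload type, per the table above
PREFIXES = {
    "llm": "/v2",       # standard LLM calls: chat, embeddings, audio, images
    "files": "/v1",     # Visual RAG, vector stores, file management
    "async": "/queue",  # long-running or batch workloads
}

def endpoint_url(workload: str, path: str) -> str:
    """Assemble a full request URL from a workload type and endpoint path."""
    return f"{BASE}{PREFIXES[workload]}{path}"

print(endpoint_url("llm", "/chat/completions"))
# https://llm-server.llmhub.t-systems.net/v2/chat/completions
```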

Authentication

All requests require an API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

See Authentication for setup details.
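A minimal Python sketch of attaching the header, using only the standard library (YOUR_API_KEY is a placeholder, not a real credential):

```python
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your actual key

# Build (but do not yet send) an authenticated request to the /models endpoint
req = urllib.request.Request(
    "https://llm-server.llmhub.t-systems.net/v2/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(req.get_header("Authorization"))
# Bearer YOUR_API_KEY
```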


Request Format

All POST requests use JSON:

curl -X POST "https://llm-server.llmhub.t-systems.net/v2/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-3.3-70B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
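The same request can be assembled in Python with the standard library. This sketch only constructs the request object; passing it to urllib.request.urlopen would send it, which requires a valid API key:

```python
import json
import urllib.request

payload = {
    "model": "Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
}

req = urllib.request.Request(
    "https://llm-server.llmhub.t-systems.net/v2/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the JSON response body
print(req.method, req.get_header("Content-type"))
```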

Response Format

Responses follow the OpenAI response format. For chat completions:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}
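Extracting the reply text and token counts from such a response is plain JSON access. This sketch parses the example response shown above:

```python
import json

# The example chat-completion response from above, as a raw JSON string
raw = '''{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19}
}'''

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]
total = resp["usage"]["total_tokens"]
print(reply)  # Hello! How can I help you today?
print(total)  # 19
```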
© Deutsche Telekom AG