# API Endpoints

AI Foundation Services provides an OpenAI-compatible REST API. All endpoints use the base URL:

`https://llm-server.llmhub.t-systems.net/v2`

For the complete OpenAPI specification, see the interactive API docs (Redoc).
## Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | `/models` | List all available models |
| GET | `/models/{model_id}` | Get model details and metadata |
| POST | `/chat/completions` | Create a chat completion |
| POST | `/completions` | Create a text completion |
| POST | `/embeddings` | Create embeddings |
| POST | `/audio/transcriptions` | Transcribe audio to text |
| POST | `/audio/translations` | Translate audio to English |
| GET | `/audio/models` | List available audio models |
| POST | `/images/generations` | Generate images from text |
| POST | `/responses` | Create a response (Responses API) |
| POST | `/fine_tuning/jobs` | Create a fine-tuning job |
| GET | `/fine_tuning/jobs` | List fine-tuning jobs |
| POST | `/fine_tuning/jobs/{id}/cancel` | Cancel a fine-tuning job |
| GET | `/fine_tuning/jobs/{id}/events` | List fine-tuning events |
| POST | `/files` | Upload a file |
| GET | `/files` | List uploaded files |
| DELETE | `/files/{id}` | Delete a file |
## Queue Endpoints (Asynchronous)
All major endpoints have /queue variants for asynchronous processing. See the Asynchronous Requests guide for details.
| Method | Endpoint | Description |
|---|---|---|
| POST | `/queue/chat/completions` | Async chat completion |
| POST | `/queue/completions` | Async text completion |
| POST | `/queue/embeddings` | Async embeddings |
| POST | `/queue/audio/transcriptions` | Async audio transcription |
| POST | `/queue/audio/translations` | Async audio translation |
| POST | `/queue/images/generations` | Async image generation |
| POST | `/queue/images/edits` | Async image editing |
| GET | `/queue/models` | List models (queue) |
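As a sketch, submitting an async job means POSTing the same JSON body you would send to the synchronous endpoint, just under the `/queue` prefix. The response shape (job identifier, status polling) is covered in the Asynchronous Requests guide, so only request construction is shown here, using Python's standard library:

```python
import json
import urllib.request

BASE = "https://llm-server.llmhub.t-systems.net"

# Same payload as a synchronous chat completion; only the path changes.
payload = {
    "model": "Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Summarize this document."}],
}

req = urllib.request.Request(
    url=f"{BASE}/queue/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending requires a valid API key; handling the queued-job response is
# described in the Asynchronous Requests guide:
# with urllib.request.urlopen(req) as resp: ...
```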
## API Versioning
The API uses versioned paths:
| Path Prefix | Purpose | Examples |
|---|---|---|
| `/v2` | Default: LLM inference, embeddings, audio, images | `/v2/chat/completions`, `/v2/embeddings` |
| `/v1` | Visual RAG, vector stores, file management | `/v1/vector_stores`, `/v1/files` |
| `/queue` | Asynchronous processing (mirrors `/v2` endpoints) | `/queue/chat/completions` |
- Use `/v2` for all standard LLM API calls (this is the default when using OpenAI SDKs with our base URL)
- Use `/v1` for Visual RAG and file management operations
- Use `/queue` for long-running or batch workloads
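The prefix rules above can be sketched as a small helper; `endpoint_url` is purely illustrative, not part of the API:

```python
HOST = "https://llm-server.llmhub.t-systems.net"

def endpoint_url(path: str, prefix: str = "/v2") -> str:
    """Join the API host, a version prefix, and an endpoint path."""
    return f"{HOST}{prefix}{path}"

# Default /v2 for standard LLM calls:
url = endpoint_url("/chat/completions")
# -> "https://llm-server.llmhub.t-systems.net/v2/chat/completions"

# /v1 for vector stores and files, /queue for async workloads:
endpoint_url("/vector_stores", prefix="/v1")
endpoint_url("/chat/completions", prefix="/queue")
```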
## Authentication
All requests require an API key in the `Authorization` header:

```
Authorization: Bearer YOUR_API_KEY
```
See Authentication for setup details.
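The curl example below reads the key from the `OPENAI_API_KEY` environment variable; the same pattern in Python looks like this (the fallback placeholder is just for illustration):

```python
import os

# Read the API key from the environment and build the Authorization
# header value in the "Bearer <key>" form shown above.
api_key = os.environ.get("OPENAI_API_KEY", "YOUR_API_KEY")
headers = {"Authorization": f"Bearer {api_key}"}
```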
## Request Format
All POST requests use JSON:
```bash
curl -X POST "https://llm-server.llmhub.t-systems.net/v2/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-3.3-70B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```
## Response Format
Responses follow the OpenAI response format. For chat completions:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}
```
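Once parsed into a dict, the assistant's reply and token usage can be pulled out of such a response like this (the literal below mirrors the sample above):

```python
# A chat completion response in the OpenAI format, as parsed from JSON.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1710000000,
    "model": "Llama-3.3-70B-Instruct",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I help you today?",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19},
}

# The reply text lives under choices[0].message.content;
# aggregate token counts live under usage.
reply = response["choices"][0]["message"]["content"]
total_tokens = response["usage"]["total_tokens"]
# reply == "Hello! How can I help you today?", total_tokens == 19
```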