
API Endpoints

AI Foundation Services provides an OpenAI-compatible REST API. All endpoints use the base URL:

https://llm-server.llmhub.t-systems.net/v2

For the complete OpenAPI specification, see the interactive API docs (Redoc).


Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET    | /models | List all available models |
| GET    | /models/{model_id} | Get model details and metadata |
| POST   | /chat/completions | Create a chat completion |
| POST   | /completions | Create a text completion |
| POST   | /embeddings | Create embeddings |
| POST   | /audio/transcriptions | Transcribe audio to text |
| POST   | /audio/translations | Translate audio to English |
| GET    | /audio/models | List available audio models |
| POST   | /images/generations | Generate images from text |
| POST   | /responses | Create a response (Responses API) |
| POST   | /fine_tuning/jobs | Create a fine-tuning job |
| GET    | /fine_tuning/jobs | List fine-tuning jobs |
| POST   | /fine_tuning/jobs/{id}/cancel | Cancel a fine-tuning job |
| GET    | /fine_tuning/jobs/{id}/events | List fine-tuning events |
| POST   | /files | Upload a file |
| GET    | /files | List uploaded files |
| DELETE | /files/{id} | Delete a file |

Queue Endpoints (Asynchronous)

All major endpoints have /queue variants for asynchronous processing. See the Asynchronous Requests guide for details.
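As a minimal sketch of how the queue variants mirror the synchronous paths (the actual submit-and-poll protocol is covered in the Asynchronous Requests guide), the async URL is simply the same endpoint path under the /queue prefix instead of /v2:

```python
BASE = "https://llm-server.llmhub.t-systems.net"

def queue_url(endpoint: str) -> str:
    """Map a synchronous endpoint path to its /queue variant."""
    return f"{BASE}/queue{endpoint}"

# The async counterpart of POST /v2/chat/completions:
print(queue_url("/chat/completions"))
# https://llm-server.llmhub.t-systems.net/queue/chat/completions
```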

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /queue/chat/completions | Async chat completion |
| POST | /queue/completions | Async text completion |
| POST | /queue/embeddings | Async embeddings |
| POST | /queue/audio/transcriptions | Async audio transcription |
| POST | /queue/audio/translations | Async audio translation |
| POST | /queue/images/generations | Async image generation |
| POST | /queue/images/edits | Async image editing |
| GET  | /queue/models | List models (queue) |

API Versioning

The API uses versioned paths:

| Path Prefix | Purpose | Examples |
|-------------|---------|----------|
| /v2 | Default — LLM inference, embeddings, audio, images | /v2/chat/completions, /v2/embeddings |
| /v1 | Visual RAG, vector stores, file management | /v1/vector_stores, /v1/files |
| /queue | Asynchronous processing (mirrors /v2 endpoints) | /queue/chat/completions |
  • Use /v2 for all standard LLM API calls (this is the default when using OpenAI SDKs with our base URL)
  • Use /v1 for Visual RAG and file management operations
  • Use /queue for long-running or batch workloads
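The prefix rules above can be sketched as a small helper. The workload names here are illustrative labels, not part of the API:

```python
BASE = "https://llm-server.llmhub.t-systems.net"

# One prefix per workload type, per the table above
PREFIXES = {
    "llm": "/v2",       # standard LLM calls: chat, embeddings, audio, images
    "files": "/v1",     # Visual RAG, vector stores, file management
    "async": "/queue",  # long-running or batch workloads
}

def endpoint_url(workload: str, path: str) -> str:
    """Assemble a full request URL from a workload type and endpoint path."""
    return f"{BASE}{PREFIXES[workload]}{path}"

print(endpoint_url("llm", "/chat/completions"))
# https://llm-server.llmhub.t-systems.net/v2/chat/completions
```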

Authentication

All requests require an API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

See Authentication for setup details.
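A minimal Python sketch of attaching the header, using only the standard library (YOUR_API_KEY is a placeholder, not a real credential):

```python
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your actual key

# Build (but do not yet send) an authenticated request to the /models endpoint
req = urllib.request.Request(
    "https://llm-server.llmhub.t-systems.net/v2/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(req.get_header("Authorization"))
# Bearer YOUR_API_KEY
```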


Request Format

All POST requests use JSON:

curl -X POST "https://llm-server.llmhub.t-systems.net/v2/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-3.3-70B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
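The same request can be assembled in Python with the standard library. This sketch only constructs the request object; passing it to urllib.request.urlopen would send it, which requires a valid API key:

```python
import json
import urllib.request

payload = {
    "model": "Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
}

req = urllib.request.Request(
    "https://llm-server.llmhub.t-systems.net/v2/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the JSON response body
print(req.method, req.get_header("Content-type"))
```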

Response Format

Responses follow the OpenAI response format. For chat completions:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}
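Extracting the reply text and token counts from such a response is plain JSON access. This sketch parses the example response shown above:

```python
import json

# The example chat-completion response from above, as a raw JSON string
raw = '''{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19}
}'''

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]
total = resp["usage"]["total_tokens"]
print(reply)  # Hello! How can I help you today?
print(total)  # 19
```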
© Deutsche Telekom AG