API Endpoints

AI Foundation Services provides an OpenAI-compatible REST API. The default base URL is:

https://llm-server.llmhub.t-systems.net/v2

For the complete OpenAPI specification, see the interactive API docs (Redoc).


| Method | Endpoint | Description |
| ------ | -------- | ----------- |
| GET | /models | List all available models |
| GET | /models/{model_id} | Get model details and metadata |
| POST | /chat/completions | Create a chat completion |
| POST | /completions | Create a text completion |
| POST | /embeddings | Create embeddings |
| POST | /audio/transcriptions | Transcribe audio to text |
| POST | /audio/translations | Translate audio to English |
| GET | /audio/models | List available audio models |
| POST | /images/generations | Generate images from text |
| POST | /responses | Create a response (Responses API) |
| POST | /fine_tuning/jobs | Create a fine-tuning job |
| GET | /fine_tuning/jobs | List fine-tuning jobs |
| POST | /fine_tuning/jobs/{id}/cancel | Cancel a fine-tuning job |
| GET | /fine_tuning/jobs/{id}/events | List fine-tuning events |
| POST | /files | Upload a file |
| GET | /files | List uploaded files |
| DELETE | /files/{id} | Delete a file |
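
For example, listing the models available to your API key needs only a GET request:

# List every model the key can access
curl "https://llm-server.llmhub.t-systems.net/v2/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"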

All major endpoints have /queue variants for asynchronous processing; a minimal example follows the table below. See the Asynchronous Requests guide for details.

| Method | Endpoint | Description |
| ------ | -------- | ----------- |
| POST | /queue/chat/completions | Async chat completion |
| POST | /queue/completions | Async text completion |
| POST | /queue/embeddings | Async embeddings |
| POST | /queue/audio/transcriptions | Async audio transcription |
| POST | /queue/audio/translations | Async audio translation |
| POST | /queue/images/generations | Async image generation |
| POST | /queue/images/edits | Async image editing |
| GET | /queue/models | List models (queue) |
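
Submitting to a queue endpoint looks exactly like the synchronous call, with /queue replacing /v2 in the path (a sketch; how the queued result is retrieved afterwards is covered in the Asynchronous Requests guide):

# Enqueue a chat completion instead of waiting for it inline
curl -X POST "https://llm-server.llmhub.t-systems.net/queue/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-3.3-70B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'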

The API uses versioned paths:

| Path Prefix | Purpose | Examples |
| ----------- | ------- | -------- |
| /v2 | Default: LLM inference, embeddings, audio, images | /v2/chat/completions, /v2/embeddings |
| /v1 | Visual RAG, vector stores, file management | /v1/vector_stores, /v1/files |
| /queue | Asynchronous processing (mirrors /v2 endpoints) | /queue/chat/completions |
  • Use /v2 for all standard LLM API calls (this is the default when using OpenAI SDKs with our base URL)
  • Use /v1 for Visual RAG and file management operations
  • Use /queue for long-running or batch workloads
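
For example, listing uploaded files goes through /v1 rather than /v2 (a sketch, assuming the /v1 prefix lives on the same host as the /v2 base URL above):

# File management sits under the /v1 prefix
curl "https://llm-server.llmhub.t-systems.net/v1/files" \
  -H "Authorization: Bearer $OPENAI_API_KEY"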

All requests require an API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

See Authentication for setup details.


POST requests take a JSON body (file-upload endpoints such as /audio/transcriptions expect multipart form data instead):

curl -X POST "https://llm-server.llmhub.t-systems.net/v2/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-3.3-70B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'

Responses follow the OpenAI format. A chat completion response looks like this:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}
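
Other endpoints follow the same request and response conventions. For example, a minimal embeddings call (the model name below is a placeholder; query GET /models to see what your deployment actually serves):

# Embed a single input string; replace the model with one from GET /models
curl -X POST "https://llm-server.llmhub.t-systems.net/v2/embeddings" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "YOUR_EMBEDDING_MODEL", "input": "Hello"}'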