
Asynchronous Requests (Queue API)

This content is for v1.0.0. Go to the latest version for the most up-to-date documentation.


Submit long-running API requests to a queue for asynchronous processing. The Queue API mirrors the standard endpoints but processes requests in the background, making it ideal for batch workloads and large inference jobs.

What you’ll learn:

  • How to submit requests to the Queue API
  • When to use async vs. synchronous endpoints
  • How to poll for and retrieve results

Use the /queue endpoints when:

  • Long-running inference — Large models or complex prompts that may exceed synchronous timeout limits
  • Batch processing — Submitting many requests without waiting for each to complete
  • Background jobs — Fire-and-forget workloads where you process results later
  • Rate limit management — Spreading load across time instead of bursting

For interactive or real-time use cases (chatbots, streaming), use the standard /v2 endpoints instead.


Every standard endpoint has a /queue equivalent:

| Standard Endpoint | Queue Endpoint | Description |
| --- | --- | --- |
| `POST /v2/chat/completions` | `POST /queue/chat/completions` | Chat completion |
| `POST /v2/completions` | `POST /queue/completions` | Text completion |
| `POST /v2/embeddings` | `POST /queue/embeddings` | Embeddings |
| `POST /v2/audio/transcriptions` | `POST /queue/audio/transcriptions` | Audio transcription |
| `POST /v2/audio/translations` | `POST /queue/audio/translations` | Audio translation |
| `POST /v2/images/generations` | `POST /queue/images/generations` | Image generation |
| `POST /v2/images/edits` | `POST /queue/images/edits` | Image editing |
| `GET /v2/models` | `GET /queue/models` | List models |

The request body for each queue endpoint is identical to its standard counterpart — just change the base path from /v2 to /queue.
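In practice that means URL construction is the only thing that changes. A minimal sketch (the `endpoint` helper is illustrative, not part of the API):

```python
BASE = "https://llm-server.llmhub.t-systems.net"

def endpoint(path: str, queued: bool = False) -> str:
    """Build a full URL; swap the /v2 prefix for /queue when queued=True."""
    prefix = "/queue" if queued else "/v2"
    return f"{BASE}{prefix}{path}"

endpoint("/chat/completions")               # .../v2/chat/completions
endpoint("/chat/completions", queued=True)  # .../queue/chat/completions
```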


Submit a chat completion to the queue:

```shell
curl -X POST "https://llm-server.llmhub.t-systems.net/queue/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a detailed analysis of renewable energy trends in Europe."}
    ],
    "max_tokens": 2000
  }'
```
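The response follows the familiar OpenAI chat-completion shape, so the assistant text sits under `choices[0].message.content`. A sketch of extracting it from a captured response body (the sample payload is illustrative):

```python
import json

# Illustrative response body; the real one comes from the curl call above.
raw = """
{
  "choices": [
    {"message": {"role": "assistant", "content": "Renewable energy in Europe..."}}
  ]
}
"""

data = json.loads(raw)
print(data["choices"][0]["message"]["content"])
```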

Process multiple prompts efficiently using the queue:

```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

prompts = [
    "Summarize the benefits of solar energy.",
    "Explain how wind turbines generate electricity.",
    "Describe the future of hydrogen fuel cells.",
]

async def process_prompt(prompt):
    response = await client.chat.completions.create(
        model="Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500,
    )
    return response.choices[0].message.content

async def main():
    tasks = [process_prompt(p) for p in prompts]
    results = await asyncio.gather(*tasks)
    for prompt, result in zip(prompts, results):
        print(f"Prompt: {prompt[:50]}...")
        print(f"Result: {result[:100]}...\n")

asyncio.run(main())
```
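When fanning out many queued requests, a single failed call makes a plain `asyncio.gather` raise and discard the rest. A sketch of collecting partial results instead with `return_exceptions=True` (the `flaky_call` coroutine is a stand-in for the client calls above):

```python
import asyncio

async def flaky_call(i: int) -> str:
    """Stand-in for a queued API call; fails for one input."""
    if i == 1:
        raise RuntimeError("simulated queue error")
    return f"result-{i}"

async def main() -> list:
    tasks = [flaky_call(i) for i in range(3)]
    # return_exceptions=True returns raised exceptions as values
    # instead of aborting the whole batch.
    return await asyncio.gather(*tasks, return_exceptions=True)

results = asyncio.run(main())
for r in results:
    if isinstance(r, Exception):
        print(f"failed: {r}")
    else:
        print(f"ok: {r}")
```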

Generate embeddings through the queue:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

response = client.embeddings.create(
    model="text-embedding-bge-m3",
    input="The benefits of renewable energy in Europe",
)
print(f"Embedding dimension: {len(response.data[0].embedding)}")
```
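The returned vectors can be compared directly, for example with cosine similarity. A minimal sketch over plain Python lists (stand-ins for `response.data[i].embedding`):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cosine_similarity([1.0, 0.0], [1.0, 0.0])  # identical vectors -> 1.0
cosine_similarity([1.0, 0.0], [0.0, 1.0])  # orthogonal vectors -> 0.0
```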
Transcribe audio asynchronously:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

with open("meeting_recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )

print(transcript.text)
```
Generate an image through the queue:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

result = client.images.generate(
    model="gpt-image-1",
    prompt="A futuristic data center powered by renewable energy",
)
print(result.data[0].b64_json[:50] + "...")
```
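The `b64_json` field is a base64-encoded image. A sketch of decoding it to a file (the payload below is a stand-in for `result.data[0].b64_json`; the `save_image` helper is illustrative):

```python
import base64

def save_image(b64_payload: str, path: str) -> int:
    """Decode a base64 image payload, write it to disk, return bytes written."""
    raw = base64.b64decode(b64_payload)
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)

# Stand-in payload; in practice pass result.data[0].b64_json.
payload = base64.b64encode(b"fake-png-bytes").decode()
save_image(payload, "generated.png")
```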