Asynchronous Requests (Queue API)
Submit long-running API requests to a queue for asynchronous processing. The Queue API mirrors the standard endpoints but processes requests in the background, making it ideal for batch workloads and large inference jobs.
Prerequisites
- An API key (get one here)
- OpenAI SDK or HTTP client installed (Quickstart)
What you'll learn:
- How to submit requests to the Queue API
- When to use async vs. synchronous endpoints
- How to retrieve results from queued requests
When to Use the Queue API
Use the /queue endpoints when:
- Long-running inference — Large models or complex prompts that may exceed synchronous timeout limits
- Batch processing — Submitting many requests without waiting for each to complete
- Background jobs — Fire-and-forget workloads where you process results later
- Rate limit management — Spreading load across time instead of bursting
For interactive or real-time use cases (chatbots, streaming), use the standard /v2 endpoints instead.
Endpoint Mapping
Every standard endpoint has a /queue equivalent:
| Standard Endpoint | Queue Endpoint | Description |
|---|---|---|
| POST /v2/chat/completions | POST /queue/chat/completions | Chat completion |
| POST /v2/completions | POST /queue/completions | Text completion |
| POST /v2/embeddings | POST /queue/embeddings | Embeddings |
| POST /v2/audio/transcriptions | POST /queue/audio/transcriptions | Audio transcription |
| POST /v2/audio/translations | POST /queue/audio/translations | Audio translation |
| POST /v2/images/generations | POST /queue/images/generations | Image generation |
| POST /v2/images/edits | POST /queue/images/edits | Image editing |
| GET /v2/models | GET /queue/models | List models |
The request body for each queue endpoint is identical to its standard counterpart — just change the base path from /v2 to /queue.
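Because the mapping is purely a path change, migrating an existing integration can be as mechanical as rewriting the base path. A minimal sketch (the helper name is our own, not part of any SDK):

```python
def to_queue_path(path: str) -> str:
    """Rewrite a standard /v2/... path to its /queue/... equivalent."""
    if not path.startswith("/v2/"):
        raise ValueError(f"expected a /v2 path, got {path!r}")
    return "/queue/" + path[len("/v2/"):]

print(to_queue_path("/v2/chat/completions"))  # /queue/chat/completions
```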
Basic Usage
Submit a chat completion to the queue:
- curl
- Python
- Node.js
curl -X POST "https://llm-server.llmhub.t-systems.net/queue/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a detailed analysis of renewable energy trends in Europe."}
    ],
    "max_tokens": 2000
  }'
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

response = client.chat.completions.create(
    model="Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a detailed analysis of renewable energy trends in Europe."},
    ],
    max_tokens=2000,
)

print(response.choices[0].message.content)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://llm-server.llmhub.t-systems.net/queue",
});

const response = await client.chat.completions.create({
  model: "Llama-3.3-70B-Instruct",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a detailed analysis of renewable energy trends in Europe." },
  ],
  max_tokens: 2000,
});

console.log(response.choices[0].message.content);
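Queued requests can sit behind other jobs, so transient timeouts are more likely than on the synchronous endpoints. One generic way to cope is a small retry wrapper with exponential backoff. This is our own pattern, not part of the SDK, and the attempt count and delays are arbitrary assumptions:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); on failure, retry with exponential backoff.

    Illustrative helper -- attempts and delays are arbitrary defaults.
    """
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** i))

# Usage (assuming `client` from the example above):
# response = with_retries(
#     lambda: client.chat.completions.create(
#         model="Llama-3.3-70B-Instruct",
#         messages=[{"role": "user", "content": "Hello"}],
#     )
# )
```

In production you would likely narrow the `except Exception` to the SDK's retryable error types rather than retrying everything.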
Batch Processing Example
Process multiple prompts efficiently using the queue:
- Python
- Node.js
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

prompts = [
    "Summarize the benefits of solar energy.",
    "Explain how wind turbines generate electricity.",
    "Describe the future of hydrogen fuel cells.",
]

async def process_prompt(prompt):
    response = await client.chat.completions.create(
        model="Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500,
    )
    return response.choices[0].message.content

async def main():
    tasks = [process_prompt(p) for p in prompts]
    results = await asyncio.gather(*tasks)
    for prompt, result in zip(prompts, results):
        print(f"Prompt: {prompt[:50]}...")
        print(f"Result: {result[:100]}...\n")

asyncio.run(main())
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://llm-server.llmhub.t-systems.net/queue",
});

const prompts = [
  "Summarize the benefits of solar energy.",
  "Explain how wind turbines generate electricity.",
  "Describe the future of hydrogen fuel cells.",
];

const results = await Promise.all(
  prompts.map((prompt) =>
    client.chat.completions.create({
      model: "Llama-3.3-70B-Instruct",
      messages: [{ role: "user", content: prompt }],
      max_tokens: 500,
    })
  )
);

results.forEach((result, i) => {
  console.log(`Prompt: ${prompts[i]}`);
  console.log(`Result: ${result.choices[0].message.content}\n`);
});
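With large batches you may still want to cap how many requests are in flight at once on the client side, independent of any server-side queueing. A sketch using asyncio.Semaphore; the helper and the limit of 5 are our own choices, not part of the SDK:

```python
import asyncio

async def gather_limited(coro_fns, limit=5):
    """Run zero-argument coroutine factories with at most `limit` in flight.

    Illustrative helper -- `limit` is an arbitrary default.
    """
    sem = asyncio.Semaphore(limit)

    async def run(fn):
        async with sem:  # blocks while `limit` calls are already running
            return await fn()

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(run(fn) for fn in coro_fns))

# Usage with the batch example above:
# results = asyncio.run(gather_limited(
#     [lambda p=p: process_prompt(p) for p in prompts], limit=5))
```

Coroutine factories (rather than coroutines) are passed in so that no request starts before the semaphore admits it.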
Queue Endpoints for Other APIs
Embeddings
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

response = client.embeddings.create(
    model="text-embedding-bge-m3",
    input="The benefits of renewable energy in Europe",
)

print(f"Embedding dimension: {len(response.data[0].embedding)}")
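Once the embeddings come back, a common next step is comparing them. A minimal cosine-similarity sketch using only the standard library (the function is our own, not part of the API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

In practice `a` and `b` would be two `response.data[i].embedding` vectors from the endpoint above.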
Audio Transcription
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

with open("meeting_recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )

print(transcript.text)
Image Generation
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

result = client.images.generate(
    model="gpt-image-1",
    prompt="A futuristic data center powered by renewable energy",
)

print(result.data[0].b64_json[:50] + "...")
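The response carries the image as a base64 string, so saving it to disk is a standard-library decode and write. A small sketch (the helper and the output file name are our own choices):

```python
import base64

def save_b64_image(b64_data: str, path: str) -> None:
    """Decode a base64 image payload and write the raw bytes to disk."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_data))

# Usage with the example above:
# save_b64_image(result.data[0].b64_json, "datacenter.png")
```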
Next Steps
- Chat Completions — Standard synchronous chat API
- Streaming — Real-time token-by-token responses
- Rate Limits — Understand TPM/RPM limits and headers
- API Endpoints — Full endpoint reference