Plans & Pricing

LLM Serving is offered in two service models. The Essential / Professional / Agentic plans below are the Shared LLM Serving Service — multi-tenant, pay-as-you-go, self-service via the T-Cloud Marketplace and the API Key self-serve portal, with best-effort RPM and TPM ceilings per model. For workloads that need contractual performance guarantees, private models, or single-tenant infrastructure, see Dedicated LLM Serving.

Service Description (PDF): the official, binding service description document.

Shared vs Dedicated LLM Serving: a side-by-side comparison of both service models and when to pick which.

Our Rate Plans

Choose the plan that matches your workload. A higher monthly minimum commitment unlocks higher rate limits and more powerful models. Need a custom setup or on-premises hosting? Contact us!

Rate Plan	Minimum Monthly Commitment (€/month)	Input / Output Token Price (€/M tokens)	RPM ^* (requests/minute)	Model Host	Model Tier
Essential	1,000 €	0.20 / 0.65 †	300 ^*	T-Cloud, External	Standard
Professional	3,000 €	0.20 / 0.65 †	600 ^*	T-Cloud, External	Standard, Premium
Agentic	5,000 €	0.20 / 0.65 †	1,000 ^*	T-Cloud, External	Standard, Premium
Enterprise	Custom	0.20 / 0.65 †	Custom ^*	T-Cloud, External	Standard, Premium

† Example pricing for T-Cloud-hosted Large Language Models
Learn more about rate limits →
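
As a rough illustration of how the commitment and the per-token prices interact, the sketch below estimates a monthly bill for the Essential plan using the example prices from the table (0.20 € input and 0.65 € output per million tokens). It assumes the minimum monthly commitment acts as a billing floor, so you pay whichever is higher: the commitment or your metered usage. That billing rule, and the names `RatePlan`, `usage_cost_eur`, and `estimated_invoice_eur`, are illustrative assumptions; the Service Description is the binding source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatePlan:
    name: str
    min_commitment_eur: float  # minimum monthly commitment from the table
    input_eur_per_m: float     # example input price per million tokens
    output_eur_per_m: float    # example output price per million tokens


# Values taken from the rate plan table; prices are the published examples only.
ESSENTIAL = RatePlan("Essential", 1_000.0, 0.20, 0.65)


def usage_cost_eur(plan: RatePlan, input_tokens: int, output_tokens: int) -> float:
    """Metered cost for one month of token usage."""
    input_cost = input_tokens / 1_000_000 * plan.input_eur_per_m
    output_cost = output_tokens / 1_000_000 * plan.output_eur_per_m
    return input_cost + output_cost


def estimated_invoice_eur(plan: RatePlan, input_tokens: int, output_tokens: int) -> float:
    """ASSUMPTION: the commitment is a floor, so the invoice is max(commitment, usage)."""
    return max(plan.min_commitment_eur, usage_cost_eur(plan, input_tokens, output_tokens))


if __name__ == "__main__":
    # 2,000 M input tokens and 500 M output tokens in one month
    usage = usage_cost_eur(ESSENTIAL, 2_000_000_000, 500_000_000)
    invoice = estimated_invoice_eur(ESSENTIAL, 2_000_000_000, 500_000_000)
    print(f"metered usage: {usage:.2f} EUR, estimated invoice: {invoice:.2f} EUR")
```

Under these assumptions, a month with 2,000 M input and 500 M output tokens meters at 725 €, which is below the Essential commitment, so the estimate stays at 1,000 €.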

Our Models

Standard models are capable and built for speed: reliable across most use cases, without cutting corners. Premium models go further: complex reasoning, long contexts, and tasks where output quality is the priority.

Model Host	Cloud Provider	Model Tier	Creator	Model Name	Input Token Price (per M tokens)	Output Token Price (per M tokens)	Server Location	RPM Limit ^* (requests/min)	TPM ^* (tokens/min)	Context Window (tokens)	Links

* EU/EEA includes countries with an adequacy decision by the EU Commission.

Looking for more than the shared service?

The plans above are the right starting point for most production workloads. Dedicated LLM Serving is the right choice when one or more of the following apply:

You serve many users with strict latency SLAs. Even the Agentic plan’s high RPM and TPM ceilings are best-effort on shared infrastructure: there is no contractual guarantee on time-to-first-token or end-to-end latency when other tenants drive up the platform load.
Your workload is bursty, with short windows of very high demand that would otherwise hit the shared service’s rate-limit ceilings.
You need to host a private, fine-tuned, or custom model that isn’t in the shared catalog.
Your compliance posture requires single-tenant infrastructure.
You want a predictable cost structure — a fixed fee for the reserved hardware and the managed inference service, instead of per-token billing.
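
To judge whether a bursty workload still fits a shared plan, one option is to compare observed per-minute request counts against the published RPM ceilings (300, 600, and 1,000). The sketch below is a minimal, hypothetical helper for that check; `PLAN_RPM_CEILINGS`, `smallest_plan_for_peak`, and `minutes_over_ceiling` are illustrative names, and the traffic numbers are made up. Because the ceilings are best-effort, fitting under one does not guarantee that throughput.

```python
from typing import Optional, Sequence

# Published RPM ceilings of the shared plans, ordered from smallest to largest.
PLAN_RPM_CEILINGS = [("Essential", 300), ("Professional", 600), ("Agentic", 1_000)]


def smallest_plan_for_peak(requests_per_minute: Sequence[int]) -> Optional[str]:
    """Return the smallest shared plan whose RPM ceiling covers the peak minute.

    Returns None when the peak exceeds every listed ceiling, which is a hint
    to look at a custom (Enterprise) or Dedicated setup instead.
    """
    peak = max(requests_per_minute, default=0)
    for name, ceiling in PLAN_RPM_CEILINGS:
        if peak <= ceiling:
            return name
    return None


def minutes_over_ceiling(requests_per_minute: Sequence[int], ceiling: int) -> list:
    """Indices of the minutes in which the request count exceeds the ceiling."""
    return [i for i, count in enumerate(requests_per_minute) if count > ceiling]


if __name__ == "__main__":
    observed = [120, 280, 950, 200]  # hypothetical requests per minute
    print(smallest_plan_for_peak(observed))     # plan covering the 950-request minute
    print(minutes_over_ceiling(observed, 300))  # minutes that exceed the Essential ceiling
```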

What Dedicated LLM Serving gives you

No RPM or TPM ceilings. Throughput is limited only by the GPU hardware you reserve, not by per-minute quotas.
Deterministic performance. No contention from other tenants. The same prompt produces the same latency profile run after run.
Negotiable performance SLAs. Unlike the shared service — whose SLA covers API availability only — Dedicated LLM Serving contracts can include SLAs on time-to-first-token, end-to-end latency, sustained throughput, 24/7 support, and stricter availability targets.
Any compatible model. Any model in the shared T-Cloud catalog can be deployed on a dedicated instance, along with your own private or fine-tuned variants.

Dedicated LLM Serving: reserve GPU hardware exclusively for your contract. Full details, advantages, and how to order.

Compare Shared vs Dedicated LLM Serving: a side-by-side comparison of both service models, covering pricing model, SLAs, performance, and when to pick which.

Dedicated LLM Serving requires a tailored offer and is not available via the T-Cloud Marketplace. Contact the AIFS team at ai@t-systems.com for a quote based on your workload profile and SLA requirements.

Service note on the published RPM and TPM values

^* The Requests-per-Minute (RPM) and Tokens-per-Minute (TPM) figures shown in the tables above are best-effort, best-case ceilings for the Shared LLM Serving Service. They denote the maximum traffic permitted under the contract — not a guaranteed level of throughput. The following conditions apply:

These figures are not part of the Service Level Agreement (SLA). The SLA covers API availability only; it does not extend to throughput, end-to-end latency, or time-to-first-token.
Realised throughput, end-to-end latency, and time-to-first-token vary with platform load. Open-source models hosted on T-Cloud may operate below the published ceilings during periods of peak concurrent demand.
Customers requiring contractual guarantees on throughput, end-to-end latency, or time-to-first-token should consider Dedicated LLM Serving, where such commitments can be negotiated as performance SLAs on reserved hardware.

For workload-specific commitments, contact the AIFS team at ai@t-systems.com.
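
Because the RPM figure is a ceiling on permitted traffic, a client can avoid hitting it by throttling itself. The following is a minimal sketch of a client-side sliding-window limiter, assuming the ceiling is counted over a rolling 60-second window; how the service actually measures the window is not stated here, and `SlidingWindowLimiter` is a hypothetical helper, not part of any T-Cloud SDK. It covers RPM only; a TPM budget would need a similar counter over token counts.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any rolling `window_seconds` window."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock   # injectable so the limiter can be tested without sleeping
        self._sent = deque()  # timestamps of requests inside the current window

    def _evict(self, now):
        # Drop timestamps that have fallen out of the rolling window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()

    def try_acquire(self):
        """Record a request and return True if it fits under the ceiling."""
        now = self._clock()
        self._evict(now)
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False

    def seconds_until_free(self):
        """Seconds to wait until the next request would be allowed (0.0 if free now)."""
        now = self._clock()
        self._evict(now)
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=300)  # Essential plan ceiling
    if limiter.try_acquire():
        print("send the request")
    else:
        time.sleep(limiter.seconds_until_free())
```

Staying under the ceiling on the client side only prevents self-inflicted rejections; realised throughput on the shared service still varies with platform load.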