Plans & Pricing
LLM Serving is offered in two service models. The Essential / Professional / Agentic plans below are the Shared LLM Serving Service — multi-tenant, pay-as-you-go, self-service via the T-Cloud Marketplace and the API Key self-serve portal, with best-effort RPM and TPM ceilings per model. For workloads that need contractual performance guarantees, private models, or single-tenant infrastructure, see Dedicated LLM Serving.
Our Rate Plans
Choose the plan that matches your workload. A higher monthly minimum commitment unlocks higher rate limits and more powerful models. Need a custom setup or on-premises hosting? Contact us!
| Rate Plan | Minimum Monthly Commitment (€ / month) | Input / Output Token Price (€ / M tokens) | RPM (requests / minute) * | Model Host | Model Tier |
|---|---|---|---|---|---|
| Essential | 1,000 € | 0.20 / 0.65 † | 300 * | T-Cloud: hosted on Telekom's sovereign infrastructure. External: hosted by third-party providers like Microsoft Azure or Google Cloud Platform. | Standard: capable, fast, and reliable models for most tasks. |
| Professional | 3,000 € | 0.20 / 0.65 † | 600 * | T-Cloud: hosted on Telekom's sovereign infrastructure. External: hosted by third-party providers like Microsoft Azure or Google Cloud Platform. | Standard: capable, fast, and reliable models for most tasks. Premium: powerful, high-capability models for complex and quality-critical work. |
| Agentic | 5,000 € | 0.20 / 0.65 † | 1,000 * | T-Cloud: hosted on Telekom's sovereign infrastructure. External: hosted by third-party providers like Microsoft Azure or Google Cloud Platform. | Standard: capable, fast, and reliable models for most tasks. Premium: powerful, high-capability models for complex and quality-critical work. |
| Enterprise | Custom | 0.20 / 0.65 † | Custom * | T-Cloud: hosted on Telekom's sovereign infrastructure. External: hosted by third-party providers like Microsoft Azure or Google Cloud Platform. | Standard: capable, fast, and reliable models for most tasks. Premium: powerful, high-capability models for complex and quality-critical work. |
Learn more about rate limits →
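To get a rough feel for how the token prices and the minimum commitment interact, the arithmetic can be sketched as below. This is an illustration only, not the billing implementation. It assumes that the minimum monthly commitment acts as a floor on the monthly bill (the page does not spell out the exact billing rule), and it ignores any conditions attached to the † marker on the token price. The function names are hypothetical.

```python
# Figures copied from the rate plan table (EUR).
INPUT_EUR_PER_M_TOKENS = 0.20
OUTPUT_EUR_PER_M_TOKENS = 0.65
MIN_COMMITMENT_EUR = {
    "Essential": 1000.0,
    "Professional": 3000.0,
    "Agentic": 5000.0,
}


def usage_cost_eur(input_tokens: int, output_tokens: int) -> float:
    """Per-token usage cost in EUR at the listed prices."""
    input_cost = (input_tokens / 1_000_000) * INPUT_EUR_PER_M_TOKENS
    output_cost = (output_tokens / 1_000_000) * OUTPUT_EUR_PER_M_TOKENS
    return input_cost + output_cost


def estimated_monthly_bill_eur(plan: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly bill for a plan.

    ASSUMPTION: the minimum commitment is a floor, so the bill is the
    larger of the commitment and the metered usage. Check your contract.
    """
    return max(MIN_COMMITMENT_EUR[plan], usage_cost_eur(input_tokens, output_tokens))


if __name__ == "__main__":
    # Example: 1 billion input tokens and 200 million output tokens in a month.
    usage = usage_cost_eur(1_000_000_000, 200_000_000)
    bill = estimated_monthly_bill_eur("Essential", 1_000_000_000, 200_000_000)
    print(f"usage: {usage:.2f} EUR, estimated bill: {bill:.2f} EUR")
```

Under the floor assumption, a workload whose metered usage stays below the commitment is billed the commitment, so the break-even volume is a useful input when choosing a plan.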
Our Models
Standard models are capable and built for speed, reliable across most use cases without cutting corners. Premium models go further: complex reasoning, long contexts, and tasks where output quality is the priority.
Looking for more than the shared service?
The plans above are the right starting point for most production workloads. Dedicated LLM Serving is the right choice when one or more of the following apply:
- You serve many users with strict latency SLAs. Even the Agentic plan’s high RPM/TPM ceilings are best-effort on shared infrastructure: there is no contractual guarantee on time-to-first-token or end-to-end latency when other tenants drive the platform load.
- Your workload is bursty, with short windows of very high demand that would otherwise hit the shared service’s rate-limit ceilings.
- You need to host a private, fine-tuned, or custom model that isn’t in the shared catalog.
- Your compliance posture requires single-tenant infrastructure.
- You want a predictable cost structure — a fixed fee for the reserved hardware and the managed inference service, instead of per-token billing.
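One way to judge whether a bursty workload fits a shared plan is to measure its peak request rate from existing traffic logs and compare it with the published RPM ceilings. The sketch below does that for a list of request timestamps in seconds. It assumes a 60-second sliding window; how the service itself counts requests per minute is not stated here, so treat the result as an estimate. The helper names are hypothetical.

```python
from collections import deque

# Published RPM ceilings from the rate plan table (best-effort values).
PLAN_RPM = {"Essential": 300, "Professional": 600, "Agentic": 1000}


def peak_requests_per_minute(timestamps):
    """Return the largest number of requests seen in any 60-second sliding window."""
    window = deque()
    peak = 0
    for t in sorted(timestamps):
        window.append(t)
        # Drop requests that are 60 seconds or more older than the current one.
        while t - window[0] >= 60.0:
            window.popleft()
        peak = max(peak, len(window))
    return peak


def plans_that_fit(timestamps):
    """List the shared plans whose RPM ceiling covers the observed peak."""
    peak = peak_requests_per_minute(timestamps)
    return [plan for plan, rpm in PLAN_RPM.items() if peak <= rpm]


if __name__ == "__main__":
    # Example: a burst of 10 requests per second lasting 50 seconds.
    burst = [i * 0.1 for i in range(500)]
    print(peak_requests_per_minute(burst), plans_that_fit(burst))
```

If no shared plan covers the observed peak, or the peak sits close to a ceiling that is only best-effort, that is a signal to look at Dedicated LLM Serving.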
What Dedicated LLM Serving gives you
- No RPM or TPM ceilings. Throughput is limited only by the GPU hardware you reserve, not by per-minute quotas.
- Deterministic performance. No contention from other tenants. The same prompt produces the same latency profile run after run.
- Negotiable performance SLAs. Unlike the shared service — whose SLA covers API availability only — Dedicated LLM Serving contracts can include SLAs on time-to-first-token, end-to-end latency, sustained throughput, 24/7 support, and stricter availability targets.
- Any compatible model. Any model available in the shared T-Cloud catalog can be deployed on a dedicated instance, plus your own private or fine-tuned variants.
Dedicated LLM Serving requires a tailored offer and is not available via the T-Cloud Marketplace. Contact the AIFS team at ai@t-systems.com for a quote based on your workload profile and SLA requirements.
Service note on the published RPM and TPM values
* The Requests-per-Minute (RPM) and Tokens-per-Minute (TPM) figures shown in the tables above are best-effort, best-case ceilings for the Shared LLM Serving Service. They denote the maximum traffic permitted under the contract — not a guaranteed level of throughput. The following conditions apply:
- These figures are not part of the Service Level Agreement (SLA). The SLA covers API availability only; it does not extend to throughput, end-to-end latency, or time-to-first-token.
- Realised throughput, end-to-end latency, and time-to-first-token vary with platform load. Open-source models hosted on T-Cloud may operate below the published ceilings during periods of peak concurrent demand.
- Customers requiring contractual guarantees on throughput, end-to-end latency, or time-to-first-token should consider Dedicated LLM Serving, where such commitments can be negotiated as performance SLAs on reserved hardware.
For workload-specific commitments, contact the AIFS team at ai@t-systems.com.
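Because the RPM figure is a ceiling rather than a guarantee, a client may want to pace its own requests so that it stays under the ceiling of its plan. The following is a minimal client-side sketch of a sliding-window throttle. It is not part of any T-Cloud SDK, the class name is hypothetical, and it assumes the ceiling is counted over a rolling 60-second window, which this page does not confirm.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_requests` per `window_s` seconds."""

    def __init__(self, max_requests, window_s=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_s = window_s
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._sent = deque()   # timestamps of requests inside the current window

    def acquire(self):
        """Block until one more request fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget requests that have left the window.
            while self._sent and now - self._sent[0] >= self.window_s:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Wait until the oldest recorded request leaves the window.
            self._sleep(self.window_s - (now - self._sent[0]))


if __name__ == "__main__":
    # Example: pace calls to the Essential plan ceiling of 300 RPM.
    limiter = SlidingWindowLimiter(max_requests=300)
    for _ in range(5):
        limiter.acquire()
        # send_request() would go here (hypothetical placeholder).
    print("5 requests paced")
```

Staying under the ceiling on the client side does not change the best-effort nature of the shared service: realised throughput can still drop under platform load, so callers should also handle slow or rejected requests.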