Plans & Pricing

LLM Serving is offered in two service models. The Essential / Professional / Agentic plans below are the Shared LLM Serving Service — multi-tenant, pay-as-you-go, self-service via the T-Cloud Marketplace and the API Key self-serve portal, with best-effort RPM and TPM ceilings per model. For workloads that need contractual performance guarantees, private models, or single-tenant infrastructure, see Dedicated LLM Serving.

Our Rate Plans

Choose the plan that matches your workload. A higher monthly minimum commitment unlocks higher rate limits and more powerful models. Need a custom setup or on-premises hosting? Contact us!

| Rate Plan | Minimum Monthly Commitment (€ / month) | Input / Output Token Price (€ / M Tokens) | RPM * (Requests/Minute) | Model Host | Model Tier |
|---|---|---|---|---|---|
| Essential | 1,000 € | 0.20 / 0.65 † | 300 * | T-Cloud, External | Standard |
| Professional | 3,000 € | 0.20 / 0.65 † | 600 * | T-Cloud, External | Standard, Premium |
| Agentic | 5,000 € | 0.20 / 0.65 † | 1,000 * | T-Cloud, External | Standard, Premium |
| Enterprise | Custom | 0.20 / 0.65 † | Custom * | T-Cloud, External | Standard, Premium |

Model Host: T-Cloud models are hosted on Telekom's sovereign infrastructure. External models are hosted by third-party providers like Microsoft Azure or Google Cloud Platform.
Model Tier: Standard models are capable, fast, and reliable for most tasks. Premium models are powerful, high-capability models for complex and quality-critical work.
† Example pricing for T-Cloud-hosted Large Language Models
Learn more about rate limits →
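
As a rough illustration of how the commitment and the token prices interact, the sketch below estimates a monthly bill from token volumes. The prices default to the example T-Cloud figures above (0.20 / 0.65 € per M tokens). The function name `estimate_monthly_cost` is hypothetical, and the billing rule used here (the minimum commitment acts as a floor on the usage charge) is an assumption that this page does not state; check your contract for the actual rule.

```python
def estimate_monthly_cost(input_tokens, output_tokens,
                          input_price_per_m=0.20, output_price_per_m=0.65,
                          min_commitment=1000.0):
    """Return (usage_cost, billed_amount) in EUR for one month.

    ASSUMPTION: the billed amount is the larger of the usage cost and the
    plan's minimum monthly commitment. This rule is illustrative only.
    """
    usage_cost = (input_tokens / 1_000_000 * input_price_per_m
                  + output_tokens / 1_000_000 * output_price_per_m)
    billed_amount = max(usage_cost, min_commitment)
    return usage_cost, billed_amount


# Example: 2,000 M input tokens and 500 M output tokens on the Essential plan.
usage, billed = estimate_monthly_cost(2_000_000_000, 500_000_000)
print(f"usage: {usage:.2f} EUR, billed: {billed:.2f} EUR")
```

Under these assumptions, usage below the commitment is billed at the commitment, so the effective price per token falls as volume approaches the plan's minimum.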

Our Models

Standard models are capable and built for speed: reliable across most use cases, without cutting corners. Premium models go further, handling complex reasoning, long contexts, and tasks where output quality is the priority.

[Model catalog table not captured. Columns: Model Host, Cloud Provider, Model Tier, Creator, Model Name, Input Token Price (per M Tokens), Output Token Price (per M Tokens), Server Location, RPM Limit * (Requests/Min), TPM * (Tokens/Min), Context Window (Tokens), Links.]
* EU/EEA includes countries with an adequacy decision by the EU Commission.

The plans above are the right starting point for most production workloads. Dedicated LLM Serving is the right choice when one or more of the following apply:

  • You serve many users with strict latency SLAs. Even the Agentic plan’s high RPM/TPM are best-effort on shared infrastructure — there is no contractual guarantee on time-to-first-token or end-to-end latency when other tenants drive the platform load.
  • Your workload is bursty, with short windows of very high demand that would otherwise hit the shared service’s rate-limit ceilings.
  • You need to host a private, fine-tuned, or custom model that isn’t in the shared catalog.
  • Your compliance posture requires single-tenant infrastructure.
  • You want a predictable cost structure — a fixed fee for the reserved hardware and the managed inference service, instead of per-token billing.

In return, Dedicated LLM Serving offers:

  • No RPM or TPM ceilings. Throughput is limited only by the GPU hardware you reserve, not by per-minute quotas.
  • Deterministic performance. No contention from other tenants. The same prompt produces the same latency profile run after run.
  • Negotiable performance SLAs. Unlike the shared service — whose SLA covers API availability only — Dedicated LLM Serving contracts can include SLAs on time-to-first-token, end-to-end latency, sustained throughput, 24/7 support, and stricter availability targets.
  • Any compatible model. Any model in the shared T-Cloud catalog can be deployed on a dedicated instance, plus your own private or fine-tuned variants.

Dedicated LLM Serving requires a tailored offer and is not available via the T-Cloud Marketplace. Contact the AIFS team at ai@t-systems.com for a quote based on your workload profile and SLA requirements.
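
The criteria above can be sketched as a small decision helper. This is only an illustration: the function `suggest_service` and its flags are hypothetical, the RPM ceilings are the plan-level values from the rate plan table, and the per-model RPM and TPM limits (which may be lower) are not modelled.

```python
# Plan-level RPM ceilings from the rate plan table, in ascending order of
# commitment. Enterprise is "Custom" and therefore not listed.
SHARED_PLAN_RPM = {"Essential": 300, "Professional": 600, "Agentic": 1000}


def suggest_service(peak_rpm, needs_premium=False, needs_latency_sla=False,
                    needs_private_model=False, needs_single_tenant=False):
    """Suggest a service model for a workload (illustrative heuristic only)."""
    # Any of these requirements is listed above as a reason to go dedicated.
    if needs_latency_sla or needs_private_model or needs_single_tenant:
        return "Dedicated LLM Serving"
    for plan, ceiling in SHARED_PLAN_RPM.items():
        # Premium models are not included in the Essential plan.
        if needs_premium and plan == "Essential":
            continue
        if peak_rpm <= ceiling:
            return plan
    # Peak demand exceeds every published shared ceiling.
    return "Enterprise or Dedicated LLM Serving"


print(suggest_service(peak_rpm=450))
print(suggest_service(peak_rpm=200, needs_premium=True))
print(suggest_service(peak_rpm=5000))
```

A plan returned by this helper only means that peak demand fits under the published ceiling; because the ceilings are best-effort, it does not imply that the throughput will actually be achieved.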

Service note on the published RPM and TPM values

* The Requests-per-Minute (RPM) and Tokens-per-Minute (TPM) figures shown in the tables above are best-effort, best-case ceilings for the Shared LLM Serving Service. They denote the maximum traffic permitted under the contract — not a guaranteed level of throughput. The following conditions apply:

  • These figures are not part of the Service Level Agreement (SLA). The SLA covers API availability only; it does not extend to throughput, end-to-end latency, or time-to-first-token.
  • Realised throughput, end-to-end latency, and time-to-first-token vary with platform load. Open-source models hosted on T-Cloud may operate below the published ceilings during periods of peak concurrent demand.
  • Customers requiring contractual guarantees on throughput, end-to-end latency, or time-to-first-token should consider Dedicated LLM Serving, where such commitments can be negotiated as performance SLAs on reserved hardware.
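
Since the RPM and TPM figures are ceilings, a client may want to pace its own traffic so that it stays below them. The following is a minimal sketch of a client-side sliding-window limiter that tracks both requests and tokens over the last 60 seconds. The class `MinuteBudget` is hypothetical and not part of any T-Cloud SDK, the TPM value in the example is a placeholder, and how the service itself counts or enforces its limits is not described on this page.

```python
import time
from collections import deque


class MinuteBudget:
    """Client-side sliding-window limiter for requests and tokens per minute."""

    def __init__(self, rpm, tpm, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock            # injectable for testing
        self.events = deque()         # (timestamp, tokens) within the last 60 s
        self.tokens_in_window = 0

    def _expire(self, now):
        # Drop requests that are 60 s old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Record a request of `tokens` tokens if it fits both budgets."""
        now = self.clock()
        self._expire(now)
        if len(self.events) >= self.rpm:
            return False
        if self.tokens_in_window + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True


# Example: Essential plan RPM ceiling with a placeholder TPM value.
budget = MinuteBudget(rpm=300, tpm=100_000)
if budget.try_acquire(tokens=1_200):
    pass  # send the request here
else:
    pass  # wait briefly and retry, or queue the request
```

Staying under the ceilings on the client side avoids being throttled by the quota itself, but it does not change the best-effort nature of the shared service: realised throughput can still fall below the ceilings under platform load.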

For workload-specific commitments, contact the AIFS team at ai@t-systems.com.