
Audio (Speech-to-Text)

Whisper-family models for transcription and translation. Available via the POST /audio/transcriptions and POST /audio/translations endpoints.
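A request to either endpoint might be assembled as in the sketch below. It assumes the endpoints accept an OpenAI-style multipart upload with `model` and `file` fields and a bearer token. The base URL, the model ID, and the helper names `build_audio_request` and `send` are hypothetical placeholders, not values taken from this page.

```python
import json
import urllib.request
import uuid

# Hypothetical placeholders: substitute the base URL and model ID from your account.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "whisper-large-v3"


def build_audio_request(endpoint, api_key, filename, audio_bytes,
                        model=MODEL_ID, base_url=BASE_URL):
    """Build a multipart POST for /audio/transcriptions or /audio/translations."""
    if endpoint not in ("transcriptions", "translations"):
        raise ValueError("endpoint must be 'transcriptions' or 'translations'")
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field carrying the model ID (field name assumed OpenAI-compatible).
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n'
        f'{model}\r\n'.encode(),
        # Form field carrying the raw audio file.
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url}/audio/{endpoint}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it makes the URL, headers, and body easy to inspect before any audio leaves the machine. Passing `"translations"` instead of `"transcriptions"` targets the second endpoint with the same payload shape.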

Streaming output: token-by-token output via Server-Sent Events, suitable for low-latency, real-time UIs.
Tool calling: OpenAI-compatible function/tool calling; the model can return structured tool invocations.
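Consuming a Server-Sent Events stream could look roughly like the sketch below. It assumes each event is a single `data:` line holding a JSON object and that the stream ends with an OpenAI-style `data: [DONE]` sentinel. The function name `iter_sse_data` and the `text` field in the sample are hypothetical, and multi-line `data:` events are not handled.

```python
import json


def iter_sse_data(lines):
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(payload)


# Hypothetical sample stream; a real response defines its own payload shape.
sample = [
    'data: {"text": "Hel"}',
    "",
    ": keep-alive",
    'data: {"text": "lo"}',
    "data: [DONE]",
]
transcript = "".join(event["text"] for event in iter_sse_data(sample))
```

Because the function is a generator, a UI can render each fragment as soon as it arrives rather than waiting for the full response.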
Model | Provider | Cloud | Input | Output | Context | Input price | Output price | Plans
Whisper Large v3 | OpenAI | T-Cloud Public (Telekom's sovereign infrastructure in Germany) | Audio | Text | 448 | €14.61 | €14.61 | Essential, Professional, Agentic
Whisper Large v3 Turbo | OpenAI | T-Cloud Public (Telekom's sovereign infrastructure in Germany) | Audio | Text | 448 | €9.21 | €9.21 | Essential, Professional, Agentic
Gemini 2.5 Flash | Google | Google Cloud Platform (EU regions) | Text, Image, Audio | Text | 1M | €0.27 | €2.25 | Essential, Professional, Agentic
Gemini 2.5 Pro (>200k) | Google | Google Cloud Platform (EU regions) | Text, Image, Audio | Text | 1M | €2.25 | €13.50 | Essential, Professional, Agentic
Gemini 2.5 Pro (≤200k) | Google | Google Cloud Platform (EU regions) | Text, Image, Audio | Text | 200K | €1.13 | €9.00 | Essential, Professional, Agentic
Gemini 3 Flash | Google | Google Cloud Platform (EU regions) | Text, Image, Audio | Text | 1M | €0.45 | €2.70 | Essential, Professional, Agentic
Gemini 3 Pro (>200k) | Google | Google Cloud Platform (EU regions) | Text, Image, Audio | Text | 1M | €3.60 | €16.20 | Essential, Professional, Agentic
Gemini 3 Pro (≤200k) | Google | Google Cloud Platform (EU regions) | Text, Image, Audio | Text | 200K | €1.80 | €10.80 | Essential, Professional, Agentic