Audio (Speech-to-Text)
AI Foundation Services provides Whisper-based audio models for transcription and translation, compatible with the OpenAI Audio API.
Prerequisites
- An API key (get one here)
- OpenAI SDK or HTTP client installed (Quickstart)
- An audio file (MP3, WAV, M4A, etc.)
What you'll learn:
- How to transcribe audio to text in the original language
- How to translate audio from any language to English
- Available audio models and parameters
List Audio Models
Audio models have model_type: "STT" in their metadata. Use the models endpoint and filter:
- Python
- curl
from openai import OpenAI

client = OpenAI()

models = client.models.list()
for model in models.data:
    if model.meta_data.get("model_type") == "STT":
        print(model.id)
curl "$OPENAI_BASE_URL/models" \
-H "Authorization: Bearer $OPENAI_API_KEY"
# Filter results for models with model_type "STT"
# Available audio models: whisper-large-v3, whisper-large-v3-turbo
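The STT filter above can also be factored into a small standalone helper. This is a sketch against plain dicts shaped like a /models listing (the stub entries below are illustrative; the real SDK returns model objects with a `meta_data` attribute, as in the example above):

```python
def stt_model_ids(models):
    """Return the IDs of models whose metadata marks them as speech-to-text."""
    return [
        m["id"]
        for m in models
        if m.get("meta_data", {}).get("model_type") == "STT"
    ]

# Stub data shaped like the listing response, for illustration only:
models = [
    {"id": "whisper-large-v3", "meta_data": {"model_type": "STT"}},
    {"id": "some-chat-model", "meta_data": {"model_type": "LLM"}},
]
print(stt_model_ids(models))  # ['whisper-large-v3']
```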
Audio Transcription
The transcription API converts audio into text in the same language as the input. It auto-detects the language from the first 30 seconds if language is not specified.
- Python
- curl
from openai import OpenAI

client = OpenAI()

with open("/path/to/audio_file.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        # language="en",  # Optional: specify language
    )

print(f"Transcription: {transcription.text}")
curl -X POST "$OPENAI_BASE_URL/audio/transcriptions" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-F "model=whisper-large-v3" \
-F "language=en" \
-F "temperature=0.0" \
-F "file=@/path/to/audio_file.mp3"
Example output:
Transcription: The stale smell of old beer lingers. It takes heat to bring out the odor.
A cold dip restores health and zest. A salt pickle tastes fine with ham.
Audio Translation
The translation API translates audio from any language into English.
- Python
- curl
from openai import OpenAI

client = OpenAI()

with open("/path/to/audio_file.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=audio_file,
        temperature=1.0,
    )

print(f"Translation: {translation.text}")
curl -X POST "$OPENAI_BASE_URL/audio/translations" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-F "model=whisper-large-v3" \
-F "temperature=1.0" \
-F "file=@/path/to/audio_file.mp3"
Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Audio model ID (e.g., whisper-large-v3) |
| file | file | The audio file to process |
| language | string | Optional. ISO language code (e.g., en). Auto-detected if omitted. |
| temperature | float | 0.0 for deterministic output, higher values for more varied output |
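Because auto-detection is triggered by leaving language out entirely (rather than passing an empty value), it can help to assemble the keyword arguments conditionally. A minimal sketch, using the parameter names from the table (the helper function itself is ours, not part of the SDK):

```python
def transcription_params(model, language=None, temperature=0.0):
    """Build keyword arguments for an audio transcription request.

    language is omitted from the result when None, so the service
    auto-detects the spoken language from the audio.
    """
    params = {"model": model, "temperature": temperature}
    if language is not None:
        params["language"] = language
    return params

print(transcription_params("whisper-large-v3"))
# {'model': 'whisper-large-v3', 'temperature': 0.0}
print(transcription_params("whisper-large-v3", language="en"))
# {'model': 'whisper-large-v3', 'temperature': 0.0, 'language': 'en'}
```

The resulting dict can be splatted into the call from the transcription example: `client.audio.transcriptions.create(file=audio_file, **params)`.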
Key Features
- Auto Language Detection — Identifies input language from the first 30 seconds
- Customizable Output — Adjust behavior with the language and temperature parameters
- Efficient Processing — Low latency for both transcription and translation
Next Steps
- Chat Completions — Process transcribed text with LLMs
- Asynchronous Requests — Queue long audio files for async processing
- API Endpoints — Full endpoint reference
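The "Chat Completions" next step can be sketched as a short pipeline: take the text returned by the transcription call and pass it to a chat model. This is a sketch under assumptions: `client` is an OpenAI-compatible client created with `OpenAI()` as in the examples above, and `chat_model` is a placeholder for whatever chat model ID your deployment exposes.

```python
def summarize_transcript(client, transcript, chat_model):
    """Post-process transcribed text with an LLM via the chat API.

    client: an OpenAI-compatible client, e.g., OpenAI() as above.
    transcript: the .text field from a transcription or translation response.
    chat_model: a chat model ID from your deployment (placeholder here).
    """
    response = client.chat.completions.create(
        model=chat_model,
        messages=[
            {"role": "system", "content": "Summarize the transcript in one sentence."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```

Usage, continuing the transcription example: `summarize_transcript(client, transcription.text, "your-chat-model")`.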