Asynchronous Requests (Queue API)
Submit long-running API requests to a queue for asynchronous processing. The Queue API mirrors the standard endpoints but processes requests in the background, making it ideal for batch workloads and large inference jobs.
Audio (Speech-to-Text)
AI Foundation Services provides Whisper-based audio models for transcription and translation, compatible with the OpenAI Audio API.
Chat Completions
The Chat Completions API is the primary way to interact with LLMs on AI Foundation Services. It's fully compatible with the OpenAI Chat API.
Embeddings
The Embedding API converts text into vector representations that capture semantic meaning. Use embeddings for semantic search, clustering, and RAG applications.
Fine-Tuning
Fine-tune models with your own data using techniques like LoRA and DPO with RLHF. The Fine-Tuning API is compatible with the OpenAI fine-tuning interface.
Function Calling
Function calling lets models generate structured JSON arguments for functions you define, enabling integration with external tools and APIs.
Image Generation
Generate images from text prompts using the OpenAI-compatible Images API.
Multimodal (Vision)
AI Foundation Services provides vision models that can analyze images alongside text. Use the same Chat Completions API with image content.
Reasoning Control
For models that support reasoning (chain-of-thought), you can control the depth of reasoning with the reasoning_effort parameter.
Streaming
Stream responses token-by-token using Server-Sent Events (SSE) for faster time-to-first-token and a better user experience in interactive applications.
Visual RAG
Visual RAG indexes your files in both text and image, combining the stability of text indexing with the flexibility of visual indexing. It can retrieve information from graphs, charts, and complex table layouts that text-based RAG systems miss.