
Multimodal (Vision)

AI Foundation Services provides vision models that can analyze images alongside text. Use the same Chat Completions API as for text-only requests and include images as parts of the message content.

Prerequisites
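The examples below read your endpoint and API key from the standard OpenAI-compatible environment variables. A minimal setup sketch (the values shown are placeholders; substitute the endpoint and key for your own deployment):

```shell
# Placeholder values: substitute your actual AI Foundation Services endpoint and key.
export OPENAI_BASE_URL="https://your-endpoint.example.com/v1"
export OPENAI_API_KEY="your-api-key"
```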

What you'll learn:

  • How to analyze images from URLs
  • How to send local images via base64 encoding
  • Which models support vision capabilities

Analyze an Image from a URL

curl -X POST "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://images.unsplash.com/photo-1546069901-ba9599a7e63c?w=400"}}
        ]
      }
    ],
    "max_tokens": 1024
  }'
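The same request can be sent from Python with the `openai` SDK. The sketch below builds the identical message object; `build_vision_message` is our own helper name, not part of the SDK, and the live call is left commented out because it requires a reachable endpoint:

```python
def build_vision_message(text: str, image_url: str) -> dict:
    # Mirror the "user" message object from the curl request above.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

message = build_vision_message(
    "What is in this image?",
    "https://images.unsplash.com/photo-1546069901-ba9599a7e63c?w=400",
)

# Live call (requires OPENAI_API_KEY and OPENAI_BASE_URL to be set):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="gemini-2.5-flash", messages=[message], max_tokens=1024
# )
# print(response.choices[0].message.content)
```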

Analyze a Local Image (Base64)

You can also pass a local image as a base64-encoded string:

import base64
from openai import OpenAI

client = OpenAI()

def encode_image(image_path):
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

base64_image = encode_image("/path/to/your/image.jpg")

response = client.chat.completions.create(
    model="Qwen3-VL-30B-A3B-Instruct-FP8",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    max_tokens=1000,
)

print(response.choices[0].message.content)
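The hard-coded `data:image/jpeg` prefix above only suits JPEG files. A small variation (a sketch; `to_data_url` is our own helper name) guesses the MIME type from the file extension with the standard-library `mimetypes` module, so PNG or WebP inputs get the right prefix:

```python
import base64
import mimetypes

def to_data_url(image_path: str) -> str:
    # Guess the MIME type from the file extension; fall back to JPEG.
    mime, _ = mimetypes.guess_type(image_path)
    mime = mime or "image/jpeg"
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be dropped straight into the `image_url` field above, e.g. `{"url": to_data_url("photo.png")}`.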

Available Vision Models

Model                          | Provider          | Capabilities
Qwen3-VL-30B-A3B-Instruct-FP8  | T-Cloud (Germany) | Image understanding, OCR
gemini-2.5-flash               | Google Cloud      | Image + video understanding
gpt-4.1                        | Azure             | Image understanding

Check Available Models for the latest list.


Next Steps

  • Visual RAG — Index and retrieve from documents with text + image understanding
  • Function Calling — Connect models to external tools
  • Streaming — Stream responses for better UX
© Deutsche Telekom AG