Multimodal (Vision)
AI Foundation Services provides vision models that can analyze images alongside text. Use the same Chat Completions API with image content.
What you’ll learn:
- How to analyze images from URLs
- How to send local images via base64 encoding
- Which models support vision capabilities
Analyze an Image from URL
```bash
curl -X POST "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://images.unsplash.com/photo-1546069901-ba9599a7e63c?w=400"}}
        ]
      }
    ],
    "max_tokens": 1024
  }'
```

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://images.unsplash.com/photo-1546069901-ba9599a7e63c?w=400"
                    },
                },
            ],
        }
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://images.unsplash.com/photo-1546069901-ba9599a7e63c?w=400",
          },
        },
      ],
    },
  ],
  max_tokens: 1024,
});

console.log(response.choices[0].message.content);
```
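You can also send several images in one request, for example to compare two photos. A minimal Python sketch, assuming the model accepts multiple image parts per message (the example.com URLs are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder URLs for illustration only.
urls = [
    "https://example.com/photo-1.jpg",
    "https://example.com/photo-2.jpg",
]

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {
            "role": "user",
            # A single content list can mix one text part with several image parts.
            "content": [
                {"type": "text", "text": "What differs between these two images?"},
                *[{"type": "image_url", "image_url": {"url": u}} for u in urls],
            ],
        }
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
```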
Analyze a Local Image (Base64)
You can also pass a local image as a base64-encoded string:
```python
import base64

from openai import OpenAI

client = OpenAI()


def encode_image(image_path):
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


base64_image = encode_image("/path/to/your/image.jpg")

response = client.chat.completions.create(
    model="Qwen3-VL-30B-A3B-Instruct-FP8",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    max_tokens=1000,
)

print(response.choices[0].message.content)
```
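The data URL above hard-codes `image/jpeg`. If your local files may also be PNG or WebP, you can derive the MIME type from the file name instead. A small sketch using Python's standard-library `mimetypes` module (the helper name `build_data_url` is ours, not part of the API):

```python
import base64
import mimetypes


def build_data_url(image_path):
    # Guess the MIME type from the file extension (e.g. image/png, image/webp).
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Usage: pass the result as the image URL in the request above, e.g.
# {"type": "image_url", "image_url": {"url": build_data_url("/path/to/your/image.png")}}
```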
Available Vision Models

| Model | Provider | Capabilities |
|---|---|---|
| Qwen3-VL-30B-A3B-Instruct-FP8 | T-Cloud (Germany) | Image understanding, OCR |
| gemini-2.5-flash | Google Cloud | Image + video understanding |
| gpt-4.1 | Azure | Image understanding |
Check Available Models for the latest list.
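Since the service exposes an OpenAI-compatible API, you can also query the models endpoint from the SDK. A minimal sketch (whether vision capability is flagged in the listing depends on the deployment):

```python
from openai import OpenAI

client = OpenAI()

# Print the ID of every model the service currently exposes.
for model in client.models.list():
    print(model.id)
```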
Next Steps
- Visual RAG — Index and retrieve from documents with text + image understanding
- Function Calling — Connect models to external tools
- Streaming — Stream responses for better UX