# Multimodal (Vision)

AI Foundation Services provides vision models that can analyze images alongside text. Use the same Chat Completions API with image content.
## Prerequisites

- An API key (get one here)
- OpenAI SDK or HTTP client installed (Quickstart)
- A vision-capable model (see Available Models)
**What you'll learn:**

- How to analyze images from URLs
- How to send local images via base64 encoding
- Which models support vision capabilities
## Analyze an Image from URL

**curl**

```bash
curl -X POST "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://images.unsplash.com/photo-1546069901-ba9599a7e63c?w=400"}}
        ]
      }
    ],
    "max_tokens": 1024
  }'
```
**Python**

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://images.unsplash.com/photo-1546069901-ba9599a7e63c?w=400"
                    },
                },
            ],
        }
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
```
**Node.js**

```javascript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://images.unsplash.com/photo-1546069901-ba9599a7e63c?w=400",
          },
        },
      ],
    },
  ],
  max_tokens: 1024,
});

console.log(response.choices[0].message.content);
```
## Analyze a Local Image (Base64)
You can also pass a local image as a base64-encoded string:
```python
import base64

from openai import OpenAI

client = OpenAI()


def encode_image(image_path):
    """Read a local image file and return its base64-encoded contents."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


base64_image = encode_image("/path/to/your/image.jpg")

response = client.chat.completions.create(
    model="Qwen3-VL-30B-A3B-Instruct-FP8",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    max_tokens=1000,
)

print(response.choices[0].message.content)
```
## Available Vision Models

| Model | Provider | Capabilities |
|---|---|---|
| Qwen3-VL-30B-A3B-Instruct-FP8 | T-Cloud (Germany) | Image understanding, OCR |
| gemini-2.5-flash | Google Cloud | Image + video understanding |
| gpt-4.1 | Azure | Image understanding |
Check Available Models for the latest list.
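Because the service speaks the OpenAI API, you can also check at runtime which of the documented vision models the endpoint currently serves. A minimal sketch; the `VISION_MODELS` set is a snapshot of the table above and may lag behind the live service:

```python
# Vision-capable model IDs as documented on this page; treat this set
# as a snapshot, not a source of truth.
VISION_MODELS = {
    "Qwen3-VL-30B-A3B-Instruct-FP8",
    "gemini-2.5-flash",
    "gpt-4.1",
}


def vision_model_ids(model_ids):
    """Filter a list of model IDs down to those known to accept images."""
    return [m for m in model_ids if m in VISION_MODELS]
```

With the OpenAI SDK you could feed this `[m.id for m in client.models.list()]` to see which documented vision models your endpoint exposes.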
## Next Steps
- Visual RAG — Index and retrieve from documents with text + image understanding
- Function Calling — Connect models to external tools
- Streaming — Stream responses for better UX