
Chat Completions

The Chat Completions API is the primary way to interact with LLMs on AI Foundation Services. It's fully compatible with the OpenAI Chat API.


What you'll learn:

  • How to send chat completion requests with system and user messages
  • How to use streaming for real-time responses
  • How to use the Completion and Responses APIs
  • Key parameters for controlling output

Basic Usage

```bash
curl -X POST "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is T-Cloud?"}
    ],
    "temperature": 0.1,
    "max_tokens": 256
  }'
```

Streaming

Enable streaming to receive tokens as they're generated. Set stream: true in your request:

```python
from openai import OpenAI

# Reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment
client = OpenAI()

stream = client.chat.completions.create(
    model="Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
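If you need the complete response text in addition to the live output, a common pattern is to accumulate the deltas as they arrive. A minimal sketch; `collect_stream` is an illustrative helper, not part of the SDK:

```python
def collect_stream(stream):
    """Accumulate streamed delta content into the full response text.

    `stream` is the iterator returned by chat.completions.create(stream=True).
    Chunks may have empty choices or a None delta.content, so both are checked.
    """
    parts = []
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```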

For full streaming documentation including error handling, function calling with streams, and stream options, see the Streaming guide.


Completion API

The Completion API sends raw text directly to the LLM without the chat message format.

```python
from openai import OpenAI

client = OpenAI()

completion = client.completions.create(
    model="Llama-3.3-70B-Instruct",
    prompt="What is the Python programming language?",
    stream=False,
    temperature=0.2,
    max_tokens=128,
)

print(completion.choices[0].text)
```
Info: The Completion API is only available for open-source models. To get correct behavior, you must follow the model's specific prompt template. We recommend using the Chat API instead.
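To illustrate what following a prompt template means: Llama 3 models, for example, expect special tokens around each message in the raw prompt. A sketch of the structure the Chat API would otherwise build for you (the template below is shown for illustration; verify it against the model card before relying on it):

```python
# Illustrative: the special tokens Llama 3 expects around each message.
# With the Completion API you must construct this string yourself;
# the Chat API applies the template automatically.
system = "You are a helpful assistant."
user = "What is the Python programming language?"

prompt = (
    "<|begin_of_text|>"
    f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
    f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

print(prompt)
```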


Responses API

With an up-to-date version of the OpenAI Python package, you can also use the Responses API:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    input="Write a one-sentence bedtime story about a unicorn.",
)

print(response.output_text)
```

You can also combine the input and instructions fields:

```python
response = client.responses.create(
    model="gpt-4.1",
    instructions="You are a professional copywriter. Focus on benefits rather than features.",
    input="Create a product description for NoiseGuard Pro Headphones.",
    temperature=0.7,
    max_output_tokens=200,
)

print(response.output[0].content[0].text)
```
Info: The Responses API is currently fully supported only for OpenAI models (e.g., gpt-4.1, gpt-4o). T-Cloud-hosted open-source models have limited support.


Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | Model ID (e.g., `Llama-3.3-70B-Instruct`) |
| `messages` | array | List of message objects with `role` and `content` |
| `temperature` | float | Sampling temperature (0–2). Lower = more deterministic |
| `max_tokens` | integer | Maximum tokens to generate |
| `stream` | boolean | Enable streaming responses |
| `top_p` | float | Nucleus sampling parameter |
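A common rule of thumb is to tune either temperature or top_p, not both at once. A request body aimed at near-deterministic output might look like this (a sketch; the prompt and values are illustrative):

```python
import json

# Illustrative request body: temperature 0.0 for reproducible answers;
# top_p is left at its default rather than tuned alongside temperature.
payload = {
    "model": "Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Name three prime numbers."}],
    "temperature": 0.0,
    "max_tokens": 64,
}

print(json.dumps(payload, indent=2))
```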

For the full API specification, see the API Reference.


Next Steps

© Deutsche Telekom AG