How to Use OpenAI's gpt-oss: A Step-by-Step Guide to Open-Weight Language Models

Introduction

OpenAI's gpt-oss series introduces powerful, open-weight language models designed for advanced reasoning, agentic tasks, and versatile developer use cases. With a permissive Apache 2.0 license, gpt-oss-120b and gpt-oss-20b are ideal for experimentation, customization, and commercial deployment. In this tutorial, you'll learn how to set up, run, and leverage these models for your own projects.

Model Overview

  • gpt-oss-120b: For production, high-reasoning use cases. 117B parameters (5.1B active); fits on a single H100 GPU.
  • gpt-oss-20b: For lower-latency and local or specialized use cases. 21B parameters (3.6B active).
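
To put the "active" figures in perspective, the short sketch below computes what share of each model's parameters is active. It uses only the counts listed above and assumes "active" refers to the parameters used for each generated token; the dictionary and function names are illustrative, not part of the gpt-oss package.

```python
# Parameter counts in billions, copied from the model list above.
MODELS = {
    "gpt-oss-120b": {"total": 117.0, "active": 5.1},
    "gpt-oss-20b": {"total": 21.0, "active": 3.6},
}


def active_fraction(name: str) -> float:
    """Return active parameters as a fraction of total parameters."""
    counts = MODELS[name]
    return counts["active"] / counts["total"]


if __name__ == "__main__":
    for name in MODELS:
        print(f"{name}: {active_fraction(name):.1%} of parameters active")
```

On these numbers, the larger model activates a smaller share of its weights than the smaller one, which helps explain why it can still run on a single GPU.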

Both models use the harmony response format and support configurable reasoning effort, full chain-of-thought, fine-tuning, and agentic capabilities (function calling, web browsing, Python code execution, structured outputs).
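
As a rough illustration of configurable reasoning effort, the sketch below builds a chat message list whose system message requests an effort level. It assumes the level is set with a system prompt of the form "Reasoning: high" and that the valid levels are low, medium, and high; the helper name `build_messages` is hypothetical.

```python
# Assumed effort levels; adjust if your backend documents different ones.
VALID_EFFORTS = ("low", "medium", "high")


def build_messages(user_prompt: str, effort: str = "medium") -> list[dict]:
    """Build a chat message list that requests a given reasoning effort."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return [
        # Assumption: the reasoning level is read from the system prompt.
        {"role": "system", "content": f"Reasoning: {effort}"},
        {"role": "user", "content": user_prompt},
    ]


if __name__ == "__main__":
    for message in build_messages("Summarize the Apache 2.0 license.", "high"):
        print(message)
```

The resulting list can be passed wherever a `messages` list is accepted, such as the Transformers pipeline call shown later in this guide.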

Setup & Installation

Requirements:

  • Python 3.12
  • macOS: Xcode CLI tools (xcode-select --install)
  • Linux: CUDA for reference implementations
  • Windows: Not officially supported (try Ollama for local runs)
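
Before installing, it can help to check the local environment against the list above. The following is a minimal sketch, not an official check: it treats Python 3.12 as a minimum version (an assumption, since the requirement could also be read as exact) and the function name `check_environment` is made up for this example.

```python
import platform
import sys


def check_environment(version=None, system=None) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()  # "Linux", "Darwin", or "Windows"
    problems = []
    if version < (3, 12):
        found = ".".join(str(part) for part in version)
        problems.append(f"Python 3.12 is required, found {found}")
    if system == "Windows":
        problems.append("Windows is not officially supported; consider Ollama")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks fine" if not issues else "\n".join(issues))
```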

Install from PyPI:

# Tools only
pip install gpt-oss
# PyTorch implementation (the quotes stop shells such as zsh from expanding the brackets)
pip install "gpt-oss[torch]"
# Triton implementation
pip install "gpt-oss[triton]"

Install from source (for Metal or development):

git clone https://github.com/openai/gpt-oss.git
cd gpt-oss
pip install -e ".[metal]"

Downloading Model Weights

Download weights from the Hugging Face Hub:

# gpt-oss-120b
huggingface-cli download openai/gpt-oss-120b --include "original/*" --local-dir gpt-oss-120b/
# gpt-oss-20b
huggingface-cli download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
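
The same download can be scripted from Python. The sketch below mirrors the CLI commands above using `snapshot_download` from the `huggingface_hub` package (assumed to be installed); the helper names `download_kwargs` and `download` are illustrative.

```python
def download_kwargs(model: str, subfolder: str = "original") -> dict:
    """Build snapshot_download arguments mirroring the CLI commands above."""
    return {
        "repo_id": f"openai/{model}",
        "allow_patterns": [f"{subfolder}/*"],  # same filter as --include
        "local_dir": f"{model}/",              # same target as --local-dir
    }


def download(model: str) -> str:
    """Download the weights and return the local directory path."""
    # Imported lazily so the helper above works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(model))


if __name__ == "__main__":
    print(download_kwargs("gpt-oss-20b"))
    # download("gpt-oss-20b")  # uncomment to start the (large) download
```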

Usage Examples

Transformers

from transformers import pipeline

model_id = "openai/gpt-oss-120b"
# "auto" lets Transformers choose the dtype and place the weights
# across the available devices.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
# The last entry of "generated_text" is the assistant's reply message.
print(outputs[0]["generated_text"][-1])

More on using gpt-oss with Transformers
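
The pipeline call above prints the whole reply message. If only the text is needed, a small helper such as the hypothetical `last_reply_text` below can pull it out. This sketch assumes the chat-style output shape, where `generated_text` is a list of role/content dictionaries ending with the assistant's message.

```python
def last_reply_text(outputs: list[dict]) -> str:
    """Return the text of the final message in a chat-style pipeline result."""
    conversation = outputs[0]["generated_text"]
    reply = conversation[-1]  # the assistant message appended by the pipeline
    return reply["content"]


if __name__ == "__main__":
    # Stand-in data with the assumed output shape; no model is loaded here.
    fake_outputs = [
        {
            "generated_text": [
                {"role": "user", "content": "Say hello."},
                {"role": "assistant", "content": "Hello!"},
            ]
        }
    ]
    print(last_reply_text(fake_outputs))
```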

vLLM

uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match
vllm serve openai/gpt-oss-20b

More on using gpt-oss with vLLM
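
Once `vllm serve` is running, the model can be queried over HTTP. The sketch below uses only the standard library and assumes the server's defaults: an OpenAI-compatible endpoint at http://localhost:8000/v1 and a model name matching the one passed to `vllm serve`. The function names are illustrative.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    model: str = "openai/gpt-oss-20b",
    base_url: str = "http://localhost:8000/v1",  # assumed default address
) -> urllib.request.Request:
    """Build a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Explain open-weight models in one sentence."))
```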

PyTorch Reference Implementation

# Run from the root of the gpt-oss checkout
pip install -e ".[torch]"
# On 4xH100:
torchrun --nproc-per-node=4 -m gpt_oss.generate gpt-oss-120b/original/

Triton Reference Implementation (Single GPU)

# Install triton from source
git clone https://github.com/triton-lang/triton
cd triton/
pip install -r python/requirements.txt
pip install -e . --verbose --no-build-isolation
# Install gpt-oss triton implementation (run from the gpt-oss checkout, not from triton/)
pip install -e ".[triton]"
# On 1xH100
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
python -m gpt_oss.generate --backend triton gpt-oss-120b/original/

Metal (Apple Silicon)

pip install -e ".[metal]"
# Convert weights
python gpt_oss/metal/scripts/create-local-model.py -s <model_dir> -d <output_file>
# Or download pre-converted weights
huggingface-cli download openai/gpt-oss-120b --include "metal/*" --local-dir gpt-oss-120b/metal/
huggingface-cli download openai/gpt-oss-20b --include "metal/*" --local-dir gpt-oss-20b/metal/
# Test inference
python gpt_oss/metal/examples/generate.py gpt-oss-20b/metal/model.bin -p "why did the chicken cross the road?"

Ollama (Consumer Hardware)

# gpt-oss-20b
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
# gpt-oss-120b
ollama pull gpt-oss:120b
ollama run gpt-oss:120b
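
After a model has been pulled, Ollama can also be called from code through its local HTTP API. The following sketch assumes Ollama's default address (http://localhost:11434) and its `/api/chat` endpoint with streaming turned off; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default address


def build_ollama_payload(prompt: str, model: str = "gpt-oss:20b") -> dict:
    """Build a non-streaming chat payload for a locally pulled model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON response instead of a stream
    }


def ask_ollama(prompt: str, model: str = "gpt-oss:20b") -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(build_ollama_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["message"]["content"]


if __name__ == "__main__":
    print(ask_ollama("Why did the chicken cross the road?"))
```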

LM Studio

# gpt-oss-20b
lms get openai/gpt-oss-20b
# gpt-oss-120b
lms get openai/gpt-oss-120b

Client Integrations

  • Terminal Chat: Basic chat app using harmony format and various backends.
  • Responses API: Example server for Responses API compatibility.
  • Codex: Use Codex as a client for gpt-oss.

Built-in Tools

  • Browser Tool: Enables web browsing capabilities for the model. See harmony docs.
  • Python Tool: Allows the model to execute Python code in a stateless Docker container.
  • Apply Patch: Create, update, or delete files locally.
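
To illustrate what "stateless" means for the Python tool, the sketch below runs each snippet in a fresh interpreter process, so nothing defined in one call survives into the next. This is only an illustration of the idea: it is not the gpt-oss implementation, it uses a plain subprocess instead of a Docker container, and it provides no sandboxing. The name `run_stateless` is made up for this example.

```python
import subprocess
import sys


def run_stateless(code: str, timeout: float = 10.0) -> str:
    """Run code in a brand-new Python process and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the snippet fails
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The first call defines x; the second call cannot see it.
    print(run_stateless("x = 41\nprint(x + 1)"))
    print(run_stateless("print('x' in globals())"))
```

Because no state carries over, a model using a tool like this has to include every import and variable definition in each snippet it sends.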

Conclusion

OpenAI's gpt-oss models offer a flexible, open, and powerful foundation for building advanced AI applications. With support for multiple backends, agentic tools, and a permissive license, developers can experiment, customize, and deploy with confidence.

For more expert insights and tutorials on AI and automation, visit us at decisioncrafters.com.

By Tosin Akinosho