How to Use OpenAI's gpt-oss: A Step-by-Step Guide to Open-Weight Language Models
Introduction
OpenAI's gpt-oss series introduces powerful, open-weight language models designed for advanced reasoning, agentic tasks, and versatile developer use cases. With a permissive Apache 2.0 license, gpt-oss-120b and gpt-oss-20b are ideal for experimentation, customization, and commercial deployment. In this tutorial, you'll learn how to set up, run, and leverage these models for your own projects.
Table of Contents
- Model Overview
- Setup & Installation
- Downloading Model Weights
- Usage Examples
- Client Integrations
- Built-in Tools
- Conclusion
Model Overview
- gpt-oss-120b: For production, high-reasoning use cases. Fits on a single H100 GPU (117B parameters, 5.1B active).
- gpt-oss-20b: For lower latency and local/specialized use cases (21B parameters, 3.6B active).
Both models use the harmony response format and support configurable reasoning effort, full chain-of-thought, fine-tuning, and agentic capabilities (function calling, web browsing, Python code execution, structured outputs).
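To put the parameter counts above in perspective, the short sketch below computes what share of each model's parameters is listed as active. It uses only the figures quoted in this overview. Reading "active" as "used per token" is an interpretation rather than something this guide defines, and the helper names are illustrative.

```python
# Parameter counts quoted in the overview above, in billions.
MODELS = {
    "gpt-oss-120b": {"total": 117.0, "active": 5.1},
    "gpt-oss-20b": {"total": 21.0, "active": 3.6},
}


def active_share(total_b: float, active_b: float) -> float:
    """Return the fraction of parameters listed as active."""
    return active_b / total_b


def summarize() -> dict[str, float]:
    """Map each model name to its active-parameter share, as a percentage."""
    return {
        name: round(100 * active_share(p["total"], p["active"]), 1)
        for name, p in MODELS.items()
    }
```

Calling `summarize()` gives roughly 4.4% for gpt-oss-120b and 17.1% for gpt-oss-20b, so the larger model touches a much smaller share of its weights at a time.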
Setup & Installation
Requirements:
- Python 3.12
- macOS: Xcode CLI tools (xcode-select --install)
- Linux: CUDA for reference implementations
- Windows: Not officially supported (try Ollama for local runs)
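Before installing, it can help to check the interpreter and platform against the list above. The following is a minimal sketch, not part of gpt-oss itself. It assumes that "Python 3.12" means 3.12 or newer, and `check_environment` is a hypothetical helper name.

```python
import platform
import sys

MIN_PYTHON = (3, 12)  # assumption: the listed version is treated as a minimum


def check_environment(version=None, system=None) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()
    warnings = []
    if version < MIN_PYTHON:
        warnings.append(
            f"Python {version[0]}.{version[1]} found; 3.12 is listed as the requirement."
        )
    if system == "Windows":
        warnings.append(
            "Windows is not officially supported; consider Ollama for local runs."
        )
    return warnings
```

Running `print(check_environment())` with no arguments inspects the current interpreter. The optional arguments exist only so the check can be exercised without changing machines.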
Install from PyPI:
# Tools only
pip install gpt-oss
# PyTorch implementation (quoted so shells such as zsh do not expand the brackets)
pip install "gpt-oss[torch]"
# Triton implementation
pip install "gpt-oss[triton]"
Install from source (for Metal or development):
git clone https://github.com/openai/gpt-oss.git
cd gpt-oss
pip install -e ".[metal]"
Downloading Model Weights
Download weights from the Hugging Face Hub:
# gpt-oss-120b
huggingface-cli download openai/gpt-oss-120b --include "original/*" --local-dir gpt-oss-120b/
# gpt-oss-20b
huggingface-cli download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
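The same download can be scripted from Python. This sketch assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function, with `allow_patterns` playing the role of `--include` and `local_dir` the role of `--local-dir`. The helper names `download_kwargs` and `download` are illustrative.

```python
def download_kwargs(model: str, subfolder: str = "original") -> dict:
    """Build keyword arguments mirroring the CLI flags shown above."""
    return {
        "repo_id": f"openai/{model}",
        "allow_patterns": [f"{subfolder}/*"],  # same role as --include
        "local_dir": f"{model}/",              # same role as --local-dir
    }


def download(model: str, subfolder: str = "original") -> str:
    """Download the selected files and return the local directory path."""
    # Imported lazily so download_kwargs works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(model, subfolder))
```

For example, `download("gpt-oss-20b")` should leave the checkpoint under `gpt-oss-20b/original/`, matching the layout the commands above produce.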
Usage Examples
Transformers
from transformers import pipeline
import torch
model_id = "openai/gpt-oss-120b"
pipe = pipeline(
"text-generation",
model=model_id,
torch_dtype="auto",
device_map="auto",
)
messages = [
{"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
outputs = pipe(
messages,
max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
More on using gpt-oss with Transformers
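The overview mentions configurable reasoning effort, but the example above does not set it. A minimal sketch follows, assuming (based on the model card rather than this guide) that the level is requested through a system message such as "Reasoning: high" and that the levels are low, medium, and high. `build_messages` is a hypothetical helper.

```python
VALID_LEVELS = ("low", "medium", "high")  # assumed set of reasoning levels


def build_messages(prompt: str, reasoning: str = "medium") -> list[dict]:
    """Return a chat message list that requests a reasoning level."""
    if reasoning not in VALID_LEVELS:
        raise ValueError(f"reasoning must be one of {VALID_LEVELS}, got {reasoning!r}")
    return [
        # Assumption: the chat template reads the level from the system message.
        {"role": "system", "content": f"Reasoning: {reasoning}"},
        {"role": "user", "content": prompt},
    ]
```

The result can be passed in place of `messages` in the pipeline call above, for example `pipe(build_messages("Explain quantum mechanics clearly and concisely.", "high"), max_new_tokens=256)`. Higher effort will likely need a larger `max_new_tokens` budget.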
vLLM
uv pip install --pre vllm==0.10.1+gptoss \
--extra-index-url https://wheels.vllm.ai/gpt-oss/ \
--extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
--index-strategy unsafe-best-match
vllm serve openai/gpt-oss-20b
More on using gpt-oss with vLLM
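Once `vllm serve` is running, the model can be queried over HTTP. The sketch below uses only the standard library and assumes vLLM's defaults: an OpenAI-compatible endpoint on `http://localhost:8000/v1`. Adjust `BASE_URL` if the server was started with a different host or port. The function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM's default host and port


def build_request(
    prompt: str, model: str = "openai/gpt-oss-20b", max_tokens: int = 256
) -> urllib.request.Request:
    """Build a chat-completions POST request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat("Explain quantum mechanics clearly and concisely."))` should print the reply. Keeping request construction separate from sending makes the payload easy to inspect when something goes wrong.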
PyTorch Reference Implementation
pip install -e ".[torch]"
# On 4xH100:
torchrun --nproc-per-node=4 -m gpt_oss.generate gpt-oss-120b/original/
Triton Reference Implementation (Single GPU)
# Install triton from source
git clone https://github.com/triton-lang/triton
cd triton/
pip install -r python/requirements.txt
pip install -e . --verbose --no-build-isolation
# Install gpt-oss triton implementation (run from the gpt-oss repository root, not triton/)
pip install -e ".[triton]"
# On 1xH100
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
python -m gpt_oss.generate --backend triton gpt-oss-120b/original/
Metal (Apple Silicon)
pip install -e ".[metal]"
# Convert weights
python gpt_oss/metal/scripts/create-local-model.py -s <model_dir> -d <output_file>
# Or download pre-converted weights
huggingface-cli download openai/gpt-oss-120b --include "metal/*" --local-dir gpt-oss-120b/
huggingface-cli download openai/gpt-oss-20b --include "metal/*" --local-dir gpt-oss-20b/
# Test inference
python gpt_oss/metal/examples/generate.py gpt-oss-20b/metal/model.bin -p "why did the chicken cross the road?"
Ollama (Consumer Hardware)
# gpt-oss-20b
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
# gpt-oss-120b
ollama pull gpt-oss:120b
ollama run gpt-oss:120b
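After pulling a model, Ollama also serves a local HTTP API, which is handy for scripting. This sketch assumes the default address `http://localhost:11434` and the `/api/chat` endpoint, which by default streams newline-delimited JSON chunks, each carrying a fragment of the reply. The helper names are illustrative.

```python
import json
import urllib.request
from typing import Iterable

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumption: Ollama's default address


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join the content fragments from newline-delimited JSON chunks."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def chat(prompt: str, model: str = "gpt-oss:20b") -> str:
    """Send one user message and return the full streamed reply."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return collect_stream(resp)  # the response object iterates line by line
```

With Ollama running, `print(chat("why did the chicken cross the road?"))` returns the complete answer once the stream finishes.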
LM Studio
# gpt-oss-20b
lms get openai/gpt-oss-20b
# gpt-oss-120b
lms get openai/gpt-oss-120b
Client Integrations
- Terminal Chat: Basic chat app using harmony format and various backends.
- Responses API: Example server for Responses API compatibility.
- Codex: Use with Codex as a client for gpt-oss.
Built-in Tools
- Browser Tool: Enables web browsing capabilities for the model. See harmony docs.
- Python Tool: Allows the model to execute Python code in a stateless Docker container.
- Apply Patch: Create, update, or delete files locally.
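To illustrate what "stateless" means for the Python tool, the sketch below runs each snippet in a fresh interpreter, so nothing defined in one call survives into the next. It is not the gpt-oss implementation: a plain subprocess stands in for the Docker container and provides no sandboxing, so it should only be used with trusted code. `run_stateless` is a hypothetical name.

```python
import subprocess
import sys


def run_stateless(code: str, timeout: float = 10.0) -> tuple[int, str, str]:
    """Run `code` in a fresh interpreter and return (exit code, stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", code],  # new process per call, so no shared state
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr
```

A variable set in one call is gone in the next: `run_stateless("x = 41 + 1\nprint(x)")` succeeds, while a following `run_stateless("print(x)")` fails with a `NameError`. A model using a stateless tool therefore has to resend any setup code with every call.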
Conclusion
OpenAI's gpt-oss models offer a flexible, open, and powerful foundation for building advanced AI applications. With support for multiple backends, agentic tools, and a permissive license, developers can experiment, customize, and deploy with confidence.
For more expert insights and tutorials on AI and automation, visit us at decisioncrafters.com.
