How to Use OpenAI's gpt-oss: A Step-by-Step Guide to Open-Weight Language Models

Tosin Akinosho

Aug 6, 2025

Introduction

OpenAI's gpt-oss series consists of two open-weight language models, gpt-oss-120b and gpt-oss-20b, designed for advanced reasoning, agentic tasks, and general developer use. Both are released under the permissive Apache 2.0 license, which allows experimentation, customization, and commercial deployment. In this tutorial, you'll learn how to set up, run, and integrate these models into your own projects.

Model Overview
Setup & Installation
Downloading Model Weights
Usage Examples
Client Integrations
Built-in Tools
Conclusion

Model Overview

gpt-oss-120b: For production, high-reasoning use cases. Fits on a single H100 GPU (117B parameters, 5.1B active).
gpt-oss-20b: For lower latency and local/specialized use cases (21B parameters, 3.6B active).
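
The parameter counts above imply that only a small share of each model's weights is active at a time. A quick arithmetic sketch, using only the figures quoted in the list (the helper name `active_fraction` is illustrative, not part of any gpt-oss package):

```python
# Parameter counts (in billions) as quoted in the model overview above.
MODELS = {
    "gpt-oss-120b": {"total_b": 117.0, "active_b": 5.1},
    "gpt-oss-20b": {"total_b": 21.0, "active_b": 3.6},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that are active, as a value in [0, 1]."""
    return active_b / total_b


for name, counts in MODELS.items():
    share = active_fraction(counts["total_b"], counts["active_b"])
    print(f"{name}: {share:.1%} of parameters active")
```

This works out to roughly 4.4% for gpt-oss-120b and 17.1% for gpt-oss-20b, which helps explain why the larger model can still be practical on a single GPU.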

Both models use the harmony response format and support configurable reasoning effort, access to the full chain of thought, fine-tuning, and agentic capabilities (function calling, web browsing, Python code execution, and structured outputs).
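
As a rough illustration of configurable reasoning effort, the sketch below builds a chat message list whose system message asks for a given level. It assumes the level is requested with a `Reasoning: <level>` line in the system prompt and that the levels are low, medium, and high; the helper name `build_messages` is hypothetical.

```python
from typing import Dict, List

# Assumption: these are the three supported reasoning levels.
VALID_EFFORTS = ("low", "medium", "high")


def build_messages(prompt: str, effort: str = "medium") -> List[Dict[str, str]]:
    """Build a chat message list whose system message requests a reasoning level."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return [
        # Assumption: the level is requested with a "Reasoning: <level>" system line.
        {"role": "system", "content": f"Reasoning: {effort}"},
        {"role": "user", "content": prompt},
    ]


messages = build_messages("Explain quantum mechanics clearly and concisely.", effort="high")
print(messages)
```

A list built this way can be passed wherever this guide passes `messages`, for example to a Transformers text-generation pipeline.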

Setup & Installation

Requirements:

Python 3.12
macOS: Xcode CLI tools (xcode-select --install)
Linux: CUDA for reference implementations
Windows: Not officially supported (try Ollama for local runs)
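
Before installing, it can help to compare your machine against the list above. The following is a minimal sketch using only the standard library; the function name `check_environment` and the wording of its notes are illustrative, and it does not verify that CUDA or the Xcode tools are actually installed.

```python
import platform
import sys
from typing import List, Tuple


def check_environment(version: Tuple[int, int], system: str) -> List[str]:
    """Return human-readable notes comparing a machine to the requirements list."""
    notes = []
    if version != (3, 12):
        notes.append(f"Python {version[0]}.{version[1]} found; the guide lists Python 3.12.")
    if system == "Darwin":
        notes.append("macOS: install the Xcode CLI tools with `xcode-select --install`.")
    elif system == "Linux":
        notes.append("Linux: the reference implementations need CUDA.")
    elif system == "Windows":
        notes.append("Windows: not officially supported; consider Ollama for local runs.")
    else:
        notes.append(f"{system}: not covered by the requirements list.")
    return notes


# Check the interpreter and operating system this script is running on.
for note in check_environment(sys.version_info[:2], platform.system()):
    print(note)
```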

Install from PyPI:

# Tools only
pip install gpt-oss
# PyTorch implementation (the quotes keep shells such as zsh from expanding the brackets)
pip install "gpt-oss[torch]"
# Triton implementation
pip install "gpt-oss[triton]"

Install from source (for Metal or development):

git clone https://github.com/openai/gpt-oss.git
cd gpt-oss
pip install -e ".[metal]"

Downloading Model Weights

Download weights from the Hugging Face Hub:

# gpt-oss-120b
huggingface-cli download openai/gpt-oss-120b --include "original/*" --local-dir gpt-oss-120b/
# gpt-oss-20b
huggingface-cli download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
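
If you prefer to script the download, the same filters can be passed to `snapshot_download` from the `huggingface_hub` package. This is a sketch rather than the project's documented method: the helper names are hypothetical, and the keyword arguments simply mirror the `--include` and `--local-dir` flags used above.

```python
from typing import Any, Dict


def build_download_kwargs(model_name: str, subfolder: str = "original") -> Dict[str, Any]:
    """Mirror the CLI flags above as keyword arguments for snapshot_download."""
    return {
        "repo_id": f"openai/{model_name}",
        "allow_patterns": [f"{subfolder}/*"],  # same filter as --include
        "local_dir": f"{model_name}/",         # same target as --local-dir
    }


def download_weights(model_name: str) -> str:
    """Download the weights and return the local path (needs network access)."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(model_name))


print(build_download_kwargs("gpt-oss-20b"))
# To actually fetch the files: download_weights("gpt-oss-20b")
```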

Usage Examples

Transformers

from transformers import pipeline
import torch

model_id = "openai/gpt-oss-120b"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
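
The final `print` above shows the whole last message, including its role. To get just the reply text, you can index one level further. The sketch below assumes the pipeline returns the conversation as a list of `{"role", "content"}` dictionaries, and uses a hypothetical stand-in for `outputs` so it runs without loading a model.

```python
from typing import Any, Dict, List


def last_message_text(outputs: List[Dict[str, Any]]) -> str:
    """Return the text of the final message in the first generated conversation."""
    conversation = outputs[0]["generated_text"]  # list of {"role", "content"} dicts
    last = conversation[-1]
    return last["content"]


# Hypothetical stand-in for the pipeline's return value, so this runs without a GPU.
fake_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
            {"role": "assistant", "content": "Quantum mechanics describes ..."},
        ]
    }
]
print(last_message_text(fake_outputs))
```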

vLLM

uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match
vllm serve openai/gpt-oss-20b
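
Once the server is running, it can be queried over HTTP. The sketch below uses only the standard library and assumes the server exposes an OpenAI-compatible chat completions endpoint at `http://localhost:8000/v1`, which is vLLM's usual default; adjust the address if you changed the host or port. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: `vllm serve` is listening on its default local address.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, model: str = "openai/gpt-oss-20b") -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Explain quantum mechanics clearly and concisely.")
print(request.full_url)
# With the server running: print(send(request))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.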

PyTorch Reference Implementation

# From the gpt-oss repository root
pip install -e ".[torch]"
# On 4xH100:
torchrun --nproc-per-node=4 -m gpt_oss.generate gpt-oss-120b/original/

Triton Reference Implementation (Single GPU)

# Install triton from source
git clone https://github.com/triton-lang/triton
cd triton/
pip install -r python/requirements.txt
pip install -e . --verbose --no-build-isolation
# Install the gpt-oss triton implementation (run from the gpt-oss repository root, not triton/)
pip install -e ".[triton]"
# On 1xH100
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
python -m gpt_oss.generate --backend triton gpt-oss-120b/original/

Metal (Apple Silicon)

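pip install -e ".[metal]"
# Convert weights (replace <model_dir> and <output_file> with your source directory and destination file)
python gpt_oss/metal/scripts/create-local-model.py -s <model_dir> -d <output_file>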
# Or download pre-converted weights
huggingface-cli download openai/gpt-oss-120b --include "metal/*" --local-dir gpt-oss-120b/metal/
huggingface-cli download openai/gpt-oss-20b --include "metal/*" --local-dir gpt-oss-20b/metal/
# Test inference
python gpt_oss/metal/examples/generate.py gpt-oss-20b/metal/model.bin -p "why did the chicken cross the road?"

Ollama (Consumer Hardware)

# gpt-oss-20b
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
# gpt-oss-120b
ollama pull gpt-oss:120b
ollama run gpt-oss:120b
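
Beyond the interactive `ollama run` session, a pulled model can also be called programmatically. The sketch below assumes Ollama's local HTTP API is available at `http://localhost:11434/api/chat` (its usual default) and that a non-streaming reply carries the text under `message.content`. The helper names and the sample response are hypothetical.

```python
import json
from typing import Any, Dict

# Assumption: Ollama is serving its local HTTP API on the default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_ollama_payload(prompt: str, size: str = "20b") -> Dict[str, Any]:
    """Build the JSON body for a single, non-streaming chat turn."""
    return {
        "model": f"gpt-oss:{size}",  # same tags as the pull/run commands above
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def reply_text(response_body: str) -> str:
    """Extract the assistant's text from a non-streaming chat response."""
    return json.loads(response_body)["message"]["content"]


print(json.dumps(build_ollama_payload("why did the chicken cross the road?"), indent=2))

# Hypothetical response body, shaped like a non-streaming chat reply.
sample = (
    '{"model": "gpt-oss:20b", '
    '"message": {"role": "assistant", "content": "To get to the other side."}, '
    '"done": true}'
)
print(reply_text(sample))
```

To send the payload, POST it as JSON to `OLLAMA_CHAT_URL` with any HTTP client while Ollama is running.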

LM Studio

# gpt-oss-20b
lms get openai/gpt-oss-20b
# gpt-oss-120b
lms get openai/gpt-oss-120b

Client Integrations

Terminal Chat: Basic chat app using harmony format and various backends.
Responses API: Example server for Responses API compatibility.
Codex: Use with Codex as a client for gpt-oss.
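
All of these clients ultimately talk to the model in the harmony format. To give a feel for what that means, here is a deliberately simplified sketch that renders chat messages as text using harmony-style special tokens. It is illustrative only: it assumes the `<|start|>`, `<|message|>`, and `<|end|>` tokens delimit each message, it omits channels, tool definitions, and the other fields a real harmony system message carries, and `render_harmony_prompt` is a hypothetical helper. For real use, rely on the official harmony tooling or a chat template instead.

```python
from typing import Dict, List


def render_harmony_prompt(messages: List[Dict[str, str]]) -> str:
    """Render messages as text using harmony-style special tokens (illustrative only)."""
    parts = [
        f"<|start|>{message['role']}<|message|>{message['content']}<|end|>"
        for message in messages
    ]
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|start|>assistant")
    return "".join(parts)


prompt = render_harmony_prompt(
    [
        {"role": "system", "content": "Reasoning: low"},
        {"role": "user", "content": "What is 2 + 2?"},
    ]
)
print(prompt)
```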

Built-in Tools

Browser Tool: Enables web browsing capabilities for the model. See harmony docs.
Python Tool: Allows the model to execute Python code in a stateless Docker container.
Apply Patch: Create, update, or delete files locally.
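
The word "stateless" in the Python tool's description matters in practice: variables defined in one call are not available in the next. The sketch below imitates that behavior by running each snippet in a fresh interpreter process. This is an analogy, not the tool's actual implementation: a subprocess stands in for the Docker container and provides no sandboxing, and `run_stateless` is a hypothetical helper.

```python
import subprocess
import sys


def run_stateless(code: str, timeout: float = 10.0) -> str:
    """Run a snippet in a brand-new interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip()


# First call defines x and prints it.
print(run_stateless("x = 41 + 1; print(x)"))
# Second call starts from scratch, so x no longer exists.
print(run_stateless("print('x' in globals())"))
```

The first call prints `42` and the second prints `False`, so any code the model writes for such a tool has to be self-contained on every call.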

Conclusion

OpenAI's gpt-oss models offer a flexible, open, and powerful foundation for building advanced AI applications. With support for multiple backends, agentic tools, and a permissive license, developers can experiment, customize, and deploy with confidence.
