Open-R1: The Revolutionary Open-Source Framework That's Democratizing DeepSeek-R1 Reasoning with 25k+ GitHub Stars
Learn how to use Open-R1, Hugging Face's fully open reproduction of DeepSeek-R1. This comprehensive tutorial covers installation, training reasoning models with SFT and GRPO, evaluation on AIME and MATH benchmarks, and synthetic data generation with practical code examples.
The AI community has been buzzing about DeepSeek-R1's impressive reasoning capabilities, but until now, reproducing these results required access to proprietary systems. Enter Open-R1 by Hugging Face – a fully open reproduction of DeepSeek-R1 that's making advanced reasoning models accessible to everyone. With over 25,700 GitHub stars and active development from 44+ contributors, this project is revolutionizing how we approach AI reasoning model development.
What is Open-R1?
Open-R1 is an ambitious open-source project that aims to build the missing pieces of the R1 pipeline, enabling anyone to reproduce and build upon DeepSeek-R1's groundbreaking reasoning capabilities. The project provides a complete framework for training, evaluating, and generating data with reasoning models.

Key Features
- Complete Training Pipeline: Support for both Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO)
- Scalable Architecture: Built with vLLM and TRL for high-performance training across multiple nodes
- Comprehensive Evaluation: Integration with LightEval for benchmarking on AIME, MATH-500, GPQA Diamond, and LiveCodeBench
- Data Generation Tools: Distilabel-powered synthetic data generation from reasoning models
- Production Ready: Optimized for enterprise deployment with DeepSpeed and multi-GPU support
The Three-Step Master Plan
The Open-R1 project follows a strategic three-step approach based on the DeepSeek-R1 technical report:
- Step 1 (✅ Completed): Replicate R1-Distill models by distilling high-quality reasoning traces from DeepSeek-R1
- Step 2 (In Progress): Implement the pure RL pipeline for creating R1-Zero with large-scale datasets
- Step 3 (Planned): Demonstrate end-to-end training from base model to RL-tuned reasoning model
Installation and Setup
Getting started with Open-R1 requires a proper environment setup. The project targets CUDA 12.4, so verify your driver and toolkit compatibility before installing.
Quick Installation
# Create virtual environment with uv
uv venv openr1 --python 3.11 && source openr1/bin/activate
# Install core dependencies
uv pip install vllm==0.8.5.post1
uv pip install setuptools && uv pip install flash-attn --no-build-isolation
# Install Open-R1 with development dependencies
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e ".[dev]"
# Setup authentication
huggingface-cli login
wandb login
Alternative: Make Installation
# One-command setup
make install
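Whichever route you choose, it is worth confirming that PyTorch sees a CUDA 12.4-compatible build before launching any training. The quick check below is my own suggestion rather than part of the repository:
# Optional sanity check: confirm the environment matches the CUDA 12.4 requirement
import torch
print(torch.__version__, torch.version.cuda)                  # expect a cu124 build
print(torch.cuda.is_available(), torch.cuda.device_count())   # GPUs visible to PyTorch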
Training Your First Reasoning Model
Open-R1 supports two primary training approaches: Supervised Fine-Tuning (SFT) for distillation and Group Relative Policy Optimization (GRPO) for reinforcement learning.
Supervised Fine-Tuning (SFT)
Train a model on reasoning traces from the Mixture-of-Thoughts dataset:
# Train via command line
accelerate launch --config_file=recipes/accelerate_configs/zero3.yaml src/open_r1/sft.py \
--model_name_or_path open-r1/Qwen2.5-Math-7B-RoPE-300k \
--dataset_name open-r1/Mixture-of-Thoughts \
--dataset_config all \
--eos_token '<|im_end|>' \
--learning_rate 4.0e-5 \
--num_train_epochs 5 \
--max_seq_length 32768 \
--per_device_train_batch_size 2 \
--gradient_checkpointing \
--bf16 \
--use_liger_kernel \
--output_dir data/OpenR1-Distill-7B
Reproducing OpenR1-Distill-7B
The project provides a complete recipe to reproduce DeepSeek-R1-Distill-Qwen-7B performance:
# Train the distilled model
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
src/open_r1/sft.py \
--config recipes/OpenR1-Distill-7B/sft/config_distill.yaml
This produces a model with impressive benchmark performance:
| Model | AIME 2024 | MATH-500 | GPQA Diamond | LiveCodeBench v5 |
|---|---|---|---|---|
| OpenR1-Distill-7B | 52.7 | 89.0 | 52.8 | 39.4 |
| DeepSeek-R1-Distill-Qwen-7B | 51.3 | 93.5 | 52.4 | 37.4 |
Group Relative Policy Optimization (GRPO)
For reinforcement learning training, Open-R1 uses TRL's vLLM backend for scalable training:
# Single-node GRPO training
ACCELERATE_LOG_LEVEL=info \
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
src/open_r1/grpo.py --config recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/config_demo.yaml \
--vllm_mode colocate
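Under the hood, GRPO optimizes the policy against programmable reward functions (for example, format and accuracy rewards), which Open-R1 wires up through its recipe YAML files. The snippet below is a simplified, standalone sketch using TRL's GRPOTrainer directly; the dataset, model, and format_reward function are illustrative assumptions, not the project's actual recipe:
# Simplified, standalone GRPO sketch with TRL; names below are illustrative
# assumptions, not the Open-R1 recipe itself.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    """Reward completions that wrap their reasoning in <think>...</think> tags."""
    texts = [c[0]["content"] if isinstance(c, list) else c for c in completions]
    return [1.0 if "<think>" in t and "</think>" in t else 0.0 for t in texts]

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder prompt dataset

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",        # any small instruct model works for a demo
    reward_funcs=format_reward,                # GRPO accepts one or more reward callables
    args=GRPOConfig(output_dir="data/grpo-demo"),
    train_dataset=dataset,
)
trainer.train()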
Training with Code Interpreter
Open-R1 supports training with code execution rewards using E2B or Morph sandboxes:
# Install code execution dependencies
uv pip install -e '.[code]'
# Setup E2B environment
echo 'E2B_API_KEY="e2b_xxx"' > .env
# Start vLLM server
CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen2.5-1.5B-Instruct
# Run training with code rewards
CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7 ACCELERATE_LOG_LEVEL=info \
accelerate launch --config_file recipes/accelerate_configs/zero2.yaml --num_processes=7 \
src/open_r1/grpo.py --config recipes/Qwen2.5-1.5B-Instruct/grpo/config_demo_code.yaml
Comprehensive Model Evaluation
Open-R1 integrates with LightEval for standardized benchmarking across multiple reasoning tasks.
Single GPU Evaluation
# Setup environment
export VLLM_WORKER_MULTIPROC_METHOD=spawn
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=data/evals/$MODEL
# Run AIME 2024 evaluation
lighteval vllm $MODEL_ARGS "lighteval|aime24|0|0" \
--use-chat-template \
--output-dir $OUTPUT_DIR
Multi-GPU Evaluation
# Data parallel evaluation
NUM_GPUS=8
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,data_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"
# Tensor parallel for large models
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,tensor_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"
Makefile Shortcuts
# Quick evaluation commands
make evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B TASK=aime24
make evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B TASK=math_500 PARALLEL=data NUM_GPUS=8
Synthetic Data Generation
One of Open-R1's most powerful features is its ability to generate high-quality reasoning data using Distilabel.
Generate Data from Distilled Models
# pipeline.py
from datasets import load_dataset
from distilabel.models import vLLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks import TextGeneration
prompt_template = """
You will be given a problem. Please reason step by step, and put your final answer within \\boxed{}:
{{ instruction }}"""
dataset = load_dataset("AI-MO/NuminaMath-TIR", split="train").select(range(10))
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
with Pipeline(
    name="distill-qwen-7b-r1",
    description="A pipeline to generate data from a distilled r1 model",
) as pipeline:
    llm = vLLM(
        model=model_id,
        tokenizer=model_id,
        extra_kwargs={
            "tensor_parallel_size": 1,
            "max_model_len": 8192,
        },
        generation_kwargs={
            "temperature": 0.6,
            "max_new_tokens": 8192,
        },
    )
    text_generation = TextGeneration(
        llm=llm,
        template=prompt_template,
        num_generations=4,
        input_mappings={"instruction": "problem"},
    )

if __name__ == "__main__":
    distiset = pipeline.run(dataset=dataset)
    distiset.push_to_hub(repo_id="username/numina-deepseek-r1-qwen-7b")
Large-Scale Data Generation
For generating data from the full DeepSeek-R1 model, use the provided Slurm scripts:
# Install dependencies for large-scale generation
pip install https://wheels.vllm.ai/221d388cc5a836fa189305785ed7e887cea8b510/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
uv pip install "distilabel[vllm,ray,openai]>=1.5.2"
# Submit generation job
sbatch slurm/generate.slurm \
--hf-dataset AI-MO/NuminaMath-TIR \
--temperature 0.6 \
--prompt-column problem \
--model deepseek-ai/DeepSeek-R1 \
--hf-output-dataset username/r1-dataset
Key Datasets and Models
Open-R1 has produced several high-quality datasets and models:
Datasets
- Mixture-of-Thoughts: 350k verified reasoning traces across math, coding, and science
- OpenR1-Math-220k: 220k mathematical reasoning traces
- CodeForces-CoTs: 10k competitive programming problems with 100k solutions
Models
- OpenR1-Distill-7B: Matches DeepSeek-R1-Distill-Qwen-7B performance
- Various size variants: From 0.6B to 70B parameters
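All of these artifacts live on the Hugging Face Hub, so they can be pulled directly with the datasets library. A minimal sketch for inspecting Mixture-of-Thoughts, using the same all config as the SFT command above:
# Load the Mixture-of-Thoughts training split straight from the Hub
from datasets import load_dataset

mot = load_dataset("open-r1/Mixture-of-Thoughts", "all", split="train")
print(mot.num_rows, mot.column_names)  # inspect size and schema before training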
Advanced Features
Dataset Mixing
Combine multiple datasets with custom weights:
# config.yaml
dataset_mixture:
  datasets:
    - id: open-r1/Mixture-of-Thoughts
      config: all
      split: train
      weight: 0.7
    - id: open-r1/OpenR1-Math-220k
      config: default
      split: train
      weight: 0.3
  seed: 42
  test_split_size: 0.1
Data Decontamination
Ensure clean training data with built-in decontamination:
# Decontaminate dataset
python scripts/decontaminate.py \
--dataset "open-r1/verifiable-coding-problems-python" \
--problem_column problem \
--cleanup
Slurm Integration
Scale training across multiple nodes:
# Multi-node SFT training
sbatch --job-name=open_r1 --nodes=1 slurm/train.slurm \
--model OpenR1-Distill-7B --task sft --config distill --accelerator zero3
# Multi-node GRPO training (1 node for vLLM + N nodes for training)
sbatch --job-name=open_r1 --nodes=2 slurm/train.slurm \
--model Qwen2.5-1.5B-Instruct --task grpo --config demo --accelerator zero2 --dp 4 --tp 2
Performance Benchmarks
Across the distilled model sizes, scores measured with Open-R1's evaluation pipeline closely track DeepSeek's reported results:
AIME 2024 Results
| Model Size | Open-R1 Score | DeepSeek Reported |
|---|---|---|
| 1.5B | 30.7 | 28.9 |
| 7B | 50.8 | 55.5 |
| 32B | 69.7 | 72.6 |
MATH-500 Results
| Model Size | Open-R1 Score | DeepSeek Reported |
|---|---|---|
| 1.5B | 83.1 | 83.9 |
| 7B | 94.5 | 92.8 |
| 32B | 95.6 | 94.3 |
Best Practices and Tips
Hardware Requirements
- Minimum: Single H100 80GB for 7B models
- Recommended: 8x H100 80GB for optimal training speed
- Large Models: Multi-node setup for 32B+ models
Training Tips
- Use gradient checkpointing to reduce memory usage
- Enable Liger kernel for improved performance
- Scale gradient accumulation or per-device batch size with the number of GPUs so the global batch size stays consistent with the recipe (see the sketch after this list)
- Monitor training with Weights & Biases integration
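These tips map directly onto the flags used in the CLI examples above. As a rough sketch (not the project's published recipe, and assuming a recent transformers/TRL release), here is how they would look in TRL's SFTConfig, including the global-batch-size arithmetic behind the batch-size tip; the accumulation value is a hypothetical placeholder:
from trl import SFTConfig

num_gpus = 8                      # GPUs participating in training
per_device_batch_size = 2         # matches the SFT example above
grad_accum_steps = 8              # hypothetical value; tune to your recipe
# Global batch size = per-device batch * accumulation steps * number of GPUs
global_batch_size = per_device_batch_size * grad_accum_steps * num_gpus  # 128 here

training_args = SFTConfig(
    output_dir="data/OpenR1-Distill-7B",
    per_device_train_batch_size=per_device_batch_size,
    gradient_accumulation_steps=grad_accum_steps,
    gradient_checkpointing=True,   # trade extra compute for lower memory
    bf16=True,                     # mixed precision on H100-class GPUs
    use_liger_kernel=True,         # fused kernels for higher throughput
    report_to="wandb",             # stream metrics to Weights & Biases
)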
Chat Template Considerations
Pay attention to EOS tokens and chat templates:
# For Qwen base models
--eos_token '<|im_end|>'
# For Llama models with custom template
--chat_template "$(cat llama_chat_template.jinja)" \
--eos_token '<|eot_id|>'
Community and Ecosystem
Open-R1 benefits from a vibrant community and ecosystem:
- 44+ Contributors: Active development from the global AI community
- Integration Partners: vLLM, SGLang, OpenThoughts, Prime Intellect
- Regular Updates: Continuous improvements and new features
- Comprehensive Documentation: Detailed guides and examples
Future Roadmap
The Open-R1 project continues to evolve with exciting developments:
- Step 2 Completion: Pure RL pipeline implementation
- Step 3 Implementation: End-to-end base-to-RL training
- New Benchmarks: Additional evaluation tasks and metrics
- Performance Optimizations: Faster training and inference
- Extended Language Support: Beyond Python code execution
Getting Started Today
Ready to dive into reasoning model development? Here's your quickstart checklist:
- Clone the repository: git clone https://github.com/huggingface/open-r1.git
- Set up environment: Follow the installation guide
- Try the examples: Start with SFT on Mixture-of-Thoughts
- Evaluate models: Run benchmarks on your trained models
- Generate data: Create custom reasoning datasets
- Join the community: Contribute to the project on GitHub
Conclusion
Open-R1 represents a watershed moment in AI reasoning model development. By providing a fully open, reproducible framework for training DeepSeek-R1-level reasoning models, Hugging Face has democratized access to cutting-edge AI capabilities. Whether you're a researcher exploring new reasoning techniques, a developer building AI applications, or an organization looking to deploy reasoning models, Open-R1 provides the tools and infrastructure you need.
The project's comprehensive approach – from data generation and training to evaluation and deployment – makes it an invaluable resource for the AI community. With its proven ability to match proprietary model performance and its active development community, Open-R1 is positioned to drive the next wave of innovations in AI reasoning.
Start your journey with Open-R1 today and join the movement to make advanced AI reasoning accessible to everyone.
For more expert insights and tutorials on AI and automation, visit us at decisioncrafters.com.