Open-R1: The Revolutionary Open-Source Framework That's Democratizing DeepSeek-R1 Reasoning with 25k+ GitHub Stars

Learn how to use Open-R1, Hugging Face's fully open reproduction of DeepSeek-R1. This comprehensive tutorial covers installation, training reasoning models with SFT and GRPO, evaluation on AIME and MATH benchmarks, and synthetic data generation with practical code examples.

The AI community has been buzzing about DeepSeek-R1's impressive reasoning capabilities, but until now, reproducing these results required access to proprietary systems. Enter Open-R1 by Hugging Face – a fully open reproduction of DeepSeek-R1 that's making advanced reasoning models accessible to everyone. With over 25,700 GitHub stars and active development from 44+ contributors, this project is revolutionizing how we approach AI reasoning model development.

What is Open-R1?

Open-R1 is an ambitious open-source project that aims to build the missing pieces of the R1 pipeline, enabling anyone to reproduce and build upon DeepSeek-R1's groundbreaking reasoning capabilities. The project provides a complete framework for training, evaluating, and generating data with reasoning models.

Open-R1 Plan of Attack

Key Features

  • Complete Training Pipeline: Support for both Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO)
  • Scalable Architecture: Built with vLLM and TRL for high-performance training across multiple nodes
  • Comprehensive Evaluation: Integration with LightEval for benchmarking on AIME, MATH-500, GPQA Diamond, and LiveCodeBench
  • Data Generation Tools: Distilabel-powered synthetic data generation from reasoning models
  • Production Ready: Optimized for enterprise deployment with DeepSpeed and multi-GPU support

The Three-Step Master Plan

The Open-R1 project follows a strategic three-step approach based on the DeepSeek-R1 technical report:

  1. Step 1 (✅ Completed): Replicate R1-Distill models by distilling high-quality reasoning traces from DeepSeek-R1
  2. Step 2 (In Progress): Implement the pure RL pipeline for creating R1-Zero with large-scale datasets
  3. Step 3 (Planned): Demonstrate end-to-end training from base model to RL-tuned reasoning model

Installation and Setup

Getting started with Open-R1 requires a proper environment setup. The project relies on CUDA 12.4, so verify that your system is compatible before installing anything.

Quick Installation

# Create virtual environment with uv
uv venv openr1 --python 3.11 && source openr1/bin/activate

# Install core dependencies
uv pip install vllm==0.8.5.post1
uv pip install setuptools && uv pip install flash-attn --no-build-isolation

# Install Open-R1 with development dependencies (run from inside a clone of the open-r1 repository)
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e ".[dev]"

# Setup authentication
huggingface-cli login
wandb login
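
Once the environment is active, a quick sanity check from Python confirms that the GPU stack is visible. This is a minimal sketch, not part of Open-R1 itself; it assumes PyTorch was pulled in as a dependency of vLLM:

# check_env.py (hypothetical helper) - confirm CUDA and GPUs are visible before training
import torch

print("PyTorch:", torch.__version__)
print("CUDA build:", torch.version.cuda)           # should report a 12.x toolkit
print("GPUs visible:", torch.cuda.device_count())
print("CUDA available:", torch.cuda.is_available())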

Alternative: Make Installation

# One-command setup
make install

Training Your First Reasoning Model

Open-R1 supports two primary training approaches: Supervised Fine-Tuning (SFT) for distillation and Group Relative Policy Optimization (GRPO) for reinforcement learning.

Supervised Fine-Tuning (SFT)

Train a model on reasoning traces from the Mixture-of-Thoughts dataset:

# Train via command line
accelerate launch --config_file=recipes/accelerate_configs/zero3.yaml src/open_r1/sft.py \
    --model_name_or_path open-r1/Qwen2.5-Math-7B-RoPE-300k \
    --dataset_name open-r1/Mixture-of-Thoughts \
    --dataset_config all \
    --eos_token '<|im_end|>' \
    --learning_rate 4.0e-5 \
    --num_train_epochs 5 \
    --max_seq_length 32768 \
    --per_device_train_batch_size 2 \
    --gradient_checkpointing \
    --bf16 \
    --use_liger_kernel \
    --output_dir data/OpenR1-Distill-7B

Reproducing OpenR1-Distill-7B

The project provides a complete recipe to reproduce DeepSeek-R1-Distill-Qwen-7B performance:

# Train the distilled model
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/sft.py \
    --config recipes/OpenR1-Distill-7B/sft/config_distill.yaml

This produces a model with impressive benchmark performance:

Model                          AIME 2024    MATH-500    GPQA Diamond    LiveCodeBench v5
OpenR1-Distill-7B              52.7         89.0        52.8            39.4
DeepSeek-R1-Distill-Qwen-7B    51.3         93.5        52.4            37.4

Group Relative Policy Optimization (GRPO)

For reinforcement learning, Open-R1 uses TRL's vLLM backend to scale GRPO training:

# Single-node GRPO training
ACCELERATE_LOG_LEVEL=info \
    accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/grpo.py --config recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/config_demo.yaml \
    --vllm_mode colocate

Training with Code Interpreter

Open-R1 supports training with code execution rewards using E2B or Morph sandboxes:

# Install code execution dependencies
uv pip install -e '.[code]'

# Setup E2B environment
echo 'E2B_API_KEY="e2b_xxx"' > .env

# Start vLLM server
CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen2.5-1.5B-Instruct

# Run training with code rewards
CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7 ACCELERATE_LOG_LEVEL=info \
    accelerate launch --config_file recipes/accelerate_configs/zero2.yaml --num_processes=7 \
    src/open_r1/grpo.py --config recipes/Qwen2.5-1.5B-Instruct/grpo/config_demo_code.yaml

Comprehensive Model Evaluation

Open-R1 integrates with LightEval for standardized benchmarking across multiple reasoning tasks.

Single GPU Evaluation

# Setup environment
export VLLM_WORKER_MULTIPROC_METHOD=spawn
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=data/evals/$MODEL

# Run AIME 2024 evaluation
lighteval vllm $MODEL_ARGS "lighteval|aime24|0|0" \
    --use-chat-template \
    --output-dir $OUTPUT_DIR

Multi-GPU Evaluation

# Data parallel evaluation
NUM_GPUS=8
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,data_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"

# Tensor parallel for large models
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,tensor_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"

Makefile Shortcuts

# Quick evaluation commands
make evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B TASK=aime24
make evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B TASK=math_500 PARALLEL=data NUM_GPUS=8

Synthetic Data Generation

One of Open-R1's most powerful features is its ability to generate high-quality reasoning data using Distilabel.

Generate Data from Distilled Models

# pipeline.py
from datasets import load_dataset
from distilabel.models import vLLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks import TextGeneration

prompt_template = """
You will be given a problem. Please reason step by step, and put your final answer within \\boxed{}:
{{ instruction }}"""

dataset = load_dataset("AI-MO/NuminaMath-TIR", split="train").select(range(10))
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

with Pipeline(
    name="distill-qwen-7b-r1",
    description="A pipeline to generate data from a distilled r1 model",
) as pipeline:
    llm = vLLM(
        model=model_id,
        tokenizer=model_id,
        extra_kwargs={
            "tensor_parallel_size": 1,
            "max_model_len": 8192,
        },
        generation_kwargs={
            "temperature": 0.6,
            "max_new_tokens": 8192,
        },
    )
    
    text_generation = TextGeneration(
        llm=llm,
        template=prompt_template,
        num_generations=4,
        input_mappings={"instruction": "problem"}
    )

if __name__ == "__main__":
    distiset = pipeline.run(dataset=dataset)
    distiset.push_to_hub(repo_id="username/numina-deepseek-r1-qwen-7b")

Large-Scale Data Generation

For generating data from the full DeepSeek-R1 model, use the provided Slurm scripts:

# Install dependencies for large-scale generation
pip install https://wheels.vllm.ai/221d388cc5a836fa189305785ed7e887cea8b510/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
uv pip install "distilabel[vllm,ray,openai]>=1.5.2"

# Submit generation job
sbatch slurm/generate.slurm \
    --hf-dataset AI-MO/NuminaMath-TIR \
    --temperature 0.6 \
    --prompt-column problem \
    --model deepseek-ai/DeepSeek-R1 \
    --hf-output-dataset username/r1-dataset

Key Datasets and Models

Open-R1 has produced several high-quality datasets and models:

Datasets

  • Mixture-of-Thoughts: 350k verified reasoning traces across math, coding, and science
  • OpenR1-Math-220k: 220k mathematical reasoning traces
  • CodeForces-CoTs: 10k competitive programming problems with 100k solutions
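
All of these datasets are hosted on the Hugging Face Hub and can be pulled directly with the datasets library. Here is a minimal sketch for inspecting a few Mixture-of-Thoughts traces; the config name "all" matches the one used in the SFT command earlier in this post:

# inspect_dataset.py - peek at a couple of reasoning traces without downloading everything
from datasets import load_dataset

# streaming avoids pulling the full 350k-trace dataset just to look at a sample
ds = load_dataset("open-r1/Mixture-of-Thoughts", "all", split="train", streaming=True)

for example in ds.take(2):
    print(example.keys())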

Models

  • OpenR1-Distill-7B: Matches DeepSeek-R1-Distill-Qwen-7B performance
  • Various size variants: From 0.6B to 70B parameters

Advanced Features

Dataset Mixing

Combine multiple datasets with custom weights:

# config.yaml
dataset_mixture:
  datasets:
    - id: open-r1/Mixture-of-Thoughts
      config: all
      split: train
      weight: 0.7
    - id: open-r1/OpenR1-Math-220k
      config: default
      split: train
      weight: 0.3
  seed: 42
  test_split_size: 0.1
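
Conceptually, a weighted mixture like this boils down to sampling from each dataset in proportion to its weight. The rough sketch below illustrates that idea with datasets.interleave_datasets; it approximates the behaviour for intuition and is not the Open-R1 implementation:

# mixture_sketch.py - approximate a 0.7/0.3 mixture by probabilistic interleaving (illustrative only)
from datasets import load_dataset, interleave_datasets

# note: this downloads both datasets in full
a = load_dataset("open-r1/Mixture-of-Thoughts", "all", split="train")
b = load_dataset("open-r1/OpenR1-Math-220k", "default", split="train")

mixed = interleave_datasets([a, b], probabilities=[0.7, 0.3], seed=42)
splits = mixed.train_test_split(test_size=0.1, seed=42)  # mirrors test_split_size: 0.1
print(splits)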

Data Decontamination

Ensure clean training data with built-in decontamination:

# Decontaminate dataset
python scripts/decontaminate.py \
    --dataset "open-r1/verifiable-coding-problems-python" \
    --problem_column problem \
    --cleanup
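
Under the hood, decontamination of this kind typically checks for n-gram overlap between training problems and benchmark questions. The toy sketch below shows the general idea; it is illustrative only and does not reproduce the logic of scripts/decontaminate.py:

# ngram_overlap_sketch.py - toy n-gram contamination check (illustrative, not open-r1's script)
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(train_problem: str, benchmark_problems: list[str], n: int = 8) -> bool:
    # flag a training problem if it shares any n-gram with a benchmark problem
    train_grams = ngrams(train_problem, n)
    return any(train_grams & ngrams(bench, n) for bench in benchmark_problems)

train = "Let x be the smallest positive integer such that 2x + 3 = 11."
bench = ["Find the smallest positive integer x such that 2x + 3 = 11."]
print(is_contaminated(train, bench, n=5))  # True: both contain the 5-gram "such that 2x + 3"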

Slurm Integration

Scale training across multiple nodes:

# SFT training on Slurm (single node shown; increase --nodes to scale out)
sbatch --job-name=open_r1 --nodes=1 slurm/train.slurm \
    --model OpenR1-Distill-7B --task sft --config distill --accelerator zero3

# Multi-node GRPO training (1 node for vLLM + N nodes for training)
sbatch --job-name=open_r1 --nodes=2 slurm/train.slurm \
    --model Qwen2.5-1.5B-Instruct --task grpo --config demo --accelerator zero2 --dp 4 --tp 2

Performance Benchmarks

Open-R1 models achieve competitive performance across multiple reasoning benchmarks:

AIME 2024 Results

Model size    Open-R1 Score    DeepSeek Reported
1.5B          30.7             28.9
7B            50.8             55.5
32B           69.7             72.6

MATH-500 Results

Model size    Open-R1 Score    DeepSeek Reported
1.5B          83.1             83.9
7B            94.5             92.8
32B           95.6             94.3

Best Practices and Tips

Hardware Requirements

  • Minimum: Single H100 80GB for 7B models
  • Recommended: 8x H100 80GB for optimal training speed
  • Large Models: Multi-node setup for 32B+ models
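
As a rough back-of-the-envelope sketch (assumed figures, for illustration only) of why 80 GB cards, DeepSpeed ZeRO sharding, and gradient checkpointing matter at 7B scale:

# memory_estimate.py - rough memory estimate for full fine-tuning a 7B model (illustrative assumptions)
params = 7e9
weights_gb = params * 2 / 1e9        # bf16 weights, ~14 GB
grads_gb = weights_gb                # bf16 gradients, ~14 GB
adam_states_gb = params * 4 * 2 / 1e9  # fp32 Adam m and v states, ~56 GB

total_gb = weights_gb + grads_gb + adam_states_gb
print(f"~{total_gb:.0f} GB before activations")  # ZeRO-3 shards these states across GPUs;
                                                 # gradient checkpointing trims activation memory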

Training Tips

  • Use gradient checkpointing to reduce memory usage
  • Enable Liger kernel for improved performance
  • Scale batch size with the number of GPUs to keep the effective (global) batch size consistent (see the sketch after this list)
  • Monitor training with Weights & Biases integration
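
The batch-size tip is about keeping the effective (global) batch size constant as the GPU count changes. A small sketch of the arithmetic, using hypothetical target values:

# effective_batch_size.py - keep the global batch size constant across GPU counts (hypothetical values)
target_global_batch = 128

def grad_accum_steps(num_gpus: int, per_device_batch: int) -> int:
    # effective batch = per_device_batch * num_gpus * gradient_accumulation_steps
    return target_global_batch // (per_device_batch * num_gpus)

print(grad_accum_steps(num_gpus=8, per_device_batch=2))   # 8 accumulation steps
print(grad_accum_steps(num_gpus=4, per_device_batch=2))   # 16 accumulation steps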

Chat Template Considerations

Pay attention to EOS tokens and chat templates:

# For Qwen base models
--eos_token '<|im_end|>'

# For Llama models with custom template
--chat_template "$(cat llama_chat_template.jinja)" \
--eos_token '<|eot_id|>'
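
Before launching a long run, it is worth sanity-checking that the template and EOS token render the way you expect. A minimal sketch using the transformers API; the model name is just an example reused from the commands above:

# check_template.py - render a dummy conversation to inspect the chat template and EOS token
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
messages = [{"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "content": "4"}]

text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)                               # the rendered prompt, including special tokens
print("EOS token:", tokenizer.eos_token)  # should match the --eos_token you pass to training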

Community and Ecosystem

Open-R1 benefits from a vibrant community and ecosystem:

  • 44+ Contributors: Active development from the global AI community
  • Integration Partners: vLLM, SGLang, OpenThoughts, Prime Intellect
  • Regular Updates: Continuous improvements and new features
  • Comprehensive Documentation: Detailed guides and examples

Future Roadmap

The Open-R1 project continues to evolve with exciting developments:

  • Step 2 Completion: Pure RL pipeline implementation
  • Step 3 Implementation: End-to-end base-to-RL training
  • New Benchmarks: Additional evaluation tasks and metrics
  • Performance Optimizations: Faster training and inference
  • Extended Language Support: Beyond Python code execution

Getting Started Today

Ready to dive into reasoning model development? Here's your quickstart checklist:

  1. Clone the repository: git clone https://github.com/huggingface/open-r1.git
  2. Set up environment: Follow the installation guide
  3. Try the examples: Start with SFT on Mixture-of-Thoughts
  4. Evaluate models: Run benchmarks on your trained models
  5. Generate data: Create custom reasoning datasets
  6. Join the community: Contribute to the project on GitHub

Conclusion

Open-R1 represents a watershed moment in AI reasoning model development. By providing a fully open, reproducible framework for training DeepSeek-R1-level reasoning models, Hugging Face has democratized access to cutting-edge AI capabilities. Whether you're a researcher exploring new reasoning techniques, a developer building AI applications, or an organization looking to deploy reasoning models, Open-R1 provides the tools and infrastructure you need.

The project's comprehensive approach – from data generation and training to evaluation and deployment – makes it an invaluable resource for the AI community. With its proven ability to match proprietary model performance and its active development community, Open-R1 is positioned to drive the next wave of innovations in AI reasoning.

Start your journey with Open-R1 today and join the movement to make advanced AI reasoning accessible to everyone.

For more expert insights and tutorials on AI and automation, visit us at decisioncrafters.com.
