Open-R1: The Revolutionary Open-Source Framework That's Democratizing DeepSeek-R1 Reasoning with 25k+ GitHub Stars

Learn how to use Open-R1, Hugging Face's fully open reproduction of DeepSeek-R1. This comprehensive tutorial covers installation, training reasoning models with SFT and GRPO, evaluation on AIME and MATH benchmarks, and synthetic data generation with practical code examples.

The AI community has been buzzing about DeepSeek-R1's impressive reasoning capabilities, but while DeepSeek released the model weights, the datasets and training code behind them were never published. Enter Open-R1 by Hugging Face, a fully open reproduction of DeepSeek-R1 that is making advanced reasoning models accessible to everyone. With over 25,700 GitHub stars and active development from 44+ contributors, this project is revolutionizing how we approach AI reasoning model development.

What is Open-R1?

Open-R1 is an ambitious open-source project that aims to build the missing pieces of the R1 pipeline, enabling anyone to reproduce and build upon DeepSeek-R1's groundbreaking reasoning capabilities. The project provides a complete framework for training, evaluating, and generating data with reasoning models.

[Figure: Open-R1 plan of attack]

Key Features

  • Complete Training Pipeline: Support for both Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO)
  • Scalable Architecture: Built with vLLM and TRL for high-performance training across multiple nodes
  • Comprehensive Evaluation: Integration with LightEval for benchmarking on AIME, MATH-500, GPQA Diamond, and LiveCodeBench
  • Data Generation Tools: Distilabel-powered synthetic data generation from reasoning models
  • Production Ready: Optimized for enterprise deployment with DeepSpeed and multi-GPU support

The Three-Step Master Plan

The Open-R1 project follows a strategic three-step approach based on the DeepSeek-R1 technical report:

  1. Step 1 (✅ Completed): Replicate R1-Distill models by distilling high-quality reasoning traces from DeepSeek-R1
  2. Step 2 (In Progress): Implement the pure RL pipeline for creating R1-Zero with large-scale datasets
  3. Step 3 (Planned): Demonstrate end-to-end training from base model to RL-tuned reasoning model

Installation and Setup

Getting started with Open-R1 requires a proper environment setup. The project is pinned to CUDA 12.4, so verify that your system is compatible before installing.
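
If PyTorch is already installed, a quick way to verify which CUDA toolkit your build was compiled against (a minimal sanity check; the exact string depends on your PyTorch wheel) is:

import torch

# The pinned vLLM release expects CUDA 12.4; confirm your build matches
print(torch.version.cuda)        # e.g. "12.4"
print(torch.cuda.is_available()) # True if a compatible GPU and driver are present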

Quick Installation

# Create virtual environment with uv
uv venv openr1 --python 3.11 && source openr1/bin/activate

# Install core dependencies
uv pip install vllm==0.8.5.post1
uv pip install setuptools && uv pip install flash-attn --no-build-isolation

# Install Open-R1 with development dependencies
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e ".[dev]"

# Setup authentication
huggingface-cli login
wandb login

Alternative: Make Installation

# One-command setup
make install

Training Your First Reasoning Model

Open-R1 supports two primary training approaches: Supervised Fine-Tuning (SFT) for distillation and Group Relative Policy Optimization (GRPO) for reinforcement learning.

Supervised Fine-Tuning (SFT)

Train a model on reasoning traces from the Mixture-of-Thoughts dataset:

# Train via command line
accelerate launch --config_file=recipes/accelerate_configs/zero3.yaml src/open_r1/sft.py \
    --model_name_or_path open-r1/Qwen2.5-Math-7B-RoPE-300k \
    --dataset_name open-r1/Mixture-of-Thoughts \
    --dataset_config all \
    --eos_token '<|im_end|>' \
    --learning_rate 4.0e-5 \
    --num_train_epochs 5 \
    --max_seq_length 32768 \
    --per_device_train_batch_size 2 \
    --gradient_checkpointing \
    --bf16 \
    --use_liger_kernel \
    --output_dir data/OpenR1-Distill-7B

Reproducing OpenR1-Distill-7B

The project provides a complete recipe to reproduce DeepSeek-R1-Distill-Qwen-7B performance:

# Train the distilled model
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/sft.py \
    --config recipes/OpenR1-Distill-7B/sft/config_distill.yaml

This produces a model with impressive benchmark performance:

Model                         AIME 2024   MATH-500   GPQA Diamond   LiveCodeBench v5
OpenR1-Distill-7B             52.7        89.0       52.8           39.4
DeepSeek-R1-Distill-Qwen-7B   51.3        93.5       52.4           37.4

Group Relative Policy Optimization (GRPO)

For reinforcement learning, Open-R1 uses TRL's GRPO trainer with a vLLM backend for scalable generation during training:

# Single-node GRPO training
ACCELERATE_LOG_LEVEL=info \
    accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/grpo.py --config recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/config_demo.yaml \
    --vllm_mode colocate
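
Under the hood, GRPO samples a group of completions per prompt and scores each one relative to the rest of its group, so no separate value network is needed. Below is a minimal sketch of that advantage computation (illustrative only; TRL's GRPO trainer additionally handles policy-ratio clipping and the KL penalty):

import numpy as np

def group_relative_advantages(rewards):
    # Each completion's advantage is its reward standardized within its group
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Four completions sampled for one prompt, scored by a verifiable reward
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [ 1. -1.  1. -1.]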

Training with Code Interpreter

Open-R1 supports training with code execution rewards using E2B or Morph sandboxes:

# Install code execution dependencies
uv pip install -e '.[code]'

# Setup E2B environment
echo 'E2B_API_KEY="e2b_xxx"' > .env

# Start vLLM server
CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen2.5-1.5B-Instruct

# Run training with code rewards
CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7 ACCELERATE_LOG_LEVEL=info \
    accelerate launch --config_file recipes/accelerate_configs/zero2.yaml --num_processes=7 \
    src/open_r1/grpo.py --config recipes/Qwen2.5-1.5B-Instruct/grpo/config_demo_code.yaml
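
The reward signal here is verifiable: the sandbox executes each generated program against test cases and scores it on whether it passes. The toy function below illustrates the idea with a hypothetical local helper (Open-R1's actual reward runs inside an E2B or Morph sandbox; executing untrusted model output locally like this is unsafe and shown for illustration only):

import os
import subprocess
import tempfile

def code_reward(completion: str, tests: str, timeout: float = 10.0) -> float:
    # Binary reward: 1.0 if the candidate code passes its tests, else 0.0
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(completion + "\n\n" + tests)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.remove(path)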

Comprehensive Model Evaluation

Open-R1 integrates with LightEval for standardized benchmarking across multiple reasoning tasks.

Single GPU Evaluation

# Setup environment
export VLLM_WORKER_MULTIPROC_METHOD=spawn
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=data/evals/$MODEL

# Run AIME 2024 evaluation
lighteval vllm $MODEL_ARGS "lighteval|aime24|0|0" \
    --use-chat-template \
    --output-dir $OUTPUT_DIR

Multi-GPU Evaluation

# Data parallel evaluation
NUM_GPUS=8
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,data_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"

# Tensor parallel for large models
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,tensor_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"

Makefile Shortcuts

# Quick evaluation commands
make evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B TASK=aime24
make evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B TASK=math_500 PARALLEL=data NUM_GPUS=8

Synthetic Data Generation

One of Open-R1's most powerful features is its ability to generate high-quality reasoning data using Distilabel.

Generate Data from Distilled Models

# pipeline.py
from datasets import load_dataset
from distilabel.models import vLLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks import TextGeneration

# Jinja2 template; {{ instruction }} is filled from the dataset's "problem" column
prompt_template = """\
You will be given a problem. Please reason step by step, and put your final answer within \\boxed{}:
{{ instruction }}"""

# Generate from a small slice of NuminaMath-TIR for a quick test run
dataset = load_dataset("AI-MO/NuminaMath-TIR", split="train").select(range(10))
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

with Pipeline(
    name="distill-qwen-7b-r1",
    description="A pipeline to generate data from a distilled r1 model",
) as pipeline:
    llm = vLLM(
        model=model_id,
        tokenizer=model_id,
        extra_kwargs={
            "tensor_parallel_size": 1,  # single-GPU inference
            "max_model_len": 8192,
        },
        generation_kwargs={
            "temperature": 0.6,
            "max_new_tokens": 8192,
        },
    )

    # Sample 4 reasoning traces per problem
    text_generation = TextGeneration(
        llm=llm,
        template=prompt_template,
        num_generations=4,
        input_mappings={"instruction": "problem"},
    )

if __name__ == "__main__":
    distiset = pipeline.run(dataset=dataset)
    distiset.push_to_hub(repo_id="username/numina-deepseek-r1-qwen-7b")

Large-Scale Data Generation

For generating data from the full DeepSeek-R1 model, use the provided Slurm scripts:

# Install dependencies for large-scale generation
pip install https://wheels.vllm.ai/221d388cc5a836fa189305785ed7e887cea8b510/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
uv pip install "distilabel[vllm,ray,openai]>=1.5.2"

# Submit generation job
sbatch slurm/generate.slurm \
    --hf-dataset AI-MO/NuminaMath-TIR \
    --temperature 0.6 \
    --prompt-column problem \
    --model deepseek-ai/DeepSeek-R1 \
    --hf-output-dataset username/r1-dataset

Key Datasets and Models

Open-R1 has produced several high-quality datasets and models:

Datasets

  • Mixture-of-Thoughts: 350k verified reasoning traces across math, coding, and science (see the loading snippet after this list)
  • OpenR1-Math-220k: 220k mathematical reasoning traces
  • CodeForces-CoTs: 10k competitive programming problems with 100k solutions
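
All three datasets are hosted on the Hugging Face Hub. For example, Mixture-of-Thoughts can be loaded with the datasets library (the "all" config below matches the one used in the SFT command earlier):

from datasets import load_dataset

# Load the merged math/code/science config used for SFT
mot = load_dataset("open-r1/Mixture-of-Thoughts", "all", split="train")
print(mot)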

Models

  • OpenR1-Distill-7B: Matches DeepSeek-R1-Distill-Qwen-7B performance
  • Various size variants: From 0.6B to 70B parameters

Advanced Features

Dataset Mixing

Combine multiple datasets with custom weights:

# config.yaml
dataset_mixture:
  datasets:
    - id: open-r1/Mixture-of-Thoughts
      config: all
      split: train
      weight: 0.7
    - id: open-r1/OpenR1-Math-220k
      config: default
      split: train
      weight: 0.3
  seed: 42
  test_split_size: 0.1
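
Open-R1 applies these weights when assembling the training corpus. Conceptually, the effect resembles probability-weighted interleaving with the datasets library (a sketch of the idea, not the project's internal implementation):

from datasets import interleave_datasets, load_dataset

mot = load_dataset("open-r1/Mixture-of-Thoughts", "all", split="train")
math220k = load_dataset("open-r1/OpenR1-Math-220k", "default", split="train")

# Draw ~70% of examples from the first dataset and ~30% from the second
mixed = interleave_datasets([mot, math220k], probabilities=[0.7, 0.3], seed=42)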

Data Decontamination

Ensure clean training data with built-in decontamination:

# Decontaminate dataset
python scripts/decontaminate.py \
    --dataset "open-r1/verifiable-coding-problems-python" \
    --problem_column problem \
    --cleanup
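
Decontamination of this kind typically checks n-gram overlap between training samples and benchmark problems. A simplified sketch of the idea (the script's exact algorithm and n-gram size may differ):

def ngrams(text: str, n: int = 8) -> set:
    # Lowercased word-level n-grams, the usual unit for contamination checks
    words = text.lower().split()
    return {tuple(words[i : i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(sample: str, benchmark_problems: list[str], n: int = 8) -> bool:
    # Flag a training sample if it shares any n-gram with a benchmark problem
    sample_grams = ngrams(sample, n)
    return any(sample_grams & ngrams(p, n) for p in benchmark_problems)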

Slurm Integration

Scale training across multiple nodes:

# Multi-node SFT training
sbatch --job-name=open_r1 --nodes=1 slurm/train.slurm \
    --model OpenR1-Distill-7B --task sft --config distill --accelerator zero3

# Multi-node GRPO training (1 node for vLLM + N nodes for training)
sbatch --job-name=open_r1 --nodes=2 slurm/train.slurm \
    --model Qwen2.5-1.5B-Instruct --task grpo --config demo --accelerator zero2 --dp 4 --tp 2

Performance Benchmarks

Open-R1 models achieve competitive performance across multiple reasoning benchmarks:

AIME 2024 Results

Model   Open-R1 score   DeepSeek reported
1.5B    30.7            28.9
7B      50.8            55.5
32B     69.7            72.6

MATH-500 Results

Model   Open-R1 score   DeepSeek reported
1.5B    83.1            83.9
7B      94.5            92.8
32B     95.6            94.3

Best Practices and Tips

Hardware Requirements

  • Minimum: Single H100 80GB for 7B models
  • Recommended: 8x H100 80GB for optimal training speed
  • Large Models: Multi-node setup for 32B+ models

Training Tips

  • Use gradient checkpointing to reduce memory usage
  • Enable Liger kernel for improved performance
  • Keep the global batch size constant when scaling GPU count by adjusting gradient accumulation (see the worked example after this list)
  • Monitor training with Weights & Biases integration
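
The global batch size is what a training recipe actually depends on. A quick worked example with illustrative numbers:

# global batch = per-device batch x gradient accumulation x number of GPUs
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 8

global_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(global_batch_size)  # 128; halve gradient accumulation if you double the GPU count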

Chat Template Considerations

Pay attention to EOS tokens and chat templates:

# For Qwen base models
--eos_token '<|im_end|>'

# For Llama models with custom template
--chat_template "$(cat llama_chat_template.jinja)" \
--eos_token '<|eot_id|>'
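
To double-check which EOS token a model's tokenizer actually ships with, inspect it directly (a quick sanity check, assuming transformers is installed):

from transformers import AutoTokenizer

# Qwen base models typically default to '<|endoftext|>' as EOS,
# which is why the SFT command above overrides it with '<|im_end|>'
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-7B")
print(tok.eos_token)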

Community and Ecosystem

Open-R1 benefits from a vibrant community and ecosystem:

  • 44+ Contributors: Active development from the global AI community
  • Integration Partners: vLLM, SGLang, OpenThoughts, Prime Intellect
  • Regular Updates: Continuous improvements and new features
  • Comprehensive Documentation: Detailed guides and examples

Future Roadmap

The Open-R1 project continues to evolve with exciting developments:

  • Step 2 Completion: Pure RL pipeline implementation
  • Step 3 Implementation: End-to-end base-to-RL training
  • New Benchmarks: Additional evaluation tasks and metrics
  • Performance Optimizations: Faster training and inference
  • Extended Language Support: Beyond Python code execution

Getting Started Today

Ready to dive into reasoning model development? Here's your quickstart checklist:

  1. Clone the repository: git clone https://github.com/huggingface/open-r1.git
  2. Set up environment: Follow the installation guide
  3. Try the examples: Start with SFT on Mixture-of-Thoughts
  4. Evaluate models: Run benchmarks on your trained models
  5. Generate data: Create custom reasoning datasets
  6. Join the community: Contribute to the project on GitHub

Conclusion

Open-R1 represents a watershed moment in AI reasoning model development. By providing a fully open, reproducible framework for training DeepSeek-R1-level reasoning models, Hugging Face has democratized access to cutting-edge AI capabilities. Whether you're a researcher exploring new reasoning techniques, a developer building AI applications, or an organization looking to deploy reasoning models, Open-R1 provides the tools and infrastructure you need.

The project's comprehensive approach – from data generation and training to evaluation and deployment – makes it an invaluable resource for the AI community. With its proven ability to match proprietary model performance and its active development community, Open-R1 is positioned to drive the next wave of innovations in AI reasoning.

Start your journey with Open-R1 today and join the movement to make advanced AI reasoning accessible to everyone.

For more expert insights and tutorials on AI and automation, visit us at decisioncrafters.com.
