Open-R1: The Revolutionary Open-Source Framework That's Democratizing DeepSeek-R1 Reasoning with 25k+ GitHub Stars

Learn how to use Open-R1, Hugging Face's fully open reproduction of DeepSeek-R1. This comprehensive tutorial covers installation, training reasoning models with SFT and GRPO, evaluation on AIME and MATH benchmarks, and synthetic data generation with practical code examples.

The AI community has been buzzing about DeepSeek-R1's impressive reasoning capabilities, but until now, reproducing these results required access to proprietary systems. Enter Open-R1 by Hugging Face – a fully open reproduction of DeepSeek-R1 that's making advanced reasoning models accessible to everyone. With over 25,700 GitHub stars and active development from 44+ contributors, this project is revolutionizing how we approach AI reasoning model development.

What is Open-R1?

Open-R1 is an ambitious open-source project that aims to build the missing pieces of the R1 pipeline, enabling anyone to reproduce and build upon DeepSeek-R1's groundbreaking reasoning capabilities. The project provides a complete framework for training, evaluating, and generating data with reasoning models.

Open-R1 Plan of Attack

Key Features

  • Complete Training Pipeline: Support for both Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO)
  • Scalable Architecture: Built with vLLM and TRL for high-performance training across multiple nodes
  • Comprehensive Evaluation: Integration with LightEval for benchmarking on AIME, MATH-500, GPQA Diamond, and LiveCodeBench
  • Data Generation Tools: Distilabel-powered synthetic data generation from reasoning models
  • Production Ready: Optimized for enterprise deployment with DeepSpeed and multi-GPU support

The Three-Step Master Plan

The Open-R1 project follows a strategic three-step approach based on the DeepSeek-R1 technical report:

  1. Step 1 (✅ Completed): Replicate R1-Distill models by distilling high-quality reasoning traces from DeepSeek-R1
  2. Step 2 (In Progress): Implement the pure RL pipeline for creating R1-Zero with large-scale datasets
  3. Step 3 (Planned): Demonstrate end-to-end training from base model to RL-tuned reasoning model

Installation and Setup

Getting started with Open-R1 requires a proper environment setup. The project relies on CUDA 12.4, so verify that your system is compatible before installing anything.

Quick Installation

# Create virtual environment with uv
uv venv openr1 --python 3.11 && source openr1/bin/activate

# Install core dependencies
uv pip install vllm==0.8.5.post1
uv pip install setuptools && uv pip install flash-attn --no-build-isolation

# Install Open-R1 with development dependencies (run from inside a clone of the open-r1 repository)
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e ".[dev]"

# Setup authentication
huggingface-cli login
wandb login
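
Once the environment is active, a quick sanity check from Python confirms that the GPU stack is visible. This is a minimal sketch, not part of Open-R1 itself; it assumes PyTorch was pulled in as a dependency of vLLM:

# check_env.py (hypothetical helper) - confirm CUDA and GPUs are visible before training
import torch

print("PyTorch:", torch.__version__)
print("CUDA build:", torch.version.cuda)           # should report a 12.x toolkit
print("GPUs visible:", torch.cuda.device_count())
print("CUDA available:", torch.cuda.is_available())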

Alternative: Make Installation

# One-command setup
make install

Training Your First Reasoning Model

Open-R1 supports two primary training approaches: Supervised Fine-Tuning (SFT) for distillation and Group Relative Policy Optimization (GRPO) for reinforcement learning.

Supervised Fine-Tuning (SFT)

Train a model on reasoning traces from the Mixture-of-Thoughts dataset:

# Train via command line
accelerate launch --config_file=recipes/accelerate_configs/zero3.yaml src/open_r1/sft.py \
    --model_name_or_path open-r1/Qwen2.5-Math-7B-RoPE-300k \
    --dataset_name open-r1/Mixture-of-Thoughts \
    --dataset_config all \
    --eos_token '<|im_end|>' \
    --learning_rate 4.0e-5 \
    --num_train_epochs 5 \
    --max_seq_length 32768 \
    --per_device_train_batch_size 2 \
    --gradient_checkpointing \
    --bf16 \
    --use_liger_kernel \
    --output_dir data/OpenR1-Distill-7B

Reproducing OpenR1-Distill-7B

The project provides a complete recipe to reproduce DeepSeek-R1-Distill-Qwen-7B performance:

# Train the distilled model
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/sft.py \
    --config recipes/OpenR1-Distill-7B/sft/config_distill.yaml

This produces a model with impressive benchmark performance:

Model                          AIME 2024    MATH-500    GPQA Diamond    LiveCodeBench v5
OpenR1-Distill-7B              52.7         89.0        52.8            39.4
DeepSeek-R1-Distill-Qwen-7B    51.3         93.5        52.4            37.4

Group Relative Policy Optimization (GRPO)

For reinforcement learning, Open-R1 uses TRL's vLLM backend to scale GRPO training:

# Single-node GRPO training
ACCELERATE_LOG_LEVEL=info \
    accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/grpo.py --config recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/config_demo.yaml \
    --vllm_mode colocate

Training with Code Interpreter

Open-R1 supports training with code execution rewards using E2B or Morph sandboxes:

# Install code execution dependencies
uv pip install -e '.[code]'

# Setup E2B environment
echo 'E2B_API_KEY="e2b_xxx"' > .env

# Start vLLM server
CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen2.5-1.5B-Instruct

# Run training with code rewards
CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7 ACCELERATE_LOG_LEVEL=info \
    accelerate launch --config_file recipes/accelerate_configs/zero2.yaml --num_processes=7 \
    src/open_r1/grpo.py --config recipes/Qwen2.5-1.5B-Instruct/grpo/config_demo_code.yaml

Comprehensive Model Evaluation

Open-R1 integrates with LightEval for standardized benchmarking across multiple reasoning tasks.

Single GPU Evaluation

# Setup environment
export VLLM_WORKER_MULTIPROC_METHOD=spawn
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=data/evals/$MODEL

# Run AIME 2024 evaluation
lighteval vllm $MODEL_ARGS "lighteval|aime24|0|0" \
    --use-chat-template \
    --output-dir $OUTPUT_DIR

Multi-GPU Evaluation

# Data parallel evaluation
NUM_GPUS=8
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,data_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"

# Tensor parallel for large models
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,tensor_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"

Makefile Shortcuts

# Quick evaluation commands
make evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B TASK=aime24
make evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B TASK=math_500 PARALLEL=data NUM_GPUS=8

Synthetic Data Generation

One of Open-R1's most powerful features is its ability to generate high-quality reasoning data using Distilabel.

Generate Data from Distilled Models

# pipeline.py
from datasets import load_dataset
from distilabel.models import vLLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks import TextGeneration

prompt_template = """
You will be given a problem. Please reason step by step, and put your final answer within \\boxed{}:
{{ instruction }}"""

dataset = load_dataset("AI-MO/NuminaMath-TIR", split="train").select(range(10))
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

with Pipeline(
    name="distill-qwen-7b-r1",
    description="A pipeline to generate data from a distilled r1 model",
) as pipeline:
    llm = vLLM(
        model=model_id,
        tokenizer=model_id,
        extra_kwargs={
            "tensor_parallel_size": 1,
            "max_model_len": 8192,
        },
        generation_kwargs={
            "temperature": 0.6,
            "max_new_tokens": 8192,
        },
    )
    
    text_generation = TextGeneration(
        llm=llm,
        template=prompt_template,
        num_generations=4,
        input_mappings={"instruction": "problem"}
    )

if __name__ == "__main__":
    distiset = pipeline.run(dataset=dataset)
    distiset.push_to_hub(repo_id="username/numina-deepseek-r1-qwen-7b")

Large-Scale Data Generation

For generating data from the full DeepSeek-R1 model, use the provided Slurm scripts:

# Install dependencies for large-scale generation
pip install https://wheels.vllm.ai/221d388cc5a836fa189305785ed7e887cea8b510/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
uv pip install "distilabel[vllm,ray,openai]>=1.5.2"

# Submit generation job
sbatch slurm/generate.slurm \
    --hf-dataset AI-MO/NuminaMath-TIR \
    --temperature 0.6 \
    --prompt-column problem \
    --model deepseek-ai/DeepSeek-R1 \
    --hf-output-dataset username/r1-dataset

Key Datasets and Models

Open-R1 has produced several high-quality datasets and models:

Datasets

  • Mixture-of-Thoughts: 350k verified reasoning traces across math, coding, and science
  • OpenR1-Math-220k: 220k mathematical reasoning traces
  • CodeForces-CoTs: 10k competitive programming problems with 100k solutions
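
All of these datasets are hosted on the Hugging Face Hub and can be pulled directly with the datasets library. Here is a minimal sketch for inspecting a few Mixture-of-Thoughts traces; the config name "all" matches the one used in the SFT command earlier in this post:

# inspect_dataset.py - peek at a couple of reasoning traces without downloading everything
from datasets import load_dataset

# streaming avoids pulling the full 350k-trace dataset just to look at a sample
ds = load_dataset("open-r1/Mixture-of-Thoughts", "all", split="train", streaming=True)

for example in ds.take(2):
    print(example.keys())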

Models

  • OpenR1-Distill-7B: Matches DeepSeek-R1-Distill-Qwen-7B performance
  • Various size variants: From 0.6B to 70B parameters

Advanced Features

Dataset Mixing

Combine multiple datasets with custom weights:

# config.yaml
dataset_mixture:
  datasets:
    - id: open-r1/Mixture-of-Thoughts
      config: all
      split: train
      weight: 0.7
    - id: open-r1/OpenR1-Math-220k
      config: default
      split: train
      weight: 0.3
  seed: 42
  test_split_size: 0.1
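
Conceptually, a weighted mixture like this boils down to sampling from each dataset in proportion to its weight. The rough sketch below illustrates that idea with datasets.interleave_datasets; it approximates the behaviour for intuition and is not the Open-R1 implementation:

# mixture_sketch.py - approximate a 0.7/0.3 mixture by probabilistic interleaving (illustrative only)
from datasets import load_dataset, interleave_datasets

# note: this downloads both datasets in full
a = load_dataset("open-r1/Mixture-of-Thoughts", "all", split="train")
b = load_dataset("open-r1/OpenR1-Math-220k", "default", split="train")

mixed = interleave_datasets([a, b], probabilities=[0.7, 0.3], seed=42)
splits = mixed.train_test_split(test_size=0.1, seed=42)  # mirrors test_split_size: 0.1
print(splits)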

Data Decontamination

Ensure clean training data with built-in decontamination:

# Decontaminate dataset
python scripts/decontaminate.py \
    --dataset "open-r1/verifiable-coding-problems-python" \
    --problem_column problem \
    --cleanup
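
Under the hood, decontamination of this kind typically checks for n-gram overlap between training problems and benchmark questions. The toy sketch below shows the general idea; it is illustrative only and does not reproduce the logic of scripts/decontaminate.py:

# ngram_overlap_sketch.py - toy n-gram contamination check (illustrative, not open-r1's script)
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(train_problem: str, benchmark_problems: list[str], n: int = 8) -> bool:
    # flag a training problem if it shares any n-gram with a benchmark problem
    train_grams = ngrams(train_problem, n)
    return any(train_grams & ngrams(bench, n) for bench in benchmark_problems)

train = "Let x be the smallest positive integer such that 2x + 3 = 11."
bench = ["Find the smallest positive integer x such that 2x + 3 = 11."]
print(is_contaminated(train, bench, n=5))  # True: both contain the 5-gram "such that 2x + 3"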

Slurm Integration

Scale training across multiple nodes:

# SFT training on Slurm (single node shown; increase --nodes to scale out)
sbatch --job-name=open_r1 --nodes=1 slurm/train.slurm \
    --model OpenR1-Distill-7B --task sft --config distill --accelerator zero3

# Multi-node GRPO training (1 node for vLLM + N nodes for training)
sbatch --job-name=open_r1 --nodes=2 slurm/train.slurm \
    --model Qwen2.5-1.5B-Instruct --task grpo --config demo --accelerator zero2 --dp 4 --tp 2

Performance Benchmarks

Open-R1 models achieve competitive performance across multiple reasoning benchmarks:

AIME 2024 Results

Model size    Open-R1 Score    DeepSeek Reported
1.5B          30.7             28.9
7B            50.8             55.5
32B           69.7             72.6

MATH-500 Results

Model size    Open-R1 Score    DeepSeek Reported
1.5B          83.1             83.9
7B            94.5             92.8
32B           95.6             94.3

Best Practices and Tips

Hardware Requirements

  • Minimum: Single H100 80GB for 7B models
  • Recommended: 8x H100 80GB for optimal training speed
  • Large Models: Multi-node setup for 32B+ models
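
As a rough back-of-the-envelope sketch (assumed figures, for illustration only) of why 80 GB cards, DeepSpeed ZeRO sharding, and gradient checkpointing matter at 7B scale:

# memory_estimate.py - rough memory estimate for full fine-tuning a 7B model (illustrative assumptions)
params = 7e9
weights_gb = params * 2 / 1e9        # bf16 weights, ~14 GB
grads_gb = weights_gb                # bf16 gradients, ~14 GB
adam_states_gb = params * 4 * 2 / 1e9  # fp32 Adam m and v states, ~56 GB

total_gb = weights_gb + grads_gb + adam_states_gb
print(f"~{total_gb:.0f} GB before activations")  # ZeRO-3 shards these states across GPUs;
                                                 # gradient checkpointing trims activation memory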

Training Tips

  • Use gradient checkpointing to reduce memory usage
  • Enable Liger kernel for improved performance
  • Scale batch size with the number of GPUs to keep the effective (global) batch size consistent (see the sketch after this list)
  • Monitor training with Weights & Biases integration
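
The batch-size tip is about keeping the effective (global) batch size constant as the GPU count changes. A small sketch of the arithmetic, using hypothetical target values:

# effective_batch_size.py - keep the global batch size constant across GPU counts (hypothetical values)
target_global_batch = 128

def grad_accum_steps(num_gpus: int, per_device_batch: int) -> int:
    # effective batch = per_device_batch * num_gpus * gradient_accumulation_steps
    return target_global_batch // (per_device_batch * num_gpus)

print(grad_accum_steps(num_gpus=8, per_device_batch=2))   # 8 accumulation steps
print(grad_accum_steps(num_gpus=4, per_device_batch=2))   # 16 accumulation steps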

Chat Template Considerations

Pay attention to EOS tokens and chat templates:

# For Qwen base models
--eos_token '<|im_end|>'

# For Llama models with custom template
--chat_template "$(cat llama_chat_template.jinja)" \
--eos_token '<|eot_id|>'
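
Before launching a long run, it is worth sanity-checking that the template and EOS token render the way you expect. A minimal sketch using the transformers API; the model name is just an example reused from the commands above:

# check_template.py - render a dummy conversation to inspect the chat template and EOS token
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
messages = [{"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "content": "4"}]

text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)                               # the rendered prompt, including special tokens
print("EOS token:", tokenizer.eos_token)  # should match the --eos_token you pass to training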

Community and Ecosystem

Open-R1 benefits from a vibrant community and ecosystem:

  • 44+ Contributors: Active development from the global AI community
  • Integration Partners: vLLM, SGLang, OpenThoughts, Prime Intellect
  • Regular Updates: Continuous improvements and new features
  • Comprehensive Documentation: Detailed guides and examples

Future Roadmap

The Open-R1 project continues to evolve with exciting developments:

  • Step 2 Completion: Pure RL pipeline implementation
  • Step 3 Implementation: End-to-end base-to-RL training
  • New Benchmarks: Additional evaluation tasks and metrics
  • Performance Optimizations: Faster training and inference
  • Extended Language Support: Beyond Python code execution

Getting Started Today

Ready to dive into reasoning model development? Here's your quickstart checklist:

  1. Clone the repository: git clone https://github.com/huggingface/open-r1.git
  2. Set up environment: Follow the installation guide
  3. Try the examples: Start with SFT on Mixture-of-Thoughts
  4. Evaluate models: Run benchmarks on your trained models
  5. Generate data: Create custom reasoning datasets
  6. Join the community: Contribute to the project on GitHub

Conclusion

Open-R1 represents a watershed moment in AI reasoning model development. By providing a fully open, reproducible framework for training DeepSeek-R1-level reasoning models, Hugging Face has democratized access to cutting-edge AI capabilities. Whether you're a researcher exploring new reasoning techniques, a developer building AI applications, or an organization looking to deploy reasoning models, Open-R1 provides the tools and infrastructure you need.

The project's comprehensive approach – from data generation and training to evaluation and deployment – makes it an invaluable resource for the AI community. With its proven ability to match proprietary model performance and its active development community, Open-R1 is positioned to drive the next wave of innovations in AI reasoning.

Start your journey with Open-R1 today and join the movement to make advanced AI reasoning accessible to everyone.

For more expert insights and tutorials on AI and automation, visit us at decisioncrafters.com.
