
New DeepSeek R1 Reasoning Models Beat OpenAI o1 in Math Benchmarks

DeepSeek sets new standards for open-source AI reasoning with its R1 and R1-Zero models, achieving competitive results across multiple benchmarks.


DeepSeek has launched its latest open-source AI models, DeepSeek-R1 and DeepSeek-R1-Zero, redefining how reasoning capabilities can be achieved through reinforcement learning (RL).

The new models challenge conventional AI development by proving that supervised fine-tuning (SFT) is not essential for cultivating advanced problem-solving capabilities. With benchmark results rivaling proprietary systems like OpenAI’s o1 series, DeepSeek’s models illustrate the growing potential of open-source AI in delivering competitive, high-performance tools.

The success of these models lies in their distinct approaches to reinforcement learning, the introduction of cold-start data, and an effective distillation process. These innovations have produced strong reasoning capabilities in coding, mathematics, and general logic tasks, underscoring the viability of open-source AI as a competitor to leading proprietary models.

Related: DeepSeek AI Open Sources VL2 Series of Vision Language Models

Benchmark Results Highlight Open-Source Potential

DeepSeek-R1’s performance in widely respected benchmarks confirms its capabilities:

In MATH-500, a dataset designed to evaluate mathematical problem-solving, DeepSeek-R1 achieved a Pass@1 score of 97.3%, matching OpenAI’s o1-1217 model. On the AIME 2024 benchmark, based on the American Invitational Mathematics Examination’s competition problems, the model scored 79.8%, slightly outperforming OpenAI’s result.

The model’s performance in LiveCodeBench, a benchmark for coding and logic tasks, was equally noteworthy, with a Pass@1-CoT score of 65.9%. According to DeepSeek’s research, this makes it one of the top performers among open-source models in this category.
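For readers unfamiliar with the metric, Pass@1 is typically estimated by sampling several answers per problem, scoring each, and averaging the per-problem success rates. The short Python sketch below illustrates that calculation; it is an illustration of the metric only, not DeepSeek’s evaluation code.

```python
def pass_at_1(per_problem_results):
    """Estimate Pass@1: for each problem, take the fraction of sampled
    answers that are correct, then average across all problems."""
    per_problem_rates = [
        sum(attempts) / len(attempts) for attempts in per_problem_results
    ]
    return sum(per_problem_rates) / len(per_problem_rates)

# Example: 3 problems, 4 sampled answers each (1 = correct, 0 = incorrect)
results = [
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
print(f"Pass@1 = {pass_at_1(results):.3f}")  # prints 0.583
```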

The company has also invested heavily in distillation, ensuring that smaller versions of DeepSeek-R1 retain much of the larger models’ reasoning capability. Notably, the 32-billion-parameter DeepSeek-R1-Distill-Qwen-32B outperformed OpenAI’s o1-mini in several categories while being more computationally accessible.
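A common recipe for this kind of distillation is to sample reasoning traces from the large model, keep the traces whose final answers check out, and fine-tune the smaller model on them with ordinary supervised training. The sketch below illustrates the data-collection side of that recipe; teacher_generate and is_correct are placeholder callables, and this is a simplified illustration rather than DeepSeek’s published pipeline.

```python
def build_distillation_dataset(teacher_generate, prompts, is_correct, n_samples=4):
    """Collect reasoning traces from a large teacher model to fine-tune a
    smaller student: sample several chains of thought per prompt and keep
    only the ones whose final answer passes a correctness check."""
    dataset = []
    for prompt in prompts:
        for _ in range(n_samples):
            trace = teacher_generate(prompt)   # full chain of thought plus answer
            if is_correct(prompt, trace):      # e.g. a rule-based answer checker
                dataset.append({"prompt": prompt, "completion": trace})
    return dataset

# The resulting (prompt, completion) pairs are then used for standard
# supervised fine-tuning of the smaller model, e.g. a Qwen-32B base model.
```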

Reinforcement Learning Without Supervision: DeepSeek-R1-Zero

DeepSeek-R1-Zero is the company’s bold attempt to explore RL-only training. It employs Group Relative Policy Optimization (GRPO), an algorithm that streamlines RL training by eliminating the need for a separate critic model.

Instead, it uses grouped scores to estimate baselines, significantly reducing computational costs while maintaining training quality. This approach enables the model to develop reasoning behaviors, including chain-of-thought (CoT) reasoning and self-reflection.
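The core of the group-baseline idea can be sketched in a few lines. The Python snippet below is a minimal illustration, assuming a simple rule-based reward that scores each sampled answer for the same prompt; it is not DeepSeek’s implementation.

```python
import numpy as np

def group_relative_advantages(rewards):
    """Estimate advantages for a group of sampled outputs to one prompt.

    Instead of a learned critic, the group's own reward statistics act as
    the baseline: each output's advantage is its reward normalized by the
    group mean and standard deviation."""
    rewards = np.asarray(rewards, dtype=np.float64)
    baseline = rewards.mean()
    scale = rewards.std() + 1e-8  # avoid division by zero for uniform groups
    return (rewards - baseline) / scale

# Example: 4 sampled answers scored 1 (correct) or 0 (incorrect)
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [ 1, -1, -1,  1]
```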

In their research paper, the DeepSeek team stated:
“DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs. However, it struggles with repetition, readability, and language mixing, making it less suitable for real-world use cases.”

While these emergent behaviors were promising, the model’s limitations highlighted the need for refinement. For example, its outputs were occasionally repetitive or displayed mixed-language issues, reducing usability in practical scenarios.

From RL-Only to Hybrid Training: DeepSeek-R1

To address these challenges, DeepSeek developed DeepSeek-R1, combining RL with supervised fine-tuning. The process began with a curated cold-start dataset of long, human-readable CoTs designed to improve baseline coherence and readability. By training on this foundation, the model entered RL with an improved ability to meet human expectations for clarity and relevance.

Related: LLaMA AI Under Fire: What Meta Isn’t Telling You About “Open Source” Models

DeepSeek described this approach in its documentation:
“Unlike R1-Zero, to prevent the early unstable cold start phase of RL training from the base model, for R1 we construct and collect a small amount of long CoT data to fine-tune the model as the initial RL actor.”

The pipeline also included iterative RL to refine reasoning and problem-solving abilities further, producing a model capable of handling complex scenarios such as coding and mathematical proofs.
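At a high level, the pipeline can be summarized as alternating supervised fine-tuning and RL stages. The outline below is a compressed, illustrative Python sketch in which every function name is a placeholder; DeepSeek’s actual pipeline includes additional steps such as rejection sampling and a final alignment stage.

```python
def r1_style_pipeline(base_model, cold_start_data, sft, rl_loop, harvest, rounds=2):
    """Illustrative outline of a cold-start-then-RL training recipe.

    sft(model, data) -- supervised fine-tuning step
    rl_loop(model)   -- reinforcement learning on reasoning tasks
    harvest(model)   -- collect high-quality model outputs as new SFT data
    """
    model = sft(base_model, cold_start_data)  # cold start on curated long CoTs
    for _ in range(rounds):
        model = rl_loop(model)                # reasoning-oriented RL
        model = sft(model, harvest(model))    # refine on the best RL outputs
    return model
```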

Open-Source Accessibility and Future Challenges

DeepSeek has released its models under the MIT License, emphasizing its commitment to open-source principles. This licensing model allows researchers and developers to freely use, modify, and build upon DeepSeek’s work, fostering collaboration and innovation in the AI community.

Despite its successes, the team acknowledges that challenges remain. Mixed-language outputs, prompt sensitivity, and the need for better software engineering capabilities are areas for improvement. Future iterations of DeepSeek-R1 will aim to address these limitations while expanding its functionality to new domains.

The researchers have expressed optimism about their progress, stating:
“By carefully designing the pattern for cold-start data with human priors, we observe better performance against DeepSeek-R1-Zero. We believe the iterative training is a better way for reasoning models.”

Implications for the AI Industry

DeepSeek’s work signals a shift in the AI research landscape, where open-source models can now compete with proprietary leaders. By proving that RL can achieve high-level reasoning without SFT and emphasizing distillation to scale accessibility, DeepSeek has set a benchmark for future AI research.

As open-source AI continues to evolve, DeepSeek-R1’s advancements provide a blueprint for leveraging RL to produce practical, high-performing models.

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master's degree in International Economics and is the founder and managing editor of Winbuzzer.com.
