
How DeepSeek R1 Surpasses ChatGPT o1 Under Sanctions, Redefining AI Efficiency Using Only 2,048 GPUs

DeepSeek’s R1 model matches OpenAI’s o1 by overcoming hardware limitations, drawing admiration from prominent figures like Yann LeCun and Mark Zuckerberg.


DeepSeek’s new reasoning model, R1, challenges the performance of OpenAI’s ChatGPT o1—even though it relies on throttled GPUs and a comparatively small budget.

In an environment shaped by U.S. export controls restricting advanced chips, the Chinese artificial intelligence startup, founded by hedge fund manager Liang Wenfeng, has shown how efficiency and resource sharing can propel AI development forward.

The company’s rise has captured the attention of technology circles in both China and the United States. DeepSeek R1 delivers cutting-edge performance while at the same time being censored in line with CCP rules.

Related: Meta Employees Say Their AI Team Is in “Panic Mode” After DeepSeek R1 Model Release

DeepSeek’s Rapid Rise

DeepSeek’s journey began in 2021, when Liang, best known for his quant trading fund High-Flyer, began purchasing thousands of Nvidia GPUs.

At the time, this move seemed unusual. As one of Liang’s business partners told the Financial Times, “When we first met him, he was this very nerdy guy with a terrible hairstyle talking about building a 10,000-chip cluster to train his own models. We didn’t take him seriously.”

According to the same source, “He couldn’t articulate his vision other than saying: I want to build this, and it will be a game changer. We thought this was only possible from giants like ByteDance and Alibaba.”

Despite the initial skepticism, Liang remained focused on preparing for potential U.S. export controls. This foresight enabled DeepSeek to secure a large supply of Nvidia hardware, including A100 and H800 GPUs, before sweeping restrictions took effect.

Related: Why U.S. Sanctions May Struggle to Curb China’s Tech Growth

DeepSeek made headlines by revealing that it had trained its 671-billion-parameter model R1 for only $5.6 million using 2,048 Nvidia H800 GPUs.

Though the H800’s performance is deliberately capped for the Chinese market under U.S. export restrictions, DeepSeek’s engineers optimized the training procedure to achieve high-level results at a fraction of the cost typically associated with large-scale language models.

In an interview published by MIT Technology Review, Zihan Wang, a former DeepSeek researcher, described how the team managed to reduce memory usage and computational overhead while preserving accuracy.

He said that technical limitations pushed them to explore novel engineering strategies, ultimately helping them remain competitive against better-funded U.S. tech labs.

Related: China’s DeepSeek R1 Reasoning Model and OpenAI o1 Contender is Heavily Censored

Exceptional Results on Math and Coding Benchmarks

R1 demonstrates excellent capabilities across various math and coding benchmarks. DeepSeek revealed that R1 scored 97.3% (Pass@1) on MATH-500 and 79.8% on AIME 2024.

These numbers rival OpenAI’s o1 series, showcasing how deliberate optimization can challenge models trained on more powerful chips.
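For readers unfamiliar with the metric, Pass@1 simply measures the share of benchmark problems a model solves on its first sampled attempt. The snippet below is a minimal, hypothetical illustration of that calculation (the general pass@k estimator used in research papers is more involved); it is not DeepSeek’s evaluation code.

```python
def pass_at_1(results):
    """Illustrative only: Pass@1 is the fraction of problems solved on the
    first sampled attempt. `results` holds one boolean per problem."""
    return sum(results) / len(results)

# Hypothetical example: five problems, four solved on the first try.
print(pass_at_1([True, True, False, True, True]))  # -> 0.8
```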

Dimitris Papailiopoulos, a principal researcher at Microsoft’s AI Frontiers lab, told MIT Technology Review, “DeepSeek aimed for accurate answers rather than detailing every logical step, significantly reducing computing time while maintaining a high level of effectiveness.”

Beyond the main model, DeepSeek has released smaller versions of R1 that can run on consumer-grade hardware. Aravind Srinivas, CEO of Perplexity, tweeted in reference to the compact variants, “DeepSeek has largely replicated o1-mini and has open sourced it.”

Chain-of-Thought Reasoning and R1-Zero

In addition to R1’s standard training, DeepSeek ventured into pure reinforcement learning with a variant called R1-Zero. This approach, detailed in the company’s research documentation, discards supervised fine-tuning in favor of Group Relative Policy Optimization (GRPO).

By removing a separate critic model and relying on grouped baseline scores, R1-Zero displayed chain-of-thought reasoning and self-reflection behaviors. However, the team acknowledged that R1-Zero produced repetitive or mixed-language outputs, indicating a need for partial supervision before it could be used in everyday applications.
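The grouped-baseline idea can be sketched in a few lines. The snippet below is a simplified, illustrative take on the core mechanism, not DeepSeek’s implementation: several completions are sampled for the same prompt, each is scored by a reward function, and each score is normalized against its own group’s mean and standard deviation instead of a learned critic’s value estimate.

```python
import statistics

def grpo_advantages(group_rewards):
    """Illustrative only: derive per-sample advantages from a group of
    sampled completions, using the group mean as the baseline (the core
    idea behind Group Relative Policy Optimization)."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard against zero spread
    # Each completion is scored relative to its own group.
    return [(r - mean) / std for r in group_rewards]

# Hypothetical example: four sampled answers to one math prompt, scored by a
# rule-based reward (1.0 if the final answer is correct, 0.0 otherwise).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # -> [1.0, -1.0, -1.0, 1.0]
```

Because the baseline comes from the sampled group itself, no separate critic network has to be trained or stored, which is part of how the approach trims memory and compute overhead.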

The open-source ethos behind DeepSeek sets it apart from many proprietary labs. While U.S. companies such as OpenAI, Meta, and Google DeepMind often keep their training methods hidden, DeepSeek makes its code, model weights, and training recipes publicly available.

Related: Mistral AI Debuts Pixtral 12B for Text and Image Processing

According to Liang, this approach stems from a desire to build a research culture that favors transparency and collective progress. In an interview with the Chinese media outlet 36Kr, he explained that many Chinese AI ventures struggle with efficiency compared to their Western peers, and that bridging that gap requires collaboration on both hardware and training strategies.

His point of view aligns with others in China’s AI scene, where open-source releases are on the rise. Alibaba Cloud has introduced over 100 open-source models, and 01.AI, founded by Kai-Fu Lee, recently partnered with Alibaba Cloud to establish an industrial AI laboratory.

The global tech community has responded with a mix of awe and caution. On X, Marc Andreessen, co-inventor of the Mosaic web browser and now a leading investor at Andreessen Horowitz, wrote, “Deepseek R1 is one of the most amazing and impressive breakthroughs I’ve ever seen — and as open source, a profound gift to the world.”

Yann LeCun, Chief AI Scientist at Meta, noted on LinkedIn that while DeepSeek’s achievement might appear to indicate China surpassing the United States, it would be more accurate to say that open-source models collectively are catching up to proprietary alternatives.

“DeepSeek has profited from open research and open source (e.g. PyTorch and Llama from Meta),” he explained. “They came up with new ideas and built them on top of other people’s work. Because their work is published and open source, everyone can profit from it. That is the power of open research and open source.”


Mark Zuckerberg, Meta’s founder and CEO, hinted at a different path forward for AI in 2025 by announcing massive investments in data centers and GPU infrastructure.

On Facebook, he wrote, “This will be a defining year for AI. In 2025, I expect Meta AI will be the leading assistant serving more than 1 billion people, Llama 4 will become the leading state of the art model, and we’ll build an AI engineer that will start contributing increasing amounts of code to our R&D efforts. To power this, Meta is building a 2GW+ datacenter that is so large it would cover a significant part of Manhattan.

We’ll bring online ~1GW of compute in ’25 and we’ll end the year with more than 1.3 million GPUs. We’re planning to invest $60-65B in capex this year while also growing our AI teams significantly, and we have the capital to continue investing in the years ahead. This is a massive effort, and over the coming years it will drive our core products and business, unlock historic innovation, and extend American technology leadership. Let’s go build!”

Zuckerberg’s remarks suggest that resource-intensive strategies remain a major force in shaping the AI sector.

Related: LLaMA AI Under Fire – What Meta Isn’t Telling You About “Open Source” Models

Broadening Impact and Future Prospects

For DeepSeek, the combination of local talent, early GPU stockpiling, and an emphasis on open-source methods has propelled it into a spotlight typically reserved for large tech giants. In July 2024, Liang stated that his team aimed to address what he called an efficiency gap in Chinese AI.

He described many local AI companies as requiring double the compute power to match overseas results, a gap that compounds further once data efficiency is factored in. The hedge fund profits from High-Flyer give DeepSeek a buffer against immediate commercial pressures, allowing Liang and his engineers to concentrate on research priorities. Liang said:

“We estimate that the best domestic and foreign models may have a gap of one-fold in model structure and training dynamics. For this reason alone, we need to consume twice as much computing power to achieve the same effect.

In addition, there may also be a gap of one-fold in data efficiency, that is, we need to consume twice as much training data and computing power to achieve the same effect. Together, we need to consume four times more computing power. What we need to do is to continuously narrow these gaps.”
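In plainer terms, a one-fold (100%) gap in model structure and training dynamics doubles the compute needed, and a further one-fold gap in data efficiency doubles it again, which is how Liang arrives at the roughly 2 × 2 = 4× figure.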

DeepSeek’s reputation in China also received a boost when Liang became the only AI leader invited to a high-profile meeting with Li Qiang, the country’s second-most powerful official, where he was urged to focus on building core technologies.

Analysts see this as one more signal that Beijing is betting heavily on smaller homegrown innovators to push AI boundaries under hardware restrictions.

While the future remains uncertain—especially as U.S. restrictions may tighten further—DeepSeek stands out for tackling challenges in ways that transform constraints into avenues for rapid problem-solving.

By publicizing its breakthroughs and offering smaller-scale training techniques, the startup has motivated broader discussions about whether resource efficiency can seriously rival massive supercomputing clusters.

As DeepSeek continues refining R1, engineers and policymakers on both sides of the Pacific are watching closely to see if this model’s achievements can pave a sustainable route for AI progress in an era of evolving restrictions.

Last Updated on January 30, 2025 8:57 pm CET

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master’s degree in International Economics and is the founder and managing editor of Winbuzzer.com.
