Alibaba’s New Qwen 2.5-Max Model Takes on DeepSeek in AI Benchmarks

Alibaba launches Qwen 2.5-Max, a new AI model challenging DeepSeek V3 in key performance benchmarks, with OpenAI API compatibility and Alibaba Cloud access.

Alibaba has introduced Qwen 2.5-Max, a large-scale Mixture-of-Experts (MoE) AI model designed to improve reasoning, problem-solving, and coding efficiency.

The model has been trained on over 20 trillion tokens and incorporates Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to refine its accuracy across various tasks.

Qwen 2.5-Max is available through Alibaba Cloud’s API and is also integrated into Qwen Chat, where developers and researchers can explore its capabilities.

With OpenAI API compatibility, existing GPT-based applications can integrate Qwen 2.5-Max with minimal adjustments. Alibaba’s move to expand its AI portfolio signals a direct challenge to DeepSeek, which has rapidly gained attention with its own high-performing AI models.
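For developers, the switch can be as simple as pointing an existing OpenAI client at Alibaba Cloud's endpoint. The sketch below assumes the OpenAI-compatible base URL and model ID documented for Alibaba's Model Studio; both should be verified against the current documentation before use:

```python
# Minimal sketch: calling Qwen 2.5-Max through Alibaba Cloud's
# OpenAI-compatible mode. The base URL and model ID below follow Alibaba's
# Model Studio docs but should be checked against the current documentation.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",  # issued by Alibaba Cloud Model Studio
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-max-2025-01-25",  # assumed Qwen 2.5-Max snapshot ID
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Mixture-of-Experts in two sentences."},
    ],
)
print(response.choices[0].message.content)
```

Because the request and response shapes match the OpenAI Chat Completions API, most GPT-based applications only need to change the API key, base URL, and model name.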

The Rise of DeepSeek and Alibaba’s Competitive Response

DeepSeek has emerged as a major competitor in China’s AI sector, having launched two significant models in recent months. In December 2024, it introduced DeepSeek V3, a cost-efficient AI model optimized for natural language processing (NLP), multilingual applications, and conversational AI.

The model uses a Mixture-of-Experts (MoE) architecture, which allows it to activate only a subset of its parameters per task, improving efficiency while reducing computational costs.
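The toy sketch below illustrates the general principle of top-k expert routing; it is an illustration of the technique, not DeepSeek's or Alibaba's actual implementation:

```python
# Toy illustration of MoE top-k routing: a gating network scores experts per
# token, and only the top-k experts run, so most parameters stay idle on any
# given forward pass. Sizes are illustrative, not any production model's.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)  # router: token -> expert scores
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.gate(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

moe = ToyMoE()
tokens = torch.randn(4, 64)
print(moe(tokens).shape)  # torch.Size([4, 64]); only 2 of 8 experts ran per token
```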

Last week, DeepSeek released DeepSeek R1, a model designed specifically for reasoning, complex problem-solving, and advanced mathematical tasks. The release caused turmoil in the AI industry and rattled global financial markets, and Microsoft and OpenAI have since started investigating allegations that DeepSeek illegally obtained confidential training data for DeepSeek R1.

Unlike V3, which prioritizes efficiency, R1 employs reinforcement learning (RL) and chain-of-thought (CoT) techniques to break down complex logical challenges into step-by-step solutions.

While DeepSeek’s V3 model is highly cost-effective, with input costs of $0.14 per million tokens and output costs of $0.28 per million tokens, the more advanced R1 model comes with a significantly higher price.

The reasoning-focused model requires $0.55 per million tokens for input and $2.19 per million tokens for output. This difference reflects the additional computational power required for reinforcement learning and advanced reasoning capabilities.
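A back-of-the-envelope calculation using the rates quoted above shows what the gap means for a typical request; the token counts here are illustrative:

```python
# Cost comparison using the published per-million-token rates quoted above.
def request_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Rates are USD per million tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example workload: 2,000 input tokens and 1,000 output tokens per request.
v3 = request_cost(2_000, 1_000, 0.14, 0.28)   # ≈ $0.00056
r1 = request_cost(2_000, 1_000, 0.55, 2.19)   # ≈ $0.00329
print(f"V3: ${v3:.5f}  R1: ${r1:.5f}  ratio: {r1 / v3:.1f}x")  # roughly 5.9x
```

For this mix of input and output, R1 costs roughly six times as much per request as V3, with the output rate accounting for most of the difference.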

Alibaba’s Qwen 2.5-Max enters the competition positioned as a middle path: it pairs MoE efficiency comparable to DeepSeek V3 with stronger reasoning capabilities in the vein of R1, aiming to balance cost and performance.

Benchmark Performance: Qwen 2.5-Max vs. DeepSeek V3 and Other AI Models

Alibaba reports that Qwen 2.5-Max outperforms DeepSeek V3 in multiple AI evaluation tests. The model has achieved high scores on Arena-Hard, which assesses how AI responses align with human preferences, and LiveBench, a widely used benchmark for measuring AI performance across real-world applications.

It has also demonstrated strong results in LiveCodeBench, which evaluates AI-driven coding capabilities, and GPQA-Diamond, a test designed for knowledge-based reasoning.

[Benchmark comparison chart. Source: Alibaba]

The company also states that Qwen 2.5-Max has delivered competitive results in MMLU-Pro, a benchmark that assesses college-level knowledge and reasoning skills.

While proprietary models such as OpenAI’s GPT-4o and Anthropic’s Claude-3.5-Sonnet cannot be benchmarked directly, Alibaba claims that Qwen 2.5-Max performs at a comparable level based on open evaluation tests.

[Benchmark comparison chart against proprietary models. Source: Alibaba]

Alibaba emphasized its confidence in future iterations of the model, stating, “Our base models have demonstrated significant advantages across most benchmarks, and we are optimistic that advancements in post-training techniques will elevate the next version of Qwen 2.5-Max to new heights.”

Technical Advancements in Qwen 2.5-Max

Alibaba’s AI research strategy extends beyond Qwen 2.5-Max and includes models designed for multimodal and long-context applications.

The Qwen2.5-VL model is designed for processing text, images, and videos, expanding Alibaba’s capabilities in multimodal AI. Another model, Qwen2.5-1M, can handle context sizes of up to one million tokens and uses Dual Chunk Attention (DCA) to maintain accuracy when processing extended sequences.
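As a rough illustration of the chunking idea behind DCA, the sketch below remaps relative positions so that distant tokens stay within a range the model saw during training; the constants and logic are simplified for illustration and do not reflect Qwen2.5-1M's actual configuration:

```python
# Simplified sketch of the chunking idea behind Dual Chunk Attention (DCA):
# relative positions between distant tokens are clamped so they never exceed
# the range the model was trained on. Constants are illustrative only.
def dca_relative_position(q_pos, k_pos, chunk_size=512, local_window=256):
    """Return a remapped relative distance between a query and a key token."""
    same_chunk = q_pos // chunk_size == k_pos // chunk_size
    if same_chunk:
        return q_pos - k_pos  # intra-chunk: keep the exact distance
    # inter-chunk: clamp the distance into the trained range so attention to
    # very distant tokens still uses familiar position encodings
    return min(q_pos - k_pos, chunk_size + local_window)

print(dca_relative_position(100, 40))      # 60: same chunk, kept exact
print(dca_relative_position(900_000, 10))  # 768: clamped instead of ~900,000
```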

These advancements reflect Alibaba’s focus on improving multimodal intelligence and long-context comprehension, positioning its models to compete against OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash.

DeepSeek Faces Regulatory Scrutiny

While Alibaba expands its AI portfolio, DeepSeek has come under increasing scrutiny regarding its training methods and access to high-performance computing hardware. Italian regulators have started investigating whether DeepSeek transfers European user data to China, raising concerns about compliance with the General Data Protection Regulation (GDPR).

Microsoft and OpenAI have launched an internal review to determine whether DeepSeek may have improperly accessed OpenAI’s training data during the development of its models. The investigation follows industry concerns that DeepSeek has accelerated its AI advancements in a manner that suggests possible unauthorized access to proprietary data.

Adding to the controversy, Scale AI CEO Alexandr Wang has alleged that DeepSeek obtained 50,000 Nvidia H100 GPUs, despite U.S. trade restrictions designed to limit China’s access to advanced AI chips. In an interview with CNBC, Wang stated, “DeepSeek has about 50,000 Nvidia H100 GPUs. They can’t talk about it because it violates U.S. export controls. The Chinese labs, they have more H100s than people think. The reality is that they stockpiled before the full sanctions took effect, and now they are leveraging them to push their AI forward.”

DeepSeek officially maintains that it trained V3 using 2,048 Nvidia H800 GPUs, a restricted version of the H100 that is compliant with U.S. trade policies. However, industry analysts remain skeptical of how DeepSeek has achieved such high performance at a fraction of the typical cost and computational requirements.

China’s AI Competition and Global Implications

Alibaba’s introduction of Qwen 2.5-Max underscores the growing AI competition in China, where companies are racing to advance their technology while navigating regulatory challenges and geopolitical restrictions.

With U.S. sanctions limiting China’s access to high-performance AI hardware, both Alibaba and DeepSeek are focusing on efficiency, leaning on Mixture-of-Experts architectures and reinforcement learning techniques to get more out of constrained compute.

If allegations concerning DeepSeek’s data practices and potential hardware violations are substantiated, the company could face increased scrutiny and restrictions that impact its access to global markets.

This could shift the balance of power in China’s AI sector, with Alibaba positioning itself as a leading competitor in the wake of regulatory pressure on DeepSeek.

The competition between Alibaba and DeepSeek highlights several major trends shaping the future of AI. Open-source models are increasingly closing the gap with proprietary systems such as GPT-4o and Claude-3.5-Sonnet, making high-performance AI more accessible.

Cost efficiency is becoming a key factor in AI development, with models like DeepSeek V3 and Alibaba’s Qwen series demonstrating how MoE architectures can reduce expenses while maintaining high performance.

Regulatory oversight is expanding as governments intensify their monitoring of AI training data, computing resources, and data privacy concerns. The rapid advancements in AI continue to reshape the industry, and the rivalry between Alibaba’s Qwen models and DeepSeek’s AI systems will likely play a significant role in defining the future of large-scale AI development.

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master's degree in International Economics and is the founder and managing editor of Winbuzzer.com.
