Baidu is stepping up its efforts to compete with the likes of OpenAI, Google, Anthropic, xAI, and DeepSeek with the release of its ERNIE 4.5 and ERNIE X1 models.
Baidu is implementing a competitive pricing model, providing enterprise access to ERNIE 4.5 at a rate of only RMB 0.004 per thousand tokens for input and RMB 0.016 per thousand tokens for output. The company asserts that these prices are just 1% of the cost of OpenAI’s GPT-4.5 model.
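To make the quoted rates concrete, here is a minimal sketch of the arithmetic, assuming a simple linear per-token bill with no volume discounts or minimum charges. The function names are hypothetical and are not part of any Baidu API. The second helper only inverts the company's "1%" claim to show what the same workload would cost at the implied comparison price. It is not based on an actual OpenAI price list.

```python
from decimal import Decimal

# Enterprise prices for ERNIE 4.5 as quoted in the article (RMB per 1,000 tokens).
INPUT_RMB_PER_1K = Decimal("0.004")
OUTPUT_RMB_PER_1K = Decimal("0.016")


def ernie_cost_rmb(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the RMB cost of a workload at the quoted per-1K-token rates.

    Assumes strictly linear billing; real invoices may differ.
    """
    thousand = Decimal(1000)
    input_cost = Decimal(input_tokens) / thousand * INPUT_RMB_PER_1K
    output_cost = Decimal(output_tokens) / thousand * OUTPUT_RMB_PER_1K
    return input_cost + output_cost


def implied_reference_cost_rmb(cost_rmb: Decimal,
                               ratio: Decimal = Decimal("0.01")) -> Decimal:
    """Cost of the same workload on a model for which `cost_rmb` is `ratio` of the price.

    The default ratio of 0.01 mirrors Baidu's "1%" claim and is an assumption.
    """
    return cost_rmb / ratio


if __name__ == "__main__":
    # Hypothetical workload: one million input tokens and one million output tokens.
    cost = ernie_cost_rmb(input_tokens=1_000_000, output_tokens=1_000_000)
    print("ERNIE 4.5 cost (RMB):", cost)
    print("Implied comparison cost (RMB):", implied_reference_cost_rmb(cost))
```

Under these assumptions, a workload of one million input and one million output tokens comes to RMB 20 at ERNIE 4.5's rates, which would imply roughly RMB 2,000 for the same workload if the 1% figure holds across both input and output.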
Baidu’s introduction of ERNIE 4.5 and ERNIE X1 showcases important breakthroughs in both multimodal understanding and advanced reasoning capabilities. ERNIE 4.5 is capable of processing and synthesizing text, images, audio, and video, whereas ERNIE X1 enhances reasoning power and the ability to interact with external tools.
The launch positions Baidu to make a major impact in the AI market, not just in China but also internationally, where US-based Anthropic and OpenAI currently lead.
We've just unveiled ERNIE 4.5 & X1! 🚀 As a deep-thinking reasoning model with multimodal capabilities, ERNIE X1 delivers performance on par with DeepSeek R1 at only half the price. Meanwhile, ERNIE 4.5 is our latest foundation model and new-generation native multimodal model.… pic.twitter.com/cLKVHYvbzw
— Baidu Inc. (@Baidu_Inc) March 16, 2025
ERNIE 4.5: Leading the Charge in Multimodal AI
At the core of Baidu’s strategy is ERNIE 4.5, a multimodal AI model capable of processing both text and images, which puts it in direct competition with models like OpenAI’s GPT-4o and GPT-4.5.
When it comes to text-based tasks, ERNIE 4.5 competes directly with OpenAI’s GPT-4.5. ERNIE 4.5 scored 79.6% on text-based benchmarks, surpassing GPT-4o, GPT-4.5, and DeepSeek’s V3 model on several of them.
ERNIE 4.5 performed particularly well in Chinese-language tasks like Chinese MMLU and Chinese SimpleQA, areas where OpenAI’s GPT-4 and GPT-4.5 tend to fall short. This strong showing in tasks relevant to the Chinese market gives ERNIE 4.5 an edge in regions where language and cultural nuances matter.

While GPT-4.5 outperforms ERNIE 4.5 in more intricate problem-solving tasks like C-Eval and BBH, ERNIE 4.5 is a serious contender for text-based tasks, especially those in the Chinese-language domain.
In multimodal benchmarks—where models are tested on their ability to handle both text and image data—ERNIE 4.5 excels. The model achieved a robust 77.77% in multimodal tests, outperforming GPT-4o, which scored 73.92% in the same set of tests.
This demonstrates that ERNIE 4.5 is particularly adept at cross-modal tasks, such as image captioning and multimodal problem-solving—areas where traditional models have historically struggled.

The strong multimodal performance of ERNIE 4.5 can be attributed to its architecture, which Baidu says effectively integrates textual and visual data. For example, in tasks like visual question answering, ERNIE 4.5 showcases its ability to handle both modalities seamlessly, offering significant advancements over GPT-4o.
As with any multimodal AI system, the computational cost associated with ERNIE 4.5’s capabilities could pose a challenge for companies seeking to deploy it on a large scale.
Multimodal systems require significant processing power and energy resources, which could limit their applicability in environments with stringent performance or power constraints.
ERNIE X1: Pushing the Boundaries of Deep Reasoning
While ERNIE 4.5 is designed for multimodal tasks, Baidu’s ERNIE X1 model is focused on deep reasoning and is aimed at solving complex, multi-step problems.
Baidu claims that ERNIE X1 delivers performance comparable to DeepSeek R1 at half the price, which would position the company as a technological leader in the AI space. The ability to achieve this while offering dramatically lower pricing suggests either significant efficiency advantages or a strategic willingness to operate at lower margins.
ERNIE X1 is optimized for multi-step reasoning, making it ideal for applications in industries such as finance, law, and healthcare, where complex decision-making and deep logical analysis are crucial.
This would position ERNIE X1 as a direct competitor to o3-mini, Claude 3.7 Sonnet, and xAI’s Grok-3, which currently dominate the reasoning AI space. However, Baidu has not yet published comparable benchmark results for X1.
In addition, the power demands associated with such advanced reasoning can drive up operational costs, particularly in real-time environments where latency is a critical concern.
Baidu vs. OpenAI, Anthropic, xAI and DeepSeek
Baidu enters a rapidly evolving and highly competitive AI landscape, where industry leaders like OpenAI and Anthropic dominate the global scene. OpenAI’s GPT-4o has established itself as the benchmark for general-purpose reasoning and multimodal capabilities, setting a high bar in tasks requiring both advanced language processing and logical inference.
Similarly, Anthropic’s Claude 3.7 Sonnet has emerged as the gold standard for logical reasoning, outperforming many competitors in key problem-solving benchmarks. These two models, along with the performance of other global contenders, have shaped the current AI market, making it difficult for new entrants to gain significant ground.
However, Baidu is not alone in its ambition to capture the AI market. In China, the competition is fierce, with major players such as Tencent, with its Hunyuan Turbo models, and Alibaba, with its Qwen models, rapidly advancing in areas like multimodal AI and deep reasoning.
Tencent’s Hunyuan Turbo-S model, for instance, has demonstrated impressive performance in benchmarks that emphasize reasoning speed and accuracy, challenging both OpenAI and other emerging AI systems.

Similarly, Alibaba’s Qwen models have made significant progress in the multimodal AI space, positioning the company as a serious contender in the Chinese and global markets. With these domestic rivals vying for dominance both at home and abroad, the competitive landscape is even more challenging for Baidu’s ERNIE models.
In this context, Baidu’s ERNIE 4.5 and ERNIE X1 models might carve out their own space by offering strong multimodal and reasoning capabilities. ERNIE 4.5 is particularly competitive in cross-modal tasks, showcasing its ability to handle both text and images effectively. ERNIE X1, on the other hand, focuses on deep reasoning, positioning itself as a formidable option for industries requiring advanced decision-making and logical problem-solving.
Despite their strengths, Baidu’s ERNIE models will still face significant challenges when compared to the industry leaders, particularly in high-stakes areas like logical reasoning, where Claude 3.7 Sonnet and OpenAI o3-mini have set the standard.
In China, DeepSeek is currently readying the successor to its highly impactful R1 reasoning model, dubbed DeepSeek R2, for release in May, and it will most probably set the bar higher again.
Furthermore, while Baidu has made progress with its AI models, it must continue to address key performance gaps to compete on equal footing with its global and domestic competitors, particularly Tencent and Alibaba, which are expanding their AI offerings with high efficiency.
A crucial element of Baidu’s strategy is its decision to offer ERNIE Bot for free, a move that could be a game-changer in the AI race. By making ERNIE Bot available to the public, Baidu not only stands to expand its user base but also gains valuable user data, which will help refine its models over time.
While the move might slow down immediate monetization, Baidu’s long-term vision is clear: By continuously refining its AI models and gathering user feedback, it aims to position its ERNIE series as more competitive alternatives to both OpenAI and Anthropic, as well as to its Chinese competitors like Tencent and Alibaba.
Baidu’s goal is to establish itself as a dominant player not just in China, where it faces considerable local competition, but eventually on the global stage.
This strategy is in line with Baidu’s broader efforts to stay relevant in the fiercely competitive Chinese AI market while simultaneously positioning its models for a broader global audience.
Looking to the future, Baidu’s ERNIE 5 model is expected to further advance both multimodal and reasoning capabilities, with a release scheduled for the latter half of 2025.
ERNIE 5 will likely focus on real-time video processing and enhanced logical inference, which are crucial for high-performance AI systems in industries that require instant data interpretation.