Alibaba has released Qwen3, a new family of large language models aiming to compete with top AI offerings from OpenAI and Google through distinctive features like “hybrid thinking” modes and broad multilingual support. In a move consistent with its previous open-source releases like Wan 2.1, the Chinese tech giant made most models in the Qwen3 series available under an Apache 2.0 license via platforms including Hugging Face, GitHub, Alibaba’s ModelScope, and Kaggle.
Hybrid Thinking and Multilingual Capabilities
Central to Qwen3 is its dual operational approach. A default “Thinking Mode” allows the models to perform step-by-step reasoning for complex tasks like math or coding, often outputting these intermediate steps within `<think>...</think>` tags before the final response.
Conversely, a “Non-Thinking Mode” provides faster, direct answers for simpler interactions. Developers can toggle this behavior using an `enable_thinking` parameter or the `/think` and `/no_think` tags within prompts. The Qwen team’s announcement highlighted this flexibility: “This design enables users to configure task-specific budgets with greater ease, achieving a more optimal balance between cost efficiency and inference quality.”
Best practices documentation on the Hugging Face model card for Qwen3-0.6B-FP8 advises distinct sampling parameters for each mode and warns against greedy decoding in thinking mode.
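Continuing the sketch above, that guidance maps onto the sampling arguments passed at generation time. The specific values here (temperature 0.6 and top-p 0.95 for thinking mode; temperature 0.7 and top-p 0.8 for non-thinking mode) reflect the model card’s recommendations at the time of writing and should be checked against the current documentation:

```python
# Thinking mode: sample rather than decode greedily, per the model card's warning.
thinking_output = model.generate(
    **inputs,
    do_sample=True,      # greedy decoding (do_sample=False) is discouraged in thinking mode
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    max_new_tokens=1024,
)

# Non-thinking mode: slightly different settings are suggested; pair this with a
# prompt built using enable_thinking=False in apply_chat_template.
direct_output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    max_new_tokens=512,
)
```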
Qwen3 models also support 119 languages and dialects, aiming for robust multilingual instruction following. Context length varies by model: smaller variants like the 0.6B have a native 32K-token window, while larger models can reportedly reach roughly 131K tokens (often written as 128K) through techniques like YaRN scaling.
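As a rough sketch of the long-context path in practice, the snippet below overrides the RoPE scaling configuration when loading a checkpoint with `transformers`. The keys follow the YaRN convention Qwen documents for its models, but the checkpoint name and the scaling factor are assumptions to verify against the model card:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen3-8B"  # placeholder checkpoint
config = AutoConfig.from_pretrained(model_name)

# YaRN stretches the native 32K RoPE window; a factor of 4.0 targets ~131K tokens.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained(
    model_name, config=config, torch_dtype="auto", device_map="auto"
)
```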
Performance Claims and Model Variants
The Qwen3 series includes several open-weight models, such as dense versions from 0.6B to 32B parameters, and two Mixture-of-Experts (MoE) models: Qwen3-30B-A3B and the flagship Qwen3-235B-A22B (which isn’t yet downloadable). These MoE models use 128 total experts but only activate 8 per token (around 3B active parameters for the 30B model, 22B for the 235B variant), a technique designed for computational efficiency, possibly spurred by US sanctions limiting access to high-performance chips.
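To make the efficiency argument concrete, here is a generic top-k routing sketch (not Alibaba’s actual implementation): a router scores all 128 experts for each token, but only the 8 highest-scoring experts are evaluated, so only a small slice of the total parameter count is active per token.

```python
import torch
import torch.nn.functional as F

num_experts, top_k, hidden = 128, 8, 4096  # illustrative sizes matching the article
router = torch.nn.Linear(hidden, num_experts, bias=False)

def route(tokens: torch.Tensor):
    """tokens: (batch, hidden) -> per-token expert indices and mixing weights."""
    probs = F.softmax(router(tokens), dim=-1)            # score every expert
    weights, expert_idx = torch.topk(probs, top_k, -1)   # keep only the top 8
    weights = weights / weights.sum(-1, keepdim=True)    # renormalize over the chosen experts
    return expert_idx, weights  # downstream, only these experts' FFNs run for this token
```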
Alibaba positions Qwen3’s performance aggressively. The flagship 235B model is claimed to rival models like OpenAI’s o3-mini and Google’s Gemini 2.5 Pro on specific coding and math benchmarks.

The Qwen team states its open Qwen3-30B-A3B model outperforms its previous QwQ-32B model, and that the small Qwen3-4B can rival the much larger Qwen2.5-72B-Instruct. The publicly available Qwen3-32B is also claimed to surpass OpenAI’s o1 model on coding tests like LiveCodeBench. These claims follow earlier reports in which Alibaba benchmarked its Qwen 2.5-Max model favorably against DeepSeek V3.

Training, Architecture, and Usage
The models were pre-trained on a dataset reported to be around 36 trillion tokens, incorporating web text, code, text extracted from PDFs (using Qwen2.5-VL), and synthetic data generated via earlier Qwen models specialized in math and code. The post-training process involved four stages, including reinforcement learning and specific steps to fuse the thinking and non-thinking capabilities. For agentic tasks, Qwen3 supports the Model Context Protocol (MCP), with Alibaba recommending its Qwen-Agent framework.
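As a rough sketch of that agentic path, the snippet below follows the pattern shown in Qwen-Agent’s documentation for wiring an MCP server into an assistant. The model name, endpoint, and MCP server entry are placeholders, and the exact configuration keys may differ across Qwen-Agent versions:

```python
from qwen_agent.agents import Assistant

# Placeholder LLM endpoint (e.g. a locally served Qwen3 model behind an OpenAI-compatible API).
llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# An MCP server exposed as a tool, alongside Qwen-Agent's built-in code interpreter.
tools = [
    {"mcpServers": {"time": {"command": "uvx", "args": ["mcp-server-time"]}}},
    "code_interpreter",
]

bot = Assistant(llm=llm_cfg, function_list=tools)
messages = [{"role": "user", "content": "What time is it in Hangzhou right now?"}]

# bot.run streams intermediate responses; keep the last batch and print the final reply.
for responses in bot.run(messages=messages):
    pass
print(responses[-1]["content"])
```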
Developers can use Qwen3 via standard Hugging Face `transformers` (latest version advised), deployment frameworks like SGLang and vLLM, or local tools such as Ollama and LMStudio. An FP8-quantized 0.6B model is offered for efficiency, though potential adjustments might be needed for certain frameworks like vLLM. Alibaba also clarified its new naming scheme, removing “-Instruct” from post-trained models and adding “-Base” to base models.
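For local experimentation, vLLM’s offline Python API is one of the simpler routes. The sketch below assumes the FP8 checkpoint is published under the repository name shown and that your vLLM build supports FP8 weights; both should be verified before use:

```python
from vllm import LLM, SamplingParams

# Assumed repo name for the FP8-quantized 0.6B checkpoint; adjust if the listing differs.
llm = LLM(model="Qwen/Qwen3-0.6B-FP8")

params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)
outputs = llm.chat(
    [{"role": "user", "content": "Summarize what FP8 quantization trades off."}],
    params,
)
print(outputs[0].outputs[0].text)
```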
Qwen3 enters a dynamic AI landscape. Alibaba claims the Qwen family constitutes the world’s largest open-source AI ecosystem by derivative models, citing over 100,000 on Hugging Face. Qwen3 is already integrated into Alibaba’s Quark AI assistant, which led Chinese chatbots in monthly active users in March 2025. The release follows Alibaba’s earlier Qwen 2.5 (January 2025) and QwQ models (Feb/March 2025).
China’s Crowded AI Arena
Qwen3 emerges into a fiercely competitive domestic AI market. DeepSeek AI made significant waves with its efficient DeepSeek V3 (Dec 2024) and the potent DeepSeek R1 reasoning model (Jan 2025). However, DeepSeek has since faced considerable international scrutiny, including data privacy investigations in Italy, an internal review by Microsoft and OpenAI over alleged improper data access, and a critical report from the US House Select Committee on the CCP (April 16, 2025) labeling it a national security risk and alleging espionage and IP theft.
Scale AI CEO Alexandr Wang also claimed in late January that “DeepSeek has about 50,000 Nvidia H100 GPUs. They can’t talk about it because it violates U.S. export controls… The reality is that they stockpiled before the full sanctions took effect…” DeepSeek officially maintains it used compliant H800 GPUs. More recently, DeepSeek has shifted toward open-sourcing infrastructure like the 3FS file system and research like Self-Principled Critique Tuning (SPCT), while other players build on its open releases to create modified versions such as the recently released DeepSeek-R1T-Chimera, which merges R1 and V3 components.
Other major players are also pushing hard. Baidu recently escalated the price war with its ERNIE Turbo models (April 25, 2025), offering significant cost reductions after launching the capable ERNIE 4.5 and X1 models in March and making its ERNIE Bot free in February.
Tencent launched its Hunyuan Turbo S (Feb 2025) focused on speed and the reasoning-centric Hunyuan T1 (March 2025), while also confirming its use of DeepSeek models for efficiency. Meanwhile, Zhipu AI, backed partly by Alibaba, released its free AutoGLM agent (March 2025) and is pursuing an IPO. Alibaba itself integrated earlier Qwen models into its Quark AI assistant.