Why Alibaba’s New Free AI Video Models Are a Big Deal

Alibaba has released its AI video model, Wan 2.1, as open-source, challenging OpenAI’s Sora and Google’s Veo 2, which remain paywalled.

Alibaba has made its AI-powered video and image generation model series, Wan 2.1, freely available as open-source software, positioning itself against proprietary models like OpenAI’s Sora and Google’s Veo 2.

The move signals a major shift in the AI video market, where most high-end models remain locked behind paywalls. While OpenAI and Google have tightly controlled access to their models, Alibaba is betting on broader adoption through open access.

Alibaba’s open-source release of Wan 2.1 is part of a larger push by the company to expand its AI offerings. In December 2024, the company slashed the price of its Qwen-VL models by 85%, making its AI more accessible.

The following month, Alibaba launched Qwen 2.5, a multimodal AI model with a 1-million-token context length, and soon after, it unveiled Qwen 2.5-Max, which uses Mixture-of-Experts (MoE) architecture to optimize processing power.

Now, with Wan 2.1 available for free, Alibaba is increasing competitive pressure on AI firms that continue to commercialize their video models.

Video scene examples created with Wan 2.1 (Source: Alibaba)

Technical Details of Wan 2.1 Series

Wan 2.1 is an open-source AI video generation model series designed for efficiency, scalability, and accessibility. Based on its computational performance and benchmarking results, it stands out as a cost-effective alternative to proprietary AI video tools while maintaining competitive quality.

The Wan 2.1 series includes four AI video generation models optimized for different tasks and computational needs. The T2V-1.3B model is a lightweight text-to-video variant designed for 480P resolution and can run efficiently on consumer GPUs like the RTX 4090.

The T2V-14B and I2V-14B models offer higher quality 720P video generation, requiring enterprise-grade GPUs such as A100, H100, and H800 for optimal performance. Additionally, Wan 2.1 supports image-to-video (I2V), video-to-audio (V2A), and text-to-image (T2I) tasks, enabling smooth motion synthesis and enhanced resolution scaling.

While smaller models prioritize efficiency, larger versions focus on high-quality AI-generated video with improved motion continuity and scene accuracy, making Wan 2.1 one of the most versatile open-source AI video generation frameworks available.
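The lineup above can be summarized in a few lines of code. The sketch below is purely illustrative: it records the three variants named in this article (Alibaba counts four in total, but the fourth is not itemized here) together with the single-GPU peak memory figures Alibaba reported, and checks which variants fit a given card. The VRAM numbers are reported peaks, not hard minimums.

```python
# Illustrative summary of the Wan 2.1 variants described in this article.
# VRAM figures are the single-GPU peaks Alibaba reported, not hard minimums.
WAN_MODELS = {
    "T2V-1.3B": {"task": "text-to-video",  "resolution": "480P", "vram_gb": 8.19},
    "T2V-14B":  {"task": "text-to-video",  "resolution": "720P", "vram_gb": 69.1},
    "I2V-14B":  {"task": "image-to-video", "resolution": "720P", "vram_gb": 76.7},
}

def models_fitting(gpu_vram_gb: float) -> list[str]:
    """Return the variants whose reported single-GPU peak memory fits the card."""
    return [name for name, spec in WAN_MODELS.items()
            if spec["vram_gb"] <= gpu_vram_gb]

print(models_fitting(24.0))   # RTX 4090-class card -> ['T2V-1.3B']
print(models_fitting(80.0))   # A100/H100-class accelerator -> all three
```

This makes the accessibility argument concrete: only the 1.3B variant fits a 24GB consumer card on a single GPU, while the 14B variants call for 80GB-class accelerators (or multi-GPU setups, as the scaling figures below show).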

The Wan 2.1 models demonstrate strong scalability across different GPUs, making the series accessible to users with both consumer-grade and high-end enterprise hardware. Performance varies depending on the model, resolution, and number of GPUs used.

NVIDIA’s RTX 4090 can handle the T2V-1.3B model at 480P resolution in 261.4 seconds on a single GPU, using 8.19GB of VRAM. When scaled to eight GPUs, performance improves to 112.3 seconds, with memory usage increasing to 12.2GB. The more advanced T2V-14B model at 720P resolution runs on H800/H100 GPUs, with processing time decreasing from 1837.9 seconds on one GPU to 287.9 seconds on eight GPUs, while memory consumption drops from 69.1GB to 29.9GB.

Alibaba’s H20 hardware is optimized for larger models like I2V-14B, requiring 5494.8 seconds and 76.7GB of memory on a single GPU. However, when using eight GPUs, processing time reduces to 778.2 seconds, with peak memory dropping to 32.9GB. The scalability of Wan 2.1 makes it an attractive option for users without access to expensive AI accelerators like those used by OpenAI’s Sora or Google’s Veo.
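The scaling behavior in these figures is easy to quantify. A short calculation, using only the timings quoted above, shows how close each configuration comes to an ideal eight-fold speedup:

```python
def speedup(t1: float, tn: float) -> float:
    """Multi-GPU speedup: single-GPU time divided by n-GPU time."""
    return t1 / tn

def efficiency(t1: float, tn: float, n: int) -> float:
    """Parallel efficiency: achieved speedup as a fraction of ideal n-fold speedup."""
    return speedup(t1, tn) / n

# Timings reported by Alibaba (seconds; 1 GPU vs. 8 GPUs)
runs = {
    "T2V-1.3B @ 480P (RTX 4090)": (261.4, 112.3),
    "T2V-14B @ 720P (H800/H100)": (1837.9, 287.9),
    "I2V-14B (H20)":              (5494.8, 778.2),
}

for name, (t1, t8) in runs.items():
    print(f"{name}: {speedup(t1, t8):.1f}x speedup, "
          f"{efficiency(t1, t8, 8):.0%} parallel efficiency")
```

The large models scale well (roughly 80–88% parallel efficiency at eight GPUs), while the lightweight 1.3B model gains comparatively little from extra hardware, reinforcing its role as the single-consumer-GPU option.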

Wan 2.1 stands out due to its open-source accessibility, making it a unique offering in the AI video generation space. Unlike Sora and Veo 2, which remain proprietary and require enterprise-level infrastructure, Wan 2.1 is available under the Apache 2.0 license, enabling developers and researchers to integrate it into custom AI applications without restrictions.

The model supports multiple AI tasks, including text-to-video (T2V), image-to-video (I2V), video editing, video-to-audio (V2A), and text-to-image (T2I). Its low VRAM requirement allows it to run efficiently on consumer GPUs like the RTX 4090, making it far more accessible than competing models, which demand expensive AI accelerators.
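For a sense of what running the lightweight variant locally looks like, the following sketch follows the command-line interface published in the Wan 2.1 repository; the script name, flag names, and low-VRAM options are taken from its README at release time and should be treated as assumptions that may change.

```shell
# Hypothetical single-GPU run of the 1.3B text-to-video model, per the
# Wan 2.1 repository README. --offload_model and --t5_cpu trade speed
# for lower peak VRAM so the run fits a consumer card like an RTX 4090.
python generate.py \
  --task t2v-1.3B \
  --size 832*480 \
  --ckpt_dir ./Wan2.1-T2V-1.3B \
  --offload_model True \
  --t5_cpu \
  --prompt "A cat walking through a neon-lit alley at night"
```

The offloading flags illustrate the design choice behind the 1.3B model: acceptable generation time on consumer hardware is bought by moving parts of the pipeline (such as the text encoder) off the GPU.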

Another key advantage is its high temporal precision, achieved through its Video VAE encoder-decoder system, ensuring consistent video coherence at 1080P resolution. Furthermore, Wan 2.1 is optimized for both English and Chinese, making it accessible for global users.

These technical strengths position Wan 2.1 as an affordable, scalable, and high-performance alternative in AI video generation, providing developers with greater flexibility compared to proprietary solutions from OpenAI, Google, and Meta.

How Wan 2.1 Performs Against OpenAI’s Sora

Alibaba has shared the following benchmark results based on WAN-Bench, a framework designed to evaluate the performance and quality of AI-generated video models, specifically those in the Wan 2.1 series. It provides a structured and standardized assessment across multiple dimensions of video generation, allowing direct comparison with state-of-the-art models like OpenAI’s Sora, Mochi, CogVideoX, and CNTopA variants. WAN-Bench measures different aspects of AI video generation based on objective and subjective criteria.

Source: Alibaba

How Alibaba’s Move Reshapes the AI Video Market

The AI video sector has become one of the most competitive areas of artificial intelligence, with companies racing to deliver more advanced and accessible tools. OpenAI’s Sora made headlines for its ability to generate detailed video content from text, but the model remains behind a paywall.

Google, through Veo 2, has introduced 4K high-definition AI video generation and positions it as the most capable video generation model currently available. YouTube has already integrated Veo 2 into its popular Shorts platform.

Amazon has taken a different route with Nova AI, which integrates text, image, and video generation in a cost-optimized way for businesses. Unlike Alibaba, Amazon still monetizes access, but the increasing availability of free AI models could force adjustments in pricing strategies across the industry.

Runway’s Gen-3 Alpha Turbo API is another example of how AI video models are evolving beyond just a few key players. By offering faster processing and more accessible tools, Runway has attracted independent creators and production companies. If open-source alternatives like Wan 2.1 can match this level of quality, it could lead to a fundamental change in how AI video technology is distributed.

Regulatory Concerns and Ethical Implications

The release of an open-source AI video model introduces concerns about misuse, particularly in areas like misinformation and deepfake creation. Governments are already moving to regulate AI-generated content, with the European Union enforcing stricter transparency measures and pushing for digital watermarking in AI-generated media.

In response, companies like Google and Meta have implemented tools such as SynthID and Video Seal, ensuring AI-generated content can be tracked even after modifications.

ByteDance has faced scrutiny after recently releasing OmniHuman-1, an AI that can generate highly realistic deepfake-style videos from a single image. The concerns surrounding AI-generated content highlight the importance of security features, yet Alibaba has not announced whether it will integrate similar protections into Wan 2.1.

How Open-Source AI Could Shift the Industry

Alibaba’s move challenges the notion that high-quality AI models must remain proprietary. It follows a trend seen in image generation, where open-source models like Stability AI’s Stable Diffusion 3.5 have disrupted the dominance of closed systems like OpenAI’s DALL·E. If developers embrace Wan 2.1 at scale, it could pressure companies like OpenAI and Google to reconsider their commercial models or risk losing market share in AI video tools.

Alibaba’s decision to open-source Wan 2.1 could influence the direction of the AI industry. By providing unrestricted access, it lowers the barrier for developers and businesses looking to integrate AI video generation into their products. The model’s availability could also force competitors to reconsider their approach, especially as companies weigh the benefits of openness against the risks of losing control over proprietary technology.

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master's degree in International Economics and is the founder and managing editor of Winbuzzer.com.
