DeepSeek, the Chinese AI startup currently making waves in the tech world and sending stock markets lower globally, has launched its Janus model family, a groundbreaking series of multimodal models designed for image understanding and generation.
The release of the Janus model series follows last week's impactful launch of the DeepSeek R1 reasoning model, which has disrupted the AI industry with its combination of high performance and cost efficiency.
Trained on Nvidia H800 GPUs under U.S. sanctions, R1 matches or exceeds the benchmarks of models like OpenAI’s o1 while costing a fraction to develop. The company’s rise has drawn global attention, with its app now surpassing ChatGPT as the top download on Apple’s U.S. App Store.
DeepSeek’s Janus Pro Model Beats DALL-E 3 and Stable Diffusion XL
Available under the open-source MIT license, Janus models range from 1 billion to 7 billion parameters and outperform OpenAI’s DALL-E 3 and Stability AI’s Stable Diffusion XL in key benchmarks like GenEval and DPG-Bench.
With this release, DeepSeek once again challenges the assumption that large-scale computational resources are essential for cutting-edge AI development.
The Janus-Pro-7B, the largest model in the series, demonstrates exceptional capabilities, achieving a 9.51 FID score on the MJHQ benchmark for image generation (lower FID indicates higher image quality) and strong performance in instruction-following tasks.
In addition to the Janus-Pro model, the Janus series includes smaller, versatile models designed for a wide range of applications.
The Janus-1.3B model is one of the smaller-scale versions in the Janus family, designed to balance computational efficiency with multimodal capabilities. Featuring 1.3 billion parameters, this model is particularly well-suited for tasks requiring compact yet effective AI solutions, such as lightweight deployments on consumer-grade hardware or edge devices.
Its architecture incorporates a decoupled encoder system, which separates visual understanding from generation tasks, reducing interference and improving task-specific accuracy. While smaller than Janus-Pro, Janus-1.3B achieves commendable results on benchmarks like GenEval, demonstrating its ability to perform well in instruction-following and image analysis with fewer resources.
The JanusFlow-1.3B model serves as a foundational entry in the series, pioneering the integration of rectified flow for image generation tasks. Rectified flow optimizes the dynamics of latent variables, delivering higher semantic consistency and visual fidelity without the iterative noise-reduction processes characteristic of diffusion models.
This minimalist approach reduces computational overhead, making JanusFlow a practical solution for environments with hardware constraints. JanusFlow’s architecture also features a decoupled encoder system, which has proven critical in improving multimodal task performance by isolating visual understanding and generation pathways.
At the heart of the Janus models is a decoupled encoder design, which separates visual understanding from image generation tasks. This architectural choice ensures that neither task interferes with the other, optimizing performance for both.

As DeepSeek explains in its documentation, “The performance differences between shared and decoupled encoder designs validate the necessity of separate visual encoders for understanding and generation tasks.”
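The idea of decoupled encoders can be illustrated with a toy sketch. All class, dimension, and task names below are illustrative assumptions, not DeepSeek's actual code: each task routes through its own encoder before reaching a shared core, so the two pathways never share task-specific parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

class DecoupledMultimodalModel:
    """Toy sketch: separate visual encoders for understanding vs. generation,
    both feeding one shared core. Dimensions are illustrative only."""

    def __init__(self, img_dim=64, hidden_dim=32):
        # Two independent projections: one per task pathway.
        self.understand_enc = rng.standard_normal((img_dim, hidden_dim)) * 0.02
        self.generate_enc = rng.standard_normal((img_dim, hidden_dim)) * 0.02
        # Shared core operating on the unified hidden space.
        self.core = rng.standard_normal((hidden_dim, hidden_dim)) * 0.02

    def encode(self, image_vec, task):
        # Route the input through the task-specific encoder only,
        # so understanding and generation cannot interfere at this stage.
        enc = self.understand_enc if task == "understanding" else self.generate_enc
        return image_vec @ enc @ self.core

model = DecoupledMultimodalModel()
img = rng.standard_normal(64)
u = model.encode(img, "understanding")
g = model.encode(img, "generation")
# The same image yields different features per task,
# because each pathway has its own encoder.
print(np.allclose(u, g))  # False
```

In a shared-encoder design, `understand_enc` and `generate_enc` would be one matrix, forcing a single representation to serve both objectives, which is the interference the decoupled design avoids.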
The models also leverage rectified flow, a technique that simplifies image generation by optimizing latent variable dynamics. Unlike diffusion models, which rely on iterative noise-reduction processes, rectified flow enhances semantic accuracy and visual fidelity while reducing computational complexity.
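The core intuition behind rectified flow can be shown in a toy one-dimensional-style sketch (the velocity oracle and variable names here are illustrative, not DeepSeek's implementation): noise is transported to data along straight-line paths, so the learned velocity is nearly constant in time and sampling needs very few integration steps, unlike the many denoising steps of diffusion.

```python
import numpy as np

rng = np.random.default_rng(1)

# Rectified flow (toy): transport noise z toward data x along a straight path.
# At x_t = (1 - t) * z + t * x, the training target for the velocity is (x - z),
# which is constant in t, so a few Euler steps suffice at sampling time.

def sample(z, velocity_fn, n_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    x, dt = z.copy(), 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + velocity_fn(x, t) * dt
    return x

# Oracle velocity for one known (noise, data) pair: the straight-line direction.
z = rng.standard_normal(4)
x_data = rng.standard_normal(4)
v = lambda x, t: x_data - z

# Because the path is straight, even a single step recovers the target exactly.
print(np.allclose(sample(z, v, n_steps=1), x_data))   # True
print(np.allclose(sample(z, v, n_steps=10), x_data))  # True
```

A diffusion sampler, by contrast, must follow a curved trajectory through many noise levels, which is where the extra computational cost comes from.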

Training Strategy and Efficiency
The training of Janus models follows a meticulous three-stage process:
- Adaptation of Components: Randomly initialized encoders and decoders are optimized for specific tasks.
- Unified Pre-Training: Multimodal datasets are employed to develop understanding and generation capabilities simultaneously.
- Supervised Fine-Tuning: Task-specific datasets enhance the models’ accuracy in real-world applications.
This streamlined approach allows the Janus models to outperform larger models while maintaining a manageable computational footprint.
“We train our model in three sequential stages: adaptation of randomly initialized components, unified pre-training with multimodal data, and supervised fine-tuning using instruction tuning datasets,” DeepSeek notes.
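The three-stage recipe quoted above can be sketched as a simple pipeline skeleton. The stage names follow the article, but the parameter groups, dataset labels, and the placeholder training step are assumptions for illustration, not DeepSeek's code:

```python
# Toy skeleton of a staged training pipeline: each stage names which
# parameter groups are trainable and which data it consumes.

def run_stage(name, trainable, data):
    # Placeholder: a real pipeline would run gradient updates here,
    # freezing every parameter group not listed in `trainable`.
    return f"{name}: trained {sorted(trainable)} on {data}"

stages = [
    # Stage 1: adapt randomly initialized components; the core stays frozen.
    ("adaptation", {"generation_head", "understanding_adaptor"}, "image-text pairs"),
    # Stage 2: unified pre-training on multimodal data, all parameters trainable.
    ("unified_pretraining", {"core", "generation_head", "understanding_adaptor"}, "multimodal corpus"),
    # Stage 3: supervised fine-tuning on instruction data.
    ("supervised_finetuning", {"core", "generation_head", "understanding_adaptor"}, "instruction-tuning data"),
]

log = [run_stage(*stage) for stage in stages]
for line in log:
    print(line)
```

The point of the staging is that cheap, targeted adaptation happens before the expensive full-model passes, which is one reason the approach keeps the overall compute budget manageable.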

The Janus family builds on the success of DeepSeek’s R1 model, which has demonstrated that high-performance AI can be achieved under stringent hardware restrictions.
Using Nvidia H800 GPUs, a throttled version of advanced chips restricted by U.S. export controls, DeepSeek optimized training processes to achieve benchmarks comparable to OpenAI’s o1 model—at a fraction of the cost.
“We estimate that the best domestic and foreign models may have a gap of one-fold in model structure and training dynamics,” said Liang Wenfeng, founder of DeepSeek. “For this reason, we need to consume four times more computing power to achieve the same effect. What we need to do is continuously narrow these gaps.”
DeepSeek’s commitment to open-source collaboration sets it apart from competitors. By releasing the Janus models under an MIT license, the company provides developers worldwide with access to training recipes, model weights, and implementation details. This transparency encourages collaboration and innovation within the AI community.
Geopolitical Implications and Industry Competition
DeepSeek’s rise coincides with heightened geopolitical tensions between the U.S. and China over access to advanced AI technologies. U.S. export restrictions aimed at limiting China’s technological advancements have inadvertently pushed Chinese companies like DeepSeek to innovate under constrained conditions.
By stockpiling H800 GPUs and optimizing their usage, DeepSeek has turned hardware limitations into a competitive advantage.
The Janus models also pose a challenge to resource-heavy strategies adopted by competitors like Meta.
Meta CEO Mark Zuckerberg recently announced plans to deploy over 1.3 million GPUs in 2025, emphasizing the company’s focus on large-scale infrastructure. In contrast, DeepSeek’s leaner approach shows that efficiency and strategic innovation can rival brute computational force.