Mistral AI Introduces Mistral Small 3, a High-Speed Open LLM to Compete with GPT-4o Mini

Mistral AI's Small 3 has debuted as an open-source alternative to proprietary models like GPT-4o mini, pairing competitive performance with the option of local deployment.

Mistral AI, the French artificial intelligence company, has introduced Small 3, a 24-billion-parameter large language model (LLM) designed as a local, open-source alternative to proprietary models such as OpenAI’s GPT-4o mini.

The company is positioning Small 3 as a highly efficient model optimized for rapid, accurate responses, capable of running on a standard MacBook with 32GB of RAM, making it accessible to a wider range of users.

The launch arrives in tandem with the Allen Institute for AI (Ai2) releasing Tülu 3 405B, a customized version of Meta’s Llama 3.1, underscoring the continued momentum of open-source development in the AI sector. The new releases come at a time when Mistral AI has confirmed plans for an initial public offering and is expanding its operations into the Asia-Pacific region.

Optimized for Speed and Efficiency

Mistral AI’s Small 3 distinguishes itself through its focus on speed and accessibility. Unlike many larger models, Small 3 uses fewer layers, which notably reduces the time each forward pass takes. This design allows it to perform on par with models such as Meta’s Llama 3.3 70B and Alibaba’s Qwen 32B, but with significantly lower latency.

Mistral AI reports that Small 3 achieved an accuracy of over 81% on the MMLU (Massive Multitask Language Understanding) benchmark, which tests language models across a diverse range of subjects, without the use of reinforcement learning or synthetic data.
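To make that figure concrete, the following is a minimal sketch of how an MMLU-style score is computed: the model picks one of four answer choices per question, and the reported number is the fraction answered correctly. The sample items and the `pick_answer` stub are placeholders for illustration, not the actual benchmark harness.

```python
# Minimal sketch of an MMLU-style accuracy computation. The items and the
# pick_answer stub are placeholders for illustration, not the real benchmark.
from typing import Callable

def mmlu_accuracy(items: list[dict], pick_answer: Callable[[str, list[str]], int]) -> float:
    """Fraction of multiple-choice questions where the model picks the right index."""
    correct = sum(
        1 for item in items
        if pick_answer(item["question"], item["choices"]) == item["answer"]
    )
    return correct / len(items)

# Toy data in the benchmark's question/choices/answer-index shape
sample = [
    {"question": "2 + 2 = ?", "choices": ["3", "4", "5", "6"], "answer": 1},
    {"question": "Capital of France?", "choices": ["Lyon", "Nice", "Paris", "Lille"], "answer": 2},
]
print(mmlu_accuracy(sample, lambda q, choices: 1))  # 0.5 with a stubbed "model"
```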


Because it reaches that score without those techniques, the model sits earlier in the production pipeline than models like DeepSeek R1, positioning it as a strong base for further customization and for developing reasoning capabilities on top.

Performance Benchmarks

In internal assessments, Mistral Small 3 demonstrated output quality comparable to larger models such as Llama 3.3 70B while achieving significantly faster response times.

Moreover, it exhibited higher output quality and lower latency than OpenAI’s GPT-4o mini. Mistral reports that the model can operate on a single RTX 4090 or a MacBook with 32GB of RAM when quantized.

Quantization is a technique that reduces the numerical precision of a model’s weights, shrinking its memory footprint so it can run on devices with less computing power. This makes the model a good fit for hobbyists and for organizations that need local inference when handling sensitive data.
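As a rough illustration of what such a quantized local setup looks like, here is a minimal sketch using Hugging Face transformers with 4-bit bitsandbytes quantization. The repository id is an assumption based on Mistral’s naming conventions; check the official model card for the exact name.

```python
# Minimal sketch: loading Mistral Small 3 with 4-bit quantization via
# Hugging Face transformers + bitsandbytes (CUDA GPU assumed). The repo id
# below is an assumption; check the official model card for the exact name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "mistralai/Mistral-Small-24B-Instruct-2501"  # assumed repository id

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights as 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16 for quality
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available devices
)

messages = [{"role": "user", "content": "In one sentence, what is quantization?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```

Note that bitsandbytes typically requires a CUDA GPU, so this path suits the RTX 4090 scenario; on a MacBook, users generally rely on community GGUF builds run through tools like llama.cpp or Ollama instead.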

The model’s capabilities position it for uses like fast-response conversational assistance, low-latency function execution, and fine-tuning for specific subject areas.

Mistral’s testing shows it performing on par with, and in some cases surpassing, the much larger Llama 3.3 70B across numerous benchmarks.

Practical Applications of Small 3

Mistral views Small 3 as a versatile solution for many generative AI tasks requiring rapid and reliable instruction following. According to Mistral, clients are evaluating the model for diverse applications, such as fraud detection in financial services, customer triaging in healthcare, and on-device command and control in robotics and manufacturing.

Mistral researchers stated in a blog post that, “Mistral Small 3 is a pre-trained and instructed model catered to the ‘80%’ of generative AI tasks—those that require robust language and instruction following performance, with very low latency.” Its low latency makes it well-suited for automated workflows and AI agents that engage with external applications.
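The function-execution pattern highlighted here typically works as follows: the model emits a structured tool call, and the host application parses and dispatches it. Below is a hedged, minimal sketch of that dispatch step; the JSON shape and the `get_weather` stub are illustrative assumptions, not Mistral’s actual tool-calling format.

```python
# Hedged sketch of the host side of low-latency function calling: the model
# emits a JSON tool call, the application parses and dispatches it. The JSON
# shape and get_weather stub are illustrative assumptions, not Mistral's spec.
import json

TOOLS = {
    "get_weather": lambda city: f"22°C and sunny in {city}",  # stub backend
}

def dispatch(model_output: str) -> str:
    """Parse a tool call like {"name": ..., "arguments": {...}} and run it."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```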

Open-Source Accessibility

The simultaneous release of open-source models by Mistral AI and the Allen Institute for AI demonstrates a commitment to accessible AI development. Mistral’s Small 3 is released under the Apache 2.0 license, which permits its free use, modification, and deployment. Ai2’s Tülu 3 405B, a 405-billion-parameter model derived from Meta’s Llama 3.1, was developed with a thorough post-training process.

This process includes supervised fine-tuning, a method to improve the model’s responsiveness to user prompts, along with Direct Preference Optimization (DPO) for aligning output with user preferences, and Ai2’s internally developed RLVR (Reinforcement Learning with Verifiable Rewards), a variation of reinforcement learning that strengthens the model’s performance on tasks with checkable answers, such as solving math problems.
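To ground the DPO step, here is a toy sketch of its loss function in PyTorch: the policy is trained to prefer a human-chosen response over a rejected one, measured relative to a frozen reference model. The tensor values are dummy data; this illustrates the published DPO objective generically, not Ai2’s training code.

```python
# Toy sketch of the DPO objective (per Rafailov et al.); dummy tensors
# stand in for real per-answer log-probabilities, for illustration only.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss from summed log-probs of chosen/rejected answers per example."""
    # Log-ratios of the trainable policy against a frozen reference model
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    # Maximize the margin so the policy prefers the human-chosen answer
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Batch of two preference pairs with made-up log-probabilities
loss = dpo_loss(
    policy_chosen=torch.tensor([-5.0, -6.0]),
    policy_rejected=torch.tensor([-7.5, -8.0]),
    ref_chosen=torch.tensor([-5.5, -6.2]),
    ref_rejected=torch.tensor([-7.0, -7.8]),
)
print(f"DPO loss: {loss.item():.4f}")
```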

Strategic Positioning

These model releases occur as Mistral AI strengthens its position within the global AI industry. CEO Arthur Mensch told Bloomberg, “We are not for sale,” as the company pursues independence and global growth through its planned initial public offering and Asia-Pacific expansion.

This approach contrasts with some larger AI players’ strategies, while aligning with the current trend toward releasing smaller, efficient models. The launch of Mistral Small 3 comes after Microsoft’s open-sourcing of Phi-4, a 14-billion-parameter model known for its strong performance in reasoning tasks.

Microsoft has also open-sourced the code for rStar-Math, a framework that enables smaller models to exceed the performance of larger AI systems in mathematical reasoning. This shift toward smaller, more efficient models is driven by a growing recognition, voiced by figures such as Meta’s Yann LeCun, that scaling alone will not deliver true reasoning and comprehension.

Markus Kasanmascheff
Markus has covered the tech industry for more than 15 years. He holds a Master’s degree in International Economics and is the founder and managing editor of Winbuzzer.com.
