Google Announces Multi-Tiered Gemini AI to Enhance Multimodal Capabilities

Just days after reports that its launch would be delayed, Google has officially unveiled its answer to Microsoft's and OpenAI's advances in generative AI: a new family of models collectively named Gemini. The company has developed three versions of the Gemini model, each aimed at a different level of computational demand and a different deployment environment.

The Gemini Portfolio: Ultra, Pro, and Nano

At the top of the lineup is Gemini Ultra, the largest and most capable model, built to handle "highly complex tasks." Gemini Pro is intended to perform well across a broader range of tasks. Gemini Nano, by contrast, has a narrower aim: to run directly on devices, handling tasks that require less computing power.

Google contends that Gemini Ultra outperforms OpenAI's GPT-4, the model that currently powers Microsoft's Copilot, on most major language model benchmarks. Rather than combining separately trained components, Google built Gemini to be multimodal from the outset, and the company says this lets it understand and reason across text, images, audio, and video more effectively than existing models of its kind.

Implications for Coders and Consumers

Google also highlights the model's coding ability, stating that the first version of Gemini can understand, explain, and generate high-quality code in widely used programming languages such as Python, Java, C++, and Go. On that basis, Google positions it as one of the leading foundation models for coding.

“We've been rigorously testing our Gemini models and evaluating their performance on a wide variety of tasks,” Google explains in its announcement. “From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra's performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development.

"With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities."

At the same time, the company has updated its Bard chatbot to run on Gemini Pro, improving the tool's reasoning and understanding. The upgrade is initially available in English. Google plans to bring Gemini Ultra to Bard in the future, promising even more advanced capabilities.

Pixel 8 Pro owners will also get access to Gemini: a feature update brings the on-device Gemini Nano model to the phone, enabling it to summarize recorded audio such as conversations and interviews and to suggest replies to messages.

With this launch, Google is setting the stage for fiercer competition in generative AI, bringing powerful new tools to multiple platforms and signaling continued innovation in the sector.