Google has launched Gemini 1.5 Pro within its paid Gemini Advanced subscription, bringing a host of new capabilities to the platform. The update, announced at Google I/O 2024, includes a long context window starting at 1 million tokens, improvements to code generation, logical reasoning, and multi-turn conversation, and enhanced audio and image understanding.
Gemini Advanced with Gemini 1.5 Pro
Gemini Advanced can now handle multiple large documents totaling up to 1,500 pages, or summarize 100 emails. Users can upload files via Google Drive or directly from their devices to get insights about dense documents. Google emphasizes that user files remain private and are not used to train its models.
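To put the page limit in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the 1,500-page figure corresponds to the full 1 million token window, which the article implies but does not state; the function names are hypothetical and real token counts per page vary widely with formatting and language.

```python
# Rough page-budget arithmetic. Assumption (not confirmed by the article):
# the 1,500-page limit maps onto the full 1 million token context window.
CONTEXT_TOKENS = 1_000_000  # context window size cited in the article
MAX_PAGES = 1_500           # page limit cited in the article


def tokens_per_page(context_tokens: int = CONTEXT_TOKENS,
                    max_pages: int = MAX_PAGES) -> float:
    """Average token budget per page implied by the two figures."""
    return context_tokens / max_pages


def fits_in_context(page_counts, context_tokens: int = CONTEXT_TOKENS) -> bool:
    """Estimate whether a set of documents fits in the context window."""
    estimated_tokens = sum(page_counts) * tokens_per_page()
    return estimated_tokens <= context_tokens


if __name__ == "__main__":
    print(round(tokens_per_page()))          # about 667 tokens per page
    print(fits_in_context([400, 600, 450]))  # 1,450 pages in total
    print(fits_in_context([900, 700]))       # 1,600 pages in total
```

Under this assumption, three documents totaling 1,450 pages would fit, while two documents totaling 1,600 pages would not.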
An upcoming feature will allow users to upload and understand spreadsheets and other data files, enabling analysis and custom visualizations. This feature will support Google Sheets, CSVs, and Excel files and is expected to roll out in the coming weeks.
Gemini 1.5 Pro also improves image understanding, allowing users to snap a photo of a dish and get a recipe, or take a picture of a math problem and get step-by-step instructions for solving it. Additionally, it will soon handle an hour of video content or codebases with more than 30,000 lines of code.
Gemini Extensions
Google is expanding Gemini Extensions to include Google Calendar, Tasks, Keep, and other utilities like the Clock app. For instance, users can take a picture of a printed schedule and have Gemini create Calendar events.
The YouTube Music extension, which allows users to search for songs by mentioning a favorite verse or featured artist, is also launching today. These new extensions join existing ones for Gmail, Drive, Docs, Google Flights, Hotels, Maps, and YouTube, and are available to both free and paid Gemini users.
Custom Gemini ‘Gems’
In the coming months, Gemini Advanced users and business customers will be able to create “Gems,” or customized versions of Gemini. These can serve various roles such as a gym buddy, sous chef, coding partner, or creative writing guide. Users can describe their desired Gem’s function and personality, and Gemini will create it based on those instructions.
Pre-made Gems like Learning Coach will be available to all Gemini users, enhancing the platform’s versatility.
Immersive Planner for Gemini Advanced
Gemini Advanced will soon feature an “immersive planner” on the web, capable of creating custom, timeline-based itineraries. This planning tool will integrate flight information from Gmail, local recommendations from Google Maps, and other activities, presenting them in a dynamic UI for easy editing.
Developer Updates: Gemini 1.5 Flash and Gemma 2
Google has also introduced Gemini 1.5 Flash, its fastest and most versatile multimodal AI model. With the same 1 million token context window, it is designed for low-latency, cost-effective use cases like summarization, chat applications, and data extraction. Flash joins other models such as Gemini Nano, Pro, and Ultra, and is available through the Gemini API in Google AI Studio.
Google is also previewing a 2 million token context window for Gemini 1.5 Pro and has added features like parallel function calling and native video frame extraction. A Context Caching capability will soon be available, ideal for scenarios like brainstorming content ideas or analyzing complex documents.
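For readers unfamiliar with the term, parallel function calling generally means the model can request several function calls in a single turn, which the application can then run side by side. The Python sketch below illustrates that idea only; it does not use the Gemini API, and the tool functions, the `TOOLS` registry, and the shape of `model_calls` are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical tools an application might expose to a model.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


def get_time(city: str) -> str:
    return f"12:00 in {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather, "get_time": get_time}


def run_calls_in_parallel(calls):
    """Run every requested call concurrently; results keep request order."""
    def run_one(call):
        return TOOLS[call["name"]](**call["args"])

    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_one, calls))


if __name__ == "__main__":
    # Assumed shape of a single model turn that requests two calls at once.
    model_calls = [
        {"name": "get_weather", "args": {"city": "Paris"}},
        {"name": "get_time", "args": {"city": "Tokyo"}},
    ]
    print(run_calls_in_parallel(model_calls))
```

Running the calls concurrently matters most when each one waits on a slow network service, since the total wait is then close to the slowest call instead of the sum of all calls.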
Additionally, Google teased Gemma 2, a 27B-parameter model that outperforms larger models and runs on a single TPU v5e, and PaliGemma, a versatile and lightweight vision-language model (VLM). The company also announced its sixth-generation TPU, “Trillium,” which offers a significant increase in peak compute performance per chip compared to TPU v5e.
Subscription and Availability
The Gemini Advanced subscription, which costs $20 a month with a two-month free trial, is now available in over 35 languages and 150 countries. Google plans to expand the Gemini Advanced context window to two million tokens later this year, enhancing its ability to handle larger files and more complex tasks.
Last Updated on November 7, 2024 8:26 pm CET