Alphabet CEO Sundar Pichai has announced that Google's Gemini 1.5 Pro model will feature an expanded context window of 2 million tokens, doubling its previous capacity. This update, revealed during the Google I/O 2024 developers conference, aims to enhance the performance of Google's large language model (LLM).
Enhanced Data Analysis and Understanding
The context window's expansion from 1 million to 2 million tokens is expected to improve the model's ability to analyze and understand larger sets of data. Tokens, which are segments of words, play a crucial role in how LLMs process and generate language. A token corresponds to roughly four characters of English text on average, and the increased capacity allows the model to handle more comprehensive data inputs and outputs.
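To make the token arithmetic concrete, here is a minimal Python sketch that estimates whether a piece of text fits in a given context window. It assumes the rough rule of thumb of about four characters per English token; a real tokenizer will give different counts. The function names and constants are hypothetical and are not part of any Google API.

```python
# Rough heuristic only: about four characters per token for English text.
# This is an assumption for illustration, not how a real tokenizer counts.
CHARS_PER_TOKEN = 4

OLD_WINDOW = 1_000_000  # previous context window, in tokens
NEW_WINDOW = 2_000_000  # expanded context window, in tokens


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate the token count from the character count, rounding up."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_in_window(text: str, window_tokens: int) -> bool:
    """Return True if the estimated token count fits in the window."""
    return estimate_tokens(text) <= window_tokens


if __name__ == "__main__":
    document = "a" * 8_000_000  # stand-in for a document of 8 million characters
    print(estimate_tokens(document))             # estimated tokens
    print(fits_in_window(document, OLD_WINDOW))  # fits in the 1M window?
    print(fits_in_window(document, NEW_WINDOW))  # fits in the 2M window?
```

Under this heuristic, a document of about 8 million characters is too large for the 1 million token window but sits right at the limit of the 2 million token window.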
Google Aims for “Infinite Context”
LLMs break text down into tokens for analysis and response generation. The context window sets how many tokens the model can hold and use at once, so increasing it lets the model draw on more material, which can allow for more detailed and accurate responses.
Pichai also mentioned a future goal of achieving “infinite context,” where LLMs can process and output an unlimited amount of data. However, this goal is currently constrained by computational power. Google's research has so far achieved a context window of 10 million tokens, indicating ongoing efforts to push these boundaries.
Advanced Capabilities of Gemini 1.5 Pro
According to Google, Gemini 1.5 Pro can process vast amounts of information in one go, including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words. The model can seamlessly analyze, classify, and summarize large amounts of content within a given prompt, such as the 402-page transcripts from Apollo 11's mission to the moon.
Google also says that the model can perform highly sophisticated understanding and reasoning tasks for different modalities, including video, and can accurately analyze various plot points and events in a 44-minute silent Buster Keaton movie. Additionally, Gemini 1.5 Pro can perform more relevant problem-solving tasks across longer blocks of code, such as reasoning across examples, suggesting helpful modifications, and explaining how different parts of the code work.
Gemini 1.5 Pro also has improved “in-context learning” skills, meaning it can learn a new skill from information given in a long prompt without needing additional fine-tuning.
Performance and Evaluation
Gemini 1.5 Pro outperforms Gemini 1.0 Pro on 87% of the benchmarks used for developing LLMs and performs at a broadly similar level to Gemini 1.0 Ultra on the same benchmarks. In the Needle In A Haystack (NIAH) evaluation, in which a small piece of text is deliberately placed inside a long block of data, 1.5 Pro found the embedded text 99% of the time in blocks of data as long as 1 million tokens.
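For readers unfamiliar with this kind of test, the following Python sketch shows the general shape of a needle-in-a-haystack check: a short "needle" sentence is inserted at different depths in filler text, and the score is the fraction of prompts for which the answer contains the needle's key fact. This illustrates the idea only and is not Google's evaluation harness. The filler text, needle, and `stub_model` function are all hypothetical; `stub_model` stands in for a real model call and simply searches the prompt.

```python
NEEDLE = "The secret passphrase is 'blue-heron-42'."
SECRET = "blue-heron-42"
FILLER = "The quick brown fox jumps over the lazy dog."


def build_haystack(filler: str, needle: str, n_sentences: int, depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    sentences = [filler] * n_sentences
    position = round(depth * n_sentences)
    sentences.insert(position, needle)
    return " ".join(sentences)


def stub_model(prompt: str) -> str:
    """Hypothetical placeholder for a model call; it 'answers' by string search."""
    return SECRET if SECRET in prompt else "I don't know."


def retrieval_rate(answer_fn, secret: str, prompts: list[str]) -> float:
    """Fraction of prompts whose answer contains the secret."""
    hits = sum(1 for prompt in prompts if secret in answer_fn(prompt))
    return hits / len(prompts)


# Place the needle at several depths and ask the same question each time.
depths = [0.0, 0.25, 0.5, 0.75, 1.0]
prompts = [
    build_haystack(FILLER, NEEDLE, 1000, d) + "\n\nWhat is the secret passphrase?"
    for d in depths
]
rate = retrieval_rate(stub_model, SECRET, prompts)

if __name__ == "__main__":
    print(f"Retrieval rate: {rate:.0%}")
```

Because the stub always finds the needle, it trivially scores a perfect rate; with a real model, the rate would reflect how often it actually retrieves the embedded text at each depth and length.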
Google says it has conducted extensive ethics and safety testing for Gemini 1.5 Pro, including evaluations across areas such as content safety and representational harms.
Private Preview and Developer Access
Starting now, a limited group of developers and enterprise customers can try Gemini 1.5 Pro with a context window of up to 1 million tokens via AI Studio and Vertex AI in private preview. This phased rollout allows developers to test and provide feedback on the enhanced model before it becomes widely accessible.