Google has launched a public preview of Gemini 1.5 Pro, its most capable generative AI model to date, on Vertex AI, the company's platform for enterprise AI development. The announcement came at Google's annual Cloud Next conference in Las Vegas. Gemini 1.5 Pro, first unveiled in February, is the latest addition to Google's Gemini family of generative AI models. It stands out for the amount of context it can process, from 128,000 tokens up to 1 million tokens.
Technical Capabilities and Applications
A token is a subdivided piece of raw data, such as part of a word. For example, the word “fantastic” can be broken down into the tokens “fan,” “tas,” and “tic.” In practical terms, 1 million tokens equate to approximately 700,000 words or around 30,000 lines of code. That is about five times the 200,000-token window of Anthropic's flagship Claude 3 models and about eight times the 128,000-token maximum context of OpenAI's GPT-4 Turbo. A model's context window matters because it is the input the model considers before generating output. Models with larger context windows can stay on topic over longer conversations or documents, follow narrative flow better, and produce contextually richer responses without needing as much fine-tuning.
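The conversion figures above lend themselves to a quick back-of-the-envelope check. The following is a minimal sketch, assuming the article's approximate ratios (1 million tokens for about 700,000 words or about 30,000 lines of code) scale linearly; the function names are hypothetical, and real token counts depend on the tokenizer and the content.

```python
# Rough sizing helper based on the article's approximate figures.
# Assumption: token counts scale linearly with words and lines of code.
SMALL_WINDOW_TOKENS = 128_000    # lower end of the context range
LARGE_WINDOW_TOKENS = 1_000_000  # upper end of the context range


def tokens_for_words(words: int) -> int:
    """Estimate tokens for a word count (700,000 words per 1M tokens)."""
    # Ceiling division in integers: words * 10 / 7, rounded up.
    return -(-words * 10 // 7)


def tokens_for_code_lines(lines: int) -> int:
    """Estimate tokens for lines of code (30,000 lines per 1M tokens)."""
    # Ceiling division in integers: lines * 100 / 3, rounded up.
    return -(-lines * 100 // 3)


def fits(tokens: int, window: int) -> bool:
    """Return True if the estimated token count fits in the window."""
    return tokens <= window


if __name__ == "__main__":
    novel = tokens_for_words(90_000)  # a hypothetical 90,000-word document
    print(novel, fits(novel, SMALL_WINDOW_TOKENS), fits(novel, LARGE_WINDOW_TOKENS))
```

By this estimate, a 90,000-word document slightly exceeds a 128,000-token window but fits comfortably within 1 million tokens.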
Gemini 1.5 Pro's extensive context window enables a wide array of applications, from analyzing code libraries and reasoning through lengthy documents to sustaining extended conversations with chatbots. Its multilingual and multimodal capabilities also allow it to understand and analyze content across different media types, including images, videos, and now audio streams, in various languages. One million tokens correspond to about an hour of video or approximately 11 hours of audio, which makes the model useful for processing and transcribing multimedia content.
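The media figures can be turned into a similar rough budget. This is a sketch only, assuming the article's approximations (1 million tokens for about 60 minutes of video or about 11 hours of audio) scale linearly with duration; the helper name is hypothetical, and actual usage will vary with the content and how it is sampled.

```python
import math
from fractions import Fraction

# Approximate figures from the article; linear scaling is an assumption.
CONTEXT_TOKENS = 1_000_000
VIDEO_MINUTES_PER_CONTEXT = 60        # about one hour of video
AUDIO_MINUTES_PER_CONTEXT = 11 * 60   # about eleven hours of audio


def media_tokens(video_minutes=0, audio_minutes=0) -> int:
    """Estimate the tokens consumed by a mix of video and audio."""
    # Fractions keep the arithmetic exact before rounding up.
    share = (Fraction(video_minutes) / VIDEO_MINUTES_PER_CONTEXT
             + Fraction(audio_minutes) / AUDIO_MINUTES_PER_CONTEXT)
    return math.ceil(share * CONTEXT_TOKENS)


if __name__ == "__main__":
    # A hypothetical 30-minute video plus 5.5 hours of audio fills the window.
    print(media_tokens(video_minutes=30, audio_minutes=330))
```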
Early Adoption and Future Prospects
Several early adopters, including United Wholesale Mortgage, TBS, and Replit, are already using Gemini 1.5 Pro's large context window for tasks such as mortgage underwriting, automated metadata tagging of media archives, and code generation and transformation. The model is not fast, however: processing takes between 20 seconds and a minute per search, which Google aims to optimize further.