Google has expanded its Gemma series of lightweight open AI models with the launch of Gemma 2. The new models, available in Google AI Studio, promise better performance and efficiency through a revamped architecture.
Google also opened developer access to the 2 million token context window for Gemini 1.5 Pro and added code execution capabilities to the Gemini API.
Configurations and Performance
The models can be hosted on a single NVIDIA A100 80GB Tensor Core GPU, NVIDIA H100 Tensor Core GPU, or Cloud Tensor Processing Unit (TPU) host, which lowers AI infrastructure expenses. Additionally, they are compatible with NVIDIA RTX or GeForce RTX desktop GPUs through Hugging Face Transformers. Starting next month, Google Cloud customers can deploy Gemma 2 on Vertex AI, and developers can already test the models on Google AI Studio.
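For developers who want to try this locally, the following is a minimal sketch of loading the instruction-tuned 9B model with Hugging Face Transformers on a single GPU. The "google/gemma-2-9b-it" model ID matches the Hugging Face listing at launch; access requires accepting Google's license on the Hub, and the `accelerate` package is needed for automatic device placement.

```python
# Minimal sketch: load Gemma 2 9B (instruction-tuned) via Hugging Face
# Transformers on a single GPU. Requires accepting the Gemma license on
# the Hugging Face Hub and the `accelerate` package for device_map.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single GPU
    device_map="auto",           # place layers on the available GPU(s)
)

prompt = "Explain what a context window is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```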
Gemma 2 comes in two versions: one with 9 billion parameters and another with 27 billion. The larger model delivers performance comparable to models more than twice its size, while the 9B model exceeds the capabilities of similar-sized competitors like Llama 3 8B. A smaller 2.6B parameter version tailored for smartphone AI applications is expected soon.
Applications and Accessibility
Gemma 2 is accessible for experimentation on Google AI Studio. Meanwhile, Gemini 1.5 Flash, optimized for speed and cost, is being used in various applications.
Envision, for example, uses it to help visually impaired users comprehend their environment in real time. Plural, an automated policy analysis and monitoring platform, leverages it to summarize complex legislative texts for NGOs and citizens. Automation platform Zapier employs its video reasoning to automate video editing workflows, and AI provider Dot uses it for information compression in its long-term memory system.
Commitment to Responsible AI
Google says Gemma 2 meets safety standards by filtering pre-training data and evaluating the models against various safety metrics to minimize biases.
The models are available free of charge through Kaggle, the machine learning and data science community, or via the free tier of Google Colab, and academic researchers can apply for Google Cloud credits through the Gemma 2 Academic Research Program.
Extended Context Window and Code Execution in Gemini 1.5 Pro
Google has made the 2 million token context window for Gemini 1.5 Pro available to all developers, allowing much larger inputs to be processed in a single request. Because longer inputs can drive up costs, Google plans to offset this with context caching in the Gemini API, which reduces the cost of prompts that reuse the same tokens across requests.
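As an illustration, here is a hedged sketch using the google-generativeai Python SDK: one call sends a long document directly against the large context window, and a second path caches the document so follow-up prompts can reuse it. The file name is hypothetical, and the pinned model version string required for caching is an assumption based on the SDK's documentation.

```python
# Hedged sketch: long-context request plus context caching with the
# google-generativeai Python SDK. File name and model version strings
# are illustrative assumptions.
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

long_document = open("contract.txt").read()  # hypothetical large input

# One-off request against the expanded context window.
model = genai.GenerativeModel("gemini-1.5-pro")
print(model.generate_content([long_document, "Summarize the key clauses."]).text)

# Cache the document once, then ask follow-up questions against the cache.
# Note: caching enforces a minimum input size, so this only pays off for
# genuinely large contexts.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-pro-001",   # caching requires a pinned version
    contents=[long_document],
    ttl=datetime.timedelta(minutes=30),  # how long the cache lives
)
cached_model = genai.GenerativeModel.from_cached_content(cached_content=cache)
print(cached_model.generate_content("Which clauses mention liability?").text)
```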
The Gemini API now features code execution for the Gemini 1.5 Pro and 1.5 Flash models, enabling them to generate and execute Python code. This enhances their ability to solve complex math and data-reasoning tasks. Execution takes place in a secure, sandboxed environment without internet access and includes several numerical libraries.
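Enabling the tool from the google-generativeai Python SDK is essentially a one-line change; the sketch below assumes a placeholder API key, and the prompt is only an example of the kind of computation the feature targets.

```python
# Hedged sketch: enabling the Gemini API's code-execution tool via the
# google-generativeai Python SDK. With tools="code_execution", the model
# can write and run Python in Google's sandbox and fold the result into
# its answer.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-1.5-flash", tools="code_execution")
response = model.generate_content(
    "What is the sum of the first 50 prime numbers? "
    "Generate and run Python code for the calculation."
)
print(response.text)  # includes the generated code and its execution output
```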
Google is also preparing to make Gemini 1.5 Flash tuning available to all developers; text tuning is currently undergoing red-teaming and will be rolled out gradually.