- Developer Availability: Google is making Nano Banana 2 Lite and Gemini Omni Flash available for developer and enterprise workflows.
- Image Pricing: Google says the new image model generates drafts in about four seconds at $0.034 per 1K-resolution image.
- Video Preview: Gemini Omni Flash adds 10-second video generation and conversational editing, but it remains in public preview.
- Trust Pressure: Watermarking, content credentials and crowded AI media alternatives keep provenance and quality central for creative teams.
Google has released Nano Banana 2 Lite and Gemini Omni Flash, giving developers a lower-cost Gemini image model and a public-preview video generation and editing model inside the same workflow. Together, the tools turn Google’s latest generated-media update into an image-to-video pipeline.
Nano Banana 2 Lite can generate images in about four seconds at $0.034 per 1K-resolution image, while Gemini Omni Flash supports 10-second video output at $0.10 per second. Developers can start in Google AI Studio, build through the Gemini API or use the Gemini Enterprise Agent Platform, but the video model remains in public preview, with several production limits still unresolved.
Google AI Studio is the browser-based developer workspace, the Gemini API is the developer interface for building Gemini features into apps, and the Gemini Enterprise Agent Platform is Google’s enterprise platform for agent and media workflows. Access through all three surfaces gives teams a way to create many image drafts, then animate selected assets into short clips without moving every step into a consumer app.
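As a rough illustration of the draft-then-animate budget, the sketch below estimates spend from the two prices Google quotes: $0.034 per 1K-resolution image and $0.10 per second of video. The function name, parameters and example volumes are hypothetical and chosen only for illustration. The prices are taken from the announcement and may change, and the estimate ignores any other fees or quotas.

```python
from decimal import Decimal

# Prices as quoted in Google's announcement; treat both as assumptions.
IMAGE_PRICE_USD = Decimal("0.034")             # per 1K-resolution image
VIDEO_PRICE_PER_SECOND_USD = Decimal("0.10")   # per second of video output


def estimate_pipeline_cost(num_drafts: int, num_clips: int, clip_seconds: int) -> Decimal:
    """Estimate spend for generating image drafts and animating a selected subset.

    Hypothetical helper: it only multiplies the quoted prices by volume.
    """
    if num_drafts < 0 or num_clips < 0 or clip_seconds < 0:
        raise ValueError("counts and durations must be non-negative")
    image_cost = IMAGE_PRICE_USD * num_drafts
    video_cost = VIDEO_PRICE_PER_SECOND_USD * num_clips * clip_seconds
    return image_cost + video_cost


if __name__ == "__main__":
    # Example volumes (illustrative only): 200 drafts, 5 approved assets,
    # each animated into a 10-second clip.
    total = estimate_pipeline_cost(num_drafts=200, num_clips=5, clip_seconds=10)
    print(f"Estimated cost: ${total:.2f}")  # prints "Estimated cost: $11.80"
```

At these example volumes the 200 drafts cost $6.80 and the five clips cost $5.00, so video seconds account for nearly as much of the budget as the much larger batch of images.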
How Google’s New Media Models Work
Nano Banana 2 Lite is the public product name for Gemini 3.1 Flash-Lite Image, the formal API model name. Google launched Gemini 2.5 Flash Image in October 2025, ahead of the Lite model, but its new pitch is narrower: fast 1K draft generation at lower cost rather than a higher-control image editor for every production task.
For image workflows, lower cost changes the early creative step. A marketing or product team can generate many still-image candidates, narrow them through human review and then move selected assets into video generation.
introducing nano banana 2 lite: our fastest, most cost-effective gemini image model yet
built for high-velocity developer pipelines, it delivers text-to-image outputs in 4 seconds at just $0.034 per 1K-resolution image
swap it into your workflow today via ai studio and the… pic.twitter.com/ll16KOZxse
— Google AI Studio (@GoogleAIStudio) June 30, 2026
On the video side, Gemini Omni Flash accepts text, still images and video as inputs for short 720p outputs. The Gemini API offers developers 3-to-10-second 720p video generation, plus conversational editing through the Interactions API, the API path for multi-step editing sessions.
Users can create a product image with Nano Banana 2 Lite, then use natural-language instructions to change camera angle, lighting or character placement in a short clip.
gemini omni flash is here: our high-quality, cost-efficient model for video generation and conversational editing
designed to support multimodal workflows, it enables you to refine videos using natural language and simple prompting
start building with it today via ai studio and… pic.twitter.com/qyPnEhss38
— Google AI Studio (@GoogleAIStudio) June 30, 2026
Because Omni Flash remains in public preview, limits still define the video side. Omni Flash can produce 10-second clips, but the API schema accepts video references of up to 3 seconds even though the model does not yet process them correctly. Multi-step edits also preserve context for only up to three consecutive edits, making the preview useful for testing repeatable workflows but not a finished long-form video suite.
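Teams testing the preview may want to check a planned job against these limits before submitting it. The sketch below is one hypothetical way to do that. It assumes the 3-to-10-second clip range and the three-edit context window described in this article, and it does not call any Google API. The class, function and constant names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Preview limits as described in the article; these are assumptions that
# may change when the model leaves public preview.
MIN_CLIP_SECONDS = 3
MAX_CLIP_SECONDS = 10
MAX_CONTEXT_EDITS = 3


@dataclass
class EditPlan:
    """Hypothetical description of one planned clip and its follow-up edits."""
    clip_seconds: int
    edit_instructions: list[str]


def check_plan(plan: EditPlan) -> list[str]:
    """Return human-readable warnings for anything outside the preview limits."""
    warnings: list[str] = []
    if not MIN_CLIP_SECONDS <= plan.clip_seconds <= MAX_CLIP_SECONDS:
        warnings.append(
            f"clip length {plan.clip_seconds}s is outside the "
            f"{MIN_CLIP_SECONDS}-{MAX_CLIP_SECONDS}s range"
        )
    if len(plan.edit_instructions) > MAX_CONTEXT_EDITS:
        warnings.append(
            f"{len(plan.edit_instructions)} consecutive edits exceed the "
            f"{MAX_CONTEXT_EDITS}-edit context window"
        )
    return warnings


if __name__ == "__main__":
    plan = EditPlan(
        clip_seconds=12,
        edit_instructions=["lower the camera", "warm the lighting",
                           "move the product left", "add a slow zoom"],
    )
    for warning in check_plan(plan):
        print("warning:", warning)
```

A plan that stays within both limits returns an empty list, so the same check can gate a batch of jobs before any video seconds are billed.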
Enterprise Workflows and Product Examples
For enterprise media work, advertising and public relations company WPP had early access through WPP Open, WPP’s marketing workflow platform. Google presents this as a business-facing example for generated media beyond consumer demos, especially for agencies that already manage campaign assets across clients and channels.
Google Cloud extends that enterprise lane by targeting asset localization, product swaps and style transfer for teams that need many localized image variations before selecting assets for video. In practice, a team could create regional product images, review them for brand fit and then ask Omni Flash to turn a smaller approved set into short moving assets.
Consumer and design surfaces show where the same technology may appear after the developer release. Google used a similar product-surface strategy with personalized Nano Banana image generation in Gemini, where an image model appeared inside a user-facing Gemini feature before the cheaper developer launch.
NotebookLM Short Video Overviews offer a second example. NotebookLM is slated to use Nano Banana 2 Lite for 60-second portrait videos with narrated explanations and educational animations, showing how the image model can feed consumer-facing media features.
Competitors, Watermarks and Preview Limits
In the wider market, Adobe Firefly, OpenAI Sora, Midjourney, Krea, Luma Dream Machine and Pika give developers plenty of alternatives for image and video work in a crowded generative AI field. Luma AI’s Uni-1 benchmark results also show why speed and price do not remove quality pressure from rival image models. For Google, cheaper drafts matter only if teams can use the assets in workflows where quality, control and output timing still count.
For adoption, trust features are part of the same test. Google’s new media models support invisible SynthID digital watermarking and content credentials that help identify AI-generated media. Creative communities still direct criticism at AI image and video tools as platforms market them for advertising and production, making provenance part of the same workflow decision as price and clip length.
Until those limits change, developers can test the lower-cost image model and short-video editing. Production teams still need Google to improve Omni Flash beyond 10-second output, fix video-reference handling and extend edit-session depth before the model can support repeatable campaign work at scale.


