Google significantly upgraded its artificial intelligence content creation arsenal, launching Veo 3, its latest video generation model now capable of creating and integrating audio, including dialogue and environmental effects. This move directly challenges competitors like OpenAI’s Sora, with integrated audio a key differentiator. Alongside Veo 3, Google introduced Flow, a new AI filmmaking tool, and Imagen 4 for enhanced image generation, signaling a major push into sophisticated multimodal AI tools.
The new capabilities are initially accessible in the U.S. via the Flow interface for subscribers to Google’s new $249.99 per month AI Ultra plan, with enterprise access through Vertex AI. This pricing strategy underscores Google’s intent to monetize its advanced AI, offering powerful tools that could transform creative workflows for filmmakers, marketers, and artists by simplifying the production of more immersive content.
Veo 3: Bringing Sound to AI-Generated Video
Veo 3 marks a notable advancement by incorporating native audio generation, a feature its predecessor, Veo 2, lacked. Eli Collins, Google DeepMind’s product vice president, stated that “Veo 3 excels from text and image prompting to real-world physics and accurate lip syncing.” This builds on Veo 2’s foundation, which already offered 4K output and understood cinematic prompts, having been trained on the “language of cinematography.”
The ability to generate synchronized audio—from character dialogue with lip-syncing to ambient background noise—directly within the video creation process is a significant step. Google DeepMind CEO Demis Hassabis remarked that with Veo 3, “we’re emerging from the silent era of video generation.”
Google’s internal evaluations for Veo 2 had already suggested a competitive edge, with 59% of users reportedly preferring its visual outputs over Sora Turbo. Veo 3 is also in private preview on Vertex AI, where it can generate video from text and image prompts, incorporating speech and various audio elements.
Flow: A Dedicated Toolkit for AI Filmmaking
The newly introduced AI filmmaking tool, Flow, is engineered for creatives, integrating Google’s leading models: Veo for video, Imagen for generating visual “ingredients” like characters or scenes from text, and Gemini for intuitive, natural language prompting. The Google Blog describes Flow as an evolution of the VideoFX Google Labs experiment, designed to make creation “effortless, iterative, and full of possibility.”
Google explains that Flow is custom-designed for Veo, leveraging its exceptional prompt adherence and ability to produce stunning, realistic cinematic outputs, while Gemini facilitates intuitive prompting in everyday language, and Imagen allows users to create or import assets with consistency.
Flow includes features such as precise Camera Controls, a Scenebuilder for editing and extending shots while maintaining consistency, Asset Management, and Flow TV—a showcase from Google Labs where users can learn from shared prompts and techniques.
Access is available through the Google AI Pro plan, which offers key Flow features and 100 generations per month, and the premium Google AI Ultra plan, which provides the highest usage limits and early access to Veo 3 with its integrated audio.
Google highlighted collaborations with filmmakers like Dave Clark, who utilized Flow for short film development. Filmmaker Darren Aronofsky commented on the evolving landscape, stating that “Filmmaking has always been driven by technology,” and added that “Now is the moment to explore these new tools and shape them for the future of storytelling.”
Imagen 4 and Broader AI Enhancements
Google also unveiled Imagen 4, its latest text-to-image model, promising improved speed, performance, and the generation of fine details. PetaPixel reported that Imagen 4 supports various styles, more aspect ratios, and resolutions up to 2K, and that it renders text better, with a “fast variant” planned to be up to 10 times faster than Imagen 3.

This addresses past criticisms of Google’s image generation, such as when Gemini’s image generator produced historically inaccurate results in early 2024, an issue Google co-founder Sergey Brin attributed to a lack of “thorough testing.” Imagen 4 is now in public preview on Vertex AI, delivering enhanced text rendering and prompt adherence.
Further expanding its creative AI suite, Google updated the Veo 2 video generator to allow users to add or remove objects from videos using text prompts. The Lyria 2 music-generation model is now generally available in Vertex AI, offering high-fidelity music creation with greater control over instruments and BPM from text prompts, an update from its initial introduction in April 2025.
Market Context, Competition, and Ongoing Considerations
These launches occur as AI-driven image and video generation tools see surging popularity. OpenAI CEO Sam Altman, for instance, remarked that ChatGPT’s 4o image generator was so heavily used after its launch that it caused the company’s computing chips to “melt.”
Google’s tiered subscription model for Flow and Veo 3, including the comprehensive Google AI Ultra plan, which bundles these tools with YouTube Premium and 30TB of cloud storage, clearly targets both enthusiast and enterprise users.
Ethical considerations and responsible AI development remain central. All content from Veo 3, Imagen 4, and Lyria 2 will feature SynthID watermarks, and Google has introduced a public SynthID Detector tool to verify AI-generated content.
However, transparency regarding the datasets used for training these models continues to be a subject of industry discussion, particularly in light of regulations like the European Union’s AI Act. Google’s Gemini privacy policy notes data collection from chats and files. Ultimately, DeepMind CEO Demis Hassabis has previously said that Google will eventually combine its Gemini and Veo AI models to enhance understanding of the physical world, suggesting a future of even more deeply integrated multimodal AI applications.