Google’s VideoPoet Reinvents Video AI with Large Language Models

Google's VideoPoet is an LLM that generates videos from text or images. It sidesteps diffusion in favor of a language-model approach and is trained on massive amounts of video, text, and image data.


Google Research has revealed VideoPoet, a novel large language model (LLM) capable of generating videos. The model distinguishes itself by sidestepping the diffusion-based methods commonly used in the industry and relying on an LLM instead. This approach has historically been employed mainly for text and code generation, but Google has now tailored it to craft videos.

Pre-training Drives Success

Instead of depending on diffusion techniques like those seen in Stable Diffusion, VideoPoet capitalizes on extensive pre-training. The model has processed 270 million videos and over a billion text-and-image pairs collected from public domains and various sources. By converting this vast dataset into text embeddings, visual tokens, and audio tokens, VideoPoet can generate sophisticated video content with remarkable adherence to input prompts.
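To make the idea of visual and audio tokens more concrete, here is a minimal toy sketch of how token IDs from two modalities could be placed in one shared vocabulary, so that a single sequence model can read them. It is an illustration only and not Google's implementation: the vocabulary sizes, the offset scheme, and the function names `to_shared_ids` and `modality_of` are all assumptions, and the text embeddings mentioned above are left out.

```python
# Illustrative sizes only; these are assumptions, not VideoPoet's real vocabularies.
VISUAL_VOCAB_SIZE = 4096
AUDIO_VOCAB_SIZE = 1024

# Each modality gets its own ID range inside one shared vocabulary.
VISUAL_OFFSET = 0
AUDIO_OFFSET = VISUAL_VOCAB_SIZE


def to_shared_ids(visual_ids, audio_ids):
    """Map per-modality token IDs into one flat sequence over a shared vocabulary."""
    if any(not 0 <= i < VISUAL_VOCAB_SIZE for i in visual_ids):
        raise ValueError("visual token ID out of range")
    if any(not 0 <= i < AUDIO_VOCAB_SIZE for i in audio_ids):
        raise ValueError("audio token ID out of range")
    shared = [VISUAL_OFFSET + i for i in visual_ids]
    shared += [AUDIO_OFFSET + i for i in audio_ids]
    return shared


def modality_of(shared_id):
    """Report which modality a shared-vocabulary ID belongs to."""
    if 0 <= shared_id < AUDIO_OFFSET:
        return "visual"
    if AUDIO_OFFSET <= shared_id < AUDIO_OFFSET + AUDIO_VOCAB_SIZE:
        return "audio"
    raise ValueError("ID outside the shared vocabulary")
```

Under these assumed sizes, `to_shared_ids([0, 5], [1])` yields `[0, 5, 4097]`: the visual IDs are unchanged and the audio ID is shifted past the visual range.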

Outperforming the Competition

VideoPoet excels at producing longer, high-quality video clips with more consistent motion than its diffusion-based counterparts, which tend to struggle to maintain coherence across longer sequences of frames. With a team of 31 researchers, Google has created a solution that removes many of the constraints plaguing contemporary video generators. Human raters expressed a clear preference for VideoPoet's outputs over those of other leading models in both motion quality and prompt adherence.

Google has tailored VideoPoet to default to vertical video production, catering to the burgeoning mobile video market. In the future, the tech giant aims to broaden the model's capacity to encompass a variety of generation tasks, such as text-to-audio and audio-to-video, redefining the limits of both video and audio generation capabilities.

Nevertheless, the widely anticipated VideoPoet is not yet accessible to the public. Upon inquiry, Google did not specify when the tool might become available. For now, industry professionals and enthusiasts alike await its release and the impact it will have on the market.

Last Updated on May 14, 2024 11:06 am CEST

Luke Jones
Luke has been writing about all things tech for more than five years. He is following Microsoft closely to bring you the latest news about Windows, Office, Azure, Skype, HoloLens and all the rest of their products.
