Stability AI has introduced Stable Video 4D, a generative AI model that turns a single video of an object into videos of that object from multiple viewpoints. The model builds on the company's earlier video and 3D work and is aimed at areas like film production, gaming, and AR/VR experiences.
Technical Advancements and Capabilities
Stable Video 4D takes a single input video and produces videos of the same subject from multiple viewpoints. According to Varun Jampani, head of 3D Research at Stability AI, the model merges the capabilities of Stable Video Diffusion and Stable Video 3D. Trained on a collection of dynamic 3D objects, it can generate frames across a range of camera angles and points in time.
The “4D” in the name refers to the three spatial dimensions, width (x), height (y), and depth (z), plus time (t). The model generates a moving 3D object that can be viewed from different perspectives over time. Rather than employing traditional infill techniques, Stable Video 4D synthesizes eight new viewpoint videos from the initial input, an approach intended to improve 3D and temporal coherence.
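To make the view-and-time structure concrete, here is a minimal sketch of the grid of images such a model has to fill: one image per (viewpoint, time step) pair. The counts of eight views and five frames come from the figures reported in this article. The evenly spaced azimuth angles and the function names are assumptions for illustration only and are not taken from Stability AI's code.

```python
from itertools import product

NUM_VIEWS = 8    # from the article: eight new viewpoint videos
NUM_FRAMES = 5   # from the article: five frames per view


def evenly_spaced_azimuths(num_views=NUM_VIEWS):
    """Hypothetical camera layout: azimuth angles in degrees, evenly spaced around the object."""
    return [i * 360.0 / num_views for i in range(num_views)]


def view_time_grid(num_views=NUM_VIEWS, num_frames=NUM_FRAMES):
    """List every (azimuth_degrees, frame_index) pair the multi-view output would cover."""
    azimuths = evenly_spaced_azimuths(num_views)
    return list(product(azimuths, range(num_frames)))


grid = view_time_grid()
print(len(grid))  # 40 images: 8 views x 5 frames
print(grid[5])    # (45.0, 0): second camera, first frame
```

Each entry in the grid must agree with its neighbors along both axes: with the adjacent frames from the same camera, and with the other cameras at the same instant. That is what 3D and temporal coherence mean in practice.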
Performance and Customizability
Generating five frames for each of eight views takes roughly 40 seconds per input, and users can specify the camera angles to tailor the result. The subsequent 4D optimization takes about 20 to 25 minutes. Unlike older approaches that needed multiple runs to achieve consistency, Stable Video 4D generates all viewpoint videos simultaneously, which helps maintain spatial and temporal coherence.
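The reported timings can be put in perspective with some quick arithmetic. The sketch below uses only the figures quoted above and reads "five frames across eight views" as five frames per view, for 40 images in total. The function names are illustrative.

```python
VIEWS = 8
FRAMES_PER_VIEW = 5
GENERATION_SECONDS = 40.0             # reported, approximate
OPTIMIZATION_MINUTES = (20.0, 25.0)   # reported range, approximate


def seconds_per_image(views, frames_per_view, total_seconds):
    """Average generation time per image across all views and frames."""
    return total_seconds / (views * frames_per_view)


def optimization_to_generation_ratio(optimization_minutes, generation_seconds):
    """How many times longer the optimization stage is than the generation stage."""
    return optimization_minutes * 60.0 / generation_seconds


per_image = seconds_per_image(VIEWS, FRAMES_PER_VIEW, GENERATION_SECONDS)
ratios = tuple(
    optimization_to_generation_ratio(m, GENERATION_SECONDS)
    for m in OPTIMIZATION_MINUTES
)
print(per_image)  # 1.0 second per image on average
print(ratios)     # (30.0, 37.5)
```

On these numbers, generation averages about one second per image, while the optimization stage takes roughly 30 to 37 times as long as generation.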
Currently available for research on Hugging Face, Stable Video 4D processes short, single-object videos with simple backgrounds. The team plans to expand its capabilities to handle longer and more complex videos. Stability AI continues to refine the model to work with a broader range of real-world scenes beyond its initial synthetic training dataset.
Technical Insights and Report
Stability AI has also published a detailed technical report on arXiv that describes the methods used to build Stable Video 4D and the challenges encountered during its development.
Stable Video 4D is focused on research at this stage, and Stability AI has not yet disclosed commercial applications. The company envisions its use across industries that require sophisticated 3D video generation.
Video AI, a Growing Interest for Tech Companies
Stability AI is not the only company exploring video-generating AI. OpenAI is also in the market with Sora, its text-to-video model and its latest foray into video generation. Sora is pitched as a way to streamline the filmmaking process, with tools that could change storytelling, special effects, and even actor performances.