If you are a fan of DALL·E, the text-to-image generator by OpenAI, you might be wondering whether the same approach can work for 3D objects. OpenAI has answered this by releasing Shap·E, a new generative model that can create realistic and diverse 3D assets from textual prompts.
Shap·E is not just another 3D model generator: it directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields (NeRFs). This means Shap·E can produce high-quality 3D assets with fine-grained textures and complex shapes, unlike previous models that only output point clouds or voxels.
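To make the idea of an "implicit function" concrete, here is a toy sketch. Shap·E's real implicit functions are neural networks whose weights the model generates; the hand-written signed distance function (SDF) below is only an illustration, assumed for this example, of how a shape can be represented as a function over 3D coordinates rather than as explicit points or voxels.

```python
import numpy as np

# Toy "implicit function": a signed distance field (SDF) for a sphere.
# Shap-E's real implicit functions are generated neural networks; this
# hand-written SDF just illustrates shape-as-a-function-of-coordinates.
def sphere_sdf(points, center=(0.0, 0.0, 0.0), radius=1.0):
    """Negative inside the sphere, positive outside, zero on the surface."""
    return np.linalg.norm(points - np.asarray(center), axis=-1) - radius

# Sample the function on a coarse grid; a mesher (e.g. marching cubes)
# would extract the zero level set as a triangle mesh.
axis = np.linspace(-1.5, 1.5, 16)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
occupancy = sphere_sdf(grid) < 0  # True where the grid point is inside

print(occupancy.sum(), "of", occupancy.size, "grid cells are inside")
```

A renderer can query such a function at any resolution, which is why one set of parameters can back both a mesh and a NeRF-style rendering.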
High Quality 3D Assets Within Seconds
Shap·E is trained in two stages: first, an encoder is trained to deterministically map 3D assets into the parameters of an implicit function; second, a conditional diffusion model is trained on the outputs of the encoder. The encoder compresses each 3D asset into a compact latent representation, while the conditional diffusion model learns to generate such latents from a prompt, supplying the diversity and realism of a generative model.
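The two-stage recipe can be sketched in miniature. This is a deliberately simplified stand-in, not Shap·E's implementation: a least-squares linear map plays the role of the deterministic encoder, and a fitted Gaussian plays the role of the diffusion model trained on the encoder's outputs. All array shapes and names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# ---- Stage 1 (toy): a deterministic "encoder" --------------------------
# Shap-E trains a neural encoder mapping a 3D asset to implicit-function
# parameters. Here a linear projection fitted by least squares stands in:
# fake asset features -> latent "implicit-function parameters".
assets = rng.normal(size=(256, 32))      # fake 3D-asset features
true_map = rng.normal(size=(32, 8))
latents_target = assets @ true_map       # pretend these are ideal latents
W, *_ = np.linalg.lstsq(assets, latents_target, rcond=None)
latents = assets @ W                     # deterministic encoding

# ---- Stage 2 (toy): a generative model over the latents ----------------
# Shap-E trains a conditional diffusion model on the encoder's outputs.
# As a stand-in, estimate the latents' mean/covariance and sample new ones.
mu = latents.mean(axis=0)
cov = np.cov(latents, rowvar=False)
new_latents = rng.multivariate_normal(mu, cov, size=4)  # "generated" params

print(new_latents.shape)  # (4, 8)
```

The key structural point survives the simplification: generation happens in the latent space the encoder defines, and the sampled latents decode into full 3D representations.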
OpenAI has showcased several examples of Shap·E's output, including 3D renderings generated from text prompts such as a bowl of food, a penguin, a voxelized dog, a campfire, and an avocado-shaped chair.
Shap·E generates 3D assets in a matter of seconds, faster than comparable models such as Point·E. It also handles a wide range of textual prompts, from simple descriptions to complex queries. For example, you can ask Shap·E to generate “a red car with a spoiler and a sunroof” or “a dragon with scales and wings”, and it will produce a 3D asset that you can view from different angles and under different lighting conditions.
Shap·E is a breakthrough in 3D asset generation and has many potential applications in various domains, such as gaming, animation, education, and e-commerce. Imagine being able to create your own 3D characters, scenes, and products with just a few words. Shap·E makes this possible and opens up new possibilities for creativity and innovation.
If you want to try out Shap·E for yourself, you can visit the official GitHub page, where you can find the model weights, inference code, and samples. The related research paper explains more about the technical details of Shap·E.
How Shap·E Uses Diffusion Models to Create 3D Assets
Shap·E builds on diffusion models, a class of deep generative models that can produce high-quality synthetic data such as images, videos, and molecules. They are based on the idea of reversing a stochastic process that gradually adds noise to the data until it becomes indistinguishable from random noise. By learning to undo this process, diffusion models can generate realistic samples starting from pure noise.
The basic idea is as follows: the forward diffusion process adds a small amount of Gaussian noise to the data at each step, so that after many steps the samples are effectively pure noise. A model is then trained to approximate the reverse process, and running that reverse process step by step turns random noise back into samples from the data distribution.
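The forward process described above has a convenient closed form in the standard DDPM formulation (assumed here; Shap·E's exact schedule may differ): any step t can be reached in one jump, and a model that perfectly predicts the added noise can invert it exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule: a small amount of Gaussian noise per step.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # product of alphas up to each step

def forward_diffuse(x0, t):
    """Jump straight to step t of the forward process (closed form):
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

x0 = np.ones(4)                          # toy "data"
x_early, _ = forward_diffuse(x0, 10)     # still close to the data
x_late, _ = forward_diffuse(x0, T - 1)   # nearly pure noise

# The reverse model is trained to predict eps from (x_t, t). Given a
# perfect prediction, the data is recovered by inverting the formula:
xt, eps = forward_diffuse(x0, 500)
x0_hat = (xt - np.sqrt(1.0 - alpha_bars[500]) * eps) / np.sqrt(alpha_bars[500])
print(np.allclose(x0_hat, x0))  # True
```

In practice the noise prediction is only approximate, so generation runs the reverse process over many small steps rather than one jump; conditioning that prediction on a text prompt is what makes the samples follow your description.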
To assess its capabilities, Shap·E was compared to another generative model known as Point·E, which creates explicit representations using point clouds. Even though Shap·E deals with a more complex, multi-representational output space, it displayed quicker convergence and achieved similar or superior sample quality in the comparison.