A new version of DALL-E, the generative AI technology that creates images from text descriptions, has leaked online. DALL-E 3 is still in development, but the leaked version shows a number of new features that could make it more powerful than its predecessor. The Decoder reports that the leak came from an internal OpenAI email posted on Discord.
One of the most notable new features in DALL-E 3 is the ability to generate images from more complex text descriptions. For example, users can now ask DALL-E to create images that depict specific scenes from movies or books, or that incorporate multiple different objects or concepts.
Another new feature in DALL-E 3 is the ability to control the style of the generated images. Users can now specify the art style that they want DALL-E to use, such as impressionism, cubism, or pop art. This could make it easier for users to create images that match their specific needs or preferences.
The leaked version of DALL-E 3 is still under development, so it's not yet clear when it will be released to the public. However, the new features that have been revealed so far suggest that DALL-E 3 could be a powerful tool for artists, designers, and creative professionals.
In addition to the new text features, the leaked version of DALL-E 3 also includes a number of other improvements, such as the ability to generate higher-resolution images and to support more languages. These improvements could make DALL-E 3 even more versatile and useful than its predecessor.
AI Image Generation from DALL-E
However, it's important to note that the leaked version is not the final product. It's possible that some of the features that have been revealed may not be included in the final version of DALL-E 3. Nevertheless, the leak provides a glimpse of what's to come from DALL-E 3. If the final version of the technology lives up to the hype, it could have a major impact on the way that we create and consume images.
DALL-E is based on a large-scale neural network that relies on a technique called self-attention and has been trained on a massive dataset of text and image pairs. The model learns to encode the meaning and context of the text prompt, and then decode it into a corresponding image. The model can also use additional information, such as geo-coordinates or color codes, to refine the image generation process.
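To make the self-attention idea concrete, here is a minimal sketch in plain Python. It is a generic textbook version of scaled dot-product self-attention, not OpenAI's implementation: the function names are hypothetical, and it assumes the simplest possible setup in which each token vector serves as its own query, key, and value (real models apply learned projections first).

```python
import math


def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    tokens: list of equal-length float vectors (one per prompt token).
    Returns (outputs, weights), where outputs[i] is a weighted mix of
    all token vectors and weights[i] says how much token i attends to
    each token in the sequence.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, all_weights = [], []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        weights = softmax(scores)
        # Mix the token vectors according to the attention weights.
        mixed = [sum(w * value[i] for w, value in zip(weights, tokens))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    # Three toy 2-dimensional "token embeddings" (made-up numbers).
    prompt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    outputs, weights = self_attention(prompt)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each output vector blends information from every token in the prompt, which is how a model of this kind can take the context of the whole description into account.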
One of the main challenges of image generation is to ensure that the images are coherent and consistent with the text prompt, as well as realistic and diverse. DALL-E addresses these challenges by using a novel loss function that balances reconstruction accuracy, diversity, and semantic alignment. The loss function also incorporates a contrastive learning component that encourages the model to generate images that are distinct from other images in the dataset.
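The exact loss function is not described in the leak, so the following is only a rough illustration of what a contrastive component can look like. It is a generic InfoNCE-style loss over paired text and image embeddings, written in plain Python. The function names, the temperature value, and the toy embeddings are all assumptions made for this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(text_embs, image_embs, temperature=0.1):
    """InfoNCE-style loss: text i should match image i and no other.

    For each text embedding, the matching image is the positive and
    every other image in the batch is a negative. The result is the
    mean cross-entropy over the batch (lower means better aligned).
    """
    total = 0.0
    for i, text in enumerate(text_embs):
        logits = [cosine(text, image) / temperature for image in image_embs]
        # Stable log-sum-exp over all candidate images.
        peak = max(logits)
        log_denominator = peak + math.log(
            sum(math.exp(value - peak) for value in logits))
        total += log_denominator - logits[i]
    return total / len(text_embs)


if __name__ == "__main__":
    # Made-up 2-dimensional embeddings for two caption/image pairs.
    texts = [[1.0, 0.0], [0.0, 1.0]]
    matching_images = [[1.0, 0.0], [0.0, 1.0]]
    swapped_images = [[0.0, 1.0], [1.0, 0.0]]
    print(contrastive_loss(texts, matching_images))  # small loss
    print(contrastive_loss(texts, swapped_images))   # much larger loss
```

The loss is small when each caption sits close to its own image and far from the others, and it grows when pairs are mismatched, which is the pressure that pushes outputs toward semantic alignment with the prompt.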
DALL-E was co-developed by OpenAI and Microsoft. Redmond provided an Azure-powered supercomputer to train the AI. This was the same computing system used to build the GPT models, which are now up to GPT-4 and power services such as Bing Chat and Microsoft 365 Copilot. At Ignite 2022, Microsoft announced a big integration for DALL-E 2 in Azure OpenAI Service, and released the Microsoft Designer app for Windows 11, which leverages the AI. In March, Microsoft launched Bing Image Creator, which adds DALL-E/Microsoft Designer capabilities directly into Bing.
Keeping Pace in a Competitive Market
OpenAI is competing with several Big Tech companies and startups in the field of image generative AI. Each has been developing and improving its own AI image generator, using different techniques and datasets.
Recent examples of image generative AI
- NVIDIA has been advancing the state-of-the-art in generative AI research, with new methods to enhance the realism and quality of AI-generated images.
- OpenAI, the research organization behind DALL-E, has also introduced Shap-E, a generative model that can create 3D models from text, opening up new possibilities for AI in image creation.
- Stability AI, a startup that focuses on generative AI, has released StableStudio, an open-source web app that uses its Stable Diffusion model to generate images from text prompts. Users can also use DreamStudio features to make multiple variations of an image with different styles and attributes.
- Meta, the company formerly known as Facebook, has unveiled I-JEPA (Image Joint Embedding Predictive Architecture), a self-supervised computer vision model. Rather than generating images from text descriptions, I-JEPA learns from images by predicting the missing parts of an image in an abstract representation space.
- Alibaba, the Chinese e-commerce giant, has launched Tongyi Wanxiang, an AI image generator that can handle prompts in both Chinese and English. Users can customize the image output parameters using Composer, a large model developed by Alibaba Cloud.