Meta Unveils Voicebox Generative AI for Text-to-Speech

Meta, the parent company of Facebook, has announced a new generative artificial intelligence (AI) model named Voicebox. The state-of-the-art model is designed to perform a variety of speech generation tasks, including editing, sampling, and stylizing, even though it was not specifically trained for these tasks.

Voicebox: A Multilingual AI Model

Voicebox is capable of producing high-quality audio clips and editing pre-recorded audio, such as removing unwanted noises like car horns or a dog barking, all while preserving the content and style of the audio. The model is also multilingual, capable of producing speech in six languages: English, French, German, Spanish, Polish, and Portuguese.

“Voicebox can produce a reading of the text in any of those languages, even when the sample speech and the text are in different languages. This capability could be used in the future to help people communicate in a natural, authentic way even if they don't speak the same languages,” the official announcement from Meta says.

Meta AI also announced the model on Twitter on June 16, 2023: "Introducing Voicebox, a new breakthrough generative speech system based on Flow Matching, a new method proposed by Meta AI. It can synthesize speech across six languages, perform noise removal, edit content, transfer audio style & more."

Potential Applications of Voicebox

Voicebox has a wide range of potential applications. It could give natural-sounding voices to virtual assistants and non-player characters in the metaverse. It could let visually impaired people hear written messages from friends read aloud by AI in those friends' own voices. It could also give creators new tools to easily create and edit audio tracks for videos.

Breakthrough for Audio Editing

One of the most impressive features of Voicebox is its ability to edit and reduce noise in audio clips. The AI model can recreate a portion of speech that's interrupted by noise or replace misspoken words without having to re-record an entire speech.

“For example, you can identify a segment of a speech that's interrupted by a dog barking, crop it, and instruct Voicebox to re-generate that segment – like an eraser for audio editing,” the company says.

Voicebox represents another significant step forward in generative AI research and illustrates AI's potential to change how we interact with technology and with each other. As Meta continues to explore the audio space, the tech community is eager to see how other researchers will build on this work.

Meta and AI: Recent Developments

On March 31, 2023, Meta announced a project that gives robots "eyes" through an artificial visual cortex called VC-1. VC-1 is a self-supervised computer vision model trained on videos of people carrying out everyday activities, a significant shift from earlier models that required manually labeled datasets. This technology could change the way AI understands and interacts with the visual world.
On May 20, 2023, Meta unveiled CodeCompose, an AI-driven coding tool similar to GitHub Copilot. CodeCompose is a generative AI-based coding assistant developed to make developers more productive throughout the software development lifecycle. The tool offers code suggestions for various languages as developers type in Integrated Development Environments (IDEs) like VS Code.
On June 6, 2023, leaked images emerged that hint at Instagram's development of an AI chatbot. These "AI agents" would be able to answer questions or give advice, and users could choose from 30 different AI personalities. The chatbot appears to be a response to changing user behavior, as more conversations on Instagram shift to DMs.
On June 9, 2023, Meta CEO Mark Zuckerberg announced plans to integrate generative AI into the company's flagship products, such as Facebook and Instagram, changing the way users create, share, and experience content. Zuckerberg also mentioned the development of AI personas that can assist users in various ways.
On June 14, 2023, Meta announced a new computer vision model called I-JEPA (Image Joint Embedding Predictive Architecture). Rather than generating images, I-JEPA learns by predicting abstract representations of missing parts of an image instead of filling in individual pixels. Meta presents this self-supervised approach as a step toward AI models that learn more human-like internal representations of the world while requiring less training compute.
