ElevenLabs, a startup established by former employees of Google and Palantir, recently announced the launch of AI Dubbing, an advanced product that can translate long-form speech content into more than 20 languages. Available to all platform users, the solution provides a novel way to dub video and audio content and aims to revamp an area that has remained predominantly manual.
Mati Staniszewski, the CEO and co-founder of ElevenLabs, stated that the new feature was developed and tested with assistance from hundreds of content creators. The goal is to make content more accessible for a wider audience, particularly for independent creators who lack the means to hire translators to globalize their content.
How AI Dubbing Works
Although it relies on multiple layers of AI-driven tasks such as background noise removal and speech translation, AI Dubbing is presented as a user-friendly tool. Users simply select the AI Dubbing tool on ElevenLabs, create a new project, choose the source and target languages, and upload the content file. Once the file is uploaded, the tool identifies the number of speakers and begins the dubbing process, which is tracked by an on-screen progress bar. After the file is fully processed, it can be downloaded and used.
The tool employs a proprietary algorithm from ElevenLabs to remove background noise and accurately differentiate between dialogue, music, and other sounds. It also preserves the original speaker's voice, captures their emotion, and ensures appropriate timing for the translated speech.
AI-Based Voices on the Horizon
While ElevenLabs is gaining attention for its developments, other tech players including OpenAI and WellSaid Labs are also exploring AI-based voice synthesis. Some companies, such as Spotify, are already applying this technology to allow podcasters to translate their content into different languages while retaining their original voice.
Despite the competition, Staniszewski is confident in the capabilities of ElevenLabs' AI Dubbing tool. The tool sets itself apart through its ability to translate long-form audio or video content from any number of speakers, preserving their voices and emotions in more than 20 languages, and delivering top-quality results.
According to Market US, the global market for such tools was valued at $1.2 billion in 2022 and is expected to reach almost $5 billion by 2032, growing at an annual rate of more than 15.4%.
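As a quick sanity check on these figures, the implied growth rate can be recomputed from the two endpoints. The sketch below is illustrative only: it assumes the rate is a compound annual growth rate over the ten years from 2022 to 2032, and the function names `cagr` and `project` are hypothetical helpers, not part of any report or product.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached after compounding start_value at `rate` for `years`."""
    return start_value * (1 + rate) ** years


if __name__ == "__main__":
    # Figures quoted in the article, in billions of US dollars.
    # Assumption: a ten-year horizon (2022 to 2032) with annual compounding.
    implied = cagr(1.2, 5.0, 10)
    print(f"Implied annual growth: {implied:.1%}")

    # Compounding the 2022 value at the quoted 15.4% rate for ten years.
    print(f"Projected 2032 value: ${project(1.2, 0.154, 10):.2f}B")
```

Under that assumption, the endpoints imply annual growth of roughly 15%, which is in line with the rate quoted in the forecast.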
Using AI to Generate Audiobooks
ElevenLabs is becoming a major contributor to audio AI tools. In September, the company launched an AI solution that generates audiobooks. Known as Projects, the AI-based tool is aimed at simplifying the generation and editing of long-form audio such as audiobooks. It is based on the company's research into long-form speech synthesis, audio conditioning, and parallelized audio generation.
In August, the company rolled out a voice AI that supports 30 languages. Eleven Multilingual v2 is a model that supports multiple languages, marking a significant leap in AI voice generation and cloning. Users of the platform can use ElevenLabs' text-to-speech and voice-cloning tools across this diverse linguistic range.
Back in June, ElevenLabs launched its AI Speech Classifier, a first-of-its-kind verification mechanism that lets users upload any audio sample to determine whether it contains AI-generated audio. ElevenLabs points out that the AI Speech Classifier is up to 99% accurate when dealing with one audio sample.