DeepL, known for its precise language translation services, has unveiled DeepL Voice at an event in Berlin. The launch marks a shift from the company’s text-focused roots into real-time voice translation, an increasingly competitive sector of fast-developing, AI-driven communication tools.
A Dual Approach for Enhanced Communication
DeepL Voice arrives with two distinct tools: Voice for Meetings and Voice for Conversations. Voice for Meetings is tailored for virtual conferences, where participants can each speak in their native language while translated captions are displayed in real time, so that everyone can follow along comfortably.
Voice for Conversations brings this experience to face-to-face interactions, where a mobile device can be used to share translated text in real time on a single screen. Both tools currently support spoken language input in English, German, Japanese, Korean, and 10 other languages, with translation output available in all 33 languages supported by DeepL’s text services.
Jarek Kutylowski, DeepL’s CEO, noted the importance of real-time processing for the company.
“Bringing the quality and security that DeepL is known for to real-time speech translation was the next frontier for us as a business and we’re excited to finally unveil our first products. Building on the expertise and models we’ve developed since our launch in 2017, we’ve been working closely with customers as part of a beta program to ensure that we’re delivering a solution that solves real-life challenges businesses face.”
Tracing DeepL’s Path to Voice Translation
DeepL’s success in text translation stems from its neural network architecture, which outperformed competing services soon after DeepL Translator launched in 2017. Backed by a 5.1-petaflop supercomputer in Iceland, DeepL Translator gained attention for its high BLEU scores, a standard automatic measure of machine translation quality: 31.1 for English-German translations and 44.7 for English-French. The results pointed to its capacity for precision and nuance in language conversion. Gereon Frahling, then CEO, highlighted how arranging neural connections differently allowed for more comprehensive language mapping.
DeepL’s origins trace back to Linguee, a search engine launched in 2009 for translation examples, which laid the groundwork for the company’s AI models. By integrating machine learning, DeepL tapped into a decade’s worth of high-quality translations, building tools that attracted over 100,000 business users and secured a $300 million investment in 2024.
Context in a Competitive Market
The release of DeepL Voice follows a period of significant advances in voice AI technology. ElevenLabs, for instance, launched a multilingual voice tool capable of speech synthesis across nearly 30 languages.
The tool is designed to maintain consistent voice characteristics, catering to content creators who seek to provide authentic experiences in various languages. Microsoft, too, has made strides in AI voice tech with its recent release of HD neural voices in Azure AI Speech, introducing features like emotional tone recognition and conversational pacing that make interactions more lifelike.
DeepL’s entry, however, takes a different approach: it translates speech into on-screen text rather than synthesized speech. This decision keeps latency low, a crucial feature for live conversations where speed and accuracy are essential. Kutylowski mentioned that while audio output remains technically challenging, prioritizing text ensures faster response times.
Navigating Privacy and Professional Concerns
Like many in the AI field, DeepL faces questions about data privacy. The company states that while voice data must be processed on its servers, it is neither stored nor used to train its models, in line with GDPR requirements. This assurance responds to early feedback from translators and other professionals who raised concerns about data security with automated tools.
Currently, DeepL Voice integrates with Microsoft Teams, catering to businesses that need seamless multilingual communication in remote settings. While there’s no official word on future support for platforms like Zoom or Google Meet, such expansions could enhance the tool’s reach in the growing market for AI-assisted communication.