Google Launches Gemini 3.5 Live Voice Translation For 70 Languages

Google has released Gemini 3.5 Live Translate for speech translation, adding 70+ language detection, developer previews, Meet access, and AI audio watermarking.

TL;DR
  • Model Launch: Google released Gemini 3.5 Live Translate for near real-time speech-to-speech translation across more than 70 languages.
  • Voice Mechanism: The model generates translated speech continuously, staying a few seconds behind the speaker while preserving intonation, pacing, and pitch.
  • Preview Access: Developers can test public previews, while Google limits Meet access to selected Workspace customers before wider availability.
  • Real-World Test: Android earpiece listening, SynthID audio watermarking, and noisy conversations remain key checks for everyday use.

Google has released Gemini 3.5 Live Translate as a near real-time speech-to-speech translation model for more than 70 languages. With the model, spoken input in one language becomes spoken output in another, moving Gemini-powered voice translation toward consumer conversations, business meetings, and developer-built services.

Gemini 3.5 Live Translate produces translated speech rather than only captions or written text. The model is also rolling out in the Google Translate app on Android and iOS, while selected business users of Google’s Workspace productivity suite get a Google Meet private preview first.

Developers can begin testing through public preview access in the Gemini Live API, the software interface for live Gemini interactions, and Google AI Studio, Google’s model-building tool. App users, IT teams, and builders can use the new model from different starting points as Google separates consumer app use, meeting previews, and developer tools.

Spoken output still trails the original speaker by a few seconds. Nvidia’s PersonaPlex latency benchmarks show why that delay remains a practical test for real-time voice AI, which is why Google describes the model as near real time rather than delay-free.

How Gemini 3.5 Live Translate Works

Gemini 3.5 Live Translate can detect more than 70 languages and preserve intonation, pacing, and pitch while speech continues. Continuous translated audio avoids waiting for a full speaker turn. More context can improve output quality, but users will still hear the few-second gap in live conversations.
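The difference between turn-based and continuous output can be illustrated with a toy timing model. This is only a sketch under stated assumptions: the function names, the word timestamps, the 10-second turn, and the 3-second lag are hypothetical values chosen for illustration, not figures or behavior published by Google.

```python
def turn_based_delays(word_times, turn_end, processing=0.0):
    """Delay per word when translation starts only after the speaker's turn ends."""
    return [turn_end + processing - t for t in word_times]


def continuous_delays(word_times, lag):
    """Delay per word when translated audio trails the speaker by a fixed lag."""
    return [lag for _ in word_times]


# Hypothetical timestamps (seconds) at which the speaker says each word.
word_times = [0.0, 2.0, 4.0, 6.0, 8.0]

# Assumed values for illustration only: a 10 s speaker turn and a 3 s lag.
turn = turn_based_delays(word_times, turn_end=10.0)
cont = continuous_delays(word_times, lag=3.0)

print("turn-based worst delay:", max(turn))
print("continuous worst delay:", max(cont))
```

In this toy model, the listener's wait under turn-based translation grows with the length of the speaker's turn, while continuous output keeps the wait at the fixed lag. It ignores translation quality, which the article notes can benefit from more context.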

Android users get the clearest phone-level difference. Android listening mode plays translated speech through a phone earpiece instead of the built-in speaker, while iOS does not offer the same earpiece path. Phone-to-ear playback narrows the mobile experience toward quieter one-to-one conversations.

Google Meet adds the enterprise-scale use case for Gemini 3.5 Live Translate. Meet language coverage will expand from five languages to more than 2,000 language combinations, with selected customers getting the upgrade first.
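As a rough sanity check on the scale of that figure, the number of language pairs among 70 languages can be computed directly. This is a sketch only: the article does not say how Google counts "combinations", so treating them as pairs drawn from 70 languages is an assumption, and the helper name `language_pairs` is hypothetical.

```python
from math import comb, perm


def language_pairs(n: int) -> tuple[int, int]:
    """Return (unordered pairs, ordered source-to-target pairs) for n languages."""
    return comb(n, 2), perm(n, 2)


# Assumption: 70 languages, matching the "more than 70" figure in the article.
unordered, ordered = language_pairs(70)
print(unordered)  # 2415 unordered pairs
print(ordered)    # 4830 directed source-to-target pairs
```

Under that assumption, 70 languages yield 2,415 unordered pairs, which is consistent with "more than 2,000 language combinations", though Google's actual counting method may differ.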

Audio generated with Gemini 3.5 Live Translate carries SynthID watermarks, extending a watermarking system that already had a public SynthID detector for AI media.

Translate, Meet, and Developers Get Different Rollout Paths

Consumer access, meeting-room testing, and developer experimentation are moving on different schedules. Google Translate users get the app path, Workspace customers get limited Meet access, and developers can build through the Gemini Live API and Google AI Studio. Developer access turns the model into a platform feature rather than only a Google app feature.

Agora, Fishjam, LiveKit, Pipecat, and Vision Agents are named as platforms that will support real-time voice translation applications built on the model. Grab, a Southeast Asian ride-hailing platform, is already testing the model for driver and traveler communication at pickups. Grab’s platform handles more than 10 million voice calls per month, giving Google a high-volume conversation setting outside its own apps.

Competition and Real-World Checks

Related products already include Zoom translated captions, KUDO AI Speech Translator, Wordly AI Translation, and HeyGen’s video localization product, which supports more than 175 languages. Google’s advantage will depend on whether one model can work reliably across calls, meetings, and apps, not only in prepared demonstrations.

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master’s degree in International Economics and is the founder and managing editor of Winbuzzer.com.