Microsoft has rolled out four new AI neural voices for text-to-speech (TTS), designed specifically for use with Azure OpenAI Service. The voices are aimed at speech-based chatbots, voice assistants, and other conversational agents.
Voices Optimized for Conversational Scenarios
The new voices are en-US-AndrewNeural, en-US-BrianNeural, and en-US-EmmaNeural (all US English), plus zh-CN-YunjieNeural (Chinese). They are tuned for conversational use and are in public preview in three Azure regions: East US, Southeast Asia, and West Europe. Microsoft has published audio samples that, according to the company, show more natural and fluid speech than its existing neural voices.
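Each of these voices is chosen by name inside an SSML document sent to the speech service. The sketch below is a minimal, hedged Python example that assumes the standard SSML `<speak>`/`<voice>` structure used by Azure TTS. It writes the Chinese voice with the standard `zh-CN` locale prefix. `build_ssml` and `CONVERSATIONAL_VOICES` are hypothetical helper names used only for illustration.

```python
from xml.sax.saxutils import escape, quoteattr

# The four conversational voices named in the announcement (public preview).
# Hypothetical constant for illustration; the Chinese voice uses the zh-CN locale.
CONVERSATIONAL_VOICES = {
    "en-US-AndrewNeural",
    "en-US-BrianNeural",
    "en-US-EmmaNeural",
    "zh-CN-YunjieNeural",
}


def build_ssml(text: str, voice: str) -> str:
    """Wrap text in a minimal SSML document that selects one neural voice."""
    if voice not in CONVERSATIONAL_VOICES:
        raise ValueError(f"not one of the new conversational voices: {voice}")
    # Voice names start with their locale, e.g. "en-US" or "zh-CN".
    locale = "-".join(voice.split("-")[:2])
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(locale)}>"
        # escape() turns characters such as & and < into XML entities.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


doc = build_ssml("Tea & a chat?", "en-US-EmmaNeural")
print(doc)
```

Escaping the text matters: an unescaped `&` or `<` in user input would make the SSML malformed and the request would be rejected.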
Microsoft describes the persona and tone behind the voices as “…friendly, and optimistic about life, always eager to assist others and share intriguing or practical knowledge. The speaking style of the voice resembles a conversation with an acquaintance over a cup of tea, maintaining a natural and unexaggerated tone.”
Technological Advancements Behind the Voices
Microsoft's ongoing work on TTS modeling techniques has substantially improved the quality of its AI voices. Recent projects such as DelightfulTTS 2 and MuLanTTS have helped close the quality gap between AI voices and professional human recordings, producing speech that sounds more natural and realistic. This work forms the foundation for the newly introduced voices.
Developers can integrate these voices into their applications through the Azure Speech SDK or the REST API. The Azure Bot Framework can also be used to build intelligent bots that speak with the new neural TTS voices.
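For the REST route, a minimal sketch using only the Python standard library follows. It assumes the regional endpoint `https://<region>.tts.speech.microsoft.com/cognitiveservices/v1`, key-based authentication through the `Ocp-Apim-Subscription-Key` header, and the `riff-24khz-16bit-mono-pcm` output format. `build_tts_request`, `PREVIEW_REGIONS`, and the `YOUR_SPEECH_KEY` placeholder are hypothetical. The region check reflects only the preview availability of the new voices, not of the speech service as a whole. The request is built but not sent.

```python
import urllib.request

# Preview regions for the new voices, written as Azure region identifiers
# (East US, Southeast Asia, West Europe). Hypothetical guard for illustration.
PREVIEW_REGIONS = {"eastus", "southeastasia", "westeurope"}


def build_tts_request(ssml: str, region: str, key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the regional TTS REST endpoint."""
    if region not in PREVIEW_REGIONS:
        raise ValueError(f"new voices are in preview only in {sorted(PREVIEW_REGIONS)}")
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,        # Speech resource key
        "Content-Type": "application/ssml+xml",  # body is an SSML document
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",  # WAV audio
        "User-Agent": "tts-sketch",              # application name
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


ssml = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<voice name="en-US-AndrewNeural">Hello!</voice></speak>'
)
req = build_tts_request(ssml, "eastus", "YOUR_SPEECH_KEY")
# To actually synthesize (needs a real key and network access):
# with urllib.request.urlopen(req) as resp:
#     with open("hello.wav", "wb") as f:
#         f.write(resp.read())
```

Keeping request construction separate from sending makes the endpoint, headers, and body easy to inspect before spending quota on a live call.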
In total, Microsoft offers more than 400 neural voices across more than 140 languages and locales, giving developers and businesses a wide choice of voices for conversational experiences.
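To explore that catalog programmatically, the service exposes a voice-list endpoint. The sketch below assumes the regional URL `https://<region>.tts.speech.microsoft.com/cognitiveservices/voices/list` and that each entry in its JSON response carries `ShortName` and `Locale` fields. The helper names and the trimmed sample response are hypothetical, not real service output.

```python
import json
import urllib.request
from collections import defaultdict


def voices_list_request(region: str, key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET to the regional voices/list endpoint."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"
    return urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})


def voices_by_locale(voices_json: str) -> dict[str, list[str]]:
    """Group each voice's ShortName under its Locale, sorted within a locale."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for voice in json.loads(voices_json):
        grouped[voice["Locale"]].append(voice["ShortName"])
    return {locale: sorted(names) for locale, names in grouped.items()}


# Hypothetical, trimmed response; real entries carry more fields.
sample = json.dumps([
    {"ShortName": "en-US-EmmaNeural", "Locale": "en-US"},
    {"ShortName": "en-US-AndrewNeural", "Locale": "en-US"},
    {"ShortName": "zh-CN-YunjieNeural", "Locale": "zh-CN"},
])
print(voices_by_locale(sample))
# {'en-US': ['en-US-AndrewNeural', 'en-US-EmmaNeural'], 'zh-CN': ['zh-CN-YunjieNeural']}
```

Run against a real response, `len(voices_by_locale(...))` gives the number of locales a region serves.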