ChatGPT Adds Voice and Image Input, Turning into Fully Fledged Voice Assistant

OpenAI has introduced updates to ChatGPT that let users interact with the AI bot through voice commands and images. These features, initially available to ChatGPT Plus and Enterprise users, will reach the wider public shortly. Users can now hold spoken conversations with ChatGPT and upload images to ask questions about them, a significant step towards a more intuitive user interface.

“The new features are rolling out to those who pay for ChatGPT in the next two weeks, and everyone else will get it soon after,” OpenAI stated.

Voice Interaction and Synthetic Voices

The voice interaction feature lets users talk with ChatGPT much as they would with Amazon Alexa or Google Assistant. The AI bot uses OpenAI's Whisper model for speech-to-text conversion and a newly developed text-to-speech model to generate human-like audio responses. Users can choose from five distinct synthetic voices, created in collaboration with professional voice actors.

In a post dated September 25, 2023, OpenAI (@OpenAI) wrote: “Use your voice to engage in a back-and-forth conversation with ChatGPT. Speak with it on the go, request a bedtime story, or settle a dinner table debate.”

Joanne Jang, a product manager at OpenAI, said the voices were designed with long listening sessions in mind. “In fashioning the voices, the number-one criterion was whether this is a voice you could listen to all day,” she said.

Image Interaction and Real-World Applications

The image interaction feature allows users to upload images and ask about their contents. This feature, powered by multimodal GPT-3.5 and GPT-4, has practical applications such as assisting with meal planning and solving math problems. A notable application of the technology is OpenAI's collaboration with Be My Eyes, an app designed to assist visually impaired individuals by describing the contents of uploaded images.

Addressing Risks and Ethical Considerations

OpenAI acknowledges the potential risks associated with these advancements, including voice fraud and privacy concerns. The organization has implemented measures to mitigate these risks, such as limiting ChatGPT's ability to analyze and make direct statements about individuals. OpenAI remains transparent about the model's limitations and advises users against relying on ChatGPT for high-risk purposes without proper verification.

Raul Puri, a scientist at OpenAI, highlighted the complexity of combining models and the extensive brainstorming involved in addressing potential misuses. “You have all the problems with computer vision; you have all the problems of large language models. Voice fraud is a big problem,” Puri explained.

