ChatGPT Adds Voice and Image Input, Turning into Fully Fledged Voice Assistant

Users can now engage in vocal dialogues with ChatGPT and utilize images to receive information.


OpenAI has introduced updates to ChatGPT that let users interact with the AI bot through voice commands and images. These features, initially available to Plus and Enterprise users, will be accessible to the wider public shortly. Users can now engage in vocal dialogues with ChatGPT and utilize images to receive information, marking a significant step towards a more intuitive user interface.

“The new features are rolling out to those who pay for ChatGPT in the next two weeks, and everyone else will get it soon after,” OpenAI stated.

Voice Interaction and Synthetic Voices

The voice interaction feature allows users to communicate with ChatGPT seamlessly, akin to interacting with Amazon Alexa or Google Assistant. The AI bot utilizes OpenAI's Whisper model for speech-to-text conversion and a newly developed model to generate human-like audio responses. Users have the option to choose from five distinct synthetic voices, created in collaboration with professional voice actors.

Joanne Jang, a product manager at OpenAI, emphasized the importance of creating voices that users could listen to all day. “In fashioning the voices, the number-one criterion was whether this is a voice you could listen to all day,” she mentioned.

Image Interaction and Real-World Applications

The image interaction feature allows users to upload images and inquire about their contents. This feature, powered by multimodal GPT-3.5 and GPT-4, has practical applications such as assisting with meal planning and solving math problems. A notable implementation of this technology is OpenAI's collaboration with Be My Eyes, an app designed to assist visually impaired individuals by describing the contents of uploaded images.

Addressing Risks and Ethical Considerations

OpenAI acknowledges the potential risks associated with these advancements, including voice fraud and privacy concerns. The organization has implemented measures to mitigate these risks, such as limiting ChatGPT's ability to analyze and make direct statements about individuals. OpenAI remains transparent about the model's limitations and advises users against utilizing ChatGPT for high-risk purposes without proper verification.

Raul Puri, a scientist at OpenAI, highlighted the complexity of combining models and the extensive brainstorming involved in addressing potential misuses. “You have all the problems with computer vision; you have all the problems of large language models. Voice fraud is a big problem,” Puri explained.

Markus Kasanmascheff
Markus is the founder of WinBuzzer and has been playing with Windows and technology for more than 25 years. He holds a Master's degree in International Economics and previously worked as Lead Windows Expert for Softonic.com.
