OpenAI Delays ChatGPT Advanced Voice Mode Due to Safety Concerns

The voice assistant might not be available to all ChatGPT Plus users until the fall, depending on the results of internal safety and reliability checks.


OpenAI has postponed its planned release of a GPT-4o voice assistant for ChatGPT, dubbed “advanced voice mode”. The extra time will be used to conduct further safety and performance checks. Initially, the new feature was set to debut for ChatGPT Plus subscribers by the end of June, but the release has now been pushed back by at least a month.

The voice assistant aims to broaden ChatGPT's functionalities, enabling interactions via a conversational voice mode that replaces the current voice recognition and transcription feature. The coming voice mode strives to improve accessibility and user interactions by creating a more natural communication method.

Safety and Performance Concerns

OpenAI initially showcased the voice assistant feature in May, describing it as an “advanced voice mode” capable of nearly real-time responses. The company had planned to release this feature to a limited group of users in late June but postponed it due to unresolved issues.

OpenAI stated on its official Discord server that the delay is partly due to efforts to improve the model's ability to detect and refuse certain types of content.

Additionally, the company is working on enhancing user experience and preparing its infrastructure to handle a large number of users while maintaining real-time responses. The voice assistant might not be available to all Plus users until the fall, depending on the results of internal safety and reliability checks.

[Image: OpenAI ChatGPT voice mode delay announcement]

Video and Screen-Sharing Rollout Unaffected

The delay does not affect the rollout of new video and screen-sharing capabilities, which include solving math problems from images and explaining device settings menus. These features are designed to work across both smartphone and desktop clients, including the macOS app. The advanced voice mode aims to understand and respond with emotions and nonverbal cues, moving towards more natural conversations with AI.

During a demonstration at its May event, OpenAI showed the voice assistant's capability to solve math problems almost instantaneously. However, the feature raised legal issues because the default “Sky” voice resembled Scarlett Johansson's voice, and OpenAI removed that voice following legal scrutiny.

OpenAI's announcement highlights the essential role of thorough testing and validation when deploying new technologies. The company is focused on refining the voice assistant to ensure it meets user expectations. OpenAI says that the additional time will be utilized to address potential problems and enhance the feature's overall performance.

Multimodal AI Models Can Produce Unsafe Output

As we reported earlier, research shows how multimodal AI models such as OpenAI's GPT-4V and GPT-4o, as well as Gemini 1.5, often produce unsafe outputs when dealing with combined image and text inputs.

There is concern that these models could generate harmful or improper content. Unlike in single-modality models, the blend of multiple data types in multimodal models makes it challenging to predict and manage outputs, posing risks, especially in sensitive areas like healthcare, finance, and autonomous systems.

Markus Kasanmascheff
Markus is the founder of WinBuzzer and has been playing with Windows and technology for more than 25 years. He holds a Master's degree in International Economics and previously worked as Lead Windows Expert for Softonic.com.