OpenAI is currently releasing its advanced Voice Mode to select users of ChatGPT Plus. With this feature, OpenAI aims to create more fluid, real-time dialogues within the mobile app available for both iOS and Android. The company initially unveiled the capability of voice interactions through the release of its GPT-4o AI model.
With the new voice feature, ChatGPT can now engage in conversations better suited to human interaction, recognizing tonal variations like humor and sarcasm. This upgrade bypasses the need for speech-to-text translation, significantly reducing interaction lag.
Beginning and Controversies
Unveiled during OpenAI's Spring Update event in May 2024, the feature initially used a voice named Sky, which resembled Scarlett Johansson. Johansson objected, revealing she had declined to lend her voice for ChatGPT.
During the demonstration at its event, OpenAI showed the voice assistant's capability to solve math problems almost instantaneously. However, the feature raised legal issues due to the default “Sky” voice's similarity to Scarlett Johansson's voice, which led to its removal following legal scrutiny. OpenAI consequently removed Sky and delayed Voice Mode, affirming it was not meant to imitate her voice.
Post-demo, OpenAI has worked on improving voice interaction safety and credibility. The system restricts itself to four pre-defined voices, deliberately avoiding celebrity voice replication. Safeguards also prevent the AI from handling requests involving violent or copyrighted material.
Gradual User Rollout and Prospective Plans
Selected ChatGPT Plus users will receive emails with access details for the new feature. OpenAI aims to extend the rollout through the fall, eventually offering it to all Plus subscribers. Demonstrations have shown the feature's usefulness in education, fashion advice, and assistance for the visually impaired. These scenarios accentuate the practical advantages of more natural AI communication.
The launch occurs amidst growing concerns over AI misuse in fraud or impersonation. While the new mode doesn't enable voice cloning, there is still potential for deceptive use in interactions where the AI nature isn't immediately clear.
OpenAI's Focus and Future Goals
Mira Murati, OpenAI's Chief Technology Officer, discussed the feature's collaborative potential on X, underlining a commitment to making AI useful and cooperative. OpenAI emphasized safety protocols, mentioning extensive testing by over 100 external reviewers across 45 languages.
Today we're rolling out advanced Voice Mode in ChatGPT to small group of Plus users, all Plus users in fall. Richer and more natural real-time conversations make the technology less rigid — we've found it more collaborative and helpful and think you will as well. https://t.co/AXwOBQW67N
— Mira Murati (@miramurati) July 30, 2024
This development distinguishes OpenAI amid competitors like Meta's Llama model and Anthropic's Claude while presenting a challenge to startups that specialize in expressive voice AI, like Hume.