Anthropic appears ready to bring voice conversations to its Claude AI assistant mobile app, with the feature reportedly functional internally and nearing a public debut. The upcoming Claude voice mode marks Anthropic’s entry into the bustling arena of voice-driven AI interaction, currently populated by offerings from OpenAI, Google, Meta, and others.
The initial implementation takes a controlled approach to conversation. Unlike systems striving for completely fluid, human-like back-and-forth, Claude’s voice mode operates on a push-to-talk basis, much as ChatGPT did before the launch of its Advanced Voice Mode.
Users speak their query or statement, then manually tap a send button to have the AI process the audio. This prevents the AI from cutting users off, but it rules out spontaneous interjections or mid-thought clarifications. Early reports suggest the system handles voice input reliably, even across natural pauses, though it requires the user to hold their device during interaction.
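To make the turn-taking model concrete, here is a minimal Python sketch of a push-to-talk loop. Every name in it is hypothetical, since Anthropic has not published a client API for the feature; the point it illustrates is that the model only receives audio after the user explicitly sends the finished clip, so pauses never trigger a premature response.

```python
# Toy push-to-talk turn, with byte stubs standing in for real audio I/O.
# All names are illustrative -- this is not Anthropic's API.
from dataclasses import dataclass


@dataclass
class Turn:
    audio_clip: bytes   # the complete utterance, buffered before any processing
    reply_text: str


def capture_until_send(mic_chunks: list[bytes]) -> bytes:
    # A real app would buffer microphone audio until the user taps "send";
    # silence inside the clip never causes an early submission.
    return b"".join(mic_chunks)


def run_turn(mic_chunks: list[bytes], respond) -> Turn:
    clip = capture_until_send(mic_chunks)   # 1. user finishes speaking, taps send
    return Turn(clip, respond(clip))        # 2. only now does the model see audio


if __name__ == "__main__":
    fake_mic = [b"what's ", b"", b"the weather?"]   # empty chunk = natural pause
    turn = run_turn(fake_mic, respond=lambda clip: f"processed {len(clip)} bytes")
    print(turn.reply_text)
```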
BREAKING 🚨: Claude's voice mode is now fully functional and supports web search and file uploads.
It comes with push-to-talk and scrollable text views. It will be quite a big upgrade for the Claude mobile app!
* Not available to the public yet pic.twitter.com/LHsXEQtHqL
— TestingCatalog News 🗞 (@testingcatalog) May 4, 2025
Features Beyond Chat
Beyond basic voice chat, the upcoming Claude mode integrates several features. It will offer four voice options, two categorized as male and two as female, giving users a degree of choice.
The assistant can tap into web search to answer queries, presenting cited sources alongside the spoken response. The app’s interface displays the conversation history as scrollable, paginated text.
Perhaps most notably, the voice mode supports file uploads, letting users provide images or PDF documents and then discuss their contents with the AI by voice, a capability Google is also developing for Gemini Live. The voice mode follows the global rollout of web search in the Claude mobile app in March.
Anthropic is continuing to develop their voice mode adding "Glassy", the latest voice in the newest Claude app update.
I'm thinking this will be the most popular voice. https://t.co/NPQfVUW6pj pic.twitter.com/dOjYp52BXK
— M1 (@M1Astra) April 29, 2025
The Conversational AI Arena
Claude’s push-to-talk interaction model sets it apart from competitors actively working on more dynamic dialogue flow. OpenAI refined ChatGPT’s Advanced Voice Mode to better handle user pauses without interruption, aiming for smoother exchanges.
Meta, meanwhile, detailed tests in April of an experimental “full-duplex” voice mode for its Llama 4-powered Meta AI app, specifically designed to accommodate overlapping speech, though this beta was limited. Full-duplex systems attempt to allow both parties (human and AI) to speak simultaneously, much like a natural phone call.
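The half-duplex versus full-duplex distinction is easy to see in a toy sketch: a full-duplex system keeps the microphone and playback channels live at the same time instead of alternating turns. The snippet below is purely illustrative and is not Meta’s implementation.

```python
# Conceptual full-duplex audio: both directions stream concurrently.
import asyncio


async def uplink(mic_frames):
    # The user's audio streams continuously; there is no send button.
    for frame in mic_frames:
        await asyncio.sleep(0.1)          # stand-in for a 100 ms audio frame
        print(f"user  -> model: {frame}")


async def downlink(reply_frames):
    # The model's audio plays back on an independent, simultaneous channel.
    for frame in reply_frames:
        await asyncio.sleep(0.1)
        print(f"model -> user : {frame}")


async def conversation():
    # Both directions run at once, so speech can overlap as on a phone
    # call; the hard research problem is deciding *when* the model talks.
    await asyncio.gather(
        uplink(["so I was thinking", "about dinner tonight"]),
        downlink(["mhm", "go on"]),
    )


asyncio.run(conversation())
```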
The difficulty of perfecting natural conversational pacing was underscored by Sesame AI’s March 2025 demo of a voice model so realistic, complete with hesitations and stumbles, that it unnerved some testers, yet it reportedly still struggled with organic turn-taking.
Anthropic’s approach to multimodal input also differs from some rivals’. While Claude users can upload static files like PDFs and images for discussion, Google’s Gemini Live gained features in March allowing real-time analysis of live smartphone camera feeds and on-screen content. OpenAI had already added live video support to ChatGPT’s voice mode in December 2024.
Access and Ethics in Voice AI
How users will access Claude’s voice mode remains unspecified, but the market shows varied strategies. OpenAI began offering free-tier users limited daily previews of its Advanced Voice Mode (using the less capable GPT-4o mini model) in February, reserving unrestricted access via the full GPT-4o model for paying subscribers. This tiered strategy contrasts sharply with Microsoft, which, in the same month, made its Copilot voice features entirely free.
On the voice model side, Amazon’s Nova Sonic, launched in April with a focus on expressive, real-time speech-to-speech synthesis, is available to developers via the Bedrock platform. Speech-to-speech models map spoken input directly to spoken output, potentially reducing latency and preserving more vocal nuance than traditional speech-to-text-to-speech pipelines.
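A schematic sketch shows why the direct approach can be faster: a cascaded pipeline pays for three sequential model hops, while a speech-to-speech model pays for one. The stage functions below are timed stubs, not the Nova Sonic or Bedrock API.

```python
# Latency comparison: cascaded pipeline vs. direct speech-to-speech.
# Every stage here is a stub; real systems would call actual models.
import time


def stub_stage(audio_or_text):
    time.sleep(0.3)   # pretend each model hop costs 300 ms
    return audio_or_text


def cascaded(audio_in, stt, llm, tts):
    # Three sequential hops; each waits on the previous one, and any
    # prosody in the input is discarded at the text bottleneck.
    return tts(llm(stt(audio_in)))


def speech_to_speech(audio_in, s2s):
    # One model maps audio straight to audio, removing two hand-offs.
    return s2s(audio_in)


if __name__ == "__main__":
    start = time.perf_counter()
    cascaded(b"hello", stt=stub_stage, llm=stub_stage, tts=stub_stage)
    print(f"cascaded pipeline: {time.perf_counter() - start:.1f}s")   # ~0.9s

    start = time.perf_counter()
    speech_to_speech(b"hello", s2s=stub_stage)
    print(f"speech-to-speech:  {time.perf_counter() - start:.1f}s")   # ~0.3s
```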
Google’s Chirp 3 HD voice model, integrated into Vertex AI in March, also targets developers, offering customizable voice styles and an “Instant Custom Voice” feature that raises ethical questions about consent for voice replication.
The personality and boundaries of voice assistants are also diverging. xAI’s Grok 3 voice mode, launched February 2025 for X Premium+ subscribers, notoriously includes an “Unhinged” option permitting swearing, insults, and explicit chat, reflecting a philosophy of minimal restriction quite different from the typically moderated outputs of mainstream assistants.
The pursuit of extreme realism, as seen with Sesame AI, also brings risks like sophisticated voice cloning for scams, prompting discussions about whether AI voices should retain artificial markers. OpenAI itself encountered ethical turbulence when it withdrew a voice option in May 2024 over its perceived similarity to the voice of actress Scarlett Johansson.