
Google’s Gemini Live to Offer Voice Discussions on Files

Google's Gemini Live could soon allow voice discussions about uploaded files, according to code found in a recent beta.


Code in Google’s latest Android beta hints at an upcoming feature for Gemini Live, the conversational voice mode of its Gemini AI assistant. Strings found in beta version 15.45.33.ve.arm64 suggest that users may soon be able to hold spoken conversations about files they upload.

Android Authority found text strings such as “Open Live” and “Talk about attachment,” indicating that the capability could be on the way. The discovery points to Google’s continued effort to expand its multimodal AI tools by bringing voice interaction to file-based tasks.

Beta Version Details Suggest New Functionality

Code from the beta suggests that Gemini Live may detect when a file has been uploaded and offer to start a voice session to discuss it. Strings such as “Open Live with attachment” indicate that users could be prompted to work with their files hands-free.

The current Gemini interface already lets users upload files for text-based analysis or editing. Adding voice interaction could make this easier, especially on mobile devices, where speaking is often more practical than typing.

Gemini Live: Background and Features

Gemini Live was introduced in August 2024 alongside the Pixel 9 series to provide more dynamic voice conversations. Initially limited to Gemini Advanced subscribers, it was made available to a wider audience on Android in September.

Users can start Gemini Live by tapping an icon in the interface or by saying “Hey Google.” The feature supports multitasking: it can keep running in the background, and it stores conversation transcripts for later review.

The tool offers ten distinct voices, such as “Ursa,” with a mid-range tone, and “Capella,” which has a British accent. However, reviewers have noted that it lacks flexibility in pitch and speed, which makes it sound robotic compared with competitors such as OpenAI’s Advanced Voice Mode. The criticism shows how difficult it remains to make AI conversations sound truly natural.

Early Reception and Ongoing Issues

Despite its promising start, Gemini Live has received a mixed reception. Early reviews cited interruptions in voice streaming and difficulty accurately recognizing user input. AI “hallucinations,” where a model confidently provides incorrect information, have also been a persistent problem.

Users reported cases where the assistant offered outdated recommendations or delivered overly general responses during specific tasks like job interview simulations.

Gemini Live has also been criticized for lacking integrations available in Google’s text-based Gemini assistant, such as email summarization or playlist management. These limitations have left it behind some competitors in delivering a comprehensive user experience.

Google’s Push Amidst Competitive Pressures

Google’s move to add voice-based file interactions to Gemini Live is part of a broader strategy to stay competitive in a fast-moving AI landscape. OpenAI launched its o1 model, codenamed Strawberry, in September; it is built to spend more time reasoning through a problem before it responds.

While promising, the model still struggles with accuracy and hallucinations. OpenAI has also added real-time web search and autocomplete features to ChatGPT, giving users on paid plans access to up-to-date information.

Apple, meanwhile, continues to struggle with its AI offerings. An internal report from October 2024 found that Siri lags 25% behind ChatGPT in accuracy. Despite Apple’s strong hardware base, the gap highlights how far the company is from matching leading AI models. Amazon is also looking to strengthen its position with the anticipated launch of Alexa Plus, a premium version of its assistant designed to handle complex queries more effectively.

Gemini Live’s current beta updates point to Google’s intention to enhance how users interact with their documents through voice, aligning with its multimodal vision for AI tools. If released, this feature would allow users to command and query files without typing, making it especially useful on devices where voice input is preferable.

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master's degree in International Economics and is the founder and managing editor of Winbuzzer.com.
