Google's latest AI-driven voice assistant, Gemini Live, has drawn scrutiny over its reliability and its ability to keep users engaged. The new feature, built on the generative AI models Gemini 1.5 Pro and 1.5 Flash, was intended to deliver a more authentic conversational experience, but it has been hampered by various technical and interaction-related issues.
Technical Hiccups
Gemini Live seeks to replicate human conversation with voice profiles like Ursa, characterized by a mid-range tone. However, it lacks adaptability in aspects like pitch, timbre, or speed, limiting user personalization. Unlike OpenAI's Advanced Voice Mode, Gemini Live does not simulate human-like attributes such as laughter, breathing, or pauses, contributing to a more robotic user experience.
Many users have reported technical problems, including frequent interruptions in the voice stream and difficulty in recognizing user inputs accurately. Moreover, Gemini Live lacks numerous integrations present in Google's text-based Gemini chatbot, such as email summarization or managing YouTube playlists.
Interaction Limitations
Practical use of Gemini Live has highlighted several weaknesses. For example, during mock job interviews, the assistant gave generic responses rather than tailored advice. It also demonstrated a tendency to provide false information confidently, a behavior known as “hallucinating.” When asked for budget-friendly activities in New York City, it suggested defunct venues, including a nightclub that had been closed since 2019.
The overall user experience with Gemini Live is further complicated by its inability to handle interruptions gracefully: it often continues speaking even when it detects that a person might be trying to interject. This leads to confusion and makes it difficult to maintain a coherent conversation, as the AI shifts topics abruptly.
Potential Improvements
Looking ahead, Google plans to enhance Gemini Live with capabilities such as interpreting images and real-time video. Currently, the assistant is available only through Google's $20-per-month Google One AI Premium Plan. Despite these ambitious plans, its limited functionality and reliability issues currently make it less appealing to users than text-based chatbots.
Gemini Live's responses tend to lack depth, especially on current events or controversial issues. Its answers can be long-winded or vague. For instance, it initially made a critical comment about the focus on mental health awareness but retracted the statement upon further questioning, highlighting its inconsistent performance.
While Gemini Live marks a step forward in AI voice technology, it needs considerable improvement to address its current limitations in reliability and engagement. Google will continue to update the assistant and improve its utility, but users' initial experiences invite skepticism about its practicality.