Tech expert and IT columnist Mark Pesce has identified a major flaw affecting a wide range of large language models (LLMs), including those used in popular AI chatbots such as ChatGPT, Microsoft Copilot, and Google Gemini. The flaw, triggered by a seemingly simple prompt, causes the models to produce incoherent, unending output, raising concerns about the stability and reliability of these AI systems.
Nonsensical and Continuous Output
As Pesce recounts in his article for The Register, he discovered the issue while writing a prompt for an AI-based classifier intended to help an intellectual property attorney automate tasks requiring subjective judgment. When he tested the prompt on Microsoft Copilot Pro, which sits on top of OpenAI's GPT-4 model, the chatbot generated nonsensical, unending output. He observed similar behavior across other AI models, including Mixtral and several others, with the notable exception of Anthropic's Claude 3 Sonnet. Pesce writes:
“I set to work on writing a prompt for that classifier, beginning with something very simple – not very different from a prompt I’d feed into any chatbot. To test it before I started consuming expensive API calls, I popped it into Microsoft Copilot Pro. Underneath the Microsoft branding, Copilot Pro sits on top of OpenAI’s best-in-class model, GPT-4. Typed the prompt in, and hit return.
The chatbot started out fine – for the first few words in its response. Then it descended into a babble-like madness.”
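Pesce did not publish the prompt that triggered the behavior, but the workflow he describes, a classifier prompt tried out in a chatbot before being sent to a paid API, is a common pattern. The sketch below illustrates it using the OpenAI Python client; the model name, system prompt, and labels are illustrative assumptions, not Pesce's actual classifier.

```python
# Minimal sketch of a classifier-style request to a chat-completion API.
# The prompt text, labels, and model name are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify(document: str) -> str:
    """Ask the model for a subjective relevance judgment on a document."""
    response = client.chat.completions.create(
        model="gpt-4",  # Copilot Pro sat on top of GPT-4 at the time of the report
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a classifier assisting an intellectual property attorney. "
                    "Answer with a single label: RELEVANT or NOT_RELEVANT."
                ),
            },
            {"role": "user", "content": document},
        ],
        max_tokens=5,  # cap the response length
    )
    return response.choices[0].message.content.strip()


if __name__ == "__main__":
    print(classify("Example filing text goes here."))
```

Capping the response length is one practical guard against the kind of runaway output Pesce describes, though it does nothing to address the underlying model behavior.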
Industry Response and Challenges
Pesce reported the issue to several AI service providers, including Microsoft and Elon Musk's xAI, the company behind the Grok models. xAI confirmed that it could reproduce the flaw across multiple models, suggesting a fundamental problem rather than an isolated bug. The response from other companies was less encouraging: Microsoft's security team dismissed the issue as a non-security bug, while other prominent AI firms responded inadequately or not at all. Some companies offered no direct contact for reporting such critical issues, exposing a significant gap in their customer support and security processes.
Implications for AI Development
The discovery underscores the potential risks associated with the rapid deployment of AI technologies without robust support and feedback mechanisms. The lack of a clear channel for reporting and addressing bugs in these systems poses a threat to their reliability and security. Industry experts stress the need for AI firms to establish efficient processes for handling customer feedback and resolving issues promptly. Until these measures are in place, the safety and dependability of AI-driven applications remain in question.
Pesce’s experience points to a broader issue within the AI industry: the necessity for more rigorous testing and better communication between developers and users. As AI continues to integrate into various aspects of daily life and business, ensuring these systems are both effective and secure is paramount.