A vulnerability in ChatGPT, identified by researcher Johann Rehberger, allows malicious actors to hijack the AI memory settings to continuously siphon data. OpenAI has taken steps to fix the problem but acknowledges the ongoing risk associated with prompt injections.
Interference with Memory Capabilities
The AI's memory function, designed to enhance interactions by remembering user details, has been exploited. By embedding harmful commands in non-trusted sources, such as emails or documents, attackers can manipulate ChatGPT to store and propagate incorrect data through indirect prompt injections.
During a demonstration, Rehberger managed to deceive ChatGPT into saving fabricated user details, leading to a steady data outflow to an external server when the AI accessed a harmful image link.
OpenAI's Countermeasures and Precautions for Users
In response to the breach, OpenAI applied a corrective measure to prevent misuse of memories. Despite efforts to mitigate this issue, prompt injections that alter long-term information remain a possibility. To counteract potential threats, users should keep an eye on stored data for unauthorized entries and continuously check session logs for unexpected memory updates.
OpenAI provides resources to help users manage memory settings and protect their data. Rehberger's examination of this vulnerability extended to various exploit scenarios, including apps, document uploads, and web browsing. In one test, a Google Doc linked in a conversation altered the memory settings of ChatGPT through prompt injection.
In his accompanying video, Rehberger shows the creation of false memories via this method. A separate test with image analysis also demonstrated how memory could be manipulated during processing. Browsing via Bing initially resisted manipulation, but specific strategies eventually circumvented defenses.
AI Security Considerations
OpenAI categorized the reported flaw as a “Model Safety Issue” rather than a security vulnerability. Rehberger contends that the manipulation and erasure of memories compromise user interaction integrity and should be considered a security threat.
As preventative measures, he suggests avoiding automatic tool activation with untrusted data, requiring user consent for high-risk actions, and limiting memory changes. Routine reviews of ChatGPT's memory are recommended to catch unexpected alterations.