Researchers have disclosed a vulnerability in ChatGPT that allows parts of the data OpenAI used to train the model to be extracted simply by prompting the chatbot to repeat a single word over and over. The discovery has implications for the confidentiality of the data used to train large language models.
Repeat to Reveal: A New Class of Vulnerability
The research, detailed in a new paper authored by a team of computer scientists from industry and academia, demonstrates that instructing ChatGPT to repeat a single word many times can eventually cause it to generate seemingly random text. This output occasionally includes verbatim excerpts of text found online, suggesting that the model is regurgitating parts of its training material. The phenomenon was identified through what the authors term a “divergence attack”: the prompt breaks the model out of its typical conversational behavior and causes it to emit unrelated text strings.
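The core of the attack is spotting where the output stops being pure repetition. A minimal sketch of that step is below; the helper function and the sample response are illustrative, not code from the paper:

```python
def divergent_tail(output: str, word: str) -> str:
    """Return the part of `output` after the model stops repeating `word`.

    Splits on whitespace and scans for the first token that is not the
    repeated word; everything from that point on is the "divergence"
    that would be inspected for memorized text.
    """
    tokens = output.split()
    for i, tok in enumerate(tokens):
        if tok.strip(".,!?").lower() != word.lower():
            return " ".join(tokens[i:])
    return ""  # the model repeated the word the whole time


# Illustrative response: the model repeats the word, then diverges.
response = "poem poem poem poem My name is Jane Doe and I live at"
print(divergent_tail(response, "poem"))
# → My name is Jane Doe and I live at
```

In the real attack the repetition runs for hundreds or thousands of tokens before diverging; the tail is then checked against known text, as described below.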
Among the data revealed were snippets of code, explicit material from dating websites, extracts from literary works, and personally identifiable information, including names and contact details. The concern is considerable given that such output could include sensitive or private information.
The researchers experimented with various words and found that some trigger the release of memorized data more effectively than others. Notably, the word “company” proved more effective than words like “poem.”
Implications for Model Deployment
The divergence attack does not always succeed: only about 3 percent of the text generated after the model stops repeating a word turned out to be memorized data. Even so, the possibility raises significant privacy and security concerns, as the released information, albeit sporadic, could include copyrighted material, explicit content, or personally identifiable details.
By compiling around 10 terabytes of text from various online sources, the researchers built a way to search for matches between ChatGPT's outputs and sentences in their reference data, recovering over 10,000 memorized examples. They stress that this dataset is only a subset of the model's training data and likely underestimates the full extent of memorization; even so, the extracted information points to a real risk in deploying AI models trained on sensitive datasets.
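Conceptually, the matching step indexes overlapping spans of the reference corpus and flags any model output containing one of those spans verbatim. The sketch below uses a simple word n-gram set as a stand-in for the researchers' large-scale exact-match search; the corpus, output, and n-gram length are illustrative only:

```python
def ngrams(text: str, n: int):
    """Yield all word n-grams of `text` as tuples."""
    words = text.split()
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n])


def memorized_spans(output: str, corpus_index: set, n: int):
    """Return the n-grams of `output` that appear verbatim in the corpus index."""
    return [g for g in ngrams(output, n) if g in corpus_index]


# Illustrative corpus and output; the real experiments searched ~10 TB
# of text and used much longer spans to count a match as memorization.
corpus = "the quick brown fox jumps over the lazy dog"
index = set(ngrams(corpus, 4))

output = "something something quick brown fox jumps anyway"
print(memorized_spans(output, index, 4))
# → [('quick', 'brown', 'fox', 'jumps')]
```

At the scale of the actual study, a naive n-gram set would be impractical, which is why production-scale extraction studies rely on more memory-efficient indexing; the logic of "flag verbatim overlap" is the same.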
The team reported their findings to OpenAI and made the research public after a standard 90-day disclosure period. At the time of disclosure, OpenAI reportedly had not yet addressed the issue.
The researchers hope their findings prompt a re-evaluation of the safety precautions taken when training and deploying AI models. Moving forward, it is essential to consider the protection of private and proprietary datasets and to pursue advances in responsible AI development and deployment. OpenAI has yet to provide an official response to the findings.