
Anthropic Unveils “Many-Shot Jailbreaking” Technique in AI Models

A new AI attack called "many-shot jailbreaking" tricks large language models into answering harmful questions.


Researchers at Anthropic have unveiled a novel vulnerability in large language models (LLMs), which they have termed “many-shot jailbreaking.” Their findings, detailed in a recently published paper, describe a method by which an AI can be manipulated into answering queries it is typically trained to refuse. The technique involves priming the AI with a long series of less harmful question-and-answer examples before introducing the inappropriate request. Anthropic has shared the details with the wider AI research community to foster mitigation efforts.

Understanding the “Many-Shot Jailbreaking” Technique

The vulnerability exploits the expanded context window of the latest LLMs, meaning the amount of text a model can take in and attend to within a single prompt. Previously limited to a few sentences, this window now spans hundreds of thousands of words, letting the AI reference a much larger body of input at once. Anthropic’s researchers note that LLMs improve at tasks when given numerous examples within their context window, a property known as in-context learning. The attack turns that strength into a weakness: when the AI is fed a long run of fabricated question-and-answer pairs culminating in an inappropriate query, it becomes progressively more likely to comply with the harmful request.
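The structure of such a prompt can be sketched in a few lines. This is an illustrative reconstruction based on the article's description, not Anthropic's actual test harness; the helper name, dialogue format, and placeholder content are all assumptions.

```python
# Sketch of how a many-shot jailbreak prompt is assembled: a long run
# of fabricated user/assistant turns is prepended to the final query,
# so the model's in-context learning nudges it toward continuing the
# established compliant pattern. Names and format are illustrative.

def build_many_shot_prompt(faux_turns, target_question):
    """Concatenate fabricated Q/A turns, then append the real query."""
    lines = []
    for question, answer in faux_turns:
        lines.append(f"User: {question}")
        lines.append(f"Assistant: {answer}")
    # The target question comes last, with the answer left open so the
    # model completes it in the same compliant style as the examples.
    lines.append(f"User: {target_question}")
    lines.append("Assistant:")
    return "\n".join(lines)

# Benign placeholder pairs stand in for the escalating examples the
# researchers describe; the attack relies on their sheer quantity.
turns = [(f"Question {i}?", f"Answer {i}.") for i in range(256)]
prompt = build_many_shot_prompt(turns, "Final target question?")
```

The key point the sketch captures is that nothing in the prompt is sophisticated; the exploit comes from volume, which only became possible once context windows grew large enough to hold hundreds of examples.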

Efforts Towards Mitigation and Future Concerns

In response to this discovery, Anthropic has informed not only its peers but also competitors, aiming to initiate a collaborative approach to addressing this and similar vulnerabilities. While reducing the context window size has been identified as a potential mitigation strategy, this solution could adversely affect the AI’s overall performance. The team is exploring alternative methods, such as classifying and contextualizing queries prior to processing, to prevent exploitation without diminishing the model’s capabilities. This ongoing challenge underscores the complexity of ensuring AI security and ethical compliance in an evolving technological landscape.
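One way to picture the "classify queries before processing" idea is a lightweight screen that checks an incoming prompt for the signature of a many-shot attack, namely an unusually long run of embedded dialogue turns, before the model ever sees it. The regex, threshold, and function below are assumptions for illustration, not a deployed defense.

```python
import re

# Hedged sketch of a pre-processing classifier: count embedded
# "User:"/"Assistant:" turns inside a single incoming prompt and flag
# prompts that look like a many-shot priming sequence. The turn
# pattern and threshold are illustrative, not Anthropic's method.

TURN_PATTERN = re.compile(r"^(?:User|Assistant):", re.MULTILINE)
MAX_EMBEDDED_TURNS = 20  # assumed threshold for demonstration

def screen_prompt(prompt: str) -> str:
    """Return 'flag' if the prompt embeds a suspiciously long faux
    dialogue, otherwise 'pass'."""
    embedded_turns = len(TURN_PATTERN.findall(prompt))
    return "flag" if embedded_turns > MAX_EMBEDDED_TURNS else "pass"
```

The appeal of screening over shrinking the context window is that legitimate long inputs, such as documents submitted for summarization, still fit, while prompts exhibiting the attack's distinctive shape are intercepted.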

Anthropic Driving Amazon’s AI Ambitions

In other Anthropic news this week, Amazon extended its investment in the AI research firm. The deal, first announced the previous September, is Amazon’s largest financial commitment to another company to date, with the total investment reaching up to $4 billion, underlining the importance of advanced large language models to the tech giant.

Despite the size of the investment, the deal is structured to keep Amazon’s influence limited: it holds only a minority stake in the company and has no representatives on the board. This setup likely reflects a regulatory climate that makes big-tech acquisitions more challenging. As part of the agreement, Anthropic has committed to spending $4 billion on Amazon’s cloud platform, AWS, over the coming years. The arrangement mirrors the partnership between Microsoft and OpenAI, although Microsoft, notably, does hold a non-voting observer position on OpenAI’s board.

Luke Jones
Luke has been writing about all things tech for more than five years. He is following Microsoft closely to bring you the latest news about Windows, Office, Azure, Skype, HoloLens and all the rest of their products.