
How Pressing “Stop” in ChatGPT Can Neutralize its Safeguards

A simple click on the 'stop' button can bypass ChatGPT's safeguards, exposing unfiltered outputs.

Researchers have revealed a critical vulnerability in large language models (LLMs) that allows users to bypass their safeguards by simply pressing the “stop” button mid-response.

This exploit, called “Stop and Roll,” highlights how ordinary user interactions can dismantle the sophisticated safety mechanisms of LLM-based AI tools like ChatGPT or Microsoft Copilot. By halting responses midstream, users can expose unfiltered and potentially harmful content that would otherwise be flagged and removed by the system.

Gadi Evron, cybersecurity expert and founder of Knostic, explains: “After asking the LLM a question, if the user clicks the Stop button while the answer is still streaming, the LLM will not engage its second-line guardrails. As a result, the LLM will provide the user with the answer generated thus far, even though it violates system policies.”

This vulnerability underscores a fundamental flaw in how LLMs handle real-time moderation. As these AI systems become more integrated into sensitive applications, the risks posed by such exploits are becoming increasingly significant.

The Stop and Roll Exploit: A Timing Vulnerability

The Stop and Roll exploit takes advantage of a critical timing gap in LLM moderation systems. To improve user experience, LLMs like ChatGPT stream responses in real time, delivering outputs incrementally rather than waiting to generate the full response. While this reduces latency, it introduces a window where safety mechanisms fail to intercept problematic content before it reaches users.

“When the stop button is pressed, the guardrail sequence itself is bypassed,” Evron explains. This means that the moderation process halts prematurely, allowing unfiltered responses to be displayed.
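
The timing gap is easiest to see in pseudocode. Below is a minimal, hypothetical sketch of a streaming chat backend whose second-line moderation pass only runs after the stream finishes; it is not vendor code, and the names (generate_tokens, moderate, send_to_user, stop_requested) are illustrative assumptions rather than real APIs.

```python
# Hypothetical sketch of the race condition described by the researchers:
# tokens are shown to the user as they are generated, but the second-line
# guardrail only runs once the full response exists. A user-initiated stop
# exits the loop before that check ever executes.

def stream_answer(prompt, generate_tokens, moderate, send_to_user, stop_requested):
    """Stream a response token by token; moderate only after completion."""
    shown = []
    for token in generate_tokens(prompt):
        if stop_requested():           # user pressed "Stop" mid-response
            return "".join(shown)      # partial text was already displayed, never moderated
        shown.append(token)
        send_to_user(token)            # token reaches the UI in real time

    full_text = "".join(shown)
    if not moderate(full_text):        # second-line guardrail runs only here
        send_to_user("\n[response removed by policy]")
    return full_text
```

In this sketch the safety check sits at the end of the data flow, so any path that leaves the loop early, such as the stop button, skips it entirely.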
 

During their tests, the Knostic researchers crafted prompts designed to trigger policy violations and demonstrated how stopping the response midstream exposed content that the system’s safeguards would have otherwise removed.

In one instance, sensitive or inappropriate information appeared on the screen before being interrupted, illustrating the exploit’s ability to neutralize safety measures.

Flowbreaking: A Systemic Flaw in AI Architecture

Stop and Roll is part of a broader class of vulnerabilities known as Flowbreaking. Unlike prompt injection or jailbreaking, which manipulate the model’s inputs or outputs directly, Flowbreaking targets the architecture that governs how LLMs process and moderate data. These vulnerabilities disrupt the logical chain of components within the system, exposing weaknesses in their synchronization.

“By attacking the application architecture components surrounding the model, and specifically the guardrails, we manipulate or disrupt the logical chain of the system, taking these components out of sync with the intended data flow,” Evron explains. This systemic flaw allows attackers to exploit not just the model itself but the entire infrastructure supporting it.
 

Flowbreaking attacks, including Stop and Roll, demonstrate how seemingly minor interactions can have outsized effects on AI systems. “We propose that LLM Flowbreaking, following jailbreaking and prompt injection, joins as the third on the growing list of LLM attack types,” says Evron.

Real-World Implications of Stop and Roll

The simplicity of the Stop and Roll exploit belies its potential for harm, particularly as LLMs are increasingly deployed in high-stakes environments. Industries such as finance, healthcare, education, and customer service rely on these systems to provide accurate, moderated outputs. Failures in moderation could lead to reputational damage, financial loss, or the exposure of sensitive information.

“As was seen before with Jailbreaking and Prompt Injection, they are vehicles to more sophisticated and malicious attacks, which will be discovered as the field advances,” Evron warns. For instance, attackers could use the Stop and Roll exploit to leak confidential data or generate harmful advice in critical applications like telemedicine or legal platforms.

Furthermore, the vulnerability raises questions about the balance between user experience and security. While real-time response streaming improves interactivity, it introduces exploitable gaps that undermine the reliability of AI safeguards.

Mitigation Strategies for AI Safety

To address vulnerabilities like Stop and Roll, the researchers recommend adopting pre-moderation practices, where responses are fully analyzed before being displayed to users. Although this approach may increase latency, it ensures that harmful or inappropriate outputs are intercepted before they reach the user interface.
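
For contrast with the earlier sketch, here is a minimal version of that pre-moderation pattern under the same hypothetical interfaces: the full response is generated and checked before anything is displayed, so an early “stop” can no longer expose unfiltered text.

```python
# Hypothetical sketch of the pre-moderation approach the researchers recommend:
# nothing is streamed to the user until the complete response has passed the
# guardrail check, trading some latency for guaranteed filtering.

def pre_moderated_answer(prompt, generate_tokens, moderate, send_to_user):
    """Generate the entire response first; display it only if moderation passes."""
    full_text = "".join(generate_tokens(prompt))   # nothing is shown yet
    if moderate(full_text):
        send_to_user(full_text)                    # safe output reaches the UI
    else:
        send_to_user("[response removed by policy]")
    return full_text
```

The design trade-off is exactly the one the researchers note: the user waits for the whole answer, but the guardrail can no longer be raced by an interrupted stream.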

Evron emphasizes the importance of rethinking AI architecture to address systemic flaws: “Enterprises need to rethink their approach to AI safety. It’s not enough to rely on guardrails; the entire architecture needs to be designed with security in mind.” Additionally, implementing context-aware permissions and stricter access controls can limit the scope of sensitive data available to LLMs, reducing the potential for harm.

Source: Knostic