A security flaw in ChatGPT has exposed a critical gap in OpenAI’s safety measures, allowing users to manipulate the AI’s perception of time to bypass restrictions on sensitive topics.
The exploit, known as Time Bandit, was discovered by cybersecurity and AI researcher David Kuszmar in November 2024 during an interpretability study on ChatGPT-4o.
By exploiting the model’s inability to accurately process temporal context, Kuszmar was able to extract information that OpenAI’s safeguards were designed to prevent from being disclosed.
Kuszmar had not set out to find a jailbreak. Instead, while working on a separate research project related to how ChatGPT interprets prompts, he noticed that the model exhibited signs of temporal confusion.
The AI struggled to determine whether it was responding in the present, past, or future. That led him to hypothesize that carefully structured prompts introducing deliberate time-based inconsistencies could manipulate the model into revealing restricted knowledge.
His subsequent tests confirmed that ChatGPT could be tricked into believing it was assisting a person in a different era while still applying modern knowledge, thereby bypassing OpenAI’s restrictions on content related to weapons development, nuclear material, and cyber threats.
The Struggle to Report the Vulnerability
When Kuszmar realized the security implications of his discovery, he attempted to alert OpenAI but struggled to reach the right contacts.
His disclosure was redirected to Bugcrowd, a third-party vulnerability reporting platform, but he felt the flaw was too sensitive to be handled through an external reporting system.
He then reached out to CISA, the FBI, and other government agencies in the hope that they could help get the vulnerability addressed. He received no response, leaving him increasingly distressed about the potential misuse of the exploit.
“Horror. Dismay. Disbelief. For weeks, it felt like I was physically being crushed to death,” Kuszmar told BleepingComputer. “I hurt all the time, every part of my body. The urge to make someone who could do something listen and look at the evidence was so overwhelming.”
It was only after cybersecurity professionals at the CERT Coordination Center intervened that Kuszmar was able to establish direct contact with OpenAI in December 2024. This step finally led to an official acknowledgment of the issue, though OpenAI has not yet confirmed a complete fix for the exploit.
ChatGPT, like other large language models, operates without persistent memory, meaning it does not retain information across different interactions. This design choice creates a fundamental limitation in its ability to recognize continuity, making it susceptible to attacks that manipulate its understanding of time.
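To make that limitation concrete, the sketch below shows a stateless chat loop. It assumes the v1-style openai Python client and the gpt-4o model name, neither of which comes from the article; the point is simply that the model only sees whatever conversation history the caller resends with each request.

```python
# Minimal sketch of a stateless chat loop (assumed setup, not from the article).
# The model only "knows" what is inside the `messages` list sent with each call;
# dropping earlier turns is equivalent to the model never having seen them.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(model="gpt-4o", messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})  # the caller keeps the state
    return reply
```

Because continuity exists only in that caller-supplied list, the model has no independent anchor for what the “present” is, which is the gap the Time Bandit technique leans on.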
How the Time Bandit Exploit Works
The Time Bandit exploit works by taking advantage of two primary weaknesses: timeline confusion and procedural ambiguity.
Timeline confusion occurs when ChatGPT is placed in a scenario where it cannot correctly determine the present time. This makes it possible to prompt the AI to operate as though it exists in the past while still allowing it to apply modern knowledge.
Procedural ambiguity compounds the problem by introducing contradictions in how the AI interprets safety rules, causing it to override safeguards under the assumption that it is acting in a historical or hypothetical setting.
In tests conducted by BleepingComputer, Time Bandit was successfully used to convince ChatGPT that it was assisting a programmer from 1789 in developing polymorphic malware.
The AI provided detailed guidance on modern cyberattack methods, self-modifying code, and execution techniques while interpreting the scenario as a purely academic or theoretical discussion. Researchers also found that queries structured around the 19th and early 20th centuries were the most effective in evading OpenAI’s restrictions.
This suggests that the AI’s safeguards rely heavily on detecting contemporary phrasing rather than fully understanding the implications of the content it generates.
OpenAI’s Response and Remaining Vulnerabilities
OpenAI responded to the findings by stating that improving jailbreak resistance remains a priority for the company. “We appreciate the researcher for disclosing their findings. We are continuously working to make our models safer and more robust against exploits, including jailbreaks, while also maintaining the models’ usefulness and task performance,” OpenAI told BleepingComputer.
Despite these assurances, recent tests conducted by BleepingComputer in January 2025 showed that the Time Bandit exploit remains functional under specific conditions.
While OpenAI has implemented partial mitigations, such as filtering certain types of prompts that attempt to manipulate time references, the core vulnerability remains unresolved.
Other AI Jailbreaking Techniques
The Time Bandit exploit is part of a broader set of security challenges facing AI systems. Other recent jailbreak techniques have demonstrated similar weaknesses in AI safety mechanisms.
Best-of-N Jailbreaking (BoN), a method developed by researchers from Anthropic, Oxford, and Stanford, systematically alters input prompts until they bypass AI safety filters.
Studies have shown that BoN has achieved a success rate of 89% against models such as GPT-4o, Gemini Pro, and Claude 3.5 Sonnet. Another method, the Stop and Roll Attack, takes advantage of AI systems that stream responses in real time by allowing users to interrupt moderation checks before they can filter out restricted content.
Unlike conventional software vulnerabilities, which can typically be addressed with deterministic patches and rule-based filtering, AI safeguards depend on probabilistic models that predict whether a request should be refused rather than enforcing hard rules. That flexibility makes AI models inherently vulnerable to adversarial techniques designed to exploit inconsistencies in their decision-making.
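As a rough illustration of that distinction, the hypothetical sketch below contrasts a deterministic, rule-based check with a probabilistic one. The pattern list, the classifier object, and its predict_risk method are invented for illustration and do not describe OpenAI’s actual moderation pipeline.

```python
# Hypothetical contrast between the two enforcement styles discussed above.
# Nothing here reflects OpenAI's real moderation stack; all names are illustrative.
import re

BLOCKED_PATTERNS = [r"\bpolymorphic malware\b", r"\bnuclear material\b"]

def rule_based_filter(prompt: str) -> bool:
    """Deterministic: block only when an explicit pattern matches."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in BLOCKED_PATTERNS)

def probabilistic_filter(prompt: str, classifier, threshold: float = 0.8) -> bool:
    """Probabilistic: block when a learned risk score crosses a threshold.
    The score is a prediction, so rephrased or reframed prompts may land
    below the cutoff even when the underlying intent is unchanged."""
    return classifier.predict_risk(prompt) >= threshold
```

A deterministic filter either matches or it does not; a probabilistic one returns a score, and anything that lands under the threshold passes, which is why rephrasing and reframing attacks keep resurfacing.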
Implications for AI Safety and Governance
The broader implications of the Time Bandit exploit highlight the need for stronger governance and oversight in AI security.
Security researchers have noted that many AI developers prioritize rapid deployment over security, leaving a gap between model capabilities and the effectiveness of their safety mechanisms.
Kuszmar’s difficulty in reporting the vulnerability also raises concerns about the effectiveness of existing disclosure channels for AI security issues. The reliance on third-party platforms like Bugcrowd, combined with a lack of direct engagement from AI developers, suggests that the industry lacks a standardized approach to handling security vulnerabilities in large language models.
Without centralized oversight or clear reporting pathways, critical flaws like Time Bandit may go unaddressed for extended periods, increasing the risk of exploitation.
As OpenAI continues its efforts to patch Time Bandit, the exploit remains an active concern. The incident underscores the ongoing challenges of securing AI systems against adversarial manipulation, particularly as AI models become more integrated into high-stakes applications such as cybersecurity, finance, and critical infrastructure.
The vulnerability also raises broader questions about how AI companies should handle disclosure and risk management, especially as language models grow more advanced and widely used.
The discovery of the Time Bandit exploit demonstrates how AI safety remains an evolving challenge, requiring continuous adaptation and improvement in security protocols. While OpenAI has acknowledged the issue, the lack of a definitive solution suggests that similar vulnerabilities may persist in future iterations of AI systems, highlighting the need for ongoing scrutiny and regulatory oversight.