DeepSeek’s AI Security Under Fire: 100% Jailbreak Success Exposes Critical Flaws

DeepSeek's AI chatbot fails all security tests, prompting investigations and raising concerns about its training methods and access to powerful hardware.

DeepSeek, the Chinese AI company making headlines for its high-performing, low-cost R1 reasoning model, is now facing serious security concerns after researchers from Cisco and the University of Pennsylvania found a 100% success rate when using AI jailbreaking techniques against the chatbot.

The researchers tested the model with 50 malicious prompts designed to extract toxic information and were surprised to find that it failed to block a single one.
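For context, harnesses for this kind of test typically automate the process: fire each prompt at the model’s API and count how many are refused. The sketch below illustrates the general pattern in Python; the endpoint URL, model name, and refusal heuristic are illustrative assumptions, not details taken from the Cisco study.

```python
# Minimal sketch of an automated jailbreak-evaluation loop, similar in
# spirit to the researchers' 50-prompt test. The endpoint, model name,
# and refusal heuristic are illustrative assumptions.
import requests

API_URL = "https://example.com/v1/chat/completions"  # hypothetical endpoint
MODEL = "deepseek-r1"                                # illustrative model name

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")

def is_refusal(text: str) -> bool:
    """Crude keyword heuristic: did the model decline the request?"""
    return any(marker in text.lower() for marker in REFUSAL_MARKERS)

def attack_success_rate(prompts: list[str]) -> float:
    """Return the fraction of prompts the model did NOT refuse."""
    successes = 0
    for prompt in prompts:
        resp = requests.post(API_URL, json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
        }, timeout=60)
        answer = resp.json()["choices"][0]["message"]["content"]
        if not is_refusal(answer):
            successes += 1
    return successes / len(prompts)

# A result of 1.0, as Cisco reported, means no prompt in the set was blocked.
```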

This revelation comes as DeepSeek’s R1 model has been attracting global attention for its low cost and for performance that many claim rivals top AI models.

Amid these developments, OpenAI launched its o3-mini reasoning model just today, Meta is struggling with internal issues, and both Microsoft and OpenAI are dealing with alleged API access abuse that appears linked to DeepSeek.

The researchers stated that DeepSeek’s safety measures failed completely. According to DJ Sampath, VP at Cisco, there seems to be “a trade-off” in how the model was built, with the company appearing to prioritize cost-effectiveness over security.

Further analysis from Adversa AI also found the model susceptible to a wide variety of jailbreak methods. DeepSeek’s censorship filters, implemented to block subjects considered sensitive by the Chinese government, proved just as easy to bypass.

DeepSeek jailbreak found by Adversa AI, eliciting content about explosive devices

In normal use, DeepSeek has been found to apply heavy censorship, and the model fails to provide accurate news-related information in 83% of cases, often answering with misinformation driven by political bias.

New Jailbreaking Techniques Uncovered: Deceptive Delight and the Bad Likert Judge

Unit 42 researchers at Palo Alto Networks have developed new jailbreaking techniques, called Deceptive Delight and Bad Likert Judge. These methods were able to bypass DeepSeek’s models without requiring any specialized knowledge.

Deceptive Delight hides a harmful topic within a positive narrative and then prompts the AI to elaborate, coaxing out malicious information. The Bad Likert Judge technique asks the model to rate the harmfulness of possible responses on a Likert scale, then leverages those ratings to get it to generate similar harmful content.

Example of DeepSeek providing a rudimentary script after using the Deceptive Delight technique (Image: Unit 42)

In one test using the Bad Likert Judge method, researchers were able to get the model to create a data exfiltration tool, along with instructions on how to set up a development environment for creating customized keyloggers.

DeepSeek-crafted phishing email template after using Bad Likert Judge (Image: Unit 42)

With another technique, called Crescendo, they obtained detailed instructions for making a Molotov cocktail. These results show that DeepSeek’s filters are too weak to prevent bad actors from obtaining restricted content, and they underline that AI safety remains an unsolved problem.

DeepSeek has declined to comment on the research.

Response from DeepSeek in the final phase of a Crescendo jailbreak (Image: Unit 42)

The situation comes as other systems are revealing security issues of their own, such as ChatGPT’s recently discovered Time Bandit exploit, in which an AI’s perception of time can be manipulated to extract restricted information.

OpenAI’s DeepSeek Response: The o3-mini Model

OpenAI has seemingly reacted to the growing DeepSeek competition, and to the issues that have now emerged, by launching its o3-mini model today.

The new model is intended as a faster, lower-cost alternative for reasoning tasks, targeting fields such as science, math, and coding. o3-mini is also equipped with a “reasoning effort” dial that lets users choose between speed and more detailed answers, and it is reportedly better at coding and math tasks, scoring higher on benchmarks such as AIME 2024.
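To give a sense of how that dial is exposed to developers, here is a minimal sketch using OpenAI’s Python SDK. The parameter name and values reflect the API as documented at launch; treat the specifics as an assumption if the interface has since changed.

```python
# Sketch: choosing o3-mini's reasoning effort via the OpenAI Python SDK.
# "reasoning_effort" accepts "low", "medium", or "high"; higher settings
# trade response speed for more thorough reasoning.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # favor detail over speed for a hard problem
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
)
print(response.choices[0].message.content)
```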

While the original o1 model remains available for general-purpose tasks, o3-mini is meant to be used where speed and precision matter. According to Kevin Weil, OpenAI’s product chief, the company is focused on “winning this race.”

This all comes as DeepSeek has become the most downloaded app on Apple’s App Store, and Microsoft has added DeepSeek’s R1 model to Azure AI Foundry and GitHub Models. Meanwhile, Microsoft and OpenAI are investigating unusual API traffic possibly related to DeepSeek, which might involve the unauthorized use of their data for training.
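For developers curious what the Azure route looks like, the following is a minimal sketch using the azure-ai-inference client package; the endpoint URL and key are placeholders, and the model identifier “DeepSeek-R1” is an assumption based on Azure AI Foundry’s catalog naming.

```python
# Sketch: querying DeepSeek-R1 via an Azure AI Foundry deployment.
# Endpoint and key are placeholders; replace them with your deployment's values.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),                  # placeholder
)

response = client.complete(
    model="DeepSeek-R1",  # assumed catalog name for the R1 deployment
    messages=[UserMessage(content="Summarize what a reasoning model is.")],
)
print(response.choices[0].message.content)
```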

DeepSeek’s Regulatory Scrutiny and Data Theft Allegations

DeepSeek’s R1 model is now subject to regulatory scrutiny and security concerns. The U.S. Navy has banned the use of DeepSeek’s AI over data security risks, and Italy’s data protection authority has opened an investigation into possible GDPR violations by DeepSeek.

Scale AI CEO Alexandr Wang has suggested that DeepSeek is using Nvidia H100 GPUs despite U.S. export restrictions. Separately, cybersecurity journalist George Webb has claimed that former OpenAI researcher Suchir Balaji’s death might be connected to DeepSeek, saying Balaji may have been planning to expose how DeepSeek was allegedly using stolen training data, although Webb’s track record is filled with other conspiracy theories.

The security issues around DeepSeek, coupled with the ethical and data concerns and fresh regulatory scrutiny, demonstrate that the AI industry needs to improve its security and focus on ethical practices. As DeepSeek’s popularity grows, the industry will need to prioritize safety and user protection.

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master’s degree in International Economics and is the founder and managing editor of Winbuzzer.com.
