
OpenAI GPT-4 Demonstrates High Success Rate in Exploiting Real-World System Flaws, Study Shows

Researchers found that a large language model (LLM) can exploit real-world vulnerabilities when given their security advisories.


A team of computer scientists from the University of Illinois Urbana-Champaign has reported that OpenAI's GPT-4 can autonomously exploit vulnerabilities in real-world systems. The study focuses on the model's ability to interpret and act on information from CVE (Common Vulnerabilities and Exposures) advisories. GPT-4 achieved an 87% success rate on a dataset of 15 one-day vulnerabilities (flaws that have been publicly disclosed but may not yet be patched on affected systems), some of which are classified as critical.

Comparative Analysis with Other Models

The research contrasts GPT-4's performance with that of other models and tools, including GPT-3.5, various open-source LLMs, and vulnerability scanners such as ZAP and Metasploit, none of which were able to exploit the vulnerabilities. The study did not test two commercial rivals of GPT-4, Anthropic's Claude 3 and Google's Gemini 1.5 Pro, because the researchers lacked access to them, but they plan to test those models in the future.

Implications and Cost Efficiency

The implications of these findings are significant, suggesting that future models could surpass the capabilities currently accessible to less skilled cyber attackers, commonly referred to as “script kiddies.” The researchers, led by assistant professor Daniel Kang, advocate for proactive security measures over restricting public access to security information, a stance supported by the broader security research community. The study also highlights the cost efficiency of using an LLM agent for exploitation, estimating the cost at $8.80 per exploit, significantly lower than hiring a human penetration tester.

Challenges and Limitations

Despite the high success rate, GPT-4 encountered challenges with certain vulnerabilities, particularly those with complex web interfaces or non-English descriptions, underscoring the limitations of current LLMs in understanding and navigating diverse data formats and languages. The research team's work builds upon previous studies on automating attacks using LLMs, pushing forward the conversation about the potential risks and the need for robust defenses in the era of advanced AI technologies.

Luke Jones
