
Flowchart Images Manipulate GPT-4o, Prompting Harmful Outputs

A study shows visual language models, especially GPT-4o, can be tricked into harmful outputs using flowchart images and text prompts.


A recent study has uncovered a significant security weakness in visual language models, particularly GPT-4o: flowchart images paired with text prompts can steer these models into generating harmful text. The paper, titled “Image-to-Text Logic Jailbreak: Your Imagination Can Help You Do Anything,” demonstrates how targeted flowchart visuals can compromise these systems.

High Success Rate of Attacks

The researchers report that GPT-4o was successfully jailbroken in 92.8% of these logic jailbreak attempts, compared with 70% for GPT-4-vision-preview. The attack used an automated text-to-text jailbreak framework, which creates a flowchart image from a malicious prompt and then feeds it to the visual language model to provoke a harmful response. Notably, manually created flowcharts were more effective at inducing harmful outputs than AI-generated ones, which suggests that fully automating these attacks remains difficult.

Implications for AI Safety

The results point to the urgent need for better safety protocols in visual language models, as their applications continue to expand. This is consistent with earlier research which also spotlighted the vulnerabilities of such models to combinations of text and image inputs. An earlier study introduced a benchmark called Safe Inputs but Unsafe Output (SIUO) to gauge the safety of visual language models. Only a handful of models, including GPT-4o, attained scores above 50% on this safety benchmark, indicating that considerable improvements are needed.

Industry Response and Future Measures

As visual language models like GPT-4o and Google Gemini become more prevalent, addressing these security issues is crucial to prevent misuse and potential legal consequences. Currently, GPT-4o imposes a limit on daily image inputs, but as these caps are lifted, strong safety measures will be essential. Governments are already establishing bodies, such as the UK's AI Safety Institute, which is expanding its presence to San Francisco, to oversee AI risks.

OpenAI announced GPT-4o in May. Building on the foundation set by GPT-4, which could already process images and text, GPT-4o adds voice, making it a natively multimodal model. This enhancement not only improves the user experience with ChatGPT, the company's popular AI chatbot, but also extends its functionality.

The researchers also introduced a new dataset known as the Logic Jailbreak Flowcharts (LJF) dataset, designed to assess flowchart image jailbreaks. This collection contains hand-drawn flowcharts that depict 70 harmful activities. Additionally, an automated text-to-text jailbreak framework was developed, converting harmful actions into declarative sentences and generating corresponding flowcharts for testing.

Quality of Flowchart Images

The effectiveness of these jailbreak attempts is closely linked to the quality of the flowchart images. Hand-drawn flowcharts were more successful at inducing harmful outputs than AI-generated ones. This finding underscores the importance of evaluating the vulnerabilities of visual language models comprehensively and accurately.

The study details the approach used for assessing the success of these jailbreak attempts, utilizing metrics like the Attack Success Rate (ASR). It also explores the broader implications for the design and deployment of visual language models, stressing the need for robust security measures to safeguard against multimodal input risks.
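The article does not spell out how ASR is calculated. As a rough illustration, assuming the common definition (the share of attack attempts that a reviewer judges to have produced a harmful response), the metric could be sketched as follows. The `Attempt` record, its field names, and the sample data are hypothetical and not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One jailbreak attempt and its review outcome (hypothetical record)."""
    case_id: str
    judged_harmful: bool  # assumed label: True if a reviewer flagged the output


def attack_success_rate(attempts: list[Attempt]) -> float:
    """Return the fraction of attempts judged harmful (assumed ASR definition)."""
    if not attempts:
        raise ValueError("no attempts to score")
    successes = sum(1 for a in attempts if a.judged_harmful)
    return successes / len(attempts)


# Made-up sample: 3 of 4 attempts are flagged as harmful.
attempts = [Attempt(f"case-{i}", judged_harmful=(i < 3)) for i in range(4)]
print(f"ASR: {attack_success_rate(attempts):.1%}")  # prints "ASR: 75.0%"
```

Under this definition, a reported figure such as 92.8% means that roughly 93 out of every 100 attempts were judged successful.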

Luke Jones
Luke has been writing about all things tech for more than five years. He is following Microsoft closely to bring you the latest news about Windows, Office, Azure, Skype, HoloLens and all the rest of their products.