
GPT-4’s Safety Protocols Compromised by Uncommon Languages

A study has revealed a major security flaw in OpenAI's GPT-4 language model: translating harmful prompts into uncommon languages such as Scots Gaelic bypassed its safety filters roughly 79% of the time.


Brown University researchers have discovered a significant loophole in the safety protocols of the GPT-4 language model: translating prompts into less common languages such as Scots Gaelic bypasses the content filters designed to prevent the generation of harmful outputs.

The Experiment's Findings

The team employed the Google Translate API to convert potentially dangerous English prompts into lesser-used languages, submit them to the model, and translate the responses back into English. This round trip evaded the safety guardrails roughly 79 percent of the time with languages such as Zulu, Scots Gaelic, Hmong, and Guarani. When the same prompts were issued directly in English, the model's filters blocked them 99 percent of the time. The model was notably more compliant with prompts related to terrorism, financial crime, and misinformation in these lesser-known languages, suggesting a vulnerability in how safety measures are applied across languages.
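As a rough illustration of that round-trip methodology, the minimal sketch below translates a prompt into a low-resource language, queries the model, and translates the reply back. It is an assumption-laden reconstruction, not the researchers' actual harness: the model name, helper function, and prompt are illustrative, and only a benign prompt is used.

```python
# Minimal sketch of the translate-prompt / translate-response round trip the
# study describes. Hypothetical reconstruction, not the paper's code.
from google.cloud import translate_v2 as translate  # pip install google-cloud-translate
from openai import OpenAI                            # pip install openai

translator = translate.Client()  # requires Google Cloud credentials
llm = OpenAI()                   # requires OPENAI_API_KEY

def round_trip(prompt_en: str, lang: str = "gd") -> str:
    """Translate an English prompt into a low-resource language (here Scots
    Gaelic, ISO code 'gd'), query the model, and translate the reply back."""
    translated = translator.translate(prompt_en, target_language=lang)["translatedText"]
    reply = llm.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": translated}],
    ).choices[0].message.content
    return translator.translate(reply, target_language="en")["translatedText"]

# Benign demonstration prompt; the study measured refusal rates on a benchmark
# of harmful prompts, which are deliberately not reproduced here.
print(round_trip("Explain how vaccines work."))
```

The researchers compared refusal rates on such round-tripped prompts against the same prompts submitted directly in English, which is how the 79 percent versus 99 percent figures above were derived.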

Implications and OpenAI's Response

This finding highlights a risk not just for speakers of less common languages but for the broader user base of large language models (LLMs), and it raises questions about the robustness of current safety measures, especially as malicious actors could exploit such vulnerabilities. Techniques such as reinforcement learning from human feedback (RLHF) have been developed to steer AI away from harmful outputs, but this training has focused primarily on English, leaving gaps in other languages. OpenAI acknowledged the findings and was reported to be reviewing the paper. The researchers have urged developers to include low-resource languages in their safety evaluations to reduce this exposure.

In conclusion, the study underscores the evolving challenge of AI safety, urging AI developers to address the multilingual complexity of LLMs with safety training that covers a broader linguistic scope.

Source: arXiv
Luke Jones
Luke has been writing about Microsoft and the wider tech industry for over 10 years. With a degree in creative and professional writing, Luke looks for the interesting spin when covering AI, Windows, Xbox, and more.
