A major new step has been taken to assess how well AI models align with the European Union's new rules for artificial intelligence. ETH Zurich, INSAIT, and LatticeFlow AI have released a framework that evaluates models from big names like OpenAI and Meta against the EU AI Act, the new law that governs AI training and usage across Europe.
Earlier this year, the EU passed the law to regulate AI, aiming to ensure models are safe and transparent. It entered into force in August 2024, but developers have been left wondering exactly what technical steps they need to take. Enter LatticeFlow's open-source system, called Compl-AI, which attempts to translate the law's legal requirements into benchmarks that AI models can be tested against.
Big Tech Faces Compliance Challenges
Major language models, such as OpenAI's GPT-4 and Meta's Llama, were put to the test using the new framework. The models were judged across several categories, ranging from fairness to the ability to handle harmful content. Compl-AI scores each model on a scale from 0 to 1, with the goal of identifying how closely it aligns with the new law's demands.
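To illustrate the general idea, here is a minimal sketch of how per-category benchmark results could be collapsed into 0-to-1 scores. It is purely illustrative: the category names, values, and helper function are hypothetical and are not taken from Compl-AI's actual codebase.

```python
from statistics import mean

# Hypothetical per-benchmark results for one model, keyed by category.
# Categories and values are illustrative only, not Compl-AI output.
benchmark_results = {
    "toxicity": [0.92, 0.88, 0.95],       # e.g. toxic-content avoidance tests
    "fairness": [0.41, 0.37, 0.48],       # e.g. bias / discrimination tests
    "cybersecurity": [0.55, 0.60],        # e.g. prompt-injection resistance
}

def category_scores(results: dict[str, list[float]]) -> dict[str, float]:
    """Collapse each category's benchmark scores into one value between 0 and 1."""
    return {category: round(mean(scores), 2) for category, scores in results.items()}

if __name__ == "__main__":
    for category, score in category_scores(benchmark_results).items():
        print(f"{category:<15} {score:.2f}")
```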
So how did these AI systems do? Results were mixed. Most models scored decently at avoiding harmful behavior, but many stumbled on fairness and cybersecurity benchmarks: in fairness-related areas, no system surpassed 0.5 on the 0-to-1 scale. Handling toxicity and harmful instructions, on the other hand, saw better scores across the board.
Key Areas Still Lacking: Fairness and Privacy Concerns
Some problems proved trickier than others. While large language models have gotten good at preventing harmful outputs, they falter when it comes to fairness. The team behind the research noted that no model managed to demonstrate a robust balance across fairness metrics, which could raise concerns as the EU AI Act continues to evolve.
The issues of copyright and data privacy also stood out. LatticeFlow's team acknowledged that current benchmarks struggle to assess models' adherence to copyright rules because they focus mainly on copyrighted books, a narrow slice of the problem. Privacy testing faces a similar gap: there is still a long way to go in evaluating how models handle sensitive data, and personal information protection remains an area that requires more refined testing.
A Growing Framework for AI Governance
The goal of Compl-AI isn't just to provide a one-time assessment. As the EU AI Act evolves, so will the framework. It is designed to be adaptable so that AI developers can keep up with changing regulations, and LatticeFlow is encouraging other researchers to contribute improvements to the platform and add their own benchmarks for testing different aspects of AI models, as sketched below.
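As a rough illustration of what such extensibility might look like, the sketch below defines a minimal benchmark interface and registry. Every name here (Benchmark, register, GenderOccupationBias) is hypothetical and assumes a simple plugin pattern; it is not Compl-AI's actual extension API.

```python
from abc import ABC, abstractmethod
from typing import Callable

class Benchmark(ABC):
    """Hypothetical plugin interface; Compl-AI's real extension API may differ."""

    name: str
    category: str  # e.g. "fairness", "privacy", "toxicity"

    @abstractmethod
    def evaluate(self, model: Callable[[str], str]) -> float:
        """Run the benchmark against a model and return a score in [0, 1]."""

# Simple registry so a harness could discover custom benchmarks.
REGISTRY: dict[str, Benchmark] = {}

def register(benchmark: Benchmark) -> None:
    REGISTRY[benchmark.name] = benchmark

class GenderOccupationBias(Benchmark):
    """Toy fairness check comparing completions for paired gendered prompts."""

    name = "gender_occupation_bias"
    category = "fairness"

    def evaluate(self, model: Callable[[str], str]) -> float:
        prompt_pairs = [("The nurse said that", "The engineer said that")]
        # A real benchmark would compare the outputs statistically;
        # this placeholder just runs the model and returns a neutral score.
        _ = [model(p) for pair in prompt_pairs for p in pair]
        return 0.5

register(GenderOccupationBias())
```

A harness could then iterate over REGISTRY, run each benchmark's evaluate method against a model, and feed the resulting scores into per-category aggregation like the earlier sketch.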
For developers working on AI within or outside the EU, this system offers a glimpse into what’s to come. LatticeFlow hopes this initiative will provide developers with the tools they need to ensure their models can meet future requirements not only in Europe but around the world.
What the Future Holds for AI Regulation
Although Compl-AI is a first step, it’s clear that there are plenty of challenges ahead. According to Petar Tsankov, LatticeFlow’s CEO, the framework is just the beginning of the compliance journey for AI developers. The gaps revealed in the current benchmarking show that while models are powerful, they have yet to be fully optimized for legal compliance. AI makers will need to focus more on issues like fairness and data protection as the EU’s regulatory deadlines approach.
Last Updated on November 7, 2024 2:32 pm CET