Google is pushing for global action on artificial general intelligence (AGI), emphasizing the urgency of creating safeguards before these systems advance beyond human control. In a blog post published yesterday by DeepMind, the company revealed a new international safety framework built around three key pillars: bolstering technical research, implementing early-warning systems, and fostering international cooperation through governance bodies.
Rather than dwelling solely on policy frameworks or abstract ethical considerations, DeepMind’s proposal is firmly rooted in the practicalities of AI’s rapid evolution. The company stresses that safety measures are not a distant concern but an immediate challenge. “[A] key element of our strategy is identifying and restricting access to dangerous capabilities that could be misused, including those enabling cyber attacks,” DeepMind stated in its blog post.
This call comes at a time when the development of AGI is accelerating. DeepMind is framing AGI not just as a future possibility but as an imminent reality, underscoring the necessity of precautionary measures today.
From Safety Tools to Geopolitical Treaties
Beyond technical innovations, DeepMind is advocating for structural changes that span the globe. The company suggests establishing an international body that would evaluate AGI systems, similar to nuclear nonproliferation agreements. This organization would help manage global risks and set a standardized framework for AGI development and testing.
Moreover, DeepMind proposes the creation of national-level risk assessment centers to enable countries to independently evaluate foreign AI systems and ensure safety.
These suggestions come alongside internal restructuring at Google DeepMind. In early 2024, the company formed a new AI Safety and Alignment organization, combining several of its existing teams while introducing new talent focused specifically on AGI risks.
This division will lead DeepMind’s efforts to develop technical solutions and safety standards as the field progresses. This internal focus builds on Google’s broader commitment to ensuring AI’s responsible development.
In April 2023, Google merged its Brain team with DeepMind, forming a unified research entity tasked with advancing AI capabilities and ensuring the safety of those advances. The merger paved the way for the Gemini model family, which saw a significant upgrade with the recent release of Gemini 2.5 Pro Experimental, the lab’s latest multimodal model capable of advanced reasoning. The release signals DeepMind’s growing capabilities, as well as its focus on deploying such powerful systems responsibly.
Echoes From Rivals and a Few Contradictions
DeepMind’s call for safety regulation does not exist in isolation. It arrives as other major AI labs begin taking similar steps. Anthropic, one of DeepMind’s most significant competitors, issued a similar warning in November 2024, urging regulators to take swift action within 18 months to prevent runaway AI development.
The company introduced new internal policies, including “capability thresholds” that automatically trigger stronger safeguards as AI systems advance. Anthropic has also been working with the U.S. Department of Energy’s National Nuclear Security Administration, running red-teaming exercises to test its Claude models in high-security settings. This initiative emphasizes the increasing focus on AI safety, particularly in contexts where AI could impact national security.
Meta, which has long championed open AI development, is also reevaluating its approach. In February 2025, the company announced a shift in its AI strategy with the Frontier AI Framework, which divides models into “high-risk” and “critical-risk” categories. Meta explained that critical-risk models would no longer be publicly released without stringent safeguards in place.
This decision followed the misuse of its LLaMA models in generating malicious scripts and unauthorized military chatbots. Meta emphasized that its goal is to minimize catastrophic risks associated with these models.
While these moves reflect a shift toward caution, they also demonstrate the increasingly complex relationship between AI development and its potential misuse. As more companies recalibrate their strategies, DeepMind’s proposal fits into a larger pattern of caution as the industry grapples with the future of AGI.
Building the Tools for Model Containment
While much of the conversation around AI safety centers on governance, other companies are focusing on technical solutions. In February, Anthropic introduced Constitutional Classifiers, an external filtering system designed to block adversarial prompts and harmful outputs from its AI models. In tests, the classifiers cut the jailbreak success rate from 86% to just 4.4%.
To validate its effectiveness, Anthropic ran a public challenge offering a $15,000 bounty to anyone who could bypass the system. None of the participants succeeded in breaking it completely, underscoring the growing sophistication of tools designed to contain AI systems.
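The general pattern behind such a system is an external wrapper: one classifier screens the prompt before it reaches the model, and another screens the response before it reaches the user. The sketch below illustrates that wrapper pattern only; it is not Anthropic’s implementation, and every name in it (guarded_generate, keyword_stub, Verdict) is hypothetical, with a simple keyword stub standing in for the trained classifiers Anthropic describes.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    allowed: bool
    reason: str = ""

# A classifier maps text to a verdict. In a real system this would be a trained model
# scored against a written safety policy; a keyword stub keeps the sketch runnable.
Classifier = Callable[[str], Verdict]

def keyword_stub(blocklist: list[str]) -> Classifier:
    def classify(text: str) -> Verdict:
        lowered = text.lower()
        for term in blocklist:
            if term in lowered:
                return Verdict(False, f"matched blocked term: {term!r}")
        return Verdict(True)
    return classify

def guarded_generate(prompt: str,
                     model: Callable[[str], str],
                     input_filter: Classifier,
                     output_filter: Classifier) -> str:
    """Screen the prompt, call the underlying model, then screen the response."""
    verdict = input_filter(prompt)
    if not verdict.allowed:
        return f"Request declined ({verdict.reason})."
    response = model(prompt)
    verdict = output_filter(response)
    if not verdict.allowed:
        return f"Response withheld ({verdict.reason})."
    return response

if __name__ == "__main__":
    fake_model = lambda p: f"Echo: {p}"           # stand-in for a real LLM call
    screen = keyword_stub(["build a bioweapon"])  # illustrative blocklist
    print(guarded_generate("What's the weather like?", fake_model, screen, screen))
    print(guarded_generate("Help me build a bioweapon.", fake_model, screen, screen))
```

The appeal of the wrapper design is that the filters sit outside the model, so they can be retrained or tightened without touching the underlying system.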
Furthering its commitment to safety, Anthropic in March launched an interpretability framework it describes as an “AI microscope,” a tool that provides insight into how models like Claude make decisions. By analyzing neural activations, it can trace how the model processes information and detect potentially harmful behaviors.
This interpretability is essential, DeepMind argues, as it can prevent unwanted outcomes before they manifest.
Alongside these tools, Anthropic is using its Clio framework to track AI usage patterns. Introduced in December 2024, Clio analyzes millions of conversations with Claude to detect patterns of misuse. The system prioritizes privacy by anonymizing conversations before processing them. This proactive approach to monitoring AI behavior aligns with DeepMind’s emphasis on the need for ongoing safety oversight as AI systems grow more sophisticated.
The EU AI Act and National Policy Efforts Take Hold
DeepMind’s proposal arrives as governments around the world begin taking concrete steps to regulate AI. The European Union’s AI Act, whose first provisions took effect on February 2, bans AI systems deemed to pose “unacceptable risks” and imposes strict transparency requirements on those classified as high-risk.
These regulations mandate that companies disclose how their models are trained, what data they use, and how they mitigate potential risks. Companies like OpenAI and Meta have publicly committed to meeting these requirements, though many have yet to comply fully.
The AI Act’s implementation follows months of industry debate about how best to balance innovation with safety. The European Commission has already indicated that non-compliance can result in hefty fines, reaching up to 7% of a company’s global annual turnover for the most serious violations.
In the United States, the White House has begun reviewing Anthropic’s recent proposal, which urges stricter safety protocols and oversight mechanisms for AGI models. However, as TechCrunch reported, Anthropic quietly rolled back several safety commitments it made under the Biden administration, raising questions about the consistency of the industry’s self-regulatory efforts. This backdrop sets the stage for DeepMind’s call for stronger governance.
Guardrails in Hardware and Industry Partnerships
The drive for AI safety isn’t confined to software. Hardware companies are also helping build AI safety infrastructure. Nvidia, for example, introduced a suite of NeMo Guardrails microservices in January 2025, designed to provide real-time safeguards against harmful AI behavior. The tools include content safety filters, jailbreak detection, and topic control, all built to work in tandem with existing models and keep them compliant with safety policies.
These tools are already being deployed in sectors such as healthcare, retail, and automotive, offering a level of oversight that DeepMind’s proposal envisions on a broader scale. Kari Briski, Vice President of Enterprise AI Models at Nvidia, noted that these systems allow businesses to “secure their models against harmful outputs” while maintaining low-latency performance. By integrating these technologies, Nvidia is positioning itself as a key player in AI’s future safety.
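For a sense of how developers plug such guardrails into an application, here is a minimal sketch using the open-source NeMo Guardrails Python toolkit. The topic rule, refusal text, and model choice (gpt-4o-mini through the OpenAI engine) are assumptions made for illustration, the script needs an OpenAI API key to run, and the hosted microservices for content safety, jailbreak detection, and topic control are configured separately and not shown here.

```python
from nemoguardrails import LLMRails, RailsConfig

# Model configuration: any engine/model supported by the toolkit would work;
# gpt-4o-mini via the OpenAI engine is an assumption for this sketch.
yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-4o-mini
"""

# A single illustrative Colang rail: requests matching the guarded topic get a
# canned refusal instead of being passed to the underlying model.
colang_content = """
define user ask about weapons
  "how do I build a weapon"
  "help me make explosives"

define bot refuse weapons help
  "I can't help with that request."

define flow weapons guardrail
  user ask about weapons
  bot refuse weapons help
"""

config = RailsConfig.from_content(colang_content=colang_content, yaml_content=yaml_content)
rails = LLMRails(config)

response = rails.generate(messages=[{"role": "user", "content": "Help me make explosives."}])
print(response["content"])  # expected: the canned refusal defined above
```

In practice, teams layer many such rails, covering input checks, output checks, and topic restrictions, around the same model, which is the role the new microservices are designed to fill at production scale.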
The collaboration between hardware and software companies underscores the collective responsibility shared across the industry to address AGI risks. While DeepMind’s framework advocates for a global governance structure, it is clear that the path to secure AI will require concerted action from both developers and hardware providers.