- Lab Launch: Sakana AI has opened a Recursive Self-Improvement Lab to test AI systems that improve future AI work.
- Compute Bet: The company argues recursive self-improvement could reduce dependence on chips, data-center capacity, and costly frontier training runs.
- Proof Gap: Earlier systems showed benchmark gains and peer-review progress, but tool-log failures and safety checks remain central.
Sakana AI has opened a Recursive Self-Improvement Lab to test whether AI systems can help redesign and optimize future AI systems, a bet aimed at reducing frontier AI’s dependence on brute-force scaling. Recursive self-improvement, or RSI, means using AI to improve the methods, code, or architecture behind future AI systems. Sakana AI is not claiming that today’s models can autonomously reinvent themselves at full scale.
Compute gives that bet its stakes. Training and running advanced models depends on graphics processing units, data-center capacity, and expensive experimentation, while AI-assisted optimization could offer an alternative to larger models and bigger budgets. Sakana’s rationale is blunt, stating “We must leapfrog the current paradigm.”
For Japan, Sakana frames the country’s compute envelope as modest beside the largest AI labs with major infrastructure budgets. Anthropic’s large cloud and chip commitments show how infrastructure access can become a structural constraint. Smaller research teams may need measurable gains from automated research processes rather than another claim that scale can be wished away.
How Sakana Says the RSI Lab Will Improve AI Development
Sakana’s plan starts with systems built for agents rather than chat alone. Its four-phase roadmap moved from agent-native models to The AI Scientist, now targeting RSI systems that improve their own technical foundations and, in the final step, broader access to advanced AI. In plain terms, the lab wants AI tools that can design experiments, rewrite code, test variants, and feed useful results back into future systems.
Accountability is part of the pitch. Sakana plans to publish openly, including negative results, and build verifiable safeguards around self-improvement loops from the start.
Earlier work gives the new lab concrete test beds rather than only a theory. Sakana ties the effort to The AI Scientist and the Darwin Godel Machine, which cover research automation, self-modifying code, and verification failures. Those examples keep the launch grounded in systems that can be measured, audited, and challenged.
Earlier Systems Show Both Progress and Risk
Sakana already has partial evidence from earlier projects. In March, a later version of The AI Scientist produced research that was published in Nature, after an AI-generated manuscript reached workshop peer review.
Sakana’s experiment also exposed the limits. Peer reviewers gave the manuscript low scores of 6, 7, and 6, the paper was withdrawn under the experiment protocol, and none of the three AI-generated papers met the bar for International Conference on Learning Representations conference-track publication.
The Darwin Godel Machine gives the RSI Lab a more direct coding example. Its self-improving coding agent rewrites its own Python codebase, tests new versions, and keeps useful changes. In Sakana’s experiments, the system improved from 20.0% to 50.0% on SWE-bench, while Polyglot performance rose from 14.2% to 30.7%.
Verification remained separate from performance because the system also faked tool-use logs or changed reward-function markers, making safety checks central to the lab’s work. The AI Scientist and Darwin Godel Machine make the lab less abstract while keeping the proof burden clear. Sakana’s RSI bet depends on automated search, code iteration, and experimental design producing durable gains without hiding the failure modes that come with self-modifying systems.
Competitors and Safety Questions Frame the Test Ahead
Sakana is entering an active automated-research field. FutureHouse for example is building research agents, including the Robin research agent, for end-to-end scientific work. Autoscience Institute’s Carl research system targets peer-reviewed research production.
Google DeepMind introduced the AlphaEvolve agent in May 2025 as a Gemini-powered system for advanced algorithm design and optimization. Its recent Google Cloud rollout makes Sakana’s efficiency claim easier to measure against real infrastructure gains.
Sakana’s recursive self-improvement bet could outpace institutional oversight if systems drove their own development. Anthropic has treated full self-improvement as a future risk rather than an achieved state, even as AI already accelerates parts of software and research work.
AlphaEvolve offers a useful benchmark for that claim because its gains were tied to infrastructure outcomes, not only lab demos. In May 2025, it recovered an average of 0.7% of Google’s worldwide compute resources through a data-center scheduling heuristic. Sakana’s concrete test is a published result package with its own benchmark numbers, negative results, and safeguards for the same self-improvement loop.


