Ilya Sutskever, co-founder and former chief scientist of OpenAI, has moved away from the “bigger is better” philosophy he once advocated, highlighting a change in focus for AI development. Now leading Safe Superintelligence Inc. (SSI), Sutskever underscores that merely scaling models up may no longer be the solution to advancing artificial intelligence.
Scaling and Its Limits: Insights from Sutskever
Once an advocate of expanding model sizes to achieve better results, Sutskever has shifted his position as the industry confronts the diminishing returns of scaling. “The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again. Everyone is looking for the next thing,” Sutskever recently told Reuters, emphasizing that “scaling the right thing matters more now than ever.”
The development of OpenAI’s upcoming Orion model is a prime example. Despite initially high hopes, Orion’s gains are said to be incremental rather than groundbreaking, with the model reportedly struggling in complex areas such as coding. The high cost of training at this scale, far exceeding the $100 million reportedly spent on GPT-4, underscores how unsustainable the strategy is in the long run.
SSI’s Mission and Strategic Focus
Sutskever’s new venture, SSI, is a research-focused lab that aims to build superintelligent systems prioritizing safety over commercial gain. Launched in June 2024, the startup had attracted more than $1 billion in investment by September.
Unlike other AI companies, SSI has drawn a clear boundary by staying out of the competitive product market and focusing solely on safe AI research. According to a note on its website, the company’s sole goal is to build safe superintelligence, a marked shift from Sutskever’s past focus at OpenAI.
From OpenAI to Safe Superintelligence
Sutskever’s departure from OpenAI in May 2024 followed months of internal tension that peaked in late 2023. Together with CTO Mira Murati, who has since left OpenAI as well, he raised concerns about CEO Sam Altman’s transparency and direction, culminating in the board’s attempted removal of Altman.
The move backfired, however, in the face of resistance from Microsoft, OpenAI’s largest investor, and from the overwhelming majority of staff. Altman was reinstated, and the ensuing board shake-up included Sutskever’s exit.
Jakub Pachocki, a seasoned OpenAI researcher known for his work on game-playing AI and reasoning models, took over as chief scientist after Sutskever’s departure. Pachocki has since steered OpenAI toward more nuanced strategies, such as post-training optimization and test-time compute.
Synthetic Data as a Solution
With data scarcity an increasing concern, some experts predict that high-quality training datasets could be exhausted by 2026. To mitigate this, OpenAI and other industry leaders have turned to synthetic data generation.
By using advanced models to generate datasets that mimic real-world language, developers can keep training when genuine data runs short. Nvidia has adopted the same approach, releasing Nemotron-4 340B, a family of models designed to produce synthetic training data.
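As a rough illustration of the idea, the core loop is simple: prompt a capable “teacher” model for new examples, then filter the output before training on it. The sketch below assumes the Hugging Face transformers library; the model name, seed topics, and prompt template are placeholders for illustration, not details of any system mentioned above.

    from transformers import pipeline

    # Prompt a "teacher" model for candidate training examples.
    # "gpt2" is only a placeholder; a real pipeline would use a far
    # stronger instruction-tuned checkpoint.
    generator = pipeline("text-generation", model="gpt2")

    seed_topics = ["summarizing a news article", "explaining a physics concept"]
    prompt_template = (
        "Write one realistic user request about {topic}, "
        "followed by a helpful answer.\n"
    )

    synthetic_examples = []
    for topic in seed_topics:
        outputs = generator(
            prompt_template.format(topic=topic),
            max_new_tokens=150,
            do_sample=True,          # sampling adds diversity to the dataset
            temperature=0.9,
            num_return_sequences=3,  # several candidates per seed topic
        )
        # Each continuation is one candidate example; real pipelines
        # score, filter, and deduplicate these before training on them.
        synthetic_examples.extend(o["generated_text"] for o in outputs)

    print(f"collected {len(synthetic_examples)} candidate examples")

The filtering step matters as much as the generation: unfiltered synthetic text can amplify a model’s own errors, which is why production pipelines typically add quality scoring and deduplication before any training run.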
Adopting Test-Time Compute and Optimized Techniques
To address the limits of scaling, AI firms, including OpenAI, have also shifted toward techniques like test-time compute. Rather than fixing all of a model’s capability at training time, this method lets a model generate and evaluate multiple candidate solutions during inference and select the most promising one.
This approach contrasts with the one-shot response generation of traditional models, leading to better problem-solving abilities. OpenAI’s o1 model and projects at Anthropic and Google reflect this trend, showcasing a broader industry movement toward more efficient model use.
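One simple form of this idea is best-of-N sampling: draw several candidate answers and keep the one a scoring function rates highest. The sketch below is a minimal, library-free illustration of that pattern; generate_candidate and score_candidate are hypothetical stand-ins for a sampled model call and a learned verifier, not the mechanism o1 actually uses.

    import random

    def generate_candidate(prompt: str) -> str:
        # Hypothetical stand-in for one sampled model completion.
        return f"candidate answer {random.randint(0, 9)} for: {prompt}"

    def score_candidate(prompt: str, answer: str) -> float:
        # Hypothetical stand-in for a verifier; real systems might run
        # unit tests (for code) or apply a trained reward model.
        return random.random()

    def best_of_n(prompt: str, n: int = 8) -> str:
        # Spend extra inference compute: sample n answers, keep the best.
        candidates = [generate_candidate(prompt) for _ in range(n)]
        return max(candidates, key=lambda a: score_candidate(prompt, a))

    print(best_of_n("Reverse a string in Python."))

The trade-off is explicit in the parameter n: a larger n costs more inference compute but gives the scorer more chances to find a strong answer, which is why this family of techniques is described as shifting compute from training time to test time.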
The industry is adapting, with smaller, specialized models gaining attention as an alternative to large-scale systems. Anthropic’s new Claude 3.5 Haiku promises cost-effective text processing and takes on OpenAI’s GPT-4o Mini with competitive pricing and advanced features. Meta, meanwhile, has launched compact Llama models that bring efficient on-device processing to smartphones and other small devices.