Microsoft researchers have recently published a startling paper asserting that their new artificial intelligence (AI) system displays an ability to understand the world in a manner similar to humans, inciting a heated debate within the tech industry.
Last year, Microsoft's computer scientists put their AI system to the test with a real-world problem: stacking a book, nine eggs, a laptop, a bottle, and a nail in a stable manner. The solution proposed by the AI was deemed ingenious, prompting the researchers to ponder if they were observing a novel form of intelligence.
The 155-page research paper, titled “Sparks of Artificial General Intelligence”, argues that this AI system may be the first step towards achieving artificial general intelligence (AGI), a machine with the capability to perform any task that the human brain can do. This claim has stirred up controversy, with critics suggesting that the researchers may be overreaching.
Researchers Surprised by the Power of GPT-4
Peter Lee, who leads research at Microsoft, admitted to initially feeling skeptical about the AI's abilities. “I started off being very skeptical — and that evolved into a sense of frustration, annoyance, maybe even fear”, Lee told the New York Times.
The bold claim made by Microsoft, the first major tech firm to publish such a paper, has reignited one of the most contentious debates in the tech world: Is the industry on the verge of creating something akin to human intelligence?
The AI system used by Microsoft researchers, OpenAI's GPT-4, is widely considered the most powerful of its kind. Microsoft has invested $13 billion in OpenAI and is a close partner of the company. GPT-4 also powers Bing Chat and Bing Compose, Microsoft's adaptations of the model for Bing, both already available via the Microsoft Edge web browser.
Sébastien Bubeck, the lead author of the Microsoft AGI paper, documented complex behaviors exhibited by the system over several months. “All of the things I thought it wouldn't be able to do? It was certainly able to do many of them — if not most of them”, Bubeck told the New York Times. In their paper, the authors conclude:
“The central claim of our work is that GPT-4 attains a form of general intelligence, indeed showing sparks of artificial general intelligence. This is demonstrated by its core mental capabilities (such as reasoning, creativity, and deduction), its range of topics on which it has gained expertise (such as literature, medicine, and coding), and the variety of tasks it is able to perform (e.g., playing games, using tools, explaining itself, …).”
However, not everyone is convinced. Critics such as Maarten Sap, a researcher and professor at Carnegie Mellon University, argue that claims of AGI can be reputational hazards. Sap even suggested that the “Sparks of AGI” paper could be seen as a public relations pitch disguised as a research paper.
Microsoft researchers admit that the AI's behavior can be inconsistent at times. Ece Kamar, a Microsoft researcher, said, “These behaviors are not always consistent”.
While the paper has sparked intrigue among the research community, the debate about whether AI systems like GPT-4 demonstrate genuine intelligence or merely mimic human reasoning continues.
Where GPT-4 and ChatGPT Currently Fail
The authors also identified several areas in which GPT-4 falls short of more widely accepted definitions of artificial general intelligence (AGI):
Confidence Calibration: The model often confidently states incorrect or made-up information, referred to as hallucinations. These can lead to errors and mistrust, especially in high-stakes areas like healthcare. Solutions include improving the model's calibration, inserting missing information into prompts, conducting post-hoc checks, and designing user experiences with potential hallucinations in mind.
Long-term Memory: GPT-4 operates in a stateless manner, so it cannot retain new information or keep track of an evolving context, for example when following the plot of a book.
Continual Learning: The model can't update or adapt itself to changing environments. While fine-tuning is possible, it can degrade performance or cause overfitting. As a result, the model may lack knowledge of anything that happened after its latest training cycle.
Personalization: GPT-4 struggles to adapt to specific organizations or individuals, lacking a mechanism to incorporate personalized information into its responses, except through limited and inefficient meta-prompts.
Planning and Conceptual Leaps: The model has difficulty with tasks requiring forward planning or “Eureka” moments, which are typical of human genius.
Transparency, Interpretability, and Consistency: GPT-4's outputs can be inconsistent or made-up, and it struggles to verify its own consistency with training data. Its limitations make it difficult to establish trust or effective collaboration with users.
Cognitive Fallacies and Irrationality: The model may replicate human cognitive biases and statistical fallacies present in its training data, potentially reflecting skewed perspectives.
Sensitivity to Inputs: The model's responses can vary significantly depending on the framing or wording of prompts, suggesting the need for careful engineering of prompts and their sequencing.
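To make the calibration point above more concrete, here is a minimal Python sketch of expected calibration error (ECE), a common way to measure the gap between how confident a model is and how often it is right. This is not taken from the paper: the function name, the bin count, and the example numbers are hypothetical, and the sketch assumes you already have a confidence score between 0 and 1 for each answer, which a given model or API may not expose.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 5,
) -> float:
    """Weighted average gap between stated confidence and actual accuracy.

    Hypothetical helper for illustration only. 0.0 means the stated
    confidence matches the observed accuracy in every bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")
    if not confidences:
        return 0.0

    # Group answers into equal-width confidence bins.
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_confidence = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Each bin contributes in proportion to how many answers it holds.
        ece += (len(bucket) / total) * abs(avg_confidence - accuracy)
    return ece


# Made-up example: four answers stated with 90% confidence, only one correct.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, False, False, False]
)
```

A large value, as in the made-up example, signals the kind of overconfidence the authors describe as hallucination risk; a value near zero would mean the stated confidence can be taken roughly at face value.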
The paper concludes:
“The mechanism behind GPT-4's ability to reason, plan, and create, despite being a simple combination of gradient descent, large-scale transformers, and extensive data, remains a mystery. This curiosity drives further research into the emergence phenomenon in large language models (LLMs).
Current hypotheses suggest the vast and diverse data input forces neural networks to learn generic and useful “neural circuits”. The large model size allows these circuits to specialize and adapt to specific tasks. Other theories propose that the model's size enhances gradient descent's effectiveness or allows for smoother fitting of high-dimensional data.
However, proving these hypotheses for large-scale models is challenging, and it's likely they only partially explain the model's capabilities. Understanding the nature and mechanisms of AI systems like GPT-4 is an urgent, complex challenge. The authors thank OpenAI and colleagues for their contributions and feedback on this work.”