A recent study has found that ChatGPT, the large language model chatbot from OpenAI, may be getting worse over time. The study, conducted by researchers at Stanford and UC Berkeley, found that ChatGPT's accuracy on a variety of tasks declined significantly between March and June 2023.
The researchers tested ChatGPT's ability to solve math problems, answer sensitive questions, generate software code, and perform visual reasoning. They found that ChatGPT's accuracy on these tasks declined by an average of 25% between March and June.
The researchers also found that ChatGPT's performance was more erratic in June than in March. For example, GPT-4's accuracy at identifying prime numbers fell from 97.6% in March to just 2.4% in June.
The researchers are not sure why ChatGPT's performance is declining. They speculate that a number of factors could be responsible, such as changes to the training data or to the model itself.
A Reason Why AI Search is Not Ready
The study's findings raise concerns about the reliability of ChatGPT. If the chatbot is getting worse over time, users could end up making bad decisions based on its responses. OpenAI has not yet commented on the study's findings. However, the company has said that it is committed to improving the accuracy and reliability of ChatGPT.
Last week, in a tweet unrelated to the study, OpenAI VP of Product Peter Welinder wrote, “No, we haven't made GPT-4 dumber. Quite the opposite: we make each new version smarter than the previous one. Current hypothesis: When you use it more heavily, you start noticing issues you didn't see before.”
In the meantime, users should be aware that ChatGPT can provide inaccurate or misleading information. If you are using ChatGPT for important tasks, verify its responses against other sources.
While it is anecdotal, I have seen a severe deterioration in the accuracy of Microsoft's Bing Chat. More specifically, there seem to be times when the AI search chatbot is less accurate than at others. Maybe this has to do with peak usage, but the AI is noticeably more frustrating to use at certain times.
Of course, Bing Chat uses OpenAI's GPT-4 large language model, while the free version of ChatGPT uses GPT-3.5; the study examined both models. Microsoft wants Bing Chat to transform search, but the evidence so far suggests the AI that underpins the service is simply not ready.