
AI Chatbot Face-Off Puts Perplexity First, Followed by ChatGPT and Gemini

Perplexity AI emerged as the top contender, surpassing ChatGPT despite the latter's recent upgrade to GPT-4o, while Google Gemini secured third place.


The Wall Street Journal has evaluated five leading AI chatbots in a series of blind tests to determine how well they handle real-world queries. The goal was to assess their usefulness in practical scenarios rather than against scientific benchmarks. The chatbots were tested on health advice, financial guidance, culinary creativity, professional writing, creative writing, summarization, coding, and speed.

The WSJ compared OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, Anthropic's Claude, and Perplexity to assess their performance across various everyday tasks, highlighting the strengths and weaknesses of each bot. To the testers' surprise, Perplexity AI came out as the overall winner, beating ChatGPT despite the latter's recent upgrade to GPT-4o. Google's Gemini only came in third. Despite using similar models to ChatGPT, Microsoft's Copilot placed fifth overall, behind Claude in fourth position.

Such limited hands-on tests are not representative, but they can hint at the specific strengths of each chatbot. For instance, Copilot excelled in creative writing, where ChatGPT came in last, while Perplexity proved to be the slowest of the five solutions. In coding and speed, ChatGPT seems unbeatable as of now. Here are the WSJ's results in a nutshell.

Perplexity AI

Perplexity AI emerged as the overall winner of the comparison, showcasing exceptional capabilities in several key areas. In professional writing tasks, Perplexity excelled by crafting detailed and contextually appropriate job listings, demonstrating a solid grasp of specific requirements. Its summarization skills were particularly noteworthy: it provided detailed and accurate summaries of various types of content, including text, PDFs, and YouTube video subtitles.
 
In health advice, Perplexity delivered the most comprehensive guidance, considering multiple factors such as financial stability and relationship strength in its responses. However, it should be noted that Perplexity was the slowest among the five chatbots tested, indicating a trade-off between its thoroughness and response speed.

ChatGPT

OpenAI's ChatGPT, while not securing the top spot, showcased strong performance in several areas. It was particularly impressive in culinary creativity, crafting menus and recipes that catered to various dietary restrictions with ease. In coding tasks, ChatGPT proved to be highly capable, providing precise solutions to technical queries related to JavaScript and web app development.
 
Moreover, ChatGPT distinguished itself with its rapid response speed, consistently delivering answers faster than its competitors. Despite these strengths, ChatGPT did not perform as well in creative writing, where it ranked lower than some of the other chatbots.

Google Gemini

Google's Gemini stood out in the realm of financial guidance, offering clear, thorough, and practical advice on a range of topics such as interest rates, retirement savings, and inheritance rules. Its financial insights were well-rounded and actionable, making it a valuable tool for users seeking financial advice.
 
However, Gemini did not perform as well in health advice, where its responses were less detailed and focused primarily on confidence and preparedness without much depth. In the overall evaluation, Gemini secured the third position, indicating strong but not exceptional performance across the board.

Anthropic's Claude

Anthropic's Claude demonstrated some notable strengths but also faced challenges in certain areas. While it struggled with summarizing web content effectively, it showed potential in other domains. Claude's performance in professional writing and creative writing was moderate, neither excelling nor significantly lagging behind the other chatbots.
 
In health advice and financial guidance, Claude provided useful information but lacked the comprehensive detail offered by the top-performing chatbots. Overall, Claude's performance placed it in the fourth position, indicating room for improvement in specific areas.

Microsoft Copilot

Microsoft's Copilot, despite utilizing similar models to ChatGPT, placed fifth overall in the evaluation. Copilot's standout performance was in creative writing, where it produced witty and engaging content, such as a humorous wedding toast featuring the Muppets.
 
However, in other areas, Copilot fell short. Its professional writing tasks were not as detailed or accurate as those of Perplexity, and its financial guidance lacked critical details, making the advice less actionable. In culinary creativity, Copilot also failed to meet specific dietary requirements in its recipes. These shortcomings contributed to its lower overall ranking in the evaluation.

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master's degree in International Economics and is the founder and managing editor of Winbuzzer.com.