Artificial Intelligence – Overview, Benchmarks, Latest News

AI Model Architectures

AI models are not a monolithic technology; they consist of multiple architectures, each designed for specific types of tasks. While some models excel at recognizing patterns, others specialize in generating content or making autonomous decisions.

Model | Best Use Cases | Advantages | Limitations
Feedforward Networks | Fraud detection, risk assessment, structured data classification | Simple, fast, efficient for small-scale tasks | Cannot handle sequential or complex unstructured data
Recurrent Neural Networks (RNNs) | Speech processing, time-series forecasting | Captures sequential dependencies | Suffers from vanishing gradient problem, inefficient for long sequences
Transformers (LLMs) | Text generation, translation, multimodal AI | High scalability, state-of-the-art performance | Requires vast computational power, black-box decision-making
GANs | AI-generated images, deepfakes, artistic design | Produces highly realistic outputs | Training instability, prone to mode collapse
Diffusion Models | AI art, synthetic image generation | More stable than GANs, superior output quality | Computationally expensive, slow inference speed
Reinforcement Learning | Robotics, autonomous vehicles, game AI | Adapts to dynamic environments, learns from experience | High training cost, lack of generalization outside of trained tasks

AI Model Benchmarks – LLM Leaderboard

The transformer architecture redefined AI by enabling parallel sequence processing, eliminating the bottlenecks of RNNs. Instead of analyzing sequences step-by-step, transformers use self-attention mechanisms to determine relationships between all elements of an input at once.

This breakthrough led to the development of large language models (LLMs), such as GPT-4, Claude, and Google Gemini 1.5, which power today’s most advanced AI applications.
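
The self-attention step described above can be sketched in a few lines. This is a minimal, single-head illustration in plain Python, not how any of the listed models is actually implemented; the function names, the toy input, and the identity projection matrices are all assumptions chosen to keep the example readable.

```python
import math


def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over all positions at once.

    x holds one row per input element; w_q, w_k, w_v are the (hypothetical)
    learned projection matrices for queries, keys, and values.
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # Every element is scored against every other element in one step,
    # rather than walking the sequence position by position as an RNN would.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy input: three elements with two features each (illustrative values only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` shows how strongly one input element attends to every element of the sequence, and `output` is the corresponding weighted mix of value vectors.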

Last updated: Mar 16, 2025

Benchmark stats come from the model providers, if available. For models with optional advanced reasoning, we provide the highest benchmark score achieved.

Organization | Model | Context | Parameters (B) | Input $/M | Output $/M | License | GPQA | MMLU | MMLU Pro | DROP | HumanEval | AIME'24 | SimpleBench | Model
openai o3128,000---Proprietary87.70%----o3
anthropic Claude 3.7 Sonnet200,000-$3.00 $15.00 Proprietary84.80%86.10%---80.00%46.4%Claude 3.7 Sonnet
xai Grok-3128,000---Proprietary84.60%-79.90%--93.30%Grok-3
xai Grok-3 Mini128,000---Proprietary84.60%-78.90%--90.80%Grok-3 Mini
openai o3-mini200,000-$1.10 $4.40 Proprietary79.70%86.90%---86.50%22.8%o3-mini
openai o1-pro128,000---Proprietary79.00%----86.00%o1-pro
openai o1200,000-$15.00 $60.00 Proprietary78.00%91.80%--88.10%83.30%40.1%o1
google Gemini 2.0 Flash Thinking1,000,000---Proprietary74.20%----73.30%30.7%Gemini 2.0 Flash Thinking
openai o1-preview128,000-$15.00 $60.00 Proprietary73.30%90.80%---44.60%41.7%o1-preview
deepseek DeepSeek-R1131,072671$0.55 $2.19 Open71.50%90.80%84.00%92.20%-79.80%30.9%DeepSeek-R1
openai GPT-4.5128,000---Proprietary71.4%90.0%--88.0%36.7%34.5%GPT-4.5
anthropic Claude 3.5 Sonnet200,000-$3.00 $15.00 Proprietary67.20%90.40%77.60%87.10%93.70%16.00%41.4%Claude 3.5 Sonnet
qwen QwQ-32B-Preview32,76832.5$0.15 $0.20 Open65.20%-70.97%--50.00%QwQ-32B-Preview
google Gemini 2.0 Flash1,048,576---Proprietary62.10%-76.40%--35.5%18.9%Gemini 2.0 Flash
openai o1-mini128,000-$3.00 $12.00 Proprietary60.00%85.20%80.30%-92.40%70.00%18.1%o1-mini
deepseek DeepSeek-V3131,072671$0.27 $1.10 Open59.10%88.50%75.90%91.60%-39.2%18.9%DeepSeek-V3
google Gemini 1.5 Pro2,097,152-$2.50 $10.00 Proprietary59.10%85.90%75.80%74.90%84.10%19.3%27.1%Gemini 1.5 Pro
microsoft Phi-416,00014.7$0.07 $0.14 Open56.10%84.80%70.40%75.50%82.60%Phi-4
xai Grok-2128,000-$2.00 $10.00 Proprietary56.00%87.50%75.50%-88.40%22.7%Grok-2
openai GPT-4o128,000-$2.50 $10.00 Proprietary53.60%88.00%74.70%--17.8%GPT-4o
google Gemini 1.5 Flash1,048,576-$0.15 $0.60 Proprietary51.00%78.90%67.30%-74.30%Gemini 1.5 Flash
xai Grok-2 mini128,000---Proprietary51.00%86.20%72.00%-85.70%Grok-2 mini
meta Llama 3.1 405B Instruct128,000405$0.90 $0.90 Open50.70%87.30%73.30%84.80%89.00%23.0%Llama 3.1 405B Instruct
meta Llama 3.3 70B Instruct128,00070$0.20 $0.20 Open50.50%86.00%68.90%-88.40%19.9%Llama 3.3 70B Instruct
anthropic Claude 3 Opus200,000-$15.00 $75.00 Proprietary50.40%86.80%68.50%83.10%84.90%23.5%Claude 3 Opus
qwen Qwen2.5 32B Instruct131,07232.5--Open49.50%83.30%69.00%-88.40%Qwen2.5 32B Instruct
qwen Qwen2.5 72B Instruct131,07272.7$0.35 $0.40 Open49.00%-71.10%-86.60%23.30%Qwen2.5 72B Instruct
openai GPT-4 Turbo128,000-$10.00 $30.00 Proprietary48.00%86.50%-86.00%87.10%GPT-4 Turbo
amazon Nova Pro300,000-$0.80 $3.20 Proprietary46.90%85.90%-85.40%89.00%Nova Pro
meta Llama 3.2 90B Instruct128,00090$0.35 $0.40 Open46.70%86.00%---Llama 3.2 90B Instruct
qwen Qwen2.5 14B Instruct131,07214.7--Open45.50%79.70%63.70%-83.50%Qwen2.5 14B Instruct
mistral Mistral Small 332,00024$0.07 $0.14 Open45.30%-66.30%-84.80%Mistral Small 3
qwen Qwen2 72B Instruct131,07272--Open42.40%82.30%64.40%-86.00%Qwen2 72B Instruct
amazon Nova Lite300,000-$0.06 $0.24 Proprietary42.00%80.50%-80.20%85.40%Nova Lite
meta Llama 3.1 70B Instruct128,00070$0.20 $0.20 Open41.70%83.60%66.40%79.60%80.50%Llama 3.1 70B Instruct
anthropic Claude 3.5 Haiku200,000-$0.10 $0.50 Proprietary41.60%-65.00%83.10%88.10%Claude 3.5 Haiku
anthropic Claude 3 Sonnet200,000-$3.00 $15.00 Proprietary40.40%79.00%56.80%78.90%73.00%Claude 3 Sonnet
openai GPT-4o mini128,000-$0.15 $0.60 Proprietary40.20%82.00%-79.70%87.20%10.7%GPT-4o mini
amazon Nova Micro128,000-$0.04 $0.14 Proprietary40.00%77.60%-79.30%81.10%Nova Micro
google Gemini 1.5 Flash 8B1,048,5768$0.07 $0.30 Proprietary38.40%-58.70%--Gemini 1.5 Flash 8B
ai21 Jamba 1.5 Large256,000398$2.00 $8.00 Open36.90%81.20%53.50%--Jamba 1.5 Large
microsoft Phi-3.5-MoE-instruct128,00060--Open36.80%78.90%54.30%-70.70%Phi-3.5-MoE-instruct
qwen Qwen2.5 7B Instruct131,0727.6$0.30 $0.30 Open36.40%-56.30%-84.80%Qwen2.5 7B Instruct
xai Grok-1.5128,000---Proprietary35.90%81.30%51.00%-74.10%Grok-1.5
openai GPT-432,768-$30.00 $60.00 Proprietary35.70%86.40%-80.90%67.00%25.1%GPT-4
anthropic Claude 3 Haiku200,000-$0.25 $1.25 Proprietary33.30%75.20%-78.40%75.90%Claude 3 Haiku
meta Llama 3.2 11B Instruct128,00010.6$0.06 $0.06 Open32.80%73.00%---Llama 3.2 11B Instruct
meta Llama 3.2 3B Instruct128,0003.2$0.01 $0.02 Open32.80%63.40%---Llama 3.2 3B Instruct
ai21 Jamba 1.5 Mini256,14452$0.20 $0.40 Open32.30%69.70%42.50%--Jamba 1.5 Mini
openai GPT-3.5 Turbo16,385-$0.50 $1.50 Proprietary30.80%69.80%-70.20%68.00%GPT-3.5 Turbo
meta Llama 3.1 8B Instruct131,0728$0.03 $0.03 Open30.40%69.40%48.30%59.50%72.60%Llama 3.1 8B Instruct
microsoft Phi-3.5-mini-instruct128,0003.8$0.10 $0.10 Open30.40%69.00%47.40%-62.80%Phi-3.5-mini-instruct
google Gemini 1.0 Pro32,760-$0.50 $1.50 Proprietary27.90%71.80%---Gemini 1.0 Pro
qwen Qwen2 7B Instruct131,0727.6--Open25.30%70.50%44.10%--Qwen2 7B Instruct
mistral Codestral-22B32,76822.2$0.20 $0.60 Open----81.10%Codestral-22B
cohere Command R+128,000104$0.25 $1.00 Open-75.70%---17.4%Command R+
deepseek DeepSeek-V2.58,192236$0.14 $0.28 Open-80.40%--89.00%DeepSeek-V2.5
google Gemma 2 27B8,19227.2--Open-75.20%--51.80%Gemma 2 27B
google Gemma 2 9B8,1929.2--Open-71.30%--40.20%Gemma 2 9B
xai Grok-1.5V128,000---Proprietary-----Grok-1.5V
moonshotai Kimi-k1.5128,000---Proprietary-87.40%---Kimi-k1.5
nvidia Llama 3.1 Nemotron 70B Instruct128,00070--Open-80.20%---Llama 3.1 Nemotron 70B Instruct
mistral Ministral 8B Instruct128,0008$0.10 $0.10 Open-65.00%--34.80%Ministral 8B Instruct
mistral Mistral Large 2128,000123$2.00 $6.00 Open-84.00%--92.00%22.5%Mistral Large 2
mistral Mistral NeMo Instruct128,00012$0.15 $0.15 Open-68.00%---Mistral NeMo Instruct
mistral Mistral Small32,76822$0.20 $0.60 Open-----Mistral Small
microsoft Phi-3.5-vision-instruct128,0004.2--Open-----Phi-3.5-vision-instruct
mistral Pixtral-12B128,00012.4$0.15 $0.15 Open-69.20%--72.00%Pixtral-12B
mistral Pixtral Large128,000124$2.00 $6.00 Open-----Pixtral Large
qwen QvQ-72B-Preview32,76873.4--Open-----QvQ-72B-Preview
qwen Qwen2.5-Coder 32B Instruct128,00032$0.09 $0.09 Open-75.10%50.40%-92.70%Qwen2.5-Coder 32B Instruct
qwen Qwen2.5-Coder 7B Instruct128,0007--Open-67.60%40.10%-88.40%Qwen2.5-Coder 7B Instruct
qwen Qwen2-VL-72B-Instruct32,76873.4--Open-----Qwen2-VL-72B-Instruct
cohere Command A256,000111$2.50 $10.00 Open-85.00%-----Command A
baidu ERNIE 4.5-----75.00%-79.00%87.00%85.00%ERNIE 4.5
google Gemma 3 1B128,0001--Open19.20%29.90%14.70%-32.00%--Gemma 3 1B
google Gemma 3 4B128,0004--Open30.80%46.90%43.60%----Gemma 3 4B
google Gemma 3 12B128,00012--Open40.90%65.20%60.60%----Gemma 3 12B
google Gemma 3 27B128,00027--Open42.40%72.1%67.50%-89.00%--Gemma 3 27B
qwen Qwen2.5 Max32,768-59.00%-76.00%-93.00%23.00%-Qwen2.5 Max
qwen QwQ 32B131,00032.8Open59.00%-76.00%98.00%78.00%-QwQ 32B
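
The Input $/M and Output $/M columns can be turned into a rough per-request estimate. The sketch below assumes those columns mean US dollars per million tokens, which the table does not spell out, and uses a hypothetical request of 10,000 input tokens and 2,000 output tokens. The helper name `request_cost` is illustrative; the prices are copied from the rows above.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate the cost of one request in dollars.

    Assumes prices are quoted per one million tokens (an assumption
    about the table's "$/M" columns).
    """
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# (input $/M, output $/M) as listed in the leaderboard rows above.
prices = {
    "Claude 3.7 Sonnet": (3.00, 15.00),
    "o3-mini": (1.10, 4.40),
    "DeepSeek-R1": (0.55, 2.19),
}

# Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
costs = {
    model: request_cost(10_000, 2_000, price_in, price_out)
    for model, (price_in, price_out) in prices.items()
}

for model, cost in costs.items():
    print(f"{model}: ${cost:.5f}")
```

Under these assumptions, output tokens can make up a large share of the bill even when they are a small share of the traffic, so the two price columns are worth reading together.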

Tencent Releases its Hunyuan T1 AI Reasoning Model, Beating DeepSeek R1, GPT-4.5, o1 Across Multiple Benchmarks

Tencent has positioned Hunyuan T1 as a reasoning-optimized model, with benchmark results confirming its strengths in structured logic and math accuracy.

Apple’s AI Ambitions Face Legal Heat and Technical Setbacks

Apple is facing a lawsuit over Siri’s delayed AI upgrade, as users claim the company misled them about Apple Intelligence features promised at launch.

Perplexity Wants to Buy and Reinvent TikTok—But Can It Really Pull It Off?

As ByteDance faces pressure to divest TikTok, Perplexity has launched a bid offering transparency, U.S. data hosting, and civic oversight.

Cloudflare Deploys AI Labyrinth to Exhaust Unauthorized AI Crawling Bots

Cloudflare has unveiled AI Labyrinth, a system that misleads unauthorized AI crawling bots by trapping them in auto-generated content mazes.

China’s Tencent Cuts GPU Demand by Turning to DeepSeek’s Efficient AI Models

Tencent has reshaped its AI stack by using DeepSeek models, achieving more with fewer GPUs and responding to growing pressure on chip supply chains.

New Weather AI Promises Faster, Low-Cost Forecasting Without Supercomputers

Aardvark Weather outperforms traditional models, using AI to provide hyper-efficient forecasting without the need for expensive computing infrastructure.

Gmail Introduces AI-Powered Search to Find Important Emails Faster

Gmail’s new AI-driven search upgrade prioritizes relevant emails using machine learning, making it easier for users to find important messages quickly.

Microsoft Passed on $12 Billion CoreWeave Option, Deferring AI Infrastructure Deal to OpenAI

Microsoft has passed on a $12B option with CoreWeave, allowing OpenAI to secure an $11.9B contract for GPU cloud services ahead of CoreWeave’s IPO.

Anthropic Expands Claude with Web Search, Challenging AI-Powered Search Rivals

Anthropic introduces web search for Claude, making it more competitive with ChatGPT, Bing AI, and Google’s AI Overviews.

OpenAI Enhances AI Speech Models with More Realistic Voices and Improved Transcription

OpenAI has upgraded its AI speech models, enhancing transcription accuracy and improving voice realism, raising both innovation and ethical concerns.

Adobe Expands AI Offering with Automated PowerPoint Creation Tool

Adobe has introduced Project Slide Wow, an AI-powered tool that transforms raw customer data into PowerPoint presentations with automated formatting.

Google Expands NotebookLM with Mind Maps To Visually Manage Complex Topics

NotebookLM’s new Mind Maps feature helps users organize study materials into interactive visual diagrams, enhancing AI-powered research and learning.

This New AI Scaling Method Challenges Scaling Laws — But Can It Deliver?

A novel approach allows AI models to improve performance by generating multiple responses and self-verifying the best one, challenging traditional scaling methods.

Meta AI Launches Chatbots in Europe After Year-Long Delay, But With Privacy Restrictions

Meta AI has launched in Europe but lacks key U.S. features due to GDPR rules, including AI image generation and personalized content.

Jensen Huang: Nvidia to Invest “Several Hundred Billion” in U.S. Chip Production Over Four Years

Nvidia plans to invest several hundred billion dollars in U.S. chip production over four years, aiming to reduce reliance on overseas suppliers amid AI demand and geopolitical risks.

OpenAI Opens API Access for Its o1-Pro Model with a Hefty Price Tag

OpenAI’s o1-Pro delivers better structured reasoning but costs 10x more than o1. Is it worth it for businesses?

Hugging Face Releases HuggingSnap iOS App for Visual Assistance With On-Device Processing

With HuggingSnap, Hugging Face has combined the smolVLM2 model and on-device AI to offer instant visual analysis and descriptions.

Google’s AI Overviews Now Link to More Google Searches Instead of Websites

Google's AI Overview links have shifted to favor internal search results, raising questions about the feature's impact on external site traffic.

NVIDIA GTC 2025 Wrap-Up: Blackwell Ultra and Vera Rubin, AI PCs, AI Reasoning Models and Enterprise Solutions

At GTC 2025, NVIDIA unveiled Blackwell Ultra, Vera Rubin, AI Factories, Llama Nemotron models, DGX AI supercomputers, and partnerships with

NVIDIA Unveils Llama Nemotron Open Reasoning AI Models

New Llama Nemotron models from NVIDIA introduce toggleable reasoning, allowing AI agents to independently process complex tasks while optimizing computational efficiency.

NVIDIA Expands AI Computing with DGX Spark and DGX Station Desktop Supercomputers

NVIDIA has launched DGX Spark and DGX Station, two AI supercomputers designed for personal use, bringing high-performance AI to desktops for developers.

NVIDIA Unveils Blackwell Ultra AI Factory Platform at GTC 2025

NVIDIA's Blackwell Ultra AI platform dramatically reduces inference latency, accelerating complex AI model responses.

NVIDIA Unveils Blackwell Ultra and Vera Rubin AI Superchips

The Blackwell Ultra GB300, set for 2025, offers 1.5× the FP4 compute of its predecessor, while Vera Rubin promises next-gen AI processing by 2026.

Adobe Expands AI Suite with New Agent Orchestrator and 10 Experience Agents

Adobe's latest AI innovations, announced at Summit 2025, are set to optimize marketing strategies and improve customer interactions.

Stability AI’s New Stable Virtual Camera Converts 2D Images into 3D Videos

Stability AI has launched Stable Virtual Camera, an AI model that converts still images into immersive 3D videos with realistic depth and perspective.

Google Gemini Adds Canvas for Writing & Coding and Audio Overview Feature

Google Gemini now features Canvas, an AI-powered workspace for writing and coding, alongside Audio Overview, a tool for listening to summarized content.

Mistral Launches Small 3.1 Language Models, Taking On Gemma 3, GPT-4o Mini, and Claude 3.5 Haiku

Mistral AI has launched its Small 3.1 model, positioning it as an alternative to OpenAI’s GPT-4o Mini and other compact models, with local processing capabilities and reduced costs.

Microsoft March 2025 Updates Unintentionally Remove Copilot from Windows Systems

The latest Windows updates have unintentionally uninstalled Copilot from some systems.

European Tech Leaders Urge Digital EU Sovereignty In Letter To European Commission

A coalition of European tech leaders has urged the EU to invest in local digital infrastructure to reduce dependency on foreign providers and ensure digital sovereignty.

Zoom’s AI Companion Evolves with Agentic Skills to Streamline Workflows

Zoom has enhanced its AI Companion with agentic skills, enabling autonomous task management and streamlined business workflows.

