Artificial Intelligence – Overview, Benchmarks, Latest News
AI Model Architectures
AI is not a monolithic technology: models are built on a range of architectures, each designed for specific types of tasks. Some excel at recognizing patterns, while others specialize in generating content or making autonomous decisions.
Architecture | Best Use Cases | Advantages | Limitations |
---|---|---|---|
Feedforward Networks | Fraud detection, risk assessment, structured data classification | Simple, fast, efficient for small-scale tasks | Cannot handle sequential or complex unstructured data |
Recurrent Neural Networks (RNNs) | Speech processing, time-series forecasting | Captures sequential dependencies | Suffers from vanishing gradient problem, inefficient for long sequences |
Transformers (LLMs) | Text generation, translation, multimodal AI | High scalability, state-of-the-art performance | Requires vast computational power, black-box decision-making |
GANs | AI-generated images, deepfakes, artistic design | Produces highly realistic outputs | Training instability, prone to mode collapse |
Diffusion Models | AI art, synthetic image generation | More stable than GANs, superior output quality | Computationally expensive, slow inference speed |
Reinforcement Learning | Robotics, autonomous vehicles, game AI | Adapts to dynamic environments, learns from experience | High training cost, lack of generalization outside of trained tasks |
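The feedforward entry in the table above is the simplest of these architectures; a minimal NumPy sketch (illustrative only, with random placeholder weights rather than a trained model) shows the kind of fixed-length, structured-data scoring it is suited to:

```python
import numpy as np

rng = np.random.default_rng(0)

def feedforward(x, w1, b1, w2, b2):
    """One forward pass: linear -> ReLU -> linear -> sigmoid."""
    h = np.maximum(0.0, x @ w1 + b1)          # hidden layer with ReLU
    logits = h @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))      # probability of the positive class

x = rng.normal(size=(4, 8))                   # 4 samples, 8 features each
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

probs = feedforward(x, w1, b1, w2, b2)
print(probs.shape)  # (4, 1): one score per sample
```

Note that the input must be a fixed-length vector, which is exactly why this architecture cannot handle variable-length sequences such as text or audio.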
AI Model Benchmarks – LLM Leaderboard
The transformer architecture redefined AI by enabling parallel sequence processing, eliminating the bottlenecks of RNNs. Instead of analyzing sequences step-by-step, transformers use self-attention mechanisms to determine relationships between all elements of an input at once.
This breakthrough led to the development of large language models (LLMs), such as GPT-4, Claude, and Google Gemini 1.5, which power today’s most advanced AI applications.
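The self-attention step described above can be sketched in a few lines of NumPy. This is a minimal, illustrative version of scaled dot-product attention (shapes and weights are placeholders, not any particular model's parameters); the key point is that every token attends to every other token in one matrix operation, rather than step-by-step as in an RNN:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    q, k, v = x @ wq, x @ wk, x @ wv                # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # pairwise similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v                              # weighted mix of all token values

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
x = rng.normal(size=(seq_len, d_model))             # 5 token embeddings
wq, wk, wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (5, 8): one updated vector per token
```

Because the score matrix covers all token pairs at once, the whole sequence is processed in parallel — the property that makes transformers so scalable, at the cost of compute that grows quadratically with sequence length.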
Last updated: Mar 16, 2025
Benchmark stats come from the model providers, where available. For models with optional advanced reasoning, we provide the highest benchmark score achieved.
Organization | Model | Context | Parameters (B) | Input $/M | Output $/M | License | GPQA | MMLU | MMLU Pro | DROP | HumanEval | AIME'24 | SimpleBench |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
openai | o3 | 128,000 | - | - | - | Proprietary | 87.70% | - | - | - | - | - | - |
anthropic | Claude 3.7 Sonnet | 200,000 | - | $3.00 | $15.00 | Proprietary | 84.80% | 86.10% | - | - | - | 80.00% | 46.4% |
xai | Grok-3 | 128,000 | - | - | - | Proprietary | 84.60% | - | 79.90% | - | - | 93.30% | - |
xai | Grok-3 Mini | 128,000 | - | - | - | Proprietary | 84.60% | - | 78.90% | - | - | 90.80% | - |
openai | o3-mini | 200,000 | - | $1.10 | $4.40 | Proprietary | 79.70% | 86.90% | - | - | - | 86.50% | 22.8% |
openai | o1-pro | 128,000 | - | - | - | Proprietary | 79.00% | - | - | - | - | 86.00% | - |
openai | o1 | 200,000 | - | $15.00 | $60.00 | Proprietary | 78.00% | 91.80% | - | - | 88.10% | 83.30% | 40.1% |
google | Gemini 2.0 Flash Thinking | 1,000,000 | - | - | - | Proprietary | 74.20% | - | - | - | - | 73.30% | 30.7% |
openai | o1-preview | 128,000 | - | $15.00 | $60.00 | Proprietary | 73.30% | 90.80% | - | - | - | 44.60% | 41.7% |
deepseek | DeepSeek-R1 | 131,072 | 671 | $0.55 | $2.19 | Open | 71.50% | 90.80% | 84.00% | 92.20% | - | 79.80% | 30.9% |
openai | GPT-4.5 | 128,000 | - | - | - | Proprietary | 71.40% | 90.00% | - | - | 88.00% | 36.70% | 34.5% |
anthropic | Claude 3.5 Sonnet | 200,000 | - | $3.00 | $15.00 | Proprietary | 67.20% | 90.40% | 77.60% | 87.10% | 93.70% | 16.00% | 41.4% |
qwen | QwQ-32B-Preview | 32,768 | 32.5 | $0.15 | $0.20 | Open | 65.20% | - | 70.97% | - | - | 50.00% | - |
google | Gemini 2.0 Flash | 1,048,576 | - | - | - | Proprietary | 62.10% | - | 76.40% | - | - | 35.50% | 18.9% |
openai | o1-mini | 128,000 | - | $3.00 | $12.00 | Proprietary | 60.00% | 85.20% | 80.30% | - | 92.40% | 70.00% | 18.1% |
deepseek | DeepSeek-V3 | 131,072 | 671 | $0.27 | $1.10 | Open | 59.10% | 88.50% | 75.90% | 91.60% | - | 39.20% | 18.9% |
google | Gemini 1.5 Pro | 2,097,152 | - | $2.50 | $10.00 | Proprietary | 59.10% | 85.90% | 75.80% | 74.90% | 84.10% | 19.30% | 27.1% |
microsoft | Phi-4 | 16,000 | 14.7 | $0.07 | $0.14 | Open | 56.10% | 84.80% | 70.40% | 75.50% | 82.60% | - | - |
xai | Grok-2 | 128,000 | - | $2.00 | $10.00 | Proprietary | 56.00% | 87.50% | 75.50% | - | 88.40% | - | 22.7% |
openai | GPT-4o | 128,000 | - | $2.50 | $10.00 | Proprietary | 53.60% | 88.00% | 74.70% | - | - | - | 17.8% |
google | Gemini 1.5 Flash | 1,048,576 | - | $0.15 | $0.60 | Proprietary | 51.00% | 78.90% | 67.30% | - | 74.30% | - | - |
xai | Grok-2 mini | 128,000 | - | - | - | Proprietary | 51.00% | 86.20% | 72.00% | - | 85.70% | - | - |
meta | Llama 3.1 405B Instruct | 128,000 | 405 | $0.90 | $0.90 | Open | 50.70% | 87.30% | 73.30% | 84.80% | 89.00% | - | 23.0% |
meta | Llama 3.3 70B Instruct | 128,000 | 70 | $0.20 | $0.20 | Open | 50.50% | 86.00% | 68.90% | - | 88.40% | - | 19.9% |
anthropic | Claude 3 Opus | 200,000 | - | $15.00 | $75.00 | Proprietary | 50.40% | 86.80% | 68.50% | 83.10% | 84.90% | - | 23.5% |
qwen | Qwen2.5 32B Instruct | 131,072 | 32.5 | - | - | Open | 49.50% | 83.30% | 69.00% | - | 88.40% | - | - |
qwen | Qwen2.5 72B Instruct | 131,072 | 72.7 | $0.35 | $0.40 | Open | 49.00% | - | 71.10% | - | 86.60% | 23.30% | - |
openai | GPT-4 Turbo | 128,000 | - | $10.00 | $30.00 | Proprietary | 48.00% | 86.50% | - | 86.00% | 87.10% | - | - |
amazon | Nova Pro | 300,000 | - | $0.80 | $3.20 | Proprietary | 46.90% | 85.90% | - | 85.40% | 89.00% | - | - |
meta | Llama 3.2 90B Instruct | 128,000 | 90 | $0.35 | $0.40 | Open | 46.70% | 86.00% | - | - | - | - | - |
qwen | Qwen2.5 14B Instruct | 131,072 | 14.7 | - | - | Open | 45.50% | 79.70% | 63.70% | - | 83.50% | - | - |
mistral | Mistral Small 3 | 32,000 | 24 | $0.07 | $0.14 | Open | 45.30% | - | 66.30% | - | 84.80% | - | - |
qwen | Qwen2 72B Instruct | 131,072 | 72 | - | - | Open | 42.40% | 82.30% | 64.40% | - | 86.00% | - | - |
amazon | Nova Lite | 300,000 | - | $0.06 | $0.24 | Proprietary | 42.00% | 80.50% | - | 80.20% | 85.40% | - | - |
meta | Llama 3.1 70B Instruct | 128,000 | 70 | $0.20 | $0.20 | Open | 41.70% | 83.60% | 66.40% | 79.60% | 80.50% | - | - |
anthropic | Claude 3.5 Haiku | 200,000 | - | $0.10 | $0.50 | Proprietary | 41.60% | - | 65.00% | 83.10% | 88.10% | - | - |
anthropic | Claude 3 Sonnet | 200,000 | - | $3.00 | $15.00 | Proprietary | 40.40% | 79.00% | 56.80% | 78.90% | 73.00% | - | - |
openai | GPT-4o mini | 128,000 | - | $0.15 | $0.60 | Proprietary | 40.20% | 82.00% | - | 79.70% | 87.20% | - | 10.7% |
amazon | Nova Micro | 128,000 | - | $0.04 | $0.14 | Proprietary | 40.00% | 77.60% | - | 79.30% | 81.10% | - | - |
google | Gemini 1.5 Flash 8B | 1,048,576 | 8 | $0.07 | $0.30 | Proprietary | 38.40% | - | 58.70% | - | - | - | - |
ai21 | Jamba 1.5 Large | 256,000 | 398 | $2.00 | $8.00 | Open | 36.90% | 81.20% | 53.50% | - | - | - | - |
microsoft | Phi-3.5-MoE-instruct | 128,000 | 60 | - | - | Open | 36.80% | 78.90% | 54.30% | - | 70.70% | - | - |
qwen | Qwen2.5 7B Instruct | 131,072 | 7.6 | $0.30 | $0.30 | Open | 36.40% | - | 56.30% | - | 84.80% | - | - |
xai | Grok-1.5 | 128,000 | - | - | - | Proprietary | 35.90% | 81.30% | 51.00% | - | 74.10% | - | - |
openai | GPT-4 | 32,768 | - | $30.00 | $60.00 | Proprietary | 35.70% | 86.40% | - | 80.90% | 67.00% | - | 25.1% |
anthropic | Claude 3 Haiku | 200,000 | - | $0.25 | $1.25 | Proprietary | 33.30% | 75.20% | - | 78.40% | 75.90% | - | - |
meta | Llama 3.2 11B Instruct | 128,000 | 10.6 | $0.06 | $0.06 | Open | 32.80% | 73.00% | - | - | - | - | - |
meta | Llama 3.2 3B Instruct | 128,000 | 3.2 | $0.01 | $0.02 | Open | 32.80% | 63.40% | - | - | - | - | - |
ai21 | Jamba 1.5 Mini | 256,144 | 52 | $0.20 | $0.40 | Open | 32.30% | 69.70% | 42.50% | - | - | - | - |
openai | GPT-3.5 Turbo | 16,385 | - | $0.50 | $1.50 | Proprietary | 30.80% | 69.80% | - | 70.20% | 68.00% | - | - |
meta | Llama 3.1 8B Instruct | 131,072 | 8 | $0.03 | $0.03 | Open | 30.40% | 69.40% | 48.30% | 59.50% | 72.60% | - | - |
microsoft | Phi-3.5-mini-instruct | 128,000 | 3.8 | $0.10 | $0.10 | Open | 30.40% | 69.00% | 47.40% | - | 62.80% | - | - |
google | Gemini 1.0 Pro | 32,760 | - | $0.50 | $1.50 | Proprietary | 27.90% | 71.80% | - | - | - | - | - |
qwen | Qwen2 7B Instruct | 131,072 | 7.6 | - | - | Open | 25.30% | 70.50% | 44.10% | - | - | - | - |
mistral | Codestral-22B | 32,768 | 22.2 | $0.20 | $0.60 | Open | - | - | - | - | 81.10% | - | - |
cohere | Command R+ | 128,000 | 104 | $0.25 | $1.00 | Open | - | 75.70% | - | - | - | - | 17.4% |
deepseek | DeepSeek-V2.5 | 8,192 | 236 | $0.14 | $0.28 | Open | - | 80.40% | - | - | 89.00% | - | - |
google | Gemma 2 27B | 8,192 | 27.2 | - | - | Open | - | 75.20% | - | - | 51.80% | - | - |
google | Gemma 2 9B | 8,192 | 9.2 | - | - | Open | - | 71.30% | - | - | 40.20% | - | - |
xai | Grok-1.5V | 128,000 | - | - | - | Proprietary | - | - | - | - | - | - | - |
moonshotai | Kimi-k1.5 | 128,000 | - | - | - | Proprietary | - | 87.40% | - | - | - | - | - |
nvidia | Llama 3.1 Nemotron 70B Instruct | 128,000 | 70 | - | - | Open | - | 80.20% | - | - | - | - | - |
mistral | Ministral 8B Instruct | 128,000 | 8 | $0.10 | $0.10 | Open | - | 65.00% | - | - | 34.80% | - | - |
mistral | Mistral Large 2 | 128,000 | 123 | $2.00 | $6.00 | Open | - | 84.00% | - | - | 92.00% | - | 22.5% |
mistral | Mistral NeMo Instruct | 128,000 | 12 | $0.15 | $0.15 | Open | - | 68.00% | - | - | - | - | - |
mistral | Mistral Small | 32,768 | 22 | $0.20 | $0.60 | Open | - | - | - | - | - | - | - |
microsoft | Phi-3.5-vision-instruct | 128,000 | 4.2 | - | - | Open | - | - | - | - | - | - | - |
mistral | Pixtral-12B | 128,000 | 12.4 | $0.15 | $0.15 | Open | - | 69.20% | - | - | 72.00% | - | - |
mistral | Pixtral Large | 128,000 | 124 | $2.00 | $6.00 | Open | - | - | - | - | - | - | - |
qwen | QvQ-72B-Preview | 32,768 | 73.4 | - | - | Open | - | - | - | - | - | - | - |
qwen | Qwen2.5-Coder 32B Instruct | 128,000 | 32 | $0.09 | $0.09 | Open | - | 75.10% | 50.40% | - | 92.70% | - | - |
qwen | Qwen2.5-Coder 7B Instruct | 128,000 | 7 | - | - | Open | - | 67.60% | 40.10% | - | 88.40% | - | - |
qwen | Qwen2-VL-72B-Instruct | 32,768 | 73.4 | - | - | Open | - | - | - | - | - | - | - |
cohere | Command A | 256,000 | 111 | $2.50 | $10.00 | Open | - | 85.00% | - | - | - | - | - |
baidu | ERNIE 4.5 | - | - | - | - | - | 75.00% | - | 79.00% | 87.00% | 85.00% | - | - |
google | Gemma 3 1B | 128,000 | 1 | - | - | Open | 19.20% | 29.90% | 14.70% | - | 32.00% | - | - |
google | Gemma 3 4B | 128,000 | 4 | - | - | Open | 30.80% | 46.90% | 43.60% | - | - | - | - |
google | Gemma 3 12B | 128,000 | 12 | - | - | Open | 40.90% | 65.20% | 60.60% | - | - | - | - |
google | Gemma 3 27B | 128,000 | 27 | - | - | Open | 42.40% | 72.10% | 67.50% | - | 89.00% | - | - |
qwen | Qwen2.5 Max | 32,768 | - | - | - | - | 59.00% | - | 76.00% | - | 93.00% | 23.00% | - |
qwen | QwQ 32B | 131,000 | 32.8 | - | - | Open | 59.00% | - | 76.00% | 98.00% | 78.00% | - | - |
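Once parsed into records, a leaderboard like the one above is easy to re-rank by any column. A small sketch (using only a few rows hand-transcribed from the table; the dict layout is an assumption of this example, not an API of the leaderboard itself):

```python
# A handful of rows from the leaderboard above, keyed by GPQA score (%).
rows = [
    {"org": "openai", "model": "o3", "gpqa": 87.7},
    {"org": "anthropic", "model": "Claude 3.7 Sonnet", "gpqa": 84.8},
    {"org": "xai", "model": "Grok-3", "gpqa": 84.6},
    {"org": "deepseek", "model": "DeepSeek-R1", "gpqa": 71.5},
]

# Sort descending by GPQA and print a compact ranking.
for rank, row in enumerate(sorted(rows, key=lambda r: r["gpqa"], reverse=True), 1):
    print(f"{rank}. {row['model']} ({row['org']}): {row['gpqa']}%")
```

The same `key=` pattern works for any other column (MMLU, price per million tokens, and so on), with missing `-` cells filtered out first.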