OpenAI has officially launched its GPT-4.1 model family, introducing three new versions—GPT-4.1, GPT-4.1 Mini, and GPT-4.1 Nano—that aim to balance top-tier performance with flexible cost and speed. Built to outperform its predecessors, the GPT-4.1 series improves core capabilities such as code generation, instruction following, and long-context reasoning while delivering lower latency and more consistent tool usage.
Unlike GPT-4 and GPT-4o, which power ChatGPT, the new models are available exclusively through the OpenAI API, underscoring the company's continued focus on enterprise and developer integrations rather than consumer-facing chatbot use.
The release is not just a technical upgrade but also a shift in OpenAI’s platform strategy. By segmenting the new models into three distinct performance tiers, OpenAI is giving API users the ability to scale based on workload size and budget. At the top, the standard GPT-4.1 model is designed for the most complex applications and is priced accordingly: $2 per million input tokens and $8 per million output tokens.
GPT-4.1 Mini is a middle-ground option, offering near parity in intelligence benchmarks with much lower latency—priced at $0.40 per million input tokens and $1.60 per million output. For lightweight tasks and real-time use cases, GPT-4.1 Nano is the most cost-efficient yet, costing just $0.10 per million input tokens and $0.40 per million output tokens.
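To make the trade-offs between these tiers concrete, per-request cost at the published rates is simple arithmetic. The Python sketch below uses the list prices quoted above; the token counts in the example are hypothetical.

```python
# List prices for the GPT-4.1 family in USD per 1M tokens: (input, output).
PRICES = {
    "gpt-4.1":      (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1-nano": (0.10, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one API call at list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical request: a 20,000-token prompt that yields a 1,000-token reply.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 20_000, 1_000):.4f}")
# gpt-4.1: $0.0480, gpt-4.1-mini: $0.0096, gpt-4.1-nano: $0.0024
```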
Each model comes with performance trade-offs, but OpenAI claims that even the smallest model, Nano, outperforms previous offerings like GPT-4o Mini in multiple benchmarks. These additions reflect OpenAI’s growing emphasis on making generative AI accessible across a broader range of use cases—from high-performance agentic workflows to embedded tools in consumer apps.
Notably, all three models share the same knowledge cutoff (June 2024) and have been tuned to deliver more deterministic, format-following outputs than earlier generations—helping reduce hallucinations and improve integration reliability in production environments.
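For pipelines that depend on machine-readable output, that format-following behavior can be reinforced with the Chat Completions API's JSON mode. The snippet below is a minimal sketch using the official openai Python SDK; the keys requested in the prompt are illustrative, not a documented schema.

```python
from openai import OpenAI  # official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# JSON mode constrains the model to emit valid JSON; the prompt still has to
# spell out the exact keys we expect.
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system",
         "content": "Reply only in JSON with the keys 'summary' (string) and "
                    "'sentiment' (one of: positive, neutral, negative)."},
        {"role": "user", "content": "The new release cut our API costs in half."},
    ],
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)
```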
The structured pricing and targeted improvements across GPT-4.1 variants suggest a deliberate push toward maturity in OpenAI’s API product line—one that appeals not only to research and prototyping, but to high-scale deployment in commercial software, SaaS platforms, and autonomous agent systems. With GPT-4.1 now positioned as the successor to the soon-to-be-deprecated GPT-4.5 Preview (ending July 14, 2025), OpenAI is making clear that this generation is expected to carry the operational weight of many of its commercial partners going forward.
Performance Benchmarks and Model Variants
The GPT-4.1 models demonstrate notable advancements over their predecessors. The standard GPT-4.1 model achieved a score of 54.6% on the SWE-bench Verified benchmark, marking a 21.4% absolute improvement over GPT-4o and a 26.6% absolute improvement over GPT-4.5.

In instruction following, GPT-4.1 scored 38.3% on Scale’s MultiChallenge benchmark, reflecting a 10.5% absolute increase over GPT-4o.

Additionally, GPT-4.1 set a new state-of-the-art result on the Video-MME benchmark for multimodal long-context understanding, scoring 72.0% on the “long, no subtitles” category, a 6.7% absolute improvement over GPT-4o.
OpenAI has also introduced two streamlined versions: GPT-4.1 Mini and GPT-4.1 Nano. GPT-4.1 Mini cuts latency nearly in half and reduces cost by 83% compared with GPT-4o, while matching or exceeding GPT-4o's performance in intelligence evaluations.

GPT-4.1 Nano, OpenAI’s smallest and fastest model to date, is optimized for tasks requiring low latency and cost-efficiency. It supports a context window of up to 1 million tokens and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding benchmarks, surpassing GPT-4o Mini’s performance.
These models are available exclusively through OpenAI's API, catering to developers seeking to integrate advanced AI capabilities into their applications. Pricing follows the tiers outlined above: GPT-4.1 at $2 per million input tokens and $8 per million output tokens; GPT-4.1 Mini at $0.40 and $1.60; and GPT-4.1 Nano at $0.10 and $0.40.
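Because the models are API-only, switching tiers is just a matter of changing the model identifier in the request. A minimal sketch with the official openai Python SDK follows; the routing choices shown are illustrative.

```python
from openai import OpenAI  # official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(model: str, prompt: str) -> str:
    """Send a prompt to any GPT-4.1 tier and return the text of the reply."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

# Route cheap, latency-sensitive work to Nano and heavier reasoning to the full model.
label = ask("gpt-4.1-nano", "Classify this support ticket as billing, bug, or feature request: 'I was charged twice.'")
review = ask("gpt-4.1", "Review this function for edge cases:\n\ndef transfer(a, b, amount): ...")
print(label, review, sep="\n")
```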

Enhanced Instruction Following and Long-Context Comprehension
OpenAI has focused on improving the models’ ability to follow instructions and comprehend long-context inputs. The GPT-4.1 models are designed to better utilize extensive context windows, supporting up to 1 million tokens, and exhibit improved long-context comprehension. These enhancements make the models more effective for powering AI agents capable of independently accomplishing tasks on behalf of users, such as software engineering, document analysis, and customer support.
According to OpenAI, “These improvements in instruction following reliability and long context comprehension also make the GPT-4.1 models considerably more effective at powering agents, or systems that can independently accomplish tasks on behalf of users.”
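In practice, taking advantage of the 1-million-token window simply means including the full source material in the request. The sketch below is illustrative (the file path and question are placeholders), not a pattern from OpenAI's documentation.

```python
from openai import OpenAI  # official OpenAI Python SDK

client = OpenAI()

# Load a large document (placeholder path) and pass it to the model in one request;
# GPT-4.1 accepts up to roughly 1M input tokens, so book-length texts can fit.
with open("annual_report.txt", "r", encoding="utf-8") as f:
    document = f.read()

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system",
         "content": "Answer strictly from the provided document and quote the relevant passage."},
        {"role": "user",
         "content": f"Document:\n{document}\n\nQuestion: Which sections discuss early termination?"},
    ],
)
print(response.choices[0].message.content)
```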

Model Limitations and Considerations
While the GPT-4.1 models offer meaningful advancements, OpenAI acknowledges certain limitations. The models can be more literal than previous versions, sometimes requiring more specific and explicit prompts from users. In its official post, the company states, “Early testers noted that GPT‑4.1 can be more literal, so we recommend being explicit and specific in prompts.”
Accuracy also declines as the number of input tokens grows: in OpenAI’s own long-context tests, it fell from around 84% with 8,000 input tokens to roughly 50% at the full 1-million-token context. This degradation underscores the importance of prompt engineering and context management in application development.
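One practical mitigation is to budget the prompt before sending it: count tokens locally and drop the lowest-priority context first rather than filling the window. The sketch below uses the tiktoken library and assumes the o200k_base encoding is a reasonable proxy for GPT-4.1's tokenizer; the 100,000-token budget is an arbitrary example.

```python
import tiktoken  # OpenAI's open-source tokenizer library

# Assumption: o200k_base (used by GPT-4o-class models) approximates GPT-4.1 token counts.
enc = tiktoken.get_encoding("o200k_base")

def trim_to_budget(chunks: list[str], budget_tokens: int) -> list[str]:
    """Keep chunks in priority order (highest first) until the token budget is exhausted."""
    kept, used = [], 0
    for chunk in chunks:
        n = len(enc.encode(chunk))
        if used + n > budget_tokens:
            break
        kept.append(chunk)
        used += n
    return kept

# Illustrative use: cap retrieved context well below the 1M-token ceiling,
# since accuracy degrades as the prompt grows.
chunks = ["Executive summary ...", "Relevant contract clauses ...", "Full appendix ..."]
context_chunks = trim_to_budget(chunks, budget_tokens=100_000)
print(f"Keeping {len(context_chunks)} of {len(chunks)} chunks")
```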
Additionally, the GPT-4.5 Preview model will be deprecated on July 14, 2025, making way for wider adoption of GPT-4.1. The newer models have a knowledge cutoff of June 2024, giving developers access to more current data compared to earlier versions.