HomeWinBuzzer NewsOpenAI´s New Predicted Outputs Feature Cuts GPT-4o Latency Issues by 5x

OpenAI´s New Predicted Outputs Feature Cuts GPT-4o Latency Issues by 5x

OpenAI's Predicted Outputs feature offers up to 5x faster GPT-4o performance, revolutionizing how AI handles long documents and coding tasks.

-

OpenAI has unveiled Predicted Outputs, a new feature for developers that promises to tackle one of the biggest issues with large language models: latency. By employing speculative decoding, the new featureenables GPT-4o and GPT-4o-mini models to skip over known sections of text, providing faster responses for tasks like editing code or updating documents. According to Microsoft’s benchmarks, the feature led to a staggering 5.8x speed boost in Copilot Workspace operations.

The speed upgrade could transform workflows, especially in environments where even slight delays impact productivity. Speculative decoding, the core technology behind this advancement, predicts and processes known parts of text, allowing LLMs to work more efficiently.

Eddie Aftandilian from Microsoft confirmed the positive performance gains observed during internal testing. According to Microsoft´s own benchmarks, testing Predicted Outputs on Copilot Workspace workloads generated 5.8x faster responses. 

The Mechanism Behind Speculative Decoding

At the heart of this speed enhancement is so-called speculative decoding. In simple terms, it’s a method where the model uses predictions to fast-forward through familiar text, avoiding the need to regenerate the entire output. The technique is especially beneficial for tasks like refactoring code or revising documents, where a large chunk of the content often remains unchanged. By leveraging Predicted Outputs, developers can create more responsive applications, improving user experience in real-time settings.

Predicted Outputs might be of special importance for the upcoming o1 model, which is substantially more expensive to run and much slower than its predecessors. OpenAI’s API pricing structure for its GPT-4o and o1-preview models relies on a per-token cost system, dividing charges between input and output tokens. For GPT-4o, processing one million input tokens costs about $5 USD, while one million output tokens are priced at approximately $13.95 USD. On the other hand, the o1-preview model comes with significantly higher rates: one million input tokens are priced at $15 USD, and output tokens reach $60 USD per million.

GPT-4o and GPT-4o Mini

OpenAI´s current leading stable model GPT-4o, set itself apart with its ability to process text, images, and now voice, responding to audio inputs in milliseconds, making conversations feel natural and engaging. Mira Murati, then still OpenAI’s CTO, described the integration of voice recognition as a breakthrough, noting its capability to detect emotional cues and adjust responses accordingly. The model’s real-time reaction speed was a key selling point, especially for applications where instant feedback is crucial.

In August, OpenAI took customization to a new level by enabling fine-tuning for GPT-4o. Companies can now tailor the model to fit specific use cases, whether in customer service or software development. With the use of specialized datasets, organizations can adjust the AI’s tone, response structure, and even language preferences. 

GPT-4o Mini, a smaller, faster and cheaper variant released in July introduced a critical safety feature: instruction hierarchy. It ensures that developer-set guidelines are prioritized over user inputs, a defense mechanism against prompt injections—a common exploit where users attempt to trick the AI into deviating from its intended behavior. This structure helps maintain the model’s reliability, as it gives precedence to developer instructions, even in the face of disruptive user commands.

Prompt injections have long plagued AI platforms, allowing malicious users to manipulate outputs. By incorporating instruction hierarchy, GPT-4o Mini aims to deliver safer and more consistent performance. The update aligns with OpenAI’s ongoing commitment to AI safety, especially as the company explores deploying more advanced automated agents capable of performing digital tasks.

Anthropic’s Rival Models: Claude 3.5 Sonnet and Haiku

Competition in the AI field is clearly heating up. In June, Anthropic released Claude 3.5 Sonnet, a model that outperformes GPT-4o in multiple coding and vision-related benchmarks. It excelled in speed, with a 200,000-token context window for processing large texts. And this week Anthropic launched Claude 3.5 Haiku, a budget-friendly version designed for text-heavy tasks, offering substantial cost savings with features like prompt caching.

Haiku’s simplified design targets startups and medium-sized businesses needing efficient AI without breaking the bank. Anthropic’s focus on affordability and safety measures, such as their Responsible Scaling Policy, has made them a formidable competitor.

Last Updated on November 7, 2024 2:12 pm CET

SourceOpenAI
Markus Kasanmascheff
Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He is holding a Master´s degree in International Economics and is the founder and managing editor of Winbuzzer.com.
0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Newest
Oldest Most Voted
Inline Feedbacks
View all comments
0
We would love to hear your opinion! Please comment below.x
()
x
Mastodon