Anthropic has launched a prompt caching feature for its Claude API to help developers cut costs and enhance performance. The feature, still in its public beta phase, is currently available for the Claude 3.5 Sonnet and Claude 3 Haiku models, with future support planned for Claude 3 Opus.
How Prompt Caching Works
This new functionality lets users keep commonly used contexts ready for reuse in their sessions. By caching prompts, developers can avoid the need to resend identical contexts, saving both time and expense. This proves especially useful in scenarios involving repeated reference to extensive contexts during interactions with the model.
A 2023 technical paper outlining prompt caching explains how users can retain regularly used contexts in their sessions. This enables developers to add contextual information without additional expenses, facilitating more precise model responses. Notably, the cache holds a 5-minute lifetime and refreshes with each use, according to AI expert Simon Willison.
Initial users have noted marked improvements in both speed and affordability. In one case, a chat using a cached prompt with 100,000 tokens saw latency drop from 11.5 seconds to 2.4 seconds, alongside a 90% reduction in costs. Another case with a 10,000 token prompt observed a 31% latency reduction and an 86% cost decrease.
Pricing Details
The pricing structure underscores the economic advantages of prompt caching. For Claude 3.5 Sonnet, the cost to cache a prompt is $3.75 per million tokens (MTok), while utilizing a cached prompt costs only $0.30 per MTok. The price offers a reduction from the standard input token price of $3/MTok.
For Claude 3 Haiku, caching costs $0.30/MTok, and using the stored prompt is $0.03/MTok. While prompt caching for Claude 3 Opus is not yet available, announced prices suggest $18.75/MTok for writing to cache and $1.50/MTok for accessing the cached prompts.
Use Cases and Applications
Prompt caching opens up a variety of uses. Developers can leverage it for cost and latency reductions in tasks such as long instruction processing, document embedding in conversational agents, improving code autocompletion speed, supplying extensive instructions to search agents, and prompt-based document embedding.
The move aligns with Anthropic's broader aim to stay competitive in the AI services market. The company has previously lowered token prices to attract developers. With this newest feature, Anthropic is positioning itself as a more affordable option against competitors like Google and OpenAI, who also strive to offer cost-efficient solutions for developers.
In July, the company launched its prompt playground. Creating optimized AI prompts—crafted inputs to achieve desired model outputs—has become indispensable in the AI field. Small tweaks in prompts can substantially impact results. Traditionally, developers either guessed these modifications or hired experts. Anthropic's tools aim to simplify this by offering immediate feedback and minimizing manual adjustments.