Zoom’s Chain of Draft Prompting Cuts AI Reasoning Costs by 90%

Zoom has introduced Chain of Draft, a new AI prompting method that reduces token usage by 92% and slashes operational costs by 90%.

Zoom researchers have introduced a new prompting technique called Chain of Draft (CoD) that could fundamentally change how artificial intelligence models process reasoning tasks.

By rethinking how AI generates responses, CoD reduces token usage by up to 92% and lowers operational costs by 90%. Instead of relying on verbose explanations, as seen in traditional AI reasoning models, this method forces AI to be structured and efficient while maintaining accuracy.

This breakthrough comes at a time when large language models (LLMs) are consuming increasing amounts of computing power, making efficiency a growing concern.

The question now is whether techniques like CoD will influence the broader industry, particularly as major players like OpenAI, Google, Microsoft, and others face mounting pressure to cut costs.

How Chain of Draft Works

Chain of Draft (CoD) is a structured prompting strategy designed to improve efficiency in AI reasoning while reducing computational overhead. It builds upon Chain of Thought (CoT) prompting, which encourages large language models (LLMs) to break down complex problems into multi-step explanations.

While CoT has proven effective for improving logical reasoning, it significantly increases token usage, leading to higher costs and slower response times. CoD seeks to address these inefficiencies by enforcing a minimalist approach to intermediate reasoning steps.

The core principle behind CoD is to emulate how humans process information when solving complex problems. Instead of generating detailed explanations at every step, CoD instructs the model to produce only essential intermediate results—akin to how a person might jot down a few key notes while working through a problem. This structured conciseness allows LLMs to maintain logical accuracy while dramatically reducing unnecessary token generation.

Unlike previous efficiency-focused techniques, such as Concise Thoughts (CCoT) or token-budget-aware reasoning, CoD does not rely on a pre-determined token budget for an entire task. Instead, it applies a dynamic, per-step constraint, allowing for an unrestricted number of reasoning steps while maintaining overall conciseness.
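In practice, CoD is applied purely through the system prompt. The sketch below contrasts a conventional CoT instruction with a CoD-style instruction that caps each reasoning step at a few words; the exact prompt wording is an illustration modeled on the approach the Zoom paper describes, and the sample model response is hypothetical.

```python
# Illustrative sketch of Chain of Draft (CoD) vs. Chain of Thought (CoT)
# prompting. Prompt wording follows the style described in the Zoom paper;
# exact phrasing may differ from the original.

COT_SYSTEM = (
    "Think step by step to answer the following question. "
    "Return the answer at the end of the response after a separator ####."
)

# CoD: same step-by-step structure, but each intermediate step is capped
# at a handful of words -- a per-step constraint rather than a global
# token budget for the whole task.
COD_SYSTEM = (
    "Think step by step, but only keep a minimum draft for each thinking "
    "step, with 5 words at most. "
    "Return the answer at the end of the response after a separator ####."
)

def build_messages(question: str, style: str = "cod") -> list[dict]:
    """Assemble chat-style messages for a CoT- or CoD-style request."""
    system = COD_SYSTEM if style == "cod" else COT_SYSTEM
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

def extract_answer(response_text: str) -> str:
    """Pull the final answer out after the '####' separator."""
    return response_text.rsplit("####", 1)[-1].strip()

# What a CoD-style response might look like for a GSM8K-type arithmetic
# question (hypothetical output): terse drafts instead of full sentences.
sample_response = "20 - x = 12; x = 20 - 12; #### 8"
print(extract_answer(sample_response))  # -> 8
```

The messages can then be sent to any chat-completion endpoint; the token savings come entirely from the shorter intermediate drafts the model is instructed to produce.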

Why AI Efficiency Matters More Than Ever

AI models rely on tokens—the fundamental units of text processing—to generate responses. The more tokens a model uses, the higher the cost of operation.

Techniques such as Chain of Thought (CoT) prompting have been developed to improve AI’s ability to handle complex tasks by encouraging step-by-step reasoning. However, this approach significantly increases token usage, making AI operations increasingly expensive.

Zoom’s CoD method introduces a different strategy. Rather than having AI articulate every step with excessive verbosity, CoD optimizes the structure of responses, ensuring logical depth while minimizing unnecessary output.

The implications of this could be vast, particularly for industries that depend on AI-driven automation, such as enterprise AI, finance, and customer service.

Experimental Results and Token Efficiency

Extensive benchmarking has demonstrated that CoD can match or surpass CoT in accuracy while drastically reducing token usage. In experiments across arithmetic, commonsense, and symbolic reasoning tasks, CoD used as little as 7.6% of the tokens required by CoT, significantly lowering computational costs.
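Since API pricing scales linearly with token count, the reported 7.6% token figure translates almost directly into the headline cost reduction. The back-of-the-envelope sketch below makes that arithmetic explicit; the per-token price and token counts are hypothetical placeholders, not figures from the paper.

```python
# Back-of-the-envelope cost comparison based on the reported figure that
# CoD used as little as 7.6% of the tokens required by CoT. The per-token
# price and token counts below are hypothetical placeholders.

PRICE_PER_1K_TOKENS = 0.01       # hypothetical output-token price in USD

cot_tokens = 200                 # hypothetical CoT reasoning tokens per query
cod_tokens = cot_tokens * 0.076  # CoD at 7.6% of CoT's token usage

cot_cost = cot_tokens / 1000 * PRICE_PER_1K_TOKENS
cod_cost = cod_tokens / 1000 * PRICE_PER_1K_TOKENS

savings = 1 - cod_cost / cot_cost
print(f"Cost reduction: {savings:.1%}")  # -> Cost reduction: 92.4%
```

Because cost is proportional to tokens, the ~92% token reduction yields the roughly 90% cost cut regardless of the absolute price assumed.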

By shifting LLMs away from excessively verbose reasoning towards a structured, minimalistic drafting method, CoD presents a scalable and cost-effective approach to improving AI efficiency.

The technique has the potential to influence AI deployment strategies across multiple domains, particularly in areas where cost efficiency and latency reduction are critical concerns.

The Zoom research paper presents empirical evaluations across multiple task categories, revealing how CoD performs relative to standard prompting and CoT.

CoD was tested with OpenAI’s GPT-4o model and Claude 3.5 Sonnet from Anthropic on GSM8K, a widely used dataset for evaluating arithmetic reasoning in language models. The results indicate that while CoT achieves slightly higher accuracy, it does so at a massive computational cost. In contrast, CoD provides a near-equivalent level of correctness while drastically lowering token consumption.


For commonsense reasoning, CoD was evaluated on BIG-bench’s date understanding and sports understanding tasks. Results show that CoD not only reduces computational requirements but also outperforms CoT in certain cases, demonstrating its effectiveness in practical applications.


Symbolic reasoning tasks, such as coin flipping prediction, tested CoD’s effectiveness in highly structured logical tasks. The evaluation confirmed substantial efficiency improvements.

Limitations on Small Models

While CoD proves highly effective on large-scale LLMs, it performs less efficiently on small models (≤3B parameters) due to the lack of training exposure to CoD-style reasoning. The results on Qwen2.5 (1.5B and 3B), Llama 3.2 (3B), and Zoom-SLM (2.3B) highlight a more significant performance gap compared to CoT.


These findings suggest that small models require fine-tuning with CoD-style data to fully leverage its efficiency benefits. Without adaptation, accuracy loss becomes more pronounced, limiting CoD’s immediate applicability for lightweight AI systems.

OpenAI Adjusts Its AI Model Strategy

While companies like Zoom are working on refining AI efficiency, OpenAI is restructuring its model lineup. On February 13, 2025, the company announced that it would discontinue its unreleased standalone o3 model and consolidate its structured reasoning capabilities into GPT-5.

The decision was largely a response to growing confusion among users over OpenAI’s expanding selection of AI models.

OpenAI then introduced GPT-4.5 as its last non-reasoning AI model, a temporary bridge toward GPT-5, shifting focus from multiple model options to a more streamlined AI system. Before its release, the underlying model, codenamed Orion, had been expected to launch as GPT-5.

Its underwhelming performance in comparison to modern reasoning models like OpenAI’s o3-mini, Grok 3, and Claude 3.7 Sonnet appears to have influenced this decision.

Microsoft Offers Free Advanced AI Reasoning

Less than a month later, Microsoft took a decisive step that further pressured OpenAI’s business model: it announced that its Copilot assistant would offer OpenAI’s o3-mini-high for free, removing a paywall that had previously limited access to the more advanced reasoning model.

Prior to this move, OpenAI’s o3-mini-high model was available only through paid subscription plans.

Microsoft’s decision directly challenged OpenAI’s premium AI offerings and added further complexity to OpenAI’s efforts to monetize its most capable AI models. This shift also underscores why efficiency breakthroughs like Zoom’s CoD are becoming increasingly relevant.

DeepSeek Moves Quickly to Challenge OpenAI

Meanwhile, competition in the AI space continued intensifying. On February 26, 2025, Chinese AI lab DeepSeek announced that it was accelerating the release of its R2 model. Originally scheduled for May 2025, the model’s launch was moved up to counter the dominance of OpenAI, Alibaba, and Google.

DeepSeek’s rise has coincided with a surge in AI development in China, where companies are seeking alternatives to U.S.-developed models. However, the company faces challenges beyond competition.

After DeepSeek’s surprising success with its R1 reasoning model, other Chinese companies have reportedly been stockpiling Nvidia’s H20 processors in response to tightening U.S. trade sanctions, reflecting the growing difficulty of acquiring high-performance AI chips.

Alibaba and Amazon Take Different Paths to AI Efficiency

While OpenAI and DeepSeek refine their AI reasoning strategies, other companies are focusing on different cost-reduction approaches.

Alibaba just introduced QwQ-32B, an open-source AI model designed to deliver high-performance reasoning with reduced computational costs. The release positions Alibaba as a direct competitor to OpenAI and DeepSeek, particularly for businesses looking for affordable AI solutions.

Amazon is reportedly also entering the AI efficiency race but with a different strategy. The company is developing Nova AI, a proprietary model expected to launch by June 2025.

Unlike Alibaba’s open-source approach, Amazon is integrating Nova AI directly into AWS, strengthening its AI cloud service offerings and, most likely, the recently announced paid plan for Alexa+, the AI-powered version of its Alexa voice assistant.

As the AI industry pivots toward reducing operating costs, companies are experimenting with different strategies. Whether through CoD’s structured prompting, DeepSeek’s optimized models, or Alibaba’s cost-friendly alternatives, AI firms are moving beyond sheer model size and focusing on long-term efficiency.

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master’s degree in International Economics and is the founder and managing editor of Winbuzzer.com.
