Perplexity Lets AI Agents Write Their Own Search Code

Perplexity's Search as Code lets AI agents generate Python search workflows, but claimed token savings and benchmark gains still need outside validation.

By Markus Kasanmascheff

June 7, 2026 10:17 pm CEST

TL;DR

Architecture Launch: Perplexity has introduced Search as Code as AI-written Python search workflows for agent retrieval.
Search Mechanism: The system uses a model, restricted sandbox, and Agentic Search SDK to build retrieval pipelines.
Benchmark Caveat: Perplexity claims 100 percent software-vulnerability accuracy and 85.1 percent lower token use, pending outside validation.
Developer Test: Developers should compare rollout results against OpenAI, Exa, Parallel, Google, TinyFish, and Tavily alternatives.

Perplexity introduced Search as Code as a reference architecture for AI-written Python search workflows, following its 2025 real-time Search API. The new approach shifts the pitch from repeatedly calling a fixed endpoint to letting an agent build the retrieval steps a task needs.

Longer research jobs usually force agents into a query, read, refine, and query again loop. Search as Code moves more of that plan into code generated for the task. It exposes search components as software development kit primitives inside a restricted sandbox so an agent can retrieve candidates, filter pages, remove duplicates, and rerank results.

Perplexity says its CVE vendor-advisory task reached 100 percent accuracy while using 85.1 percent fewer tokens than its baseline. Because those figures come from the company’s own benchmark, developers still need outside runs before treating the accuracy and token savings as repeatable.

How Model-Written Search Workflows Run

Search as Code uses a model as the control plane, a restricted compute sandbox for generated code, and the Agentic Search SDK, the software development kit that exposes Perplexity search functions. Agents can use that three-layer stack to assemble retrieval, filtering, deduplication, and reranking steps instead of accepting one response shape from a static endpoint.

Generated scripts can show which pages were searched, which candidates were discarded, and which ranking steps shaped the answer. Teams also have to review the generated code path, because selection logic becomes part of the system’s trust boundary.

Python is the first runtime after Perplexity tested Python, Rust, TypeScript, and Bash, which keeps the first developer surface familiar but adds generated-code review to the search workflow. The architecture lets generated Python call backend functions for retrieval, filtering, deduplication, and reranking.

Teams already using Perplexity’s Search API can find guidance in its docs on specific queries, multi-query searches, rate-limit backoff, and concurrent searches. Multi-query requests can include up to five queries, underscoring how much hand tuning already sits around supposedly simple search calls.

Benchmarks and Market Alternatives

Perplexity’s CVE test covered 200 software vulnerabilities (CVEs) published between 2023 and 2025. Its benchmark table puts Search as Code ahead on four of five benchmark rows against OpenAI, Anthropic, Exa, and Parallel, with a tie against OpenAI on Humanity’s Last Exam. Company-run comparisons can guide what developers test next, but they do not settle whether model-written workflows handle messy web evidence better.

LiveBrowseComp explains why the claim matters. Search-augmented agents lost 25 to 40 points when questions targeted fresh information, while closed-book accuracy stayed below 2 percent. Perplexity’s token-savings claim becomes useful if repeatable, but not enough on its own to prove stronger verification.

Competition is direct. OpenAI’s Responses API offers web search before answer generation, Exa markets itself as a search engine for AIs, and Parallel emphasizes evidence-based outputs for agent workflows.

The rivals are not all selling the same layer. OpenAI distinguishes quick web search, agentic search with reasoning models, and longer deep-research runs, while Exa emphasizes search, content extraction, answer generation, and structured research endpoints. Parallel’s pitch centers on provenance, cost, and benchmarked accuracy for search and deep-research products.

Google’s AI search shift toward agents, TinyFish’s web-infrastructure push, and Tavily’s agent-oriented Search API put Perplexity in a market where citation quality, predictable cost, controllable retrieval behavior, and code-safety boundaries shape developer trust. Agentic search products are moving toward delegated workflows that must prove they can control cost and evidence quality, not just add another way to reach the web.

What Developers Should Watch Next

Search as Code is available first in Perplexity Computer and the Perplexity Agent API, with Perplexity Computer already positioned as an AI PC environment tied to local-cloud routing. Developers will have to watch whether real workloads reduce verification work or simply move it into sandbox policy, debugging, and pipeline maintenance.

Perplexity Computer also has to decide when work should stay local and when cloud models are needed, so generated retrieval code becomes part of a broader trust boundary. Perplexity’s planned Wide Agentic Deep Research benchmark is the next concrete gate for the June 1 architecture.

Perplexity Lets AI Agents Write Their Own Search Code

How Model-Written Search Workflows Run

Benchmarks and Market Alternatives

What Developers Should Watch Next

Recent News

AI Water Demand Could Match 1.3 Billion People by 2030

xAI Reportedly Used Workaround to Train Grok With Claude Output After...

Developer Turns Azure Linux 4 Into Bootable Desktop-App