
Cerebras Systems Targets Nvidia with New AI Inference Service

Cerebras Systems has launched a new AI inference service, claiming it is the fastest in the world and promising greater efficiency and responsiveness for AI workloads.


Cerebras Systems has rolled out a new AI inference service, touting it as the fastest in the world and presenting a direct challenge to Nvidia’s dominance. The cloud-based service aims to significantly boost the efficiency and responsiveness of AI inference tasks, which are essential for real-time data analytics and decision-making.

AI Inference Explained

In AI, inference involves using a trained model to analyze new data, generate predictions, or perform specific functions. This phase is critical for deploying AI applications and accounts for a large portion of AI workloads in cloud computing environments. Even so, demand for more efficient and cost-effective inference solutions continues to rise, as Cerebras Systems CEO Andrew Feldman notes.
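For readers newer to the distinction, the split between training and inference can be made concrete in a few lines of code. The following is a minimal sketch using scikit-learn; the model and data are purely illustrative and have no connection to Cerebras’ service.

```python
# Minimal illustration of the training/inference split.
# The model and data here are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Training phase: fit a model once on labeled data.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X_train, y_train)

# Inference phase: the trained model analyzes new, unseen data.
# This is the step that inference services accelerate at data-center scale.
X_new = np.array([[1.5], [2.5]])
print(model.predict(X_new))        # class predictions
print(model.predict_proba(X_new))  # prediction confidence
```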

The foundation of Cerebras Systems’ offering is the WSE-3 processor, which packs 4 trillion transistors, over 900,000 compute cores, and 44 gigabytes of on-chip SRAM. The processor is housed within the CS-3 data center appliance, engineered to deliver extreme performance at speeds of up to 125 petaflops. The compact CS-3 unit, roughly the size of a small refrigerator, powers Cerebras’ new inference service and offers considerably more on-chip memory than Nvidia’s H100 GPU.

Performance and Cost Efficiency

Cerebras reports that its inference service dramatically outpaces offerings built on Nvidia GPUs, delivering up to 20 times the performance. Specifically, it processes 1,800 tokens per second for the Llama 3.1 8B model and 450 tokens per second for the Llama 3.1 70B model. Priced at 10 cents per million tokens, the service offers a notable price-performance advantage over current alternatives.
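Those headline numbers are easy to sanity-check with back-of-envelope arithmetic. The sketch below assumes the quoted throughput and price both apply to the same (output) tokens for the 8B model, which the figures above do not spell out.

```python
# Back-of-envelope check of the quoted Llama 3.1 8B figures.
# Assumption: throughput and price both refer to output tokens.
tokens_per_second = 1_800        # claimed throughput for Llama 3.1 8B
price_per_million_usd = 0.10     # claimed price per million tokens

seconds_per_million = 1_000_000 / tokens_per_second
print(f"~{seconds_per_million / 60:.1f} minutes to generate 1M tokens")  # ~9.3

tokens_per_hour = tokens_per_second * 3_600
cost_per_hour = tokens_per_hour / 1_000_000 * price_per_million_usd
print(f"~{tokens_per_hour / 1e6:.2f}M tokens/hour, ~${cost_per_hour:.2f}/hour")  # ~6.48M, ~$0.65
```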

Cerebras has established partnerships with multiple companies, such as LangChain, LlamaIndex, Docker, Weights & Biases, and AgentOps. These partnerships aim to equip clients with the tools necessary to expedite AI development. Additionally, discussions are ongoing with major cloud service providers to extend its offerings, targeting an expanded customer base that includes AI-focused firms like CoreWeave and Lambda.

Service Tiers and Accessibility

The inference service is structured into three access tiers. The Free Tier provides API-based access with generous limits for experimentation; the Developer Tier offers expanded access for development projects; and the Enterprise Tier targets production workloads with fine-tuned models, custom service-level agreements, and dedicated support.
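Since access is API-based, a request to the service would look broadly like the sketch below. The endpoint URL, model identifier, and the CEREBRAS_API_KEY environment variable are assumptions for illustration, not confirmed details; consult Cerebras’ documentation for the actual interface.

```python
# Hypothetical sketch of calling the inference service over HTTP.
# Endpoint URL, model name, and env-var name are assumptions.
import os
import requests

response = requests.post(
    "https://api.cerebras.ai/v1/chat/completions",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}"},
    json={
        "model": "llama3.1-8b",  # assumed model identifier
        "messages": [
            {"role": "user", "content": "Summarize wafer-scale compute."}
        ],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```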

Cerebras Systems is an American company that develops and provides computer systems and chips for artificial intelligence applications. Founded in 2016, Cerebras Systems is backed by leading venture capitalists and technologists. Its flagship product so far is the Wafer Scale Engine, the largest and fastest AI chip in the world.

In July 2023, Cerebras and G42 announced the Condor Galaxy 1 supercomputer. Condor Galaxy is a network of nine interconnected AI supercomputers designed to significantly reduce AI model training time. The first AI supercomputer on this network, Condor Galaxy 1 (CG-1), boasts 4 exaFLOPs of compute and 54 million cores.

