Google has unveiled a new, performance-focused Tensor Processing Unit (TPU), the v5p, aimed at drastically reducing the time required to train large language models. Building on the previously announced TPU v5e, the v5p offers a substantial increase in computational capability, squarely targeting more intensive AI workloads.
Enhanced Performance and Scalability
The TPU v5p delivers a formidable 459 teraFLOPS of bfloat16 performance, or 918 teraOPS for Int8 calculations. Equipped with 95 GB of high-bandwidth memory at 2.76 TB/s, it surpasses its predecessors in both efficiency and utility. Google's design also allows for remarkable scale: up to 8,960 TPU v5p chips can be interconnected within a single pod over a 600 GB/s inter-chip interconnect. The v5p's maximum cluster size is 35 times that of the TPU v5e and more than double that of the TPU v4.
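The scale figures above can be sanity-checked with a little arithmetic. The sketch below uses only the numbers quoted in this article and reports theoretical peaks, not sustained real-world throughput:

```python
# Back-of-envelope figures derived from the numbers quoted above.
# These are peak per-chip specs; sustained throughput will be lower.

V5P_CHIPS_PER_POD = 8_960   # max v5p chips interconnected in one pod
V5P_BF16_TFLOPS = 459       # peak bfloat16 teraFLOPS per chip
V5E_CLUSTER_RATIO = 35      # v5p max cluster is 35x the v5e's

# Aggregate peak bf16 compute of a fully populated v5p pod, in exaFLOPS.
pod_exaflops = V5P_CHIPS_PER_POD * V5P_BF16_TFLOPS / 1e6
print(f"Full v5p pod peak bf16: {pod_exaflops:.2f} exaFLOPS")

# The 35x claim implies a v5e maximum cluster size of:
implied_v5e_chips = V5P_CHIPS_PER_POD // V5E_CLUSTER_RATIO
print(f"Implied v5e max cluster: {implied_v5e_chips} chips")
```

That works out to roughly 4 exaFLOPS of peak bf16 compute per pod, and an implied v5e ceiling of 256 chips.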
Mark Lohmeyer, Google's VP of compute and ML infrastructure, says the accelerator can train popular large language models such as OpenAI's GPT-3 up to 1.9 times faster using BF16, with gains of up to 2.8 times for 8-bit integer calculations, compared with TPU v4.
Premium Performance Comes at a Price
The enhanced capability of the TPU v5p comes at a higher price. Customers can expect an hourly rate of $4.20 per TPU v5p accelerator, compared with $3.22 for TPU v4 and just $1.20 for the TPU v5e. Google positions the v5e as a more cost-efficient option when training time is not the paramount factor, providing a more accessible entry point for AI training without the need for peak performance.
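Combining the hourly rates with the quoted 1.9x BF16 training speedup gives a rough sense of price-performance. This is strictly a back-of-envelope sketch; actual costs depend on utilization, workload, and any committed-use discounts:

```python
# Hourly on-demand rates per accelerator, as quoted in the article.
PRICE_V5P = 4.20
PRICE_V4 = 3.22

# Quoted BF16 training speedup of v5p over v4 on GPT-3-class models.
SPEEDUP_V5P_VS_V4 = 1.9

# Effective cost on v5p of the same training work one v4-hour performs.
effective_v5p_cost = PRICE_V5P / SPEEDUP_V5P_VS_V4
ratio = effective_v5p_cost / PRICE_V4
print(f"v5p costs {ratio:.0%} of v4 per unit of BF16 training work")
```

Under the quoted speedup, the v5p works out to roughly 30% cheaper per unit of training work despite the higher hourly rate, which appears to be the trade-off Google is highlighting.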
Alongside the new TPU v5p, Google introduced a new concept termed “AI hypercomputer,” which integrates hardware, software, machine learning frameworks, and consumption models to address AI workloads efficiently. By optimizing multiple variables within the system, Google's AI hypercomputing architecture seeks to eliminate common inefficiencies and bottlenecks, promising boosted productivity across various AI tasks.
Moreover, Google showcased Gemini, a multi-modal large language model adept at processing text, images, video, audio, and even code, marking a significant milestone in Google's AI capabilities that coincides with the TPU v5p's roll-out. In the highly competitive landscape of AI acceleration, Google's advancement with the TPU v5p opens new possibilities for developers and businesses harnessing the power of artificial intelligence.