Amazon Web Services (AWS) is developing two custom chips to help customers train and run large language models (LLMs) on its cloud platform. The chips, called Trainium and Inferentia, are designed to accelerate the training and inference of LLMs, a type of artificial intelligence (AI) model that can generate text, translate languages, and answer questions.
Trainium is a training chip optimized for the computationally intensive work of training LLMs. It pairs a large number of cores with high-bandwidth memory to handle the massive datasets these models require. Inferentia is an inference chip designed to accelerate the real-time use of already-trained LLMs. It has fewer cores than Trainium but is more energy efficient, making it a better fit for cost-efficient, large-scale model serving.
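In practice, customers reach these chips through the AWS Neuron SDK rather than programming them directly. The sketch below is a minimal, hedged illustration of that workflow, assuming the Neuron SDK's torch_neuronx interface on an Inf2 or Trn1 instance; the toy model and input shapes are placeholders, not details from the article.

```python
# Minimal, illustrative sketch: compiling a toy PyTorch model for Inferentia2/
# Trainium with the AWS Neuron SDK's torch_neuronx. The model and input shapes
# are placeholders standing in for a real LLM.
import torch
import torch_neuronx  # ships with the AWS Neuron SDK on Inf2/Trn1 instances


class TinyNet(torch.nn.Module):
    """Stand-in for a real model; a production LLM would be far larger."""

    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(128, 256)
        self.fc2 = torch.nn.Linear(256, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


model = TinyNet().eval()
example_input = torch.rand(1, 128)

# Ahead-of-time compilation: trace the model once with example inputs so the
# Neuron compiler can target the NeuronCores on the instance.
neuron_model = torch_neuronx.trace(model, example_input)

# The compiled artifact behaves like an ordinary TorchScript module.
print(neuron_model(example_input))
```

The design point worth noting is that compilation happens once, up front; at serving time the compiled model runs on the NeuronCores rather than on a GPU.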
AWS is not the only company developing custom chips for LLM training. NVIDIA has its Grace Hopper superchip, aimed at large-scale AI workloads such as LLM training, and Microsoft is reportedly developing its own Athena AI chip with AMD. Amazon's chips differ in that they are designed to be used on AWS's cloud platform. This means customers who use AWS will not need to purchase their own hardware, and they can take advantage of AWS's global infrastructure to train their LLMs.
The development of Trainium and Inferentia is a significant step forward for AWS’s LLM capabilities. These chips will allow AWS customers to train and deploy LLMs more quickly and efficiently than ever before. This will open up new possibilities for the use of LLMs in a wide range of applications, such as customer service, natural language processing, and machine translation.
Competing with Microsoft, Google, and Facebook
Discussing generative AI and the company’s plan, Amazon CEO Andy Jassy spoke with CNBC and said the following:
“I think of generative AI as having three macro layers, and they are all really big and important… The bottom layer is the compute, all the machine learning training and inference. What matters in that compute is the chip in there… There has really been one chip provider … supply is more scarce and it’s expensive. It’s why we’ve invested over the last few years in our own customized training chips and inference chips, which will have much better price performance than anywhere else… We are quite optimistic that a lot of the machine learning training and inference will be done on AWS chips and compute.”
In the past few years, several tech giants have launched their own generative AI models and services, such as Microsoft Bing Chat, OpenAI ChatGPT, Google Bard, and Meta’s Llama 2. These models are based on deep neural networks that can generate natural language responses or images from user-given prompts or dialogue. However, these models also rely heavily on Nvidia’s graphics processing units (GPUs), which are in high demand and short supply due to the global chip shortage.
To overcome this challenge, Amazon Web Services (AWS), the world’s largest cloud provider, has been developing its own custom chips for generative AI. These chips, Inferentia and Trainium, are designed to accelerate and optimize the inference and training of generative AI models on AWS. AWS claims they offer better price performance than Nvidia GPUs for generative AI workloads.
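Because the chips are exposed as standard EC2 instance families (Trn1 for Trainium, Inf2 for Inferentia2), "not having to purchase hardware" reduces to an API call. Below is a hedged boto3 sketch of launching a Trainium-backed instance; the AMI ID, key pair name, and region are illustrative assumptions, not values from the article.

```python
# Hedged sketch: requesting Trainium-backed EC2 capacity with boto3 instead of
# buying hardware. The AMI ID, key pair, and region are placeholder values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: an AMI with the Neuron SDK preinstalled
    InstanceType="trn1.32xlarge",     # Trainium instance type for training workloads
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",            # placeholder key pair
)

instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched Trainium instance {instance_id}")
```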
“Hype cycle is pretty different than substance cycle,” Jassy says in response to Microsoft’s and Google’s apparent head start. “I see generative AI as one of the biggest technical transformations in our lifetimes. I think it has the ability to transform virtually every customer experience that we know.”