IBM has developed a low-power analog AI processor that offers a more energy-efficient approach to running AI models. Unlike traditional digital processors, the chip relies on analog computation, which has been shown to be over 14 times more energy-efficient while maintaining comparable accuracy on speech-recognition tasks. In a research paper published in Nature on Wednesday, the company demonstrates that the hardware can perform speech recognition accurately while consuming significantly less energy.
Analog computers are a category of computers that exploit continuous changes in physical quantities, such as electrical, mechanical, or hydraulic properties (analog signals), to model and solve the problem at hand. Digital computers, in contrast, represent changing quantities symbolically, using discrete values for both time and amplitude (digital signals).
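The distinction can be sketched in a few lines of code. The snippet below is purely illustrative (the function name and the 8-bit resolution are assumptions, not anything from the paper): it takes a continuous signal and produces the discrete-time, discrete-amplitude representation a digital computer would work with.

```python
import numpy as np

def digitize(signal, span, n_samples=16, n_bits=8):
    """Sample a continuous signal at discrete times and quantize the
    amplitudes to 2**n_bits discrete levels (illustrative only)."""
    ts = np.linspace(span[0], span[1], n_samples)   # discrete time
    x = signal(ts)                                  # amplitudes still continuous
    levels = 2 ** n_bits
    lo, hi = x.min(), x.max()
    q = np.round((x - lo) / (hi - lo) * (levels - 1))  # discrete amplitude
    return ts, q.astype(int)

# A sine wave becomes 16 samples, each an integer in [0, 255].
ts, q = digitize(np.sin, (0.0, 2 * np.pi))
print(q)
```

An analog machine would instead represent the sine wave directly as, say, a continuously varying voltage, with no sampling or quantization step at all.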
Deep Neural Networks and Hardware Limitations
In the last decade, AI techniques have expanded across various applications, from image and video recognition to speech transcription and generation. This expansion is largely attributed to the continuous evolution of deep neural network (DNN) models, which now contain up to one billion parameters. These models have significantly reduced the word error rate (WER) in automated transcription of English sentences. However, hardware performance has not kept pace, resulting in longer training times and higher energy consumption. Training and running large networks on general-purpose processors leads to the “von Neumann bottleneck”: excessive energy spent moving vast amounts of data between memory and processor. The authors write:
“Large networks are still trained and implemented using general-purpose processors such as graphics processing units and central processing units, leading to excessive energy consumption when vast amounts of data must move between memory and processor, a problem known as the von Neumann bottleneck.”
“Analog AI”: A Solution to Inefficiencies
Analog AI hardware offers a promising route around these inefficiencies. By leveraging non-volatile memory (NVM) arrays, these systems perform the dominant multiply-accumulate (MAC) operations directly in memory. This approach drastically reduces both the time and energy required for computation, making it particularly beneficial for DNN models with large fully connected layers. As the research paper explains:
“By moving only neuron-excitation data to the location of the weight data, where the computation is then performed, this technology has the potential to reduce both the time and the energy required.”
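The idea behind in-memory MAC can be sketched numerically. In a PCM crossbar, weights are stored as device conductances, input activations arrive as voltages, and Ohm's and Kirchhoff's laws produce the summed output currents in a single physical step, so the weights never move. The sketch below models this with a matrix product plus noise; the noise level and array sizes are illustrative assumptions, not the chip's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

weights = rng.standard_normal((4, 8))    # DNN weights, stationary in memory
activations = rng.standard_normal(8)     # neuron excitations sent to the tile

# Programming weights into analog conductances is imperfect; model that
# here as small additive Gaussian noise (an assumed error model).
conductances = weights + rng.normal(scale=0.02, size=weights.shape)

# The crossbar computes all dot products at once: currents sum on each
# output line, which is mathematically a matrix-vector product.
analog_mac = conductances @ activations
digital_mac = weights @ activations      # exact digital reference

print(np.max(np.abs(analog_mac - digital_mac)))
```

The payoff is that only the small activation vector travels to the memory array; the much larger weight matrix stays put, which is exactly the data movement the von Neumann bottleneck penalizes.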
Experimental Results and Chip Design
The research team presented results from a 14-nm inference chip that incorporates 34 large arrays of phase-change memory (PCM) devices, analog peripheral circuitry, and a massively parallel 2D-mesh routing system. Although the chip has no on-chip digital computing cores or static random-access memory (SRAM), it effectively demonstrates the accuracy, performance, and energy efficiency of analog AI on natural language processing (NLP) inference tasks.
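When a layer's weight matrix is larger than a single array, it must be split across several tiles whose partial results are routed and accumulated. The sketch below shows that principle only; the tile size and the column-wise split are assumptions for illustration, not the chip's actual dataflow.

```python
import numpy as np

def tiled_matvec(W, x, tile=256):
    """Split a wide matrix-vector product across column tiles and
    accumulate the partial sums, as a mesh of arrays might (sketch)."""
    out = np.zeros(W.shape[0])
    for c in range(0, W.shape[1], tile):
        # Each tile holds one slice of weights and computes one partial sum.
        out += W[:, c:c + tile] @ x[c:c + tile]
    return out

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 1000))   # too wide for one hypothetical tile
x = rng.standard_normal(1000)

# Accumulated partial sums match the monolithic product.
assert np.allclose(tiled_matvec(W, x), W @ x)
```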
Applications and Demonstrations
To showcase the chip's versatility, the team used two neural-network models from the MLPerf standard benchmark. The first was a keyword-spotting (KWS) network on the Google speech-commands dataset, which achieved a classification accuracy of 86.14%. The second was the MLPerf version of RNNT, a large data-center network, which demonstrated near software-equivalent (SWeq) accuracy (98.1% of the base software model) and executed about 99% of its operations on the analog-AI tiles.