Last month, we reported on a Microsoft Research project codenamed Project Brainwave. Showcased at Hot Chips 2017, the platform is a deep learning acceleration solution that the company describes as “a major leap forward”. Designed to serve cloud-based deep learning models, Brainwave delivers real-time AI performance with the flexibility of reprogrammable hardware.
Microsoft fully showcased Project Brainwave at Hot Chips 2017. We have previously discussed the project and its ability to deliver real-time AI: rather than batching requests together, the system processes each request as it is received.
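The difference real-time serving makes can be sketched with a bit of queuing arithmetic. This is an illustrative model with assumed numbers, not a Brainwave measurement:

```python
# Minimal sketch of why batching hurts latency: the first request in a
# batch must wait for the rest of the batch to arrive before compute
# can even begin. All numbers here are illustrative assumptions.

def batched_wait(arrival_gap_ms: float, batch_size: int) -> float:
    """Worst-case queuing delay (ms) for the first request in a batch:
    it waits for the remaining (batch_size - 1) arrivals."""
    return (batch_size - 1) * arrival_gap_ms

# Requests arriving every 5 ms, batch of 32: the first request queues
# for 155 ms before inference starts.
print(batched_wait(5.0, 32))  # 155.0
# Batch size 1 (real-time): processing begins on arrival.
print(batched_wait(5.0, 1))   # 0.0
```

With batch size 1 the queuing term vanishes, which is what Microsoft means by processing requests as they are received.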
The company says Brainwave was built on the following three layers:
- A high-performance, distributed system architecture;
- A hardware DNN engine synthesized onto FPGAs; and
- A compiler and runtime for low-friction deployment of trained models.
Project Brainwave builds on the FPGA infrastructure the company has developed in recent years. Redmond has vocally advertised the potential of FPGAs for machine learning and parallel computing: the chips are reprogrammable and power-efficient, making them well suited to deep learning workloads.
With Project Brainwave, the researchers wanted to extend that flexibility further. The team synthesized DNN processing units (DPUs) onto its FPGA chips:
“Although some of these chips have high peak performance, they must choose their operators and data types at design time, which limits their flexibility. Project Brainwave takes a different approach, providing a design that scales across a range of data types, with the desired data type being a synthesis-time decision.”
Project Brainwave also uses a software stack designed to support numerous deep learning frameworks. It already supports Microsoft’s own Cognitive Toolkit and Google’s TensorFlow, and the company says support for other frameworks will follow.
Hot Chips Demo
At Hot Chips last month, Microsoft’s Eric Chung and Jeremy Fowers showed how Brainwave performs when ported to Intel’s new 14 nm Stratix 10 FPGA.
The new system achieved “record-setting” performance on early Stratix 10 silicon, with no batching. In the demo, Microsoft used its own 8-bit floating point format, which the company says incurs no loss of accuracy across the models tested.
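Microsoft has not published the exact bit layout of its 8-bit format, but the general idea of a narrow float can be sketched as follows. The 1-sign/4-exponent/3-mantissa split below is a hypothetical example, and exponent-range clamping is omitted for brevity:

```python
import math

# Hypothetical sketch of an 8-bit float's precision: 3 explicit
# mantissa bits (plus the implicit leading bit) give 4 bits of
# significand. This is NOT Microsoft's actual ms-fp8 layout, which
# is unpublished; exponent-range clamping is also omitted.

def quantize_fp8(x: float) -> float:
    """Round x to the nearest value representable with a 3-bit mantissa."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)       # x = m * 2**e, with 0.5 <= |m| < 1
    m = round(m * 16) / 16     # keep implicit bit + 3 explicit bits
    return math.ldexp(m, e)

print(quantize_fp8(0.2))  # 0.203125 -- small per-weight rounding error
```

The per-weight rounding error is small relative to each value, which is why low-precision formats can preserve model accuracy while multiplying hardware throughput.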
Microsoft said its benchmarking showed the platform can sustain 39.5 Teraflops, without any batching, on a large gated recurrent unit (GRU) model running on the early Stratix 10 silicon.
With this capability, the Brainwave architecture sustained over 130,000 compute operations per cycle. Microsoft describes the real-time AI performance as “unprecedented”, even on models it deems “extremely challenging”.
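The two headline figures are consistent with each other under a plausible FPGA clock rate. The ~300 MHz clock below is our assumption for illustration, not an official Microsoft number:

```python
# Back-of-the-envelope check: operations/cycle * cycles/second gives
# sustained operations per second. The clock rate is an assumed,
# plausible FPGA frequency, not an official figure.
ops_per_cycle = 130_000
clock_hz = 300e6  # ~300 MHz (assumption)

teraflops = ops_per_cycle * clock_hz / 1e12
print(teraflops)  # 39.0 -- in line with the quoted 39.5 Teraflops
```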
Work is already underway to bring Brainwave’s real-time AI capabilities to the Azure cloud platform, which would let customers benefit from the project directly through services such as Bing. Details on availability will be announced in the coming months.