
Microsoft took to the stage at the Supercomputing 2019 conference to announce the availability of NDv2, which Nvidia calls the "world's largest GPU-accelerated cloud-based supercomputer". Available via Azure, NDv2 can scale up to 800 Nvidia V100 Tensor Core GPUs interconnected with Mellanox InfiniBand. Interested parties can effectively rent an entire supercomputer, with substantial benefits to speed. Using a pre-release version of the cluster, Microsoft and Nvidia trained the conversational AI model BERT in around three hours. That doesn't beat Nvidia's previous record of 53 minutes, but at 8.3 billion parameters, customers are still getting very good training speeds. Here are the base specs of a single NDv2 VM:
- 8 Nvidia Tesla V100 NVLink GPUs (32 GB HBM2 memory each)
- Intel Xeon Platinum 8168 processor with 40 non-hyperthreaded cores
- 672 GiB memory
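
As a quick sanity check, the per-VM specs above can be combined with the 800-GPU cluster ceiling to estimate the scale involved. This is just back-of-the-envelope arithmetic from the figures quoted in this article, not official Azure sizing guidance:

```python
# Back-of-the-envelope cluster math from the NDv2 specs quoted above.
GPUS_PER_VM = 8          # Tesla V100 NVLink GPUs per NDv2 VM
HBM2_PER_GPU_GB = 32     # GB of HBM2 memory per GPU
MAX_CLUSTER_GPUS = 800   # maximum cluster scale cited for NDv2

vms_at_full_scale = MAX_CLUSTER_GPUS // GPUS_PER_VM
total_hbm2_gb = MAX_CLUSTER_GPUS * HBM2_PER_GPU_GB

print(vms_at_full_scale)  # 100 NDv2 VMs at full scale
print(total_hbm2_gb)      # 25600 GB of aggregate GPU memory
```

In other words, renting the full 800-GPU configuration means stitching together 100 of these VMs over the InfiniBand fabric, with roughly 25 TB of combined GPU memory available to a single training job.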