According to Microsoft speech scientist Xuedong Huang, the data backs this up. A benchmark evaluation shows the Redmond giant's speech recognition has reached its lowest error rate to date.
When compared to the rest of the industry, Microsoft's speech recognition also comes out ahead. The system achieved an error rate of 6.9%, the best performance in its section. Furthermore, the system was able to score 6.3% with the help of “an ensemble of acoustic models.”
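Twenty years ago, the lowest error rate was 43%. At the Interspeech conference in San Francisco this weekend, IBM reported a rate of 6.6%, which is 0.3 percentage points below Microsoft's 6.9% baseline but 0.3 points above the 6.3% it reached with the ensemble.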
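For readers wondering what such a percentage measures, here is a minimal sketch. It assumes the figures quoted are word error rates, the customary metric for benchmarks of this kind; the article itself says only "error rate". The function name and the sample sentences are illustrative and do not come from Microsoft or IBM.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: substitutions, insertions and deletions each cost 1,
    which is the usual convention for this metric.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of six reference words.
    rate = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"{rate:.1%}")
```

Under that assumption, a rate of 6.3% would mean roughly one word in sixteen is transcribed incorrectly, inserted, or dropped.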
“The research team we've assembled brings to bear a century of industrial speech R&D experience to push the state of the art in speech recognition,” says Geoffrey Zweig, manager of Microsoft's Speech and Dialogue Research group.
How did they do it?
According to a Microsoft spokesperson, the leap for both companies is a result of deep neural networks. These networks are incredibly complex and are inspired by the biological processes of the brain.
Microsoft's technology in this regard is ahead of the competition. The company uses a deep residual neural network, an architecture that won the ImageNet computer vision challenge thanks to a new type of cross-layer network connection.
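The cross-layer connection can be illustrated with a toy example. This is only a sketch of the general residual idea, in which a layer's input is added back to its output. It is not Microsoft's actual network, and the function names, layer sizes, and weights are all hypothetical.

```python
def dense_relu(x, weights, bias):
    """One fully connected layer followed by a ReLU activation."""
    out = []
    for row, b in zip(weights, bias):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(max(0.0, z))
    return out


def residual_block(x, weights, bias):
    """Return x + F(x), where F is the dense ReLU layer above.

    The addition is the skip (cross-layer) connection: the input bypasses
    the layer and is summed with the layer's output.
    """
    fx = dense_relu(x, weights, bias)
    if len(fx) != len(x):
        raise ValueError("layer output must match input size for the skip connection")
    return [xi + fi for xi, fi in zip(x, fx)]


if __name__ == "__main__":
    x = [1.0, -2.0]
    zero_weights = [[0.0, 0.0], [0.0, 0.0]]
    zero_bias = [0.0, 0.0]
    # With all-zero weights the layer contributes nothing, so the block
    # passes its input through unchanged.
    print(residual_block(x, zero_weights, zero_bias))
```

Because the input always has a direct path to the output, a block that has learned nothing still passes its input along intact, which is commonly cited as the reason very deep stacks of such blocks remain trainable.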
The second component to the advances is Microsoft's Computational Network Toolkit. As put by Richard Eckel, Microsoft's tech industry communications professional:
“CNTK implements sophisticated optimizations that enable deep learning algorithms to run an order of magnitude faster than before. A key step forward was a breakthrough for parallel training on graphics processing units, or GPUs.”
Microsoft uses GPU clusters in combination with CNTK to process complex algorithms incredibly quickly. Thanks to the advance, Cortana can ingest ten times more data in the same amount of time.
The tests mark a noteworthy milestone in Microsoft's overall quest to deliver AI solutions. The company has stated time and time again that it plans to lead the machine learning, artificial intelligence, and speech recognition industries, and this puts it one step closer.
It may not be long before computers can understand the words users are saying as well as humans do, leading to a revolution in the computing industry. As Microsoft CEO Satya Nadella has said previously, the shift could have the same impact as the invention of the GUI, or even the web itself.