Microsoft has been making consistent breakthroughs in speech recognition technology. In 2016, it reached human parity, meaning its system can transcribe speech as well as humans can. In 2017, it did the same for conversational speech, and this year its Xiaoice AI began to detect natural pauses.
However, these tests were done in ideal conditions. In the real world, we have to deal with noise from traffic, televisions, music, and more. Microsoft's new Speech Devices SDK aims to help with that.
The SDK is paired with specific microphone-enabled hardware and implements features like noise cancellation, far-field capabilities, and echo cancellation. The idea is to encourage developers to build products such as car assistants and drive-thru systems while integrating with the cloud-based Microsoft Speech service.
Project Kinect and More
The SDK was announced at Build 2018 on Monday and will compete with SDKs from Amazon and Google. However, Microsoft is pairing it with updates to its cloud services across the board, from the Azure App Service to the now open-source Azure IoT Edge.
Microsoft will also be supporting AI-driven services via Project Kinect on Azure. The next-generation depth-sensing camera combines with Azure's AI services for use in hand tracking, spatial mapping, and more.
In combination, the services are starting to look like a robust solution for developers, and Microsoft is also offering financial support. Its AI for Accessibility program has pledged $25 million to developers over five years to create disability-focused tools.
The project is partly inspired by Seeing AI, which uses machine learning to help blind users detect the objects around them. With enhanced speech recognition, depth-sensing cameras, and more, it's not difficult to imagine an even better solution.
You can read more about the Speech Devices SDK on the Azure site.