
IBM Beats Microsoft’s Speech Recognition Accuracy Record

The Armonk-based tech giant claims it has achieved a 5.5% word error rate, beating Microsoft’s previous record of 5.9%.


Last October, Microsoft revealed that its Speech Recognition technology achieved a 5.9% word error rate (WER), setting a new world record. Now, IBM has managed to break that record, announcing in a blog post that it has achieved a 5.5% WER.

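Word error rate is a common metric of the performance of a speech recognition or machine translation system. The difficulty of measuring performance lies in the fact that the recognized word sequence can have a different length from the reference word sequence (that is, the supposedly correct one). WER is therefore computed from the minimum number of word substitutions, deletions and insertions needed to turn the reference into the recognized sequence, divided by the number of words in the reference.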
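As a rough illustration of how such a score can be computed, the sketch below aligns a reference transcript with a recognizer's output using word-level edit distance, then divides the number of edits by the reference length. The function name word_error_rate and the sample sentences are made up for this example; it is not the scoring tool used by IBM or Microsoft, and real evaluations also normalize text (case, punctuation, contractions) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the")
# against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.1%}")
```

Because insertions count as errors, the score can exceed 100% when the recognizer outputs many more words than the reference contains.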

How IBM did it

According to IBM's blog post, the company reached a 5.5% word error rate by combining Long Short-Term Memory (LSTM) and WaveNet language models.

LSTM is a recurrent neural network architecture (a type of artificial neural network) which, in principle, can compute anything a conventional computer can compute. An LSTM network is well suited to learning from experience in order to classify, process and predict time series.

WaveNet is a deep generative model of raw audio waveforms created by DeepMind Technologies. WaveNet is able to generate speech which mimics any human voice. According to DeepMind, WaveNet's speech sounds more natural than the best existing systems.

Microsoft vs. IBM Speech Recognition

The noble competition between Microsoft and IBM in the speech recognition field is a long-standing one. Both companies have achieved some impressive breakthroughs in speech recognition, beating each other's world records over the last couple of months.

Back in September 2016, Microsoft announced it achieved a 6.3% word error rate, beating IBM's 6.9% WER. As mentioned before, in October Microsoft beat its own world record with a 5.9% word error rate, and now IBM has claimed the world record once more.

The “human parity” debate

Although both Microsoft and IBM compete for the best WER, the companies' views on reaching human parity differ. Reaching human parity, meaning an error rate on par with that of two humans speaking, has always been the ultimate industry goal.

Back in October, when Microsoft achieved a 5.9% WER, the company claimed to have reached human parity in conversational speech recognition. “We've reached human parity,” said Xuedong Huang, Microsoft's chief speech scientist. “This is a historic achievement,” he added.

However, IBM claims to have determined that human parity is lower than what anyone has yet achieved, at 5.1% WER. George Saon, an IBM Principal Research Scientist, says in the announcement blog post that “Others in the industry are chasing this milestone alongside [IBM], and some have recently claimed reaching 5.9 percent as equivalent to human parity…but [IBM is] not popping the champagne yet.”

Kostas Papanikolaou
Kostas is a former sports journalist and an amateur gamer. Combining his love for technology with his writing experience, he enjoys covering news about Microsoft. Being an artistic “soul”, he also writes poems and short stories.
