Microsoft has publicly released data from the Microsoft Translator service. Specifically, it has released conversation data from bilingual speakers talking in French, German and English. The company says it is releasing the data as part of ongoing efforts to improve artificial intelligence (AI) accuracy.
The announcement was made by Christian Federmann, senior program manager for Microsoft Translator. The speech translation corpus was created by Microsoft and involved a number of bilingual speakers.
Federmann says the company created the data to give organizations and customers a standard for measuring how well their conversational speech translation systems perform. It would serve as a baseline data set for services such as Microsoft Translator live and Skype Translator.
With the standardized data, these services and others can test bilingual conversational speech translation systems. Federmann points out that few such standardized data sets for bilingual speech exist because “You need high-quality data in order to have high-quality testing.”
Microsoft has made the corpus available for free. The company hopes it will boost conversational translation services. It also hopes the release will prompt others to create standard benchmarks.
“This helps propel the field forward,” said Will Lewis, a principal technical program manager with the Microsoft Translator team who also worked on the project. The Speech Language Translation corpus is available to download.
Microsoft Translator and Bilingual Development
Microsoft released an update for its Translator app in August, which added bilingual dictionaries and phrasebooks. The dictionaries will help users learn new languages. Microsoft explained at the time how the feature works:
“Let’s say you are translating the sentence, ‘That’s great!’ into French. The English word ‘great’ could mean many things – excellent, glorious, large, etc. so there might be several different ways you could translate it. Using the Bilingual Dictionary, you could quickly see a list of alternative translations to the word ‘great’ to come up with the perfect way to say exactly what you mean. ‘C’est super!’”