These questions are available free of charge and contain anonymized real-world data. The goal is to enable further breakthroughs like Microsoft's achievement of human parity in speech recognition.
The data was collected from queries made through Microsoft's Bing search engine and its Cortana assistant. The answers were written by humans and are drawn from verified web pages.
Artificial General Intelligence
“In order to move towards artificial general intelligence, we need to take a step towards being able to read a document and understand it as well as a person,” said Rangan Majumder, partner group program manager at Bing. “This is a step in that direction.”
Now that artificial intelligence can process language accurately, the next step is understanding it. According to Microsoft researcher Li Deng, that is what MS MARCO has been designed for.
“Our dataset is designed not only using real-world data but also removing such constraints so that the new-generation deep learning models can understand the data first before they answer questions,” he said.
With such capabilities, artificial intelligence could answer questions faster, and in a more detailed and personalized way, than users could on their own. That is likely still a long way off, but Microsoft hopes this move will bring it closer.
Part of that effort is building a collaborative and competitive community. MS MARCO will have a leaderboard showing which teams are achieving the best results. If it takes off, Microsoft may then launch more formal competitions.
Anybody can download the dataset, whether or not they are an official researcher. The only restriction is on commercial use. The Redmond giant hasn't ruled this out, but says "we may provide access under certain conditions and terms."
You can download the MS MARCO dataset yourself from the official website.