
Microsoft Releases MS MARCO, an Anonymous Real-World Dataset to Help Train AI

The MS MARCO dataset contains 100,000 questions and answers and is based on real Bing and Cortana searches. With it, Microsoft hopes to encourage further innovation in AI.


Microsoft is committed to AI and, more specifically, to opening it up to everyone. As part of that goal, the company has released MS MARCO, a set of 100,000 questions and answers for AI researchers.

The questions and answers are available free of charge and contain anonymized real-world data. The goal is to facilitate further breakthroughs like Microsoft's recent achievement of human parity in speech recognition.

The data was collected from searches on Microsoft's Bing search engine, as well as Cortana. The answers provided are written by humans, and come from web pages that have been verified.

Artificial General Intelligence

“In order to move towards artificial general intelligence, we need to take a step towards being able to read a document and understand it as well as a person,” said Rangan Majumder, partner group program manager at Bing. “This is a step in that direction.”

Now that artificial intelligence can process language correctly, the next step is understanding. According to Microsoft researcher Li Deng, that's what MS MARCO has been designed for.

“Our dataset is designed not only using real-world data but also removing such constraints so that the new-generation deep learning models can understand the data first before they answer questions,” he said.

With such power, artificial intelligence could answer questions in a much faster, more detailed and more personalized fashion than a user could manage alone. That's likely still a long way off, but with this move Microsoft hopes to make it happen sooner.

Part of that is creating a collaborative and competitive community. MS MARCO will have a leaderboard showing which teams are getting the best results. If it takes off, Microsoft may then launch more formal competitions.

Anybody can download the dataset, whether they're an official researcher or not. The only restriction is on commercial use. The Redmond giant hasn't ruled this out, but says “we may provide access under certain conditions and terms.”

You can download the MS MARCO dataset yourself from the official website.

Ryan Maskell
https://ryanmaskell.co.uk
Ryan has had a passion for gaming and technology since early childhood. Fusing the skills from his Creative Writing and Publishing degree with profound technical knowledge, he enjoys covering news about Microsoft. As an avid writer, he is also working on his debut novel.
