Microsoft has opened the COVID-19 Open Research Dataset (CORD-19), alongside the Allen Institute for AI, the National Library of Medicine (NLM), Georgetown University, Kaggle, the Chan Zuckerberg Initiative, and the White House Office of Science and Technology Policy (OSTP).
CORD-19 is a dataset of scientific articles focused on the COVID-19 (coronavirus) pandemic that has brought the world to a standstill. Researchers around the globe can access the collection of over 29,000 scholarly articles.
“The motivation behind the CORD-19 effort is to make research and discovery more efficient—and to accelerate progress toward solutions to the pandemic.”
Microsoft says 13,000 of the articles have full text and are machine-readable, which means AI tools can analyze them. By opening the articles to AI tools, the group hopes machine learning can drive new computing methods that help scientists learn more about COVID-19.
Growing Dataset
For its part in CORD-19, Microsoft provided mapping and indexing of thousands of articles from around the world. The company will continue to add more articles to the index to help global research efforts. Microsoft aims to create a pool of information that covers everything known about the coronavirus.
“A key aspect of aggregating scientific literature into a valuable unified data resource is gaining access to the full content of articles—including permissions to analyze the content with computational tools. Many medical articles are tucked behind paywalls.
“Even when text is made available, publishers may not provide researchers with the rights to perform machine analysis and datamining. Much has been going on behind the scenes to open up the literature on the coronavirus family and on COVID-19 to create this kind of machine-readable resource.”