Microsoft's Project Alexandria is not new, but it recently reached a milestone. The research project launched in 2014 at Microsoft's Cambridge research center. This year it has become more visible as the technology underpinning Microsoft Viva Topics, and Microsoft is now discussing Alexandria's ongoing development: past, present, and future.
If you are unfamiliar with Project Alexandria, it is designed to discover topics in a body of information, along with the entities associated with those topics. From a set of documents, it can build a knowledge base for users.
Alexandria provides the backbone of Microsoft Viva Topics. Viva Topics is part of the broader Viva employee experience platform, which also includes Viva Connections, Viva Learning, and Viva Insights. It is the component of the platform that automatically organizes large sets of content.
Alexandria researchers are responsible for identifying the topics and metadata that Viva Topics uses to parse content into separate datasets. To do this, Microsoft relies on the AI software that underpins Project Alexandria.
Speaking to VentureBeat, Viva Topics director of product development Naomi Moneypenny, Alexandria project lead John Winn, and Alexandria engineering manager Yordan Zaykov discussed Alexandria, Viva Topics, and how the two integrate.
Enterprises often handle huge amounts of information, so extracting specific pieces of data from large sets can be challenging. Searching for that information costs organizations both time and resources. Through Viva Topics, Project Alexandria helps solve this problem by finding topics in documents and maintaining those topics even as the documents are updated.
“When I started this work, machine learning was mainly applied to arrays of numbers — images, audio. I was interested in applying machine learning to more structured things: collections, strings, and objects with types and properties,” Winn points out. “Such machine learning is very well suited to knowledge mining, since knowledge itself has a rich and complex structure. It is very important to capture this structure in order to represent the world accurately and meet the expectations of our users.”
Microsoft uses probabilistic programming to mine and link topics across document sets. The approach relies on a model of how topics and their properties are referenced within a set of documents. Running this model in reverse allows topics to be extracted from the document sets. Zaykov says Alexandria has come a long way since it launched seven years ago.
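To give a rough sense of what running a model "in reverse" means, here is a minimal Python sketch. It is not Alexandria's actual model: the two topics, the three-word vocabulary, and every probability are invented for illustration. In the forward direction, a topic generates words. In the reverse direction, Bayes' rule turns observed words into a probability for each topic.

```python
import math

# Hypothetical toy model, invented for illustration only.
# Forward direction: each topic generates words with fixed probabilities.
PRIOR = {"knowledge mining": 0.5, "image recognition": 0.5}
WORD_PROBS = {
    "knowledge mining": {"topics": 0.5, "entities": 0.4, "pixels": 0.1},
    "image recognition": {"topics": 0.1, "entities": 0.1, "pixels": 0.8},
}


def posterior(words):
    """Reverse direction: observed words -> P(topic | words) via Bayes' rule.

    Only words from the toy vocabulary above are supported.
    """
    log_scores = {}
    for topic, prior in PRIOR.items():
        # log P(topic) + sum of log P(word | topic)
        log_score = math.log(prior)
        for word in words:
            log_score += math.log(WORD_PROBS[topic][word])
        log_scores[topic] = log_score
    # Normalize so the probabilities sum to 1.
    peak = max(log_scores.values())
    unnormalized = {t: math.exp(s - peak) for t, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {t: v / total for t, v in unnormalized.items()}
```

Calling posterior(["topics", "entities"]) puts most of the probability on the "knowledge mining" topic, because that topic is far more likely to have generated those two words. A production system would need a much richer model, but the direction of inference is the same.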
“A lot of progress has been made in the project since its founding. In terms of machine learning capabilities, we built numerous statistical types to allow for extracting and representing a large number of entities and properties, such as the name of a project, or the date of an event.
“We also developed a rigorous conflation algorithm to confidently determine whether the information retrieved from different sources refers to the same entity. As to engineering advancements, we had to scale up the system — parallelize the algorithms and distribute them across machines, so that they can operate on truly big data, such as all the documents of an organization or even the entire web.”
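Conflation, as Zaykov describes it, means deciding whether information from different sources refers to the same entity. The Python sketch below is a deliberately simple stand-in, not the algorithm Microsoft built: the record format, the same_entity helper, and the 0.8 threshold are all assumptions made for illustration. It treats two records as the same entity when their names are similar and every property they share agrees.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_entity(record_a, record_b, threshold=0.8):
    """Guess whether two records describe the same entity.

    Hypothetical rule: the names must be similar, and any property that
    appears in both records must have the same value. The threshold is
    an arbitrary assumption.
    """
    if name_similarity(record_a["name"], record_b["name"]) < threshold:
        return False
    shared = (set(record_a) & set(record_b)) - {"name"}
    return all(record_a[key] == record_b[key] for key in shared)
```

For example, {"name": "Project Alexandria", "started": 2014} and {"name": "project alexandria", "lead": "John Winn"} would be merged, since the names match and no shared property conflicts. A record with the same name but a different "started" value would be kept separate.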