Microsoft Research has been one of the leaders in developing machine learning and artificial intelligence techniques. The division's latest advancement is an AI-driven system that learned to master Ms. Pac-Man. Using a divide-and-conquer method, the system achieved the maximum possible score on the addictive 1980s gaming classic.
In a blog post, Microsoft Research says its divide-and-conquer technique could have wider implications. For example, it could be used to teach AI to do more complex tasks.
The research team worked out of Maluuba, a Canadian start-up acquired by Microsoft this year. Using an AI technique called reinforcement learning, the team pitted its system against the Atari 2600 version of Ms. Pac-Man.
Not only did the system complete the game, but it achieved the perfect score of 999,990. While this does not have wide implications for end users, it is huge for AI development. Doina Precup, associate professor of computer science at McGill University in Montreal, says Ms. Pac-Man is the most difficult game for an AI to defeat.
The new AI system was able to achieve the high score thanks to the Microsoft Research team dividing tasks into smaller pieces.
“This idea of having them work on different pieces to achieve a common goal is very interesting,” Precup said.
It is an exciting advancement in AI because it comes closer to mimicking how the brain works by compartmentalizing tasks. Precup says this is another step toward creating an AI with more general intelligence.
Maluuba calls its system Hybrid Reward Architecture. It uses 150 AI agents working in parallel with each other, but on separate tasks, to win the game. For example, some agents were designated to help the system find pellets, while others pushed the system to avoid ghosts.
A top agent was created, which acted as a sort of manager AI. It took information from all the agents and then decided where to move Ms. Pac-Man.
Harm Van Seijen, a research manager with Maluuba and lead author of a new paper about the achievement, said the best results came when each agent acted very egotistically (for example, focusing only on the best way to reach its pellet) while the top agent decided how to use the information from each agent to make the best move for everyone.
“There’s this nice interplay between how they have to, on the one hand, cooperate based on the preferences of all the agents, but at the same time each agent cares only about one particular problem,” Van Seijen says. “It benefits the whole.”
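The interplay Van Seijen describes can be sketched in a few lines of code. This is a toy illustration of the aggregation idea only, not Maluuba's actual implementation: the agent functions, action names, and preference values below are all invented for the example. Each narrow agent scores every possible move against its own objective, and a hypothetical top agent sums those scores and picks the move that is best overall.

```python
# Toy sketch of the Hybrid Reward Architecture idea: egotistic agents
# score moves for their own objective; a top agent combines the scores.
# All names and values here are illustrative, not from the paper.

ACTIONS = ["up", "down", "left", "right"]

def pellet_agent(action):
    # Cares only about reaching its pellet (made-up preference values).
    return {"up": 1.0, "down": 0.2, "left": 0.5, "right": 0.1}[action]

def ghost_agent(action):
    # Cares only about avoiding a nearby ghost (made-up values).
    return {"up": -2.0, "down": 0.0, "left": 0.3, "right": 0.4}[action]

AGENTS = [pellet_agent, ghost_agent]

def top_agent(agents, actions):
    # Sum every agent's preference for each action, then pick the
    # action with the highest combined score.
    totals = {a: sum(agent(a) for agent in agents) for a in actions}
    return max(totals, key=totals.get)

print(top_agent(AGENTS, ACTIONS))  # prints "left"
```

Here the pellet agent alone would move up, but the ghost agent strongly vetoes that direction, so the combined score favors moving left: each agent stays narrowly selfish while the top agent makes the compromise.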