In a major development for robotics, Google DeepMind and 33 other research institutions have unveiled an ambitious project called Open X-Embodiment. The initiative aims to remove the need to train a separate machine learning model for each robot, task, and environment, a process that often takes substantial time and effort.
As Pannag Sanketi, Senior Staff Software Engineer at Google Robotics, told VentureBeat, robots have so far excelled at specialization but struggled with generalization. If a single variable changes in the task, robot, or environment, training often has to start over.
Open X-Embodiment seeks to change this by developing a general-purpose AI system that streamlines training for different types of robots across many tasks. The project introduces a dataset of multi-robot data, alongside a suite of models that can transfer skills across a wide range of tasks.
Combining Robotics Data
At the core of the Open X-Embodiment project is the idea that a generalized model, built on data from many robots and tasks, can significantly outperform specialized models. The concept parallels large language models (LLMs), which, when trained on large, general datasets, have been observed to outperform smaller models trained on narrow, task-specific datasets.
To test this idea, the research team curated a diverse dataset, collecting data from 22 robot embodiments at 20 institutions in several countries. The dataset includes examples of more than 500 skills and 150,000 tasks across more than a million episodes. The project's models are built on the transformer, the architecture also used in large language models.
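The pooling idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Episode` fields, the robot names, and the `pool_episodes` helper are hypothetical and are not the project's actual data schema or tooling.

```python
import random
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Episode:
    # Hypothetical schema: the real dataset's fields may differ.
    robot: str                 # which embodiment recorded the episode
    instruction: str           # natural-language description of the task
    steps: list = field(default_factory=list)  # (observation, action) pairs


def pool_episodes(per_robot, seed=0):
    """Merge per-robot episode lists into one shuffled training mix."""
    pooled = [ep for episodes in per_robot.values() for ep in episodes]
    random.Random(seed).shuffle(pooled)  # fixed seed keeps the mix reproducible
    return pooled


# Two made-up embodiments contributing episodes to a shared pool.
per_robot = {
    "arm_a": [
        Episode("arm_a", "pick up the cup"),
        Episode("arm_a", "open the drawer"),
    ],
    "arm_b": [Episode("arm_b", "pick up the cup")],
}

mix = pool_episodes(per_robot)
counts = Counter(ep.robot for ep in mix)
print(len(mix), dict(counts))
```

A single model trained on `mix` would see the same instruction carried out by different robots, which is the intuition behind cross-embodiment training.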
Surpassing Specialist Models
Two models, RT-1-X and RT-2-X, both based on the Robotic Transformer frameworks, were tested across various tasks and robots. The generalized RT-1-X model achieved a 50% higher success rate than the specialist models designed for individual robots. This suggests that a model trained on a broad set of examples can outperform specialized ones.
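One way to read the 50% figure is as a relative improvement in success rate, not a gain of 50 percentage points. The short sketch below works through that reading; both the interpretation and the success rates used are assumptions for illustration, not reported results.

```python
def relative_improvement(generalist_rate, specialist_rate):
    """Fractional gain of the generalist over the specialist success rate."""
    return (generalist_rate - specialist_rate) / specialist_rate


# Hypothetical success rates, chosen only to illustrate the arithmetic.
specialist = 0.40
generalist = 0.60

gain = relative_improvement(generalist, specialist)
print(f"{gain:.0%}")  # prints "50%"
```

Under this reading, a specialist that succeeds 40% of the time and a generalist that succeeds 60% of the time differ by 20 percentage points, which is a 50% relative improvement.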
The RT-2-X model also performed strongly, demonstrating emergent skills that were not included in the training dataset. The results suggest that combining data from various platforms equips the model with additional skills not initially present, enabling better performance on unfamiliar tasks.
Building on these findings, the team hopes to pursue further research, combining insights from other models such as DeepMind's self-improving RoboCat and exploring how generalized models can be improved further. They have open-sourced the dataset and a small version of the RT-1-X model, and believe these tools will significantly speed up robotics research.