Google DeepMind has developed new training methods aimed at improving the performance and versatility of robots. The groundbreaking research, unveiled in a recent blog post, involves large foundational models and techniques that boost robots’ situational awareness and task management. With such advancements, robots are no longer confined to repetitive single-purpose functions but are now stepping into roles that require adapting to new environments and tasks with less human programming.
AutoRT: Synchronized Robots Key to Task Implementation
DeepMind’s newly introduced AutoRT system utilizes Visual Language Models (VLM) and Large Language Models (LLMs) to create robots that better grasp what humans expect from them. VLMs grant the robots increased awareness of their surroundings, while LLMs offer suggestions for tasks, considering the capabilities of the robots, including any attached devices or end effectors. The AutoRT system has proved its efficacy, coordinating up to 20 robots and handling 52 devices simultaneously. Over recent months, DeepMind has conducted 77,000 trials with more than 6,000 distinct tasks, demonstrating the system’s robust performance and potential for scalability.
RT-Trajectory: Visual Learning Elevates Robot Competency
Complementing AutoRT, DeepMind has also introduced RT-Trajectory, which is a unique approach to robotic learning that employs video input. Unlike typical methods that use YouTube videos, RT-Trajectory adds a layer of complexity by overlaying a two-dimensional sketch of the robot arm’s movements onto the video feed. The sketches serve as visual cues that guide the robot’s learning process, furnishing it with practical and low-level hints to refine its control policies.
The approach has notably outperformed its predecessor, RT-2, in trials, doubling the success rate while testing on 41 tasks. At 63% success compared to the previous 29%, RT-Trajectory demonstrates how underutilized robotic-motion information within existing datasets can be effectively harnessed to bolster robots’ adaptability and efficiency in new environments.
The accomplishments of Google DeepMind underline the rapid progression in the field of robotics and artificial intelligence. In October, Google DeepMind and 33 other research institutions have unveiled an ambitious project, known as Open-X Embodiment. The goal of this initiative is to solve the conundrum of having to train machine learning models for each robot, task, and environment separately ― a process that often requires a substantial amount of time and effort.
The research group’s ability to intertwine large foundational models with advanced visual recognition offers a glimpse into a future where robots can comprehend and execute complex commands with nuanced understanding, substantially reducing the dependence on intricate coding. The continuous evolvement of such technologies promises considerable impacts on various industrial and consumer applications, from manufacturing to service robots, paving the way for more intuitive and capable automated systems.
Last Updated on January 2, 2025 1:55 pm CET