
Google DeepMind Uses Gemini AI to Boost Robotic Navigation Abilities

Google is using Gemini 1.5 Pro to improve robot navigation and complex tasks. Robots learn from walkthrough videos and understand user commands in natural language.


Google DeepMind has reached a new milestone in robotic intelligence by incorporating Gemini AI, significantly improving robots’ navigation and performance on complex tasks. The achievement, detailed in a research paper by DeepMind’s robotics team, underscores the importance of Gemini 1.5 Pro’s expansive context window in enabling natural language interactions with RT-2 robots.

Training via Multimodal Instruction Navigation

The system is trained through a method called “Multimodal Instruction Navigation with demonstration Tours (MINT).” This training involves manually guiding the robot through environments such as homes or offices, or using a smartphone to record a walkthrough. The robots learn to “watch” these videos to understand their surroundings and respond appropriately. For instance, when shown a phone, a robot can lead its user to a charging point. The study reports a 90 percent success rate on more than 50 user instructions in a space exceeding 9,000 square feet.

A hierarchical Vision-Language-Action (VLA) navigation policy enables the robots to combine an understanding of physical spaces with common-sense reasoning. The policy helps the AI interpret user commands and navigate accordingly: the AI constructs a topological map by matching visual input from its cameras to frames from the demonstration video. This method achieves end-to-end success rates of 86 and 90 percent on complex navigation tasks.
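The topological-map idea can be sketched in a few lines of code. The following is an illustrative toy, not DeepMind’s actual implementation: it assumes each video frame has already been embedded as a vector by some visual encoder (random stand-in vectors here), treats the tour frames as nodes of a chain-shaped map, localizes the robot by cosine similarity to the most similar tour frame, and then walks the chain toward a goal frame.

```python
# Toy sketch of topological-map navigation over demonstration-tour frames.
# Assumption: a visual encoder embeds frames as vectors; random stand-ins here.
import math
import random

random.seed(0)

def random_unit_vector(dim: int = 32) -> list[float]:
    v = [random.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Frames from the demonstration tour become nodes of the topological map;
# consecutive frames are linked because the tour moved directly between them.
tour_frames = [random_unit_vector() for _ in range(6)]

def localize(camera_embedding: list[float]) -> int:
    """Index of the tour frame most similar to the robot's current view."""
    sims = [cosine(camera_embedding, f) for f in tour_frames]
    return sims.index(max(sims))

def plan_path(start: int, goal: int) -> list[int]:
    """Follow the chain of tour frames from start to goal."""
    step = 1 if goal >= start else -1
    return list(range(start, goal + step, step))

# Current camera view: tour frame 2 plus a little noise.
camera_view = [x + 0.05 * random.gauss(0.0, 1.0) for x in tour_frames[2]]
start = localize(camera_view)
goal = 5  # e.g. the frame a VLM matched to a command like "the conference room"
path = plan_path(start, goal)
```

In the real system the goal node would be chosen by the vision-language model from the user’s multimodal instruction, and the map would be a general graph rather than a simple chain, but the localize-then-plan structure is the same.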

Advanced Task Execution

Gemini 1.5 Pro enhances the robots’ ability to execute more nuanced tasks. For instance, if a user surrounded by Coke cans wants to know if there’s any Coke left in the fridge, the robot can check the fridge and report back. This marks a substantial step forward in robotic planning and task performance.

Despite these advancements, processing each instruction takes between 10 and 30 seconds, indicating potential for further optimization. The Google team aims to refine these capabilities for even better performance in the future. While widespread adoption of these advanced robots in homes is still some time away, the current progress suggests they could soon assist in everyday activities like locating keys or wallets.

Real-World Application and Command Testing

In extensive real-world testing, commands such as “Take me to the conference room with the double doors,” “Where can I borrow some hand sanitizer?,” and “I want to store something out of sight from public view. Where should I go?” were used to evaluate the robots’ practical abilities. These tests demonstrated the robots’ ability to handle intricate reasoning and multimodal user commands effectively.

Last Updated on November 7, 2024 3:37 pm CET

Source: arXiv
Luke Jones
Luke has been writing about Microsoft and the wider tech industry for over 10 years. With a degree in creative and professional writing, Luke looks for the interesting spin when covering AI, Windows, Xbox, and more.
