
Meta and Nvidia Unveil Segment Anything Model 2 Video AI Tool

Meta unveils Segment Anything Model 2, a machine learning model that can identify and track objects in video in near real time.


Meta has unveiled a new machine learning model, Segment Anything Model 2 (SAM 2), aimed at improving video segmentation capabilities. The announcement came from Meta CEO Mark Zuckerberg at SIGGRAPH, where he was joined by Nvidia CEO Jensen Huang.

Enhancements in Video Segmentation

Building on its predecessor, which focused on still images, SAM 2 extends segmentation to video. The original model efficiently identified and outlined objects within a single image; the new iteration aims to replicate that effectiveness across video frames, a considerably more resource-demanding task. Zuckerberg highlighted potential uses in scientific research, such as studying coral reefs and natural environments. The model supports zero-shot segmentation, identifying and tracking objects without prior examples of them.
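In practice, the model is prompted with clicks or boxes on one frame and then propagates the resulting mask through the rest of the clip. The sketch below shows roughly what that workflow looks like with the video predictor API published in Meta's segment-anything-2 repository; the checkpoint and config file names here are assumptions and may differ between releases.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Checkpoint and config names are assumptions based on Meta's
# segment-anything-2 repository; verify them against your install.
predictor = build_sam2_video_predictor(
    "sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt"
)

with torch.inference_mode():
    # Point the predictor at a directory of extracted video frames.
    state = predictor.init_state(video_path="./video_frames")

    # Prompt with a single foreground click (label 1) on frame 0
    # to select the object to track.
    predictor.add_new_points(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Propagate the mask through the remaining frames.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # one binary mask per object
```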

Handling video data is resource-intensive, and SAM 2 aims to meet these demands without straining data centers. Meta plans to make SAM 2 freely available, echoing its policy for the original model, and has already released a public demo. Additionally, Meta is releasing an annotated database of roughly 50,000 videos that was used to train SAM 2. Another database exceeding 100,000 videos was also used in training but will not be publicly shared. Meta has been contacted for further specifics about these sources and the decision to withhold part of the dataset.

Meta’s Open AI Strategy

Meta has cemented its status in the open-source AI space with tools like PyTorch and models such as Llama and Segment Anything. Zuckerberg explained that open-sourcing these models is strategic rather than altruistic: the aim is to build an ecosystem around them that makes them more effective.

The initial Segment Anything Model (SAM) was introduced in April 2023 as a foundational model for image segmentation and received wide acclaim in computer vision circles. SAM 2 was trained on the new SA-V dataset, the largest publicly available dataset for video segmentation. SA-V comprises 50,900 videos with 642,600 masklet annotations (spatio-temporal masks), amounting to 35.5 million individual masks, 53 times more than any existing video segmentation dataset. With nearly 200 hours of annotated video content, SA-V sets a new standard for training data.

Technical Features

SAM 2 employs a Transformer-based architecture with a memory module that retains information about objects and prior user interactions across video frames. This lets the model track objects through extended sequences and respond to corrections. When applied to still images, the memory remains empty and the model behaves much like its predecessor, as the conceptual sketch below illustrates.
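Meta has not released the memory module as a standalone component, so the following is purely a conceptual sketch of how a per-object memory bank might switch the model between image and video behavior. The class names, the fixed-size FIFO eviction, and the capacity are illustrative assumptions, not Meta's implementation.

```python
from collections import deque

class MemoryBank:
    """Illustrative per-object memory: a fixed-size FIFO of recent frame
    features plus user-prompted frames, which are always retained.
    Capacity and eviction policy are assumptions for this sketch."""

    def __init__(self, capacity: int = 6):
        self.recent = deque(maxlen=capacity)  # oldest entry auto-evicted
        self.prompted = []                    # prompted frames never evicted

    def remember(self, features, was_prompted: bool = False):
        (self.prompted if was_prompted else self.recent).append(features)

    def context(self):
        # What the mask decoder would attend to for the next frame.
        return self.prompted + list(self.recent)


def segment(frame_features, memory: MemoryBank | None = None):
    """Toy stand-in for the mask decoder. With an empty or absent memory
    the model degenerates to still-image behavior, as the article notes."""
    context = memory.context() if memory else []
    mode = "video" if context else "image"
    # A real decoder would produce a mask; we just report the mode and
    # how many remembered frames conditioned the prediction.
    return {"mode": mode, "conditioning_frames": len(context)}


# Usage: track one object across a short clip.
bank = MemoryBank(capacity=3)
bank.remember("frame0-features", was_prompted=True)  # user clicked here
for t in range(1, 6):
    result = segment(f"frame{t}-features", bank)
    bank.remember(f"frame{t}-features")
print(result)  # {'mode': 'video', 'conditioning_frames': 4}
```

The split between prompted and recent frames loosely mirrors how the SAM 2 paper describes its memory bank, which keeps prompted frames alongside a queue of recent ones, on the idea that user-corrected frames are more valuable anchors than arbitrary earlier frames.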

In testing, SAM 2 delivered better segmentation accuracy while requiring three times fewer user interactions than previous methods. Meta reports that the model surpasses current benchmarks for video object segmentation and also outperforms the original SAM on image segmentation tasks, running six times faster. Inference runs at about 44 frames per second, closely approaching real-time performance.

SAM 2 has its limitations, however. It can lose track of objects after scene cuts or long occlusions, struggles with very fine details, and has difficulty tracking individual objects within groups of similar, moving entities. The researchers suggest that explicit motion modeling could help address these issues.

Last Updated on November 7, 2024 3:28 pm CET

Source: Meta
Luke Jones
Luke has been writing about Microsoft and the wider tech industry for over 10 years. With a degree in creative and professional writing, Luke looks for the interesting spin when covering AI, Windows, Xbox, and more.
