
Meta’s New ImageBind Open Sources Multisensory AI with Six Data Types

Meta says its new ImageBind research model demonstrates multimodal AI that combines text, visual, audio, depth, movement, and temperature data.


Meta has debuted a new open-source AI model that combines six data streams: text, visual, audio, depth, movement (IMU), and temperature (thermal) data. According to the company, the model, called ImageBind, showcases how future generative AI systems will create multisensory and immersive experiences. This, of course, ties into Meta’s main goal of creating Metaverse technology.

However, Meta points out that ImageBind is just a prototype, and the company has not put a date on any future launch. For now, the project is research only and has no consumer or enterprise application. Even so, it is reasonable to assume the technology Meta is highlighting will become part of future AI models.

At the heart of the research is the ability to link multiple data types into a single multidimensional index, known as a joint embedding space. Current multimodal generative AI models rely on essentially the same idea, such as OpenAI’s GPT-4, which accepts both text and image inputs.
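To make the idea concrete, here is a minimal, purely illustrative sketch of a joint embedding space. The “encoders” here are hand-coded toy vectors, not real model output, and none of the file names or values come from ImageBind itself; in the real system, trained networks map each modality into the shared space so that paired data land close together.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length so dot products act as cosine similarity."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Hand-coded stand-ins for trained per-modality encoders. In a real model,
# a photo of a dog, the text "a dog", and a barking sound would be pushed
# close together in the shared space during training.
text_emb  = {"a dog":     normalize([1.0, 0.1, 0.0]),
             "a train":   normalize([0.0, 0.1, 1.0])}
image_emb = {"dog.jpg":   normalize([0.9, 0.2, 0.1]),
             "train.jpg": normalize([0.1, 0.2, 0.9])}
audio_emb = {"bark.wav":  normalize([0.8, 0.3, 0.1]),
             "horn.wav":  normalize([0.1, 0.3, 0.8])}

# Because every modality lives in one space, comparisons work across them:
print(np.dot(text_emb["a dog"], audio_emb["bark.wav"]))  # high (same concept)
print(np.dot(text_emb["a dog"], audio_emb["horn.wav"]))  # low (different concept)
```

The key design point is that nothing modality-specific survives past the encoder: once everything is a vector in the same space, search and comparison reduce to plain similarity math.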

DALL-E 2 is another example of AI that pairs text and images, generating images from text prompts. Microsoft currently uses OpenAI’s DALL-E in its Bing Image Creator and also leverages GPT-4 alongside its own AI models in Bing Chat.

ImageBind Promises a New Level of Multimodal AI Performance

ImageBind is part of Meta’s initiative to create multimodal systems that learn from diverse types of data. The model does not just understand a single element; it binds that element to other modalities. Meta suggests it could, for example, help generate a video of a dog wearing a Gandalf outfit while balancing on a beach ball from a text description, an image prompt, or an audio prompt. It can also perform cross-modal search, multimodal arithmetic, and cross-modal generation, as sketched below.
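The snippet below sketches what cross-modal search and multimodal arithmetic mean in practice. Everything here is hypothetical: the file names and hand-coded vectors only stand in for real embeddings, and ImageBind’s actual encoders and API are not used.

```python
import numpy as np

def normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Hypothetical embeddings in a shared space (illustrative, not model output).
image_library = {
    "dog_on_beach.jpg": normalize([0.7, 0.1, 0.6]),
    "dog_in_park.jpg":  normalize([0.9, 0.1, 0.1]),
    "empty_beach.jpg":  normalize([0.1, 0.1, 0.9]),
}
audio_emb = {"waves.wav": normalize([0.0, 0.1, 1.0])}
text_emb  = {"a dog":     normalize([1.0, 0.1, 0.0])}

def nearest(query, library):
    """Cross-modal search: return the library item most similar to the query."""
    return max(library, key=lambda k: float(np.dot(query, library[k])))

# Cross-modal search: find the image that best matches a sound.
print(nearest(audio_emb["waves.wav"], image_library))  # empty_beach.jpg

# Multimodal arithmetic: add a text concept to a sound, then search.
combined = normalize(text_emb["a dog"] + audio_emb["waves.wav"])
print(nearest(combined, image_library))  # dog_on_beach.jpg
```

Adding the “dog” text vector to the “waves” audio vector shifts the query toward images that contain both concepts, which is the intuition behind the multimodal arithmetic Meta describes.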

Meta claims that ImageBind achieves new state-of-the-art performance on emergent zero-shot recognition tasks across modalities, outperforming prior specialist models trained specifically for those modalities. In other words, the model can identify objects or concepts it was never explicitly trained to recognize by drawing on its multimodal knowledge.
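A rough sketch of how zero-shot recognition falls out of a shared embedding space: the candidate labels are just text embedded into the same space as the query, so no audio classifier for those labels ever had to be trained. The vectors and names below are made up for illustration and are not ImageBind output.

```python
import numpy as np

def normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Hypothetical text-label embeddings in the shared space. The "classifier"
# is nothing more than nearest-label lookup, which is why new labels can be
# added at inference time without retraining.
label_emb = {
    "dog":   normalize([1.0, 0.1, 0.0]),
    "train": normalize([0.0, 0.1, 1.0]),
    "rain":  normalize([0.1, 1.0, 0.1]),
}
clip_emb = normalize([0.8, 0.3, 0.1])  # embedding of an unlabeled audio clip

def zero_shot_classify(query, labels):
    """Pick the text label whose embedding is closest to the query."""
    scores = {name: float(np.dot(query, vec)) for name, vec in labels.items()}
    return max(scores, key=scores.get), scores

best, scores = zero_shot_classify(clip_emb, label_emb)
print(best)    # "dog"
print(scores)
```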

ImageBind could eventually lead to leaps forward in accessibility and in building mixed reality environments. For instance, a future headset could construct fully realized 3D scenes with sound, movement, and temperature on the fly, and game developers could use it to take much of the legwork out of the design process.

Meta says that ImageBind is the first AI model capable of binding data from six modalities at once. By open-sourcing the model, the company hopes researchers will build on the work, though the release is intended for ongoing research purposes only.

Meta Making AI Push Despite Metaverse Goals

Meta is, of course, the parent company of Facebook, and the rebrand was partly intended to put distance between the tarnished Facebook brand and the company’s other projects. As the name suggests, Meta’s focus is the metaverse, and AI has not been a core driver for the company. Still, there are clear possibilities for AI and the metaverse to interact.

While the company threw everything behind the Metaverse, losing billions of dollars in the process, it now seems Meta may need to pivot towards AI. The Metaverse is not dead (far from it), but it has not become the tech boom sector that AI has.

According to the company’s CEO Mark Zuckerberg, a new AI chat assistant persona is in development that will come to Facebook and Instagram. In March, Meta also showed a new AI project that could allow robots to see by replicating the human visual cortex.

The company’s AI researchers have developed two innovations for robot AI: adaptive skill coordination (ASC) and an artificial visual cortex (VC-1). ASC is a framework that allows robots to learn from videos of humans performing everyday tasks and then adapt their actions to different environments and embodiments.

VC-1 is a perception model that is compatible with a wide range of sensorimotor skills, environments and embodiments.

Tip of the day: File History is a Windows backup feature that saves each version of files in the Documents, Pictures, Videos, Desktop, and offline OneDrive folders. Though its name implies a primary focus on version control, you can actually use it as a fully-fledged backup tool for your important documents.

Source: Facebook
Luke Jones
Luke has been writing about all things tech for more than five years. He is following Microsoft closely to bring you the latest news about Windows, Office, Azure, Skype, HoloLens and all the rest of their products.
