Microsoft Research VinVL Makes Vision-Language Breakthrough

VinVL is a Microsoft Research project that brings leading image encoding to the vision-language (VL) space and will become part of Azure Cognitive Services.


Microsoft's artificial intelligence (AI) and machine learning (ML) research casts a wide net, reflecting the company's interest in the technology. In its latest breakthrough, Microsoft Research is showing a new vision-language (VL) system called VinVL.

VinVL (Visual features in Vision-Language) is an object-attribute detection model that specializes in image encoding.

If you're unfamiliar with VL systems, they are driven by machine learning and provide a way to search images with a text query, or to search text for matching images. These systems can also give natural language descriptions of the content within an image.

VL systems typically combine image encoding and vision language fusion. Microsoft Research says VinVL is an image encoding model that works alongside existing VL fusion modules to produce accurate image/text matching results.
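To make the image/text matching idea concrete, here is a minimal sketch in Python. It is not VinVL's method or any Microsoft API: it assumes images and a text query have already been encoded into small feature vectors (the hand-written numbers and the names `IMAGE_FEATURES`, `QUERY_FEATURES` and `rank_images` are hypothetical), and it swaps the learned VL fusion module for a plain cosine-similarity comparison.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def rank_images(query_vec, image_vecs):
    """Sort image names by similarity to the query, best match first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in image_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, hand-made features. A real system would get these from an
# image encoder and a text encoder; these numbers are for illustration only.
IMAGE_FEATURES = {
    "dog_on_beach.jpg": [0.9, 0.1, 0.8],
    "cat_on_sofa.jpg": [0.1, 0.9, 0.0],
    "empty_beach.jpg": [0.0, 0.0, 1.0],
}
QUERY_FEATURES = [1.0, 0.0, 0.7]  # stands in for "a dog playing on the beach"

ranking = rank_images(QUERY_FEATURES, IMAGE_FEATURES)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors the dog-on-beach image ranks first and the cat image last. The same ranking function could be run in the other direction, scoring candidate captions against a single image vector.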


For example, VinVL topped leaderboards across a range of VL benchmarks, such as Microsoft's own COCO Image Captioning, Novel Object Captioning (nocaps), and Visual Question Answering (VQA). Furthermore, the new model surpasses human performance on the nocaps leaderboard by a large margin.

“VinVL has demonstrated great potential in improving image encoding for VL understanding. Our newly developed image encoding model can benefit a wide range of VL tasks, as illustrated by examples in this paper. Despite the promising results we obtained, such as surpassing human performance on image captioning, our model is by no means reaching the human-level intelligence of VL understanding.

“Interesting directions of future works include: (1) further scale up the object-attribute detection pretraining by leveraging massive image classification/tagging data, and (2) extend the methods of cross-modal VL representation learning to building perception-grounded language models that can ground visual concepts in natural language and vice versa like humans do.”

Microsoft says it will fold VinVL into Azure Cognitive Services. That means it will be available to customers of the platform, which works across services such as LinkedIn and Office 365. Additionally, the project will be open source and available to all developers.

Luke Jones
Luke has been writing about all things tech for more than five years. He is following Microsoft closely to bring you the latest news about Windows, Office, Azure, Skype, HoloLens and all the rest of their products.
