Microsoft Research, along with the University of Washington’s Paul G. Allen School of Computer Science & Engineering and Providence, has introduced BiomedParse, a new AI model designed to enhance how medical images are analyzed.
BiomedParse integrates object recognition, detection, and segmentation, enabling medical professionals to conduct analyses with greater efficiency and fewer manual steps. The innovation comes as the latest step in Microsoft’s expanding AI-driven healthcare initiatives.
Bridging the Gap Between Recognition and Segmentation
In the mid-2000s, researchers proposed a unified approach to image analysis that combined recognition, detection, and segmentation. However, technology constraints left it largely theoretical.
BiomedParse makes this concept practical by allowing users to input a simple natural-language prompt to outline and label objects directly on images. Unlike earlier models such as MedSAM, which are restricted to segmentation alone, BiomedParse handles all tasks in one workflow, simplifying complex procedures for medical practitioners.
MedSAM, short for Medical Segment Anything Model, is a deep learning-based foundation model designed specifically for medical image segmentation. It builds upon the general-purpose Segment Anything Model (SAM) from Meta AI and adapts it to the medical domain, addressing the need for accurate and versatile segmentation across various medical imaging modalities.
BiomedParse can also be integrated into multimodal frameworks such as Microsoft's LLaVA-Med (Large Language and Vision Assistant for Biomedicine), an AI assistant for biomedical vision-and-language tasks, to enable conversational image analysis.
GPT-4 for Data Creation
One of the main challenges in developing comprehensive image analysis tools is the lack of extensive datasets that cover various tasks cohesively. To address this, Microsoft used OpenAI's GPT-4 to synthesize data from 45 existing segmentation datasets.
This resulted in over six million triples of annotated images, segmentation masks, and text descriptions, covering 64 primary object types and 82 subcategories across nine imaging modalities. This massive dataset strengthens BiomedParse's capability to manage diverse medical image analysis scenarios.
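Each training example in such a dataset pairs an image with a mask and a harmonized text description. The record layout below is purely illustrative (the field names are assumptions, not BiomedParse's actual schema), but it shows how image-mask-text triples can be organized and filtered by modality:

```python
from dataclasses import dataclass

@dataclass
class SegmentationExample:
    image_path: str    # source image, e.g. a CT slice (hypothetical path)
    mask_path: str     # binary segmentation mask for the target object
    description: str   # GPT-4-harmonized natural-language label
    modality: str      # one of the imaging modalities, e.g. "CT", "MRI"

# Two toy records standing in for the millions of real triples
examples = [
    SegmentationExample("ct_001.png", "ct_001_mask.png",
                        "neoplastic cells in the liver", "CT"),
    SegmentationExample("mri_002.png", "mri_002_mask.png",
                        "left ventricle of the heart", "MRI"),
]

# Filtering by modality is then a one-liner
ct_only = [e for e in examples if e.modality == "CT"]
```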
Outperforming the Competition
BiomedParse was tested on over 102,000 image-mask-label combinations, consistently outperforming existing models like MedSAM and SAM, even when those models were paired with advanced object detectors such as Grounding DINO.
The model showed an advantage of 75–85 points in Dice score, a standard metric for measuring segmentation accuracy. This strength was especially evident on objects with complex, irregular shapes, showcasing the benefits of integrated learning.
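The Dice score measures the overlap between a predicted mask and the ground-truth mask, ranging from 0 (no overlap) to 1 (perfect overlap), often reported on a 0–100 point scale. A minimal NumPy implementation for binary masks:

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A|+|B|)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    # Convention: two empty masks count as a perfect match
    return 1.0 if denom == 0 else 2.0 * intersection / denom

pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
score = dice_score(pred, target)  # 2*1 / (2+1) = 0.666...
```

Because the numerator counts overlapping pixels twice, Dice rewards precise boundary agreement, which is why it is sensitive to the complex, irregular shapes mentioned above.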
Comparison of BiomedParse on large-scale biomedical image segmentation datasets. (Source: Microsoft)
Background: GigaPath’s Pathology Analysis
Earlier in 2024, Microsoft had already showcased its interest in advancing medical imaging with the release of GigaPath. Launched in May, GigaPath addressed the analysis of gigapixel pathology images, which are essential for studying detailed tissue samples.
Digital pathology involves converting glass slides into digital images, making analysis more scalable. GigaPath used vision transformer (ViT) architecture with dilated self-attention to process large-scale images efficiently and was developed in collaboration with Providence Health System and the University of Washington.
GigaPath’s training included data from over 170,000 whole-slide images and applied a two-stage curriculum: Meta’s self-supervised vision transformer model DINOv2 for tile-level pretraining and Microsoft’s own LongNet for slide-level modeling. This approach allowed GigaPath to excel in 18 out of 26 tasks related to cancer subtyping and pathomics, which focuses on genetic markers of tumors.
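Dilated self-attention cuts the quadratic cost of full attention by letting each token attend only to a sparse subset of positions within its segment, which is what makes gigapixel-scale sequences tractable. The sketch below is a simplified illustration of that idea, not GigaPath's or LongNet's actual implementation:

```python
import numpy as np

def dilated_self_attention(x, segment_len=4, dilation=2):
    """Toy sparse self-attention: tokens attend only to positions in the
    same segment that share their dilation offset (a LongNet-style
    sparsification, greatly simplified for illustration)."""
    n, d = x.shape
    out = np.zeros_like(x, dtype=float)
    for start in range(0, n, segment_len):
        idx = np.arange(start, min(start + segment_len, n))
        for offset in range(dilation):
            sel = idx[(idx - start) % dilation == offset]
            if sel.size == 0:
                continue
            q = k = v = x[sel].astype(float)
            scores = q @ k.T / np.sqrt(d)
            w = np.exp(scores - scores.max(axis=-1, keepdims=True))
            w /= w.sum(axis=-1, keepdims=True)  # softmax over the sparse set
            out[sel] = w @ v
    return out

out = dilated_self_attention(np.ones((8, 4)))
```

Each token here attends to roughly `segment_len / dilation` positions instead of all `n`, which is the source of the efficiency gain; LongNet additionally mixes multiple segment lengths and dilation rates so that long-range dependencies are still covered.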
Implications for Precision Medicine
BiomedParse and GigaPath support precision medicine, where treatment is tailored based on an individual’s genetic profile. GigaPath laid the foundation with its ability to interpret pathology slides for cancer subtyping, while BiomedParse extends this by integrating different image modalities.
Despite their promise, deploying such models in clinical environments presents hurdles like data privacy, model accuracy across varied conditions, and ensuring adherence to regulatory standards.
BiomedParse’s modular architecture hints at future updates that could encompass additional imaging types and deeper integration with tools like LLaVA-Med for interactive analysis of medical images. Microsoft’s open-source release of BiomedParse under the Apache 2.0 license, combined with deployment on Azure AI, puts the model directly in the hands of medical researchers.