Microsoft is preparing a wider release of Copilot Vision, an AI tool that integrates directly into the Edge browser, aiming to change how people interact with the web. Teased initially through Copilot Labs in October, the assistant goes beyond simple chatbot features by understanding both text and images on a user’s screen.
Imagine it guiding you through a complex comparison of travel destinations, making recommendations without opening extra tabs. The company emphasizes that Copilot Vision operates under stringent privacy controls, deleting all session data upon exit to prevent information misuse.
Contextual AI for Effortless Web Assistance
Unlike traditional AI chat models, Copilot Vision provides insights based on what it sees, whether you’re shopping for tech gadgets or planning meals. Need to replace an ingredient while following a recipe? The AI can suggest suitable alternatives. It’s designed to observe, not intrude, activating only with explicit user permission. Furthermore, it abides by content restrictions, steering clear of pages behind paywalls or those marked private by website owners. This “assist and observe” model prioritizes ethical AI use and respects digital property rights.
Launched in October 2024, Copilot Labs serves as Microsoft’s controlled environment for testing new AI tools for Microsoft Copilot. It’s where Copilot Vision and other innovations like Think Deeper are fine-tuned using user feedback. Think Deeper, available to Copilot Pro subscribers in select regions including the US and the UK via a ‘Think Deeper’ button in Copilot, tackles complex queries such as advanced math problems or financial strategy, operating within usage limits to ensure stable performance. By experimenting in this sandbox, Microsoft collects real-world data to refine its AI features before a broader release.
Building on Prior AI Milestones
Microsoft has been investing in vision AI for quite a while now, exemplified by the Florence-2 model announced in June 2024. Florence-2 is a general-purpose vision-language model designed for tasks like object detection and segmentation. It uses a prompt-based approach and holds its own against larger models like Google DeepMind’s Flamingo visual language model. Training involved a diverse set of over 5 billion image-text annotations, spanning multiple languages to enhance its versatility. Researchers claimed the model achieves notable improvements in efficiency and accuracy, benefiting applications from semantic segmentation to visual grounding.
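For readers curious what Florence-2’s prompt-based interface looks like in practice, the sketch below shows a typical invocation of the publicly released checkpoint through Hugging Face Transformers. The model ID, image URL, and task prompt follow the public model card and are illustrative assumptions, not details from Microsoft’s announcement.

```python
# Minimal sketch of Florence-2's prompt-based inference via Hugging Face Transformers.
# Assumes the publicly released "microsoft/Florence-2-large" checkpoint; the image URL is a placeholder.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open(requests.get("https://example.com/street.jpg", stream=True).raw)

# Tasks are selected with special prompt tokens, e.g. "<OD>" for object detection.
task_prompt = "<OD>"
inputs = processor(text=task_prompt, images=image, return_tensors="pt")

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
raw_output = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

# The processor parses the raw text output into structured boxes and labels.
result = processor.post_process_generation(
    raw_output, task=task_prompt, image_size=(image.width, image.height)
)
print(result)  # e.g. {"<OD>": {"bboxes": [...], "labels": [...]}}
```

Switching tasks is a matter of changing the prompt token, which is the “prompt-based approach” the announcement describes.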
Another important milestone for Microsoft was the release of the GigaPath AI vision model in May 2024, focused on digital pathology. Developed in partnership with the University of Washington and Providence Health System, the model uses advanced self-supervised learning to analyze gigapixel pathology slides. GigaPath has already shown superior results in tasks like cancer subtyping and tumor analysis, validated on data from initiatives like The Cancer Genome Atlas. It marks a critical step for precision medicine, enabling better analysis of diseases based on their genetic characteristics.
AI Struggles: New Studies Raise Red Flags
Not all AI models live up to expectations. A recent study from October exposed serious flaws in vision-language models like OpenAI’s GPT-4o, which performed poorly on Bongard problems—visual puzzles requiring the identification of basic patterns. In tests, GPT-4o solved just 21% of open-ended questions and showed only modest improvement in multiple-choice formats. Researchers highlighted that these shortcomings reveal deeper issues with current models’ ability to generalize and apply visual reasoning.
Similar struggles have been seen in AI transcription. OpenAI’s Whisper, for instance, has been criticized for “hallucinating” phrases, a major issue in settings like healthcare. A June study from Cornell University reported that Whisper hallucinated content in over 1% of transcriptions, a serious risk in environments where transcription errors could be harmful. Privacy concerns compound the problem: some Whisper-based transcription tools delete the original audio files after processing, leaving no way to verify accuracy.
Competing in a Crowded AI Market
Microsoft’s advancements come as tech giants like Google, Meta, and OpenAI continue to refine their AI models. Yet, with features like Copilot Vision, Microsoft hopes to gain an edge by prioritizing privacy and real-time functionality. The competition is intense, and each player is pushing boundaries in unique ways.