AI agents are moving beyond chat to carrying out tasks on a computer, and Microsoft has now taken another step in that direction. The company this week began previewing a “computer use” function within its Copilot Studio low-code platform, designed to let businesses build AI assistants that can navigate and operate both websites and traditional desktop applications. These agents work by simulating human actions (clicking buttons, typing into fields, selecting menus), with the aim of automating tasks even on systems that lack modern programming interfaces for direct integration. Unlike the more limited ‘Actions’ feature in the consumer version of Copilot, this Copilot Studio capability targets broader enterprise automation scenarios.
Navigating the Competitive Landscape
Microsoft isn’t the first to give AI agents control over computer interfaces. Anthropic made waves in October 2024 by introducing a feature with the exact same name, “Computer Use,” for its Claude 3.5 Sonnet model, allowing it to manage desktop tasks.
OpenAI followed with its Operator agent in January 2025, though it operates with more direct user oversight, requiring approval before executing tasks. Google is also known to be developing similar capabilities under the name Project Mariner. Microsoft’s entry, housed within its Copilot Studio tool (which integrates with the Power Platform), runs on Microsoft’s cloud infrastructure and targets both web (supporting Edge, Chrome, and Firefox browsers, according to the official blog) and desktop environments, potentially offering a broader automation scope than Operator.
An AI Approach to Interface Automation
The core problem Microsoft aims to address is automating interactions with software that wasn’t built for easy machine control. “If a person can use the app, the agent can too,” stated Charles Lamanna, Microsoft’s Corporate Vice President for Business & Industry Copilot, in the company’s announcement. This allows for automating cumbersome processes like populating data entry forms, aggregating information for market research online, or handling digital invoices without manual intervention.
Microsoft is positioning this capability as an advancement over traditional Robotic Process Automation (RPA), suggesting the AI’s reasoning abilities make it less prone to breaking when application layouts change—a common frustration with script-based RPA.
According to Microsoft, “It adjusts in real time using built-in reasoning to fix issues on its own, so work continues without interruption.” Building these automations involves describing the desired task in natural language, and developers get real-time video feedback showing the agent’s planned steps for easier refinement.
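The brittleness argument above can be made concrete with a toy model. The sketch below is not Copilot Studio code and does not reflect how Microsoft's agent reasons; the `Element` class, the identifiers, and both lookup functions are hypothetical, and a plain label match stands in for the far richer reasoning the article describes. It only shows why a script tied to a fixed identifier fails after a redesign, while a lookup based on what the user actually sees can still succeed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    """Toy stand-in for an on-screen control (hypothetical, not a real API)."""
    element_id: str
    label: str


def find_by_id(screen: List[Element], element_id: str) -> Optional[Element]:
    """Script-style lookup: succeeds only if the exact identifier still exists."""
    return next((e for e in screen if e.element_id == element_id), None)


def find_by_label(screen: List[Element], wanted: str) -> Optional[Element]:
    """Description-style lookup: match on the visible label, ignoring case."""
    wanted = wanted.strip().lower()
    return next((e for e in screen if wanted in e.label.lower()), None)


# The same form before and after a redesign that renamed internal identifiers
# and reordered the buttons. The visible labels are unchanged.
old_layout = [
    Element("btn_submit_01", "Submit invoice"),
    Element("btn_cancel_01", "Cancel"),
]
new_layout = [
    Element("action-cancel", "Cancel"),
    Element("action-primary", "Submit invoice"),
]

script_hit_old = find_by_id(old_layout, "btn_submit_01")    # found
script_hit_new = find_by_id(new_layout, "btn_submit_01")    # None: the id changed
label_hit_new = find_by_label(new_layout, "submit invoice")  # still found
```

In this simplified picture, the identifier-based script needs manual repair after the redesign, whereas the label-based lookup keeps working. A real agent would also have to cope with renamed labels, moved dialogs, and ambiguous matches, which a substring match cannot handle.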
Strategy, Security, and Availability
This new function is part of a wider push by Microsoft into agentic AI. The company recently detailed other specialized agents for Microsoft 365 (‘Researcher’ and ‘Analyst’) and cybersecurity, and unveiled its Magma AI multimodal foundation model in February 2025, designed for complex interaction tasks involving vision and action. The computer use feature draws on this background, which in theory allows it to understand and interact with graphical user interfaces (GUIs) more intelligently.
Microsoft assures enterprise customers that the process runs within the Azure cloud environment, data is not used for training the core AI, and administrators have oversight. The official blog notes that “Makers can view a history of computer use activity at will, including captured screenshots and reasoning steps.” Nonetheless, giving AI the keys to operate software interfaces inherently brings security considerations into focus.
Security researchers have previously demonstrated potential risks, showing how similar AI agent tools could theoretically be exploited for malicious purposes like sophisticated phishing attacks if not carefully secured. Striking the right balance between functionality and safety will be key.
The “computer use” feature is currently available as an early access research preview. Interested parties need a preview environment located in the US to apply via Microsoft’s sign-up form. Microsoft indicates more information will be forthcoming at its Build developer conference in May 2025.