- Launch: Alibaba launched Qwen3.7-Plus as a multimodal model for screen and coding automation.
- Screen Actions: The model may read interfaces, choose actions, execute steps, and check results across app and cloud tasks.
- Evidence Caveat: Pricing, demo, and benchmark figures remain attributed claims because no dedicated public Qwen3.7-Plus specification carries them.
- Competition: Anthropic, OpenAI, and Microsoft Research already have computer-use or browser-agent efforts, raising the comparison bar.
Alibaba’s Tongyi Qianwen team has launched Qwen3.7-Plus as an iterative upgrade to the Qwen3.7 multimodal model family. Alibaba pitches the model for screen and coding automation: reading screens, choosing actions such as clicking or typing, writing code, and operating tools rather than only answering prompts.
Alibaba’s Qwen model family already covers large language and multimodal model work.
How Qwen3.7-Plus Acts on Screens
Qwen3.7-Plus adds native vision input, screenshot perception, browser automation, app operation, and screen navigation to the Qwen3.7 lineup. Alibaba’s model team described it as a “multimodal interactive hybrid agent”.
Computer-use workflows fail when a model understands a request but cannot locate the right button, field, terminal command, or application state. Qwen3.7-Plus is pitched as closing that gap by pairing visual perception with agent capabilities.
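The read, choose, execute, and check cycle described here can be sketched in a few lines of Python. This is a minimal illustration, not Alibaba's implementation: `FakeScreen`, `choose_action`, and `run_agent` are hypothetical names, the "screen" is an in-memory dictionary rather than a screenshot, and the policy is a hard-coded rule standing in for the model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Hypothetical in-memory stand-in for a real UI; not a Qwen API."""
    fields: dict = field(default_factory=lambda: {"name": ""})
    submitted: bool = False

    def observe(self) -> dict:
        # A real agent would receive a screenshot here instead of a dict.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: tuple) -> None:
        kind, target, value = action
        if kind == "type":
            self.fields[target] = value
        elif kind == "click" and target == "submit":
            self.submitted = True


def choose_action(state: dict, goal_name: str):
    """Toy rule-based policy standing in for the model's decision step."""
    if state["submitted"]:
        return None  # goal reached, nothing left to do
    if state["fields"]["name"] != goal_name:
        return ("type", "name", goal_name)
    return ("click", "submit", None)


def run_agent(screen: FakeScreen, goal_name: str, max_steps: int = 10):
    """Observe, choose, execute, then re-observe to check the result."""
    history = []
    for _ in range(max_steps):
        state = screen.observe()                   # read the interface
        action = choose_action(state, goal_name)   # choose an action
        if action is None:                         # check: goal satisfied
            break
        screen.apply(action)                       # execute the step
        history.append(action)
    return history, screen.observe()


if __name__ == "__main__":
    actions, final_state = run_agent(FakeScreen(), "Ada")
    print(actions)
    print(final_state)
```

The `max_steps` cap is a design choice in this sketch: it bounds the loop so that a policy that never reaches the goal stops instead of acting indefinitely.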
Concrete demos give that task loop shape. According to the Qwen team, a hybrid agent run on Qwen3.7-Plus generated more than 10,000 lines of code across more than 1,000 agent calls during an eleven-hour vocabulary-app build. That scale matters for evaluation because long-running agents need recovery behavior, not just a single correct answer.
The model is also said to have recreated the native macOS Stocks app: it parsed the interface, generated SwiftUI code, connected an application programming interface, compiled the result, and ran ten functional tests. Separately, Qwen for Chrome can enter agent mode, with user permission, for a cloud task such as selecting a low-cost virtual server instance.
Compatibility covers the Anthropic API protocol, with support for Anthropic’s Claude Code developer tool, the OpenClaw agent gateway, and Alibaba’s Qwen Code. Listed pricing puts Qwen3.7-Plus at $0.40 per million input tokens and $2.40 per million output tokens, below the cited figures of $2.50 and $7.50 for Qwen3.7-Max, its language-only counterpart.
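To see what those per-token prices mean for a workload, the arithmetic can be sketched as below. The prices are the attributed figures quoted above, not confirmed list prices; the model keys, the `estimate_cost` helper, and the example token counts are illustrative assumptions.

```python
# Attributed prices in USD per million tokens: (input, output).
# These mirror the figures quoted in the article and are unverified.
PRICES_PER_MILLION = {
    "qwen3.7-plus": (0.40, 2.40),
    "qwen3.7-max": (2.50, 7.50),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one workload."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: one million tokens in, one million tokens out.
    for name in PRICES_PER_MILLION:
        cost = estimate_cost(name, 1_000_000, 1_000_000)
        print(f"{name}: ${cost:.2f}")
```

For that hypothetical workload the quoted prices work out to about $2.80 on Qwen3.7-Plus against about $10.00 on Qwen3.7-Max. Real agent runs tend to be input-heavy because screenshots and tool output are fed back in, so the ratio for a given task will differ.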
Benchmark figures place Qwen3.7-Plus at 79.0 on ScreenSpot Pro, a screen-grounding benchmark, and 70.3 on Terminal-Bench, a terminal-task benchmark. Developers get comparison points, but long action chains can still compound small mistakes in live work.
Alibaba’s Qwen Roadmap Meets Computer-Use Rivals
Alibaba is joining a computer-use race that has been building since 2024. Anthropic first introduced computer use for Claude in October 2024, giving Claude 3.5 Sonnet the ability to view screens, move a cursor, click buttons, and type through tools.
OpenAI followed in 2025 with Operator for browser actions, while Microsoft Research introduced Fara1.5 models in May 2026 as browser computer-use agents in 4B, 9B, and 27B sizes. Qwen3.7-Plus enters that field with a broader claim around app, terminal, coding, and cloud-console work.
Alibaba’s Qwen3-Coder release pushed agentic coding workflows in 2025, while its Qwen3-VL release built out vision-language groundwork. Earlier in 2026, the positioning of Qwen’s agentic models also centered on agentic capability and cost cuts.
Qwen3.7-Plus brings those coding and vision strands closer together by putting screen state, coding, browser control, and cloud-console operation inside one proprietary model pitch. Enterprise adoption depends on whether Alibaba can turn the adjacent capabilities into one managed workflow product.
What Still Needs Proof
Reliability will decide whether developers treat Qwen3.7-Plus as an automation tool or a demo system. Long agent workflows compound small mistakes when a model must click, type, compile, test, and recover from errors over dozens or hundreds of steps.
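A rough way to see how small mistakes compound is to treat each step as succeeding independently with the same probability, so a chain of n steps succeeds with probability p to the power n. That independence assumption is a simplification introduced here: it ignores recovery behavior and correlated failures. The 99% figure in the example is illustrative, not a measured Qwen3.7-Plus result.

```python
def chain_success_probability(per_step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps.

    This simplified model ignores retries and error recovery, which a
    real agent may use to do better than the figure returned here.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


if __name__ == "__main__":
    # Illustrative per-step reliability of 99% over growing chain lengths.
    for n in (10, 100, 1000):
        print(n, round(chain_success_probability(0.99, n), 4))
```

Under this model, a step that works 99% of the time gives roughly a 37% chance of finishing a 100-step chain without any error, which is why recovery behavior matters as much as single-step accuracy.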
A Vision Arena ranking for Qwen3.7-Plus-Preview adds another comparison point, but customer evidence on real work would carry more weight. Benchmarks and staged demos can show direction; production users need evidence that the model handles interface changes, permissions, failures, and audit requirements.


