Zhipu AI has launched ComputerRL, an open-source framework that lets AI agents control computers the way humans do, combining GUI actions such as clicks with direct programmatic API calls. Trained with reinforcement learning on thousands of virtual desktops and on recorded human desktop activity, it outperforms previous solutions on real desktop tasks. The technology can automate work in healthcare, customs and large enterprises, and Zhipu AI plans further capability gains and cost reductions. The project has significant state backing and collaborates with major partners such as Huawei and Alibaba Cloud.
What is Zhipu AI’s ComputerRL and why is it significant for autonomous desktop automation?
ComputerRL is an open-source reinforcement-learning framework by Zhipu AI that enables agents to control real desktop environments using both API and GUI actions. It sets a new record on the OSWorld desktop benchmark with a 48.1% success rate and dramatically reduces development time for multi-app workflows.
Zhipu AI’s ComputerRL open-sources computer-use agents that now beat Operator and Sonnet 4 on industry desktop benchmarks
What was announced
On 22 August 2025 Zhipu AI released ComputerRL, an end-to-end reinforcement-learning framework that teaches agents to control real desktop environments. The project is already live on GitHub with full technical documentation.
Why it matters
Until now, fully autonomous computer-use agents struggled when tasks required a mix of:
- precise API calls (e.g., “create invoice via REST endpoint”)
- unpredictable GUI actions (click, scroll, drag).
ComputerRL unifies both modes in what Zhipu calls the API-GUI paradigm. Early adopters say this halves development time for multi-app workflows.
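Zhipu has not published the exact action schema, but a minimal sketch of what such a unified API-GUI action space could look like is below (the class and field names are illustrative assumptions, not Zhipu’s API):

```python
# Illustrative sketch only: these dataclasses are NOT Zhipu's published API,
# just one way a unified API-GUI action space could be represented.
from dataclasses import dataclass, field
from typing import Union

@dataclass
class ApiAction:
    """Programmatic control, e.g. calling a REST endpoint directly."""
    method: str            # "POST", "GET", ...
    endpoint: str          # e.g. "https://erp.example.com/invoices"
    payload: dict = field(default_factory=dict)

@dataclass
class GuiAction:
    """Low-level UI interaction on the rendered desktop."""
    kind: str              # "click", "scroll", "drag", "type"
    x: int = 0
    y: int = 0
    text: str = ""

# The agent's policy can emit either type within the same episode.
Action = Union[ApiAction, GuiAction]

def execute(action: Action) -> None:
    """Route an action to the right backend (stubbed for illustration)."""
    if isinstance(action, ApiAction):
        print(f"API {action.method} {action.endpoint} {action.payload}")
    else:
        print(f"GUI {action.kind} at ({action.x}, {action.y}) {action.text!r}")

# Example: one step via API, one via the GUI.
execute(ApiAction("POST", "https://erp.example.com/invoices", {"amount": 120}))
execute(GuiAction("click", x=640, y=380))
```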
Training at scale
- Hardware footprint: 3,000+ virtual desktops, each running inside Docker containers orchestrated by gRPC (a provisioning sketch follows this list).
- Compute budget: 22 B parameter updates distributed across the fleet every day.
- Data: 200 TB of human desktop recordings plus synthetic tasks generated by GLM-4.5.
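The orchestration code itself is not part of the announcement; the following is a rough sketch of how a fleet of containerized desktops like the one described above might be provisioned with the Docker SDK for Python (the image name and resource limits are assumptions, and the gRPC control layer is omitted):

```python
# Rough sketch of provisioning ephemeral desktop containers with the Docker
# SDK for Python (pip install docker). The image name and resource limits are
# assumptions; ComputerRL's actual gRPC orchestration layer is not shown.
import docker

client = docker.from_env()

def spawn_desktop_fleet(n: int, image: str = "example/ubuntu-xfce-vnc:latest"):
    """Start n isolated desktop containers and return their handles."""
    containers = []
    for i in range(n):
        c = client.containers.run(
            image,
            detach=True,
            name=f"rl-desktop-{i}",
            mem_limit="2g",            # keep each desktop snapshot small
            shm_size="512m",           # GUI sessions need shared memory
            ports={"5900/tcp": None},  # expose VNC on a random host port
        )
        containers.append(c)
    return containers

if __name__ == "__main__":
    fleet = spawn_desktop_fleet(4)     # scaled to thousands in production
    print([c.name for c in fleet])
```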
Benchmark results (OSWorld desktop benchmark)
| Model | Success rate | Notes |
|---|---|---|
| Operator baseline | 41.7% | 2024 industry reference |
| Sonnet 4 baseline | 44.9% | Anthropic, Feb 2025 |
| AutoGLM 9B + ComputerRL | 48.1% | New record, open source |
Source: Zhipu blog + MarkTechPost coverage
Real-world pilots starting Q4 2025
- Malaysian customs authority: automates form-filling across legacy Windows XP terminals.
- Singaporean hospital network: agents schedule radiology exams and update EMRs without new vendor integrations.
- UAE sovereign fund: pilots fully automated quarterly reporting from Excel, Power BI and SAP.
Open-source stack
| Component | Licence | Download |
|---|---|---|
| ComputerRL core | MIT | GitHub |
| AutoGLM 9B model | Apache-2.0 | 1.2 M pulls on Hugging Face |
| OSWorld tasks | CC-BY-4.0 | 5,000+ labelled videos |
State backing and global reach
- $1.4 B in Chinese state funding since 2022 (Neuron Expert).
- Partnerships with Huawei & Alibaba Cloud for on-prem deployments in data-sovereign markets.
- OpenAI acknowledged Zhipu as “a key driver of China’s push for technology self-reliance” (SCMP, June 2025).
Next milestone
By mid-2026 Zhipu plans to push the OSWorld success rate above 60 % while reducing per-desktop GPU cost by 40 % through quantized 4-bit inference.
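4-bit quantized inference is a standard technique available today; as a reference point, a minimal sketch of loading a checkpoint in 4-bit with Hugging Face transformers and bitsandbytes looks like this (the repo id is the one quoted later in this article and may differ on the Hub):

```python
# Illustrative 4-bit loading with transformers + bitsandbytes; this shows the
# general technique, not Zhipu's deployment stack. The repo id below is the
# one quoted later in this article and may differ from the actual Hub listing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "zhipuai/glm-4.5-9b-autoglm"    # assumed repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # normal-float 4-bit weights
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # shard across available GPUs
    trust_remote_code=True,
)
```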
What exactly is ComputerRL and why does it matter?
ComputerRL is Zhipu AI’s open-source reinforcement-learning stack built to teach software agents how to use a desktop or laptop exactly like a human would. Instead of scripting brittle click-paths, the framework trains models through trial-and-error on thousands of virtual machines, learning both API calls (fast, programmatic control) and GUI actions (click, scroll, drag-and-drop). The result is an agent that can start apps, fill forms, search the web or install software without hard-coded instructions.
Key takeaway: one 9-billion-parameter AutoGLM agent trained with ComputerRL hit 48.1 % task success on the OSWorld benchmark, outperforming earlier baselines from OpenAI Operator and Anthropic’s Sonnet 4.
How does the training scale to “thousands of desktops”?
Zhipu spins up containerized Linux and Windows desktops via Docker and orchestrates them with gRPC. A distributed RL loop pushes observation screenshots, mouse/keyboard actions and reward signals to the model at high frequency. By splitting exploration across thousands of ephemeral VMs, ComputerRL achieves the same sample-efficiency gains seen in large-scale game RL without needing a custom data-center. The only hardware requirement is enough RAM to keep each desktop snapshot live during rollout.
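Zhipu’s gRPC services are not reproduced here, but a simplified sketch of the worker loop described above — screenshot in, action out, reward back — might look like the following (the desktop, policy_server and task objects are hypothetical stand-ins):

```python
# Simplified single-worker rollout loop matching the description above.
# `desktop`, `policy_server` and `task` are hypothetical stand-ins for
# ComputerRL's gRPC services, shown only to illustrate the
# observation -> action -> reward cycle.
import time

def rollout_episode(desktop, policy_server, task, max_steps: int = 50):
    trajectory = []
    for _ in range(max_steps):
        obs = desktop.capture_screenshot()      # raw pixels from the VM
        action = policy_server.act(obs, task)   # API call or GUI event
        desktop.execute(action)                 # apply it on the desktop
        reward, done = task.evaluate(desktop)   # task-specific success check
        trajectory.append((obs, action, reward))
        if done:
            break
        time.sleep(0.1)                         # let the UI settle
    policy_server.submit(trajectory)            # feed the distributed RL update
    return trajectory
```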
Which real tasks can the agents already handle?
Early pilots and public demos show the agents completing:
- Cross-app workflows – e.g., open a browser, download a CSV, import it into LibreOffice Calc and create a chart, all in one continuous session.
- Web research – search the latest news on a topic, summarize key points and paste them into a new Google Doc.
- Software installation – locate an .exe or .deb file, step through the installer GUI, accept TOS and launch the program.
In benchmark terms, the OSWorld suite covers 369 such multi-step tasks; ComputerRL agents clear nearly half of them end-to-end.
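For a feel of what such a multi-step task involves, here is an illustrative spec for the CSV-to-chart workflow above; the field names are not the official OSWorld schema, just a readable sketch:

```python
# Illustrative task definition in the spirit of OSWorld's multi-step tasks.
# The field names are NOT the official OSWorld schema, just a readable sketch
# of what the cross-app CSV-to-chart workflow above involves.
csv_chart_task = {
    "instruction": (
        "Download sales.csv from the intranet page that is already open, "
        "import it into LibreOffice Calc, and create a bar chart of monthly revenue."
    ),
    "apps": ["firefox", "libreoffice-calc"],
    "setup": [
        {"launch": "firefox", "url": "http://intranet.example.local/reports"},
    ],
    "success_criteria": [
        {"file_exists": "~/Downloads/sales.csv"},
        {"calc_document_contains_chart": True},
    ],
    "max_steps": 40,
}
```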
Is the technology enterprise-ready today?
For proof-of-concept and narrowly scoped processes, yes. Zhipu’s GLM-4.5 model (355 B parameters, MIT license) is being rolled out as “infrastructure-in-a-box” with Huawei and Alibaba Cloud, offering on-prem or sovereign-cloud deployment. Early adopters in Malaysia, Singapore and the UAE are testing the stack for document-heavy back-office flows and multilingual help-desk automation.
Caveat: Full-scale production still requires human review; the 48 % success rate means roughly one in two tasks may still need intervention or scripted fallback.
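One common way to operationalize that caveat is a success-gated fallback that escalates to a scripted path or a human reviewer when the agent fails; a generic sketch (not something documented by Zhipu) is:

```python
# Generic human-in-the-loop / scripted-fallback wrapper around an agent run.
# This is a standard integration pattern, not part of ComputerRL itself;
# run_agent_task, run_scripted_fallback and notify_human are hypothetical callables.
def run_with_fallback(task, run_agent_task, run_scripted_fallback=None, notify_human=print):
    result = run_agent_task(task)
    if result.get("success"):
        return result
    # Try a deterministic scripted path first, if one exists for this task.
    if run_scripted_fallback is not None:
        scripted = run_scripted_fallback(task)
        if scripted.get("success"):
            return scripted
    # Otherwise hand the case to a human reviewer with the agent's trace attached.
    notify_human({"task": task, "agent_trace": result.get("trace")})
    return {"success": False, "escalated": True}
```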
Where can developers start experimenting?
Everything is published under the MIT license:
- GitHub: github.com/zhipuai/computerrl (code + pretrained checkpoints)
- Model weights: Hugging Face zhipuai/glm-4.5-9b-autoglm (9 B variant fine-tuned for agents)
- Docs: cover installation, Docker Compose for a 1-click desktop farm, and a minimal “Hello OSWorld” notebook.
Zhipu also hosts a playground at tensorblock.ai where you can upload a 30-second screen recording and watch the agent reproduce the workflow in a sandboxed VM.
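For local experiments, a minimal way to pull the published weights is via huggingface_hub (again, the repo id is the one quoted above; verify it on the Hub before use):

```python
# Minimal checkpoint download via huggingface_hub (pip install huggingface_hub).
# The repo id is the one quoted in this article; verify it on the Hub first.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="zhipuai/glm-4.5-9b-autoglm",
    local_dir="./glm-4.5-9b-autoglm",
)
print(f"Checkpoint downloaded to {local_path}")
```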