Content.Fans

    ComputerRL: Zhipu AI’s Open-Source Agents Surpass Industry Benchmarks for Autonomous Desktop Automation

    by Serge
    August 22, 2025
    in AI News & Trends

    Zhipu AI launched ComputerRL, an open-source tool that lets smart agents control computers just like humans, using both clicks and direct code actions. It’s faster and outperforms previous solutions on real desktop tasks, learning from recorded actions. This technology can automate jobs in healthcare, customs, and large enterprises, with Zhipu AI planning further enhancements and cost reductions. The project has significant backing and is collaborating with major partners like Huawei and Alibaba.

    What is Zhipu AI’s ComputerRL and why is it significant for autonomous desktop automation?

    ComputerRL is an open-source reinforcement learning framework by Zhipu AI that enables agents to control real desktop environments using both API and GUI actions. It surpasses industry benchmarks, achieves a 48.1% OSWorld desktop success rate, and dramatically reduces multi-app workflow development time.

    • Zhipu AI’s ComputerRL open-sources computer-use agents that now beat Operator and Sonnet 4 on the OSWorld desktop benchmark

    • What was announced
      On 22 August 2025 Zhipu AI released ComputerRL, an end-to-end reinforcement-learning framework that teaches agents to control real desktop environments. The project is already live on GitHub with full technical documentation.

    • Why it matters
      Until now, fully autonomous computer-use agents struggled when tasks required a mix of:
      • precise API calls (e.g., “create invoice via REST endpoint”)
      • unpredictable GUI actions (click, scroll, drag).

    ComputerRL unifies both modes in what Zhipu calls the API-GUI paradigm. Early adopters say this halves development time for multi-app workflows.
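
    To make the idea of a mixed action space concrete, here is a minimal sketch of how a single workflow could interleave API calls and GUI actions. The class names, fields, and the `/invoices` endpoint are hypothetical illustrations, not ComputerRL's actual interface.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class ApiAction:
    """A programmatic call, e.g. a REST request (hypothetical shape)."""
    endpoint: str
    payload: dict = field(default_factory=dict)


@dataclass(frozen=True)
class GuiAction:
    """A pointer action on screen coordinates (hypothetical shape)."""
    kind: str  # "click", "scroll", or "drag"
    x: int
    y: int


# One action type covering both modes, so a policy can emit either.
Action = Union[ApiAction, GuiAction]


def describe(action: Action) -> str:
    """Return a human-readable summary of an action."""
    if isinstance(action, ApiAction):
        return f"API call to {action.endpoint}"
    return f"GUI {action.kind} at ({action.x}, {action.y})"


# A two-step workflow that mixes both modes.
workflow = [
    ApiAction("/invoices", {"amount": 120}),
    GuiAction("click", 340, 210),
]
steps = [describe(a) for a in workflow]
print(steps)
```

    Modelling both modes as one union type means the agent's policy chooses per step between a fast programmatic call and a GUI fallback, which is the gist of what the article describes.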

    • Training at scale
      • Hardware footprint: 3,000+ virtual desktops, each running inside Docker containers orchestrated by gRPC.
      • Compute budget: 22 B parameter updates distributed across the fleet every day.
      • Data: 200 TB of human desktop recordings plus synthetic tasks generated by GLM-4.5.

    • Benchmark results (OSWorld desktop benchmark)

    Model | Success rate | Notes
    Operator baseline | 41.7 % | 2024 industry reference
    Sonnet 4 baseline | 44.9 % | Anthropic, Feb 2025
    AutoGLM 9B + ComputerRL | 48.1 % | New record, open source

    Source: Zhipu blog + MarkTechPost coverage

    • Real-world pilots starting Q4 2025

    • Malaysian customs authority: automates form-filling across legacy Windows XP terminals.
    • Singaporean hospital network: agents schedule radiology exams and update EMRs without new vendor integrations.
    • UAE sovereign fund: pilots fully automated quarterly reporting from Excel, Power BI and SAP.

    • Open-source stack

    Component | Licence | Download
    ComputerRL core | MIT | GitHub
    AutoGLM 9B model | Apache-2.0 | 1.2 M pulls on Hugging Face
    OSWorld tasks | CC-BY-4.0 | 5,000+ labelled videos

    • State backing and global reach
    • $1.4 B in Chinese state funding since 2022 (Neuron Expert).
    • Partnerships with Huawei & Alibaba Cloud for on-prem deployments in data-sovereign markets.
    • OpenAI acknowledged Zhipu as “a key driver of China’s push for technology self-reliance” (SCMP, June 2025).

    • Next milestone
      By mid-2026 Zhipu plans to push the OSWorld success rate above 60 % while reducing per-desktop GPU cost by 40 % through quantized 4-bit inference.
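
    As a rough illustration of why 4-bit inference lowers per-desktop hardware needs, the sketch below estimates weight memory for a 9-billion-parameter model at 16-bit and 4-bit precision. It counts weights only (no activations, KV cache, or quantization metadata), and memory footprint is only a proxy for GPU cost; the 40 % figure is Zhipu's stated target, not something this arithmetic derives.

```python
PARAMS = 9_000_000_000  # 9 B parameters, as in the AutoGLM 9B model


def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Memory needed to store the weights alone, in decimal gigabytes."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


fp16_gb = weight_memory_gb(PARAMS, 16)  # 16-bit weights
int4_gb = weight_memory_gb(PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb} GB")
print(f"4-bit weights:  {int4_gb} GB")
print(f"reduction:      {fp16_gb / int4_gb:.0f}x")
```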


    What exactly is ComputerRL and why does it matter?

    ComputerRL is Zhipu AI’s open-source reinforcement-learning stack built to teach software agents how to use a desktop or laptop exactly like a human would. Instead of scripting brittle click-paths, the framework trains models through trial-and-error on thousands of virtual machines, learning both API calls (fast, programmatic control) and GUI actions (click, scroll, drag-and-drop). The result is an agent that can start apps, fill forms, search the web or install software without hard-coded instructions.

    Key takeaway: one 9-billion-parameter AutoGLM agent trained with ComputerRL hit 48.1 % task success on the OSWorld benchmark, outperforming earlier baselines from OpenAI Operator and Anthropic’s Sonnet 4.


    How does the training scale to “thousands of desktops”?

    Zhipu spins up containerized Linux and Windows desktops via Docker and orchestrates them with gRPC. A distributed RL loop pushes observation screenshots, mouse/keyboard actions and reward signals to the model at high frequency. By splitting exploration across thousands of ephemeral VMs, ComputerRL achieves the same sample-efficiency gains seen in large-scale game RL without needing a custom data-center. The only hardware requirement is enough RAM to keep each desktop snapshot live during rollout.
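
    The loop described above can be sketched in a few lines. This is a toy stand-in, not ComputerRL's code: `FakeDesktop`, `policy`, and `rollout` are hypothetical names, the observation is a small dict rather than a screenshot, the policy is random, and a thread pool takes the place of the Docker/gRPC fleet.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class FakeDesktop:
    """Stand-in for one containerized desktop (hypothetical, for illustration)."""

    def __init__(self, target: int = 3):
        self.target = target  # number of useful steps needed to finish the task
        self.progress = 0

    def observe(self) -> dict:
        # A real environment would return a screenshot here.
        return {"progress": self.progress}

    def step(self, action: str):
        if action == "advance":
            self.progress += 1
        done = self.progress >= self.target
        reward = 1.0 if done else 0.0  # sparse reward on task completion
        return self.observe(), reward, done


def policy(obs: dict, rng: random.Random) -> str:
    """Random placeholder policy; a trained model would go here."""
    return rng.choice(["advance", "noop"])


def rollout(seed: int, max_steps: int = 20) -> list:
    """Run one episode and return (observation, action, reward) tuples."""
    env = FakeDesktop()
    rng = random.Random(seed)
    obs = env.observe()
    trajectory = []
    for _ in range(max_steps):
        action = policy(obs, rng)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    return trajectory


# Collect episodes from several simulated desktops in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    trajectories = list(pool.map(rollout, range(8)))

success_rate = sum(t[-1][2] == 1.0 for t in trajectories) / len(trajectories)
print(f"episodes: {len(trajectories)}, success rate: {success_rate:.2f}")
```

    In a real deployment the collected trajectories would feed a policy-update step; that part is omitted here because the article does not describe the learning algorithm.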


    Which real tasks can the agents already handle?

    Early pilots and public demos show the agents completing:

    • Cross-app workflows – e.g., open a browser, download a CSV, import it into LibreOffice Calc and create a chart, all in one continuous session.
    • Web research – search the latest news on a topic, summarize key points and paste them into a new Google Doc.
    • Software installation – locate an .exe or .deb file, step through the installer GUI, accept TOS and launch the program.

    In benchmark terms, the OSWorld suite covers 369 such multi-step tasks; ComputerRL agents clear nearly half of them end-to-end.
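
    For a sense of scale, the quick calculation below converts the reported success rate into an approximate task count. It assumes the 48.1 % figure applies to all 369 tasks, which the article implies but does not state outright.

```python
TOTAL_TASKS = 369     # size of the OSWorld suite, per the article
SUCCESS_RATE = 0.481  # reported AutoGLM 9B + ComputerRL result

solved = round(TOTAL_TASKS * SUCCESS_RATE)
unsolved = TOTAL_TASKS - solved

print(f"about {solved} tasks solved, {unsolved} still failing")
```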


    Is the technology enterprise-ready today?

    For proof-of-concept and narrowly scoped processes, yes. Zhipu’s GLM-4.5 model (355 B parameters, MIT license) is being rolled out as “infrastructure-in-a-box” with Huawei and Alibaba Cloud, offering on-prem or sovereign-cloud deployment. Early adopters in Malaysia, Singapore and the UAE are testing the stack for document-heavy back-office flows and multilingual help-desk automation.

    Caveat: Full-scale production still requires human review; the 48 % success rate means roughly one in two tasks may still need intervention or scripted fallback.


    Where can developers start experimenting?

    The core framework is published under the MIT license, and the 9 B model weights are listed under Apache-2.0:

    • GitHub: github.com/zhipuai/computerrl (code + pretrained checkpoints)
    • Model weights: Hugging Face zhipuai/glm-4.5-9b-autoglm (9 B variant fine-tuned for agents)
    • Docs: cover installation, Docker compose for 1-click desktop farm, and a minimal “Hello OSWorld” notebook.

    Zhipu also offers a hosted playground at tensorblock.ai where you can upload a 30-second screen recording and watch the agent reproduce the workflow in a sandboxed VM.
