Anthropic updates Claude Opus 4.7 with 13% coding boost, vision upgrades

Serge Bulaev

Serge Bulaev

Anthropic released Claude Opus 4.7, which may offer a 13% improvement in coding tasks and about 60% fewer task failures in long workflows compared to the previous version. Benchmark reports suggest Opus 4.7 leads other models in agentic tasks, but experts note these numbers come from secondary analysis and should be viewed cautiously. Vision capabilities now support images up to 2576 pixels, which could make diagrams clearer but might also increase costs and reduce how many images fit in a context window. There is early interest in Claude Cowork, a desktop tool that helps with tasks like making spreadsheets or summarizing files, although some limitations and cautions apply.

Anthropic updates Claude Opus 4.7 with 13% coding boost, vision upgrades

Claude Opus 4.7 was reported to have a 13% improvement on a 93-task coding benchmark, plus significant vision and agentic-task improvements, including self-verification and better long-running autonomy. This release prioritizes greater reliability and behavioral consistency over chasing headline benchmark scores alone.

Benchmark Performance and Competitive Landscape

Early benchmark analysis from sources like Spectrum AI Labs positions Opus 4.7 as a leader in agentic tasks according to industry reports, though experts urge caution, as these figures are from secondary reports and await independent verification.

The update brings a notable 13% improvement on a 93-task coding benchmark over its predecessor, Opus 4.6. This performance jump, coupled with significant improvements in task completion during long agentic sequences, signals a significant advance in the model's practical reliability for demanding developer workflows.

Enhanced Reliability with Task Budgets

A key focus for Opus 4.7 is behavioral reliability, reflected in substantial reductions in abandoned tasks during internal tests. This is partly achieved through task budgets, a new optional API feature. Developers can set an advisory token target for an entire agentic process, which helps the model manage resources more effectively. As detailed by Caylent, this should be paired with a max_tokens limit for robust cost control.

Upgraded Vision Capabilities and Cost Considerations

The model's vision capabilities have been upgraded to process images up to 2576 pixels (around 3.75 MP), a significant jump from the previous 1568px limit. While this improves clarity for detailed diagrams and UI screenshots, it comes at a cost. Each high-resolution image can use up to three times more tokens, reducing the number of images that can fit into the context window.

Claude Cowork: The Desktop Agent

Alongside the model update, the Claude Cowork desktop application is gaining traction. As reported by VentureBeat, Cowork allows users on macOS and Windows to automate tasks by working directly with local files and applications. It can draft documents, create spreadsheets, and even automate browser actions.

Early adopters are using Cowork for workflows such as:
- Organizing local files and project folders
- Creating presentations and Excel files with formulas
- Automating data entry on web forms
- Summarizing large document sets
- Scheduling and running long-running tasks overnight

The ability to generate complete artifacts rather than just text responses is proving popular, though experts remind users to be mindful of security and system limitations when running local agents.


What exactly changed in Claude Opus 4.7's coding performance?

According to industry reports, Anthropic's internal 93-task benchmark shows a 13% jump over Opus 4.6, while public benchmarks show significant improvements over competing models. Perhaps more telling is that several tasks were solved by 4.7 that previous versions could not complete, and the CursorBench pass rate showed substantial improvement. The gains come from tighter self-verification loops and a literal instruction style that abandons fewer multi-step jobs.

How does the new "task budget" feature work in practice?

Instead of only a per-turn max_tokens ceiling, you can now pass an advisory task_budget that covers the entire agent loop - reasoning, tool calls, tool returns, and final answer. The model sees the approximate token pool and paces itself, starting to wrap up gracefully as the budget tightens rather than dying mid-task. Early tests show significantly fewer abandoned long-horizon jobs, but the budget is not a hard cap (minimum 20k tokens) and must be paired with max_tokens for safety.

Is the 2576px vision upgrade worth the extra tokens?

Image resolution jumped from 1568px to 2576px on the long edge (3.75MP), letting the model read dense screenshots, diagrams, and UI mock-ups at native fidelity. The trade-off is cost: each high-resolution image can consume roughly 3× the tokens, and the maximum number of images per 200k context drops substantially. If your workflow needs pixel-perfect UI automation or OCR-like accuracy, the boost is real; otherwise, down-sampling client-side is the accepted way to keep costs flat.

How does Opus 4.7 compare to GPT-4o and Gemini 1.5 Pro for agentic work?

Third-party round-ups agree that Opus 4.7 leads on coding and agentic benchmarks, while GPT-4o retains an edge in real-time multimodal latency and Gemini 1.5 Pro is rarely listed in the same top-tier agent tables. For production tasks, Anthropic claims significantly more benchmark resolutions than its own prior release, and improved pass rates versus competing models. If your use case is autonomous multi-step coding, the public numbers favor Opus 4.7 according to industry reports.

Where can I try Opus 4.7 and Claude Cowork today?

  • API: Opus 4.7 is available through Anthropic's API with task budget features in beta testing.
  • Consumer: toggle "Opus 4.7" in the model picker on Claude Max plans.
  • Desktop agent: Claude Cowork is available for macOS and Windows desktop app users; it pairs with the Claude Chrome extension for web tasks and supports persistent threads so jobs continue while you step away.