Anthropic unveils Claude Opus 4.8: Faster, cheaper, safer AI

Anthropic has released Claude Opus 4.8, which may be faster, cheaper, and safer than its earlier version. The company says Opus 4.8 costs about one-third as much and gives responses around 2.5 times quicker than Opus 4.7 in fast mode. Early tests suggest it performs well at fixing complex code and avoids misaligned behavior more often, but some engineers might still prefer other tools like Codex for certain quick tasks. Anthropic's own checks indicate the new model flags uncertainty more and avoids unsupported claims, though these results may need confirmation from outside labs. Pricing stays similar to before, and actual savings might depend on how teams use it.

Anthropic has launched Claude Opus 4.8, a new AI model promising significant gains in speed, cost-effectiveness, and safety. Responding to developer demands for faster performance without sacrificing reliability, Anthropic claims the model is 2.5 times faster and one-third the cost of Opus 4.7 in "fast mode," targeting daily production workloads. This release is positioned as a comprehensive upgrade for coding, agentic functions, and analysis, with a key emphasis on reducing misaligned behavior.

Performance and Benchmark Claims

Claude Opus 4.8 is the latest AI model from Anthropic, engineered for enhanced speed, cost-efficiency, and safety. The company claims it operates 2.5 times faster and costs one-third as much as its predecessor in fast mode, targeting production environments with improved coding, analysis, and alignment.

Early performance data provides context for these claims. Anthropic's official system card reports a 69.2% score on the SWE-bench Pro code-repair test and 88.6% on SWE-bench Verified. According to industry reports, competing models show varying performance on similar benchmarks, with available sources showing conflicting results on Terminal-Bench 2.1 - some indicating GPT-5.5 at 78.2% and Opus 4.8 at 74.6%, while others mention CLI harness scores of 84.4% without clear attribution. These results suggest Opus 4.8 excels at complex repository repairs, while other models may retain advantages for specific shell-based tasks.

Coding Performance: Opus 4.8 vs. Codex

For real-world coding projects, a clear division in optimal use emerges. Opus 4.8 demonstrates superior planning and multi-file reasoning, making it ideal for architectural changes. However, many engineers still prefer Codex for its speed in rapid terminal sessions. Industry analysis suggests benchmark outcomes can vary based on tooling, such as a CLI-native versus a browser-based test harness. The consensus suggests a hybrid approach: using Opus 4.8 for large-scale architectural work and Codex for iterative, tight feedback loops.

A Stronger Focus on Safety and Alignment

Anthropic places a heavy emphasis on safety enhancements. According to internal evaluations, rates of misaligned behaviors like deception are "substantially lower" than in Opus 4.7, comparable to the highly-rated Mythos Preview model. The model is also trained for greater honesty, flagging its own uncertainty and avoiding unsupported claims more frequently. While these findings await independent verification, they are consistent with the model's improved bug-detection capabilities.

Early Adopters and Use Cases

Leading companies across e-commerce, legal tech, and data infrastructure are already piloting Opus 4.8, including:

Shopify
Cursor
Harvey
BrowserBase
Bridgewater
Thomson Reuters
Databricks

Feedback from these early adopters highlights the model's advanced agentic judgment. Testers report that it "asks the right questions" and "pushes back when a plan isn't sound," demonstrating a level of reasoning beyond simple token generation.

New Features and Pricing for Production Workflows

For production teams, Claude Code now enables dynamic workflows with hundreds of parallel sub-agents for tasks like automated bug hunting and migration scripts. Key API updates include mid-conversation system messages and a lower prompt-cache minimum to help manage long-running processes efficiently. Pricing remains consistent with the previous Opus tier at approximately $5 per million input tokens and $25 for output. While fast mode is more expensive, Anthropic argues that faster completion times lead to a lower net cost.

Overall, Opus 4.8 presents a compelling, cost-efficient solution for complex codebase analysis and safety-critical applications. However, for latency-sensitive terminal tasks, other models may still hold advantages pending further third-party benchmarking.

What exactly is new in Claude Opus 4.8?

Anthropic released Claude Opus 4.8 as a direct upgrade that delivers broad improvements in coding, agentic tasks, financial analysis, writing, and general knowledge work. Headline claims: the model is three times cheaper and 2.5× faster in "fast mode" compared with earlier Opus versions. In early benchmarks, Opus 4.8 scores 69.2 % on SWE-bench Pro and 88.6 % on SWE-bench Verified, with competing models showing varying performance on similar tests.

How do real customers already use it?

Shopify, Cursor, Harvey, BrowserBase, Bridgewater, Thomson Reuters, and Databricks have all tested the new model. Anthropic's announcement quotes testers saying Opus 4.8 "asks the right questions, catches its own mistakes, pushes back when a plan isn't sound," and is "a great model to build with." In practice, teams apply it to automated bug hunts, migration scripts, and stress-testing workflows, while Claude Code can now orchestrate dynamic workflows that spawn tens to hundreds of parallel sub-agents in a single session.

How does Opus 4.8 compare to OpenAI Codex for coding?

Independent reviews and benchmark tables show a clear split:
- Repo-wide bug fixing and complex engineering: Opus 4.8 wins with higher SWE-bench Pro scores.
- Terminal-centric loops and fast iterative coding: OpenAI Codex still leads in many scenarios, especially for rapid development workflows.

In short, choose Opus 4.8 for planning and deep repo changes, then drop to Codex for quick shell operations and review.

What safety and alignment improvements does Anthropic highlight?

Anthropic states Opus 4.8 reaches new highs on prosocial traits, including:
- Four-fold reduction in letting code flaws pass unremarked.
- Misalignment metrics such as deception or cooperation with misuse are "substantially lower than Opus 4.7", matching the already-safe Claude Mythos Preview.
- Trainers tuned the model to flag uncertainties rather than make unsupported claims, a direct benefit for finance and research workflows where one wrong assertion can cascade.

The full technical details appear in the Claude Opus 4.8 System Card.

What will this cost, and is "fast mode" worth it?

Pricing stays flat: $5 per million input tokens and $25 per million output tokens.
"Fast mode" is twice the normal price but keeps the 2.5× speed gain, so interactive or user-facing agents are the clearest ROI. Teams on tight token budgets report running heavy analysis in standard mode and reserving fast mode for latency-sensitive sub-tasks, minimizing the overall cost impact.