Anthropic adopts 4-phase workflow for Claude-generated code

Anthropic uses a four-step process for code created by its Claude AI, treating the code as a draft until it passes tests and checks. The workflow includes planning, testing, and automatic rejection of changes that fail predefined rules, which may help keep code quality high. Reports suggest that about half of Anthropic's sales staff use Claude Code weekly and that editing errors decreased after the workflow was introduced, but these numbers are unconfirmed. The company also appears to run strong security checks and to require extra review for sensitive code. These steps may help Anthropic deliver new features quickly while keeping risks low, and other teams could use similar methods.

Anthropic has published guidance and tooling for structured plan-then-act and code-execution-driven workflows that treat all model output as a draft until it passes stringent verification. The process favors validation loops, role separation, and aggressive automation over one-click code generation.

What Are the Four Phases of Anthropic's Coding Workflow?

Anthropic's structured workflow for AI code generation begins with exploration and planning, in which Claude outlines the implementation. Claude then writes the code and runs automated tests against its own plan. Finally, it opens a pull request for human review and approval.

The workflow breaks down into four distinct stages:

Explore: In "plan mode," Claude is asked to identify relevant files, constraints, and potential challenges.
Plan: The engineer requests a detailed implementation plan that specifies functions, files, and necessary tests.
Code: After the plan is approved, Claude generates the code changes (diff) based on the approved plan.
Test: Claude executes the tests it previously proposed and creates a pull request for final human review.
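The ordering constraint in these four stages can be sketched as a small state machine. This is an illustrative sketch only, not Anthropic's tooling: the `Phase` and `Workflow` names are hypothetical, and the only rule it encodes is the one stated above, that coding starts after the plan is approved.

```python
from enum import Enum


class Phase(Enum):
    EXPLORE = 1
    PLAN = 2
    CODE = 3
    TEST = 4


class Workflow:
    """Tracks one task through the four phases, in order (hypothetical helper)."""

    def __init__(self) -> None:
        self.phase = Phase.EXPLORE
        self.plan_approved = False

    def approve_plan(self) -> None:
        # Approval only makes sense once the task has reached the Plan phase.
        if self.phase is not Phase.PLAN:
            raise RuntimeError("no plan to approve yet")
        self.plan_approved = True

    def advance(self) -> Phase:
        if self.phase is Phase.TEST:
            raise RuntimeError("already in the final phase")
        # The guardrail: no code generation without an approved plan.
        if self.phase is Phase.PLAN and not self.plan_approved:
            raise RuntimeError("plan must be approved before coding")
        self.phase = Phase(self.phase.value + 1)
        return self.phase
```

Calling `advance()` twice on a fresh `Workflow` raises on the second call unless `approve_plan()` was called in between, which mirrors the human sign-off between Plan and Code.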

A critical guardrail is the automatic rejection of any code diff that fails predefined tests, linting rules, or security scans. Anthropic maintains that this self-verification process raises code quality and moves engineers into a supervisory role focused on architecture and high-level judgment.
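A minimal sketch of such a rejection gate follows. It assumes each check can be expressed as a command whose exit code signals pass or fail; the function names `command_check` and `evaluate_diff` are hypothetical and are not taken from Anthropic's guidance.

```python
import subprocess
from typing import Callable, Dict, List, Tuple

# A check is any zero-argument callable that returns True on PASS.
Check = Callable[[], bool]


def command_check(command: List[str]) -> Check:
    """Wrap a command line as a check; exit code 0 counts as PASS."""

    def run() -> bool:
        try:
            completed = subprocess.run(command, capture_output=True, text=True)
        except OSError:
            # A tool that cannot be started is a failure, not a silent pass.
            return False
        return completed.returncode == 0

    return run


def evaluate_diff(checks: Dict[str, Check]) -> Tuple[bool, List[str]]:
    """Run every check and return (accepted, names of the failed checks)."""
    failed = [name for name, check in checks.items() if not check()]
    return (not failed, failed)
```

A team might call it as `evaluate_diff({"tests": command_check(["pytest", "-q"]), "lint": command_check(["ruff", "check", "."])})`, where the specific tools are placeholders for whatever the repository already uses. Every check runs even after the first failure, so the rejection message can list all problems at once.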

How Does Anthropic Ensure Code Quality and Security?

Anthropic integrates security guardrails directly into its CI/CD pipeline. Every merge request automatically triggers static application security testing (SAST), software composition analysis (SCA) of dependencies, and policy-as-code checks. This approach aligns with best practices described in resources like GitLab's secure AI completion guide, which advocate continuous security scanning. Under this "shift-left" security model, Anthropic treats all AI-generated code as untrusted until it passes multiple automated gates and a final human review. Sensitive code affecting authentication or cryptography receives particular scrutiny.
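One way to express the "untrusted until gated" rule as policy-as-code is sketched below. The path patterns, the function names, and the idea of a separate senior approval flag are assumptions made for illustration; the article does not describe Anthropic's actual policy files.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence

# Hypothetical patterns; adjust them to the repository's real layout.
SENSITIVE_PATTERNS = ("*auth*", "*crypto*", "*secrets*")


def sensitive_files(
    changed: Iterable[str], patterns: Sequence[str] = SENSITIVE_PATTERNS
) -> List[str]:
    """Return the changed paths that match any sensitive pattern."""
    return [
        path
        for path in changed
        if any(fnmatchcase(path.lower(), pattern) for pattern in patterns)
    ]


def can_merge(
    changed: Iterable[str],
    scans_passed: bool,
    human_approved: bool,
    senior_approved: bool,
) -> bool:
    """Every diff needs passing scans and a human review;
    diffs touching sensitive paths also need a senior approval."""
    if not (scans_passed and human_approved):
        return False
    if sensitive_files(changed) and not senior_approved:
        return False
    return True
```

The patterns deliberately over-match (a file such as `docs/authors.md` would be flagged too), on the assumption that an unnecessary extra review costs less than a missed one.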

What Are the Reported Productivity Gains?

Some figures come from external observers and remain unverified, but the reported signals point toward productivity gains. Industry reports suggest that a sizable share of Anthropic's sales team uses Claude Code regularly, indicating adoption beyond engineering. Additionally, UncoverAlpha cited internal benchmarks showing substantial improvements in editing accuracy after the workflow was implemented.

Official data also points to heavy use for development work: computer and mathematical tasks are one of the largest categories of conversations on Claude.ai.

How Do Engineering Roles Change with AI-Assisted Coding?

Anthropic's workflow reframes the engineer's role from a primary author to a high-leverage operator and reviewer. This shift aligns with broader industry findings, such as an MIT Sloan summary of Microsoft experiments, which found that AI assistants significantly increased task completion for junior developers. In this new paradigm, engineers focus more on designing systems, curating prompts, and approving merges, rather than writing every line of code from scratch.

How Can Other Teams Replicate This Workflow?

Teams can adapt Anthropic's model by integrating similar principles into their existing development cycles. Key takeaways for replication include:

Plan First: Mandate a model-generated written plan for all medium-to-large changes before implementation.
Automate Failure: Embed automated tests, linting, and security scans that cause pull requests to fail immediately upon error.
Mandatory Review: Route all code affecting sensitive areas like authentication, encryption, or database schemas for mandatory senior review.
Isolate Search: Use isolated AI sub-agents for large-scale codebase searches to prevent context window bloat, as Anthropic recommends.

By adopting these checkpoints, development teams can leverage AI to accelerate feature delivery while effectively managing the associated risks, ensuring model-written code is vetted before reaching production.

What exactly is Anthropic's structured workflow for Claude-generated code inside the company?

Anthropic's current internal playbook for non-trivial tasks is spelled out in four explicit stages:

Explore in plan mode - Claude first maps the problem space and surfaces alternative approaches.
Plan - it then proposes a detailed, step-by-step implementation plan that can be reviewed by the human engineer.
Code - once the plan is accepted, the agent exits plan mode and writes the code against the agreed checklist.
Test & PR - Claude executes tests and opens the pull request so humans can perform final review and merge.

Small, one-line fixes skip phases one and two because the planning overhead is not justified.
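A team that wants to enforce this split in CI could require an attached plan only when a diff exceeds a size threshold. The sketch below assumes unified-diff input; the threshold of two changed lines (one removed, one added) is an arbitrary stand-in for "one-line fix", and the function names are hypothetical.

```python
def changed_line_count(unified_diff: str) -> int:
    """Count added and removed lines in a unified diff.

    Limitation: a removed line whose own text starts with "--" appears
    as "---..." and is skipped along with the file headers.
    """
    count = 0
    for line in unified_diff.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file header lines, not content changes
        if line.startswith(("+", "-")):
            count += 1
    return count


def plan_required(unified_diff: str, threshold: int = 2) -> bool:
    """A plan is required once the diff is larger than the threshold."""
    return changed_line_count(unified_diff) > threshold


def passes_plan_policy(unified_diff: str, has_plan: bool) -> bool:
    """Small diffs always pass; larger ones pass only with a plan attached."""
    return has_plan or not plan_required(unified_diff)
```

Line count is a crude proxy for risk, so a real policy would probably combine it with other signals such as the number of files touched.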

Source: Claude Code in Enterprise Codebases - Pasquale Pillitteri summary of Anthropic guidance

How does Anthropic measure productivity gains from Claude-generated code?

Industry reports suggest significant productivity improvements, though specific metrics vary:

Internal code-editing error rates showed substantial improvement in Anthropic's own test suite.
Significant adoption among Anthropic sales staff indicates the tool solves real workflow problems beyond engineering demos.
Coding tasks remain among the largest use cases on Claude platforms, according to industry reports.

Source: UncoverAlpha

What guardrails are in place for security and quality?

Anthropic's guidance boils down to three non-negotiables:

Self-verification loops - every code block must be paired with runnable tests, lint commands, or screenshot comparisons that return a clear PASS/FAIL.
Sub-agent isolation - exploratory search runs in a dedicated Claude instance whose results are summarized back to the main agent, keeping the planning context small and auditable.
Human review gates - authentication logic, encryption, credential handling, and infra-as-code changes are routed to senior engineers and security reviewers before merge.
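The data flow behind sub-agent isolation can be illustrated without calling any model. In the sketch below, `search_fn` is a stand-in for the dedicated search instance, and `run_isolated_search` and `MainContext` are hypothetical names; the point is only that raw results stay inside the helper and a bounded summary is all that reaches the main context.

```python
from typing import Callable, List


def run_isolated_search(
    query: str,
    search_fn: Callable[[str], List[str]],
    max_items: int = 5,
    max_chars: int = 200,
) -> str:
    """Run a search and return a bounded summary of its results."""
    raw_hits = search_fn(query)  # may be large; never leaves this function
    lines = [hit[:max_chars] for hit in raw_hits[:max_items]]
    omitted = max(0, len(raw_hits) - max_items)
    if omitted:
        lines.append(f"({omitted} more results omitted)")
    return "\n".join(lines)


class MainContext:
    """Stand-in for the planning agent's context: it stores summaries only."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def add_search_summary(self, query: str, summary: str) -> None:
        self.messages.append(f"search[{query}]:\n{summary}")
```

Because the summary has a fixed upper size, the main context grows by a predictable amount per search regardless of how many files the search touched, which is also what keeps it auditable.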

Source: Pasquale Pillitteri summary

How have human roles shifted as Claude writes most code?

Engineers at Anthropic describe their day-to-day as moving from primary author to high-leverage reviewer:

Architecture & judgment - humans now spend more time on system boundaries, trade-offs, and interface design.
Debugging & integration - the code volume is higher, so humans specialize in spotting subtle integration or performance issues.
AI oversight - writing and refining prompts, curating test suites, and maintaining the guardrail policies.

Entry-level tasks see the biggest automation, while seniors focus on coaching AI and reviewing risk-heavy areas.

Sources: MIT Sloan recap of Microsoft/Copilot study

Can other teams replicate Anthropic's workflow without starting from scratch?

Yes, the same structure can be transplanted to any repo that already has CI:

Phase tooling - adopt Claude Code's "plan mode" flag or replicate with an IDE extension that writes the plan to a scratch file for human sign-off.
Automated validation - wire a verification script into your existing CI so each PR runs the same kind of test / lint / screenshot checks that Anthropic's guidance describes.
Human gates - protect main branches with mandatory reviewer rules for files flagged as sensitive (auth, secrets, infra).

A concise policy teams can adopt tomorrow: treat every AI diff as untrusted until it has passed tests, been reviewed by a human, and been logged in an audit trail.
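That policy can be sketched as a single decision function that writes its audit entry before returning. The record fields, the JSON-lines format, and the names `DiffRecord` and `decide_and_log` are assumptions chosen for illustration, not a format from the cited sources.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class DiffRecord:
    diff_id: str
    tests_passed: bool
    human_reviewer: Optional[str]  # None until a person has signed off


def is_trusted(record: DiffRecord) -> bool:
    """Trusted only when tests passed and a named human reviewed the diff."""
    return record.tests_passed and bool(record.human_reviewer)


def decide_and_log(
    record: DiffRecord,
    write: Callable[[str], object],
    now: Optional[datetime] = None,
) -> bool:
    """Write one JSON audit line, then return the trust decision.

    The write happens first, so if logging raises, no decision is
    returned and nothing can merge without an audit entry.
    """
    trusted = is_trusted(record)
    entry = asdict(record)
    entry["trusted"] = trusted
    entry["logged_at"] = (now or datetime.now(timezone.utc)).isoformat()
    write(json.dumps(entry, sort_keys=True) + "\n")
    return trusted
```

In practice `write` could be the `write` method of a file opened in append mode; rejected diffs are logged as well, so the trail shows what was blocked and why.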

Sources: GitLab secure AI-code completion guide and Pasquale Pillitteri summary