Anthropic's Claude Code uses a 5-stage pipeline to compact context

Serge Bulaev

Claude Code appears to use a five-stage process to organize and compact information before sending it to its core language model. This process, sometimes described as a "context burger," stacks different types of information in a specific order, and most of the work may happen outside the model itself. Hierarchical instruction files, like short Markdown guides, seem to let engineers adjust the system's behavior quickly. Testing these prompts and using summaries instead of long histories might help teams save on costs and make their work faster. Some sources suggest treating this setup as flexible infrastructure, not just static text.

Anthropic's Claude Code uses a 5-stage pipeline to compact context, part of an architecture engineers call the "Context Burger." The system organizes and compresses information in a thick operating layer before it reaches the core transformer model. Teardowns indicate that this pre-processing happens almost entirely outside the model, a design described as a "thin brain, thick OS." Keeping the live context small in this way is key to managing costs and improving performance.

The "Thin Brain, Thick OS" Architecture

This design is documented in multiple sources. Industry reports describe the system as a "thin brain, thick OS" with multiple main injection points. That description aligns with a design-space survey published on arXiv, which identifies a complex structure built on a 5-layer subsystem stack. Both analyses place context shaping in the operating harness, not in the core AI.

The five-stage pipeline sequentially reduces context and improves signal quality. It includes budget reduction to swap large outputs for pointers, snipping old turns, micro-compacting tokens, collapsing adjacent messages, and using the model to auto-summarize history. This keeps the active window small and relevant.

How the "Context Burger" Stacks Information

Developers often visualize the system's component layout as a multi-layer "burger." Each layer claims a portion of the limited context window, with placement affecting its priority and persistence:

  • System Prompt & Metadata: Establishes universal, high-level rules.
  • Hierarchical Instruction Files: Sets project- and directory-specific policies (CLAUDE.md).
  • Prefetched Memory: Loads relevant code snippets or documentation.
  • Conditional Rules & Tools: Defines available tools based on the current path or state.
  • Conversation History: Provides live state from the ongoing dialogue and tool outputs.
  • Compact Summary: Replaces older history as the conversation length increases.
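The stacking order above can be sketched in a few lines of Python. This is a minimal illustration, not Claude Code's actual implementation: the `Layer` type, the `assemble_context` function, the layer contents, the 40-token budget, and the word-count token estimate are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    text: str


def estimate_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (a real tokenizer differs).
    return len(text.split())


def assemble_context(layers: list[Layer]) -> tuple[str, dict[str, int]]:
    """Join layers in stack order and report how many tokens each one claims."""
    usage = {layer.name: estimate_tokens(layer.text) for layer in layers}
    prompt = "\n\n".join(f"[{layer.name}]\n{layer.text}" for layer in layers)
    return prompt, usage


# Hypothetical layer contents, listed top to bottom as in the stack above.
LAYERS = [
    Layer("system_prompt", "Follow repository conventions."),
    Layer("instruction_files", "Use pytest for tests."),
    Layer("prefetched_memory", "def add(a, b): return a + b"),
    Layer("conditional_rules_and_tools", "tools: read_file run_tests"),
    Layer("conversation_history", "user: fix the failing test"),
    Layer("compact_summary", "Earlier: user asked to refactor add."),
]

prompt, usage = assemble_context(LAYERS)
remaining = 40 - sum(usage.values())  # 40 is an arbitrary illustrative budget
```

Reporting per-layer usage makes it easy to see which layer is consuming the window before deciding what to trim.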

The 5 Compaction Stages That Preserve Context

The context compaction pipeline runs through five sequential shapers to keep the context window efficient and high-signal:

  1. Budget Reduction: Swaps oversized outputs for pointers to save space.
  2. Snip: Trims the oldest and least relevant conversation turns.
  3. Micro-Compact: Performs cache-aware token pruning for efficiency.
  4. Context Collapse: Merges adjacent messages without losing information.
  5. Auto-Compact: Asks the model to generate a concise working summary of older history.
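The five stages can be sketched as a chain of small functions. This is a toy model under stated assumptions, not the real pipeline: the `Message` type, every function name, the thresholds (`max_words`, `keep_last`, `keep_recent`), and the pointer format are hypothetical, `micro_compact` only normalizes whitespace as a stand-in for cache-aware pruning, and `demo_summarize` stands in for a model call.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Message:
    role: str
    text: str


def tokens(history: list[Message]) -> int:
    # Assumption: one token per whitespace-separated word.
    return sum(len(m.text.split()) for m in history)


def budget_reduction(history: list[Message], max_words: int = 50) -> list[Message]:
    # Stage 1: swap oversized tool outputs for a short pointer (hypothetical format).
    return [
        replace(m, text=f"[output stored externally: ref-{i}]")
        if m.role == "tool" and len(m.text.split()) > max_words
        else m
        for i, m in enumerate(history)
    ]


def snip(history: list[Message], keep_last: int = 6) -> list[Message]:
    # Stage 2: drop the oldest turns, keeping only the last `keep_last`.
    return history[max(0, len(history) - keep_last):]


def micro_compact(history: list[Message]) -> list[Message]:
    # Stage 3 stand-in: collapse whitespace runs. Cache awareness is not modeled.
    return [replace(m, text=" ".join(m.text.split())) for m in history]


def context_collapse(history: list[Message]) -> list[Message]:
    # Stage 4: merge adjacent messages that share a role, keeping all of their text.
    merged: list[Message] = []
    for m in history:
        if merged and merged[-1].role == m.role:
            merged[-1] = replace(merged[-1], text=merged[-1].text + "\n" + m.text)
        else:
            merged.append(m)
    return merged


def auto_compact(history: list[Message],
                 summarize: Callable[[list[Message]], str],
                 keep_recent: int = 2) -> list[Message]:
    # Stage 5: replace everything but the most recent messages with a summary.
    cut = len(history) - keep_recent
    if cut <= 0:
        return history
    return [Message("summary", summarize(history[:cut]))] + history[cut:]


def demo_summarize(older: list[Message]) -> str:
    # Stand-in for a model call; a real system would ask the model for a summary.
    return f"{len(older)} earlier messages"


def compact(history: list[Message],
            summarize: Callable[[list[Message]], str]) -> list[Message]:
    # Run the five stages in the order listed above.
    history = budget_reduction(history)
    history = snip(history)
    history = micro_compact(history)
    history = context_collapse(history)
    return auto_compact(history, summarize)
```

Calling `compact(history, demo_summarize)` returns a shorter list whose first entry is the summary message. Note that only the last stage needs the model; the first four are plain list transformations, which matches the article's point that most of the work happens outside it.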

The Role of Hierarchical Instruction Files (CLAUDE.md)

Technical analyses highlight short, version-controlled Markdown files (CLAUDE.md) as a source of high-signal guidance. Because these files sit near the code they govern, engineers can modify model behavior by editing simple text instead of application code. The system merges all relevant instruction files into the final prompt, so a small wording change can immediately shift the model's tone, goals, or safety constraints.
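One way to picture the merge is a walk from the repository root down to the working directory, collecting each CLAUDE.md along the way. The sketch below assumes that outermost-first ordering and uses hypothetical helper names (`collect_instruction_files`, `merge_instructions`); the source does not specify how Claude Code orders or resolves these files.

```python
from pathlib import Path


def collect_instruction_files(repo_root: Path, working_dir: Path,
                              filename: str = "CLAUDE.md") -> list[Path]:
    """Return instruction files from repo_root down to working_dir, outermost first."""
    repo_root = repo_root.resolve()
    working_dir = working_dir.resolve()
    # relative_to raises ValueError if working_dir is outside repo_root.
    relative = working_dir.relative_to(repo_root)

    directories = [repo_root]
    for part in relative.parts:
        directories.append(directories[-1] / part)

    return [d / filename for d in directories if (d / filename).is_file()]


def merge_instructions(files: list[Path]) -> str:
    """Concatenate instruction files, labeling each section with its source path."""
    sections = []
    for path in files:
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"--- {path} ---\n{body}")
    return "\n\n".join(sections)
```

With this ordering, directory-level files appear after project-level ones in the merged text, so the more specific guidance is read last. That is a design choice of the sketch, not a documented behavior.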

Key Takeaways for Development Teams

The documented pipeline provides three clear takeaways for teams building with Claude Code:

  1. Treat Prompts as Code: The system prompt and CLAUDE.md files require the same level of testing and version control as application code.
  2. Leverage Summarization: Summaries are a core feature. A concise 200-token summary can effectively replace thousands of historical tokens, reducing overhead.
  3. Manage Live Context to Control Costs: Both cost and latency are directly tied to the amount of live context. Trimming information early in the pipeline leads to significant savings.
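The summarization takeaway comes down to simple arithmetic: once a summary replaces older history, every later turn resends the summary instead of the full history. The sketch below uses illustrative numbers only (4,000 history tokens and 10 remaining turns are assumptions; the 200-token summary size is the figure from the list above), and it ignores the one-time cost of producing the summary and any prompt-caching discounts.

```python
def input_tokens_saved(history_tokens: int, summary_tokens: int,
                       remaining_turns: int) -> int:
    """Input tokens avoided when a summary replaces older history on every later turn."""
    if summary_tokens > history_tokens:
        raise ValueError("summary is larger than the history it replaces")
    return (history_tokens - summary_tokens) * remaining_turns


# Hypothetical session: 4,000 tokens of old history, 10 turns still to come.
saved = input_tokens_saved(history_tokens=4000, summary_tokens=200, remaining_turns=10)
print(saved)  # 38000
```

Because the saving scales with the number of remaining turns, compacting earlier in a long session pays off more than compacting near its end.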

Teams that view the context pipeline as configurable infrastructure, rather than static text, report faster iterations and lower overall token consumption.