Ex-Meta Engineer Ships 40 PRs Daily with AI Agent Setup

Former Meta engineer Kun Chen describes a terminal-based, agent-powered workflow that he says lets him focus on what to build rather than on typing code line by line. His setup uses lightweight tools like WezTerm, tmux, and Neovim, plus agents and validators that automate code changes and testing. Chen claims he ships between 20 and 40 pull requests daily with little manual code review, because agents and validators handle most of the work. The approach appears to scale: thousands of Atlassian engineers have reportedly adopted similar tools. Terminal setups like this also tend to use less memory than traditional graphical IDEs, which may help when running many agents at once.

An ex-Meta engineer's method for shipping up to 40 pull requests daily with an AI agent setup has captured the developer community's attention. Kun Chen, now a Lead Principal Engineer at Atlassian (Atlassian profile), detailed his agentic engineering workflow, arguing that a lightweight terminal stack with purpose-built agents lets him focus on high-level strategy instead of line-by-line coding.

The Core Terminal Stack

Kun Chen's setup relies on a lightweight terminal stack (WezTerm, tmux, Neovim) to run multiple AI agents in parallel with low memory overhead. Custom agent harnesses and automated validators handle code generation, testing, and pull request creation, allowing him to orchestrate complex tasks with minimal manual intervention.

Chen's foundation begins with WezTerm, a GPU-accelerated terminal, orchestrated with tmux for managing multiple panes. Each pane runs Neovim, configured with plugins like oil.nvim for file navigation and neogit for Git operations. To maintain persistent sessions across devices, he uses Tailscale and mosh, allowing him to detach and re-attach to the same tmux layout seamlessly. He even integrates voice input via OpenSuperWhisper for transcribing spoken prompts, which he reports significantly speeds up repetitive edits.
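As a rough illustration of the pane layout described above, the sketch below builds the tmux commands for one detached session with several panes, each starting Neovim. The session name "agents", the pane count, and the function name are assumptions made for this example, not details from Chen's configuration; the script only prints the commands and does not run tmux.

```python
import shlex


def tmux_layout_commands(session: str, panes: int, pane_cmd: str = "nvim") -> list[list[str]]:
    """Build tmux commands for one detached session with `panes` panes.

    Every pane starts `pane_cmd`. The session name and pane command are
    illustrative assumptions, not Chen's actual settings.
    """
    if panes < 1:
        raise ValueError("need at least one pane")
    # The first pane is created together with the detached session.
    cmds = [["tmux", "new-session", "-d", "-s", session, pane_cmd]]
    for _ in range(panes - 1):
        cmds.append(["tmux", "split-window", "-t", session, pane_cmd])
        # Re-tile after each split so the next split still has room.
        cmds.append(["tmux", "select-layout", "-t", session, "tiled"])
    return cmds


if __name__ == "__main__":
    for cmd in tmux_layout_commands("agents", 4):
        print(shlex.join(cmd))
```

Because the session is created detached, it keeps running when the terminal closes, and a later "tmux attach -t agents" from any connection returns to the same layout.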

Agent Harnesses and Parallel Workflows

To remain vendor-agnostic, Chen uses two agent harnesses: Claude Code for Anthropic models and an OpenCode shim for other models, as detailed in his ByteByteGo article (ByteByteGo post). His workflow typically involves five to ten concurrent tasks, each isolated in its own Git worktree to prevent branch collisions. A suite of custom tools provides validation and orchestration:

- Lavish Editor: Renders planning artifacts as interactive HTML, allowing agents to iterate on tasks.
- treehouse: Manages the creation and destruction of worktrees for each agent session.
- no-mistakes: An automated validator that runs end-to-end tests in a clean environment and opens PRs only when all checks pass. This tool serves as an automated code reviewer to help maintain quality.
- gnhf: An orchestrator that breaks down large tasks and schedules long-running jobs overnight.
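The one-worktree-per-task isolation can be sketched with standard git commands. This is not treehouse's implementation, which the article does not show; the directory layout, the "agent/" branch prefix, and the function name are assumptions for illustration. The sketch only builds the commands and leaves running them to the caller.

```python
from pathlib import Path


def worktree_plan(repo_root: str, task_slugs: list[str], base: str = "main") -> list[dict]:
    """Plan one Git worktree and one branch per task.

    The sibling "<repo>-wt" directory and the "agent/<slug>" branch naming
    are hypothetical conventions chosen for this example.
    """
    root = Path(repo_root)
    seen = set()
    plan = []
    for slug in task_slugs:
        if slug in seen:
            # Two tasks sharing a slug would collide on branch and path.
            raise ValueError(f"duplicate task slug: {slug}")
        seen.add(slug)
        path = root.parent / f"{root.name}-wt" / slug
        branch = f"agent/{slug}"
        plan.append({
            "path": str(path),
            "branch": branch,
            # git worktree add -b <branch> <path> <base>
            "create": ["git", "-C", str(root), "worktree", "add", "-b", branch, str(path), base],
            # git worktree remove <path>
            "destroy": ["git", "-C", str(root), "worktree", "remove", str(path)],
        })
    return plan


if __name__ == "__main__":
    for entry in worktree_plan("/src/app", ["fix-login", "add-retry"]):
        print(" ".join(entry["create"]))
```

Each task gets its own directory and branch, so parallel agents never share a working tree or check out the same branch.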

Performance and Enterprise Adoption

In a conversation on Creator Economy (20-40 PRs interview), Chen stated he ships between 20 and 40 pull requests daily with 20-30 agents running concurrently. He emphasizes that manual code review is minimal due to the robust automated validation process. This terminal-first philosophy has proven scalable at Atlassian, where Chen applies these principles to the Rovo Dev agent, an AI tool for developers.

The Advantage of a Terminal-First Approach

The primary benefit of a keyboard-driven stack is its efficiency. The terminal tools themselves typically consume only 100-300 MB of RAM, a fraction of the 1 GB+ used by many graphical IDEs with multiple extensions. That low footprint matters when dozens of agents run in parallel, because the editing environment does not compete with them for memory. Furthermore, because WezTerm and Neovim are open-source and configured through plain text, the workflow avoids vendor lock-in and stays fully customizable.

How does Kun Chen's terminal-first setup let him ship 40 PRs per day?

He runs 20-30 AI agents in parallel inside a terminal-first stack of WezTerm + tmux + Neovim.
- tmux splits his screen into lightweight panes, each running an independent agent.
- Neovim edits code in place while agents stream patches directly to the buffer.
- Tailscale + mosh let him re-attach to the same tmux session from any device, keeping all agents alive.
By keeping every tool in the terminal, the editing stack itself reportedly stays below 300 MB of memory, far less than the 1 GB+ common in full IDEs, which helps his laptop stay responsive even with dozens of agents running overnight.

What custom tools did he build to orchestrate and validate the agents?

- Lavish Editor: renders agent-generated plans as interactive HTML artifacts that both human and agent can iterate on inside a browser tab.
- gnhf: a long-running orchestrator that breaks a large task into fresh-context steps and produces a branch with clean commits and timestamped notes.
- no-mistakes: an autonomous pre-merge reviewer that spins up a clean worktree, runs the full CI pipeline, and surfaces bugs. This tool helps identify and flag issues before PRs are opened.
- treehouse: a worktree manager that lets five to ten tasks run in parallel without merge conflicts, one Git worktree per agent.
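The "open a PR only when every check passes" behavior attributed to no-mistakes can be approximated as below. This is a minimal sketch, not the tool's code: the check names and commands are placeholders, and an empty temporary directory stands in for the pristine worktree.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: dict[str, list[str]], workdir: str) -> list[CheckResult]:
    """Run each check command inside `workdir` and record pass or fail."""
    results = []
    for name, cmd in checks.items():
        proc = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
        results.append(CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr))
    return results


def should_open_pr(results: list[CheckResult]) -> bool:
    """Gate: at least one check ran and none of them failed."""
    return bool(results) and all(r.passed for r in results)


if __name__ == "__main__":
    # Hypothetical checks; a real pipeline would run the project's test suite.
    checks = {
        "unit": [sys.executable, "-c", "assert 1 + 1 == 2"],
        "e2e": [sys.executable, "-c", "raise SystemExit(1)"],
    }
    # Stand-in for a clean worktree: an empty temporary directory.
    with tempfile.TemporaryDirectory() as clean_dir:
        results = run_checks(checks, clean_dir)
    for r in results:
        print(r.name, "passed" if r.passed else "failed")
    print("open PR:", should_open_pr(results))
```

Treating an empty result list as a failure is a deliberate choice in this sketch: a run that executed no checks should not count as validated.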

How does he avoid vendor lock-in while still leveraging advanced AI models?

He stays vendor-agnostic by splitting the work across two harnesses:
- Claude Code drives Anthropic models.
- OpenCode is the harness for OpenAI, Gemini, or local OSS models.
Both connectors expose the same Unix-socket protocol, so swapping providers is a one-line change in the shell. He keeps the orchestration layer (gnhf, no-mistakes, Lavish) completely open-source, eliminating any dependence on a single SaaS.
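One way to picture a provider swap as a one-line change is a shared interface selected by an environment variable. The sketch below is an assumption-laden illustration: the AGENT_HARNESS variable, the stub classes, and their run method are hypothetical, and nothing here calls the real Claude Code or OpenCode tools or models the socket protocol mentioned above.

```python
import os
from typing import Protocol


class Harness(Protocol):
    """Common interface both harnesses are assumed to satisfy."""
    name: str

    def run(self, prompt: str) -> str: ...


class ClaudeCodeStub:
    # Stand-in only; does not invoke the real Claude Code.
    name = "claude-code"

    def run(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class OpenCodeStub:
    # Stand-in only; does not invoke the real OpenCode.
    name = "opencode"

    def run(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


HARNESSES = {"claude-code": ClaudeCodeStub, "opencode": OpenCodeStub}


def get_harness(env=None) -> Harness:
    """Pick a harness from the hypothetical AGENT_HARNESS variable."""
    env = os.environ if env is None else env
    key = env.get("AGENT_HARNESS", "claude-code")
    if key not in HARNESSES:
        raise ValueError(f"unknown harness: {key}")
    return HARNESSES[key]()


if __name__ == "__main__":
    print(get_harness().run("add exponential backoff to the retry loop"))
```

With this shape, switching providers would be a matter of setting AGENT_HARNESS=opencode in the shell, while the orchestration code that calls run stays unchanged.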

How does voice input fit into a keyboard-driven workflow?

Kun maps OpenSuperWhisper (running the Whisper large-v3-turbo model) to a global hotkey.
A single spoken sentence like "add exponential backoff to the retry loop" is transcribed in under 400 ms, dropped into Lavish's prompt pane, and dispatched to the relevant agent.
Because the mic is open only while the hotkey is pressed, the rest of the session stays 100% keyboard-driven, preserving tmux navigation speed.

What validation practices prevent "agent drift" or low-quality PRs?

- Delegate outcomes, not tasks: every job ticket starts with a test that must pass; the agent must prove it passes before proposing code.
- Clean context: each validation run spins up a pristine worktree to rule out leftover artifacts.
- Escalation gate: if the agent encounters anything that smells like a product decision (not just a code fix), it pauses and pings Kun via macOS notification.
- E2E evidence required: every PR body includes a link to a screen-capture GIF showing the new behavior running end to end.
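The four practices above can be read as a single decision rule applied to each agent submission. The sketch below encodes that reading; the AgentSubmission fields, the Verdict names, and the ordering (escalation first) are assumptions for illustration rather than a description of Chen's tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    OPEN_PR = "open_pr"
    ESCALATE = "escalate"
    REJECT = "reject"


@dataclass
class AgentSubmission:
    # All field names are hypothetical and chosen for this example.
    acceptance_test_passed: bool
    ran_in_clean_worktree: bool
    evidence_url: Optional[str]
    needs_product_decision: bool


def gate(sub: AgentSubmission) -> tuple[Verdict, list[str]]:
    """Apply the four practices as one gate and explain the outcome."""
    # Product decisions go to the human before anything else is judged.
    if sub.needs_product_decision:
        return Verdict.ESCALATE, ["product decision needed; pause and notify the human"]
    problems = []
    if not sub.acceptance_test_passed:
        problems.append("acceptance test did not pass")
    if not sub.ran_in_clean_worktree:
        problems.append("validation did not run in a clean worktree")
    if not sub.evidence_url:
        problems.append("missing end-to-end evidence link")
    if problems:
        return Verdict.REJECT, problems
    return Verdict.OPEN_PR, []


if __name__ == "__main__":
    sub = AgentSubmission(True, True, None, False)
    verdict, reasons = gate(sub)
    print(verdict.value, reasons)
```

Returning the reasons alongside the verdict keeps a rejected submission actionable: the agent, or the human, can see which practice was not met.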