The One-Agent Problem
"Context bloat is the 'forget-me-not' of the LLM world. Treat your agent like a generalist, and by line 500, they'll be confidently hallucinating your own API."
I used to treat AI agents like a single omniscient developer. One conversation, one context, one agent doing everything from architecture decisions to debugging typos. It worked, mostly. But the context would bloat, the focus would drift, and by the time we got to the third subtask, the agent had forgotten what the first one was about.
Sometimes it is the model. But more often, it's the impossible expectation of asking a single brain to be its own architect, coder, and critic all at once without losing the thread.
What Is Agent Orchestration?
Agent orchestration is the practice of coordinating multiple AI agents to work together on complex tasks. Instead of one agent doing everything, you have specialized agents handling different parts of the work, with an orchestrator coordinating the flow.
Think of it like a development team. You wouldn't ask your frontend specialist to architect the database schema, or your DevOps engineer to write React components. You'd match the task to the expertise.
The Orchestration Patterns
In Building Effective Agents, Anthropic's engineering team provides a brilliant taxonomy for this. They distinguish between Workflows (predefined, predictable paths) and Agents (dynamic decision-makers), while emphasizing the "Simplest is Best" principle: only add orchestration complexity when single-prompt performance benchmarks start to plateau.
Here are the core patterns they identified:
Prompt Chaining
Break a task into sequential steps, where each LLM call processes the output of the previous one.
Good for: Tasks that cleanly decompose into fixed subtasks. Write an outline, validate it, then write the document.
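A minimal sketch of the chain, with a stubbed `callModel` standing in for any real LLM API (the gate check is illustrative):

```typescript
// Prompt chaining sketch: each step consumes the previous step's output.
// `callModel` is a stub, not a real LLM API.
async function callModel(prompt: string): Promise<string> {
  return `[response to: ${prompt}]`; // stub
}

async function writeDocument(topic: string): Promise<string> {
  const outline = await callModel(`Write an outline for: ${topic}`);
  const check = await callModel(`Is anything missing from this outline? ${outline}`);
  // Gate: bail out early if validation fails (the stub always "passes").
  if (check.includes("missing: yes")) throw new Error("outline rejected");
  return callModel(`Write the document following this outline: ${outline}`);
}
```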
Routing
Classify the input and direct it to a specialized handler.
Good for: Different input types that need different expertise. Route easy queries to smaller models, hard ones to larger models.
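The pattern in miniature: a cheap classifier up front, then a dispatch table of specialized handlers (all stubs here, but each could be its own model or system prompt):

```typescript
// Routing sketch: classify first, then dispatch to a specialized handler.
type Route = "billing" | "technical" | "general";

function classify(query: string): Route {
  if (/invoice|refund|charge/i.test(query)) return "billing";
  if (/error|crash|bug/i.test(query)) return "technical";
  return "general";
}

const handlers: Record<Route, (q: string) => string> = {
  billing: (q) => `billing-specialist: ${q}`,
  technical: (q) => `debug-specialist: ${q}`,
  general: (q) => `small-model: ${q}`, // cheap model for easy queries
};

function route(query: string): string {
  return handlers[classify(query)](query);
}
```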
Parallelization
Run multiple agents simultaneously and aggregate results.
Good for: Independent subtasks or when you want multiple perspectives (voting/review).
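The shape of it, with stubbed agents: independent analyses fan out concurrently, then one aggregation step merges the reports.

```typescript
// Parallelization sketch: independent analyses run concurrently,
// then an aggregation step merges the structured reports.
interface Report { dimension: string; finding: string }

async function analyze(dimension: string): Promise<Report> {
  return { dimension, finding: `findings for ${dimension}` }; // stub agent
}

async function measureAll(dimensions: string[]): Promise<string> {
  const reports = await Promise.all(dimensions.map(analyze));
  // Aggregate: in a real system another LLM call could synthesize these.
  return reports.map((r) => `${r.dimension}: ${r.finding}`).join("\n");
}
```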
Orchestrator-Workers
A central agent dynamically breaks down tasks, delegates to workers, and synthesizes results.
Good for: Complex tasks where you can't predict the subtasks ahead of time. Like coding agents that need to modify an unknown number of files.
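A stubbed sketch of the same loop. The point is that the subtask list is decided at runtime by the planner, not hard-coded in advance:

```typescript
// Orchestrator-workers sketch: plan dynamically, delegate, synthesize.
async function plan(task: string): Promise<string[]> {
  // Stub planner; a real orchestrator would ask an LLM to decompose `task`,
  // so the number of subtasks is unknown until runtime.
  return [`analyze ${task}`, `implement ${task}`, `test ${task}`];
}

async function worker(subtask: string): Promise<string> {
  return `done: ${subtask}`; // stub specialist with its own fresh context
}

async function orchestrate(task: string): Promise<string> {
  const subtasks = await plan(task);
  const results = await Promise.all(subtasks.map(worker));
  return `summary of ${results.length} subtasks:\n${results.join("\n")}`;
}
```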
Evaluator-Optimizer
One agent generates, another evaluates in a loop.
Good for: Tasks with clear evaluation criteria where iteration improves quality. Translation, complex search, code review.
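A sketch of the loop with both roles stubbed. In practice the evaluator might be a test suite, a rubric, or a second model's judgment:

```typescript
// Evaluator-optimizer sketch: generate, critique, and retry until the
// evaluator is satisfied or the attempt budget runs out.
function generate(task: string, feedback: string, attempt: number): string {
  return `draft ${attempt} of ${task}${feedback ? ` (addressed: ${feedback})` : ""}`;
}

function evaluate(draft: string): { ok: boolean; feedback: string } {
  // Stub critic: accepts the third attempt.
  const ok = draft.startsWith("draft 3");
  return { ok, feedback: ok ? "" : "tighten the wording" };
}

function refine(task: string, maxAttempts = 5): string {
  let feedback = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const draft = generate(task, feedback, attempt);
    const verdict = evaluate(draft);
    if (verdict.ok) return draft;
    feedback = verdict.feedback; // feed the critique back into generation
  }
  throw new Error("no draft passed evaluation");
}
```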
Handoffs
OpenAI's Swarm framework emphasizes this: agents don't just complete tasks; they explicitly hand off control to other specialized agents when needed.
Good for: Dynamic workflows where the next step's expert is only known after the current step is done.
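A sketch in the spirit of the handoff idea (not Swarm's actual API): each agent either finishes or names the next specialist, and a small loop just follows the chain.

```typescript
// Handoff sketch: control transfers explicitly between named agents.
type Outcome = { result: string } | { handoffTo: string };
type Agent = (task: string) => Outcome;

const agents: Record<string, Agent> = {
  triage: () => ({ handoffTo: "coder" }), // triage decides who goes next
  coder: (t) =>
    t.includes("review") ? { handoffTo: "reviewer" } : { result: `coded: ${t}` },
  reviewer: (t) => ({ result: `reviewed: ${t}` }),
};

function run(task: string, start = "triage"): string {
  let current = start;
  for (let hop = 0; hop < 10; hop++) { // hop budget guards against cycles
    const outcome = agents[current](task);
    if ("result" in outcome) return outcome.result;
    current = outcome.handoffTo;
  }
  throw new Error("too many handoffs");
}
```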
How Kilo Code Does It
I've been using Kilo Code's Orchestrator Mode, and it implements the orchestrator-workers pattern with some interesting twists.
When you give it a complex task, it can:
- Analyze and decompose the request into subtasks
- Delegate to specialized modes (code, architect, debug, ask)
- Run subtasks in isolation with their own context
- Resume with summaries so the parent doesn't get cluttered
The key insight: each subtask operates in complete isolation. It doesn't inherit the parent's bloated conversation history. Information flows down through explicit instructions and flows up through concise summaries.
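That contract can be sketched as a pair of types (this is an illustration of the idea, not Kilo's actual API):

```typescript
// Down/up contract sketch: instructions flow down, summaries flow up.
// The subtask never receives the parent's conversation history.
interface Instruction { goal: string; constraints: string[] }
interface Summary { goal: string; outcome: string; filesTouched: string[] }

async function runSubtask(instruction: Instruction): Promise<Summary> {
  // Everything the subtask knows is in `instruction`; a real implementation
  // would spin up a fresh agent session here.
  return {
    goal: instruction.goal,
    outcome: `completed: ${instruction.goal}`,
    filesTouched: [], // stub
  };
}
```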
There's also Parallel Mode, which runs agents in isolated Git worktrees. Your main branch stays clean while the agent experiments in .kilocode/worktrees/feature-branch-abc123/. When it's done, you review and merge.
The Trade-offs
Orchestration isn't always the answer. Here's when it helps and when it hurts:
Pros
- Specialization: Each agent focuses on what it's good at
- Context isolation: No conversation bloat, no drift
- Parallelism: Independent tasks run simultaneously
- Resilience: One agent's failure doesn't corrupt the whole context
- Auditability: You can see what each agent contributed
Cons
- Latency: More agents means more LLM calls
- Cost: Each subtask consumes tokens
- Complexity: More moving parts to debug
- Context transfer: Information must be explicitly passed between agents
- Coordination overhead: The orchestrator itself can become a bottleneck
When to Orchestrate
I've found orchestration valuable when:
- The task spans multiple domains (architecture + implementation + testing)
- Subtasks are genuinely independent (can run in parallel or any order)
- Context bloat is killing quality (the agent forgets early decisions)
- You want specialized behavior (strict code review vs exploratory coding)
I avoid orchestration when:
- A single prompt with good context is enough
- The task is simple enough to hold in one conversation
- Latency matters more than thoroughness
- The subtasks are tightly coupled (constant back-and-forth needed)
What About Long-Horizon Agents?
A fair question: if "Long-Horizon Agents" (LHAs) are becoming more capable, do we still need orchestration?
Think of LHAs as virtuoso multi-instrumentalists: they have the reasoning capacity to play a long, complex solo without losing the beat. OpenAI's o1, for example, uses internalized reasoning (RL-trained chain-of-thought) to solve multi-step problems within a single turn.
But even a virtuoso needs a backing band for a massive symphony. Orchestration is a strategy; long-horizon reasoning is a capability. Most modern LHAs (like Devin or Kilo's own Orchestrator) still use orchestration under the hood to manage context and prevent the drift that inevitably comes with long-running sessions.
Towards Self-Driving Codebases
We're moving from "AI as a tool" to something more akin to Cursor's vision of Self-Driving Codebases. This shift relies on three key principles:
- Planner-Executor-Judge: A triad of roles ensuring every change is proposed with intent, executed with precision, and verified for correctness.
- Anti-fragility: Orchestration makes the process resilient. If one agent fails or loops, the orchestrator can break the loop or retry with a different specialist. The symphony plays on.
- Quantity as Signal: Setting the ambition. Instead of asking for "a fix," we ask for an analysis that might generate 20-100 tasks. That scale is only manageable through orchestration.
Git Worktrees: The Secret Sauce
If isolation is the heart of orchestration, Git Worktrees are the secret sauce.
Rather than the "stash dance" (stashing work, switching branches, forgetting where you were), worktrees allow each agent to live in its own literal folder on your machine. This pattern, seen in Kilo Code and cognitive architectures like Devin's, prevents "context hijacking."
While an agent works in .kilocode/worktrees/refactor-wave-1/, your main working directory stays exactly as you left it. I'm looking forward to trying this out more—it feels like the final hurdle to truly parallel AI development.
A Real Example: The Performance Pass
I wanted to see what orchestration looked like on a real task, not a toy problem. So I pointed Kilo at this very codebase with a simple request: run a performance optimization pass.
The orchestrator proposed a three-phase approach. Measure first, then optimize, then verify. Each phase would run multiple agents in parallel.
Phase 1: Measurement
Six agents launched simultaneously, each analyzing a different dimension:
The bundle agent ran npm run build and parsed the output. The API agent profiled every route in src/app/api/. The component agent hunted for 'use client' directives that shouldn't be there. And so on.
Each agent worked in isolation. They didn't see each other's progress or pollute each other's context. When they finished, each returned a structured report.
What They Found
The reports converged on a clear picture:
The blog post page was the smoking gun. Every MDX post was loading DitherShader, DryKeysQuest, and other heavy components whether it needed them or not. Sixty-five to ninety kilobytes of wasted JavaScript per page load.
Phase 2: Optimization
Six new agents launched, each tasked with fixing one area:
The bundle agent created dynamic import wrappers for CodeBlock and the heavy MDX components. The API agent added revalidate: 3600 to the GitHub route. The font agent removed Geist and Geist_Mono from layout.tsx entirely.
Each agent made changes independently. They created new files, modified existing ones, and followed the project's code style without being told. When one agent needed to know how another's work affected its task, it read the files directly.
Phase 3: Verification
One final command: npm run build.
The blog post page dropped from 421 KB to 160 KB. A 62% reduction. The heavy components now load on demand, only when a post actually uses them.
Lint passed. No errors. The orchestrator synthesized a summary of what changed and what still needed manual attention (the 8.6 MB of GIFs, which require video conversion, not code changes).
What This Taught Me
The whole thing took about ten minutes of wall-clock time. A single agent could have done the same analysis, but it would have been one long, meandering conversation. By the time it got to the fifth optimization, it would have forgotten what the first one found.
Instead, each agent had fresh context. The bundle agent didn't need to know about fonts. The font agent didn't need to know about APIs. They focused, delivered, and got out of the way.
Was it worth the overhead? For this task, absolutely. The parallelism meant six measurements happened in the time of one. The isolation meant no context pollution. The structure meant I could see exactly what each agent contributed.
For a simpler task, adding a single feature, I'd stick with one agent. But for a cross-cutting concern like performance, where multiple systems need simultaneous attention, orchestration is the right tool.
Another Example: The SOLID Refactoring Pass
A few weeks after the performance pass, I wanted to tackle something more ambitious: refactoring the entire codebase to follow SOLID principles. This wasn't about fixing one thing—it was about restructuring dozens of files across multiple domains.
The Scope
I asked Kilo to analyze the codebase and create a plan for SOLID refactoring. The analysis identified:
- 15 Single Responsibility violations (files doing too many things)
- 8 Open/Closed violations (hard-coded algorithms, switch statements)
- 4 Interface Segregation violations (bloated interfaces)
- 12 Dependency Inversion violations (tight coupling to external services)
The biggest offenders:
- telegram-bot.ts (263 lines): bot init + 7 command handlers + database operations + formatting
- bloq.ts (350 lines): filesystem I/O + parsing + filtering + statistics
- dither-shader.tsx (484 lines): 4 dithering algorithms + 4 color modes + image loading + canvas rendering
The Orchestration Plan
The orchestrator proposed a seven-wave approach, each wave building on the previous:
Wave 1: Foundation (5 agents in parallel)
Five agents launched simultaneously to create the foundation.
Each agent worked in isolation. The API utilities agent didn't know about the location service. The fingerprint agent didn't know about GitHub. They focused, delivered, and returned structured summaries.
Wave 2: Lib Reorganization (5 agents in parallel)
With the foundation in place, five agents tackled the large lib files.
The key insight: each agent read the original file, understood its structure, and created a modular replacement. The original files became thin re-export shells for backward compatibility.
Wave 3: API Routes (4 agents in parallel)
Four agents refactored API routes to use the new shared utilities.
Wave 4: Components (2 agents in parallel)
Two agents tackled the component violations.
The DitherShader refactor is worth examining. Before, a single 484-line component bundled four dithering algorithms, four color modes, image loading, and canvas rendering. After, each algorithm lives behind a shared strategy interface. Adding a new dither mode now means creating a new class, not modifying existing code.
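The shape of that change can be sketched with a strategy registry. The names and algorithms below are illustrative, not the actual codebase's:

```typescript
// Open/Closed sketch: each dithering algorithm is a self-contained strategy.
// Adding a mode means adding a class and registering it; nothing else changes.
interface DitherStrategy {
  readonly name: string;
  dither(pixels: number[], threshold: number): number[];
}

class ThresholdDither implements DitherStrategy {
  readonly name = "threshold";
  dither(pixels: number[], threshold: number): number[] {
    return pixels.map((p) => (p >= threshold ? 255 : 0));
  }
}

class RandomDither implements DitherStrategy {
  readonly name = "random";
  dither(pixels: number[], threshold: number): number[] {
    // Jitter the threshold per pixel to break up banding.
    return pixels.map((p) => (p >= threshold + (Math.random() - 0.5) * 64 ? 255 : 0));
  }
}

const strategies = new Map<string, DitherStrategy>();
for (const s of [new ThresholdDither(), new RandomDither()]) {
  strategies.set(s.name, s);
}

function applyDither(mode: string, pixels: number[], threshold = 128): number[] {
  const strategy = strategies.get(mode);
  if (!strategy) throw new Error(`unknown dither mode: ${mode}`);
  return strategy.dither(pixels, threshold);
}
```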
Wave 5: Hooks (2 agents in parallel)
Wave 6: Data Files (2 agents in parallel)
Wave 7: Component Folder Structure
Final cleanup to organize components into a consistent folder structure with index.ts exports.
The Results
The Messy Middle
It wasn't a smooth ride. The orchestrator broke twice—the build check failed, and it spiraled into "stupid loops" that I had to manually break. I actually got frustrated and took a day-long break, worried that it might somehow commit broken code directly to main.
When I returned, it was still broken with an obscure import reference that the build error didn't clearly explain. But then, I tried something simple: I fed that raw, messy error message directly back into the orchestrator's context. Magically, it clicked. The orchestrator identified the drift, fixed the reference, and the entire refactor finally succeeded. I couldn't be happier.
Build passed. Lint passed. All original import paths still work via re-exports.
What This Orchestration Achieved
- Single Responsibility: Each file now has one reason to change
- Open/Closed: Strategy pattern for algorithms, easy to extend
- Interface Segregation: Small, focused interfaces
- Dependency Inversion: Services abstract external dependencies
The parallelism was key. Wave 1's five agents created the foundation in parallel. Wave 2's five agents refactored lib files in parallel. Each wave built on the previous, but within each wave, agents worked independently.
Could a single agent have done this? Eventually. But it would have been one 8-hour conversation with constant context switching. Instead, 25+ focused agent sessions, each with fresh context, each delivering a specific piece.
What I Took Away
- Start simple. One agent, one conversation. Add orchestration when complexity demands it.
- Context is the enemy. The more you stuff into a single conversation, the more the agent forgets.
- Explicit is better. Orchestration forces you to define what each piece does. That clarity is valuable even if you don't use multiple agents.
- The orchestrator matters. A bad orchestrator creates more chaos than it solves. Design the coordination carefully.
- Summaries are an art. What gets passed between agents determines success. Too little context, the next agent is lost. Too much, you've recreated the bloat problem.
- Parallelism is real. Six agents measuring simultaneously beats one agent measuring six things sequentially. The time savings compound.
- Isolation enables focus. The font agent never saw the bundle analysis. It didn't need to. Fresh context, clean decisions.
- Verification closes the loop. The build output doesn't lie. Numbers before, numbers after. The orchestrator's job isn't done until the changes are proven.
Further Reading
- Building Effective Agents - Anthropic's engineering team on patterns they've seen work
- Self-Driving Codebases - Cursor's research into autonomous agent loops
- Swarm - OpenAI's lightweight multi-agent orchestration framework
- Learning to Reason with LLMs - OpenAI's research into internal chain-of-thought (o1)
- Kilo Code Orchestrator Mode - Documentation for Kilo's implementation
In the spirit of the topic, this entire article was drafted and refined by the very orchestrator it describes. If you've enjoyed the flow, credit the patterns, not just the model.