The One-Agent Problem
"Context bloat is the 'forget-me-not' of the LLM world. Treat your agent like a generalist, and by line 500, they'll be confidently hallucinating your own API."
I used to treat AI agents like a single omniscient developer. One conversation, one context, one agent doing everything from architecture decisions to debugging typos. It worked, mostly. But the context would bloat, the focus would drift, and by the time we got to the third subtask, the agent had forgotten what the first one was about.
Sometimes it is the model. But more often, it's the impossible expectation of asking a single brain to be its own architect, coder, and critic all at once without losing the thread.
What Is Agent Orchestration?
Agent orchestration is the practice of coordinating multiple AI agents to work together on complex tasks. Instead of one agent doing everything, you have specialized agents handling different parts of the work, with an orchestrator coordinating the flow.
Think of it like a development team. You wouldn't ask your frontend specialist to architect the database schema, or your DevOps engineer to write React components. You'd match the task to the expertise.
The Orchestration Patterns
In Building Effective Agents, Anthropic's engineering team provides a brilliant taxonomy for this. They distinguish between Workflows (predefined, predictable paths) and Agents (dynamic decision-makers), while emphasizing the "Simplest is Best" principle: only add orchestration complexity when single-prompt performance benchmarks start to plateau.
Here are the core patterns they identified:
Prompt Chaining
Break a task into sequential steps, where each LLM call processes the output of the previous one.
Good for: Tasks that cleanly decompose into fixed subtasks. Write an outline, validate it, then write the document.
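A minimal sketch of the chain, with a stubbed `callModel` standing in for any real LLM API (the gate check is illustrative):

```typescript
// Prompt chaining sketch: each step consumes the previous step's output.
// `callModel` is a stub, not a real LLM API.
async function callModel(prompt: string): Promise<string> {
  return `[response to: ${prompt}]`; // stub
}

async function writeDocument(topic: string): Promise<string> {
  const outline = await callModel(`Write an outline for: ${topic}`);
  const check = await callModel(`Is anything missing from this outline? ${outline}`);
  // Gate: bail out early if validation fails (the stub always "passes").
  if (check.includes("missing: yes")) throw new Error("outline rejected");
  return callModel(`Write the document following this outline: ${outline}`);
}
```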
Routing
Classify the input and direct it to a specialized handler.
Good for: Different input types that need different expertise. Route easy queries to smaller models, hard ones to larger models.
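The pattern in miniature: a cheap classifier up front, then a dispatch table of specialized handlers (all stubs here, but each could be its own model or system prompt):

```typescript
// Routing sketch: classify first, then dispatch to a specialized handler.
type Route = "billing" | "technical" | "general";

function classify(query: string): Route {
  if (/invoice|refund|charge/i.test(query)) return "billing";
  if (/error|crash|bug/i.test(query)) return "technical";
  return "general";
}

const handlers: Record<Route, (q: string) => string> = {
  billing: (q) => `billing-specialist: ${q}`,
  technical: (q) => `debug-specialist: ${q}`,
  general: (q) => `small-model: ${q}`, // cheap model for easy queries
};

function route(query: string): string {
  return handlers[classify(query)](query);
}
```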
Parallelization
Run multiple agents simultaneously and aggregate results.
Good for: Independent subtasks or when you want multiple perspectives (voting/review).
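The shape of it, with stubbed agents: independent analyses fan out concurrently, then one aggregation step merges the reports.

```typescript
// Parallelization sketch: independent analyses run concurrently,
// then an aggregation step merges the structured reports.
interface Report { dimension: string; finding: string }

async function analyze(dimension: string): Promise<Report> {
  return { dimension, finding: `findings for ${dimension}` }; // stub agent
}

async function measureAll(dimensions: string[]): Promise<string> {
  const reports = await Promise.all(dimensions.map(analyze));
  // Aggregate: in a real system another LLM call could synthesize these.
  return reports.map((r) => `${r.dimension}: ${r.finding}`).join("\n");
}
```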
Orchestrator-Workers
A central agent dynamically breaks down tasks, delegates to workers, and synthesizes results.
Good for: Complex tasks where you can't predict the subtasks ahead of time. Like coding agents that need to modify an unknown number of files.
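A stubbed sketch of the same loop. The point is that the subtask list is decided at runtime by the planner, not hard-coded in advance:

```typescript
// Orchestrator-workers sketch: plan dynamically, delegate, synthesize.
async function plan(task: string): Promise<string[]> {
  // Stub planner; a real orchestrator would ask an LLM to decompose `task`,
  // so the number of subtasks is unknown until runtime.
  return [`analyze ${task}`, `implement ${task}`, `test ${task}`];
}

async function worker(subtask: string): Promise<string> {
  return `done: ${subtask}`; // stub specialist with its own fresh context
}

async function orchestrate(task: string): Promise<string> {
  const subtasks = await plan(task);
  const results = await Promise.all(subtasks.map(worker));
  return `summary of ${results.length} subtasks:\n${results.join("\n")}`;
}
```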
Evaluator-Optimizer
One agent generates, another evaluates in a loop.
Good for: Tasks with clear evaluation criteria where iteration improves quality. Translation, complex search, code review.
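A sketch of the loop with both roles stubbed. In practice the evaluator might be a test suite, a rubric, or a second model's judgment:

```typescript
// Evaluator-optimizer sketch: generate, critique, and retry until the
// evaluator is satisfied or the attempt budget runs out.
function generate(task: string, feedback: string, attempt: number): string {
  return `draft ${attempt} of ${task}${feedback ? ` (addressed: ${feedback})` : ""}`;
}

function evaluate(draft: string): { ok: boolean; feedback: string } {
  // Stub critic: accepts the third attempt.
  const ok = draft.startsWith("draft 3");
  return { ok, feedback: ok ? "" : "tighten the wording" };
}

function refine(task: string, maxAttempts = 5): string {
  let feedback = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const draft = generate(task, feedback, attempt);
    const verdict = evaluate(draft);
    if (verdict.ok) return draft;
    feedback = verdict.feedback; // feed the critique back into generation
  }
  throw new Error("no draft passed evaluation");
}
```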
Handoffs
OpenAI's Swarm framework emphasizes this: agents don't just complete tasks; they explicitly hand off control to other specialized agents when needed.
Good for: Dynamic workflows where the next step's expert is only known after the current step is done.
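A sketch in the spirit of the handoff idea (not Swarm's actual API): each agent either finishes or names the next specialist, and a small loop just follows the chain.

```typescript
// Handoff sketch: control transfers explicitly between named agents.
type Outcome = { result: string } | { handoffTo: string };
type Agent = (task: string) => Outcome;

const agents: Record<string, Agent> = {
  triage: () => ({ handoffTo: "coder" }), // triage decides who goes next
  coder: (t) =>
    t.includes("review") ? { handoffTo: "reviewer" } : { result: `coded: ${t}` },
  reviewer: (t) => ({ result: `reviewed: ${t}` }),
};

function run(task: string, start = "triage"): string {
  let current = start;
  for (let hop = 0; hop < 10; hop++) { // hop budget guards against cycles
    const outcome = agents[current](task);
    if ("result" in outcome) return outcome.result;
    current = outcome.handoffTo;
  }
  throw new Error("too many handoffs");
}
```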
How Kilo Code Does It
I've been using Kilo Code's Orchestrator Mode, and it implements the orchestrator-workers pattern with some interesting twists.
When you give it a complex task, it can:
- Analyze and decompose the request into subtasks
- Delegate to specialized modes (code, architect, debug, ask)
- Run subtasks in isolation with their own context
- Resume with summaries so the parent doesn't get cluttered
The key insight: each subtask operates in complete isolation. It doesn't inherit the parent's bloated conversation history. Information flows down through explicit instructions and flows up through concise summaries.
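That contract can be sketched as a pair of types (this is an illustration of the idea, not Kilo's actual API):

```typescript
// Down/up contract sketch: instructions flow down, summaries flow up.
// The subtask never receives the parent's conversation history.
interface Instruction { goal: string; constraints: string[] }
interface Summary { goal: string; outcome: string; filesTouched: string[] }

async function runSubtask(instruction: Instruction): Promise<Summary> {
  // Everything the subtask knows is in `instruction`; a real implementation
  // would spin up a fresh agent session here.
  return {
    goal: instruction.goal,
    outcome: `completed: ${instruction.goal}`,
    filesTouched: [], // stub
  };
}
```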
There's also Parallel Mode, which runs agents in isolated Git worktrees. Your main branch stays clean while the agent experiments in .kilocode/worktrees/feature-branch-abc123/. When it's done, you review and merge.
The Trade-offs
Orchestration isn't always the answer. Here's when it helps and when it hurts:
Pros
- Specialization: Each agent focuses on what it's good at
- Context isolation: No conversation bloat, no drift
- Parallelism: Independent tasks run simultaneously
- Resilience: One agent's failure doesn't corrupt the whole context
- Auditability: You can see what each agent contributed
Cons
- Latency: More agents means more LLM calls
- Cost: Each subtask consumes tokens
- Complexity: More moving parts to debug
- Context transfer: Information must be explicitly passed between agents
- Coordination overhead: The orchestrator itself can become a bottleneck
When to Orchestrate
I've found orchestration valuable when:
- The task spans multiple domains (architecture + implementation + testing)
- Subtasks are genuinely independent (can run in parallel or any order)
- Context bloat is killing quality (the agent forgets early decisions)
- You want specialized behavior (strict code review vs exploratory coding)
I avoid orchestration when:
- A single prompt with good context is enough
- The task is simple enough to hold in one conversation
- Latency matters more than thoroughness
- The subtasks are tightly coupled (constant back-and-forth needed)
What About Long-Horizon Agents?
A fair question: if "Long-Horizon Agents" (LHAs) are becoming more capable, do we still need orchestration?
Think of LHAs as virtuoso multi-instrumentalists: they have the reasoning capacity to play a long, complex solo without losing the beat. OpenAI's o1, for example, uses internalized reasoning (RL-trained chain-of-thought) to solve multi-step problems within a single turn.
But even a virtuoso needs a backing band for a massive symphony. Orchestration is a strategy; long-horizon reasoning is a capability. Most modern LHAs (like Devin or Kilo's own Orchestrator) still use orchestration under the hood to manage context and prevent the drift that inevitably comes with long-running sessions.
Towards Self-Driving Codebases
We're moving from "AI as a tool" to something more akin to Cursor's vision of Self-Driving Codebases. This shift relies on three key principles:
- Planner-Executor-Judge: A triad of roles ensuring every change is proposed with intent, executed with precision, and verified for correctness.
- Anti-fragility: Orchestration makes the process resilient. If one agent fails or loops, the orchestrator can break the loop or retry with a different specialist. The symphony plays on.
- Quantity as Signal: Setting the ambition. Instead of asking for "a fix," we ask for an analysis that might generate 20-100 tasks. That scale is only manageable through orchestration.
Git Worktrees: The Secret Sauce
If isolation is the heart of orchestration, Git Worktrees are the secret sauce.
Rather than the "stash dance" (stashing work, switching branches, forgetting where you were), worktrees allow each agent to live in its own literal folder on your machine. This pattern, seen in Kilo Code and cognitive architectures like Devin's, prevents "context hijacking."
While an agent works in .kilocode/worktrees/refactor-wave-1/, your main working directory stays exactly as you left it. I'm looking forward to trying this out more—it feels like the final hurdle to truly parallel AI development.
A Real Example: The Performance Pass
I wanted to see what orchestration looked like on a real task, not a toy problem. So I pointed Kilo at this very codebase with a simple request: run a performance optimization pass.
The orchestrator proposed a three-phase approach. Measure first, then optimize, then verify. Each phase would run multiple agents in parallel.
Phase 1: Measurement
Six agents launched simultaneously, each analyzing a different dimension:
The bundle agent ran npm run build and parsed the output. The API agent profiled every route in src/app/api/. The component agent hunted for 'use client' directives that shouldn't be there. And so on.
Each agent worked in isolation. They didn't see each other's progress or pollute each other's context. When they finished, each returned a structured report.
What They Found
The reports converged on a clear picture:
The blog post page was the smoking gun. Every MDX post was loading DitherShader, DryKeysQuest, and other heavy components whether it needed them or not. Sixty-five to ninety kilobytes of wasted JavaScript per page load.
Phase 2: Optimization
Six new agents launched, each tasked with fixing one area:
The bundle agent created dynamic import wrappers for CodeBlock and the heavy MDX components. The API agent added revalidate: 3600 to the GitHub route. The font agent removed Geist and Geist_Mono from layout.tsx entirely.
Each agent made changes independently. They created new files, modified existing ones, and followed the project's code style without being told. When one agent needed to know how another's work affected its task, it read the files directly.
Phase 3: Verification
One final command: npm run build.
The blog post page dropped from 421 KB to 160 KB. A 62% reduction. The heavy components now load on demand, only when a post actually uses them.
Lint passed. No errors. The orchestrator synthesized a summary of what changed and what still needed manual attention (the 8.6 MB of GIFs, which require video conversion, not code changes).
What This Taught Me
The whole thing took about ten minutes of wall-clock time. A single agent could have done the same analysis, but it would have been one long, meandering conversation. By the time it got to the fifth optimization, it would have forgotten what the first one found.
Instead, each agent had fresh context. The bundle agent didn't need to know about fonts. The font agent didn't need to know about APIs. They focused, delivered, and got out of the way.
Was it worth the overhead? For this task, absolutely. The parallelism meant six measurements happened in the time of one. The isolation meant no context pollution. The structure meant I could see exactly what each agent contributed.
For a simpler task, adding a single feature, I'd stick with one agent. But for a cross-cutting concern like performance, where multiple systems need simultaneous attention, orchestration is the right tool.
Another Example: The SOLID Refactoring Pass
A few weeks after the performance pass, I wanted to tackle something more ambitious: refactoring the entire codebase to follow SOLID principles. This wasn't about fixing one thing—it was about restructuring dozens of files across multiple domains.
The Scope
I asked Kilo to analyze the codebase and create a plan for SOLID refactoring. The analysis identified:
- 15 Single Responsibility violations (files doing too many things)
- 8 Open/Closed violations (hard-coded algorithms, switch statements)
- 4 Interface Segregation violations (bloated interfaces)
- 12 Dependency Inversion violations (tight coupling to external services)
The biggest offenders:
- telegram-bot.ts (263 lines): bot init + 7 command handlers + database operations + formatting
- bloq.ts (350 lines): filesystem I/O + parsing + filtering + statistics
- dither-shader.tsx (484 lines): 4 dithering algorithms + 4 color modes + image loading + canvas rendering
The Orchestration Plan
The orchestrator proposed a seven-wave approach, each wave building on the previous:
Wave 1: Foundation (5 agents in parallel)
Five agents launched simultaneously to create the foundation.
Each agent worked in isolation. The API utilities agent didn't know about the location service. The fingerprint agent didn't know about GitHub. They focused, delivered, and returned structured summaries.
Wave 2: Lib Reorganization (5 agents in parallel)
With the foundation in place, five agents tackled the large lib files.
The key insight: each agent read the original file, understood its structure, and created a modular replacement. The original files became thin re-export shells for backward compatibility.
Wave 3: API Routes (4 agents in parallel)
Four agents refactored API routes to use the new shared utilities.
Wave 4: Components (2 agents in parallel)
Two agents tackled the component violations.
The DitherShader refactor is worth examining. Before, a single 484-line component bundled four dithering algorithms, four color modes, image loading, and canvas rendering. After, each algorithm lives behind a shared strategy interface. Adding a new dither mode now means creating a new class, not modifying existing code.
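The shape of that change can be sketched with a strategy registry. The names and algorithms below are illustrative, not the actual codebase's:

```typescript
// Open/Closed sketch: each dithering algorithm is a self-contained strategy.
// Adding a mode means adding a class and registering it; nothing else changes.
interface DitherStrategy {
  readonly name: string;
  dither(pixels: number[], threshold: number): number[];
}

class ThresholdDither implements DitherStrategy {
  readonly name = "threshold";
  dither(pixels: number[], threshold: number): number[] {
    return pixels.map((p) => (p >= threshold ? 255 : 0));
  }
}

class RandomDither implements DitherStrategy {
  readonly name = "random";
  dither(pixels: number[], threshold: number): number[] {
    // Jitter the threshold per pixel to break up banding.
    return pixels.map((p) => (p >= threshold + (Math.random() - 0.5) * 64 ? 255 : 0));
  }
}

const strategies = new Map<string, DitherStrategy>();
for (const s of [new ThresholdDither(), new RandomDither()]) {
  strategies.set(s.name, s);
}

function applyDither(mode: string, pixels: number[], threshold = 128): number[] {
  const strategy = strategies.get(mode);
  if (!strategy) throw new Error(`unknown dither mode: ${mode}`);
  return strategy.dither(pixels, threshold);
}
```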
Wave 5: Hooks (2 agents in parallel)
Wave 6: Data Files (2 agents in parallel)
Wave 7: Component Folder Structure
Final cleanup to organize components into a consistent folder structure with index.ts exports.
The Results
The Messy Middle
It wasn't a smooth ride. The orchestrator broke twice—the build check failed, and it spiraled into "stupid loops" that I had to manually break. I actually got frustrated and took a day-long break, worried that it might somehow commit broken code directly to main.
When I returned, it was still broken with an obscure import reference that the build error didn't clearly explain. But then, I tried something simple: I fed that raw, messy error message directly back into the orchestrator's context. Magically, it clicked. The orchestrator identified the drift, fixed the reference, and the entire refactor finally succeeded. I couldn't be happier.
Build passed. Lint passed. All original import paths still work via re-exports.
What This Orchestration Achieved
- Single Responsibility: Each file now has one reason to change
- Open/Closed: Strategy pattern for algorithms, easy to extend
- Interface Segregation: Small, focused interfaces
- Dependency Inversion: Services abstract external dependencies
The parallelism was key. Wave 1's five agents created the foundation in parallel. Wave 2's five agents refactored lib files in parallel. Each wave built on the previous, but within each wave, agents worked independently.
Could a single agent have done this? Eventually. But it would have been one 8-hour conversation with constant context switching. Instead, 25+ focused agent sessions, each with fresh context, each delivering a specific piece.
What I Took Away
- Start simple. One agent, one conversation. Add orchestration when complexity demands it.
- Context is the enemy. The more you stuff into a single conversation, the more the agent forgets.
- Explicit is better. Orchestration forces you to define what each piece does. That clarity is valuable even if you don't use multiple agents.
- The orchestrator matters. A bad orchestrator creates more chaos than it solves. Design the coordination carefully.
- Summaries are an art. What gets passed between agents determines success. Too little context, the next agent is lost. Too much, you've recreated the bloat problem.
- Parallelism is real. Six agents measuring simultaneously beats one agent measuring six things sequentially. The time savings compound.
- Isolation enables focus. The font agent never saw the bundle analysis. It didn't need to. Fresh context, clean decisions.
- Verification closes the loop. The build output doesn't lie. Numbers before, numbers after. The orchestrator's job isn't done until the changes are proven.
Further Reading
- Building Effective Agents - Anthropic's engineering team on patterns they've seen work
- Self-Driving Codebases - Cursor's research into autonomous agent loops
- Swarm - OpenAI's lightweight multi-agent orchestration framework
- Learning to Reason with LLMs - OpenAI's research into internal chain-of-thought (o1)
- Kilo Code Orchestrator Mode - Documentation for Kilo's implementation
In the spirit of the topic, this entire article was drafted and refined by the very orchestrator it describes. If you've enjoyed the flow, credit the patterns, not just the model.