May 22, 2026 • 16 min read • Agentic Harness Engineering Series

Orchestration Patterns: Scaling to Multi-Agent Execution

Five patterns for coordinating multiple agents: DAG decomposition with cycle detection, Git worktree isolation for parallel state, MCP-based cross-instance routing, the slash-command Skill Registry, and a file-based Agent Channel for subtask progress messages.

The patterns in posts 4–7 govern a single agent running a single task. Most production use cases eventually outgrow this. A research task that covers five distinct subtopics can saturate a single context window; running the subtopics sequentially serializes latency that is inherently parallelizable; and specialized subtasks (security analysis, financial modeling, literature review) belong on instances tuned for those domains, not a general-purpose local harness.

Section D addresses this scaling problem with five patterns: the DAG Orchestrator (D1) decomposes tasks into a dependency graph and executes independent subtasks concurrently; the Worktree Context (D2) gives each concurrent subtask an isolated filesystem without container overhead; the MCP Dispatch Router (D3) routes subtasks to remote harness instances over HTTP; the Skill Registry (D4) extends the agent with callable slash-command handlers that can replace or augment the standard pipeline; and the Agent Channel (D5) lets running subtasks report progress, findings, and blockers to the orchestrator through a file-based message directory.

D1 — The DAG Orchestrator

Single-agent execution is a special case of task orchestration where the dependency graph has one node. The DAG Orchestrator generalizes this: harness/orchestrator.py calls the planner for a subtask decomposition, verifies the resulting graph is acyclic, executes independent subtasks concurrently, and assembles a cross-reference synthesis from the outputs.

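from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run(task: str, config: ModelConfig) -> str:
    plan = planner.plan(task, memory.get_context(task))
    subtasks = plan.subtasks  # list of {id, task, depends_on}

    # Check for cycles before launching any threads
    if not _check_dag_cycles(subtasks):
        raise OrchestratorError("cycle detected in subtask graph")

    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    submitted: set[str] = set()  # ids already handed to the pool
    pending: dict = {}           # in-flight future -> subtask

    with ThreadPoolExecutor(max_workers=SUBTASK_MAX_WORKERS) as pool:
        def submit_ready() -> None:
            # Submit each subtask whose prerequisites have all succeeded, once
            for s in _ready(subtasks, results):
                if s["id"] not in submitted:
                    submitted.add(s["id"])
                    pending[pool.submit(_run_subtask, s, config)] = s

        submit_ready()
        # as_completed() snapshots its input, so futures added mid-loop would
        # never be collected; wait() re-reads the pending set on every pass.
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                subtask = pending.pop(future)
                try:
                    results[subtask["id"]] = future.result()
                except Exception as e:
                    errors[subtask["id"]] = str(e)
            submit_ready()  # finished work may have unblocked new subtasks

    # Subtasks downstream of a failure never become ready; record the gap
    for s in subtasks:
        if s["id"] not in submitted:
            errors[s["id"]] = "skipped: prerequisite failed"

    return _assemble(task, results, errors, config)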

The cycle check runs before any threads are launched. Kahn's topological sort algorithm detects cycles in O(V + E), a negligible cost next to a single model call. Without the check, the subtasks inside a cycle would never become ready and the run could never complete. This is the right place to fail: loudly, immediately, before any compute is committed.
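A minimal sketch of such a check, assuming each subtask is a dict with the id and depends_on keys shown in the plan above. The function name check_dag_cycles, the requirement that ids are unique, and the choice to treat a dependency on an unknown id as invalid are assumptions for illustration; this is not the harness's own _check_dag_cycles.

```python
from collections import deque


def check_dag_cycles(subtasks: list[dict]) -> bool:
    """Return True when the subtask graph is acyclic (hypothetical helper)."""
    ids = {s["id"] for s in subtasks}
    # indegree counts unmet prerequisites; dependents maps a node to its children
    indegree = {sid: 0 for sid in ids}
    dependents: dict[str, list[str]] = {sid: [] for sid in ids}
    for s in subtasks:
        for dep in s.get("depends_on", []):
            if dep not in ids:
                return False  # assumption: an unknown dependency is invalid
            indegree[s["id"]] += 1
            dependents[dep].append(s["id"])

    # Kahn's algorithm: repeatedly remove nodes with no unmet prerequisites
    queue = deque(sid for sid, degree in indegree.items() if degree == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in dependents[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Nodes on a cycle never reach indegree 0, so they are never visited
    return visited == len(ids)
```

Each node and each edge is touched once, which is where the O(V + E) bound comes from.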

D1 — DAG Orchestrator: Parallel Subtask Execution

Independent subtasks execute concurrently in a thread pool. Dependent subtasks wait until their prerequisites complete. The assembler synthesizes a unified cross-reference output from all results.
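The readiness rule in this caption can be sketched as a small pure function. The name ready_subtasks and the started parameter are hypothetical; the harness's own _ready helper is not shown in this post and may differ.

```python
def ready_subtasks(subtasks: list[dict], results: dict, started=frozenset()) -> list[dict]:
    """Return subtasks whose prerequisites have all produced results.

    Hypothetical helper: `started` holds ids that are already running or
    finished, so the caller never submits the same subtask twice.
    """
    return [
        s for s in subtasks
        if s["id"] not in results
        and s["id"] not in started
        and all(dep in results for dep in s.get("depends_on", []))
    ]


# Diamond-shaped plan: b and c depend on a, d depends on both b and c
plan = [
    {"id": "a", "depends_on": []},
    {"id": "b", "depends_on": ["a"]},
    {"id": "c", "depends_on": ["a"]},
    {"id": "d", "depends_on": ["b", "c"]},
]
first_wave = [s["id"] for s in ready_subtasks(plan, {})]
second_wave = [s["id"] for s in ready_subtasks(plan, {"a": "done"})]
```

In this plan b and c form the second wave and can run concurrently, while d stays blocked until both have results.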

The assembly step is the most underspecified component of the orchestrator. An assembler that receives N independent subtask outputs must synthesize coherent cross-references — identifying where subtopics connect, contradict, or amplify each other. This task degrades with N > 8 on models below 30B parameters; at that scale, the assembler begins dropping connections between distant subtasks. The practical limit for a 7–13B assembler model is 4–6 subtasks before assembly quality degrades.

D2 — The Worktree Context

Concurrent subtasks need filesystem isolation. The naive implementation, in which multiple agent loops write to the same output directory, produces the "silent overwrite" failure described in the Harness Thesis: runs 43–61 all recorded PASS status, but examination revealed that later runs were overwriting earlier outputs, and the evaluation was scoring the overwritten file. No error was raised.

The Worktree Context solves this without container overhead or process isolation. Before the orchestrator dispatches any subtask, it runs git worktree add to create an isolated linked working tree on its own branch:

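import subprocess
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def worktree_context(subtask_id: str):
    branch = f"subtask/{subtask_id}"
    path = f".worktrees/{subtask_id}"
    try:
        # Create a linked working tree on a new branch starting at HEAD
        subprocess.run(
            ["git", "worktree", "add", "-b", branch, path, "HEAD"],
            check=True
        )
        yield Path(path)
    finally:
        # Cleanup is best-effort (no check=True): every step runs even if an earlier one fails
        subprocess.run(["git", "worktree", "remove", "--force", path])
        subprocess.run(["git", "branch", "-D", branch])
        subprocess.run(["git", "worktree", "prune"])

D2 — Worktree Context: Git-Native State Isolation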

Each subtask gets an isolated Git worktree on its own branch. All file operations resolve into the worktree root. Cleanup runs in a finally block regardless of failure mode.

The finally block guarantees cleanup regardless of what goes wrong inside the subtask. --force handles uncommitted changes left behind by a failed subtask. git worktree prune removes stale metadata from interrupted runs.
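Isolation only holds if every path a subtask touches actually lands inside its worktree. One way to enforce that is a small resolver in front of file operations. This is a sketch under assumptions: resolve_in_worktree is a hypothetical helper, not part of the harness shown here, and it relies on Path.is_relative_to, which requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_in_worktree(root: Path, relative: str) -> Path:
    """Map a subtask-supplied path into the worktree root (hypothetical helper).

    Raises ValueError when the resolved path would land outside the root,
    for example through ".." segments or an absolute path.
    """
    root = root.resolve()
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes worktree: {relative}")
    return candidate
```

A subtask would pass the Path yielded by worktree_context as root, so a write to "out/report.md" stays inside .worktrees/<subtask_id>/ while "../report.md" is rejected.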

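Worktree creation is cheap: a linked worktree shares the repository's object store, so Git checks out one working copy instead of cloning the repository. Disk usage equals one working tree copy per concurrent subtask. Long-running subtasks can commit intermediate progress inside their worktree (git commit -m "checkpoint: ..."); these commits are written to the shared object store and outlive the worktree directory. Note that the cleanup above also deletes the subtask branch, which leaves those commits unreachable and eligible for eventual garbage collection, so recovering an interrupted run depends on recording the checkpoint hash or keeping the branch until the work has been merged.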

The pattern solves filesystem isolation. It does not solve coordination: subtasks cannot observe each other's intermediate findings while they are running. Cross-subtask information sharing requires the MCP Dispatch Router (D3) or explicit shared state outside the worktree (e.g., the memory store).

D3 — The MCP Dispatch Router

The DAG Orchestrator executes all subtasks locally by default. Some subtasks should run elsewhere: security analysis on a security team's hardened instance, financial analysis on a compliance-controlled instance, GPU-intensive literature review on a dedicated inference server. The MCP Dispatch Router extends the orchestrator with remote execution without modifying the local execution model.

import requests

# HARNESS_MCP_ENDPOINTS = '{"security": "http://sec-harness:8000", "finance": "http://fin-harness:8000"}'

def dispatch(subtask: dict, mcp_endpoints: dict) -> tuple[str | None, bool]:
    # The first keyword found in the task text wins (dict order = priority)
    for keyword, endpoint in mcp_endpoints.items():
        if keyword.lower() in subtask["task"].lower():
            response = requests.post(f"{endpoint}/run_task", json={
                "task": subtask["task"],
                "session_id": subtask["id"]
            }, timeout=300)
            response.raise_for_status()  # surface HTTP errors instead of parsing an error body
            data = response.json()
            return data["content"], data["ok"]
    return None, False  # no match → falls through to local execution

D3 — MCP Dispatch Router: Keyword-Based Remote Routing

Subtask task strings are matched against configured keyword patterns. The first match routes the subtask to the corresponding remote endpoint. Unmatched subtasks execute locally.

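Matching is a case-insensitive substring test, and the first match wins. The order of entries in HARNESS_MCP_ENDPOINTS determines priority when keywords overlap. Because the test is a plain substring match, a keyword such as "finance" also matches a task that mentions "refinance", so keywords should be chosen to be distinctive. Remote and local execution are interchangeable from the assembly step's perspective: both return a content string and an ok flag.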
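The matching rule can be isolated from the HTTP call to make the priority behavior easy to see. This is a sketch only: match_endpoint is a hypothetical helper, and the endpoint URLs are the example values from the configuration comment above.

```python
import json
from typing import Optional


def match_endpoint(task: str, endpoints: dict) -> Optional[str]:
    """Return the endpoint of the first keyword contained in the task text."""
    lowered = task.lower()
    for keyword, endpoint in endpoints.items():
        if keyword.lower() in lowered:
            return endpoint
    return None  # no keyword matched: the caller runs the subtask locally


# json.loads preserves key order, so the order in the config string is the priority
endpoints = json.loads(
    '{"security": "http://sec-harness:8000", "finance": "http://fin-harness:8000"}'
)
task = "Security review of the finance ledger export"
chosen = match_endpoint(task, endpoints)

# Reversing the entries flips the winner for the same task
reordered = dict(reversed(list(endpoints.items())))
chosen_reordered = match_endpoint(task, reordered)
```

With the original order the task goes to the security instance; with the reversed order the same task goes to the finance instance, which is why overlapping keywords need a deliberate ordering.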

Error handling is best-effort: a failed remote subtask is recorded in the errors dict passed to _assemble(), which notes the gap in its synthesis rather than raising. This design keeps orchestration resilient to flaky remote instances at the cost of potentially incomplete assembly output.
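The best-effort policy might look like the following wrapper around a dispatch call. Everything here is an assumption for illustration: run_with_fallback, its parameters, and the error strings are hypothetical, and the remote and local callables stand in for dispatch() and the local subtask runner.

```python
from typing import Callable, Optional, Tuple


def run_with_fallback(
    subtask: dict,
    remote: Callable[[dict], Tuple[Optional[str], bool]],
    local: Callable[[dict], str],
    results: dict,
    errors: dict,
) -> None:
    """Try remote dispatch, fall back to local on no match, never raise."""
    sid = subtask["id"]
    try:
        content, ok = remote(subtask)
    except Exception as exc:  # timeout, connection error, bad response body
        errors[sid] = f"remote failed: {exc}"
        return
    if content is None:
        # No endpoint matched this subtask, so it runs locally
        try:
            results[sid] = local(subtask)
        except Exception as exc:
            errors[sid] = f"local failed: {exc}"
    elif ok:
        results[sid] = content
    else:
        errors[sid] = "remote reported ok=false"
```

The assembler then receives both dicts and can name the missing subtasks in its synthesis rather than silently omitting them.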

D4 — The Skill Registry

The standard agent loop — plan, research, synthesize, evaluate, persist — covers the general case. Some tasks need a fundamentally different loop. A literature review needs arXiv fetching, multi-persona curation, and cluster synthesis. A browser navigation task needs Playwright, not a search API. An email draft needs OAuth and a send-confirmation step. These are not variations on the standard loop — they are different loops entirely.

The Skill Registry lets each of these live in its own handler without modifying the core agent:

from typing import Callable

_SKILLS: dict[str, Callable] = {
    "/lit-review": lit_review_skill.run,
    "/browser":    browser_skill.run,
    "/email":      email_skill.run,
    "/github":     github_skill.run,
    "/deck":       deck_skill.run,
}

def dispatch_cli(task: str, config: ModelConfig) -> str:
    # Match the whole first token so "/deck" does not capture "/deckhand ..."
    parts = task.strip().split(maxsplit=1)
    handler = _SKILLS.get(parts[0]) if parts else None
    if handler is not None:
        args = parts[1] if len(parts) > 1 else ""
        return handler(args, config)
    return agent.run(task, config)  # fallback: standard loop

D4 — Skill Registry: Slash-Command Dispatch

Slash-prefixed tasks dispatch to registered skill handlers. Skills can implement pre-hooks, replace the agent loop, or add post-processing. Unmatched tasks fall through to the standard agent loop.

Adding a skill requires one entry in _SKILLS and one handler function. No modification to the agent loop. Skills that replace the agent loop bypass the Wiggum Loop by default — skills that want evaluation must call wiggum.loop() explicitly. The five built-in skills span a range from high-leverage (the /lit-review skill runs a seven-stage pipeline that would take hours to replicate with the standard loop) to low-leverage (the /deck skill adds markdown-to-PowerPoint post-processing that could also be a post-hook).

The practical rule for when to implement a task as a skill versus handling it through the standard agent loop: if the task requires a fundamentally different information retrieval strategy, or a send/write action with irreversible external effects that requires an explicit confirmation step, it belongs in a skill. Everything else belongs in the standard loop.
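The confirmation requirement can be sketched as a handler that refuses to perform the irreversible step until a callback approves it. This is a hypothetical illustration, not the built-in /email skill: email_skill_run, the send and confirm callables, and the return strings are all assumptions, and a real handler would take (args, config) as in the registry above.

```python
from typing import Callable


def email_skill_run(
    args: str,
    send: Callable[[str], None],
    confirm: Callable[[str], bool],
) -> str:
    """Draft a message, then send only after explicit confirmation."""
    draft = f"Draft: {args}"
    # The irreversible action is gated: nothing leaves the process unless
    # the confirmation callback approves this exact draft
    if not confirm(draft):
        return "cancelled: draft not sent"
    send(draft)
    return "sent"


outbox: list = []
declined = email_skill_run("status update", outbox.append, lambda draft: False)
approved = email_skill_run("status update", outbox.append, lambda draft: True)
```

Keeping the gate inside the skill means the standard loop never needs to know that a send step exists.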

D5 — The Agent Channel

The DAG Orchestrator (D1) launches subtasks as independent subprocesses and waits for them to write output files. For tasks that run for minutes, this is a black box: the parent has no visibility into whether a subtask is making progress, has hit a recoverable blocker, or is about to produce a finding worth surfacing before final synthesis. The Agent Channel closes this gap with a lightweight file-based IPC layer.

Before spawning subtasks, the orchestrator creates a temporary directory and passes its path as HARNESS_MSG_CHANNEL and a per-subtask index as HARNESS_SUBTASK_IDX. Inside the subprocess, the agent calls send_message() to report state changes:

from harness.agent_channel import send_message

# from inside a running subtask subprocess:
send_message("parent", "progress", "round 2 of 3 complete — 4 sources retrieved")
send_message("parent", "finding",  "key result: cache hit rate dropped 40% after model swap")
send_message("parent", "blocker",  "search API rate-limited — retrying in 15s")

Each call atomically writes a JSON file to the channel directory via write-then-rename, preventing partial reads by the orchestrator's polling thread. The parent calls poll_messages() on a background thread and emits [EVENT] lines to the dashboard WebSocket for each new message. Four message types cover the lifecycle: "progress" (logged and forwarded), "finding" (surfaced before synthesis), "blocker" (logged; parent continues), "done" (explicit early-completion signal). The channel is a no-op when HARNESS_MSG_CHANNEL is unset — subtasks that don't call send_message() work identically to before, so the pattern is purely additive.
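A minimal sketch of the write-then-rename mechanics, assuming one JSON file per message. The file naming scheme, the payload field names, and the seen-set used by poll_messages are assumptions for illustration; only the function names, the message types, and the two environment variables come from the description above.

```python
import json
import os
import time
import uuid
from pathlib import Path


def send_message(to: str, kind: str, body: str) -> None:
    """Write one message file atomically; no-op when the channel is unset."""
    channel = os.environ.get("HARNESS_MSG_CHANNEL")
    if not channel:
        return
    idx = os.environ.get("HARNESS_SUBTASK_IDX", "0")
    payload = {"to": to, "type": kind, "body": body, "from": idx}
    final = Path(channel) / f"{time.time_ns()}-{idx}-{uuid.uuid4().hex}.json"
    tmp = final.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    # os.replace is an atomic rename within one filesystem, so the poller
    # sees either no .json file or a complete one
    os.replace(tmp, final)


def poll_messages(channel: str, seen: set) -> list:
    """Return messages not yet seen, skipping in-progress .tmp files."""
    messages = []
    for path in sorted(Path(channel).glob("*.json")):
        if path.name not in seen:
            seen.add(path.name)
            messages.append(json.loads(path.read_text(encoding="utf-8")))
    return messages
```

The poller globs only for *.json, so a half-written .tmp file is invisible to it until the rename completes.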

Research context: The multi-component decomposition underlying D1–D5 is validated by a growing body of planning research. Decomposing tool use into separate planner, caller, and summarizer roles consistently outperforms monolithic single-agent approaches, particularly at smaller model sizes where a single model cannot manage combined planning, invocation, and synthesis reliably. (arXiv:2401.07324) MCTS-inspired planning achieves ~10% improvement over greedy reactive strategies by accounting for inter-tool dependencies — the same dependency-awareness that Kahn’s algorithm enforces in the DAG Orchestrator. (arXiv:2603.12740) Separately, a study of 12 commercial planning agents found they bypassed safety constraints in over 92% of cases when no explicit safety prompts were present, with near-deterministic bypass for web-use agents. (arXiv:2601.10758) The E-section security patterns in the next post exist specifically because orchestration at scale produces exactly this failure mode.

The next post covers Section E — Security Patterns — which constrain what the agent can do to the host system (AST Guard, Path Sandbox) and what external content can do to the agent's memory (Injection Scanner, CDP Guard).
