May 28, 2026 • 6 min read • Agentic Harness Engineering

The Planner: Two-Pass Pre-Research Analysis

Before a single web search runs, planner.py makes two fast glm4:9b calls — a prior knowledge assessment and a structured plan — to ensure searches target actual gaps and synthesis has context about what was already known.

Naive research agents generate search queries directly from the task string. The planner takes a different approach: ask the model what it already knows before searching, identify specific gaps that web search would fill, then generate queries that target those gaps. Two small LLM calls before any expensive search or synthesis work begins.

Pass 1: prior knowledge assessment

task + memory context → glm4:9b (temp 0.1) → known_facts[] + gaps[]

The prior knowledge prompt asks two things: what the model already knows with confidence, and what it would need to look up. If memory covers the task well, gaps come back empty — and the pipeline synthesizes without searching at all.

The model responds with exactly two JSON arrays. The gaps array is what matters most — it directly seeds the search queries in Pass 2. If the memory store already has a strong match (a prior run on the same topic), the model is instructed to return an empty gaps array, and the pipeline can synthesize from memory without any new searches. This is the mechanism that makes repeated similar tasks fast: the planner detects redundancy before any DDGS calls are made.

This pass is skippable via HARNESS_SKIP_PRIOR_KNOWLEDGE=1 — useful for controlled autoresearch experiments where you want to eliminate the variable of prior knowledge conditioning.
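The gating logic described above can be sketched roughly as follows. This is a minimal illustration, not the code in planner.py: the function names `run_pass_one` and `needs_search`, the `call_model` callable, and the assumption that the two arrays arrive as one JSON object keyed `known_facts` and `gaps` are all hypothetical. The sketch also assumes that a skipped pass falls back to searching, which the post does not state.

```python
import json
import os
from typing import Callable, Optional


def run_pass_one(prompt: str, call_model: Callable[[str], str]) -> Optional[dict]:
    """Return {'known_facts': [...], 'gaps': [...]}, or None when the pass is skipped."""
    if os.environ.get("HARNESS_SKIP_PRIOR_KNOWLEDGE") == "1":
        return None  # pass disabled: no model call is made at all
    # Assumed response shape: one JSON object holding both arrays.
    data = json.loads(call_model(prompt))
    return {
        "known_facts": [str(x) for x in data.get("known_facts", [])],
        "gaps": [str(x) for x in data.get("gaps", [])],
    }


def needs_search(assessment: Optional[dict]) -> bool:
    """Search unless Pass 1 actually ran and reported no gaps."""
    if assessment is None:
        return True  # assumption: without an assessment, search as usual
    return len(assessment["gaps"]) > 0
```

Keeping "pass skipped" (`None`) distinct from "pass ran and found no gaps" (an empty list) matters here, because only the second case should let the pipeline synthesize from memory alone.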

Pass 2: structured plan

The main plan prompt receives the task, any memory context, and the known_facts/gaps from Pass 1. It produces a JSON plan with seven fields:

| Field | Values | How it's used downstream |
| --- | --- | --- |
| `task_type` | `enumerated` \| `best_practices` \| `research` | Selects `SYNTH_INSTRUCTION` variant (count vs technical vs prose) |
| `complexity` | `low` \| `medium` \| `high` | Logged to run trace; potentially used by future routing logic |
| `expected_sections` | integer or null | Used in count-check fallback when the regex doesn't find a number in the task |
| `search_queries` | list of 2 strings | Replaces auto-generated queries in `gather_research()` |
| `prior_work_summary` | one sentence or empty | Injected into synthesis prompt via `synthesis_context()` |
| `notes` | one actionable sentence | Injected into synthesis prompt, e.g. "specificity was weak last time — include concrete tool names" |
| `subtasks` | list of research directives | Triggers multi-step orchestration; filtered to remove assembly steps |
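One way to picture the seven fields is as a dataclass with per-field validation. The sketch below is an assumption about shape only: the default values, the `plan_from_dict` helper, and the `_clean_strings` helper are hypothetical and are not taken from planner.py.

```python
from dataclasses import dataclass, field
from typing import Optional

TASK_TYPES = ("enumerated", "best_practices", "research")
COMPLEXITIES = ("low", "medium", "high")


@dataclass
class Plan:
    # Defaults here are illustrative guesses, not the real planner's values.
    task_type: str = "research"
    complexity: str = "medium"
    expected_sections: Optional[int] = None
    search_queries: list[str] = field(default_factory=list)
    prior_work_summary: str = ""
    notes: str = ""
    subtasks: list[str] = field(default_factory=list)


def _clean_strings(value, limit=None):
    """Keep non-empty strings from a list; anything else becomes an empty list."""
    if not isinstance(value, list):
        return []
    cleaned = [v.strip() for v in value if isinstance(v, str) and v.strip()]
    return cleaned[:limit]


def plan_from_dict(data: dict) -> Plan:
    """Coerce a decoded JSON object into a Plan, keeping the default for any bad field."""
    plan = Plan()
    if data.get("task_type") in TASK_TYPES:
        plan.task_type = data["task_type"]
    if data.get("complexity") in COMPLEXITIES:
        plan.complexity = data["complexity"]
    sections = data.get("expected_sections")
    if isinstance(sections, int) and not isinstance(sections, bool):
        plan.expected_sections = sections
    plan.search_queries = _clean_strings(data.get("search_queries"), limit=2)
    plan.subtasks = _clean_strings(data.get("subtasks"))
    for name in ("prior_work_summary", "notes"):
        value = data.get(name)
        if isinstance(value, str):
            setattr(plan, name, value.strip())
    return plan
```

Validating field by field means one malformed value (say, an unknown `complexity`) costs only that field rather than the whole plan.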

The subtask filter

Subtasks are only populated for tasks that explicitly require synthesizing across multiple distinct domains. The planner is instructed to list 2–3 self-contained web research directives and nothing else — no "synthesize", "assemble", "combine", or "write" steps, because those are the orchestrator's job.

Despite this instruction, models sometimes include assembly steps anyway. _parse_plan() applies a post-hoc filter against a set of assembly verbs before returning the plan:

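import re

# Verbs that signal an assembly step rather than a research directive.
_ASSEMBLY_WORDS = re.compile(
    r'\b(synthesize|synthesise|assemble|combine|integrate|unify|merge|compile|write|create)\b',
    re.IGNORECASE,
)

# Inside _parse_plan(): keep only the subtasks that contain none of those verbs.
subtasks = [
    s for s in raw_subtasks
    if not _ASSEMBLY_WORDS.search(s)
]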

This prevents the orchestrator from recursively delegating its own synthesis step back to the planner.
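To see what the filter keeps and drops, here is a small self-contained run. The `filter_subtasks` wrapper and the sample directives are made up for illustration; only the regex comes from the post.

```python
import re

_ASSEMBLY_WORDS = re.compile(
    r'\b(synthesize|synthesise|assemble|combine|integrate|unify|merge|compile|write|create)\b',
    re.IGNORECASE,
)


def filter_subtasks(raw_subtasks):
    """Drop any subtask that mentions an assembly verb (hypothetical wrapper)."""
    return [
        s for s in raw_subtasks
        if isinstance(s, str) and not _ASSEMBLY_WORDS.search(s)
    ]


raw = [
    "Research asyncio task cancellation semantics",
    "Research structured concurrency in Trio",
    "Synthesize the findings into a comparison",       # assembly step: dropped
    "Find guidance on how to write async unit tests",  # contains "write": also dropped
]
kept = filter_subtasks(raw)
```

The last directive shows the trade-off of a word-level filter: a genuine research directive is dropped when one of the verbs appears as part of its topic. For a planner limited to 2–3 subtasks, that may be an acceptable cost for reliably removing assembly steps.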

Synthesis context injection

The Plan.synthesis_context() method builds a formatted block that gets prepended to the synthesis prompt alongside the memory context:

**Prior work:** (one sentence summary of relevant past runs)
**Planner note:** specificity was weak last time — include concrete tool names and version numbers
**Verified facts (no search needed):**
- Python's asyncio event loop uses a single thread by default
- The GIL is released during I/O-bound operations

The planner note is particularly valuable for autoresearch experiments: if the evaluator consistently flags a specific weakness (depth, specificity, examples), the planner can carry that feedback forward as a synthesis instruction that supplements SYNTH_INSTRUCTION. This is a second channel for improving output quality alongside the instruction-level mutation that autoresearch performs on SYNTH_INSTRUCTION itself.
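A block like the one shown earlier in this section could be assembled along these lines. This is a sketch written as a free function with hypothetical parameter names; the real `Plan.synthesis_context()` is a method and may differ in detail, for example in how it handles empty fields.

```python
def synthesis_context(prior_work_summary: str, notes: str, known_facts: list[str]) -> str:
    """Build the context block; sections with no content are omitted entirely."""
    parts = []
    if prior_work_summary:
        parts.append(f"**Prior work:** {prior_work_summary}")
    if notes:
        parts.append(f"**Planner note:** {notes}")
    if known_facts:
        parts.append("**Verified facts (no search needed):**")
        parts.extend(f"- {fact}" for fact in known_facts)
    return "\n".join(parts)


block = synthesis_context(
    "Earlier run covered asyncio basics.",
    "include concrete tool names",
    ["The GIL is released during I/O-bound operations"],
)
```

Returning an empty string when nothing is set lets the caller prepend the result unconditionally, which matches the safe-defaults behaviour where no context is injected.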

Robustness

Both passes use temperature: 0.1 to keep the structured JSON output as stable as possible; a low temperature reduces sampling variance but does not guarantee identical output across runs. Both fall back to dirtyjson when the primary JSON parser fails on common model-output defects (unescaped quotes, trailing commas). If both parsers fail, _extract_string_list() tries regex extraction of just the array values. If everything fails, the planner returns a Plan() with safe defaults: search_queries is empty, so the pipeline falls back to auto-generated queries, and no context is injected. The planner never raises.
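The fallback chain can be sketched for a single field as below. Everything here is illustrative: `parse_gaps` and `extract_string_list` are hypothetical stand-ins (the latter for the post's `_extract_string_list()`), the standard `json` module is assumed to be the primary parser, and dirtyjson is treated as optional so the sketch runs without it installed.

```python
import json
import re

try:
    import dirtyjson  # third-party lenient parser; optional in this sketch
except ImportError:
    dirtyjson = None


def extract_string_list(raw: str, key: str) -> list[str]:
    """Regex fallback: pull quoted strings out of the array that follows `key`."""
    match = re.search(r'"%s"\s*:\s*\[(.*?)\]' % re.escape(key), raw, re.DOTALL)
    if not match:
        return []
    return re.findall(r'"([^"]*)"', match.group(1))


def parse_gaps(raw: str) -> list[str]:
    """Try strict JSON, then dirtyjson, then regex; never raise."""
    loaders = [json.loads]
    if dirtyjson is not None:
        loaders.append(dirtyjson.loads)
    for loads in loaders:
        try:
            data = loads(raw)
        except Exception:
            continue  # this parser could not cope; try the next one
        if isinstance(data, dict) and isinstance(data.get("gaps"), list):
            return [str(g) for g in data["gaps"]]
    # Both parsers failed or returned an unexpected shape.
    return extract_string_list(raw, "gaps")  # [] acts as the safe default
```

Ordering the parsers from strictest to most permissive means well-formed output takes the cheapest path, and each later stage is only reached when the earlier ones give up.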

The planner uses glm4:9b regardless of which producer model is configured. This is intentional: planning is a structured JSON generation task where a 9B model is fast (2–5 seconds) and adequate. Using the full 32B producer for planning would add ~30 seconds of latency with no meaningful improvement in plan quality.
