May 26, 2026 • 7 min read • Agentic Harness Engineering

The Wiki Sync Skill: Deterministic Source Extraction and Gap-Targeted Code Injection

No LLM required: regex reads six source files and writes an idempotent Implementation Reference section into wiki/pipeline.md. When the issues reported by the Wiggum evaluator point to a knowledge gap, a second pass injects the actual source code for the flagged functions.

The harness maintains a wiki/pipeline.md document describing the system architecture. The human-written sections cover the pipeline diagram, stage descriptions, and design decisions. But the implementation details — which model is active, what the pass threshold is set to, how the memory ranking formula is weighted — change as autoresearch and experiments evolve. Keeping the wiki accurate by hand is a maintenance tax that compounds with every experiment.

wiki_sync.py solves this deterministically: it reads six Python source files with regex, extracts the current values of key constants and prompts, and writes a structured markdown section directly into the wiki. No LLM hallucination, no stale documentation.

Regular sync: what gets extracted

The sync() function reads agent.py, wiggum.py, planner.py, orchestrator.py, memory.py, and inference.py and produces six subsections:

[Figure: regular sync. Regex reads six source files and writes six structured sections into wiki/pipeline.md.]

Models by stage

Producer model (MODEL from agent.py) with HARNESS_PRODUCER_MODEL override env var
Wiggum producer and evaluator models with their respective override env vars
Planner model and orchestrator assembly model

Key constants (9 values)

MAX_SEARCH_ROUNDS, NOVELTY_THRESHOLD, NOVELTY_EPSILON from agent.py
PASS_THRESHOLD, MAX_ROUNDS from wiggum.py
MAX_CONTEXT_OBSERVATIONS, SEMANTIC_CANDIDATES from memory.py
SUBTASK_MAX_WORKERS, SUBTASK_MAX_RETRIES from orchestrator.py
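As a rough illustration of how a constant can be pulled out of a source file with regex, here is a minimal sketch. The sample source text, its values, and the helper name `extract_constant` are assumptions made for the example, not the actual code in wiki_sync.py.

```python
import re
from typing import Optional

# Hypothetical stand-in for the text of wiggum.py; the values are illustrative only.
SAMPLE_SOURCE = '''
PASS_THRESHOLD = 8.0
MAX_ROUNDS = 3  # revision rounds
'''


def extract_constant(source: str, name: str) -> Optional[str]:
    """Return the right-hand side of a top-level `NAME = value` line, or None.

    A trailing `# comment` is dropped. This simple pattern would also cut a
    string value that contains `#`, which is acceptable for numeric constants.
    """
    pattern = rf"^{re.escape(name)}\s*=\s*(.+?)\s*(?:#.*)?$"
    match = re.search(pattern, source, flags=re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    for constant in ("PASS_THRESHOLD", "MAX_ROUNDS"):
        print(f"| {constant} | {extract_constant(SAMPLE_SOURCE, constant)} |")
```

Anchoring the pattern at the start of a line keeps it from matching uses of the name inside function bodies, where the assignment would be indented.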

Wiggum evaluation weights

All five scoring dimensions extracted from the EVAL_PROMPT triple-quoted string in wiggum.py
Sorted by weight descending: depth (0.30), completeness (0.25), relevance (0.20), specificity (0.15), structure (0.10)
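The weight extraction can be sketched along these lines. The layout of the prompt below (one `- name (weight: x)` line per dimension) is an assumption; the real EVAL_PROMPT in wiggum.py may be formatted differently, and the helper names are hypothetical. Only the five dimension names and weights come from the list above.

```python
import re
from typing import List, Tuple

# Assumed prompt layout, for illustration only.
EVAL_PROMPT = """
Score the draft on each dimension:
- relevance (weight: 0.20)
- depth (weight: 0.30)
- structure (weight: 0.10)
- completeness (weight: 0.25)
- specificity (weight: 0.15)
"""


def extract_weights(prompt: str) -> List[Tuple[str, float]]:
    """Find (dimension, weight) pairs and sort them by weight, highest first."""
    pairs = re.findall(
        r"^-\s*(\w+)\s*\(weight:\s*([0-9.]+)\)", prompt, flags=re.MULTILINE
    )
    weights = [(name, float(value)) for name, value in pairs]
    return sorted(weights, key=lambda item: item[1], reverse=True)


def to_markdown_table(weights: List[Tuple[str, float]]) -> str:
    """Render the sorted weights as a two-column markdown table."""
    rows = ["| Dimension | Weight |", "|---|---|"]
    rows += [f"| {name} | {value:.2f} |" for name, value in weights]
    return "\n".join(rows)


if __name__ == "__main__":
    print(to_markdown_table(extract_weights(EVAL_PROMPT)))
```

Because the weights are read from the prompt text itself, the wiki table cannot drift from what the evaluator is actually told.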

Memory retrieval ranking formula

Exact blending coefficients extracted from memory.py via regex
Quality floor value and deduplication logic documented
SEMANTIC_CANDIDATES over-fetch count and final injection count

SYNTH_INSTRUCTION

Current active synthesis prompt extracted from the AUTORESEARCH:SYNTH_INSTRUCTION:BEGIN sentinel block
Truncated to 600 chars if longer — enough to show structure without overwhelming the wiki
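A sentinel-delimited extraction with truncation might look like the sketch below. Only the BEGIN sentinel is named in this post; the matching END sentinel, the assumption that both sit on comment lines, and the helper name are hypothetical.

```python
import re
from typing import Optional

BEGIN = "AUTORESEARCH:SYNTH_INSTRUCTION:BEGIN"
END = "AUTORESEARCH:SYNTH_INSTRUCTION:END"  # assumed counterpart of BEGIN
MAX_CHARS = 600


def extract_sentinel_block(source: str, max_chars: int = MAX_CHARS) -> Optional[str]:
    """Return the text between the sentinel lines, cut to max_chars."""
    pattern = re.escape(BEGIN) + r"[^\n]*\n(.*?)\n[^\n]*" + re.escape(END)
    match = re.search(pattern, source, flags=re.DOTALL)
    if match is None:
        return None
    return match.group(1).strip()[:max_chars]


if __name__ == "__main__":
    sample = (
        f"# {BEGIN}\n"
        'SYNTH_INSTRUCTION = """Write a structured report."""\n'
        f"# {END}\n"
    )
    print(extract_sentinel_block(sample))
```

Slicing after `strip()` means the 600-character budget is spent on prompt text rather than on surrounding whitespace.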

Model map (Ollama → vLLM)

_MODEL_MAP literal from inference.py — maps Ollama tags to HuggingFace model IDs
Used when INFERENCE_BACKEND=vllm to resolve which checkpoint to load
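To make the role of the map concrete, here is a hedged sketch of reading a dict literal out of source text and using it to resolve a checkpoint. The model tags and IDs are placeholders, the helper names are hypothetical, and the fallback for unknown tags is an assumption. Parsing the captured literal with `ast.literal_eval` is also a choice made for this example; the post only states that regex is used.

```python
import ast
import re
from typing import Dict

# Placeholder content standing in for inference.py; not the real mapping.
SAMPLE_INFERENCE_SOURCE = '''
_MODEL_MAP = {
    "example-model:7b": "example-org/Example-Model-7B",
    "example-model:32b": "example-org/Example-Model-32B",
}
'''


def extract_model_map(source: str) -> Dict[str, str]:
    """Capture the `_MODEL_MAP = {...}` literal and parse it safely."""
    match = re.search(
        r"^_MODEL_MAP\s*=\s*(\{.*?^\})", source, flags=re.MULTILINE | re.DOTALL
    )
    if match is None:
        return {}
    return ast.literal_eval(match.group(1))


def resolve_checkpoint(tag: str, model_map: Dict[str, str], backend: str) -> str:
    """Map an Ollama tag to a HuggingFace ID only when the backend is vLLM."""
    if backend == "vllm":
        # Assumption: unknown tags fall through unchanged.
        return model_map.get(tag, tag)
    return tag


if __name__ == "__main__":
    mapping = extract_model_map(SAMPLE_INFERENCE_SOURCE)
    print(resolve_checkpoint("example-model:7b", mapping, backend="vllm"))
```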

Idempotent marker blocks

The generated section is wrapped in HTML comment markers:


## Implementation Reference
*Auto-generated by `/sync-wiki` on 2026-06-01. Do not edit.*
…tables and extracted values…


On re-run, the regex replaces the entire block between the markers rather than appending. The human-written sections above and below are untouched. This means python wiki_sync.py is safe to run after every significant pipeline change — it brings the wiki current without touching editorial content.
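The replace-rather-than-append behaviour can be sketched as follows. The marker strings and the helper name `upsert_block` are hypothetical, since the actual marker names are not shown in this post.

```python
import re

# Hypothetical marker names, chosen for the example.
BEGIN_MARKER = "<!-- impl-ref:begin -->"
END_MARKER = "<!-- impl-ref:end -->"


def upsert_block(document: str, generated: str) -> str:
    """Replace the marked block if it exists; otherwise append it once."""
    block = f"{BEGIN_MARKER}\n{generated.strip()}\n{END_MARKER}"
    pattern = re.compile(
        re.escape(BEGIN_MARKER) + r".*?" + re.escape(END_MARKER), flags=re.DOTALL
    )
    if pattern.search(document):
        # A function replacement avoids backslash handling in the new text.
        return pattern.sub(lambda _match: block, document, count=1)
    return document.rstrip("\n") + "\n\n" + block + "\n"


if __name__ == "__main__":
    wiki = "# Pipeline\n\nHuman-written overview.\n"
    once = upsert_block(wiki, "## Implementation Reference\nPASS_THRESHOLD: ...")
    twice = upsert_block(once, "## Implementation Reference\nPASS_THRESHOLD: ...")
    print(once == twice)
```

Running the function twice with the same generated text yields the same document, which is the property that makes repeated syncs safe.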

Gap-targeted sync: code injection from Wiggum failures

The second mode, sync_gaps(issues), is triggered automatically during a Wiggum FAIL cycle. When the evaluator's revision suggestions contain phrases like "unclear how evaluation criteria", "how the synthesis prompt constructs", or "how search results are filtered", the skill identifies which gap pattern was triggered and injects the relevant source code directly into the wiki.

There are nine gap patterns, each with a set of trigger substrings and a list of source code targets to extract:

[Figure: gap-targeted sync. Evaluator issues trigger pattern matching, which extracts live source code into the wiki gaps section.]

Planning prompts

Triggers: "plan prompt", "task classification", "complexity analysis". Extracts the PRIOR_KNOWLEDGE_PROMPT and PLAN_PROMPT triple-quoted strings from planner.py.

Evaluation prompt

Triggers: "evaluation criteria", "revision logic", "evaluation loop". Extracts the full EVAL_PROMPT string from wiggum.py (up to 40 lines).

Synthesis construction

Triggers: "synthesis prompt", "how LLM is instructed", "formatting rules". Extracts the synthesize() function body from agent.py.

Novelty scoring

Triggers: "novelty", "search filtering", "how results are filtered". Extracts assess_novelty() from memory.py.

ChromaDB / embedding

Triggers: "chromadb", "embedding", "semantic similarity threshold". Extracts _get_chroma_ef() and _get_chroma().

Memory compression

Triggers: "compression model", "how compressed". Extracts compress_and_store() from memory.py (up to 30 lines).

Research gathering loop

Triggers: "gather_research", "saturation loop", "quality floor". Extracts gather_research() from agent.py (up to 35 lines).

Planning — make_plan()

Triggers: "how queries are determined", "plan queries", "complexity score". Extracts make_plan() from planner.py.
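The trigger matching described above can be sketched as a plain substring scan. The table below reproduces three of the patterns from this list; the dictionary layout, the pattern keys, the case-insensitive comparison, and the function name `match_gaps` are assumptions for the example.

```python
from typing import Dict, List

# Three patterns from the list above; the structure is an assumed layout.
GAP_PATTERNS: Dict[str, dict] = {
    "evaluation_prompt": {
        "triggers": ["evaluation criteria", "revision logic", "evaluation loop"],
        "targets": [("wiggum.py", "EVAL_PROMPT")],
    },
    "synthesis_construction": {
        "triggers": ["synthesis prompt", "how LLM is instructed", "formatting rules"],
        "targets": [("agent.py", "synthesize")],
    },
    "novelty_scoring": {
        "triggers": ["novelty", "search filtering", "how results are filtered"],
        "targets": [("memory.py", "assess_novelty")],
    },
}


def match_gaps(issues: List[str]) -> List[str]:
    """Return the names of all patterns whose triggers appear in the issues."""
    text = " ".join(issues).lower()
    return [
        name
        for name, spec in GAP_PATTERNS.items()
        if any(trigger.lower() in text for trigger in spec["triggers"])
    ]


if __name__ == "__main__":
    feedback = ["It is unclear how evaluation criteria are weighted."]
    print(match_gaps(feedback))
```

Substring matching is crude, but it needs no model call and its behaviour is easy to predict: an issue either contains a trigger or it does not.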

Extracted code appears as fenced Python blocks in a separate marker-delimited section (the Gap-Targeted Extractions block), which is regenerated on each FAIL cycle. This means the wiki always contains the source code most relevant to whatever the evaluator is currently complaining about.
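One way to pull a function out of a source file, cap it at a line limit, and wrap it in a fence is sketched below. It uses the standard `ast` module to locate the function, which is a technique swapped in for this example; the post does not say how wiki_sync.py finds function boundaries. The helper names and the truncation comment are also assumptions.

```python
import ast
from typing import Optional

FENCE = "`" * 3  # built programmatically so the example nests cleanly in markdown


def extract_function(
    source: str, name: str, max_lines: Optional[int] = None
) -> Optional[str]:
    """Return the source lines of function `name`, capped at max_lines."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == name:
            # lineno and end_lineno are 1-based and inclusive (Python 3.8+).
            lines = source.splitlines()[node.lineno - 1 : node.end_lineno]
            if max_lines is not None and len(lines) > max_lines:
                lines = lines[:max_lines] + ["# ... truncated ..."]
            return "\n".join(lines)
    return None


def as_fenced_block(code: str) -> str:
    """Wrap code in a fenced Python block for insertion into the wiki."""
    return f"{FENCE}python\n{code}\n{FENCE}"


if __name__ == "__main__":
    sample = (
        "def helper():\n    return 1\n\n"
        "def compress_and_store(text):\n    a = 1\n    b = 2\n    return a + b\n"
    )
    print(as_fenced_block(extract_function(sample, "compress_and_store", max_lines=2)))
```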

Context injection for /contextualize

The get_relevant_wiki_context() function assembles a targeted slice of pipeline.md for the /contextualize pre-research skill. It returns three parts in sequence, capped at 8,000 characters total:

Human-written body (pipeline diagram + architectural overview, up to 3,000 chars)
The Implementation Reference marker block (constants, models, weights)
The Gap-Targeted Extractions block (source code for currently flagged issues)

This gives the producer model accurate function names, current constant values, and (when relevant) actual source code to cite — without dumping the entire wiki into the context window.
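The three-part assembly with its two caps can be sketched like this. The marker strings and function names are hypothetical, and the real get_relevant_wiki_context() may differ in how it joins and trims the parts; only the ordering and the 3,000 and 8,000 character limits come from the description above.

```python
import re
from typing import Tuple

TOTAL_CAP = 8000  # overall character limit
BODY_CAP = 3000   # limit for the human-written body

# Hypothetical marker pairs, chosen for the example.
IMPL_MARKERS = ("<!-- impl-ref:begin -->", "<!-- impl-ref:end -->")
GAP_MARKERS = ("<!-- gaps:begin -->", "<!-- gaps:end -->")


def _block_pattern(markers: Tuple[str, str]) -> "re.Pattern[str]":
    begin, end = markers
    return re.compile(re.escape(begin) + r"(.*?)" + re.escape(end), flags=re.DOTALL)


def find_block(document: str, markers: Tuple[str, str]) -> str:
    """Return the text inside a marker pair, or an empty string."""
    match = _block_pattern(markers).search(document)
    return match.group(1).strip() if match else ""


def assemble_context(document: str) -> str:
    """Join body, implementation reference, and gap extractions, in that order."""
    body = document
    for markers in (IMPL_MARKERS, GAP_MARKERS):
        body = _block_pattern(markers).sub("", body)
    parts = [
        body.strip()[:BODY_CAP],
        find_block(document, IMPL_MARKERS),
        find_block(document, GAP_MARKERS),
    ]
    return "\n\n".join(part for part in parts if part)[:TOTAL_CAP]
```

Capping the body separately ensures that a long architectural overview cannot crowd out the generated blocks, which carry the values most likely to have changed.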

The gap-targeted extraction is a lightweight form of automated self-documentation: when the system fails on a gap, it writes the relevant code into the wiki so future runs have better grounding. No model has to understand the code; the skill only needs to recognize trigger phrases in the evaluator's natural-language feedback.
