The Wiki Sync Skill: Deterministic Source Extraction and Gap-Targeted Code Injection
No LLM required — regex reads six source files and writes an idempotent Implementation Reference section into wiki/pipeline.md. When Wiggum flags a knowledge gap in the evaluator's issues, a second pass injects the actual source code for the flagged functions.
The harness maintains a wiki/pipeline.md document describing the system architecture. The human-written sections cover the pipeline diagram, stage descriptions, and design decisions. But the implementation details — which model is active, what the pass threshold is set to, how the memory ranking formula is weighted — change as autoresearch and experiments evolve. Keeping the wiki accurate by hand is a maintenance tax that compounds with every experiment.
wiki_sync.py solves this deterministically: it reads six Python source files with regex, extracts the current values of key constants and prompts, and writes a structured markdown section directly into the wiki. No LLM hallucination, no stale documentation.
Regular sync: what gets extracted
The sync() function reads agent.py, wiggum.py, planner.py, orchestrator.py, memory.py, and inference.py and produces six subsections:
Figure: Regular sync. Regex reads six source files and writes six structured sections into wiki/pipeline.md.
Models by stage
- Producer model (MODEL from agent.py) with HARNESS_PRODUCER_MODEL override env var
- Wiggum producer and evaluator models with their respective override env vars
- Planner model and orchestrator assembly model
Key constants (9 values)
- MAX_SEARCH_ROUNDS, NOVELTY_THRESHOLD, NOVELTY_EPSILON from agent.py
- PASS_THRESHOLD, MAX_ROUNDS from wiggum.py
- MAX_CONTEXT_OBSERVATIONS, SEMANTIC_CANDIDATES from memory.py
- SUBTASK_MAX_WORKERS, SUBTASK_MAX_RETRIES from orchestrator.py
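As a rough illustration of how a constant can be read with regex alone, here is a minimal sketch. The function name `extract_constant`, the sample source text, and the sample values are all hypothetical; the actual patterns in wiki_sync.py are not shown in this article and may differ.

```python
import re
from typing import Optional

# Hypothetical excerpt standing in for a source file such as wiggum.py.
# The values are placeholders, not the harness's real settings.
SAMPLE_SOURCE = '''
PASS_THRESHOLD = 7.5  # minimum score to pass
MAX_ROUNDS = 3
'''

def extract_constant(source: str, name: str) -> Optional[str]:
    """Return the right-hand side of a top-level `NAME = value` line as text.

    A trailing `# comment` is dropped. This is a sketch: a value that
    itself contains a `#` (inside a string, say) would be cut short.
    """
    pattern = rf"^{re.escape(name)}\s*=\s*(.+?)\s*(?:#.*)?$"
    match = re.search(pattern, source, flags=re.MULTILINE)
    return match.group(1) if match else None

if __name__ == "__main__":
    for constant in ("PASS_THRESHOLD", "MAX_ROUNDS"):
        print(constant, "=", extract_constant(SAMPLE_SOURCE, constant))
```

Returning the raw text rather than a parsed number keeps the wiki faithful to what is written in the source, including expressions such as `60 * 5`.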
Wiggum evaluation weights
- All five scoring dimensions extracted from the EVAL_PROMPT triple-quoted string in wiggum.py
- Sorted by weight descending: depth (0.30), completeness (0.25), relevance (0.20), specificity (0.15), structure (0.10)
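The two steps above (pull the dimensions out of the triple-quoted string, then sort by weight) might look roughly like this. The wording of the sample prompt and the `name (weight N)` line format are assumptions made for the sketch; only the five dimension names and their weights come from this article.

```python
import re

# Hypothetical stand-in for wiggum.py. The real EVAL_PROMPT wording and
# line format are not shown in the article; only the weights are.
SAMPLE_SOURCE = '''
EVAL_PROMPT = """Score the report on each dimension:
- relevance (weight 0.20): stays on topic
- depth (weight 0.30): goes beyond surface claims
- structure (weight 0.10): clear organisation
- completeness (weight 0.25): covers the question
- specificity (weight 0.15): concrete details
"""
'''

def extract_weights(source):
    """Return (dimension, weight) pairs sorted by weight, highest first."""
    block = re.search(r'EVAL_PROMPT\s*=\s*"""(.*?)"""', source, flags=re.DOTALL)
    if block is None:
        return []
    pairs = re.findall(
        r"^- (\w+) \(weight ([0-9.]+)\)", block.group(1), flags=re.MULTILINE
    )
    return sorted(
        ((name, float(weight)) for name, weight in pairs),
        key=lambda pair: pair[1],
        reverse=True,
    )

if __name__ == "__main__":
    for name, weight in extract_weights(SAMPLE_SOURCE):
        print(f"| {name} | {weight:.2f} |")
```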
Memory retrieval ranking formula
- Exact blending coefficients extracted from memory.py via regex
- Quality floor value and deduplication logic documented
- SEMANTIC_CANDIDATES over-fetch count and final injection count
SYNTH_INSTRUCTION
- Current active synthesis prompt extracted from the AUTORESEARCH:SYNTH_INSTRUCTION:BEGIN sentinel block
- Truncated to 600 chars if longer: enough to show structure without overwhelming the wiki
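A sentinel-delimited block can be sliced out and truncated along these lines. Only the BEGIN sentinel is named in this article; the matching END sentinel, the comment-line form of both sentinels, and the function name are assumptions for the sketch.

```python
import re

BEGIN = "# AUTORESEARCH:SYNTH_INSTRUCTION:BEGIN"
END = "# AUTORESEARCH:SYNTH_INSTRUCTION:END"  # assumed counterpart, not confirmed

# Hypothetical stand-in for the sentinel-wrapped region of a source file.
SAMPLE_SOURCE = (
    "# AUTORESEARCH:SYNTH_INSTRUCTION:BEGIN\n"
    'SYNTH_INSTRUCTION = "Write a structured report."\n'
    "# AUTORESEARCH:SYNTH_INSTRUCTION:END\n"
)

def extract_sentinel_block(source, limit=600):
    """Return the text between the sentinels, cut to `limit` characters."""
    pattern = re.escape(BEGIN) + r"\n(.*?)\n" + re.escape(END)
    match = re.search(pattern, source, flags=re.DOTALL)
    if match is None:
        return None
    text = match.group(1).strip()
    return text[:limit]

if __name__ == "__main__":
    print(extract_sentinel_block(SAMPLE_SOURCE))
```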
Model map (Ollama → vLLM)
- _MODEL_MAP literal from inference.py: maps Ollama tags to HuggingFace model IDs
- Used when INFERENCE_BACKEND=vllm to resolve which checkpoint to load
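One way to read a dict literal without importing the module is to capture its text with regex and parse it with `ast.literal_eval`. This is a sketch under assumptions: the map entries below are placeholders, and the pattern expects a flat dict whose closing brace sits at the start of a line.

```python
import ast
import re

# Hypothetical stand-in for inference.py. The tag and model ID are
# placeholders, not the real map contents.
SAMPLE_SOURCE = '''
_MODEL_MAP = {
    "example-model:7b": "example-org/Example-Model-7B",
}
'''

def extract_model_map(source):
    """Parse the `_MODEL_MAP = {...}` literal into a plain dict."""
    match = re.search(
        r"^_MODEL_MAP\s*=\s*(\{.*?^\})",
        source,
        flags=re.MULTILINE | re.DOTALL,
    )
    if match is None:
        return {}
    # literal_eval only accepts Python literals, so no code is executed.
    return ast.literal_eval(match.group(1))

if __name__ == "__main__":
    for tag, model_id in extract_model_map(SAMPLE_SOURCE).items():
        print(f"| {tag} | {model_id} |")
```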
Idempotent marker blocks
The generated section is wrapped in HTML comment markers:
<!-- sync-wiki:start -->
## Implementation Reference
*Auto-generated by `/sync-wiki` on 2026-06-01. Do not edit.*
…tables and extracted values…
<!-- sync-wiki:end -->
On re-run, the regex replaces the entire block between the markers rather than appending. The human-written sections above and below are untouched. This means python wiki_sync.py is safe to run after every significant pipeline change — it brings the wiki current without touching editorial content.
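The replace-rather-than-append behaviour can be sketched as follows. The start marker string is assumed by analogy with the end marker, and `upsert_block` is a hypothetical name; the sketch also appends the block when no markers exist yet, which is one possible choice rather than the skill's confirmed behaviour.

```python
import re

START = "<!-- sync-wiki:start -->"  # assumed by analogy with the end marker
END = "<!-- sync-wiki:end -->"

def upsert_block(wiki_text, generated):
    """Replace the marker block if present, otherwise append it once."""
    block = f"{START}\n{generated.strip()}\n{END}"
    pattern = re.compile(
        re.escape(START) + r".*?" + re.escape(END), flags=re.DOTALL
    )
    if pattern.search(wiki_text):
        # A function replacement avoids backslashes in `generated`
        # being interpreted as regex escapes.
        return pattern.sub(lambda _match: block, wiki_text, count=1)
    return wiki_text.rstrip("\n") + "\n\n" + block + "\n"

if __name__ == "__main__":
    wiki = "# Pipeline\n\nHuman-written overview.\n"
    once = upsert_block(wiki, "## Implementation Reference\n...")
    twice = upsert_block(once, "## Implementation Reference\n...")
    print(once == twice)
```

Running the function twice with the same generated text yields the same document, which is the property that makes repeated syncs safe.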
Gap-targeted sync: code injection from Wiggum failures
The second mode, sync_gaps(issues), is triggered automatically during a Wiggum FAIL cycle. When the evaluator's revision suggestions contain phrases like "unclear how evaluation criteria", "how the synthesis prompt constructs", or "how search results are filtered", the skill identifies which gap pattern was triggered and injects the relevant source code directly into the wiki.
There are nine gap patterns, each with a set of trigger substrings and a list of source code targets to extract:
Figure: Gap-targeted sync. Evaluator issues trigger pattern matching, which extracts live source code into the wiki gaps section.
Planning prompts
Triggers: "plan prompt", "task classification", "complexity analysis". Extracts the PRIOR_KNOWLEDGE_PROMPT and PLAN_PROMPT triple-quoted strings from planner.py.
Evaluation prompt
Triggers: "evaluation criteria", "revision logic", "evaluation loop". Extracts full EVAL_PROMPT string from wiggum.py (up to 40 lines).
Synthesis construction
Triggers: "synthesis prompt", "how LLM is instructed", "formatting rules". Extracts the synthesize() function body from agent.py.
Novelty scoring
Triggers: "novelty", "search filtering", "how results are filtered". Extracts assess_novelty() from memory.py.
ChromaDB / embedding
Triggers: "chromadb", "embedding", "semantic similarity threshold". Extracts _get_chroma_ef() and _get_chroma().
Memory compression
Triggers: "compression model", "how compressed". Extracts compress_and_store() from memory.py (up to 30 lines).
Research gathering loop
Triggers: "gather_research", "saturation loop", "quality floor". Extracts gather_research() from agent.py (up to 35 lines).
Planning — make_plan()
Triggers: "how queries are determined", "plan queries", "complexity score". Extracts make_plan() from planner.py.
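The patterns above amount to a lookup from trigger substrings to extraction targets. A minimal sketch of that matching step is shown below, using two of the listed patterns. The `GAP_PATTERNS` structure, the pattern keys, and the case-insensitive comparison are assumptions; the article does not show how wiki_sync.py stores or compares them.

```python
# Hypothetical table: two of the patterns listed above, in an assumed shape.
GAP_PATTERNS = {
    "evaluation_prompt": {
        "triggers": ["evaluation criteria", "revision logic", "evaluation loop"],
        "targets": [("wiggum.py", "EVAL_PROMPT")],
    },
    "novelty_scoring": {
        "triggers": ["novelty", "search filtering", "how results are filtered"],
        "targets": [("memory.py", "assess_novelty")],
    },
}

def match_gaps(issues):
    """Return the names of patterns whose triggers occur in any issue string."""
    matched = []
    for name, spec in GAP_PATTERNS.items():
        hit = any(
            trigger.lower() in issue.lower()
            for issue in issues
            for trigger in spec["triggers"]
        )
        if hit:
            matched.append(name)
    return matched

if __name__ == "__main__":
    issues = ["Unclear how evaluation criteria are weighted in the loop."]
    for name in match_gaps(issues):
        print(name, "->", GAP_PATTERNS[name]["targets"])
```

Plain substring matching keeps the step deterministic, at the cost of missing paraphrases that avoid every trigger phrase.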
Extracted code appears as fenced Python blocks in a <!-- sync-wiki-gaps:start/end --> marker section, regenerated on each FAIL cycle. This means the wiki always contains the source code most relevant to whatever the evaluator is currently complaining about.
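To make the extraction and rendering concrete, here is a sketch that pulls one function out of a source string, caps it at a line limit, and wraps it in a fenced block between gap markers. It uses Python's `ast` module to locate the function, which is a substitute technique chosen for brevity and not necessarily how wiki_sync.py does it. The sample source, the heading format, and the reading of `start/end` as two separate markers are all assumptions.

```python
import ast

# Hypothetical stand-in for planner.py; the function body is a placeholder.
SAMPLE_SOURCE = '''
def helper():
    return 1

def make_plan(task):
    """Hypothetical stand-in body."""
    steps = [task]
    return steps
'''

FENCE = "`" * 3  # built at runtime so this example nests cleanly in markdown

def extract_function(source, name, max_lines=None):
    """Return the source lines of function `name`, optionally capped."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name == name:
            lines = source.splitlines()[node.lineno - 1 : node.end_lineno]
            if max_lines is not None and len(lines) > max_lines:
                lines = lines[:max_lines] + ["    # ... truncated ..."]
            return "\n".join(lines)
    return None

def render_gap_section(snippets):
    """Wrap {title: code} pairs in fenced blocks between the gap markers."""
    parts = ["<!-- sync-wiki-gaps:start -->"]
    for title, code in snippets.items():
        parts.append(f"### {title}\n\n{FENCE}python\n{code}\n{FENCE}")
    parts.append("<!-- sync-wiki-gaps:end -->")
    return "\n\n".join(parts)

if __name__ == "__main__":
    code = extract_function(SAMPLE_SOURCE, "make_plan", max_lines=30)
    print(render_gap_section({"make_plan()": code}))
```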
Context injection for /contextualize
The get_relevant_wiki_context() function assembles a targeted slice of pipeline.md for the /contextualize pre-research skill. It returns three parts in sequence, capped at 8,000 characters total:
- Human-written body (pipeline diagram + architectural overview, up to 3,000 chars)
- The Implementation Reference marker block (constants, models, weights)
- The Gap-Targeted Extractions block (source code for currently flagged issues)
This gives the producer model accurate function names, current constant values, and (when relevant) actual source code to cite — without dumping the entire wiki into the context window.
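The three-part assembly with its two caps could be sketched like this. The helper names and marker strings are assumptions, and the sketch simply cuts the joined text at the total cap, so a long gaps block may be truncated mid-snippet; the real function may handle that boundary differently.

```python
import re

def marker_block(text, name):
    """Return the `<!-- name:start -->...<!-- name:end -->` span, or ''."""
    pattern = (
        rf"<!-- {re.escape(name)}:start -->.*?<!-- {re.escape(name)}:end -->"
    )
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(0) if match else ""

def assemble_context(wiki_text, body_cap=3000, total_cap=8000):
    """Join capped human-written body, reference block, and gaps block."""
    reference = marker_block(wiki_text, "sync-wiki")
    gaps = marker_block(wiki_text, "sync-wiki-gaps")
    body = wiki_text
    for block in (reference, gaps):
        if block:
            # Strip generated blocks so the body holds only human-written text.
            body = body.replace(block, "")
    parts = [body.strip()[:body_cap], reference, gaps]
    return "\n\n".join(part for part in parts if part)[:total_cap]

if __name__ == "__main__":
    wiki = (
        "# Pipeline\n\nOverview text.\n\n"
        "<!-- sync-wiki:start -->\nconstants\n<!-- sync-wiki:end -->\n\n"
        "<!-- sync-wiki-gaps:start -->\ncode\n<!-- sync-wiki-gaps:end -->\n"
    )
    print(assemble_context(wiki))
```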
The gap-targeted extraction is a lightweight form of automated self-documentation: when the system fails on a gap, it writes the relevant code into the wiki so future runs have better grounding. It doesn't require the model to understand the code — it just needs to recognize trigger phrases in the evaluator's natural-language feedback.