The Interview Skills: /grill-me and /onboarding
/grill-me is a saturation-driven user interview that mirrors the research pipeline's gather loop — questions are planned by an LLM, answers are novelty-gated, and knowledge accumulates through the same compress_knowledge() function used in research runs. /onboarding extends it with a three-question fixed scaffold, persistent TOML config, and ChromaDB memory seeding.
Both skills solve the same problem the research pipeline solves: how to stop gathering information at the right time. Too few questions and the knowledge brief is shallow. Too many and the user fatigues. The novelty gate from gather_research() — stop when marginal information value drops below threshold — applies equally well to human interview answers as it does to web search results. The implementation reuses the same functions.
The research loop analogy
plan_query() — LLM generates next search queryplan_question() — LLM generates next interview questionask_fn(question) — returns user's answer stringassess_novelty(results, context) — score 0–10assess_novelty(packet, knowledge_state) — same functioncompress_knowledge() — roll new results into statecompress_knowledge() — same function, same signatureThe fatigue detector mirrors the oversearch detector: if the user gives two consecutive answers under 15 words, the loop treats this as a signal that they're done or disengaged and stops. The novelty gate fires only after the minimum round count (3) — early rounds are allowed even if they score low, because the knowledge state is empty and the LLM can't assess novelty meaningfully against nothing.
/grill-me: the general-purpose interview
# Interview about a goal, output a brief
python agent.py "/grill-me understand my research needs for a lit review on RL"
# Disable novelty gate, run all 8 rounds
python agent.py "/grill-me understand my codebase --thorough"
# Tailor questions toward a specific output type
python agent.py "/grill-me plan my Q3 goals --for deck"
python agent.py "/grill-me profile my use case --for research"
The first question in every session is always the same fixed opener: "Tell me about your goal: [goal] — what are you trying to accomplish and why?" This anchors the knowledge state before the LLM starts planning follow-up questions from it. From round 2 onward, plan_question() receives the current knowledge state (capped at 1,200 chars) and generates one targeted question. The temperature is 0.4 — higher than synthesis — to encourage varied questions across sessions with similar goals.
The --for <skill> flag passes a target_skill string into the question prompt and brief synthesis prompt, steering the questions toward what a specific skill needs. --for deck prompts questions about slide count, audience, tone, and key message; --for research steers toward sources, depth, and scope constraints.
The output is a structured markdown brief written to data/briefs/<slug>-<timestamp>.md:
## Context
## Goals
## Constraints & non-goals
## Open questions
## Suggested next steps
/onboarding: persistent personalization
# First run — starts the interview if .harness-user.toml is absent
python agent.py "/onboarding"
# Re-run — additive update, merges new answers with existing config
python agent.py "/onboarding"
/onboarding auto-triggers when .harness-user.toml doesn't exist — the harness checks at startup and runs the interview interactively before the first task. Re-running is safe: answers are merged additively, new values overwrite old ones, and absent keys from the new run are preserved from the existing config.
The interview runs in two phases:
Three questions always asked in order: role and primary domain; main use cases; preferred output format. These populate the TOML config skeleton regardless of what else the user says.
Calls
plan_question() from grill_me_skill with goal "onboarding: learn about this user". Novelty gate and fatigue detector both active. Stops early if answers become repetitive or short.Persistent outputs
After the interview, synthesize_and_write() calls the LLM once to produce two artifacts from the accumulated knowledge state:
.harness-user.toml — machine-readable config loaded by the harness at startup:
[user]
role = "ML researcher"
domain = "agentic systems"
preferred_model = "qwen3.6-35b"
preferred_format = "markdown with headers"
verbosity = "detailed"
[routing]
research_tasks = "always use /deep"
coding_tasks = "add /cite for attribution"
data/user_profile.md — human-readable profile with five sections: Background, Primary use cases, Output preferences, Domain expertise, Notes.
After writing both files, _seed_memory() splits the knowledge state into sentences and upserts them into a user_context ChromaDB collection. This makes user context available for semantic retrieval during the /contextualize skill — when the agent handles self-referential tasks, it can retrieve relevant user context alongside the wiki pages.
The TOML merge strategy preserves old keys absent from the new run — it never overwrites with empty strings. This means partial re-runs are safe: if the user skips the output preferences question, the existing preference is kept. Only non-empty new values replace old ones.
/grill-me uses the COMPRESS_MODEL environment variable for question planning and brief synthesis, defaulting to HARNESS_PRODUCER_MODEL. For interactive interviews the question planning calls are fast — question generation is capped at 120 tokens — but the brief synthesis at the end takes a full inference call at up to 800 tokens.