June 1, 2026 • 6 min read • Agentic Harness Engineering

The Interview Skills: /grill-me and /onboarding

/grill-me is a saturation-driven user interview that mirrors the research pipeline's gather loop — questions are planned by an LLM, answers are novelty-gated, and knowledge accumulates through the same compress_knowledge() function used in research runs. /onboarding extends it with a three-question fixed scaffold, persistent TOML config, and ChromaDB memory seeding.

Both skills solve the same problem the research pipeline solves: how to stop gathering information at the right time. Too few questions and the knowledge brief is shallow. Too many and the user fatigues. The novelty gate from gather_research() (stop when marginal information value drops below a threshold) applies to human interview answers just as well as it does to web search results, so the implementation reuses the same functions.

The research loop analogy

| Research loop | Interview loop |
|---|---|
| plan_query(): LLM generates next search query | plan_question(): LLM generates next interview question |
| DDGS web search: returns result documents | ask_fn(question): returns user's answer string |
| assess_novelty(results, context): score 0–10 | assess_novelty(packet, knowledge_state): same function |
| compress_knowledge(): roll new results into state | compress_knowledge(): same function, same signature |
| Oversearch detector: stop if no new queries possible | Fatigue detector: stop after 2 consecutive short answers |

The fatigue detector mirrors the oversearch detector: if the user gives two consecutive answers under 15 words, the loop treats this as a signal that they're done or disengaged and stops. The novelty gate fires only after the minimum round count (3) — early rounds are allowed even if they score low, because the knowledge state is empty and the LLM can't assess novelty meaningfully against nothing.
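
The two stop conditions can be sketched as a single decision function. This is a minimal illustration, not the harness's actual code: should_stop, StopDecision, and NOVELTY_THRESHOLD are hypothetical names, the threshold value is assumed (the post does not state it), and "after the minimum round count" is read here as "from round 3 onward". Whether the fatigue check also waits for the minimum round count is not stated, so this sketch lets it fire at any time.

```python
from dataclasses import dataclass

MIN_ROUNDS = 3           # novelty gate is inactive before this round (from the text)
SHORT_ANSWER_WORDS = 15  # answers under this many words count as short (from the text)
FATIGUE_STREAK = 2       # consecutive short answers that end the interview (from the text)
NOVELTY_THRESHOLD = 4    # ASSUMED cutoff on the 0 to 10 novelty scale


@dataclass
class StopDecision:
    stop: bool
    reason: str = ""


def should_stop(round_no: int, answers: list[str], novelty: int) -> StopDecision:
    """Decide whether to end the interview after round `round_no` (1-based)."""
    # Fatigue detector: only the most recent answers matter.
    recent = answers[-FATIGUE_STREAK:]
    if len(recent) == FATIGUE_STREAK and all(
        len(answer.split()) < SHORT_ANSWER_WORDS for answer in recent
    ):
        return StopDecision(True, "fatigue")

    # Novelty gate: skipped in early rounds, when the knowledge state is
    # too thin for a novelty score to mean much.
    if round_no >= MIN_ROUNDS and novelty < NOVELTY_THRESHOLD:
        return StopDecision(True, "low_novelty")

    return StopDecision(False)
```

Checking fatigue first means a disengaged user ends the session even when a terse answer happens to score as novel.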

/grill-me: the general-purpose interview

# Interview about a goal, output a brief
python agent.py "/grill-me understand my research needs for a lit review on RL"

# Disable novelty gate, run all 8 rounds
python agent.py "/grill-me understand my codebase --thorough"

# Tailor questions toward a specific output type
python agent.py "/grill-me plan my Q3 goals --for deck"
python agent.py "/grill-me profile my use case --for research"

The first question in every session is always the same fixed opener: "Tell me about your goal: [goal] — what are you trying to accomplish and why?" This anchors the knowledge state before the LLM starts planning follow-up questions from it. From round 2 onward, plan_question() receives the current knowledge state (capped at 1,200 chars) and generates one targeted question. The temperature is 0.4 — higher than synthesis — to encourage varied questions across sessions with similar goals.
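
A rough sketch of how the round logic above might fit together. The names next_question, planner_context, and plan_fn are hypothetical; plan_fn stands in for plan_question(), whose real signature the post does not show. Truncating from the start of the knowledge state is also an assumption, since the post only gives the 1,200-character cap.

```python
from typing import Callable

KNOWLEDGE_CAP_CHARS = 1200  # cap on the knowledge state sent to the planner (from the text)
PLANNER_TEMPERATURE = 0.4   # question-planning temperature (from the text)


def planner_context(knowledge_state: str, cap: int = KNOWLEDGE_CAP_CHARS) -> str:
    """Trim the knowledge state to the planner's character budget.

    ASSUMPTION: keeps the first `cap` characters; the real code may trim differently.
    """
    return knowledge_state[:cap]


def next_question(
    round_no: int,
    opener: str,
    knowledge_state: str,
    plan_fn: Callable[[str, float], str],
) -> str:
    """Return the fixed opener in round 1, a planned question afterwards."""
    if round_no == 1:
        # Round 1 never calls the LLM: the opener anchors the knowledge state.
        return opener
    # plan_fn is a hypothetical stand-in for plan_question().
    return plan_fn(planner_context(knowledge_state), PLANNER_TEMPERATURE)
```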

The --for <skill> flag passes a target_skill string into the question prompt and brief synthesis prompt, steering the questions toward what a specific skill needs. --for deck prompts questions about slide count, audience, tone, and key message; --for research steers toward sources, depth, and scope constraints.
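
For illustration, here is one way the command string could be split into a goal and its flags. This is a sketch under assumptions: parse_grill_me and GrillMeArgs are hypothetical names, and the real harness may parse its arguments quite differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GrillMeArgs:
    goal: str
    thorough: bool = False              # --thorough: disable the novelty gate
    target_skill: Optional[str] = None  # --for <skill>: steer questions and brief


def parse_grill_me(command: str) -> GrillMeArgs:
    """Split a '/grill-me ...' command into goal text and flags (hypothetical helper)."""
    tokens = command.split()
    if not tokens or tokens[0] != "/grill-me":
        raise ValueError("not a /grill-me command")

    goal_words: list[str] = []
    thorough = False
    target_skill: Optional[str] = None
    i = 1
    while i < len(tokens):
        token = tokens[i]
        if token == "--thorough":
            thorough = True
        elif token == "--for":
            if i + 1 >= len(tokens):
                raise ValueError("--for needs a skill name")
            target_skill = tokens[i + 1]
            i += 1  # skip the skill name we just consumed
        else:
            goal_words.append(token)
        i += 1
    return GrillMeArgs(" ".join(goal_words), thorough, target_skill)
```

Only the exact token --for is treated as a flag, so a goal such as "understand my research needs for a lit review on RL" keeps its plain "for" in the goal text.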

The output is a structured markdown brief written to data/briefs/<slug>-<timestamp>.md:

## Context
## Goals
## Constraints & non-goals
## Open questions
## Suggested next steps
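
The data/briefs/<slug>-<timestamp>.md path could be built along these lines. Both the slug rule and the timestamp format are assumptions made for this sketch; the post gives only the overall pattern, and slugify and brief_path are hypothetical names.

```python
import re
from datetime import datetime
from pathlib import Path


def slugify(goal: str, max_len: int = 40) -> str:
    """Lowercase the goal and collapse non-alphanumeric runs into hyphens (assumed rule)."""
    slug = re.sub(r"[^a-z0-9]+", "-", goal.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "brief"


def brief_path(goal: str, now: datetime, root: Path = Path("data/briefs")) -> Path:
    """Build the output path; the timestamp format here is an assumption."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return root / f"{slugify(goal)}-{stamp}.md"
```

Passing the clock in as an argument keeps the function deterministic, which makes it easy to test.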

/onboarding: persistent personalization

# First run — starts the interview if .harness-user.toml is absent
python agent.py "/onboarding"

# Re-run — additive update, merges new answers with existing config
python agent.py "/onboarding"

/onboarding auto-triggers when .harness-user.toml doesn't exist: the harness checks at startup and runs the interview interactively before the first task. Re-running is safe: answers are merged additively, non-empty new values overwrite old ones, and keys absent from the new run are preserved from the existing config.

The interview runs in two phases:

Phase 1 — fixed scaffold (rounds 1–3, no novelty gate)
Three questions always asked in order: role and primary domain; main use cases; preferred output format. These populate the TOML config skeleton regardless of what else the user says.

Phase 2 — free-form novelty-gated (rounds 4–6)
Calls plan_question() from grill_me_skill with goal "onboarding: learn about this user". Novelty gate and fatigue detector both active. Stops early if answers become repetitive or short.
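
The two phases can be sketched as one loop with a fixed prefix. Treat this as an illustration only: run_onboarding, plan_fn, and should_stop_fn are hypothetical names standing in for the real planner and stop checks, and the scaffold wording is assumed because the post names only the three topics.

```python
from typing import Callable

# ASSUMED wording; the post only lists the three topics.
SCAFFOLD = [
    "What is your role and primary domain?",
    "What are your main use cases?",
    "What output format do you prefer?",
]
PHASE2_GOAL = "onboarding: learn about this user"  # from the text
MAX_ROUNDS = 6                                     # phase 2 covers rounds 4 to 6

Transcript = list[tuple[str, str]]


def run_onboarding(
    ask_fn: Callable[[str], str],
    plan_fn: Callable[[str, Transcript], str],
    should_stop_fn: Callable[[Transcript], bool],
) -> Transcript:
    """Run the fixed scaffold, then up to three gated free-form rounds."""
    transcript: Transcript = []

    # Phase 1: always ask all scaffold questions, with no stop checks.
    for question in SCAFFOLD:
        transcript.append((question, ask_fn(question)))

    # Phase 2: planned questions, ended early by novelty or fatigue checks.
    for _ in range(len(SCAFFOLD), MAX_ROUNDS):
        question = plan_fn(PHASE2_GOAL, transcript)
        transcript.append((question, ask_fn(question)))
        if should_stop_fn(transcript):
            break

    return transcript
```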

Persistent outputs

After the interview, synthesize_and_write() calls the LLM once to produce two artifacts from the accumulated knowledge state:

.harness-user.toml — machine-readable config loaded by the harness at startup:

[user]
role             = "ML researcher"
domain           = "agentic systems"
preferred_model  = "qwen3.6-35b"
preferred_format = "markdown with headers"
verbosity        = "detailed"

[routing]
research_tasks = "always use /deep"
coding_tasks   = "add /cite for attribution"

data/user_profile.md — human-readable profile with five sections: Background, Primary use cases, Output preferences, Domain expertise, Notes.

After writing both files, _seed_memory() splits the knowledge state into sentences and upserts them into a user_context ChromaDB collection. This makes user context available for semantic retrieval during the /contextualize skill — when the agent handles self-referential tasks, it can retrieve relevant user context alongside the wiki pages.
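
A minimal sketch of the seeding step, assuming a naive regex sentence splitter and content-hash IDs. Neither detail is confirmed by the post, and split_sentences and seed_memory are hypothetical names. The collection argument is anything with a ChromaDB-style upsert(ids=..., documents=...) method.

```python
import hashlib
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace (assumed rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part.strip() for part in parts if part.strip()]


def seed_memory(collection, knowledge_state: str) -> int:
    """Upsert each distinct sentence into the collection; return how many were sent."""
    # Drop repeated sentences while keeping order, so IDs are unique within the call.
    sentences = list(dict.fromkeys(split_sentences(knowledge_state)))
    if not sentences:
        return 0
    # ASSUMPTION: content-derived IDs, so re-running updates rather than duplicates.
    ids = [hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in sentences]
    collection.upsert(ids=ids, documents=sentences)
    return len(sentences)
```

With the real library, the collection would typically come from something like chromadb.PersistentClient(path=...).get_or_create_collection("user_context"); the storage path is left open here because the post does not give one.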

The TOML merge strategy preserves old keys absent from the new run — it never overwrites with empty strings. This means partial re-runs are safe: if the user skips the output preferences question, the existing preference is kept. Only non-empty new values replace old ones.
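
The merge rule can be expressed as a small recursive function over the parsed config. This is a sketch of the behavior described above, not the harness's code: merge_config and is_empty are hypothetical names, and treating whitespace-only strings as empty is an assumption.

```python
def is_empty(value) -> bool:
    """True for None and for blank strings (whitespace-only counts as blank: assumed)."""
    return value is None or (isinstance(value, str) and not value.strip())


def merge_config(old: dict, new: dict) -> dict:
    """Return old updated with new, never replacing a value with an empty one."""
    merged = dict(old)  # shallow copy; nested tables are rebuilt below when touched
    for key, value in new.items():
        if isinstance(value, dict):
            # Merge TOML tables such as [user] and [routing] key by key.
            base = merged.get(key)
            merged[key] = merge_config(base if isinstance(base, dict) else {}, value)
        elif not is_empty(value):
            merged[key] = value
        # Empty values fall through: the old value, if any, is kept.
    return merged
```

For example, merging a re-run that changes role but leaves preferred_format blank updates the role and keeps the existing format. On Python 3.11+ the existing file can be read with the standard-library tomllib; tomllib is read-only, so writing the merged result back needs a separate TOML writer.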

/grill-me uses the COMPRESS_MODEL environment variable for question planning and brief synthesis, defaulting to HARNESS_PRODUCER_MODEL. For interactive interviews the question planning calls are fast — question generation is capped at 120 tokens — but the brief synthesis at the end takes a full inference call at up to 800 tokens.
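
The model fallback might look like this. The sketch assumes HARNESS_PRODUCER_MODEL is also read from the environment, which the post does not confirm, and resolve_compress_model is a hypothetical name.

```python
import os
from typing import Mapping

QUESTION_MAX_TOKENS = 120  # cap for question generation (from the text)
BRIEF_MAX_TOKENS = 800     # cap for the final brief synthesis (from the text)


def resolve_compress_model(env: Mapping[str, str] = os.environ) -> str:
    """Prefer COMPRESS_MODEL, then fall back to HARNESS_PRODUCER_MODEL."""
    # ASSUMPTION: both names are environment variables.
    model = env.get("COMPRESS_MODEL") or env.get("HARNESS_PRODUCER_MODEL")
    if not model:
        raise RuntimeError("set COMPRESS_MODEL or HARNESS_PRODUCER_MODEL")
    return model
```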
