Five Personas, One Veto: Consensus Filtering for Fine-Tuning Data
A dual-gate curation pipeline: papers pass only if the mean score across five LLM personas reaches 3.5 and no single persona scores below a veto floor of 2. The Contrarian is designed to be the hardest to satisfy.
The harness accumulates research papers through two automated paths: an ArXiv fetcher that pulls agentic AI papers by keyword, and the /annotate skill that can ingest any URL or local file. Each paper gets a structured annotation — topic, motivation, contribution, detail, evidence, weaker result, narrow impact, broad impact — produced by the evaluator LLM.
But annotation quality varies, and not every paper in the corpus is worth including in the fine-tuning dataset. A paper might be well-annotated but trivially incremental, or methodologically weak, or too specialized to generalize. curator.py applies a second filter: five independent LLM personas, each with a distinct value system, scoring each paper before it reaches build_finetune_from_annotations.py.
The five personas
Pragmatic Engineer (senior software engineer)
Scores on actionable implementation value. Overly theoretical papers with no practical pathway score low. Specific methods, architectures, or techniques a developer could implement score high.
Academic Rigorist (research scientist)
Scores on methodological soundness: are claims backed by experiments, baselines reasonable, limitations honest? Vague claims or missing evaluations score low regardless of novelty.
Synthesis Thinker (knowledge architect)
Scores on connectivity: does this paper introduce a concept or finding that connects meaningfully to the broader landscape of AI agent research? Narrow or incremental papers with little cross-paper relevance score low.
Contrarian (skeptical reviewer)
Actively looks for overclaiming, trivial contributions, manufactured problems. Only genuinely novel, well-scoped contributions score high. Designed to push back where the other personas might accept.
Newcomer (ML student)
Scores on accessibility and field-entry value. Highly specialised or prerequisite-heavy papers score low. Papers that illuminate a key idea or open problem clearly — something worth reading on the way into the field — score high.
Each persona is implemented as a distinct system prompt. The same user prompt — paper title plus formatted annotation — is sent to all five in sequence at temperature 0.2. The response format is strict: exactly two lines, SCORE: <1-5> and REASON: <one sentence>.
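The strict two-line response format lends itself to a small parser. The following is a minimal sketch, not the code in curator.py: the function name parse_persona_response and the choice to return None for an unparseable score are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Patterns for the two expected lines. Both names are hypothetical.
SCORE_RE = re.compile(r"^\s*SCORE:\s*([1-5])\s*$", re.MULTILINE)
REASON_RE = re.compile(r"^\s*REASON:\s*(.+?)\s*$", re.MULTILINE)


def parse_persona_response(raw: str) -> Tuple[Optional[int], str]:
    """Extract (score, reason) from a persona reply.

    Assumption: a missing or out-of-range score is reported as None,
    and a missing REASON line falls back to the first 120 characters
    of the raw output.
    """
    score_match = SCORE_RE.search(raw)
    reason_match = REASON_RE.search(raw)
    score = int(score_match.group(1)) if score_match else None
    reason = reason_match.group(1) if reason_match else raw.strip()[:120]
    return score, reason


if __name__ == "__main__":
    print(parse_persona_response("SCORE: 4\nREASON: Clear method."))
```

Returning None rather than a default score keeps a malformed reply from silently counting toward the mean; how the real curator treats that case is not stated in this article.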
The dual gate
A paper passes curation only if it satisfies both conditions simultaneously:
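Pass conditions
The mean score across the five personas reaches 3.5.
No single persona scores below the veto floor of 2.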
The veto floor is the more interesting constraint. A paper that earns 4, 4, 4, 4, 1 has a mean of 3.4, just short of the 3.5 threshold, and the single score of 1 from the Contrarian vetoes it outright regardless of the mean. The veto matters most when the mean would otherwise pass: scores of 5, 5, 5, 4, 1 average 4.0, yet the lone 1 still rejects the paper. A paper that earns 4, 4, 4, 3, 3 has a mean of 3.6 and no veto, so it passes. The gate encodes the intuition that a single strong objection from one perspective should be disqualifying, even if the rest of the committee approves.
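The two conditions can be expressed as one small function. This is an illustrative sketch under stated assumptions: the name dual_gate is hypothetical, and the veto is taken to mean a score strictly below the floor, following the wording of this article. Whether curator.py uses a strict or an inclusive comparison is not confirmed here.

```python
from statistics import mean
from typing import Dict, List, Tuple


def dual_gate(
    scores: Dict[str, int],
    mean_threshold: float = 3.5,
    veto_floor: int = 2,
) -> Tuple[bool, float, List[str]]:
    """Return (passed, mean, veto_by) for one paper.

    Assumption: a persona vetoes when its score is strictly below
    veto_floor; the mean gate passes when the mean is >= mean_threshold.
    """
    avg = round(mean(scores.values()), 2)
    veto_by = [name for name, score in scores.items() if score < veto_floor]
    passed = avg >= mean_threshold and not veto_by
    return passed, avg, veto_by


if __name__ == "__main__":
    names = ["Pragmatic Engineer", "Academic Rigorist",
             "Synthesis Thinker", "Newcomer", "Contrarian"]
    print(dual_gate(dict(zip(names, [4, 4, 4, 4, 1]))))
    print(dual_gate(dict(zip(names, [4, 4, 4, 3, 3]))))
```

Keeping veto_by as a list rather than a boolean preserves which perspective objected, which is what makes per-persona diagnostics possible later.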
Both thresholds are configurable via command-line flags: --mean-threshold and --veto-floor. A lenient run uses --mean-threshold 3.0 --veto-floor 1; a strict run could use --mean-threshold 4.0 --veto-floor 3.
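A command-line interface with these flags could be wired up with argparse roughly as follows. This is a sketch, not the actual parser in curator.py: the flag names come from this article, while the default values of 3.5 and 2, the help strings, and the build_parser name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in the article."""
    parser = argparse.ArgumentParser(prog="curator.py")
    parser.add_argument("--mean-threshold", type=float, default=3.5,
                        help="minimum mean score across the five personas")
    parser.add_argument("--veto-floor", type=int, default=2,
                        help="scores below this value veto the paper")
    parser.add_argument("--input", default=None,
                        help="score a single annotated CSV instead of all")
    parser.add_argument("--stats", action="store_true",
                        help="summarize the existing log and exit")
    return parser


# The lenient and strict runs described above, parsed from explicit argv lists.
lenient = build_parser().parse_args(["--mean-threshold", "3.0", "--veto-floor", "1"])
strict = build_parser().parse_args(["--mean-threshold", "4.0", "--veto-floor", "3"])

if __name__ == "__main__":
    print(lenient.mean_threshold, lenient.veto_floor)
    print(strict.mean_threshold, strict.veto_floor)
```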
Curation pipeline
Reads the arxiv_*_annotated.csv files in the harness directory (or a single file via --input). Each row is one paper with 8 annotation columns.
Loads curation_log.jsonl and skips any paper whose arxiv_id already has a decision. Re-running the curator on a partially-processed batch costs nothing for already-evaluated papers.
Parses the SCORE: and REASON: lines from each persona response with a regex; falls back to the first 120 characters of raw output if parsing fails.
Appends passed, mean, and veto_by (the list of persona names that vetoed) to curation_log.jsonl.
Writes arxiv_*_curated.csv. The next stage, build_finetune_from_annotations.py, reads *_curated.csv instead of *_annotated.csv.
Output format
Each paper's decision is appended to curation_log.jsonl as a JSON object:
{
"arxiv_id": "2401.12345",
"title": "2401.12345",
"scores": [
{"persona": "Pragmatic Engineer", "score": 4, "reason": "Provides concrete ReAct loop implementation details..."},
{"persona": "Academic Rigorist", "score": 3, "reason": "Evaluation uses reasonable baselines but no ablation..."},
{"persona": "Synthesis Thinker", "score": 4, "reason": "The tool-calling taxonomy connects directly to..."},
{"persona": "Contrarian", "score": 2, "reason": "Claims novelty over prior work but the distinction..."},
{"persona": "Newcomer", "score": 4, "reason": "Clearly explains the problem before the solution..."}
],
"mean": 3.4,
"passed": false,
"veto_by": ["Contrarian"],
"tokens_in": 1840,
"tokens_out": 120
}
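The veto_by field lists the personas whose scores triggered the veto. In this example the Contrarian's score of 2 is recorded as a veto, which is consistent with the gate only if a score at the floor counts as a veto or the run used a floor above 2 (for example --veto-floor 3). The mean of 3.4 also falls just short of the 3.5 threshold, so this paper would fail on the mean gate alone.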
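Because the log is append-only JSONL keyed by arxiv_id, resuming a partial run reduces to collecting the ids already present. The sketch below illustrates the idea; load_decided_ids and append_decision are hypothetical helper names, not functions known to exist in curator.py.

```python
import json
from pathlib import Path
from typing import Set


def load_decided_ids(log_path: Path) -> Set[str]:
    """Collect arxiv_ids that already have a decision in the log."""
    decided: Set[str] = set()
    if not log_path.exists():
        return decided
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                decided.add(json.loads(line)["arxiv_id"])
    return decided


def append_decision(log_path: Path, record: dict) -> None:
    """Append one decision as a single JSON line."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def pending(paper_ids, log_path: Path):
    """Return the ids that still need scoring, preserving input order."""
    decided = load_decided_ids(log_path)
    return [pid for pid in paper_ids if pid not in decided]
```

Writing one complete JSON object per line means an interrupted run loses at most the paper in flight, and the next run picks up from the remaining ids.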
Stats and diagnostics
Running python curator.py --stats reads the existing log without scoring any new papers:
Curation log: 148 papers | 89 passed | 59 failed | 23 vetoed
Veto counts by persona:
Contrarian: 18
Academic Rigorist: 7
Pragmatic Engineer: 3
Synthesis Thinker: 2
Newcomer: 1
The Contrarian leading the veto count is the expected behavior — it was designed to be the hardest gatekeeper. A high Academic Rigorist veto count typically signals that the annotation pipeline is accepting papers with weak evaluation sections; a high Newcomer count suggests the corpus is drifting toward over-specialized material.
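A summary of this shape can be derived from the log records alone. The following is a sketch under assumptions: summarize and format_header are hypothetical names, and the records are assumed to carry the passed and veto_by fields shown in the log example. Note that per-persona counts may sum to more than the number of vetoed papers, since several personas can veto the same paper; in the sample output above the persona counts total 31 against 23 vetoed papers.

```python
from collections import Counter
from typing import Dict, Iterable, List


def summarize(records: Iterable[Dict]) -> Dict:
    """Aggregate pass/fail/veto totals and per-persona veto counts."""
    records = list(records)
    total = len(records)
    passed = sum(1 for r in records if r["passed"])
    vetoed = sum(1 for r in records if r["veto_by"])  # papers with >= 1 veto
    veto_counts = Counter(name for r in records for name in r["veto_by"])
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "vetoed": vetoed,
        "veto_counts": veto_counts,
    }


def format_header(summary: Dict) -> str:
    """Render the first line of the stats report."""
    return (f"Curation log: {summary['total']} papers | "
            f"{summary['passed']} passed | {summary['failed']} failed | "
            f"{summary['vetoed']} vetoed")


def format_veto_lines(summary: Dict) -> List[str]:
    """Render per-persona veto counts, most frequent first."""
    return [f"{name}: {count}"
            for name, count in summary["veto_counts"].most_common()]
```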
Why five personas instead of one evaluator
A single LLM evaluator with a general rubric tends to produce consistent but undifferentiated scores: a well-written annotation of any reasonable paper gets 3.5–4.0. The five-persona design forces the model to inhabit contradictory value systems, one after another. The Contrarian prompt in particular explicitly instructs the model to find reasons to reject rather than reasons to accept, which is intended to counter the sycophantic pattern that LLMs tend to fall into when asked to evaluate content they have just read.
The veto mechanism is equivalent to requiring that no single perspective is completely alienated by the paper. A paper that scores 5 from the engineer but 1 from the rigorist is probably a well-written blog post masquerading as research — the kind of thing that would corrupt the fine-tuning signal if included.
The curator feeds directly into the DPO training pipeline. For more on how curated annotations become preference pairs, see The Fine-tune View.