May 23, 2026 • 5 min read • Agentic Harness Engineering

The MCP View: Exposing the Harness as a Tool Server

Three tools — run_task, run_orchestrated, get_run — make the full harness research pipeline callable from any Model Context Protocol client, with a live task log auto-refreshing every 5 seconds.

The Model Context Protocol (MCP) is a standard interface for exposing tool capabilities to LLM-based clients. Claude Desktop, Claude Code, and any MCP-compatible agent can call registered tools by name with typed parameters. The harness implements an MCP server alongside its REST API, which means the entire research pipeline — planner, search, synthesis, Wiggum evaluation loop — is callable as a tool from another agent.

The MCP view surfaces the registered tool manifest and a live log of every tool invocation that has been dispatched through the MCP interface.

Figure: MCP view. The TOOLS (3) section shows three registered tool cards (run_task, run_orchestrated, get_run), each with parameter badges (indigo = required, dim = optional) and a description. The TASK LOG section shows 0 entries because no MCP-originated calls have been made in this session.

The Three Tools

run_task (task, api_key)

Run a single research or synthesis task through the harness agent pipeline. The task parameter is the same natural-language task string accepted by the CLI and Submit view — it can include skill flags like /deep or /cite, and should include a .md output path. api_key is optional; when omitted, the server uses its configured default key. Returns the queued item ID synchronously; the run executes asynchronously.

run_orchestrated (task, api_key)

Run a complex task through the orchestrator, which decomposes it into subtasks, runs them concurrently (up to subtask_max_workers=4), and aggregates the results before synthesis. Use this for tasks that benefit from parallel research across multiple angles; the MCP caller gets back a single synthesis once all subtasks complete.
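The decompose, fan-out, aggregate flow described above can be sketched with a bounded thread pool. This is a minimal illustration, not the harness's orchestrator: decompose, run_subtask, and synthesize are hypothetical stand-ins, and only the worker cap of 4 comes from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

SUBTASK_MAX_WORKERS = 4  # mirrors subtask_max_workers=4 from the text


def decompose(task: str) -> List[str]:
    """Hypothetical planner: split a task into independent research angles."""
    angles = ["background", "trade-offs", "benchmarks"]
    return [f"{task} ({angle})" for angle in angles]


def run_subtask(subtask: str) -> str:
    """Hypothetical stand-in for one research run; returns its findings."""
    return f"findings for: {subtask}"


def synthesize(task: str, findings: List[str]) -> str:
    """Hypothetical aggregation step: merge subtask findings into one report."""
    body = "\n".join(f"- {item}" for item in findings)
    return f"# {task}\n{body}"


def run_orchestrated_sketch(task: str) -> str:
    subtasks = decompose(task)
    # pool.map preserves input order, so findings line up with subtasks.
    with ThreadPoolExecutor(max_workers=SUBTASK_MAX_WORKERS) as pool:
        findings = list(pool.map(run_subtask, subtasks))
    return synthesize(task, findings)


report = run_orchestrated_sketch("compare three RAG architectures")
```

The pool size caps concurrency no matter how many subtasks the planner emits, which is the role a setting like subtask_max_workers would play.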

get_run (run_id)

Retrieve a run record from data/runs.jsonl by run ID. Returns a JSON summary: task string, final score, pass/fail status, duration, producer model, and output path. This is the read side of the MCP interface — poll after run_task to check whether a submitted run has completed and what it scored.
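A lookup against a JSON Lines file might look like the sketch below. The field name run_id and the choice to let the latest matching record win are assumptions for illustration; the post does not document the actual schema of data/runs.jsonl.

```python
import json
from pathlib import Path
from typing import Optional


def find_run(runs_path: Path, run_id: str) -> Optional[dict]:
    """Return the most recent record whose run_id matches, or None."""
    if not runs_path.exists():
        return None
    match = None
    with runs_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            # "run_id" is an assumed field name, not confirmed by the post.
            if record.get("run_id") == run_id:
                match = record  # keep scanning so a later record wins
    return match
```

Calling find_run(Path("data/runs.jsonl"), some_id) returns None both when the file is missing and when no record matches, so a caller can treat either case as "not finished yet".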

Parameter badges use two visual states: an indigo background for required parameters and a dim background for optional ones. This matches the convention in the harness Runs view and makes each tool's signature legible at a glance, before you read the description.
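The required/optional distinction behind the badges can be modeled as plain data. The sketch below is hypothetical: the Param and Tool classes are illustrative, and the flags are inferred from the descriptions above (only api_key on run_task is explicitly stated to be optional; the others are assumptions).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Param:
    name: str
    required: bool  # True would render as indigo, False as dim


@dataclass(frozen=True)
class Tool:
    name: str
    params: Tuple[Param, ...]


# Flags inferred from the tool descriptions; treat them as assumptions.
TOOLS = (
    Tool("run_task", (Param("task", True), Param("api_key", False))),
    Tool("run_orchestrated", (Param("task", True), Param("api_key", False))),
    Tool("get_run", (Param("run_id", True),)),
)


def signature(tool: Tool) -> str:
    """Render a signature, marking optional parameters with a trailing '?'."""
    parts = [p.name if p.required else f"{p.name}?" for p in tool.params]
    return f"{tool.name}({', '.join(parts)})"
```

Here signature(TOOLS[0]) yields "run_task(task, api_key?)", a text equivalent of what the badge colors convey.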

Usage Pattern: Task → Poll

The intended MCP usage pattern from a client like Claude Desktop:

# 1. Submit a task; run_task returns an item_id immediately
result = run_task(task="Research DPO fine-tuning best practices and save to ~/Desktop/dpo.md")
item_id = result["item_id"]

# 2. Poll for completion using get_run (or wait for the webhook)
run = get_run(run_id=item_id)
# run["final"]: "PASS" | "FAIL" | "ERROR"
# run["wiggum_scores"]: e.g. [8.3]
# run["output_path"]: "~/Desktop/dpo.md"

The MCP server doesn't block on task completion — run_task enqueues and returns immediately, matching the behavior of the REST queue endpoint. The MCP client is responsible for deciding whether to poll with get_run, listen for a webhook (CLAUDE_WEBHOOK_URL), or simply wait a fixed duration before requesting results.
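For a client that chooses polling, a small helper with an interval and a deadline keeps the loop bounded. This is a sketch under assumptions: get_run is passed in as a callable, and it is assumed to return None (or a record without a final status) until the run finishes. The default interval and timeout values are arbitrary.

```python
import time
from typing import Callable, Optional

FINAL_STATES = {"PASS", "FAIL", "ERROR"}


def poll_for_run(
    get_run: Callable[[str], Optional[dict]],
    run_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_run until the record has a final status, or time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        run = get_run(run_id)
        if run is not None and run.get("final") in FINAL_STATES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} did not finish within {timeout_s}s")
        sleep(interval_s)
```

Injecting sleep makes the helper easy to exercise without real delays, and the monotonic clock keeps the deadline stable if the system clock changes mid-poll.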

This project's own development loop uses the same pattern: Claude Code calls run_task to execute a research synthesis, waits for the run to complete, then calls get_run to retrieve the score and output path, all without leaving the coding environment.

The Task Log

The lower section of the MCP view is a live task log, auto-refreshing every 5 seconds. Each entry shows a timestamp (HH:MM:SS), a label identifying the source tool call, an event type (start / done / fail / line), and the event text. The current log shows 0 entries because no MCP-originated calls have been made in the current server session — the pipeline runs shown in the Analytics view were all submitted via the CLI or Submit view, not via MCP.
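The four fields of a log entry map naturally onto a small record type. The class below is a hypothetical sketch of that shape; the field names and the rendered layout are assumptions, and only the HH:MM:SS timestamp and the four event types come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime

EVENT_TYPES = ("start", "done", "fail", "line")


@dataclass(frozen=True)
class LogEntry:
    when: datetime
    label: str  # identifies the source tool call
    event: str  # one of EVENT_TYPES
    text: str

    def __post_init__(self) -> None:
        if self.event not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event!r}")

    def render(self) -> str:
        """Format the entry as one line with an HH:MM:SS timestamp."""
        return f"{self.when:%H:%M:%S} [{self.label}] {self.event}: {self.text}"


entry = LogEntry(datetime(2026, 5, 23, 14, 3, 9), "run_task", "start", "queued")
```

Validating the event type at construction keeps malformed entries out of the log rather than leaving the view to cope with them.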

Same pipeline, different interface

run_task calls exactly the same code path as POST /api/queue and the CLI. Skills, flags, Wiggum evaluation, context window management — all identical. The MCP interface is a thin wrapper, not a different execution path.

Harness-as-tool for other agents

Any MCP client can register the harness server and call run_task to delegate research to a local expert. A Claude Code agent that needs literature context can hand off to the harness, get back a scored Markdown report, and incorporate the findings — without running the full pipeline itself.

Orchestrated vs. single-task

run_orchestrated is appropriate when decomposing the task into parallel subtasks adds value: "compare three RAG architectures" splits naturally into one subtask per architecture, whereas "research RAG best practices" does not. The orchestrator incurs additional overhead for the decomposition pass and synchronization, so run_task is faster for straightforward research questions.

Tool manifest source

The tool list is fetched from /api/mcp/tools and reflects the live registered tools. Adding a new MCP tool to harness/api/routes/mcp.py and restarting the server causes it to appear in the MCP view immediately — no dashboard rebuild required.
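A manifest that always reflects the live registry can be built by generating the list on each request instead of caching it. The decorator-based registry below is a hypothetical sketch of that idea; it is not the contents of harness/api/routes/mcp.py, and the names mcp_tool and list_tools are invented for illustration.

```python
from typing import Callable, Dict, List

_REGISTRY: Dict[str, Callable[..., object]] = {}


def mcp_tool(fn):
    """Hypothetical decorator: register a function as a tool under its name."""
    _REGISTRY[fn.__name__] = fn
    return fn


def list_tools() -> List[dict]:
    """Build the manifest on each call from whatever is registered now."""
    return [
        {"name": name, "description": (fn.__doc__ or "").strip()}
        for name, fn in _REGISTRY.items()
    ]


@mcp_tool
def get_run(run_id: str) -> dict:
    """Retrieve a run record by run ID."""
    return {"run_id": run_id}
```

With this shape, a newly decorated function shows up in list_tools() as soon as its module is loaded, which is consistent with a new tool appearing after a server restart without any dashboard rebuild.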
