May 24, 2026 • 6 min read • Agentic Harness Engineering

The op CLI: A Rich REPL for the Research Harness

Two invocation modes, eight skills, four browser flags, and a pyfiglet splash screen — op is the terminal-first interface to the harness pipeline.

The harness dashboard is the right tool for monitoring runs in progress and reviewing historical results. But for interactive research — spinning up a task, navigating a website, drafting an email — the terminal is faster. op.py is a single-file CLI that puts the entire agent.py pipeline behind a clean prompt with persistent history, slash-command dispatch, and a Rich-rendered interface.

Two invocation modes

The design follows the two-mode pattern of interpreters such as python and node: bare invocation drops into an interactive REPL; passing arguments runs a single task and exits.

Interactive REPL

Run op with no arguments. A pyfiglet "op" logo in isometric ASCII renders inside a Rich double-edge panel. Active inference endpoints appear below the logo. The prompt op ▶ waits for input, backed by ~/.op_history for up-arrow recall across sessions.

Single-task mode

Pass the task directly on the command line. Quotes are needed only if the task contains shell metacharacters such as &, |, or *. For example, the following dispatches to agent.run(), prints elapsed time when done, and exits:

op research best practices for AI agent cost management, save to ~/Desktop/out.md

The REPL uses prompt_toolkit for line editing, which gives it history navigation, Ctrl-C handling (print a hint, stay alive), and EOF exit on Ctrl-D. History is persisted to ~/.op_history as a plaintext file, so you can grep prior tasks.
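The loop behavior described above can be sketched in a few lines. This is a minimal illustration, not the code in op.py: the repl function and its read_line, run_task, and echo parameters are hypothetical names, and the reader is injected so the loop works with either input() or a prompt_toolkit session.

```python
from typing import Callable

# Exit words accepted by the REPL (the aliases the post lists further down).
EXIT_WORDS = {"exit", "quit", "q", ":q"}


def repl(read_line: Callable[[str], str],
         run_task: Callable[[str], None],
         echo: Callable[[str], None] = print) -> None:
    """Read-dispatch loop: Ctrl-C prints a hint and continues, Ctrl-D exits."""
    while True:
        try:
            line = read_line("op ▶ ").strip()
        except KeyboardInterrupt:
            # Stay alive on Ctrl-C; just remind the user how to leave.
            echo("interrupted: type exit or press Ctrl-D to quit")
            continue
        except EOFError:
            # Ctrl-D at the prompt ends the session.
            break
        if not line:
            continue
        if line in EXIT_WORDS:
            break
        run_task(line)

# Assumed wiring with prompt_toolkit (not executed here):
#   from prompt_toolkit import PromptSession
#   from prompt_toolkit.history import FileHistory
#   session = PromptSession(history=FileHistory(os.path.expanduser("~/.op_history")))
#   repl(session.prompt, my_task_runner)
```

PromptSession.prompt raises KeyboardInterrupt on Ctrl-C and EOFError on Ctrl-D, which is why the two except branches are enough to cover both keys.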

Eight skills

The help panel distinguishes between slash-prefixed skills (dedicated dispatch) and free-form tasks (routed through the full research pipeline):

Command                  Description
/browser <url> <goal>    LLM-guided web navigation with saturation extraction. The agent loads the page, evaluates whether the goal is satisfied, and continues clicking/scrolling until saturation or success.
/sitemap <url> [goal]    Discover all pages on a domain and rank them by relevance to the goal. Without a goal, the skill returns a flat inventory of all reachable paths.
/annotate <url|path>     Fetch or load a paper or document, run it through the Wiggum evaluator, and return a structured annotation with topic, motivation, contribution, and impact fields.
/email <contact> <goal>  Draft and optionally send an email via Gmail. The contact name is resolved against known contacts; the goal becomes the drafting instruction.
/panel                   Enable the 3-persona Wiggum review panel for the next task. The panel runs three independent evaluators and synthesizes their scores before returning a result.
/re-orient               Rebuild the orientation cache from current GitHub state: branch, recent commits, open PRs, CI status. Keeps the harness grounded when the repo changes significantly.
research <topic>         Free-form research task. Routes through the full pipeline: plan → parallel search → synthesis → Wiggum eval loop. The most common invocation.
summarize <url|path>     Fetch a URL or load a local file, apply the summarizer, and return a compressed version. Useful for pre-processing long documents before research.
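The split between slash-prefixed skills and free-form tasks can be pictured as a small routing function. The sketch below is an assumption about how such dispatch might look, not the dispatch code in op.py; route and the "pipeline" and "unknown" labels are hypothetical.

```python
from typing import Tuple

# The six slash-prefixed skills from the help panel.
SLASH_SKILLS = {"/browser", "/sitemap", "/annotate", "/email", "/panel", "/re-orient"}


def route(line: str) -> Tuple[str, str]:
    """Return (handler, argument text) for one line of input."""
    text = line.strip()
    head, _, rest = text.partition(" ")
    if head in SLASH_SKILLS:
        # Dedicated dispatch: the skill gets everything after its name.
        return head, rest.strip()
    if head.startswith("/"):
        # A slash command that is not in the table.
        return "unknown", text
    # Free-form tasks ("research ...", "summarize ...") keep their full text
    # and go to the general pipeline.
    return "pipeline", text
```

Keeping the full text for free-form tasks means the pipeline, not the CLI, decides what "research" or "summarize" implies.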

Browser flags

op accepts four command-line flags. Three apply to browser tasks (anything routed through the Playwright skill) and control window visibility and session lifetime; the fourth, --no-wiggum, skips the evaluation loop for any task. The flags are parsed from the command line and injected via os.environ, which agent.py reads at task time:

--headed          Show the browser window during navigation (useful for debugging what the agent sees)
--keep-browser    Leave the browser process running after the task completes
--reuse-browser   Reconnect to an existing CDP session rather than launching a new one
--no-wiggum       Skip the quality evaluation loop entirely and return the synthesis output directly

These flags compose freely. A common debug invocation is:

op --headed --keep-browser /browser https://docs.example.com find the rate limits section

which shows the browser window and leaves it open after the agent finishes, so you can inspect the final page state.
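Passing flags through the environment might look like the sketch below. The variable names (OP_BROWSER_HEADED and so on) are invented for illustration; the names agent.py actually reads are not given in this post. The function takes the environment as a parameter so it can be exercised against a plain dict.

```python
import os
from typing import Iterable, List, MutableMapping

# Hypothetical env var names, one per browser flag.
FLAG_ENV = {
    "--headed": "OP_BROWSER_HEADED",
    "--keep-browser": "OP_BROWSER_KEEP",
    "--reuse-browser": "OP_BROWSER_REUSE",
}


def apply_browser_flags(flags: Iterable[str],
                        env: MutableMapping[str, str] = os.environ) -> List[str]:
    """Set one env var per recognized browser flag; return the other flags."""
    leftover = []
    for flag in flags:
        name = FLAG_ENV.get(flag)
        if name is None:
            # Not a browser flag (for example --no-wiggum); leave it to the caller.
            leftover.append(flag)
        else:
            env[name] = "1"
    return leftover
```

Because each flag sets an independent variable, any combination composes without special cases.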

Splash screen and endpoint display

On REPL entry, op renders an isometric ASCII "op" logo using pyfiglet's isometric1 font inside a Rich double-edge panel with cyan borders. Below the panel, it reads inference.ENDPOINTS and prints the active model names — so you know immediately which inference backends are live before running anything.

  models: qwen3.6-35b  qwen3-8b
  type -h for help, exit to quit

If the inference module fails to import (e.g., the server isn't running), op silently omits the endpoint line instead of crashing.
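A guarded import is enough to get this fail-soft behavior. The sketch below assumes ENDPOINTS is a mapping keyed by model name (a list of names would also work); the real structure of inference.ENDPOINTS is not shown in this post, and endpoint_line is a hypothetical helper.

```python
import importlib
from typing import Optional


def endpoint_line(module_name: str = "inference") -> Optional[str]:
    """Return the 'models: ...' line, or None if the module cannot be loaded."""
    try:
        module = importlib.import_module(module_name)
        endpoints = getattr(module, "ENDPOINTS")
    except Exception:
        # Deliberately broad: any import-time failure should only hide the
        # line, never stop the REPL from starting.
        return None
    # Iterating a dict yields its keys, so this handles both dicts and lists.
    names = [str(name) for name in endpoints]
    if not names:
        return None
    return "  models: " + "  ".join(names)
```

The caller prints the line only when the result is not None, so a missing backend costs one line of output and nothing else.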

Task dispatch internals

All task execution flows through a single _run(task, extra_args) function that calls agent.run(task, use_wiggum=not no_wiggum). The function records wall-clock elapsed time and prints it on completion. Keyboard interrupts during a run print a yellow "interrupted" message and return to the prompt without killing the process. SystemExit from within the agent is caught and suppressed — downstream code uses it as a clean-exit signal.
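As a rough sketch of that wrapper: the function below takes the agent entry point as a parameter (agent_run stands in for agent.run) and reports elapsed time in every case, which is an assumption, since the post does not say whether the time is printed after an interrupt. The names run_task and echo are hypothetical.

```python
import time
from typing import Callable


def run_task(task: str,
             agent_run: Callable[..., object],
             no_wiggum: bool = False,
             echo: Callable[[str], None] = print) -> float:
    """Run one task, report wall-clock time, and keep the caller alive."""
    start = time.perf_counter()
    try:
        agent_run(task, use_wiggum=not no_wiggum)
    except KeyboardInterrupt:
        # Ctrl-C during a run cancels the task, not the REPL.
        echo("interrupted")
    except SystemExit:
        # Treated as a clean-exit signal from inside the agent.
        pass
    elapsed = time.perf_counter() - start
    echo(f"done in {elapsed:.1f}s")
    return elapsed
```

Both KeyboardInterrupt and SystemExit derive from BaseException rather than Exception, so they have to be named explicitly; a plain except Exception would let them through.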

Flag parsing

Arguments starting with -- are separated from task words before joining. This means op --headed research X and op research X --headed are equivalent — flag position doesn't matter.
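Position-independent flag handling falls out of a simple partition of the argument list. A minimal sketch, with split_args as a hypothetical name:

```python
from typing import List, Tuple


def split_args(argv: List[str]) -> Tuple[List[str], str]:
    """Separate --flags from task words, then join the words into one task."""
    flags = [arg for arg in argv if arg.startswith("--")]
    words = [arg for arg in argv if not arg.startswith("--")]
    return flags, " ".join(words)


# Flag position does not affect the result:
#   split_args(["--headed", "research", "X"])  and
#   split_args(["research", "X", "--headed"])
# both give (["--headed"], "research X").
```

One consequence of this approach is that a task word that itself begins with -- would be read as a flag; an empty task string signals that only flags were passed.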

History persistence

~/.op_history is a plaintext file in prompt_toolkit's FileHistory format. It persists across sessions, so up-arrow recall works even after a restart. You can grep it directly: grep "browser" ~/.op_history.
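For anything beyond grep, the file is easy to parse. The sketch below assumes the FileHistory layout in which each entry is a "# timestamp" comment line followed by one or more lines prefixed with "+"; read_history is a hypothetical helper, and the sample timestamps are made up.

```python
from typing import List


def read_history(text: str) -> List[str]:
    """Extract entries from FileHistory text, oldest first."""
    entries: List[str] = []
    current: List[str] = []
    for raw in text.splitlines():
        if raw.startswith("+"):
            # Content line: strip the "+" marker and keep collecting.
            current.append(raw[1:])
        elif current:
            # A blank or "#" line closes the entry being collected.
            entries.append("\n".join(current))
            current = []
    if current:
        entries.append("\n".join(current))
    return entries


SAMPLE = (
    "\n# 2026-05-24 10:00:00.000000\n+research agent costs\n"
    "\n# 2026-05-24 10:05:00.000000\n+/browser https://docs.example.com find limits\n"
)
browser_tasks = [e for e in read_history(SAMPLE) if e.startswith("/browser")]
```

Multi-line inputs are stored as consecutive "+" lines, which is why the parser joins them back into a single entry.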

REPL fallback

Passing only flags with no task words (e.g. op --headed) drops into REPL mode with the flags pre-loaded, so every task run in that session inherits the browser settings.

Vim exit aliases

exit, quit, q, and :q all exit the REPL cleanly — a small concession to muscle memory from vim and psql.

Example invocations

# Full research task, output saved by the pipeline
op research best practices for cost management in AI agents, save to ~/Desktop/out.md

# LLM-guided browser navigation
op /browser https://docs.anthropic.com find the tool use pricing page

# Sitemap discovery against a goal
op /sitemap https://stripe.com find integration guides

# Headed browser with window visible
op --headed /browser https://github.com summarize recent issues in the top repo

# Annotate a local PDF
op /annotate ~/Desktop/arxiv_2401.12345.pdf

# Skip evaluation for a fast synthesis
op --no-wiggum summarize https://example.com/long-report

On Windows, op is exposed through op.bat, a wrapper that runs python op.py %*. Add the harness directory to PATH and op becomes available system-wide without a full install.

Relation to the dashboard

The op CLI and the dashboard are complementary rather than overlapping. op is the fast-path entry point: no browser tab needed, no server dependency beyond the inference backends. The dashboard is the monitoring and audit layer: session history, artifact browser, analytics, security event log. A run kicked off via op writes to the same runs.jsonl, artifacts.jsonl, and sessions.jsonl files that the dashboard reads — so every CLI task shows up in the dashboard's history automatically.
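The shared-file contract is simple because JSON Lines is append-only: one JSON object per line. The sketch below shows the general shape of a writer and a reader; the field names (task, elapsed_s, source, ts) are hypothetical, as the actual schema of runs.jsonl is not described in this post.

```python
import json
import time
from pathlib import Path
from typing import List


def append_run(path: Path, task: str, elapsed_s: float, source: str = "cli") -> dict:
    """Append one run record as a single JSON line (hypothetical schema)."""
    record = {
        "task": task,
        "elapsed_s": round(elapsed_s, 2),
        "source": source,
        "ts": time.time(),
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def read_runs(path: Path) -> List[dict]:
    """Load every record, skipping blank lines, as a reader such as a dashboard might."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Since writers only ever append, a reader can re-read the file at any time without coordinating with the CLI.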

For the dashboard views that surface this data, see The Sessions and Artifacts Views and The Submit View.