June 1, 2026 • 5 min read • Agentic Harness Engineering

The Plugin System: /forge:plugin and /forge:list

/forge:plugin generates a complete plugin from a natural-language description: the LLM produces a JSON spec, files are written to plugins/<name>/, and the plugin is hot-loaded into the skill registry without a restart. /forge:list shows all installed plugin commands grouped by plugin. Plugin skills auto-inject domain knowledge into every synthesis run; plugin commands become first-class /plugin:command slash tokens.

The plugin system is the extension layer on top of the built-in skills. Where adding a built-in skill requires editing skills/__init__.py and agent.py, a plugin is just a directory of Markdown files plus a JSON manifest, with no Python changes needed. /forge:plugin automates even that: describe what the plugin should do in plain English and the harness writes the files itself.

Plugin structure

Every plugin lives at plugins/<name>/ with this layout:

plugin.json
    Manifest: plugin name, description, list of commands and skill files.

skills/*.md
    Domain knowledge automatically injected into every synthesis run while the plugin is loaded. Acts as a persistent system-prompt extension.

commands/*.md
    Prompt templates for explicit slash commands, invoked by /plugin:command-name in the task string.

The distinction between skills and commands matters. A skill file is passive — its content is appended to the synthesis context on every task regardless of what the user asked, providing always-on domain expertise. A command file is active — it's only injected when the user explicitly invokes /plugin:command-name, and its content becomes the primary task framing.

/forge:plugin

# Generate a plugin from a description
python agent.py "/forge:plugin a code-review plugin that applies the Karpathy rubric to diffs"

# Another example
python agent.py "/forge:plugin a financial-analysis plugin with FRED and BEA citation discipline"

The skill passes the description directly to the LLM with a tightly structured prompt that asks for a specific JSON shape. For the code-review description above, the output looks like this:

{
  "name": "code-review",
  "manifest": {
    "name": "code-review",
    "description": "Apply the Karpathy rubric to code diffs",
    "version": "1.0",
    "commands": [
      {
        "name": "review",
        "description": "Review staged diff against rubric",
        "path_optional": true,
        "template": "commands/review.md"
      }
    ],
    "skills": [
      { "name": "rubric", "path": "skills/rubric.md" }
    ]
  },
  "skills": {
    "rubric.md": "# Karpathy Code Review Rubric\n\nNo magic numbers..."
  },
  "commands": {
    "review.md": "Review the following diff against the rubric above..."
  }
}

The prompt constrains the output to 1–3 commands and 1–2 skill files, and requires the name to be a lowercase hyphen-slug. The LLM is instructed to output only valid JSON with no markdown fences. If the model wraps its output in fences anyway, the handler strips them before parsing.
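To make the parsing step concrete, here is a minimal sketch of fence stripping followed by validation. The names strip_fences, parse_plugin_spec, and SLUG_RE are hypothetical, and checking the command and skill counts in code is an assumption: the text only says the prompt asks for those limits.

```python
import json
import re

# Assumed slug rule: lowercase letters/digits separated by single hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def strip_fences(raw: str) -> str:
    """Remove a leading ``` or ```json fence and a trailing ``` if present."""
    text = raw.strip()
    text = re.sub(r"^```[a-zA-Z0-9]*\s*\n", "", text)
    text = re.sub(r"\n```\s*$", "", text)
    return text.strip()


def parse_plugin_spec(raw: str) -> dict:
    """Parse model output into a spec dict and check the stated constraints."""
    spec = json.loads(strip_fences(raw))
    name = spec.get("name", "")
    if not SLUG_RE.match(name):
        raise ValueError(f"plugin name is not a lowercase hyphen-slug: {name!r}")
    # Hypothetical code-side check of the 1-3 commands / 1-2 skills limits.
    if not 1 <= len(spec.get("commands", {})) <= 3:
        raise ValueError("expected 1 to 3 command files")
    if not 1 <= len(spec.get("skills", {})) <= 2:
        raise ValueError("expected 1 to 2 skill files")
    return spec
```

Validating after parsing means a malformed response fails before anything is written to disk, which keeps a bad generation from leaving a half-created plugin directory.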

After the JSON is parsed, create_plugin() in plugin_loader.py writes the files to disk and immediately calls load_all() — the hot-reload function that re-scans plugins/ and updates skills.REGISTRY in place. The new commands are live in the current process without a restart.
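The write step might look roughly like the sketch below. write_plugin_files is a hypothetical stand-in, not the actual create_plugin() from plugin_loader.py, and the hot-reload call is left out. It assumes the spec has the shape shown earlier: a manifest object plus skills and commands maps from filename to content.

```python
import json
from pathlib import Path


def write_plugin_files(spec: dict, plugins_root: Path) -> Path:
    """Write plugin.json, skills/*.md and commands/*.md under plugins_root/<name>/."""
    plugin_dir = plugins_root / spec["name"]
    (plugin_dir / "skills").mkdir(parents=True, exist_ok=True)
    (plugin_dir / "commands").mkdir(parents=True, exist_ok=True)

    manifest_text = json.dumps(spec["manifest"], indent=2)
    (plugin_dir / "plugin.json").write_text(manifest_text, encoding="utf-8")

    for kind in ("skills", "commands"):
        for filename, body in spec.get(kind, {}).items():
            # Path(filename).name drops any directory parts the model may emit,
            # so generated files cannot land outside the plugin directory.
            target = plugin_dir / kind / Path(filename).name
            target.write_text(body, encoding="utf-8")
    return plugin_dir
```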

Hot-loading into the skill registry

Each plugin command is registered into skills.REGISTRY as a standalone skill with the key plugin:command-name:

registry["code-review:review"] = {
    "description": "Review staged diff against rubric",
    "hook":        "standalone",
    "prompt":      None,
    "auto":        None,
}

This means parse_skills() recognises /code-review:review as a valid skill token in any task string — the same way built-in skills like /deep or /cite are parsed. The plugin command is then dispatched through the synthesis pipeline with the command template prepended to the synthesis context as _plugin_cmd_context.
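As an illustration of how registry keys double as slash tokens, here is a small sketch. parse_skill_tokens and TOKEN_RE are hypothetical and are not the harness's parse_skills(). Only the key shape plugin:command-name comes from the text, and the registry contents are placeholders.

```python
import re

# Placeholder registry: two built-ins and one plugin command.
REGISTRY = {
    "deep": {"description": "built-in (placeholder)"},
    "cite": {"description": "built-in (placeholder)"},
    "code-review:review": {"description": "Review staged diff against rubric"},
}

# A token is "/" + name, optionally ":" + command, delimited by whitespace.
TOKEN_RE = re.compile(r"(?<!\S)/([a-z0-9-]+(?::[a-z0-9-]+)?)(?!\S)")


def parse_skill_tokens(task: str, registry: dict) -> tuple[list[str], str]:
    """Return (recognised skill keys, task text with those tokens removed)."""
    found: list[str] = []

    def _take(match: re.Match) -> str:
        key = match.group(1)
        if key in registry:
            found.append(key)
            return ""
        return match.group(0)  # unknown tokens stay in the task text

    remaining = TOKEN_RE.sub(_take, task)
    return found, " ".join(remaining.split())
```

Because recognition is just a dictionary lookup, a plugin command needs no parser changes: once its key is in the registry, the token is valid.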

On each load_all() call, the loader records which keys it added to the registry (in _registry_keys) so it can remove them cleanly before the next reload. This prevents stale commands from accumulating if a plugin is deleted or renamed between loads.
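The remove-then-re-add pattern can be sketched as follows. reload_commands is a hypothetical name, and the module-level REGISTRY here is a stand-in for skills.REGISTRY. The entry fields mirror the registry example above.

```python
# Stand-in for skills.REGISTRY with one placeholder built-in.
REGISTRY = {"deep": {"description": "built-in skill (placeholder)"}}

# Keys added by the previous load, so they can be removed on the next one.
_registry_keys: set[str] = set()


def reload_commands(manifests: list[dict]) -> None:
    """Replace all plugin-provided registry entries with those in manifests."""
    # 1. Drop everything the previous load added; built-ins are untouched.
    for key in _registry_keys:
        REGISTRY.pop(key, None)
    _registry_keys.clear()

    # 2. Register the commands that exist now, remembering each key.
    for manifest in manifests:
        for cmd in manifest.get("commands", []):
            key = f"{manifest['name']}:{cmd['name']}"
            REGISTRY[key] = {
                "description": cmd.get("description", ""),
                "hook": "standalone",
                "prompt": None,
                "auto": None,
            }
            _registry_keys.add(key)
```

Tracking only the keys the loader itself added is what lets the reload mutate the registry in place without ever touching built-in skills.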

/forge:list

python agent.py "/forge:list"

Lists all installed plugin commands, grouped by plugin name:

[forge:list] 3 plugin command(s):

  ▸ code-review
    /code-review:review           Review staged diff against rubric
    /code-review:summarize        Summarize changes in plain English

  ▸ financial-analysis
    /financial-analysis:thesis    Generate a trading thesis with citations

The listing is sorted alphabetically by plugin name, then by command key within each plugin. The description comes from the description field in the manifest's command entry.
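A sketch of the grouping and sorting, assuming the listing is built from the registry plus the set of keys the loader added (the _registry_keys set mentioned earlier). format_plugin_list is a hypothetical name, and the 30-column padding is inferred from the sample output rather than taken from the implementation.

```python
from collections import defaultdict


def format_plugin_list(registry: dict, plugin_keys: set) -> str:
    """Render plugin commands grouped by plugin, both levels sorted."""
    groups = defaultdict(list)
    for key in plugin_keys:
        plugin, _, _command = key.partition(":")
        groups[plugin].append(key)

    lines = [f"[forge:list] {len(plugin_keys)} plugin command(s):"]
    for plugin in sorted(groups):
        lines.append("")
        lines.append(f"  ▸ {plugin}")
        for key in sorted(groups[plugin]):
            desc = registry[key].get("description", "")
            # Pad the slash token to 30 columns so descriptions line up.
            lines.append(f"    {'/' + key:<30}{desc}")
    return "\n".join(lines)
```

Filtering by the loader's own key set rather than by the presence of a colon keeps built-ins such as forge:list out of the plugin listing.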

Auto-inject skills vs. command templates

Plugin skill files marked with auto_inject: true (the default) are concatenated into the synthesis context on every task via get_skill_context() in plugin_loader.py. This makes them equivalent to a persistent addition to the synthesis system prompt — useful for domain-specific heuristics, citation formats, or style rules that should apply across all tasks while the plugin is installed.

Command templates are only active when explicitly invoked. A commands/review.md file containing a detailed diff review rubric does nothing unless the user runs /code-review:review — it doesn't silently affect research synthesis or annotate tasks.
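The difference between the two kinds of file can be sketched as one context-building function. build_context and the in-memory shapes it takes are assumptions for illustration. This is not get_skill_context() itself, only the behaviour described above: auto-inject skills are always included, and command templates are prepended only when invoked.

```python
def build_context(skills: list[dict], commands: dict, invoked: list[str]) -> str:
    """Assemble synthesis context from plugin skills and invoked commands.

    skills:   dicts with "body" and an optional "auto_inject" flag (default True)
    commands: maps "plugin:command" keys to template text
    invoked:  keys the user explicitly put in the task string
    """
    parts = []
    # Command templates go first, and only for explicitly invoked keys.
    for key in invoked:
        if key in commands:
            parts.append(commands[key])
    # Skill files are added on every task unless auto_inject is switched off.
    for skill in skills:
        if skill.get("auto_inject", True):
            parts.append(skill["body"])
    return "\n\n".join(parts)
```

With no invoked commands the result contains only the skill text, which is why a command template never leaks into unrelated tasks.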

Plugins persist across restarts because they're files on disk. load_all() is called at agent startup automatically, so any plugin in plugins/ is live from the first task. To disable a plugin without deleting it, rename its plugin.json to plugin.json.disabled — the loader skips directories without a valid manifest.
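The rename trick works because discovery keys off the manifest filename. A minimal sketch of such a scan follows. discover_manifests is a hypothetical helper, and treating "valid" as "parses as a JSON object with a name" is an assumption.

```python
import json
from pathlib import Path


def discover_manifests(plugins_root: Path) -> list[dict]:
    """Return manifests for every plugin directory that has a usable plugin.json."""
    manifests = []
    if not plugins_root.is_dir():
        return manifests
    for plugin_dir in sorted(p for p in plugins_root.iterdir() if p.is_dir()):
        manifest_path = plugin_dir / "plugin.json"
        if not manifest_path.is_file():
            continue  # e.g. renamed to plugin.json.disabled
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # unreadable manifest: skip rather than fail startup
        if isinstance(manifest, dict) and manifest.get("name"):
            manifests.append(manifest)
    return manifests
```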

/forge:plugin uses temperature: 0.3 — low enough for consistent JSON structure, but not zero, so descriptions with ambiguous scope get slightly different interpretations on retry. If the generated plugin doesn't match your intent, run /forge:plugin again with a more specific description rather than editing the JSON by hand.