May 25, 2026 • 12 min read • Agentic Harness Engineering Series

SBOM and AIBOM for Agentic Systems

pip freeze knows about fastapi and ollama. It does not know about kimi-k2.5:cloud. An AI Bill of Materials fills that gap — enumerating the model artifacts, custom Modelfiles, and cloud endpoints that constitute the other half of an agentic system's supply chain.

The NTIA minimum elements for a Software Bill of Materials — supplier name, component name, version, unique identifier, dependency relationship, author, timestamp — were designed for a world where software components are versioned packages with cryptographic hashes. AI models are not that. A GGUF file has a hash; a cloud endpoint has a name and a terms-of-service page. A custom Modelfile has a base model, a quantization, and a system prompt baked in. None of these appear in a pyproject.toml or a pip freeze output.

This post documents what a complete supply chain picture looks like for the harness — both the software layer (SBOM, covering Python packages and system binaries) and the AI layer (AIBOM, covering model artifacts). The actual AIBOM.md document is committed to the repo. This post explains the design decisions behind it and the gaps that remain open.

I1 — The Three Supply Chain Layers

An agentic harness has three supply chain layers (software, models, and cloud endpoints), and a traditional SBOM covers only one of them:

I1 — Supply Chain Layers: Coverage by Document Type

Traditional SBOM covers only the software layer. The model and cloud layers require an AIBOM. Runtime binaries (Ollama, llama.cpp, ffmpeg) span both documents.

| Layer | Examples | Tracking mechanism | Covered by SBOM? |
|---|---|---|---|
| Software | fastapi, ollama, chromadb, jinja2 | pip/uv lock, PyPI hash | Yes, fully |
| Runtime binaries | Ollama, llama.cpp, whisper.cpp, ffmpeg, Tesseract | Git submodule SHA, system package version | Partial: submodules yes; system deps no |
| Local models | pi-qwen-32b, Qwen3-Coder:30b, atla/selene-mini | Ollama manifest ID (SHA256 prefix) | No |
| Custom Modelfiles | pi-qwen-32b (system prompt overlay), nanda-annotator-v2-q4km | Ollama ID + system prompt hash | No |
| Cloud endpoints ⚠ untracked | kimi-k2.5:cloud (Moonshot AI), glm-5.1:cloud (Zhipu AI) | Name only; no content hash, no pin | No |

The cloud endpoint row is the one that keeps appearing in security conversations and disappearing from tooling. kimi-k2.5:cloud is a first-class runtime dependency of autoresearch.py — the loop calls it on every stuck episode. It is not in pyproject.toml; it is not in uv.lock; it does not appear in pip freeze. The only artifact that surfaces it is the AIBOM.

I2 — The Software SBOM Layer

The harness's Python dependencies are declared in pyproject.toml and resolved to pinned hashes in uv.lock. Once uv sync has installed exactly what the lock file pins, generating a machine-readable SBOM from that environment is one command:

pip install cyclonedx-bom
cyclonedx-py environment --output-format json --spec-version 1.5 > sbom-python.cdx.json

The CycloneDX 1.5 JSON output satisfies the NTIA minimum elements for the Python layer. Each component entry carries a PURL (Package URL) and the installed version; the per-artifact PyPI hashes remain pinned in uv.lock. Runtime binaries that are not Python packages require separate entries:

| Component | Version pin | NTIA identifier | License |
|---|---|---|---|
| Ollama | ≥ 0.4 (runtime); not pinned | GitHub release tag | MIT |
| llama.cpp | Git submodule SHA (pinned) | git SHA | MIT |
| whisper.cpp | Git submodule SHA (pinned) | git SHA | MIT |
| ffmpeg | System (≥ 6.0); not pinned | OS package name + version | LGPL 2.1+ (GPL if built with GPL options) |
| Tesseract OCR | System (≥ 5.0); not pinned | OS package name + version | Apache 2.0 |
| Chromium (Playwright) | Playwright-bundled (pinned transitively) | Playwright version + browser hash | BSD / project-specific |

The gap in the software SBOM is system dependencies: ffmpeg and Tesseract are installed outside Python's package manager. Their versions are not in any lock file. In a containerized deployment, the Dockerfile pins them; in a bare-metal deployment like this one, they are environment assumptions rather than declared dependencies.

I3 — The AIBOM: Local Models

The harness currently runs eight distinct model roles across six Ollama tags. Three are standard base models; three are custom Modelfiles. The distinction matters for the AIBOM because a custom Modelfile is not just a base model with a tag: it bakes in a system prompt that shapes every inference call, making the system prompt part of the effective model identity.

| Model | Role | Arch / params | Quant | Custom Modelfile |
|---|---|---|---|---|
| pi-qwen-32b | Primary producer / agent | Qwen2, 32.8B | Q4_K_M | Yes: task-completing agent persona |
| pi-qwen3.6 | Alternate producer | Qwen3 MoE, 36.0B / 3.6B active | Q4_K_M | Yes: system prompt + sampling overrides |
| atla/selene-mini | Wiggum evaluator (judge) | Llama, 8.0B | Q4_K_M | No (evaluation-specialist fine-tune) |
| Qwen3-Coder:30b | Autoresearch proposer | Qwen3 MoE, 30.5B | Q4_K_M | No |
| nanda-annotator-v2-q4km | Lit-review annotator | Qwen2, 7.6B | Q4_K_M | Yes: annotation persona, structured output |
| qwen3:8b | Eval suite, general tasks | Qwen3, 8.0B | Standard | No |

The NTIA minimum element set maps cleanly to Ollama model metadata. The unique identifier is the Ollama manifest ID: the first 12 hex characters of the SHA256 digest of the manifest Ollama stores for each tag. It is not a content hash of the weights themselves (those are stored as separate blobs), but the manifest references every blob, weights included, by its own SHA256 digest, so the ID changes whenever ollama pull brings in updated weights. For the purposes of an AIBOM, it is the most actionable identifier available without external registry tooling.
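The relationship between the short ID, the manifest, and the weight blobs can be checked directly on disk. A minimal sketch, assuming Ollama's default layout (manifests under `~/.ollama/models/manifests/registry.ollama.ai/<namespace>/<model>/<tag>`, relocatable via the `OLLAMA_MODELS` environment variable), an OCI-style JSON manifest with a `layers` list, and that the ID column of `ollama list` is the first 12 hex characters of the SHA256 of the manifest file's bytes; the helper names are hypothetical.

```python
import hashlib
import json
from pathlib import Path

# Assumed default manifest root; OLLAMA_MODELS relocates the models directory.
MANIFEST_ROOT = Path.home() / ".ollama" / "models" / "manifests" / "registry.ollama.ai"
WEIGHTS_MEDIA_TYPE = "application/vnd.ollama.image.model"


def manifest_path(tag: str, root: Path = MANIFEST_ROOT) -> Path:
    """Map 'qwen3:8b' or 'atla/selene-mini' to its manifest file (hypothetical helper)."""
    name, _, version = tag.partition(":")
    namespace, _, model = name.rpartition("/")
    return root / (namespace or "library") / model / (version or "latest")


def manifest_id(raw: bytes) -> str:
    """Short ID in the style of `ollama list`: first 12 hex chars of SHA256(manifest bytes)."""
    return hashlib.sha256(raw).hexdigest()[:12]


def weight_digests(raw: bytes) -> list[str]:
    """Digests of the weight blobs that the manifest references."""
    manifest = json.loads(raw)
    return [
        layer["digest"]
        for layer in manifest.get("layers", [])
        if layer.get("mediaType") == WEIGHTS_MEDIA_TYPE
    ]
```

Comparing `manifest_id(manifest_path("qwen3:8b").read_bytes())` with the ID column of `ollama list` is a quick way to confirm these assumptions on a given install before relying on them.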

The Modelfile-as-Artifact Problem

A custom Modelfile is a derived artifact: it takes a base model and overlays a system prompt, sampling parameters, and stop tokens. From the AIBOM perspective, the base model and the overlay are both supply chain inputs. The effective behavior of pi-qwen-32b depends on both:

import hashlib

# Effective model identity for pi-qwen-32b has two components:
SYSTEM_PROMPT = "..."  # placeholder for the task-completing agent persona text
base_model_id = "edee0c094406"  # Ollama manifest ID (qwen2.5-32b-instruct-q4_K_M)
system_prompt = hashlib.sha256(SYSTEM_PROMPT.encode("utf-8")).hexdigest()  # hash of the persona text

# A change to either component changes the model's effective behavior.
# `ollama list` shows one opaque ID per tag; it does not report the prompt hash separately.

Standard SBOM tooling captures neither; the AIBOM is meant to track both. The current AIBOM.md records the Ollama ID and describes the system prompt's role; a hardened version would record the system prompt's SHA256 and re-verify it on each Modelfile rebuild.

If a system prompt is part of the model's effective identity, then changing the system prompt without updating the AIBOM creates a silent divergence between the documented model and the running one. This is the same class of problem as updating a dependency without updating the lock file — the declared state and the runtime state drift apart.

I4 — The Cloud Endpoint Gap

Two models in the Ollama registry are cloud endpoints with no local weights:

| Tag | Provider | Role | Local size | Pinnable? |
|---|---|---|---|---|
| kimi-k2.5:cloud | Moonshot AI | Autoresearch Kimi unblock | none | No |
| glm-5.1:cloud | Zhipu AI | Registered; not currently wired | none | No |

When autoresearch.py calls ollama.chat(model=KIMI_MODEL, ...), the Ollama daemon does not run local weights; it forwards the request to a remote endpoint serving Moonshot AI's model. The model that responds may differ from one call to the next: providers update cloud models without version-bump guarantees, and the Ollama cloud-model manifest carries only a routing entry, not a content hash. There is no mechanism to verify that the model responding today is the same as the one that responded yesterday.

Supply chain risk: A cloud model that is updated by its provider between two autoresearch runs can produce different instruction proposals for the same prompt. If those proposals produce different eval scores, the autoresearch loop may accept or reject based on model behavior that has changed underneath it — not based on the instruction change being tested. This is the same threat model as a mutable dependency: the environment changes while the code stays constant.

The AIBOM is the only artifact that surfaces this risk. A standard SBOM audit of the harness repo would find nothing wrong: kimi-k2.5:cloud does not appear in pyproject.toml, uv.lock, or any Python import. It appears only in autoresearch.py as a string default:

import os
KIMI_MODEL = os.environ.get("KIMI_MODEL", "kimi-k2.5:cloud")

This is the canonical gap between an SBOM and an AIBOM: one audits what the package manager knows; the other audits what the running system actually calls.
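Part of that audit can be automated. A minimal sketch, assuming model tags appear in Python source as string literals shaped like `name:tag` or `namespace/name:tag`, and that cloud endpoints carry the `:cloud` suffix as both harness examples do; `find_model_tags` and `cloud_endpoints` are hypothetical helpers. The pattern is deliberately loose (it will also catch unrelated `a:b` strings and miss tags built at runtime or written without an explicit `:tag`), so its output is a review list for the AIBOM, not an inventory.

```python
import ast
import re
from pathlib import Path

# Ollama-style "name:tag" or "namespace/name:tag"; deliberately loose.
TAG_RE = re.compile(r"^(?:[a-z0-9._-]+/)?[a-z0-9._-]+:[a-z0-9._-]+$", re.IGNORECASE)


def find_model_tags(source: str) -> set[str]:
    """Collect string literals in Python source that look like Ollama model tags."""
    tags: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if TAG_RE.match(node.value):
                tags.add(node.value)
    return tags


def cloud_endpoints(root: Path) -> dict[str, set[str]]:
    """Map each .py file under root to the ':cloud' tags it references."""
    hits: dict[str, set[str]] = {}
    for path in sorted(root.rglob("*.py")):
        tags = find_model_tags(path.read_text(encoding="utf-8"))
        cloud = {tag for tag in tags if tag.endswith(":cloud")}
        if cloud:
            hits[str(path)] = cloud
    return hits
```

Run against the harness repo, the `os.environ.get` default in autoresearch.py is exactly the kind of literal this picks up, even though no package manager ever sees it.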

What Mitigation Looks Like

Full mitigation of the cloud model risk would require provider-side versioning — a kimi-k2.5:cloud@2026-05-25 endpoint that Moonshot AI commits not to update. Providers do not generally offer this. The practical mitigations available on the client side are:

I5 — AIBOM Format and Tooling State

There is no settled standard for AIBOMs yet. The active tracks as of mid-2026:

| Initiative | Status | Relevance to agentic systems |
|---|---|---|
| CISA / NTIA AI SBOM working group | Framing documents published; minimum elements draft pending | Likely to extend NTIA SBOM minimum elements to cover model artifacts |
| CycloneDX 1.5 ML-BOM | Released; type: machine-learning-model + modelCard fields | Best available machine-readable format; covers architecture, quantization, intended use, training data provenance |
| OWASP CycloneDX extensions | Active; model card schema, pedigree (base model chain) | Covers the Modelfile-as-derived-artifact problem via pedigree.ancestors |
| Hugging Face model cards | Widespread; not a BOM format | Useful as provenance source for AIBOM entries; not machine-readable for audit tooling |

The harness AIBOM is a Markdown document rather than a CycloneDX JSON, for two reasons. First, CycloneDX 1.5 ML-BOM tooling for Ollama-managed models does not exist — there is no cyclonedx-py equivalent that introspects ollama list and emits a compliant JSON. Second, the most important entries in the harness AIBOM are the cloud endpoints and custom Modelfiles, both of which require human-in-the-loop curation that automated tooling cannot provide. A Markdown document is maintainable manually; a JSON document generated by a tool that doesn't understand Modelfiles is not.

The right long-term state is a hybrid: generate the Python layer automatically from uv.lock via CycloneDX, and maintain the model layer manually in Markdown with a validation script that cross-checks Ollama manifest IDs against the document's recorded values. Neither half is complete without the other.

The regeneration test: An AIBOM is only useful if it stays current. The AIBOM.md in the repo includes a "How to Regenerate" section that lists the three steps: ollama list for current IDs, ollama show <tag> for architecture verification, and SHA256 of the system prompt text for custom Modelfiles. If a model is updated via ollama pull and the AIBOM isn't regenerated, the manifest ID will be stale. That staleness is detectable — which is the point.

What the Literature Leaves Open
