May 25, 2026 • 12 min read • Agentic Harness Engineering Series

SBOM and AIBOM for Agentic Systems

pip freeze knows about fastapi and ollama. It does not know about kimi-k2.5:cloud. An AI Bill of Materials fills that gap — enumerating the model artifacts, custom Modelfiles, and cloud endpoints that constitute the other half of an agentic system's supply chain.

The NTIA minimum elements for a Software Bill of Materials — supplier name, component name, version, unique identifier, dependency relationship, author, timestamp — were designed for a world where software components are versioned packages with cryptographic hashes. AI models are not that. A GGUF file has a hash; a cloud endpoint has a name and a terms-of-service page. A custom Modelfile has a base model, a quantization, and a system prompt baked in. None of these appear in a pyproject.toml or a pip freeze output.

This post documents what a complete supply chain picture looks like for the harness — both the software layer (SBOM, covering Python packages and system binaries) and the AI layer (AIBOM, covering model artifacts). The actual AIBOM.md document is committed to the repo. This post explains the design decisions behind it and the gaps that remain open.

I1 — The Three Supply Chain Layers

An agentic harness has three supply chain layers, of which a traditional SBOM covers only one:

I1 — Supply Chain Layers: Coverage by Document Type

Traditional SBOM covers only the software layer. The model and cloud layers require an AIBOM. Runtime binaries (Ollama, llama.cpp, ffmpeg) span both documents.

Layer	Examples	Tracking mechanism	Covered by SBOM?
Software	fastapi, ollama, chromadb, jinja2	pip/uv lock, PyPI hash	Yes — fully
Runtime binaries	Ollama, llama.cpp, whisper.cpp, ffmpeg, Tesseract	Git submodule SHA, system package version	Partial — submodules yes; system deps no
Local models	pi-qwen-32b, Qwen3-Coder:30b, atla/selene-mini	Ollama manifest ID (SHA256 prefix)	No
Custom Modelfiles	pi-qwen-32b (system prompt overlay), nanda-annotator-v2-q4km	Ollama ID + system prompt hash	No
Cloud endpoints ⚠ untracked	kimi-k2.5:cloud (Moonshot AI), glm-5.1:cloud (Zhipu AI)	Name only — no content hash, no pin	No

The cloud endpoint row is the one that keeps appearing in security conversations and disappearing from tooling. kimi-k2.5:cloud is a first-class runtime dependency of autoresearch.py — the loop calls it on every stuck episode. It is not in pyproject.toml; it is not in uv.lock; it does not appear in pip freeze. The only artifact that surfaces it is the AIBOM.

I2 — The Software SBOM Layer

The harness's Python dependencies are declared in pyproject.toml and resolved to pinned hashes in uv.lock. With the environment installed from that lock file, generating a machine-readable SBOM takes one command once the CycloneDX tool is installed:

pip install cyclonedx-bom
cyclonedx-py environment --output-format json > sbom-python.cdx.json

The CycloneDX 1.5 JSON output satisfies the NTIA minimum elements for the Python layer. Each component entry carries a PURL (Package URL), the declared version, and the PyPI hash. The runtime binaries that are not Python packages require separate entries:

Component	Version pin	NTIA identifier	License
Ollama	≥ 0.4 (runtime); not pinned	GitHub release tag	MIT
llama.cpp	Git submodule SHA (pinned)	git SHA	MIT
whisper.cpp	Git submodule SHA (pinned)	git SHA	MIT
ffmpeg	System (≥ 6.0); not pinned	OS package name + version	LGPL 2.1
Tesseract OCR	System (≥ 5.0); not pinned	OS package name + version	Apache 2.0
Chromium (Playwright)	Playwright-bundled (pinned transitively)	Playwright version + browser hash	BSD / project-specific

The gap in the software SBOM is system dependencies: ffmpeg and Tesseract are installed outside Python's package manager. Their versions are not in any lock file. In a containerized deployment, the Dockerfile pins them; in a bare-metal deployment like this one, they are environment assumptions rather than declared dependencies.
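One way to turn those environment assumptions into recorded values is to probe the binaries at audit time. The sketch below is an illustration, not part of the harness: `parse_version` and `probe` are hypothetical helper names, and it assumes each tool prints a dotted version number on the first line of its version banner.

```python
import re
import shutil
import subprocess
from typing import Optional

# First dotted number in a banner, e.g. "6.1.1" or "5.3".
VERSION_RE = re.compile(r"\d+(?:\.\d+)+")


def parse_version(first_line: str) -> Optional[str]:
    """Return the first dotted version number found in a version banner."""
    match = VERSION_RE.search(first_line)
    return match.group(0) if match else None


def probe(binary: str, flag: str) -> Optional[str]:
    """Run `<binary> <flag>` and parse its version; None if missing or unparseable."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run([binary, flag], capture_output=True, text=True, check=False)
    # Some tools print their banner to stderr rather than stdout.
    lines = (result.stdout or result.stderr).strip().splitlines()
    return parse_version(lines[0]) if lines else None


if __name__ == "__main__":
    for binary, flag in [("ffmpeg", "-version"), ("tesseract", "--version")]:
        print(binary, probe(binary, flag) or "not found or unparseable")
```

The output of a run like this could be pasted into the runtime binaries table next to the declared minimums, so the SBOM records what was actually installed rather than what was assumed.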

I3 — The AIBOM: Local Models

The harness currently runs eight distinct model roles across six Ollama tags. Three are pulled as published; three are custom Modelfiles. The distinction matters for the AIBOM because a custom Modelfile is not just a base model with a tag: it bakes in a system prompt that shapes every inference call, making the system prompt part of the effective model identity.

Model	Role	Arch / params	Quant	Custom Modelfile
`pi-qwen-32b`	Primary producer / agent	Qwen2, 32.8B	Q4_K_M	Yes — task-completing agent persona
`pi-qwen3.6`	Alternate producer	Qwen3 MoE, 36.0B / 3.6B active	Q4_K_M	Yes — system prompt + sampling overrides
`atla/selene-mini`	Wiggum evaluator (judge)	Llama, 8.0B	Q4_K_M	No — evaluation-specialist fine-tune
`Qwen3-Coder:30b`	Autoresearch proposer	Qwen3 MoE, 30.5B	Q4_K_M	No
`nanda-annotator-v2-q4km`	Lit-review annotator	Qwen2, 7.6B	Q4_K_M	Yes — annotation persona, structured output
`qwen3:8b`	Eval suite, general tasks	Qwen3, 8.0B	Standard	No

The NTIA minimum element set maps cleanly to Ollama model metadata. The unique identifier is the Ollama manifest ID — a SHA256 prefix of the manifest blob that Ollama uses internally. It is not a content hash of the weights themselves (which are stored as separate blobs), but it is a stable identifier that changes when the model is updated via ollama pull. For the purposes of an AIBOM, it is the most actionable identifier available without external registry tooling.
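Collecting those identifiers can be scripted. The following is a minimal sketch that assumes `ollama list` prints a header row followed by whitespace-separated rows whose first two columns are NAME and ID; `parse_ollama_list` and `live_ids` are hypothetical names, not harness code.

```python
import subprocess
from typing import Dict


def parse_ollama_list(text: str) -> Dict[str, str]:
    """Map each model tag to its short ID from `ollama list` text output."""
    ids: Dict[str, str] = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            ids[fields[0]] = fields[1]
    return ids


def live_ids() -> Dict[str, str]:
    """Query the local Ollama CLI; requires `ollama` on PATH."""
    result = subprocess.run(["ollama", "list"], capture_output=True, text=True, check=True)
    return parse_ollama_list(result.stdout)
```

Keeping the parser separate from the subprocess call means the parsing logic can be tested against saved output without a running Ollama daemon.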

The Modelfile-as-Artifact Problem

A custom Modelfile is a derived artifact: it takes a base model and overlays a system prompt, sampling parameters, and stop tokens. From the AIBOM perspective, the base model and the overlay are both supply chain inputs. The effective behavior of pi-qwen-32b depends on both:

# Effective model identity for pi-qwen-32b has two components:
import hashlib

SYSTEM_PROMPT = "<persona text from the Modelfile>"  # placeholder, not the real prompt

base_model_id = "edee0c094406"  # Ollama manifest ID (qwen2.5-32b-instruct-q4_K_M)
system_prompt_hash = hashlib.sha256(SYSTEM_PROMPT.encode("utf-8")).hexdigest()  # hash of the task-completing agent persona text

# A change to either component changes the model's effective behavior.
# Only the base_model_id appears in `ollama list`.

Standard SBOM tooling captures neither. The AIBOM tracks both. The current AIBOM.md records the Ollama ID and describes the system prompt's role; a hardened version would include the system prompt SHA256 and re-verify it on each Modelfile rebuild.

If a system prompt is part of the model's effective identity, then changing the system prompt without updating the AIBOM creates a silent divergence between the documented model and the running one. This is the same class of problem as updating a dependency without updating the lock file — the declared state and the runtime state drift apart.
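A drift check of this kind could look roughly like the sketch below. It assumes the AIBOM stores a manifest ID and a system prompt SHA256 per model; `ModelRecord`, `prompt_digest`, and `drift` are illustrative names, and the current AIBOM.md does not yet record the prompt hash.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelRecord:
    """What the AIBOM claims about one custom Modelfile (hypothetical schema)."""
    tag: str
    manifest_id: str
    system_prompt_sha256: str


def prompt_digest(system_prompt: str) -> str:
    """SHA256 hex digest of the system prompt text, encoded as UTF-8."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()


def drift(recorded: ModelRecord, live_manifest_id: str, live_prompt: str) -> List[str]:
    """Return the names of the identity components that no longer match."""
    changed: List[str] = []
    if recorded.manifest_id != live_manifest_id:
        changed.append("manifest_id")
    if recorded.system_prompt_sha256 != prompt_digest(live_prompt):
        changed.append("system_prompt")
    return changed
```

An empty list means the documented model and the running one agree on both components; anything else is the lock-file style divergence described above.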

I4 — The Cloud Endpoint Gap

Two models in the Ollama registry are cloud endpoints with no local weights:

Tag	Provider	Role	Local size	Pinnable?
`kimi-k2.5:cloud`	Moonshot AI	Autoresearch Kimi unblock	—	No
`glm-5.1:cloud`	Zhipu AI	Registered; not currently wired	—	No

When autoresearch.py calls ollama.chat(model=KIMI_MODEL, ...), the Ollama daemon forwards the request to Moonshot AI's API. The model that responds may differ from one call to the next — providers update cloud models without version-bump guarantees, and the Ollama cloud-model manifest carries only a routing entry, not a content hash. There is no mechanism to verify that the model responding today is the same as the one that responded yesterday.

Supply chain risk: A cloud model that is updated by its provider between two autoresearch runs can produce different instruction proposals for the same prompt. If those proposals produce different eval scores, the autoresearch loop may accept or reject based on model behavior that has changed underneath it — not based on the instruction change being tested. This is the same threat model as a mutable dependency: the environment changes while the code stays constant.

The AIBOM is the only artifact that surfaces this risk. A standard SBOM audit of the harness repo would find nothing wrong: kimi-k2.5:cloud does not appear in pyproject.toml, uv.lock, or any Python import. It appears only in autoresearch.py as a string default:

KIMI_MODEL = os.environ.get("KIMI_MODEL", "kimi-k2.5:cloud")

This is the canonical gap between an SBOM and an AIBOM: one audits what the package manager knows; the other audits what the running system actually calls.
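Auditing "what the running system actually calls" can start with something as simple as a source scan. This sketch assumes cloud endpoints follow the `:cloud` tag suffix seen in this post and appear as string literals in Python files; `cloud_tags_in_source` and `scan_repo` are hypothetical helpers, and a tag assembled at runtime or supplied only through an environment variable would not be caught.

```python
import re
from pathlib import Path
from typing import Dict, List

# A quoted string literal that ends in ":cloud", e.g. "kimi-k2.5:cloud".
CLOUD_TAG_RE = re.compile(r"""["']([A-Za-z0-9._/-]+:cloud)["']""")


def cloud_tags_in_source(source: str) -> List[str]:
    """Return the sorted, de-duplicated cloud tags found in one source string."""
    return sorted(set(CLOUD_TAG_RE.findall(source)))


def scan_repo(root: Path) -> Dict[str, List[str]]:
    """Map each Python file under `root` to the cloud tags it mentions."""
    hits: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.py")):
        tags = cloud_tags_in_source(path.read_text(encoding="utf-8", errors="ignore"))
        if tags:
            hits[str(path)] = tags
    return hits
```

Comparing the scan result against the cloud endpoint table in the AIBOM gives a cheap check that no endpoint is called without being declared.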

What Mitigation Looks Like

Full mitigation of the cloud model risk would require provider-side versioning — a kimi-k2.5:cloud@2026-05-25 endpoint that Moonshot AI commits not to update. Providers do not generally offer this. The practical mitigations available on the client side are:

Log the model name and timestamp of every cloud call in the run log (runs.jsonl). If behavior changes, the log identifies which run was affected.
Treat cloud model calls as externally sourced content, not as trusted inference. The Kimi unblock suggestion is injected into the proposer prompt as guidance, not as a direct commit decision — the local proposer still generates the actual candidate.
Cap cloud model invocations to advisory roles. The harness never commits an instruction directly from a cloud model; Kimi's output is one input among several to the local proposer. This limits blast radius if the cloud model is compromised or updated adversarially.
Include cloud endpoints in the AIBOM and flag them as dependencies that cannot be verified on the client side, similar to how a software SBOM marks dependencies with known vulnerabilities rather than removing them.
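The first mitigation, logging every cloud call, might be sketched as follows. The field names (`ts`, `model`, `role`, `trust`) and the `log_cloud_call` helper are assumptions for illustration; the actual schema of runs.jsonl is not shown in this post.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


def log_cloud_call(log_path: Path, model: str, role: str) -> Dict[str, str]:
    """Append one JSON line recording which cloud model was called, and when."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),  # UTC timestamp of the call
        "model": model,                                # tag as passed to the client
        "role": role,                                  # why the harness called it
        "trust": "advisory",                           # cloud output is never committed directly
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record
```

A call such as `log_cloud_call(Path("runs.jsonl"), "kimi-k2.5:cloud", "autoresearch-unblock")` placed next to the cloud request would be enough to tie a later behavior change to specific runs.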

I5 — AIBOM Format and Tooling State

There is no settled standard for AIBOMs yet. The active tracks as of mid-2026:

Initiative	Status	Relevance to agentic systems
CISA / NTIA AI SBOM working group	Framing documents published; minimum elements draft pending	Likely to extend NTIA SBOM minimum elements to cover model artifacts
CycloneDX 1.5 ML-BOM	Released; `type: machine-learning-model` + `modelCard` fields	Best available machine-readable format; covers architecture, quantization, intended use, training data provenance
OWASP CycloneDX extensions	Active; model card schema, pedigree (base model chain)	Covers the Modelfile-as-derived-artifact problem via `pedigree.ancestors`
Hugging Face model cards	Widespread; not a BOM format	Useful as provenance source for AIBOM entries; not machine-readable for audit tooling

The harness AIBOM is a Markdown document rather than a CycloneDX JSON, for two reasons. First, CycloneDX 1.5 ML-BOM tooling for Ollama-managed models does not exist — there is no cyclonedx-py equivalent that introspects ollama list and emits a compliant JSON. Second, the most important entries in the harness AIBOM are the cloud endpoints and custom Modelfiles, both of which require human-in-the-loop curation that automated tooling cannot provide. A Markdown document is maintainable manually; a JSON document generated by a tool that doesn't understand Modelfiles is not.
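For comparison, a hand-assembled CycloneDX entry for one custom Modelfile might look like the sketch below. The field names follow my reading of the CycloneDX 1.5 schema and should be validated against it before use; the `aibom:system-prompt-sha256` property name and the `ollama:` bom-ref prefix are invented for this example.

```python
import json
from typing import Any, Dict, List


def modelfile_component(name: str, base_name: str, manifest_id: str,
                        prompt_sha256: str) -> Dict[str, Any]:
    """Build one ML-BOM component for a custom Modelfile derived from a base model."""
    return {
        "type": "machine-learning-model",
        "bom-ref": f"ollama:{name}",           # hypothetical reference scheme
        "name": name,
        "version": manifest_id,                # Ollama manifest ID stands in for a version
        "pedigree": {
            "ancestors": [{"type": "machine-learning-model", "name": base_name}],
        },
        "properties": [
            # Hypothetical property carrying the system prompt half of the identity.
            {"name": "aibom:system-prompt-sha256", "value": prompt_sha256},
        ],
    }


def build_bom(components: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap components in a minimal CycloneDX 1.5 document."""
    return {"bomFormat": "CycloneDX", "specVersion": "1.5", "version": 1,
            "components": components}


if __name__ == "__main__":
    entry = modelfile_component("pi-qwen-32b", "qwen2.5-32b-instruct-q4_K_M",
                                "edee0c094406", "<sha256 of system prompt>")
    print(json.dumps(build_bom([entry]), indent=2))
```

Even in this small form, the values that matter (the base model name and the prompt hash) have to be supplied by a person who knows the Modelfile, which is the curation burden the Markdown format accepts openly.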

The right long-term state is a hybrid: generate the Python layer automatically from uv.lock via CycloneDX, and maintain the model layer manually in Markdown with a validation script that cross-checks Ollama manifest IDs against the document's recorded values. Neither half is complete without the other.
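The validation half of that hybrid could be sketched as below. It assumes AIBOM.md records each model on one line containing the tag in backticks and a 12-character hexadecimal manifest ID, and it takes the live IDs as a plain dictionary (for example, parsed from `ollama list`); the layout assumption and all function names are mine, not the repo's.

```python
import re
from typing import Dict, Optional, Tuple

TAG_RE = re.compile(r"`([^`]+)`")        # tag written in backticks
ID_RE = re.compile(r"\b[0-9a-f]{12}\b")  # short manifest ID


def normalize(tag: str) -> str:
    """Treat `name` and `name:latest` as the same tag."""
    return tag[: -len(":latest")] if tag.endswith(":latest") else tag


def recorded_ids(markdown: str) -> Dict[str, str]:
    """Extract tag -> manifest ID from lines that contain both."""
    found: Dict[str, str] = {}
    for line in markdown.splitlines():
        tag, manifest = TAG_RE.search(line), ID_RE.search(line)
        if tag and manifest:
            found[normalize(tag.group(1))] = manifest.group(0)
    return found


def stale_entries(recorded: Dict[str, str],
                  live: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return tag -> (recorded ID, live ID or None) for every mismatch."""
    live_norm = {normalize(tag): mid for tag, mid in live.items()}
    return {tag: (rid, live_norm.get(tag))
            for tag, rid in recorded.items() if live_norm.get(tag) != rid}
```

A non-empty result from `stale_entries` is a reasonable condition to fail a CI job on, since it means the document and the machine disagree about which model is installed.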

The regeneration test: An AIBOM is only useful if it stays current. The AIBOM.md in the repo includes a "How to Regenerate" section that lists the three steps: ollama list for current IDs, ollama show <tag> for architecture verification, and SHA256 of the system prompt text for custom Modelfiles. If a model is updated via ollama pull and the AIBOM isn't regenerated, the manifest ID will be stale. That staleness is detectable — which is the point.

What the Literature Leaves Open

CycloneDX 1.5 defines a pedigree.ancestors field for ML models, intended to capture the base model chain (e.g., Qwen2.5-32B-Instruct → pi-qwen-32b Modelfile). For a model served through Ollama without a Hugging Face model card, the ancestor chain relies on the Modelfile's FROM directive, which may reference another Ollama tag rather than a canonical Hugging Face repo ID. How should AIBOM tooling handle indirect provenance chains where each link is an Ollama tag rather than a versioned artifact?
Cloud endpoint models (kimi-k2.5:cloud, glm-5.1:cloud) are referenced by name but have no content hash available to the client. ISO/IEC 5962:2021 (SPDX) and CycloneDX both require a unique identifier for each component. A cloud model name is not a unique identifier in the cryptographic sense. What is the right AIBOM entry for a component that can only be identified, not verified?
Custom Modelfiles bake a system prompt into the model definition. From a supply chain perspective, is the system prompt a configuration artifact (like a config file, tracked separately from the model) or a model artifact (part of the model's effective identity, requiring its own AIBOM entry)? The distinction affects how prompt changes are tracked in audit logs and whether they trigger a new AIBOM version.
The harness's autoresearch loop modifies SYNTH_INSTRUCTION — a runtime prompt that shapes agent behavior — through an automated optimization process. Each kept experiment is a new effective model configuration. Should the AIBOM track every accepted autoresearch candidate as a new model version? If so, the AIBOM would have 100+ entries for a single training run. If not, only the final committed state is tracked, and intermediate optimization history is invisible to supply chain auditors.
The harness uses atla/selene-mini as the Wiggum evaluator. The evaluator's outputs (per-dimension scores and issue flags) feed directly into the accept/reject decision for new instruction candidates. If Atla updates the model, the scoring distribution shifts and autoresearch's optimization target silently changes. This is a supply chain dependency not on a software component but on an evaluative function. How should AIBOMs represent dependencies on evaluator models whose behavioral drift has a direct feedback effect on the system being evaluated?
