May 19, 2026 • 8 min read • Agentic Harness Engineering

The Memory View: 2,173 Observations, Quality Signals, and an Ontology Graph

Every run the harness completes writes structured observations into a persistent semantic store. The Memory view is the interface into that store: a paginated, searchable list of 2,173 memories across 87 pages, a detail panel with full provenance and inline RLHF quality signals, and a force-directed ontology graph that maps concept relationships across the entire accumulated corpus.

Most agentic systems treat each run as stateless. The harness deliberately does not. After every research task completes, the run is decomposed into discrete observations — named facts extracted from the synthesis, tagged with source type, quality score, and task provenance — and written into a ChromaDB-backed semantic store. The Memory view is how you inspect and curate what the system has learned.
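A minimal sketch of what one such observation record might look like. Every field name below is an illustrative assumption, since the harness's actual schema is not shown here, and a plain dictionary stands in for the ChromaDB-backed store so the example stays self-contained.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class Observation:
    """One memory record; all field names are hypothetical."""
    run_id: str                      # provenance: the run that produced it
    source_type: str                 # e.g. "lit-review", "osint"
    task: str                        # full task string passed to the harness
    narrative: str                   # short auto-generated summary
    facts: list[str] = field(default_factory=list)
    quality_score: Optional[float] = None  # not every type carries a score

# Stand-in for the persistent semantic store (in-memory only, hypothetical).
store: dict[str, dict] = {}

def write_observation(obs: Observation) -> str:
    """Persist the observation under a key derived from its run ID."""
    key = f"{obs.run_id}:{len(store)}"
    store[key] = asdict(obs)
    return key

key = write_observation(Observation(
    run_id="20260601T001320Z-01e19668aae6",
    source_type="lit-review",
    task="<full task string>",
    narrative="<one-to-two sentence summary>",
    facts=["<extracted fact>"],
))
```

Keeping provenance fields on the record itself is what lets every later view (list, detail panel, graph) trace back to the originating run.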

Memory subtitle: Semantic observation store · RLHF quality signals · Ontology graph — three distinct functions in one view.

The Memories List

The default tab shows all accumulated memories in reverse-chronological order. As of this writing the store holds 2,173 entries across 87 pages.

Memory list view showing 2,173 memories with type badges and quality scores

The Memories tab: 2,173 observations, paginated 25 per page, filterable by sentiment and searchable by content.
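The page count follows directly from the two figures above. A short sketch of the arithmetic, with helper names that are purely illustrative:

```python
import math

TOTAL_MEMORIES = 2173   # figure from the post
PAGE_SIZE = 25          # figure from the post

def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to show `total` items."""
    return math.ceil(total / page_size)

def page_slice(page: int, page_size: int) -> tuple[int, int]:
    """Return (start, stop) offsets for a 1-indexed page."""
    start = (page - 1) * page_size
    return start, start + page_size

pages = page_count(TOTAL_MEMORIES, PAGE_SIZE)       # 87
last_start, _ = page_slice(pages, PAGE_SIZE)
items_on_last_page = TOTAL_MEMORIES - last_start    # 23
```

So 86 full pages plus a final page of 23 entries account for all 2,173 memories.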

Each row in the list is a memory card carrying four pieces of inline metadata: the date the observation was created, its source type (a controlled vocabulary), a Wiggum quality score where applicable, and a fact count. The type vocabulary reflects the full range of things the harness does:

Research types

lit-review — arXiv/S2 synthesis
research — general web + RAG
best_practices — structured rules extraction
osint — open-source intelligence

System types

skill — skill-level performance observations
browser_history — browser task observations
market — market signals tasks
research — general synthesis outputs

The filter bar at the top offers four views: All, Positive, Neutral, and Negative. The last three map directly to the RLHF sentiment attached to each memory; All removes the filter. The free-text search box queries against memory titles and narrative content. Together they make it possible to, for example, find all positively-rated lit-review observations from a specific domain without scrolling through 87 pages.
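The combination of sentiment filter and text search can be sketched as below. The sample rows are made up for illustration, the field names are assumptions, and a case-insensitive substring match stands in for whatever search the store actually performs.

```python
from typing import Optional

# Invented sample rows; only the first title echoes an example from this post.
memories = [
    {"title": "SLM vs LLM efficiency", "narrative": "Compares small and large models.",
     "type": "lit-review", "sentiment": "positive"},
    {"title": "Vendor landscape", "narrative": "Market signals for inference hardware.",
     "type": "market", "sentiment": "neutral"},
    {"title": "LLM quantization notes", "narrative": "Mixed results on 4-bit efficiency.",
     "type": "lit-review", "sentiment": "negative"},
]

def filter_memories(items, sentiment: Optional[str] = None, query: str = ""):
    """Apply the sentiment filter (None means 'All'), then a text search
    over title and narrative."""
    q = query.lower()
    return [
        m for m in items
        if (sentiment is None or m["sentiment"] == sentiment)
        and (q in m["title"].lower() or q in m["narrative"].lower())
    ]

hits = filter_memories(memories, sentiment="positive", query="efficiency")
```

Here `hits` contains only the positively rated entry, even though a negatively rated one also mentions efficiency.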

The Detail Panel

Clicking any memory opens a full detail panel on the right. This is where the semantic structure of the memory store becomes concrete.

Memory detail panel showing provenance, task string, narrative, quality signals, and facts for a SLM vs LLM efficiency lit review

Detail view for a lit-review memory: run ID, output path, full task string, narrative summary, upvote/downvote quality signals, and extracted facts.

The detail panel is structured around five sections:

Provenance: Run ID (e.g. 20260601T001320Z-01e19668aae6) and the exact output file path on disk. Every memory is traceable to the run that produced it.
Task: The full task string passed to the harness, including all skill flags and output path. Reproducing the run is a copy-paste away.
Narrative: A one-to-two sentence summary of what the run found, auto-generated at write time. The narrative is what surfaces in search results and graph node labels.
Quality signal: Inline upvote / downvote buttons with the current sentiment score displayed in real time. Clicking either writes an RLHF signal back to the store; these votes accumulate and can drive future DPO fine-tuning.
Facts: Discrete extracted observations from the run output. Each fact is a claim the synthesis made that survived extraction, stored as a named string in the memory record.

The quality signal mechanism is the most consequential part of the detail panel. The upvote/downvote interface is not cosmetic: every interaction writes a labeled preference signal to the RLHF store. As the harness accumulates signal, those votes can be assembled into the preference pairs that direct preference optimization trains on. The pipeline from "human rates a run output" to "model improves on similar tasks" runs through this UI.
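How individual votes become DPO training pairs is not spelled out here. One plausible sketch, assuming a hypothetical vote log of (task, output, vote) tuples, sums votes per output and pairs each net-positive output with each net-negative output of the same task:

```python
from collections import defaultdict
from itertools import product

# Hypothetical vote log: (task, output_id, vote) where vote is +1 or -1.
votes = [
    ("task-a", "out-1", +1),
    ("task-a", "out-2", -1),
    ("task-a", "out-1", +1),
    ("task-b", "out-3", +1),
]

def sentiment_scores(vote_log):
    """Net score per output: upvotes minus downvotes."""
    totals = defaultdict(int)
    for _task, output_id, vote in vote_log:
        totals[output_id] += vote
    return dict(totals)

def preference_pairs(vote_log):
    """Build (task, chosen, rejected) triples within each task."""
    totals = sentiment_scores(vote_log)
    by_task = defaultdict(set)
    for task, output_id, _vote in vote_log:
        by_task[task].add(output_id)
    pairs = []
    for task, outputs in sorted(by_task.items()):
        chosen = sorted(o for o in outputs if totals[o] > 0)
        rejected = sorted(o for o in outputs if totals[o] < 0)
        pairs.extend((task, c, r) for c, r in product(chosen, rejected))
    return pairs

scores = sentiment_scores(votes)
pairs = preference_pairs(votes)
```

Under this scheme a task with only upvoted outputs, like `task-b`, yields no pair, which is one reason a steady stream of both positive and negative votes matters.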

The Review Tab

Between the Memories list and the Graph sits a Review tab. This surfaces memories that have not yet received a quality signal — newly created observations awaiting human judgment. The workflow is: run completes → memory is written → memory appears in Review → human votes → memory moves to the rated pool. Keeping the Review queue short is a proxy for how current the preference signal is.
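The queue semantics can be sketched in a few lines. This assumes, hypothetically, that an unrated memory carries `None` as its sentiment and that the queue is shown oldest first; neither detail is confirmed by the UI description.

```python
def review_queue(memories):
    """Memories with no vote yet, oldest first."""
    unrated = [m for m in memories if m["sentiment"] is None]
    return sorted(unrated, key=lambda m: m["created"])

def rate(memory, vote: int) -> None:
    """Record a vote; the memory then drops out of the review queue."""
    memory["sentiment"] = (memory["sentiment"] or 0) + vote

# Invented sample records with ISO dates, which sort correctly as strings.
items = [
    {"id": "m1", "created": "2026-05-18", "sentiment": None},
    {"id": "m2", "created": "2026-05-17", "sentiment": 1},
    {"id": "m3", "created": "2026-05-16", "sentiment": None},
]

before = [m["id"] for m in review_queue(items)]   # m3, then m1
rate(items[2], +1)                                # vote on m3
after = [m["id"] for m in review_queue(items)]    # only m1 remains
```

The length of `review_queue(items)` is the number a curator would watch to judge how current the preference signal is.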

The Ontology Graph

The Graph tab renders the corpus of 2,173 memories as a force-directed network. Nodes are concepts extracted from memory narratives; edges connect concepts that co-occur across runs. The layout is physics-simulated: related concepts cluster, isolated observations drift to the periphery. The result is a visual map of what the harness knows and how those topics are connected.
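The edge definition above (concepts that co-occur across runs) can be sketched as a simple pair count. The concept sets are invented for illustration, and how the harness actually extracts concepts from narratives is not covered here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept sets, one per run.
run_concepts = [
    {"quantization", "small language models", "latency"},
    {"quantization", "latency"},
    {"osint", "entity resolution"},
]

def cooccurrence_edges(concept_sets):
    """Count how many runs each unordered concept pair appears in together."""
    edges = Counter()
    for concepts in concept_sets:
        # Sorting gives each pair one canonical orientation.
        for a, b in combinations(sorted(concepts), 2):
            edges[(a, b)] += 1
    return edges

edges = cooccurrence_edges(run_concepts)
```

The counts can serve as edge weights, so a force-directed layout pulls frequently co-occurring concepts closer together.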

Note: The graph requires a separate API call that builds the full ontology at render time. If inference is running concurrently on the same machine (e.g. lit review jobs), the graph endpoint can time out. Switch to another tab and back to retry once the model server is free.

Built from 2,173 memories, the graph is dense. The intended use is not to read individual nodes but to identify clusters (topic areas where the harness has accumulated concentrated signal) and gaps, where nodes are sparse or isolated. A well-used harness over time produces a graph whose cluster structure mirrors the research agenda: dense in areas of repeated investigation, sparse at the frontier.
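Clusters and gaps can also be found programmatically. The sketch below uses connected components as a coarse stand-in for visual clustering; it is an illustrative assumption rather than the method the Graph tab uses, and a dense real graph would more likely call for community detection.

```python
from collections import defaultdict

def components(nodes, edges):
    """Connected components via iterative depth-first search."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            comp.add(n)
            stack.extend(adj[n] - seen)
        result.append(comp)
    return result

# Toy graph: three linked concepts and two isolated ones.
nodes = {"a", "b", "c", "d", "e"}
edges = [("a", "b"), ("b", "c")]

comps = components(nodes, edges)
clusters = [c for c in comps if len(c) > 1]                    # areas of coverage
gaps = sorted(n for c in comps if len(c) == 1 for n in c)      # isolated concepts
```

In this toy graph `a`, `b`, and `c` form one cluster, while `d` and `e` are the isolated nodes a coverage review would flag.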

What the Memory View Enables

The practical use cases for the Memory view fall into four categories:

Provenance lookup

When a run output contains a claim you want to verify, search the memory for the task string or run ID to find the exact source file and the full task that generated it.
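A lookup by run ID might look like the sketch below. The ID layout (a UTC timestamp, a hyphen, then a hex suffix) is inferred from the single example ID shown in this post, and the record fields are hypothetical.

```python
from datetime import datetime, timezone

def parse_run_id(run_id: str):
    """Split a run ID into (UTC timestamp, suffix).
    The layout is an assumption based on one example ID."""
    stamp, _, suffix = run_id.partition("-")
    when = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return when, suffix

def find_by_run_id(memories, run_id: str):
    """Return every memory written by the given run."""
    return [m for m in memories if m["run_id"] == run_id]

# Placeholder record; the output path is deliberately left unspecified.
memories = [
    {"run_id": "20260601T001320Z-01e19668aae6", "output_path": "<output file path>"},
]

when, suffix = parse_run_id("20260601T001320Z-01e19668aae6")
matches = find_by_run_id(memories, "20260601T001320Z-01e19668aae6")
```

Because one run can produce several memories, the lookup returns a list rather than a single record.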

Quality curation

The Review queue surfaces unrated memories. Regular curation keeps the RLHF signal fresh and ensures the preference store reflects current quality standards, not just historical runs.

Coverage mapping

The Graph view shows which topics have deep coverage (dense clusters) and which are underexplored (isolated nodes). Use it to identify where the next research batch should focus.

Signal-guided retrieval

Filter to Positive memories only, then search a domain. The result is a curated subset of the highest-rated prior research on that topic — a fast orientation before running new tasks.

The Numbers

As of May 19, 2026: 2,173 memories across 87 pages. The store has been accumulating since the harness was first deployed, and includes research, lit-review, OSINT, skill-level, and browser-task observations. The oldest entries are from early experiments; the newest reflect lit reviews kicked off minutes before this screenshot was taken. The store grows by 4–10 new observations per run depending on how many discrete facts the synthesis produces.
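The two figures above bound the number of runs behind the store. This is a back-of-the-envelope estimate only: it assumes every memory came from a run that contributed between 4 and 10 observations, which may not hold for the early experiments.

```python
import math

TOTAL = 2173                      # memories in the store
MIN_PER_RUN, MAX_PER_RUN = 4, 10  # observations added per run

# Fewest runs: every run contributes the maximum.
fewest_runs = math.ceil(TOTAL / MAX_PER_RUN)   # 218

# Most runs: every run contributes the minimum.
most_runs = TOTAL // MIN_PER_RUN               # 543
```

Under that assumption the store reflects somewhere between 218 and 543 completed runs.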

The memory store doubles as the harness's institutional knowledge. Any task that has been run before leaves a trace that future runs can learn from — through explicit retrieval, through RLHF signal, or through the DPO pairs that quality votes eventually generate.