Blog

Thoughts on ML, computer vision, and AI research

SBOM and AIBOM for Agentic Systems

May 25, 2026 • 12 min read • Agentic Harness Engineering

pip freeze doesn't know about kimi-k2.5:cloud. An AIBOM does. Supply chain transparency for the full stack: Python packages, local GGUF models, custom Modelfiles with system-prompt overlays, and cloud endpoints that appear nowhere in a traditional SBOM.

Agentic Threat Hardening: The OWASP Top 10, Applied

May 25, 2026 • 25 min read • Agentic Harness Engineering

OWASP's Agentic Security Initiative Top 10 maps ten attack classes that emerge when LLMs gain tools, memory, and autonomy. This post audits the harness against all ten (four covered, four partially covered, two gaps) and adds nine prioritized mitigations, research citations, and code.

Leverage: What the Metric Measures, and Why the Replacement Framing Gets the Math Wrong

May 25, 2026 • 12 min read • Analysis

The harness computes a leverage value on every run. A close reading of the formula—token amortization, the CapEx gap, the TAC calibration assumption—produces a quantitative argument against the corporate narrative that AI “naturally” targets lower-value human capital first.

Agentic Harness Engineering: The Architecture Series

May 25, 2026 • 12 posts • 27 patterns

Twelve posts covering the complete harness design across eight categories—plus a pattern catalog presenting all 27 named agentic system design patterns in textbook structure: Intent, Problem, Solution, Structure, and Related patterns.

Closing the Loops

May 25, 2026 • 10 extensions • Agentic Harness Engineering

Ten posts extending the self-improvement patterns from Section G: the regression harness, the autoresearch optimizer, the voice pipeline, operational telemetry, and RAG context-enrichment experiments, which together form compounding feedback loops.

Seven Principles and a Moving Frontier: The Harness Roadmap

May 24, 2026 • 14 min read • Agentic Harness Engineering

The goals that stayed constant, the milestones that multiplied, and what three rounds of roadmap revision reveal about building self-improving systems. How each completed milestone made the next constraint visible.

The Harness Data Model: Schemas, Entities, and Query Patterns

May 23, 2026 • 7 subsections • Agentic Harness Engineering

A complete reference for the five-file JSONL schema at the core of the harness: entity hierarchy, ID format, per-stage token accounting, message role taxonomy, and querying patterns in jq, pandas, and DuckDB.

Experiments and Alignment Foundations

May 23, 2026 • 2 posts

Four production experiments that exposed the evaluator ceiling, the producer ceiling, and the synthesis instruction bottleneck—plus multi-objective alignment methods beyond scalar rewards.

Small Language Models and the Efficiency-Accuracy Frontier

May 8, 2026 • 16 min read

SLM-Bench: 15 models, 9 tasks, 4 hardware configs. Accuracy and energy efficiency don't co-optimize—so model selection is a portfolio problem, not a ranking problem. How the three-model architecture from experiment-04 operationalizes this insight.

Literature Reviews: Tools, Knowledge Graphs, Security, and Fine-Tuning

May 1, 2026 • 4 posts

Four surveys covering agentic tool use and planning, structured knowledge extraction and graph retrieval, prompt injection attack patterns, and fine-tuning and alignment deep cuts—and what each means for harness design.

Literature Reviews: Evaluation, Judges, and Structured Queries

April 29, 2026 • 5 posts

Five surveys: benchmark contamination and judge reliability, evaluation uncertainty and calibration, automated evaluation robustness, SPARQL-grounded knowledge queries, and judge benchmarks with test-time scaling.

Circuit Extraction: Interpreting Object Detectors

January 13, 2026 • 14 min read

Using activation patching and co-activation analysis to extract the minimal computational circuit for pot detection in Faster R-CNN.

Object Detection on Drone Orthomosaics with SAM

January 10, 2026 • 8 min read

An overview of using Meta's Segment Anything Model for automated object detection in high-resolution aerial imagery, with applications in precision agriculture.

Sparse Linear Probing for Efficient Detection

January 9, 2026 • 10 min read

Using L1-regularized linear probes to identify minimal feature subsets from SAM and Faster R-CNN that are sufficient for pot detection.

Extracting Features from Vision Model Backbones

January 7, 2026 • 12 min read

A technical guide to extracting and visualizing internal representations from SAM and Faster R-CNN for interpretability research.

Mechanistic Interpretability for Agricultural AI

January 5, 2026 • 10 min read

Exploring how mechanistic interpretability techniques can help us understand what vision models learn about agricultural environments and build more trustworthy AI systems.

SAM vs Faster R-CNN: A Practical Comparison

January 3, 2026 • 10 min read

Comparing Segment Anything Model and Faster R-CNN for aerial object detection—architecture, fine-tuning approaches, and when to use each.

Fine-Tuning Vision Foundation Models

December 28, 2025 • 12 min read

A practical guide to fine-tuning strategies for vision models like SAM and Faster R-CNN, with insights on data efficiency and domain adaptation.

Building a GeoTIFF Object Detection Web App

December 28, 2025 • 5 min read

A walkthrough of building a web application for running Faster R-CNN inference on geospatial imagery with FastAPI, WebSockets, and Leaflet.

Training Faster R-CNN for Geospatial Object Detection

December 20, 2025 • 8 min read

A deep dive into training object detection models on aerial imagery, from SAM masks to production-ready Faster R-CNN with hard negative mining.