Blog
Thoughts on ML, computer vision, and AI research
SBOM and AIBOM for Agentic Systems
pip freeze doesn't know about kimi-k2.5:cloud. An AIBOM does. Supply chain transparency for the full stack: Python packages, local GGUF models, custom Modelfiles with system-prompt overlays, and cloud endpoints that appear nowhere in a traditional SBOM.
Agentic Threat Hardening: The OWASP Top 10, Applied
OWASP's Agentic Security Initiative Top 10 maps ten attack classes that emerge when LLMs gain tools, memory, and autonomy. The full coverage audit against the harness—four defenses covered, four partial, two gaps—with nine prioritized mitigations, research citations, and code.
Leverage: What the Metric Measures, and Why the Replacement Framing Gets the Math Wrong
The harness computes a leverage value on every run. A close reading of the formula—token amortization, the CapEx gap, the TAC calibration assumption—produces a quantitative argument against the corporate narrative that AI “naturally” targets lower-value human capital first.
Agentic Harness Engineering: The Architecture Series
Twelve posts covering the complete harness design across eight categories—plus a pattern catalog presenting all 27 named agentic system design patterns in textbook structure: Intent, Problem, Solution, Structure, and Related patterns.
Closing the Loops
Eleven posts extending the self-improvement patterns from Section G: the regression harness, the autoresearch optimizer, the voice pipeline, operational telemetry, and RAG context enrichment experiments, which together form compounding feedback loops.
- The Regression Harness → 9 tasks, 11 criterion functions, CRD runner, and three-persona experiment panel
- When the Loop Defeats Itself → 90 experiments, three nested failure modes, four convergence detectors
- From Hill-Climbing to Pareto → GEPA's Pareto frontier explains the autoresearch oscillation
- What SkillOpt Gets Right → Three gaps SkillOpt exposes: validation gating, fast/slow epochs, skill-as-artifact
- The Audio Data Flywheel → Voice requests as ASR training data; NeMo RL as the second loop
- The Telemetry Router → runs.jsonl simultaneously seeds autoresearch, lit-review, and DPO curation
- Memory as Infrastructure → Dual-store (SQLite + ChromaDB), quality-weighted ranking, RLHF feedback, UMAP ontology graph
- Building the Detectors → What the four proposed convergence detectors became after 107 experiments: regex over cosine, Kimi over hard exit
- Beige Book RAG + DPO Cold Start → Prepend: −0.08. Append: +0.13. A falsified hypothesis and what the position swap generates as DPO training signal
- Live Data Beats Narrative: FRED RAG Results → fred_end: +0.40 vs control. Live FRED series 3× more useful than Beige Book prose. Position-swap finding replicates.
- OSINT Enrichment: Nine Layers of Passive Recon → DNS, RDAP, crt.sh, Wayback, Shodan, urlscan, OTX — parallel fetch, zero-config baseline, wired into gather_research()
Seven Principles and a Moving Frontier: The Harness Roadmap
The goals that stayed constant, the milestones that multiplied, and what three rounds of roadmap revision reveal about building self-improving systems. How each completed milestone made the next constraint visible.
The Harness Data Model: Schemas, Entities, and Query Patterns
A complete reference for the five-file JSONL schema at the core of the harness: entity hierarchy, ID format, per-stage token accounting, message role taxonomy, and querying patterns in jq, pandas, and DuckDB.
Experiments and Alignment Foundations
Four production experiments that exposed the evaluator ceiling, the producer ceiling, and the synthesis instruction bottleneck—plus multi-objective alignment methods beyond scalar rewards.
Small Language Models and the Efficiency-Accuracy Frontier
SLM-Bench: 15 models, 9 tasks, 4 hardware configs. Accuracy and energy efficiency don't co-optimize—so model selection is a portfolio problem, not a ranking problem. How the three-model architecture from experiment-04 operationalizes this insight.
Literature Reviews: Tools, Knowledge Graphs, Security, and Fine-Tuning
Four surveys covering agentic tool use and planning, structured knowledge extraction and graph retrieval, prompt injection attack patterns, and fine-tuning and alignment deep cuts—and what each means for harness design.
Literature Reviews: Evaluation, Judges, and Structured Queries
Five surveys: benchmark contamination and judge reliability, evaluation uncertainty and calibration, automated evaluation robustness, SPARQL-grounded knowledge queries, and judge benchmarks with test-time scaling.
Circuit Extraction: Interpreting Object Detectors
Using activation patching and co-activation analysis to extract the minimal computational circuit for pot detection in Faster R-CNN.
Object Detection on Drone Orthomosaics with SAM
An overview of using Meta's Segment Anything Model for automated object detection in high-resolution aerial imagery, with applications in precision agriculture.
Sparse Linear Probing for Efficient Detection
Using L1-regularized linear probes to identify minimal feature subsets from SAM and Faster R-CNN that are sufficient for pot detection.
Extracting Features from Vision Model Backbones
A technical guide to extracting and visualizing internal representations from SAM and Faster R-CNN for interpretability research.
Mechanistic Interpretability for Agricultural AI
Exploring how mechanistic interpretability techniques can help us understand what vision models learn about agricultural environments and build more trustworthy AI systems.
SAM vs Faster R-CNN: A Practical Comparison
Comparing Segment Anything Model and Faster R-CNN for aerial object detection—architecture, fine-tuning approaches, and when to use each.
Fine-Tuning Vision Foundation Models
A practical guide to fine-tuning strategies for vision models like SAM and Faster R-CNN, with insights on data efficiency and domain adaptation.
Building a GeoTIFF Object Detection Web App
A walkthrough of building a web application for running Faster R-CNN inference on geospatial imagery with FastAPI, WebSockets, and Leaflet.
Training Faster R-CNN for Geospatial Object Detection
A deep dive into training object detection models on aerial imagery, from SAM masks to production-ready Faster R-CNN with hard negative mining.