May 28, 2026 • 5 min read • Agentic Harness Engineering

The Search Cache: SQLite TTL Caching for DDGS Queries and Research Contexts

Two SQLite tables handle two different caching problems: per-query DDGS result caching for all runs, and whole-research-context caching for back-to-back autoresearch experiments — both with 24-hour TTL, SHA-256 keys, and lazy expiry on write.

DDGS rate-limits aggressively. During autoresearch experiments, the same five eval tasks run dozens of times in sequence. Without caching, the search loop would hit DDGS for the same queries on every iteration, burning rate-limit quota and adding 10–30 seconds of network latency per eval run. The search cache eliminates that latency for repeated queries and makes the eval loop reproducible: all runs within the TTL window see exactly the same search results.

Two tables, two use cases

search_cache

Always active — no opt-in required
Key: SHA-256 of lower-cased, whitespace-normalised query string
Value: JSON-serialized list of DDGS result dicts
TTL: 24 hours (configurable per call)
Eviction: lazy on write — expired rows deleted when a new result is stored
Scope: individual search queries

research_cache

Opt-in: only active when RESEARCH_CACHE=1 is set
Key: SHA-256 of task string + task_type
Value: full gather_research() output — merged context, search round count, novelty scores
TTL: 24 hours
Activated by autoresearch.py so interactive runs are unaffected
Scope: full research tasks

The research cache sits one level above the search cache: a research-cache hit covers everything the per-query cache would have served, plus the merging and novelty scoring built on top of it. If RESEARCH_CACHE=1, an autoresearch experiment that runs eval task T_A 40 times performs all DDGS calls and novelty scoring exactly once; subsequent iterations skip the entire gather step and go straight to synthesis. This makes autoresearch experiments ~50% faster when the research context hasn't changed.

The cache key

import hashlib

def _cache_key(query: str) -> str:
    # Lowercase and collapse every whitespace run so equivalent queries share one key
    normalised = " ".join(query.lower().split())
    return hashlib.sha256(normalised.encode()).hexdigest()

Normalisation collapses whitespace and lowercases before hashing. This means "Best practices for RAG pipelines" and "best practices for rag pipelines" produce the same cache key, and so does "best   practices  for rag pipelines" (extra spaces). The SHA-256 hex string is the SQLite primary key; collisions are not a practical concern for a local cache with thousands of entries.
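A quick check of that claim, reusing the function exactly as shown above. The variant strings are illustrative only.

```python
import hashlib

def _cache_key(query: str) -> str:
    normalised = " ".join(query.lower().split())
    return hashlib.sha256(normalised.encode()).hexdigest()

variants = [
    "Best practices for RAG pipelines",
    "best practices for rag pipelines",
    "  best   practices for\trag pipelines  ",  # extra spaces and a tab
]

# All three spellings collapse to a single 64-character hex key.
keys = {_cache_key(v) for v in variants}

# A genuinely different query still gets its own key.
other_key = _cache_key("best practices for rag evaluation")
```

Note that normalisation only covers case and whitespace: punctuation or word-order differences still produce distinct keys.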

Read / write path

cached_search(query, search_fn) → get(query) → hit or None → on miss: search_fn(query, n) → put(query, results, ttl)

cached_search() is the high-level interface used in agent.py. It wraps any callable search function — the DDGS call, the headed browser search, or a test stub — making the cache transparent to the caller. The underlying get() function checks expiry and deletes the row if expired before returning None, so expired entries don't linger indefinitely even without a background cleanup job.
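The flow above can be sketched as follows. This is not the module's actual code: the three-column schema, the explicit conn argument, the open_cache helper, the default n=5, and the now parameter (added so expiry can be exercised without waiting) are assumptions. The upsert syntax requires SQLite 3.24 or newer.

```python
import hashlib
import json
import sqlite3
import time

DEFAULT_TTL = 24 * 3600  # 24 hours, as stated in the article

def _cache_key(query):
    return hashlib.sha256(" ".join(query.lower().split()).encode()).hexdigest()

def open_cache(path=":memory:"):
    # Hypothetical schema: only the key, a JSON value and expires_at are confirmed.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_cache ("
        "key TEXT PRIMARY KEY, results TEXT NOT NULL, expires_at REAL NOT NULL)"
    )
    return conn

def get(conn, query, now=None):
    now = time.time() if now is None else now
    key = _cache_key(query)
    row = conn.execute(
        "SELECT results, expires_at FROM search_cache WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        return None
    results, expires_at = row
    if expires_at < now:
        # Expired: drop the row on read and report a miss.
        conn.execute("DELETE FROM search_cache WHERE key = ?", (key,))
        conn.commit()
        return None
    return json.loads(results)

def put(conn, query, results, ttl=DEFAULT_TTL, now=None):
    now = time.time() if now is None else now
    conn.execute(
        "INSERT INTO search_cache (key, results, expires_at) VALUES (?, ?, ?) "
        "ON CONFLICT(key) DO UPDATE SET results = excluded.results, "
        "expires_at = excluded.expires_at",
        (_cache_key(query), json.dumps(results), now + ttl),
    )
    conn.commit()

def cached_search(conn, query, search_fn, n=5, ttl=DEFAULT_TTL):
    hit = get(conn, query)
    if hit is not None:
        return hit
    results = search_fn(query, n)  # any callable: DDGS, browser search, test stub
    put(conn, query, results, ttl)
    return results
```

Because search_fn is just a callable taking (query, n), a test can pass a stub that records its calls and confirm that a second, differently spaced query never reaches it.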

Lazy eviction

put() runs a single DELETE after every write:

# Upsert the new entry
conn.execute("INSERT INTO search_cache ... ON CONFLICT DO UPDATE ...", ...)

# Lazy eviction — cheap because expires_at is indexed
conn.execute("DELETE FROM search_cache WHERE expires_at < ?", (now,))

There's no background thread, no scheduled job, and no separate maintenance step. Every write sweeps expired entries. The expires_at index lets SQLite locate expired rows in O(log n) instead of scanning the table, so the sweep costs little beyond the rows it actually deletes and is fast enough to run on every write. The trade-off is that expired rows are physically removed only when a write triggers the sweep or when get() reads that specific key; a dormant cache accumulates expired rows on disk, although get() never serves them. For a research harness with constant write activity during experiments, this is not a problem in practice.
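A small self-contained demonstration of the write-time sweep, using an in-memory database and explicit timestamps so the behaviour is deterministic. The column names, the index name, and the put signature here are assumptions; only the upsert-then-delete pattern comes from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_cache (key TEXT PRIMARY KEY, results TEXT, expires_at REAL)"
)
# Index on expires_at so the sweep does not need a full table scan.
conn.execute("CREATE INDEX idx_search_cache_expires ON search_cache (expires_at)")

def put(conn, key, results, now, ttl=86400):
    conn.execute(
        "INSERT INTO search_cache (key, results, expires_at) VALUES (?, ?, ?) "
        "ON CONFLICT(key) DO UPDATE SET results = excluded.results, "
        "expires_at = excluded.expires_at",
        (key, results, now + ttl),
    )
    # Lazy eviction: every write also removes rows whose TTL has passed.
    conn.execute("DELETE FROM search_cache WHERE expires_at < ?", (now,))
    conn.commit()

def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM search_cache").fetchone()[0]

put(conn, "a", "[]", now=0)       # expires at t=86400
put(conn, "b", "[]", now=10)      # expires at t=86410
count_before = row_count(conn)    # both rows are still live
put(conn, "c", "[]", now=86405)   # this write sweeps "a" but not "b"
remaining = [r[0] for r in conn.execute("SELECT key FROM search_cache ORDER BY key")]
```

After the third write, remaining holds only "b" and "c": the row for "a" expired five seconds earlier and was removed as a side effect of storing "c".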

Schema migration

The database was added early in the harness's development, before created_at, expires_at, search_rounds, and novelty_scores columns existed. Rather than requiring users to delete the database and lose cached results, _migrate() adds any missing columns on connection:

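def _migrate(conn, table, col_defs):
    # table and col_defs are interpolated into SQL, so they must be trusted constants
    # PRAGMA table_info yields one row per column; index 1 is the column name
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for col_def in col_defs:
        col_name = col_def.split()[0]  # first token of the definition is the name
        if col_name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {col_def}")
    conn.commit()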

Migration runs on every connection open but is fast — PRAGMA table_info is a metadata query, not a table scan. The pattern handles all historical database states: fresh installs get all columns from the CREATE TABLE, existing databases with missing columns get them added silently.
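To see the migration handle an older database, the sketch below creates a deliberately minimal legacy table and runs _migrate over it twice. The legacy layout, the column types, and the assignment of all four columns to research_cache are assumptions made for the example; only the column names come from the article.

```python
import sqlite3

def _migrate(conn, table, col_defs):
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for col_def in col_defs:
        col_name = col_def.split()[0]
        if col_name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {col_def}")
    conn.commit()

conn = sqlite3.connect(":memory:")
# Hypothetical "early" schema that predates the newer columns.
conn.execute("CREATE TABLE research_cache (key TEXT PRIMARY KEY, context TEXT)")

col_defs = [
    "created_at REAL",
    "expires_at REAL",
    "search_rounds INTEGER DEFAULT 0",
    "novelty_scores TEXT",
]

_migrate(conn, "research_cache", col_defs)
_migrate(conn, "research_cache", col_defs)  # second run finds nothing to add

columns = [row[1] for row in conn.execute("PRAGMA table_info(research_cache)")]
```

The second call is a no-op because every column name is already present, which is what makes it safe to run on every connection open.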

CLI maintenance

# Print stats: entry count, oldest, newest, expired count (both tables)
python search_cache.py

# Delete all entries
python search_cache.py --clear

# Delete only expired entries (manual eviction)
python search_cache.py --expired

The research cache (RESEARCH_CACHE=1) makes autoresearch experiments faster, but it also means experiments that change the search queries (e.g. planner prompt changes) reuse stale research contexts from before the change. Set RESEARCH_CACHE=0 or run python search_cache.py --clear when evaluating changes that affect the gather step rather than just synthesis.
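For readers who want to reproduce the CLI behaviour in their own tooling, here is a rough sketch of how the two flags could be wired up with argparse. Only the flag names and the "both tables" scope come from the commands above; build_parser, run, the mutually exclusive grouping, and the reduced stats (entry and expired counts only) are assumptions.

```python
import argparse
import sqlite3
import time

TABLES = ("search_cache", "research_cache")

def build_parser():
    parser = argparse.ArgumentParser(description="Inspect or prune the search cache")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--clear", action="store_true", help="delete all entries")
    group.add_argument("--expired", action="store_true", help="delete only expired entries")
    return parser

def run(conn, args, now=None):
    """Apply the requested deletion, then return per-table counts."""
    now = time.time() if now is None else now
    report = {}
    for table in TABLES:  # fixed constants, safe to interpolate
        if args.clear:
            conn.execute(f"DELETE FROM {table}")
        elif args.expired:
            conn.execute(f"DELETE FROM {table} WHERE expires_at < ?", (now,))
        total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        expired = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE expires_at < ?", (now,)
        ).fetchone()[0]
        report[table] = {"entries": total, "expired": expired}
    conn.commit()
    return report
```

Called with no flags, run only reports; with --expired it performs the same sweep that put() does on every write, which is useful after a long idle period.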

ChromaDB semantic layer

On top of the SQLite exact-key cache, search_cache.py maintains two ChromaDB collections that enable semantic fallback and the Research History view:

search_cache_vec — when an exact SHA-256 key miss occurs, the module queries ChromaDB for the nearest cached query by cosine distance. If a sufficiently similar query exists (distance < SEMANTIC_CACHE_THRESHOLD = 0.15), its results are returned rather than hitting DDGS again. Equivalent queries worded differently — "RAG best practices" vs "retrieval-augmented generation tips" — resolve to the same cached result.
research_cache_vec — full research contexts are embedded and stored alongside their SQLite entry. This collection powers GET /api/research-history, which semantic-searches across all past research contexts by query string, returning ranked results with similarity scores.

Both collections use the harness embedding model (falling back to all-MiniLM-L6-v2 if the inference shim is unavailable). The ChromaDB client is a lazy singleton — it's only initialized on first use, so the SQLite-only code path remains unchanged when ChromaDB is not installed.

The semantic cache threshold of 0.15 cosine distance is conservative by design. At this distance, queries are near-paraphrases of each other. Looser thresholds risk serving results for a semantically adjacent but meaningfully different query — better to re-fetch than to silently return off-topic results.
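The threshold logic itself is simple enough to sketch without ChromaDB. The version below uses plain Python lists as stand-in embeddings and a linear scan in place of a vector index; cosine_distance and semantic_lookup are hypothetical names, and real embeddings would come from the harness model instead of hand-written toy vectors.

```python
import math

SEMANTIC_CACHE_THRESHOLD = 0.15  # value quoted in the article

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes both vectors are non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def semantic_lookup(query_vec, cached, threshold=SEMANTIC_CACHE_THRESHOLD):
    """Return the results of the nearest cached query, or None.

    cached is a list of (embedding, results) pairs. A candidate only counts
    if its distance is strictly below the threshold.
    """
    best, best_dist = None, threshold
    for vec, results in cached:
        dist = cosine_distance(query_vec, vec)
        if dist < best_dist:
            best, best_dist = results, dist
    return best

# Toy two-dimensional "embeddings" for two cached queries.
cached = [
    ([1.0, 0.0], ["rag results"]),
    ([0.0, 1.0], ["other results"]),
]

near_hit = semantic_lookup([0.9, 0.1], cached)   # distance to first entry is about 0.006
far_miss = semantic_lookup([0.6, 0.8], cached)   # distances are 0.4 and 0.2, both too large
```

The second query is closer to one cached entry than the other but still falls outside the threshold, so the lookup returns None and the caller would re-fetch, which is the conservative behaviour described above.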
