May 18, 2026 • 15 min read • Agentic Harness Engineering

The Alt-Data Pipeline: From Beige Book to Paper Trading Thesis

Five new enrichment layers — BEA economic accounts, cross-sectional market signals, yfinance fundamentals, Alpaca portfolio context, and a structured thesis synthesis template — extend the harness from economic data mining into a human-reviewed paper trading pipeline. The report comes before any trade gets greenlit.

The FRED RAG experiment established that live quantitative series data produces a +0.40 composite score lift over web-search-only control — three times the effect of Beige Book prose. The mechanism is straightforward: numeric observations are harder to hallucinate around than qualitative summaries, and the evaluator's specificity and grounded dimensions reward the difference.

But FRED is a macro-level instrument. It tells you that GDP contracted at −0.6% in Q1 2025 and that fixed investment accelerated to +6.4% in Q1 2026. It does not tell you which companies are benefiting from that capex surge, what their momentum looks like relative to peers, whether their valuation leaves room for the thesis to pay off, or whether you already own them. For that, you need more layers.

This post documents the five layers added to close that gap: BEA economic accounts, market signals computed from the trending tickers database, yfinance fundamentals, Alpaca portfolio state, and a structured synthesis instruction that forces the model to output machine-readable theses rather than free-form prose.

The Enrichment Stack

The full pipeline, in the order layers fire during gather_research():

research layer:   [web search context] + [Beige Book narrative]
economic APIs:    + [FRED series: rates, employment, inflation, housing] + [BEA accounts: GDP components, state income, trade balance]
market data:      + [Market Signals: momentum rank, Hurst H, Bollinger z, cointegration] + [yfinance: P/E, revenue growth, analyst targets, upcoming earnings]
portfolio state:  + [Alpaca: current positions, buying power, pending orders]
synthesis:        → LLM synthesis → Wiggum eval → Report (.md) → [human review] → optional Alpaca order execution

Each layer is intent-gated: FRED fires only when _themes_for_query() detects economic keywords; BEA fires on similar themes; market signals and yfinance fire when equity intent is detected; Alpaca fires on trading intent. A pure OSINT task gets none of them. A trading thesis task gets all five.
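The gating logic can be sketched as a keyword lookup. This is a minimal illustration under stated assumptions, not the harness's actual code: the keyword sets and the `layers_for_query` helper are hypothetical, and the real `_themes_for_query()` and `_is_trading_task()` may work differently. The sketch also assumes that trading intent implies economic and equity intent, which is one way a trading thesis task could end up with all five layers.

```python
# Hypothetical keyword sets; the harness's real lists are not shown in this post.
ECON_KEYWORDS = {"gdp", "inflation", "employment", "rates", "housing"}
EQUITY_KEYWORDS = {"stock", "equity", "ticker", "earnings", "valuation"}
TRADING_KEYWORDS = {"trading thesis", "paper trade", "trade setup", "alpaca", "long thesis"}


def layers_for_query(query: str) -> list[str]:
    """Return the enrichment layers that would fire for a query (illustrative only)."""
    q = query.lower()
    trading = any(k in q for k in TRADING_KEYWORDS)
    # Assumption: trading intent switches on the economic and equity layers too.
    econ = trading or any(k in q for k in ECON_KEYWORDS)
    equity = trading or any(k in q for k in EQUITY_KEYWORDS)

    layers = []
    if econ:
        layers += ["FRED", "BEA"]
    if equity:
        layers += ["market_signals", "yfinance"]
    if trading:
        layers.append("alpaca")
    return layers
```

A query such as "Map the infrastructure behind a phishing domain" matches no keyword set and returns an empty list, while "Write a trading thesis on semiconductors" returns all five layers.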

Layer 1: BEA Economic Accounts

The Bureau of Economic Analysis publishes GDP components, state-level income, price parities, and international trade data at a structural level that FRED does not cover. Where FRED gives you the real GDP growth rate, BEA gives you the decomposition: which industries drove the growth, which states led or lagged, and how the current account balance evolved across goods and services.

The bea_tool.py catalogue covers four BEA datasets across 17 curated queries:

| Dataset | Coverage | Catalogue entries |
| --- | --- | --- |
| NIPA | National income and product accounts: GDP components, personal income | 2 |
| Regional | Real GDP, personal income, price parities by state and county | 9 |
| GDPbyIndustry | Value-added decomposition across 22 industry groups | 2 |
| ITA | International transactions: goods balance, services balance, total | 3 |

One implementation detail worth noting: BEA's API does not support the LAST{N} year shorthand that FRED uses. Passing Year=LAST5 returns error 201 (“No data exists for the Year/Frequencies passed”). The fix is a one-line helper that computes specific year strings dynamically:

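from datetime import date

def _yr(n: int) -> str:
    # Comma-separated list of the last n calendar years, ending with the current one.
    cur = date.today().year
    return ",".join(str(y) for y in range(cur - n + 1, cur + 1))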

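Every catalogue entry now uses _yr(5), _yr(3), or _yr(2) instead of string literals. Because the helper is evaluated at module load time, the catalogue reflects the calendar year in which the process started, with no manual updates. A process that runs across a year boundary needs a restart to pick up the new year.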

The citation format for BEA carries enough context to trace any claim back to its source table:

[BEA:NIPA:T10101:US:2026Q1]
[BEA:Regional:SAGDP9N:STATE:2024]
[BEA:GDPbyIndustry:1:ALL:2025]
[BEA:ITA:BalGdsSrvs:AllCountries:2025Q4]

BEA key activation. Unlike FRED, BEA API keys require email activation before first use. The activation email link sometimes takes several hours to propagate on BEA's backend even after clicking. The correct diagnostic is a direct GetDataSetList call: if it returns error code 4, the key is not yet active. If it returns a dataset list, the key is live and you can proceed.
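The diagnostic can be automated by inspecting the decoded JSON from a GetDataSetList call. The sketch below is offline and makes assumptions: it guesses that BEA reports errors in a field named `APIErrorCode`, and because the exact nesting of that field is not confirmed here, it searches the whole payload rather than a fixed path. `find_error_code` and `key_status` are hypothetical helpers.

```python
def find_error_code(payload):
    """Depth-first search for an 'APIErrorCode' field anywhere in a decoded JSON payload."""
    if isinstance(payload, dict):
        if "APIErrorCode" in payload:
            return str(payload["APIErrorCode"])
        children = payload.values()
    elif isinstance(payload, list):
        children = payload
    else:
        return None
    for child in children:
        code = find_error_code(child)
        if code is not None:
            return code
    return None


def key_status(payload: dict) -> str:
    """Classify a GetDataSetList response: no error code is treated as an active key."""
    code = find_error_code(payload)
    if code is None:
        return "active"
    if code == "4":
        return "not yet active"
    return f"other error ({code})"
```

Searching recursively trades a little precision for robustness: the check keeps working even if the error object sits at a different level than expected.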

Layer 2: Market Signals

The trending tickers database (trending-tickers-unified.db) is populated by trending_tickers_skill.py, which runs on a schedule and stores daily OHLCV data for a momentum-screened universe. The market_signals_skill.py layer reads that database and computes five signal families per ticker:

Cross-sectional momentum rank

Percentile rank of N-day returns across the full universe. An adaptive window selector checks 175d, 120d, 90d, 60d, and 30d in sequence and picks the longest window where at least 50% of tickers have sufficient history. This prevents the rank from being computed over inconsistent lookback periods when the database is young.
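A minimal sketch of the adaptive window and the percentile rank, using only the standard library. It assumes that "sufficient history" means at least one more close than the window length and that the windows count trading bars; `pick_window` and `momentum_ranks` are hypothetical names, not the functions in market_signals_skill.py.

```python
WINDOWS = (175, 120, 90, 60, 30)  # candidate lookbacks, longest first


def pick_window(closes: dict[str, list[float]], min_coverage: float = 0.5):
    """Longest window for which at least min_coverage of tickers have enough history."""
    for w in WINDOWS:
        covered = sum(1 for px in closes.values() if len(px) > w)
        if closes and covered / len(closes) >= min_coverage:
            return w
    return None


def momentum_ranks(closes: dict[str, list[float]]) -> dict[str, float]:
    """Percentile rank (0-100) of each ticker's return over the selected window."""
    w = pick_window(closes)
    if w is None:
        return {}
    # Tickers without enough history for the chosen window are left unranked.
    rets = {t: px[-1] / px[-1 - w] - 1 for t, px in closes.items() if len(px) > w}
    ordered = sorted(rets.values())
    n = len(ordered)
    # Percentile = share of the ranked universe with a return at or below this one.
    return {t: 100 * sum(1 for x in ordered if x <= r) / n for t, r in rets.items()}
```

Because every ranked ticker uses the same window, the ranks are comparable across the universe even when some tickers have much longer histories than others.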

Hurst exponent (R/S method)

Estimated from log-return R/S analysis across lags [4, 8, 16, 32, 64, 128]. H is the slope of a linear fit of log(R/S) against log(lag), which requires at least three usable lags; tickers with fewer than 64 price observations return None. H < 0.50 indicates mean-reverting behavior; H > 0.65 indicates a persistent trending regime.
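The estimate can be sketched in plain Python as follows. This is one common form of rescaled-range analysis, not necessarily the exact variant the harness uses: it assumes non-overlapping chunks per lag and averages R/S across chunks. The `hurst_rs` name is hypothetical.

```python
import math
from typing import Optional

LAGS = (4, 8, 16, 32, 64, 128)


def _rescaled_range(chunk: list[float]) -> Optional[float]:
    """R/S of one chunk: range of cumulative mean-adjusted sums over the chunk's std dev."""
    mean = sum(chunk) / len(chunk)
    dev = [x - mean for x in chunk]
    std = math.sqrt(sum(d * d for d in dev) / len(chunk))
    if std == 0:
        return None
    cum, running = [], 0.0
    for d in dev:
        running += d
        cum.append(running)
    return (max(cum) - min(cum)) / std


def hurst_rs(prices: list[float]) -> Optional[float]:
    """Hurst exponent from R/S analysis of log returns; None if history is too short."""
    if len(prices) < 64:
        return None
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    xs, ys = [], []
    for lag in LAGS:
        # Non-overlapping chunks of length `lag`; lags longer than the series yield none.
        chunks = [rets[i:i + lag] for i in range(0, len(rets) - lag + 1, lag)]
        rs = [v for v in map(_rescaled_range, chunks) if v is not None and v > 0]
        if rs:
            xs.append(math.log(lag))
            ys.append(math.log(sum(rs) / len(rs)))
    if len(xs) < 3:
        return None
    # Least-squares slope of log(R/S) on log(lag) is the Hurst estimate.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
```

R/S estimates are known to be biased upward at short lags, so a pure random walk tends to score somewhat above 0.5. That is worth keeping in mind when reading the 0.50 and 0.65 thresholds.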

Bollinger band z-score

z = (last_price − SMA₂₀) / σ₂₀. Values above +1.5 flag overbought conditions; below −1.5 flag oversold. Combined with the Hurst regime, this distinguishes between a trending breakout (high H, high z) and a mean-reversion setup (low H, high z).
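A short sketch of the z-score and the regime pairing described above. Two assumptions are baked in: the standard deviation is the population form (the sample form is the other common choice), and the labels for negative z and for the 0.50 to 0.65 Hurst band are illustrative, since the post only names the high-z cases. `bollinger_z` and `classify_setup` are hypothetical helpers.

```python
from statistics import mean, pstdev
from typing import Optional


def bollinger_z(closes: list[float], window: int = 20) -> Optional[float]:
    """z = (last close - SMA) / sigma over the trailing window."""
    if len(closes) < window:
        return None
    recent = closes[-window:]
    sigma = pstdev(recent)  # population std dev; an assumption, see lead-in
    if sigma == 0:
        return None  # flat window: z is undefined
    return (closes[-1] - mean(recent)) / sigma


def classify_setup(hurst: float, z: float) -> str:
    """Combine the Hurst regime with the Bollinger z-score into a coarse label."""
    if abs(z) <= 1.5:
        return "neutral"
    if hurst > 0.65:
        return "trending breakout" if z > 0 else "trending breakdown"
    if hurst < 0.50:
        return "mean-reversion setup"
    return "stretched, regime unclear"
```

The same stretched price therefore reads two ways: with H = 0.7 a z of +2 is a breakout to follow, while with H = 0.4 it is a candidate to fade.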

ADF stationarity + Johansen cointegration (pairs)

For each pair in the universe, the Johansen test identifies cointegrated pairs at the 95% confidence level. Cointegrated pairs feed an Ornstein–Uhlenbeck half-life estimate: Δspreadₜ = β × spreadₜ₋₁; half_life = −log(2) / log(1 + β). Short half-lives (under 20 days) identify viable mean-reversion candidates.
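The half-life step can be sketched as a regression through the origin, matching the intercept-free form of the equation above. The sketch assumes the spread has already been demeaned and that a valid mean-reverting fit has −1 < β < 0; `ou_half_life` is a hypothetical name, and the Johansen test itself is not reproduced here.

```python
import math
from typing import Optional


def ou_half_life(spread: list[float]) -> Optional[float]:
    """Half-life in bars from regressing the spread's change on its lagged level."""
    lagged = spread[:-1]
    delta = [b - a for a, b in zip(spread, spread[1:])]
    denom = sum(x * x for x in lagged)
    if denom == 0:
        return None
    beta = sum(x * d for x, d in zip(lagged, delta)) / denom  # OLS through the origin
    if not -1 < beta < 0:
        return None  # no mean reversion (beta >= 0) or log undefined (beta <= -1)
    return -math.log(2) / math.log(1 + beta)
```

As a worked check, a spread that decays by 10% per bar has β = −0.1, giving a half-life of −log(2) / log(0.9), or roughly 6.6 bars, comfortably under the 20-day cutoff.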

SMA-20/50 cross signal

Classifies each ticker as golden (SMA20 crossing above SMA50), death (crossing below), above, or below. Used as a secondary confirmation signal rather than a primary entry trigger.
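The four states can be sketched by comparing the SMA spread on the latest bar with the spread one bar earlier. Treating a "cross" as a sign change between those two bars is an assumption; the harness may use a longer confirmation window. `sma_cross_state` is a hypothetical name.

```python
from typing import Optional


def sma(values: list[float], window: int) -> float:
    """Simple moving average of the last `window` values."""
    return sum(values[-window:]) / window


def sma_cross_state(closes: list[float], fast: int = 20, slow: int = 50) -> Optional[str]:
    """Return 'golden', 'death', 'above', or 'below' for the fast/slow SMA pair."""
    if len(closes) < slow + 1:
        return None  # need one extra bar to compare against the previous state
    prev_diff = sma(closes[:-1], fast) - sma(closes[:-1], slow)
    curr_diff = sma(closes, fast) - sma(closes, slow)
    if prev_diff <= 0 < curr_diff:
        return "golden"  # fast average just moved above the slow one
    if prev_diff >= 0 > curr_diff:
        return "death"   # fast average just moved below the slow one
    return "above" if curr_diff > 0 else "below"
```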

All signals carry citation IDs that survive into the synthesis context and can be scored by the Wiggum evaluator for groundedness:

[SIGNAL:momentum_rank:NVDA:2026-05-28]
[SIGNAL:hurst:ADSK:2026-05-28]
[SIGNAL:bb_zscore:TSLA:2026-05-28]
[SIGNAL:johansen:GOOG-META:2026-05-28]

Layer 3: yfinance Fundamentals

Market signals tell you what price is doing. yfinance tells you what the business is doing and what the market thinks it's worth. The yfinance_tool.py layer pulls a snapshot for each ticker detected in the query or drawn from the trending tickers database (up to 10 by default):

| Field group | Fields fetched |
| --- | --- |
| Valuation | P/E (trailing + forward), EPS, market cap, beta |
| Financials | Revenue, revenue growth YoY, gross margin, operating margin |
| Analyst consensus | Rating (1–5 scale), # analysts, price target range and mean, implied upside |
| Positioning | Short ratio (days to cover) |
| Earnings calendar | Next earnings date, last reported EPS vs. estimate |
| Recent news | Top 3 headlines from yf.Ticker.news |

Ticker extraction works in two modes. If the query contains explicit uppercase 1–5 character sequences (filtered against a stopword list), those tickers are fetched directly. If no explicit tickers are found, the tool falls back to the trending database, pulling the tickers with the most recent and densest OHLCV coverage.
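The two modes can be sketched with a regular expression and a fallback list. The stopword set here is a hypothetical stand-in, since the harness's actual list is not shown, and the `fallback` argument stands in for whatever the trending database returns.

```python
import re

# Hypothetical stopword list; the harness's actual list is not shown in this post.
STOPWORDS = {"A", "I", "AI", "GDP", "CPI", "FED", "USA", "US", "ETF", "CEO", "IPO",
             "BEA", "FRED", "LONG", "SHORT"}
TICKER_RE = re.compile(r"\b[A-Z]{1,5}\b")  # standalone runs of 1-5 uppercase letters


def extract_tickers(query: str, fallback: list[str], limit: int = 10) -> list[str]:
    """Explicit tickers from the query, else the fallback list, capped at `limit`."""
    seen: list[str] = []
    for match in TICKER_RE.findall(query):
        if match not in STOPWORDS and match not in seen:
            seen.append(match)
    return (seen or fallback)[:limit]
```

For the query "Compare NVDA and AMD on AI capex" this returns NVDA and AMD: "AI" is filtered as a stopword, and "Compare" never matches because the pattern requires the whole word to be uppercase.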

A live snapshot from the day this post was written, for illustration:

### [YF:NVDA:snapshot:2026-05-28] NVIDIA Corporation (NVDA)
Sector/Industry: Technology / Semiconductors
Price: $214.25  52w: $132.92 – $236.54
Valuation: P/E=32.81 (fwd 16.93)  EPS=6.53  MCap=$5.19T  β=2.24
Financials: Revenue=$253.5B (+85.2% YoY)  GrossMargin=+74.1%  OpMargin=+65.6%
Analysts (58): Strong Buy  Target=$180–$500 mean=$295.69  (+38.0% upside)
Short ratio: 1.92 days to cover

The 58-analyst strong-buy consensus, +38% implied upside from mean target, and 85% revenue growth are exactly the kind of figures that move a thesis from vague macro correlation (“AI capex is accelerating”) to specific claim (“NVDA at roughly 33x trailing earnings with 38% upside to consensus target, supported by +85% revenue growth and a 74% gross margin”). The citation [YF:NVDA:snapshot:2026-05-28] makes that claim attributable and Wiggum-scorable.
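The derived figures in the snapshot can be reproduced from its raw fields, which is a useful sanity check before a thesis leans on them. The inputs below are copied from the NVDA snapshot above; the variable names are illustrative.

```python
# Raw fields from the NVDA snapshot above.
price, eps, mean_target = 214.25, 6.53, 295.69

trailing_pe = price / eps                  # price divided by trailing EPS
implied_upside = mean_target / price - 1   # distance from price to the mean analyst target

print(f"trailing P/E   = {trailing_pe:.2f}")      # 32.81
print(f"implied upside = {implied_upside:+.1%}")  # +38.0%
```

Both values match what the snapshot reports, so the P/E and upside figures are internally consistent with the quoted price, EPS, and mean target.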

Layer 4: Alpaca Portfolio Context

Any trading system that ignores current positions will eventually recommend buying what you already own or selling what you don't have. The Alpaca layer solves this by injecting a live portfolio snapshot for any query with trading intent:

## Portfolio State (Alpaca PAPER)

Account
- Portfolio value: $116,771.77
- Cash: $41,533.83
- Buying power: $339,530.04
- Day trades today: 0

Open Positions (3)
- [PORTFOLIO:TSLA] TSLA long 106sh entry=$343.72 current=$442.10
  MV=$46,863 P&L=+$10,424 (+27.9%)
- [PORTFOLIO:GOOG] GOOG long 45sh entry=$198.25 current=$386.12
  MV=$17,376 P&L=+$8,453 (+93.9%)
- [PORTFOLIO:NVDA] NVDA long 53sh entry=$179.60 current=$214.25
  MV=$11,355 P&L=+$1,836 (+19.2%)

The model now knows that $41k in cash is available, that NVDA is already held at a 19% gain, and that adding to it would increase concentration in semiconductors. This directly influences how the synthesis weights new thesis ideas against existing exposure. The [PORTFOLIO:{ticker}] citation format is included in the structured thesis template so the model can explicitly reference existing holdings in its risk factor sections.

The Structured Thesis Template

The fifth layer is not a data source but a synthesis instruction. When _is_trading_task() detects trading-intent keywords in the query — phrases like trading thesis, paper trade, trade setup, alpaca, or long thesis — the synthesis instruction switches from the default research template to SYNTH_INSTRUCTION_TRADING:

SYNTH_INSTRUCTION_TRADING = """
Output a trading thesis report in markdown starting with # (no preamble).

Structure each thesis exactly as follows:

## Thesis N: [Direction] [TICKER] — [One-line rationale]
**Direction**: Long | Short
**Ticker**: SYMBOL
**Conviction**: High | Medium | Low
[THESIS:{direction}:{ticker}:{rationale-slug}:{date}]

**Macro context**: cite [FRED:...], [BEA:...], or Beige Book signals
**Signal support**: cite [SIGNAL:momentum_rank:...], [SIGNAL:hurst:...]
**Fundamentals**: cite [YF:{ticker}:snapshot:{date}]
**Risk factors**: 2-3 specific bearish counters

**Entry**: price level or range
**Target**: price target with basis
**Stop**: stop-loss level with basis
**Time horizon**: weeks | 1-3 months | 3-6 months | 6-12 months
**Suggested position size**: % of paper portfolio
"""

The template forces three things the default instruction does not. First, every claim must cite a specific data source from the enrichment context — a FRED series ID, a BEA table, a signal citation, or a yfinance snapshot. The evaluator's groundedness dimension scores against this. Second, risk factors are mandatory. A thesis without bearish counters is not a thesis, it's a wish. Third, the [THESIS:...] citation is machine-readable: direction, ticker, rationale slug, and date are all parseable fields that could drive downstream execution.
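To illustrate what "machine-readable" buys you, here is a sketch that pulls the four fields out of a report. The template does not pin down the exact field formats, so the pattern assumes a long/short direction, a 1-5 letter ticker, a hyphenated slug, and an ISO date. `ThesisTag` and `parse_thesis_tags` are hypothetical names.

```python
import re
from typing import NamedTuple


class ThesisTag(NamedTuple):
    direction: str
    ticker: str
    slug: str
    date: str


# Assumed field formats: long|short, 1-5 letter ticker, hyphenated slug, YYYY-MM-DD.
THESIS_RE = re.compile(
    r"\[THESIS:(long|short):([A-Z]{1,5}):([a-z0-9-]+):(\d{4}-\d{2}-\d{2})\]",
    re.IGNORECASE,
)


def parse_thesis_tags(report: str) -> list[ThesisTag]:
    """Extract every well-formed [THESIS:...] citation from a report."""
    return [
        ThesisTag(d.lower(), t.upper(), s.lower(), dt)
        for d, t, s, dt in THESIS_RE.findall(report)
    ]
```

A tag such as `[THESIS:Long:NVDA:ai-capex-surge:2026-05-28]` (an invented example) parses into its four fields, while an unfilled placeholder like `[THESIS:{direction}:...]` is ignored, which makes it easy to detect a report where the model echoed the template instead of completing it.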

The citation taxonomy. By the time synthesis completes, six distinct citation namespaces are in play. Five arrive with the enrichment context ([FRED:...], [BEA:...], [SIGNAL:...], [YF:...], [PORTFOLIO:...]) and the sixth, [THESIS:...], is emitted by the synthesis itself. The Wiggum evaluator already scores groundedness by checking whether the synthesis output contains verifiable claims tied to specific sources. Adding new citation types extends the surface area it can evaluate against without changing the evaluation rubric.

Human in the Loop

The Alpaca integration is deliberately split into two phases. The enrichment layer — injecting current portfolio state — fires automatically whenever trading intent is detected. The execution layer — placing orders — does not.

automatic (fires on trading intent):
    gather_research() → enrich with portfolio context → synthesize → Wiggum eval → Report (.md)
manual (explicit command only):
    conda run -n ollama-pi python -m harness.alpaca_tool buy NVDA 10

The report is the gate. A thesis that looks compelling in the context window may not survive contact with the actual numbers — position sizing that implies 80% concentration in one sector, a stop-loss that's already been breached, or a target that's below the current price. The report surfaces these problems before any capital is committed, even paper capital.
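The three failure modes named above are mechanical enough to check in code before a human reads the report. This is a hypothetical sketch, not part of the harness: the `Thesis` fields, the `sanity_problems` helper, and the 80% default threshold (borrowed from the example in the text) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Thesis:
    direction: str   # "long" or "short"
    ticker: str
    current: float   # latest price
    target: float
    stop: float
    size_pct: float  # suggested position size, % of portfolio


def sanity_problems(t: Thesis, sector_pct_before: float = 0.0,
                    max_sector_pct: float = 80.0) -> list[str]:
    """List the obvious inconsistencies in a thesis; an empty list means none found."""
    problems = []
    sign = 1 if t.direction == "long" else -1
    # A long needs target above current; a short needs it below.
    if sign * (t.target - t.current) <= 0:
        problems.append("target is on the wrong side of the current price")
    # A long's stop must sit below current; a short's above.
    if sign * (t.current - t.stop) <= 0:
        problems.append("stop-loss has already been breached")
    if sector_pct_before + t.size_pct >= max_sector_pct:
        problems.append("position size implies excessive sector concentration")
    return problems
```

A check like this does not replace the human review; it only ensures the reviewer's attention goes to judgment calls rather than arithmetic slips.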

The Alpaca CLI supports market orders, limit orders, dollar-notional sizing, and good-till-cancelled flags:

# review the report first, then execute selectively
python -m harness.alpaca_tool positions          # check current state
python -m harness.alpaca_tool buy NVDA 10        # market order, 10 shares
python -m harness.alpaca_tool buy NVDA --notional 2000 --limit 212.00 --gtc
python -m harness.alpaca_tool cancel-all         # pull all pending orders

The Pipeline DAG

The full enrichment graph is visualized in the dashboard's Pipeline view, added alongside the other nav items. Each node shows its data category; edges show the direction of data flow. The synth → orders edge is dashed to indicate it is conditional on explicit human action rather than automatic.

The DAG has two structural properties worth naming. The first is that all enrichment layers are additive: each one appends to the context window, and any individual layer can be disabled without breaking the others (via environment flags like HARNESS_BEA_DISABLE=1). The second is that the synthesis step sees all layers simultaneously — it can cite a FRED rate, a BEA state GDP figure, a momentum rank, and a yfinance analyst target in the same thesis section, because they all live in the same context merge.
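The additive property can be sketched as a filter over independent context blocks. Only HARNESS_BEA_DISABLE is named in this post; the other flag names implied by the pattern below, the layer keys, and the `merge_context` helper are assumptions for illustration.

```python
import os

LAYERS = ("FRED", "BEA", "SIGNALS", "YF", "ALPACA")  # hypothetical layer keys


def layer_enabled(name: str, env=os.environ) -> bool:
    """A layer is on unless its HARNESS_<NAME>_DISABLE flag is set to '1'."""
    return env.get(f"HARNESS_{name}_DISABLE") != "1"


def merge_context(blocks: dict[str, str], env=os.environ) -> str:
    """Concatenate the enabled layers' blocks in a fixed order; missing layers are skipped."""
    parts = [blocks[name] for name in LAYERS if name in blocks and layer_enabled(name, env)]
    return "\n\n".join(parts)
```

Because each block is appended independently, disabling one layer removes its text from the merged context and nothing else, which is what makes per-layer ablation experiments cheap.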

This is a different architecture from a multi-agent pipeline where specialized agents hand off results in sequence. Here, all data collection happens in parallel during gather_research(), then merges into a single context block. The LLM sees the full picture in one shot rather than receiving pre-summarized outputs from upstream agents. Whether this approach or the agentic handoff approach produces better synthesis quality is an open empirical question; the RAG experiments to date have only tested single-pass synthesis.
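The parallel-then-merge shape can be sketched with a thread pool. This is an illustration of the pattern rather than the body of gather_research(): `gather_parallel` and the fetcher mapping are hypothetical, and the choice to replace a failed layer with a placeholder line is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def gather_parallel(query: str, fetchers: dict[str, Callable[[str], str]]) -> str:
    """Run every fetcher concurrently, then merge the results into one context block."""
    with ThreadPoolExecutor(max_workers=len(fetchers) or 1) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in fetchers.items()}
        blocks = []
        # Iterate in insertion order so the merged context is deterministic.
        for name, fut in futures.items():
            try:
                blocks.append(fut.result())
            except Exception as exc:  # one failed layer should not sink the others
                blocks.append(f"[{name} unavailable: {exc}]")
    return "\n\n".join(blocks)
```

The merged string is what a single synthesis call would see. No layer's output is summarized or filtered by another model before that point, which is the contrast with sequential agent handoffs.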

What's Next

The pipeline is now complete from macro narrative to paper trade. The natural next experiments are:

| Direction | What it tests |
| --- | --- |
| Run the thesis generator against live market conditions | Whether the model produces well-structured, grounded theses with all six citation types present |
| Wiggum scoring on thesis output | Whether the trading instruction template achieves higher groundedness scores than the default research template on finance tasks |
| Backtest the signal layer | Whether the Hurst H > 0.65 trending filter and cross-sectional momentum rank have predictive validity over 30- and 90-day horizons in the current universe |
| Broader alt-data sources | Prediction market data (Kalshi, Polymarket), Google Trends, SEC EDGAR filings, each following the same intent-gated enrichment pattern |

The thesis generation experiment is the most immediately actionable: it closes the design loop, produces a Wiggum-scorable artifact, and requires no new infrastructure. Running it against the same six tasks used in the FRED and Beige Book RAG experiments would produce a direct comparison of how much the four additional data layers (BEA, market signals, yfinance, Alpaca) move the composite score.