Phase 5 — eight strategies, all "no edge"
The autonomous-window phase (2026-05-15) attempted three strategy spikes on the 5-ticker × 3y window, then five more variants across the day. All eight ended in one of three categories.
The strategies attempted
| Spike | Strategy | Honest mean Sharpe | Verdict |
|---|---|---|---|
| S20 | refined-news (density × tone) | -3.90 ± 1.63 (21d), +2.55 ± 0.20 (180d) | scale-artifact post-L24 — collapsed to ~0 |
| S21 | quality-tilted low-volatility | +0.76 ± 4.96 (21d), +0.46 ± 1.86 (180d) | scale-invariance pathology of negative_sharpe |
| S26 | PEAD (text-confirmed earnings surprise) | -0.06 ± 0.38 (21d), +1.34 (180d post-L24) | window-position artifact; 21d = 0, 180d = +1.34 |
| S22 | pure-price momentum diagnostic | mixed sign, single-seed | shuffled_target FAIL — harness regime trade-off |
The three failure modes
Scale artifact. The strategy reported a strong Sharpe, but it came from feature-magnitude mismatch — some columns were 1000× larger than others, and the model latched onto magnitude rather than signal. S20 post-L24 was the canonical example: honest Sharpe collapsed from +2.55 to ~+1.0 with mean_pnl = 0 to 7 decimals once features were properly standardised.
Scale-invariance pathology. The strategy reported a non-zero Sharpe, but its actual P&L was zero to seven decimal places. Positions had collapsed to near-zero. negative_sharpe is scale-invariant, so the Sharpe was numerically defined but meaningless. S21 was the canonical example. Fixed structurally in Phase 6.E via position-aware losses (L32).
Window-position artifact. S26 looked great on a 180-day evaluation window (Sharpe +1.34) but produced near-zero Sharpe on a 21-day window over the same data (Sharpe +0.16). A real edge must be consistent across nested windows. This finding became the motivation for the window-stability test in Phase 6.B (L54 later confirmed the pattern recurs).
The harness regime trade-off
The whole leakage-test machinery had a regime problem no strategy escaped:
| Configuration | shuffled_target | look_ahead | future_news |
|---|---|---|---|
| Standardised features (S20/S21/S26 post-L24) | PASS | FAIL (false neg.) | FAIL (false neg.) |
| Raw features (S22) | FAIL (single-seed noise or real leak) | PASS | PASS |
No configuration produced all-4-PASS for any strategy. This re-framed the deliverable from "no measurable edge at this scale" to "leakage-test methodology needs Phase 6 recalibration". The fixes shipped in Phase 6.A-E.
What it meant
Three honest "no edge" verdicts, but the honest read was also that the harness itself had visible gaps that prevented confident scoring. The autonomous window's deliverable was the diagnosis, not a working strategy — which set up Phase 6 as the harness recalibration phase.