Variants Evaluated

We benchmarked the headline strategy against five alternative basket configurations and two architectural variants. The headline (the 4-instrument put-only basket with halts engaged plus the regime-stress ML overlay, excess Sharpe +0.371) had the highest excess Sharpe of every configuration tested and passes both robustness gates. Every alternative is reported on this page with full numbers.

Basket composition

Equal-weight $50,000 per instrument, put-only architecture, halt framework engaged. All variants run over the same out-of-sample (OOS) window, 2018-01-01 to 2024-12-31.

| Basket | n_inst | Sharpe (excess) | AnnRet | AnnVol | MaxDD | Δ vs SPX baseline (+0.286) |
|---|---|---|---|---|---|---|
| (A) SPX put-only alone | 1 | -0.347 | +2.04% | 0.84% | -0.78% | -0.633 |
| (B) 3-name basket (AAPL+MSFT+WMT) | 3 | +0.350 | +2.42% | 0.26% | -0.19% | +0.064 |
| (C) 3-ETF basket (SPX+TLT+GLD) | 3 | -0.312 | +2.22% | 0.35% | -0.31% | -0.598 |
| (D) the 4-instrument basket ← HEADLINE base | 4 | +0.359 | +2.41% | 0.23% | -0.13% | +0.073 |
| (D′) Headline + regime-stress ML overlay | 4 | +0.371 | +2.41% | 0.21% | -0.12% | +0.085 |
| (E) Full 6 (3-ETF basket + 3-name basket) | 6 | -0.036 | +2.32% | 0.26% | -0.17% | -0.322 |

Pattern: with the put-credit-spread structure, SPX and TLT each have negative excess Sharpe in the post-2020 environment. Including them (variants C and E) pulls the basket’s excess Sharpe below zero (-0.312 for C, -0.036 for E). The headline base (D) keeps the three positive single-stock names plus GLD (which is barely positive but adds cross-asset diversification) and excludes the drag instruments.
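For readers reproducing the table columns, the following is a minimal sketch of how the four statistics and the Δ column could be computed from a daily return series. The 252-day annualization, the geometric annualized return, the drawdown measured on a compounded equity curve, and all function names are assumptions for illustration; the text above does not state which conventions were used.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization factor


def performance_summary(daily_returns, daily_rf):
    """Annualized excess Sharpe, return, volatility, and max drawdown.

    Both inputs are simple daily returns expressed as decimals and must
    have the same length.
    """
    r = np.asarray(daily_returns, dtype=float)
    rf = np.asarray(daily_rf, dtype=float)
    excess = r - rf

    ann_ret = np.prod(1.0 + r) ** (TRADING_DAYS / len(r)) - 1.0
    ann_vol = r.std(ddof=1) * np.sqrt(TRADING_DAYS)
    sharpe = excess.mean() / excess.std(ddof=1) * np.sqrt(TRADING_DAYS)

    # Equity curve starts at 1.0 so a loss on the first day counts as drawdown.
    equity = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_peak = np.maximum.accumulate(equity)
    max_dd = float((equity / running_peak - 1.0).min())

    return {
        "sharpe_excess": float(sharpe),
        "ann_ret": float(ann_ret),
        "ann_vol": float(ann_vol),
        "max_dd": max_dd,
    }


def delta_vs_baseline(sharpe, baseline=0.286):
    """Sharpe difference against the SPX baseline quoted in the table header."""
    return round(sharpe - baseline, 3)


if __name__ == "__main__":
    # Reproduces the last column for rows (A) and (D).
    print(delta_vs_baseline(-0.347))
    print(delta_vs_baseline(0.359))
```

The Δ column is a plain difference, so `delta_vs_baseline` can be used to cross-check any row of the table against its reported Sharpe.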

Iron condor vs put-only

We evaluated put-credit-spreads against iron condors across the universe. Put-only outperformed iron condor in the 2018-2024 OOS window because the post-2020 trending-equity regime systematically destroyed short-call positions. We ship put-only.

| Mode | Universe | Sharpe (excess) | Δ vs SPX baseline (+0.286) | AnnRet | MaxDD |
|---|---|---|---|---|---|
| Iron condor, SPX only | 1 inst | -1.882 | -2.168 | +0.48% | -4.59% |
| Iron condor, 3-instrument cluster | SPX/TLT/GLD | -2.292 | -2.578 | +1.42% | -1.34% |
| Put-only, SPX | 1 inst | -0.347 | -0.633 | +2.04% | -0.78% |
| Put-only, 3-instrument cluster | SPX/TLT/GLD | -0.369 | -0.655 | +2.22% | -0.31% |
| Put-only, the 4-instrument basket | 4 inst | +0.359 | +0.073 | +2.41% | -0.13% |

Iron condor underperforms put-only by more than 1.5 Sharpe units in every paired comparison (1.535 on SPX alone, 1.923 on the SPX/TLT/GLD cluster). The call wing loses on underlyings that trended up post-2020, so adding it does not improve a put-only strategy in this regime. We report the iron condor as a tested extension that did not add value.
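The mechanism can be illustrated with expiry payoffs. The sketch below compares a put credit spread with an iron condor built on the same put wing; the strikes and credits are hypothetical round numbers chosen for illustration and are not taken from the strategy.

```python
def put_credit_spread_pnl(spot, long_put, short_put, credit):
    """Per-share P&L at expiry: short put at short_put, long put at long_put (< short_put)."""
    short_leg = -max(short_put - spot, 0.0)
    long_leg = max(long_put - spot, 0.0)
    return credit + short_leg + long_leg


def call_credit_spread_pnl(spot, short_call, long_call, credit):
    """Per-share P&L at expiry: short call at short_call, long call at long_call (> short_call)."""
    short_leg = -max(spot - short_call, 0.0)
    long_leg = max(spot - long_call, 0.0)
    return credit + short_leg + long_leg


def iron_condor_pnl(spot, long_put, short_put, short_call, long_call,
                    put_credit, call_credit):
    """An iron condor is the put credit spread plus a call credit spread."""
    return (put_credit_spread_pnl(spot, long_put, short_put, put_credit)
            + call_credit_spread_pnl(spot, short_call, long_call, call_credit))


if __name__ == "__main__":
    # Hypothetical structure: 90/95 put wing, 105/110 call wing, 1.00 credit per wing.
    for spot in (85.0, 100.0, 112.0):
        put_only = put_credit_spread_pnl(spot, 90.0, 95.0, 1.0)
        condor = iron_condor_pnl(spot, 90.0, 95.0, 105.0, 110.0, 1.0, 1.0)
        print(f"spot={spot:6.1f}  put-only={put_only:+.2f}  iron condor={condor:+.2f}")
```

With these illustrative numbers the condor earns an extra credit when the underlying finishes between the short strikes, but a rally through the call wing turns the put-only gain into a loss. That asymmetry is what a persistently rising underlying penalizes.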

Halt framework: engaged vs disengaged

We compare the headline strategy with the halt framework engaged against the same strategy with the halt framework disengaged. Disengaged means every Monday is a candidate entry regardless of regime indicators; engaged means three layers of halts gate entries (tail-event halt on extreme single-day market shocks, drawdown halt on a trailing-90-day window, vol-regime auto-resume).

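| Ticker | Halts disengaged (excess Sharpe) | Halts engaged (excess Sharpe) | Δ from halts |
|---|---|---|---|
| AAPL | +0.155 | +0.264 | +0.109 |
| MSFT | +0.037 | +0.269 | +0.232 |
| WMT | +0.063 | +0.263 | +0.200 |
| GLD | -0.971 | +0.138 | +1.109 |
| Aggregate basket | -0.069 | +0.359 | +0.428 |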

The halt framework’s contribution is consistently positive: every instrument has higher Sharpe with halts engaged than disengaged. The aggregate-basket benefit (+0.428 Sharpe) is slightly larger than the per-instrument average (+0.413) because cross-instrument correlation falls when halts engage.
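The three-layer gate described above can be sketched as a small state machine. Every threshold below, the use of annualized volatility as the resume signal, and the names `HaltConfig` and `entry_allowed` are hypothetical placeholders; the text does not give the actual trigger levels or say whether the 90-day window counts calendar or trading days.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HaltConfig:
    # All values are illustrative assumptions, not the strategy's parameters.
    tail_shock: float = -0.04     # single-day market return that triggers a tail-event halt
    max_drawdown: float = -0.02   # trailing-window drawdown that triggers a drawdown halt
    dd_window: int = 90           # trailing window length, in observations
    resume_vol: float = 0.20      # vol level below which a halt is lifted


def trailing_drawdown(equity, window):
    """Drawdown of the latest equity value from the peak of the trailing window."""
    recent = equity[-window:]
    return recent[-1] / max(recent) - 1.0


def entry_allowed(is_monday, market_ret, equity, current_vol, halted,
                  cfg=HaltConfig(), engaged=True):
    """Return (allowed, halted) for one day.

    With engaged=False every Monday is a candidate entry, which matches
    the "disengaged" definition in the text.
    """
    if not engaged:
        return is_monday, False

    tail_hit = market_ret <= cfg.tail_shock
    dd_hit = trailing_drawdown(equity, cfg.dd_window) <= cfg.max_drawdown

    if tail_hit or dd_hit:
        # Layers 1 and 2: either trigger puts the book into a halted state.
        halted = True
    elif halted and current_vol < cfg.resume_vol:
        # Layer 3: auto-resume once the vol regime has calmed down.
        halted = False

    return (is_monday and not halted), halted
```

Carrying `halted` from one day to the next is what lets the third layer act as a resume rule rather than a third trigger: a halt persists until volatility falls back below the resume level.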

ML overlays evaluated

Two calibrated XGBoost overlays were evaluated. One is activated in the headline; the other is not.

| Overlay | What it does | Result | Activated in headline? |
|---|---|---|---|
| Per-trade quality classifier | Per-instrument XGBoost predicting whether a trade closes at profit-target | Out-of-sample log-loss worse than majority-class baseline on 4 of 4 tickers (margins 0.002 to 0.016) | ✗ (not activated, base sizing used) |
| Regime-stress overlay | Pooled XGBoost predicting probability of a stress event in the next quarter (63 trading days) | Out-of-sample Brier 0.1079 vs naive baseline 0.1256, a 14.1% reduction, well above the 5% acceptance gate | ✓ (applied as (1 − p_stress) book-exposure scaler) |

The per-trade quality classifier did not beat the majority-class baseline. With a base win rate of 64-71% per instrument, a constant forecast at the base rate is already a hard log-loss benchmark, and the available features do not predict per-trade outcomes with enough resolution to improve on it.

The regime-stress overlay passes because cross-asset macro features (volatility indices, yield curve, credit-stress proxies) are diagnostic of upcoming regime breaks at the basket-day level. The model learns those associations.
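The acceptance gate and the exposure scaler reduce to a few lines of arithmetic. The sketch below is illustrative: the function names are hypothetical, the $200,000 base exposure is inferred from four instruments at $50,000 each, and clipping `p_stress` to [0, 1] is an added safeguard rather than something the text specifies.

```python
import numpy as np


def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    return float(np.mean((p - y) ** 2))


def brier_reduction(model_brier, baseline_brier):
    """Fractional improvement of the model over the naive baseline."""
    return 1.0 - model_brier / baseline_brier


def passes_gate(model_brier, baseline_brier, min_reduction=0.05):
    """Acceptance gate: require at least a 5% Brier reduction."""
    return brier_reduction(model_brier, baseline_brier) >= min_reduction


def scaled_exposure(base_exposure, p_stress):
    """Apply the (1 - p_stress) book-exposure scaler."""
    p = float(np.clip(p_stress, 0.0, 1.0))  # assumed safeguard
    return base_exposure * (1.0 - p)


if __name__ == "__main__":
    # Reported out-of-sample figures from the overlay table.
    print(f"reduction: {brier_reduction(0.1079, 0.1256):.1%}")
    print("gate passed:", passes_gate(0.1079, 0.1256))
    # Hypothetical stress probability of 25% on an assumed $200,000 book.
    print("exposure:", scaled_exposure(200_000, 0.25))
```

Because the scaler is linear in `p_stress`, a well-calibrated probability matters more than a sharp classifier here, which is consistent with gating the overlay on Brier score rather than accuracy.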

Robustness checks

| Item | Value |
|---|---|
| Raw trials tested across all variants | 12 (single-stock × 9 + ETF × 3) |
| Average pairwise return correlation (ρ̄) | 0.261 |
| Implied independent trials (N̂) | 9 |
| Headline DSR (PSR) | 1.0000 |
| Headline PBO via CSCV (S=16, 12,870 logits) | 0.0402 |

Both gates pass (DSR ≥ 0.95, PBO ≤ 0.30).
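Two of the table entries can be cross-checked with simple arithmetic. The sketch below assumes the implied trial count follows the common estimator N̂ = ρ̄ + (1 − ρ̄)·N and that CSCV uses every way of choosing half of the S blocks as the in-sample set. Both assumptions reproduce the reported figures, but the text does not confirm that these are the formulas used.

```python
from math import comb


def implied_independent_trials(n_trials, avg_corr):
    """Assumed estimator: N_hat = rho_bar + (1 - rho_bar) * N."""
    return avg_corr + (1.0 - avg_corr) * n_trials


def cscv_split_count(n_blocks):
    """Number of in-sample/out-of-sample splits when half the blocks are in-sample."""
    return comb(n_blocks, n_blocks // 2)


def gates_pass(dsr, pbo, min_dsr=0.95, max_pbo=0.30):
    """Both robustness gates as stated in the text."""
    return dsr >= min_dsr and pbo <= max_pbo


if __name__ == "__main__":
    n_hat = implied_independent_trials(12, 0.261)
    print(f"N_hat = {n_hat:.3f} (rounds to {round(n_hat)})")
    print("CSCV splits for S=16:", cscv_split_count(16))
    print("gates pass:", gates_pass(1.0000, 0.0402))
```

Each CSCV split yields one logit, so the split count for S=16 matches the 12,870 logits quoted in the table.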

Summary

Six basket configurations were tested across put-only and iron-condor architectures, with the halt framework engaged and disengaged, and with and without the ML overlay. The headline is the strongest configuration across these tests.