Variants Evaluated

We benchmarked the headline strategy against five alternative basket configurations and two architectural variants. The headline (variant D′: the 4-instrument put-only basket with halts engaged plus the regime-stress ML overlay, excess Sharpe +0.371) had the highest excess Sharpe of every configuration tested and passes both robustness gates. Every alternative is reported on this page with full numbers.

Basket composition

Equal-weight $50,000 per instrument, put-only architecture, halt framework engaged. All variants run over the same OOS window 2018-01-01 to 2024-12-31.

Basket	n_inst	Sharpe (excess)	AnnRet	AnnVol	MaxDD	Δ vs SPX baseline (+0.286)
(A) SPX put-only alone	1	-0.347	+2.04%	0.84%	-0.78%	-0.633
(B) 3-name basket (AAPL+MSFT+WMT)	3	+0.350	+2.42%	0.26%	-0.19%	+0.064
(C) 3-ETF basket (SPX+TLT+GLD)	3	-0.312	+2.22%	0.35%	-0.31%	-0.598
(D) the 4-instrument basket ← HEADLINE base	4	+0.359	+2.41%	0.23%	-0.13%	+0.073
(D′) Headline + regime-stress ML overlay	4	+0.371	+2.41%	0.21%	-0.12%	+0.085
(E) Full 6 (3-ETF basket + 3-name basket)	6	-0.036	+2.32%	0.26%	-0.17%	-0.322

Pattern: SPX and TLT each have negative excess Sharpe in the post-2020 environment under the put-credit-spread structure. Including them (variants C and E) pulls the book's excess Sharpe to zero or below (-0.312 and -0.036 respectively). The headline base (D) keeps the three positive single-stock names plus GLD, which is only weakly positive on its own but adds cross-asset diversification, and excludes the drag instruments.
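
The Δ column is a plain subtraction from the +0.286 SPX baseline. The following is a minimal sketch that recomputes it from the Sharpe column and picks the top-ranked variant; the dictionary keys and function name are illustrative and are not taken from the project code.

```python
SPX_BASELINE = 0.286  # baseline excess Sharpe quoted in the table header

# Excess Sharpe per basket variant, copied from the table above.
basket_sharpe = {
    "A": -0.347,
    "B": 0.350,
    "C": -0.312,
    "D": 0.359,
    "D_prime": 0.371,
    "E": -0.036,
}


def delta_vs_baseline(sharpe: float, baseline: float = SPX_BASELINE) -> float:
    """Difference between a variant's excess Sharpe and the SPX baseline."""
    return round(sharpe - baseline, 3)


deltas = {name: delta_vs_baseline(s) for name, s in basket_sharpe.items()}
best_variant = max(basket_sharpe, key=basket_sharpe.get)

for name, delta in deltas.items():
    print(f"{name}: {delta:+.3f}")
print("highest excess Sharpe:", best_variant)
```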

Iron condor vs put-only

We evaluated put-credit-spreads against iron condors across the universe. Put-only outperformed iron condor in the 2018-2024 OOS window because the post-2020 trending-equity regime systematically destroyed short-call positions. We ship put-only.

Mode	Universe	Sharpe (excess)	Δ vs SPX baseline (+0.286)	AnnRet	MaxDD
Iron condor, SPX only	1 inst	-1.882	-2.168	+0.48%	-4.59%
Iron condor, 3-instrument cluster	SPX/TLT/GLD	-2.292	-2.578	+1.42%	-1.34%
Put-only, SPX	1 inst	-0.347	-0.633	+2.04%	-0.78%
Put-only, 3-instrument cluster	SPX/TLT/GLD	-0.369	-0.655	+2.22%	-0.31%
Put-only, the 4-instrument basket	4 inst	+0.359	+0.073	+2.41%	-0.13%

Iron condor underperforms put-only by more than 1.5 Sharpe units in both paired comparisons (1.535 on SPX alone, 1.923 on the SPX/TLT/GLD cluster). The call wing loses on the instruments tested (SPX/TLT/GLD) in the post-2020 regime, so adding it cannot improve a put-only strategy here. We report it as a tested extension that did not add value.

Halt framework: engaged vs disengaged

We compare the headline strategy with the halt framework engaged against the same strategy with the halt framework disengaged. Disengaged means every Monday is a candidate entry regardless of regime indicators; engaged means three layers of halts gate entries (tail-event halt on extreme single-day market shocks, drawdown halt on a trailing-90-day window, vol-regime auto-resume).
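
The engaged and disengaged entry rules can be sketched as a small gate function. This is only an illustration of the three-layer idea described above: the threshold values, the field names, and the exact resume rule are assumptions made for the example, since the page does not specify them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HaltConfig:
    # All three thresholds are hypothetical placeholders, not the strategy's values.
    tail_shock: float = -0.04    # single-day market return that trips the tail-event halt
    max_drawdown: float = -0.02  # trailing-90-day drawdown that trips the drawdown halt
    calm_vol: float = 20.0       # volatility-index level below which a halt auto-resumes


def gate_entry(is_monday: bool, halted: bool, day_return: float,
               drawdown_90d: float, vol_level: float,
               cfg: HaltConfig = HaltConfig()) -> Tuple[bool, bool]:
    """Return (entry_allowed, halted_after) for one trading day."""
    if day_return <= cfg.tail_shock or drawdown_90d <= cfg.max_drawdown:
        halted = True    # layers 1 and 2: a shock or a drawdown trips the halt
    elif halted and vol_level < cfg.calm_vol:
        halted = False   # layer 3: auto-resume once the vol regime is calm
    return (is_monday and not halted, halted)


def gate_entry_disengaged(is_monday: bool) -> bool:
    """Halts disengaged: every Monday is a candidate entry."""
    return is_monday
```

Carrying the `halted` flag from one day to the next is what makes the third layer a resume rule rather than a third trigger: a halt stays in force until the volatility condition clears.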

Ticker	Sharpe (excess), halts disengaged	Sharpe (excess), halts engaged	Δ from halts
AAPL	+0.155	+0.264	+0.109
MSFT	+0.037	+0.269	+0.232
WMT	+0.063	+0.263	+0.200
GLD	-0.971	+0.138	+1.109
Aggregate basket	-0.069	+0.359	+0.428

The halt framework’s contribution is consistently positive: every instrument has a higher Sharpe with halts engaged than disengaged. The aggregate-basket benefit (+0.428 Sharpe) slightly exceeds the per-instrument average (about +0.41) because cross-instrument correlation falls when halts engage.

ML overlays evaluated

Two calibrated XGBoost overlays were evaluated. One is activated in the headline; the other is not.

Overlay	What it does	Result	Activated in headline?
Per-trade quality classifier	Per-instrument XGBoost predicting whether a trade closes at profit-target	Out-of-sample log-loss worse than majority-class baseline on 4 of 4 tickers (margins 0.002 to 0.016)	✗ (not activated, base sizing used)
Regime-stress overlay	Pooled XGBoost predicting probability of a stress event in the next quarter (63 trading days)	Out-of-sample Brier 0.1079 vs naive baseline 0.1256, a 14.1% reduction, well above the 5% acceptance gate	✓ (applied as `(1 − p_stress)` book-exposure scaler)

The per-trade quality classifier did not beat the majority-class baseline because the base win rate of 64-71% per instrument leaves the “always predict win” baseline log-loss already low. The features available do not predict per-trade outcomes with enough resolution to improve on it.
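
To put the 0.002 to 0.016 margins in context, the sketch below computes the log-loss of a constant forecast at the two ends of the quoted win-rate range. It assumes the majority-class baseline is scored as a constant probability equal to the empirical win rate; the page does not say how the baseline forecast was constructed, so treat this as one plausible reading.

```python
import math


def base_rate_log_loss(win_rate: float) -> float:
    """Log-loss (in nats) of always forecasting the empirical win rate."""
    p = win_rate
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


for p in (0.64, 0.71):
    print(f"win rate {p:.2f}: baseline log-loss {base_rate_log_loss(p):.3f}")
```

A classifier has to come in under these values on held-out trades to be worth activating.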

The regime-stress overlay passes because cross-asset macro features (volatility indices, yield curve, credit-stress proxies) are diagnostic of upcoming regime breaks at the basket-day level. The model learns those associations.
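
The acceptance test and the exposure scaler for the regime-stress overlay reduce to two short formulas. A minimal sketch follows, using the Brier scores and the 5% gate reported above; the function names are illustrative, and applying the scaler to a single dollar exposure is an assumption about how the `(1 − p_stress)` factor enters the book.

```python
ACCEPTANCE_GATE = 0.05  # minimum relative Brier reduction, from the table above


def brier_reduction(model_brier: float, naive_brier: float) -> float:
    """Relative improvement of the model's Brier score over the naive baseline."""
    return (naive_brier - model_brier) / naive_brier


def scaled_exposure(base_exposure: float, p_stress: float) -> float:
    """Scale book exposure by (1 - p_stress), the predicted stress probability."""
    if not 0.0 <= p_stress <= 1.0:
        raise ValueError("p_stress must be a probability in [0, 1]")
    return base_exposure * (1.0 - p_stress)


reduction = brier_reduction(0.1079, 0.1256)
passes_gate = reduction >= ACCEPTANCE_GATE
print(f"Brier reduction: {reduction:.1%}, passes gate: {passes_gate}")

# Hypothetical example: a 25% stress probability on a $50,000 position.
print(scaled_exposure(50_000, 0.25))
```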

Robustness checks

Item	Value
Raw trials tested across all variants	12 (single-stock × 9 + ETF × 3)
Average pairwise return correlation (ρ̄)	0.261
Implied independent trials (N̂)	9
Headline DSR (PSR)	1.0000
Headline PBO via CSCV (S=16, 12,870 logits)	0.0402

Both gates pass (DSR ≥ 0.95, PBO ≤ 0.30).
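
Two of the figures in the table can be checked by hand. The sketch below uses one common estimator of the effective number of trials, N̂ = ρ̄ + (1 − ρ̄)·N, which reproduces the reported value of 9; the page does not state which formula was used, so this is an assumption. The logit count follows from CSCV forming every split of S = 16 blocks into two halves of 8.

```python
import math


def implied_independent_trials(n_raw: int, avg_corr: float) -> float:
    """Interpolate between 1 trial (rho = 1) and n_raw trials (rho = 0)."""
    return avg_corr + (1.0 - avg_corr) * n_raw


n_hat = implied_independent_trials(n_raw=12, avg_corr=0.261)
n_logits = math.comb(16, 8)  # number of train/test splits in CSCV with S = 16

print(f"N_hat = {n_hat:.3f} (rounds to {round(n_hat)})")
print(f"CSCV splits = {n_logits}")
```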

Summary

Six basket configurations were tested across put-only and iron-condor architectures, with the halt framework engaged and disengaged, and with and without the ML overlay. The headline is the strongest configuration across these tests.
