Results
How to read this page
Sharpe ratios on this page are reported in two complementary forms:
- Sharpe (excess), the headline: (annualized return − annualized risk-free) / annualized volatility.
- Deflated Sharpe (DSR), corrects for selection bias when many baskets are tested. Uses an implied-independent-trials count rather than the raw trial count, since adjacent baskets share underlying-instrument exposure and inflate the apparent number of independent searches.
Other reported metrics:
- PBO (Probability of Backtest Overfitting) via CSCV: the probability that the in-sample-best basket underperforms the median basket out of sample, computed by combinatorially partitioning the per-instrument return matrix into in-sample and out-of-sample halves.
- n_trades and win rate per instrument.
- Max drawdown (signed, negative) and annualized return (geometric).
Risk-free rate: the CBOE 13-week T-bill yield index (IRX), averaged over the 2018-2024 OOS window, which comes to 2.33% annualized. Excess Sharpe = (annualized return − 2.33%) / annualized volatility. This is the realized T-bill yield over the test period, which included near-zero rates in 2018-2021 and 4-5% rates in 2022-2024.
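The excess-Sharpe definition above can be written as a one-line function. This is a minimal sketch; the function name and the example inputs are illustrative assumptions, and the rounded table values on this page will not reproduce the reported Sharpe exactly because the reported figure uses unrounded inputs.

```python
def excess_sharpe(ann_return: float, ann_risk_free: float, ann_vol: float) -> float:
    """(annualized return - annualized risk-free) / annualized volatility."""
    if ann_vol <= 0:
        raise ValueError("annualized volatility must be positive")
    return (ann_return - ann_risk_free) / ann_vol


# Illustrative inputs only, not figures from this page:
# a 5% return, 2% risk-free rate, and 10% volatility.
example = excess_sharpe(ann_return=0.05, ann_risk_free=0.02, ann_vol=0.10)
```

All three inputs must be on the same annualized basis; mixing a daily volatility with an annual return is the usual way this calculation goes wrong.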
Acceptance gates we set:
- DSR (PSR threshold) ≥ 0.95 against the implied-independent-trials benchmark
- PBO via CSCV ≤ 0.30 on the per-instrument return matrix
Headline
4-instrument cross-asset put-credit-spread basket with the halt framework engaged and a calibrated regime-stress ML overlay that scales book exposure by (1 − p_stress) where p_stress is the model’s probability of a stress event in the next quarter. Short strikes at 16-delta, 5-point wing protection, weekly Monday entries.
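The (1 − p_stress) scaling and a Brier-score acceptance check of the kind listed in the headline table can be sketched as follows. The function names are hypothetical, and the "naive baseline" is assumed here to be a constant base-rate forecast; the page does not define it, so treat that choice as an assumption.

```python
from typing import Sequence


def scaled_exposure(base_exposure: float, p_stress: float) -> float:
    """Scale book exposure by (1 - p_stress), clamping p_stress to [0, 1]."""
    p = min(max(p_stress, 0.0), 1.0)
    return base_exposure * (1.0 - p)


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_reduction(model_probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Fractional Brier improvement over a constant base-rate forecast.

    ASSUMPTION: the naive baseline always predicts the observed base rate.
    """
    base_rate = sum(outcomes) / len(outcomes)
    naive = brier_score([base_rate] * len(outcomes), outcomes)
    if naive == 0:
        raise ValueError("outcomes are constant; reduction is undefined")
    return (naive - brier_score(model_probs, outcomes)) / naive
```

Under this sketch, a gate of "reduction ≥ 5%" corresponds to `brier_reduction(...) >= 0.05`.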
| Composition | Asset class | Strike grid |
|---|---|---|
| AAPL | Single-stock equity | $1 |
| MSFT | Single-stock equity | $1 |
| WMT | Single-stock equity | $1 |
| GLD | Commodity ETF | $1 |
| Headline metric | Value |
|---|---|
| Sharpe ratio (excess of risk-free, ML overlay engaged) | +0.371 |
| Sharpe ratio (excess), without ML overlay | +0.359 |
| SPX baseline (single-instrument, predecessor implementation) | +0.286 |
| Δ vs SPX baseline | +0.085 ✓ |
| Risk-free baseline (avg IRX 2018-2024) | 2.33% |
| Geometric mean return (GMRR, annualized) | +2.41% |
| Annualized volatility | 0.21% (with ML) / 0.23% (without) |
| Alpha vs SPY (annualized OLS intercept × 252) | +2.37% |
| Beta vs SPY (OLS slope, daily simple returns) | +0.0014 |
| Correlation with SPY (daily returns) | 0.12 |
| Max drawdown over OOS (7 years) | -0.12% (with ML) / -0.13% (without) |
| Trades total (sum across 4 instruments) | 437 |
| Average trades per year | 62.8 |
| Average return per trade | +8.25% (median +$17.85 net P&L per spread) |
| Win rate (basket aggregate) | 73.0% |
| OOS span | 2018-01-01 → 2024-12-31 (1,760 trading days) |
| ML overlay | Regime-stress scaler (1 − p_stress) applied to daily book exposure |
| ML acceptance gate | Brier-score reduction +14.1% vs naive baseline (gate at ≥5%) ✓ |
Mapping to the HW3 / HW4 rubric: GMRR is the geometric mean return (annualized) row; Alpha and Beta are computed by daily OLS of basket returns vs SPY over the full 1,760-day OOS window (per the project’s src/metrics/portfolio.py); Sharpe, Annualized Volatility, Max Drawdown, Avg Return per Trade, Trades per Year, and Total Trades are reported above. Beta is essentially zero because the strategy harvests volatility risk premium on a defined-risk wing-protected structure rather than holding directional equity beta.
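For readers who want to see the alpha and beta calculation spelled out, here is a minimal sketch of a daily OLS regression of strategy returns on benchmark returns. It is not the contents of src/metrics/portfolio.py; the function name and signature are assumptions, and only the definitions in the headline table (slope as beta, intercept × 252 as alpha) are taken from this page.

```python
from typing import Sequence, Tuple


def ols_alpha_beta(
    strategy: Sequence[float],
    benchmark: Sequence[float],
    periods_per_year: int = 252,
) -> Tuple[float, float]:
    """Return (annualized alpha, beta) from a simple OLS of daily returns."""
    n = len(strategy)
    if n != len(benchmark) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_s = sum(strategy) / n
    mean_b = sum(benchmark) / n
    cov = sum((b - mean_b) * (s - mean_s) for s, b in zip(strategy, benchmark))
    var = sum((b - mean_b) ** 2 for b in benchmark)
    if var == 0:
        raise ValueError("benchmark returns have zero variance")
    beta = cov / var                       # OLS slope
    alpha_daily = mean_s - beta * mean_b   # OLS intercept
    return alpha_daily * periods_per_year, beta
```

A near-zero beta with a positive intercept, as reported above, means almost none of the daily return variation is explained by SPY.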
Multiple-testing validation
The basket was selected from a tested universe of 12 instruments (3 ETFs: SPX, TLT, GLD; 9 single-stocks: AAPL, MSFT, GOOGL, JNJ, KO, PG, WMT, JPM, PEP). Selection bias is corrected via DSR + PBO:
| Correction | Value | Acceptance | Pass? |
|---|---|---|---|
| Raw trials tested (N) | 12 | informational | n/a |
| Avg pairwise return correlation (ρ̄) | 0.261 | informational | n/a |
| Implied independent trials (N̂) | 9 | informational | n/a |
| Deflated Sharpe (PSR) | 1.0000 | ≥ 0.95 | ✓ |
| PBO via CSCV (S=16, 12,870 logits) | 0.0402 | ≤ 0.30 | ✓ |
Both gates pass. The headline is statistically significant after correction for the 12-trial selection.
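The table values can be sanity-checked with a short sketch. The page does not state which formula produced N̂; the linear interpolation below, N̂ = ρ̄ + (1 − ρ̄)·N, is one common choice and is an assumption here, although it does reproduce the reported value of 9. The split count assumes CSCV takes every way of choosing S/2 of the S blocks as the in-sample half, and the function names are hypothetical.

```python
from math import comb
from typing import Sequence


def implied_independent_trials(n_trials: int, avg_corr: float) -> float:
    """ASSUMED formula: interpolate between 1 trial (rho=1) and N trials (rho=0)."""
    return avg_corr + (1.0 - avg_corr) * n_trials


def pbo_from_logits(logits: Sequence[float]) -> float:
    """Share of CSCV splits where the in-sample best ranks at or below the OOS median."""
    return sum(1 for x in logits if x <= 0) / len(logits)


n_hat = implied_independent_trials(n_trials=12, avg_corr=0.261)  # 9.129
n_splits = comb(16, 8)  # number of ways to pick 8 of 16 blocks
```

Rounding `n_hat` gives 9, and `n_splits` gives 12,870, matching the logit count quoted in the table.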
Equity curve
The headline ends at 1.18× starting equity over 7 years (CAGR 2.41%). The SPX baseline (Sharpe 0.286, ann_ret 2.49%) ends very close in absolute terms. The Sharpe advantage (+0.085 with the ML overlay engaged) shows up not in absolute return but in volatility: the headline runs at 0.21% annualized volatility versus the SPX baseline’s 0.55%, so a similar return is earned with less than half the volatility.
Drawdown
Maximum drawdown is -0.13% over the 7-year OOS sample (-0.12% with the ML overlay engaged). The defined-risk put-credit-spread structure plus the halt framework absorbs every named stress event with at most a 0.099% peak-to-trough dip.
Per-instrument breakdown
| Ticker | Sharpe (excess) | n_trades | Win rate | Max DD | Ann. return | Final $ (from $50K start) |
|---|---|---|---|---|---|---|
| AAPL | +0.264 | 112 | 74.1% | -0.50% | +2.44% | $59,170 |
| MSFT | +0.269 | 112 | 73.2% | -0.37% | +2.42% | $59,124 |
| WMT | +0.263 | 107 | 69.2% | -0.18% | +2.40% | $59,017 |
| GLD | +0.138 | 106 | 75.5% | -0.23% | +2.40% | $59,008 |
| Aggregate (equal-weight 4) | +0.359 | 437 | 73.0% | -0.13% | +2.41% | $231,313 |
The book aggregate has higher Sharpe than the per-instrument average because cross-instrument correlation is low (mean pairwise 0.26), so equal-weighting reduces volatility without proportional return loss.
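The diversification effect can be illustrated with the textbook formula for an equal-weighted basket. This sketch assumes every instrument has the same volatility and the same pairwise correlation, which is a simplification of the actual basket; the function name is hypothetical.

```python
from math import sqrt


def equal_weight_vol_ratio(n_instruments: int, avg_corr: float) -> float:
    """Basket volatility divided by single-instrument volatility.

    ASSUMES equal volatilities and a single common pairwise correlation.
    """
    if n_instruments < 1:
        raise ValueError("need at least one instrument")
    return sqrt((1.0 + (n_instruments - 1) * avg_corr) / n_instruments)


ratio = equal_weight_vol_ratio(n_instruments=4, avg_corr=0.26)
```

With four instruments and a mean pairwise correlation of 0.26, the ratio is about 0.667, so under these assumptions the basket carries roughly two-thirds of a single instrument's volatility while the equal-weighted return is simply the average of the instrument returns.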
Anchor comparison vs SPX baseline
| Strategy | Architecture | Sharpe (excess) | Trades | OOS window |
|---|---|---|---|---|
| SPX baseline (predecessor implementation, historical reference) | SPX put-only with halts engaged | +0.286 | 210 | 2018–2024 |
| Headline | 4-instrument basket, put-only, halts engaged, regime-stress ML overlay | +0.371 | 437 | 2018–2024 |
| Δ vs SPX baseline | (same OOS window, both halts engaged put-only) | +0.085 | +227 | 2018–2024 |
The new headline beats the SPX baseline on excess Sharpe, with roughly 2× the trade sample, comparable max drawdown, and cross-asset diversification (equity + commodity) the baseline lacks. The SPX baseline (the predecessor implementation) is included as a historical reference; reproducing its exact equity curve is not supported on the current engine version because the engine has materially evolved since the baseline was recorded.
Variants tested: alternative baskets
All baskets below are run with halts engaged, equal-weighted at $50,000 per instrument. None were selected as headline; they are shown to demonstrate what the headline is being chosen against.
| Basket | n_inst | Sharpe (excess) | AnnRet | AnnVol | MaxDD | Δ vs SPX baseline (+0.286) |
|---|---|---|---|---|---|---|
| (A) SPX put-only alone | 1 | -0.347 | +2.04% | 0.84% | -0.78% | -0.633 |
| (B) 3-name basket (AAPL+MSFT+WMT) | 3 | +0.350 | +2.42% | 0.26% | -0.19% | +0.064 |
| (C) 3-ETF basket (SPX + TLT + GLD put-only) | 3 | -0.312 | +2.22% | 0.35% | -0.31% | -0.598 |
| (D) the 4-instrument basket ← HEADLINE | 4 | +0.359 | +2.41% | 0.23% | -0.13% | +0.073 |
| (E) Full 6-instrument book (3-ETF basket + 3-name basket) | 6 | -0.036 | +2.32% | 0.26% | -0.17% | -0.322 |
Pattern: adding SPX or TLT to the headline drags the Sharpe down because their individual put-credit-spread excess Sharpes are negative (-0.35 and -0.43 respectively over OOS). GLD is only weakly positive on its own (+0.138), but its cross-asset diversification benefit outweighs the drag from its lower standalone Sharpe: the basket Sharpe rises from +0.350 (B) to +0.359 (D) when it is added. Adding the drag instruments (SPX, TLT) produces no such net benefit.
Variants tested: iron condor architecture
We tested both architectures on the same engine and underlying universe. Iron condor (put + call wings on the same expiry) underperforms put-only in every like-for-like comparison over OOS 2018-2024:
| Mode | Universe | Sharpe (excess) | Δ vs SPX baseline | Notes |
|---|---|---|---|---|
| Iron condor, SPX only | 1 inst | -1.882 | -2.168 | Call wing destroyed by post-2020 SPX rally |
| Iron condor, 3-instrument cluster | SPX/TLT/GLD | -2.292 | -2.578 | Same pattern across all three IC instruments |
| Put-only, SPX | 1 inst | -0.347 | -0.633 | Same engine, IC removed |
| Put-only, 3-instrument cluster | SPX/TLT/GLD | -0.369 | -0.655 | Same engine, IC removed |
Iron condor underperforms put-only in every comparison. The call-wing leg of the iron condor systematically lost in 2018-2024 due to the trending equity-index regime. Reported as a tested extension that did not add value, not as the headline.
Halts engaged vs disengaged
This comparison shows that the halt framework is doing real work: every instrument has a higher Sharpe (excess) with halts engaged than with halts disengaged.
| Ticker | Sharpe (halts disengaged) | Sharpe (halts engaged) | Δ from halts |
|---|---|---|---|
| AAPL | +0.155 | +0.264 | +0.109 |
| MSFT | +0.037 | +0.269 | +0.232 |
| WMT | +0.063 | +0.263 | +0.200 |
| GLD | -0.971 | +0.138 | +1.109 |
The halt framework’s contribution is measurable per instrument and consistently positive. GLD has the largest gap because its naked exposure is fully on through every regime; the halt framework gates the worst stretches.
Stress-event behavior
Computed on the headline basket equity curve, halts engaged, equal-weight $50K per instrument (basket starting equity $200K). Drawdown is peak-to-trough WITHIN each window using a running cumulative max.
| Event | Window probed | Net P&L | Peak-to-trough DD | Trough date |
|---|---|---|---|---|
| Volmageddon | 2018-01-22 → 2018-02-16 | -$35 | -0.099% | 2018-02-07 |
| Q4 2018 selloff | 2018-11-26 → 2019-01-24 | +$783 | 0.000% | (curve monotonic up) |
| COVID crash | 2020-02-24 → 2020-04-23 | +$102 | 0.000% | (curve monotonic up) |
| 2022 bear market | 2022-01-03 → 2022-12-30 | +$4,223 | 0.000% | (curve monotonic up) |
| Banking crisis | 2023-02-13 → 2023-04-13 | +$1,652 | 0.000% | (curve monotonic up) |
The 0.000% peak-to-trough entries are not measurement error. They reflect a structural feature of the halt-gated put-credit-spread architecture: during these stress windows the halt framework reduced or paused new entries, the open positions either expired profitably or hit stop-loss within their wing-width bound, and the unutilized capital continued earning the realized T-bill rate. Net trading P&L plus cash carry was non-negative on every trading day through these windows, so the equity curve never fell below its running peak.
Volmageddon is the one exception. The early-2018 timing meant the basket was fully deployed when the VIX spike hit, and the resulting -$35 net P&L (-0.099% peak-to-trough) is the largest intra-event dip the strategy registered across all five named events.
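The within-window drawdown calculation described above (peak-to-trough against a running cumulative max) can be sketched as follows. The function name and return shape are assumptions; the input is the basket equity curve restricted to one event window.

```python
from typing import Optional, Sequence, Tuple


def window_drawdown(equity: Sequence[float]) -> Tuple[float, Optional[int]]:
    """Return (worst drawdown as a negative fraction, index of the trough).

    The running peak starts at the first value in the window, so only
    losses incurred inside the window are counted. The trough index is
    None when the curve never dips below its running peak.
    """
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    trough_idx: Optional[int] = None
    for i, value in enumerate(equity):
        peak = max(peak, value)
        dd = value / peak - 1.0
        if dd < worst:
            worst = dd
            trough_idx = i
    return worst, trough_idx
```

A curve that rises or stays flat throughout the window returns `(0.0, None)`, which corresponds to the "curve monotonic up" rows in the table.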
Trade fates and rates
Every trade exits via one of five fates. The distribution across the 437-trade headline blotter:
| Fate | Trigger | Count | % of trades |
|---|---|---|---|
| profit_target | Exit debit ≤ 50% of entry credit | 280 | 64.1% |
| stop_loss | Exit debit ≥ 200% of entry credit (gap-aware fill) | 81 | 18.5% |
| time_exit | DTE ≤ 21 | 75 | 17.2% |
| emergency | \|short_delta\| > 0.50 | 1 | 0.2% |
| eos_force | End-of-OOS forced close | 0 | 0.0% |
| Total | | 437 | 100% |
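The exit triggers in the table can be expressed as a small classifier. This is a sketch only: the thresholds come from the table, but the order in which the rules are checked is an assumption (the page does not state the priority), and the function and parameter names are hypothetical.

```python
from typing import Optional


def classify_exit(
    entry_credit: float,
    exit_debit: float,
    dte: int,
    short_delta: float,
    is_last_oos_day: bool = False,
) -> Optional[str]:
    """Return the fate that fires today, or None if the trade stays open.

    ASSUMED priority: emergency, stop_loss, profit_target, time_exit, eos_force.
    """
    if abs(short_delta) > 0.50:
        return "emergency"
    if exit_debit >= 2.0 * entry_credit:
        return "stop_loss"
    if exit_debit <= 0.5 * entry_credit:
        return "profit_target"
    if dte <= 21:
        return "time_exit"
    if is_last_oos_day:
        return "eos_force"
    return None
```

The sketch ignores the gap-aware fill logic mentioned for stop-loss exits, which affects the fill price rather than the trigger itself.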
Reading the rates the rubric asks for:
| Rate | Value |
|---|---|
| Success rate (P&L > 0) | 73.0% |
| Stop-loss rate | 18.5% |
| Timeout rate (time_exit) | 17.2% |
| Emergency-exit rate | 0.2% |
Per-trade summary statistics
| Metric | Value |
|---|---|
| Total trades | 437 |
| Winning trades | 319 |
| Losing trades | 118 |
| Mean P&L per spread | +$3.42 |
| Median P&L per spread | +$17.85 |
| Standard deviation of P&L | $37.66 |
| Largest single win | +$49.50 |
| Largest single loss | -$186.00 |
| Mean trade return | +8.25% |
| Mean trade lifetime | 6.5 days |
| Median trade lifetime | 7 days |
| Profit factor (gross win / gross loss) | 1.26 |
The 6.5-day mean lifetime reflects how the strategy actually deploys capital: profit-target exits fire quickly in calm regimes, and the basket spends most of its capital sitting on T-bill carry between trade cycles. Median holding period is 7 days; the 25th-to-75th percentile window is 3 to 10 days; no trade exceeds 14 days because the time-exit rule forces a close at DTE ≤ 21 against the 30-45 DTE entry window.
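The summary rows in the table above (win rate, mean and median P&L, profit factor) follow directly from the per-trade P&L column of the blotter. A minimal sketch, with a hypothetical function name and toy inputs rather than the actual blotter:

```python
from statistics import mean, median
from typing import Dict, Sequence


def trade_summary(pnls: Sequence[float]) -> Dict[str, float]:
    """Per-trade summary statistics from a list of net P&L values."""
    if not pnls:
        raise ValueError("no trades")
    gross_win = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return {
        "n_trades": len(pnls),
        "win_rate": sum(1 for p in pnls if p > 0) / len(pnls),
        "mean_pnl": mean(pnls),
        "median_pnl": median(pnls),
        # Profit factor = gross win / gross loss; infinite if there are no losses.
        "profit_factor": gross_win / gross_loss if gross_loss > 0 else float("inf"),
    }


# Toy example, not data from this page.
summary = trade_summary([10.0, -5.0, 20.0, -5.0])
```

A median well above the mean, as in the table, is what this calculation produces when a few large losses offset many small wins.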
Per-instrument breakdown
| Ticker | Trades | Wins | Win rate | Mean P&L | Total P&L |
|---|---|---|---|---|---|
| AAPL | 112 | 83 | 74.1% | +$4.11 | +$460.23 |
| MSFT | 112 | 82 | 73.2% | +$3.54 | +$396.33 |
| GLD | 106 | 80 | 75.5% | +$3.06 | +$324.70 |
| WMT | 107 | 74 | 69.2% | +$2.92 | +$312.87 |
P&L per trade across OOS
Trade-return distribution
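The distribution is left-skewed by design: a 16-delta short put expires worthless ~84% of the time under lognormal assumptions, and profit_target closes wins early at 50% of credit, so most trades are small wins while the rarer losses are larger (median +$17.85 versus mean +$3.42; largest win +$49.50 versus largest loss -$186.00). Losses are bounded by the wing-width stop. The 1.26 profit factor reflects the premium of implied over realized variance (the volatility risk premium), not directional alpha on the underlying.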
Ledger (monthly P&L sample)
| Month | Trades closed | Net P&L | Cumulative P&L |
|---|---|---|---|
| 2018-01 | 7 | +$36.75 | +$36.75 |
| 2018-02 | 5 | -$193.88 | -$157.13 |
| 2018-05 | 9 | +$97.05 | -$60.08 |
| 2018-08 | 17 | +$191.15 | +$179.58 |
| 2018-09 | 12 | +$175.50 | +$355.08 |
| 2018-10 | 11 | -$290.25 | +$64.83 |
| 2019-07 | 19 | +$315.85 | +$369.58 |
| 2019-12 | 18 | +$321.70 | +$529.38 |
| 2020-03 | 14 | -$11.20 | +$612.10 |
| … | … | … | … |
| 2024-08 | 5 | -$361.65 | +$1,237.08 |
| 2024-11 | 12 | +$181.80 | +$1,418.88 |
| 2024-12 | 12 | +$75.25 | +$1,494.13 |
Showing 12 of 84 months from January 2018 to December 2024. Net P&L in dollars per spread (per-contract basis at $100 multiplier). Full monthly ledger: monthly_ledger.csv (renders as a sortable table on GitHub).
Blotter
Random sample of 10 trades from the 437-row blotter (seed=42).
| trd_prd | Entry | Exit | Ticker | Side | Qty | Entry credit | Exit debit | Fate | P&L | Return % | Success |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018.19 | 2018-05-07 | 2018-05-16 | WMT | P | 1 | $22.45 | $0.90 | profit_target | +$21.55 | +96.0% | True |
| 2018.39 | 2018-09-24 | 2018-10-05 | WMT | P | 1 | $35.80 | $60.55 | time_exit | -$24.75 | -69.1% | False |
| 2018.40 | 2018-10-01 | 2018-10-11 | GLD | P | 1 | $15.05 | $4.50 | profit_target | +$10.55 | +70.1% | True |
| 2019.07 | 2019-02-11 | 2019-02-15 | MSFT | P | 1 | $38.70 | $15.30 | profit_target | +$23.40 | +60.5% | True |
| 2019.30 | 2019-07-22 | 2019-08-01 | WMT | P | 1 | $38.85 | $88.55 | stop_loss | -$49.70 | -127.9% | False |
| 2019.48 | 2019-11-25 | 2019-12-04 | WMT | P | 1 | $29.85 | $17.75 | profit_target | +$12.10 | +40.5% | True |
| 2019.52 | 2019-12-23 | 2019-12-26 | GLD | P | 1 | $19.70 | $9.95 | profit_target | +$9.75 | +49.5% | True |
| 2024.07 | 2024-02-12 | 2024-02-23 | MSFT | P | 1 | $52.80 | $34.25 | time_exit | +$18.55 | +35.1% | True |
| 2024.23 | 2024-06-03 | 2024-06-07 | GLD | P | 1 | $54.40 | $147.50 | stop_loss | -$93.10 | -171.1% | False |
| 2024.23 | 2024-06-03 | 2024-06-05 | WMT | P | 1 | $14.45 | $6.20 | profit_target | +$8.25 | +57.1% | True |
The trd_prd index encodes year + ISO week as a single decimal. Entries on the same Monday share the same trd_prd.
Showing 10 of 437 trades. Full blotter: blotter.csv (all 437 entries, renders as a sortable table on GitHub).
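The trd_prd encoding can be reproduced from the entry date with the standard library. This sketch assumes the ISO year is used alongside the ISO week (the two differ from the calendar year only around New Year) and that the week is zero-padded to two digits, as the sample rows suggest; the function name is hypothetical.

```python
from datetime import date


def trd_prd(entry: date) -> str:
    """Encode an entry date as '<ISO year>.<ISO week, two digits>'."""
    iso_year, iso_week, _weekday = entry.isocalendar()
    return f"{iso_year}.{iso_week:02d}"


# Entry dates taken from the blotter sample above.
first = trd_prd(date(2018, 5, 7))    # "2018.19"
second = trd_prd(date(2019, 2, 11))  # "2019.07"
```

Keeping the value as a string avoids the ambiguity a float would introduce, since 2019.1 could otherwise mean week 1 or week 10.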
How will you know the strategy is performing as expected?
A rolling 60-trade window of the realized win rate is compared against the OOS baseline of μ = 0.730. The Hoeffding inequality bounds the probability that observed underperformance is due to chance: while the bound stays at or above 50%, the strategy is operating within its modeled regime and trading continues at full size. Backtested over the 2018-2024 OOS sample, the bound was at or above 50% on 88% of post-warmup trades.
How will you quantify when the strategy stops working?
The same Hoeffding bound. When the bound drops below 25%, position size is cut; when it drops below 10%, entries are halted entirely and the strategy is reviewed. The thresholds are pre-set, distribution-free, and apply uniformly across the 4-instrument basket. The OOS sample produced no critical signal across 1,760 trading days; full bound-trace and worked example on the Live Monitoring page.
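A minimal sketch of the monitoring rule, assuming the one-sided Hoeffding bound exp(−2·n·t²), where t is the shortfall of the rolling win rate below the baseline. The page does not spell out the exact formula, so that form is an assumption, as are the function names. The page also does not define an action for bounds between 25% and 50%; the "watch" label below is a hypothetical placeholder for that band.

```python
import math

BASELINE_WIN_RATE = 0.730  # OOS baseline from this page
WINDOW = 60                # rolling trade window from this page


def hoeffding_bound(observed_win_rate: float,
                    baseline: float = BASELINE_WIN_RATE,
                    n: int = WINDOW) -> float:
    """Upper bound on P(win-rate shortfall this large arises by chance).

    ASSUMED form: exp(-2 * n * t^2), with t = baseline - observed.
    Returns 1.0 when the observed rate is at or above the baseline.
    """
    shortfall = baseline - observed_win_rate
    if shortfall <= 0:
        return 1.0
    return math.exp(-2.0 * n * shortfall ** 2)


def sizing_action(bound: float) -> str:
    """Map the bound to the pre-set thresholds (50%, 25%, 10%)."""
    if bound >= 0.50:
        return "full_size"
    if bound >= 0.25:
        return "watch"    # hypothetical label: band not specified on this page
    if bound >= 0.10:
        return "reduced"
    return "halted"
```

Under these assumptions, a rolling win rate of 65% over 60 trades gives a bound of roughly 0.46, and a rolling win rate of 55% gives a bound near 0.02, which would fall below the 10% halt threshold.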
For the data sources behind these numbers, see Data and Sources.