Walk-forward analysis: an anti-curve-fitting backtest framework
Anna's 5-year backtest showed a 75% win rate (WR) and +40% per year. "Holy grail!" Six months live: 50% WR, -8%. A catastrophe. Anna had optimized her parameters on HISTORY, which means she had curve-fitted noise. Walk-forward analysis would have prevented this: optimize on 4 years in-sample (IS), then test on 1 year out-of-sample (OS). OS performance ≈ expected live performance. Here we present the anti-overfitting framework.
The curve-fitting problem
Walk-forward mechanics
- Divide history: e.g. 6 years, 2018-2023
- IS window (in-sample): 4 years, 2018-2021; optimize parameters here
- OS window (out-of-sample): 1 year, 2022; test the fixed parameters
- Roll forward: IS 2019-2022, OS 2023
- Repeat: aggregate OS results across all rolls
- Aggregated OS: approximates live performance
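The steps above can be sketched as a loop. This is a minimal illustration, not a reference implementation: `walk_forward`, `GRID`, `toy_return`, `optimize` and `evaluate` are hypothetical names, and the yearly returns are pseudo-random toy numbers standing in for a real backtest engine.

```python
import random
from typing import Callable, List, Sequence, Tuple


def walk_forward(
    years: Sequence[int],
    is_len: int,
    os_len: int,
    optimize: Callable[[List[int]], int],
    evaluate: Callable[[int, List[int]], float],
) -> List[Tuple[List[int], List[int], int, float]]:
    """Rolling walk-forward: fit on each IS window, score the frozen parameter on the next OS window."""
    results = []
    start = 0
    while start + is_len + os_len <= len(years):
        is_years = list(years[start:start + is_len])
        os_years = list(years[start + is_len:start + is_len + os_len])
        param = optimize(is_years)            # chosen on in-sample data only
        os_score = evaluate(param, os_years)  # parameter frozen, no re-tuning on OS
        results.append((is_years, os_years, param, os_score))
        start += os_len                       # roll forward by one OS window
    return results


# Hypothetical toy data: a deterministic pseudo-random yearly return (%) per (parameter, year).
GRID = [10, 20, 50, 100]


def toy_return(param: int, year: int) -> float:
    return random.Random(param * 10_000 + year).uniform(-10.0, 20.0)


def optimize(is_years: List[int]) -> int:
    # "Best" parameter = highest average yearly return over the IS window.
    return max(GRID, key=lambda p: sum(toy_return(p, y) for y in is_years) / len(is_years))


def evaluate(param: int, os_years: List[int]) -> float:
    return sum(toy_return(param, y) for y in os_years) / len(os_years)


rolls = walk_forward(list(range(2018, 2024)), is_len=4, os_len=1,
                     optimize=optimize, evaluate=evaluate)
for is_years, os_years, param, os_score in rolls:
    print(f"IS {is_years[0]}-{is_years[-1]} -> OS {os_years[0]}: param={param}, OS={os_score:+.1f}%")

# Step 5: aggregate OS results across all rolls (here: mean yearly OS return).
aggregate_os = sum(r[3] for r in rolls) / len(rolls)
print(f"Aggregated OS: {aggregate_os:+.1f}% per year")
```

With 6 years of data, a 4-year IS and a 1-year OS, the loop produces exactly the two rolls listed above (2018-2021 → 2022, 2019-2022 → 2023); the printed returns are arbitrary toy values.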
Walk-forward efficiency (WFE)
Example walk-forward results
EUR/USD breakout strategy:
- IS 4-year (2018-2021): 70% WR, +30% per year
- OS 1-year (2022): 55% WR, +12% per year
- WFE = 12/30 = 40%
- Mild curve-fit, but acceptable
- Live expected: ~55% WR, +10% per year
Trend-following strategy:
- IS 4-year: 60% WR, +25% per year
- OS 1-year: 58% WR, +20% per year
- WFE = 20/25 = 80%
- Robust, production-ready
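WFE compares annualized figures, because the IS and OS windows have different lengths. The sketch below reproduces the two examples above. The function names are hypothetical, and the 75-100% band, which the article does not name, is lumped in with "robust" here.

```python
def walk_forward_efficiency(os_annual_return: float, is_annual_return: float) -> float:
    """WFE = annualized OS performance / annualized IS performance."""
    if is_annual_return <= 0:
        raise ValueError("WFE is only meaningful when IS performance is positive")
    return os_annual_return / is_annual_return


def classify_wfe(wfe: float) -> str:
    # Bands from this article; 75-100% is not named there and is treated as robust here (assumption).
    if wfe >= 1.0:
        return "suspect (OS matches or beats IS)"
    if wfe >= 0.5:
        return "robust / production-ready"
    if wfe >= 0.25:
        return "mild curve-fit"
    return "serious curve-fit"


for name, is_ret, os_ret in [("EUR/USD breakout", 30.0, 12.0), ("Trend-following", 25.0, 20.0)]:
    wfe = walk_forward_efficiency(os_ret, is_ret)
    print(f"{name}: WFE = {wfe:.0%} -> {classify_wfe(wfe)}")
# EUR/USD breakout: WFE = 40% -> mild curve-fit
# Trend-following: WFE = 80% -> robust / production-ready
```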
Rolling vs Anchored
Tools
- MetaTrader Strategy Tester: built-in walk-forward option
- TradeStation EasyLanguage: native walk-forward feature, an industry standard since the 1990s
- NinjaTrader: advanced built-in walk-forward
- Python backtrader: free, programmatic, full control
- R quantstrat: academic standard, free
- Custom Excel: simple strategies via VBA
Best practices
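- OS window at least 20% of IS (a 4-year IS needs an OS of about 10 months or more; 1 year is the usual choice)
- At least 5 rolling iterations for statistical significance
- Parameter stability test: parameters changing by more than 50% per iteration = strategy too sensitive
- Combinatorially Symmetric Cross-Validation (CSCV): an advanced, academic anti-curve-fitting method
- Monte Carlo overlay: adds confidence intervals to the OS results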
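The first three practices are easy to automate. A sketch under stated assumptions: "change per iteration" is read here as the relative change of each parameter between consecutive iterations, the parameter names (`fast_ma`, `slow_ma`) and their values are hypothetical, and the functions are illustrative rather than part of any tool listed above.

```python
from typing import Dict, List


def os_window_ok(is_years: float, os_years: float, min_ratio: float = 0.20) -> bool:
    """OS window must be at least 20% of the IS window."""
    return os_years >= min_ratio * is_years


def enough_iterations(n_os_periods: int, minimum: int = 5) -> bool:
    """At least 5 OS periods for statistical significance."""
    return n_os_periods >= minimum


def unstable_parameters(history: List[Dict[str, float]], max_change: float = 0.5) -> List[str]:
    """Parameters whose relative change between consecutive iterations exceeds max_change."""
    flagged = set()
    for prev, curr in zip(history, history[1:]):
        for name, old in prev.items():
            new = curr[name]
            # Relative change is undefined for a zero starting value, so such steps are skipped.
            if old != 0 and abs(new - old) / abs(old) > max_change:
                flagged.add(name)
    return sorted(flagged)


# Hypothetical optimized parameters from five walk-forward iterations.
history = [
    {"fast_ma": 14, "slow_ma": 50},
    {"fast_ma": 15, "slow_ma": 48},
    {"fast_ma": 14, "slow_ma": 52},
    {"fast_ma": 30, "slow_ma": 50},  # fast_ma jumps by ~114%
    {"fast_ma": 16, "slow_ma": 49},
]
print(os_window_ok(4, 1), enough_iterations(len(history)), unstable_parameters(history))
# True True ['fast_ma']
```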
"Walk-forward is NOT optional for algo trading. A standard backtest is optimistic: a 30-50% gap to live results is typical. Walk-forward = realistic expectation. WFE > 50% = production-ready. WFE < 25% = throw it away."
Red flags
Production-ready criteria
- WFE > 50%
- Parameters stable across iterations
- Positive results across multiple OS periods
- Low OS drawdown (DD)
- Reasonable OS Sharpe ratio (1.0-2.5)
- Logical strategy rationale (NOT black-box magic)
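The criteria can be collected into one checklist. A hedged sketch: the 20% drawdown limit is an assumed default (the article only says "low"), "positive across multiple OS periods" is read strictly as every OS period positive, and the rationale check stays a manual yes/no. WFE 60% and OS Sharpe 1.4 are Krzysiek's numbers; the other report values and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WalkForwardReport:
    wfe: float                     # annualized OS / IS performance
    parameters_stable: bool        # e.g. the result of a parameter stability check
    os_period_returns: List[float] # return (%) of each OS period
    os_max_drawdown: float         # fraction, e.g. 0.12 = 12%
    os_sharpe: float
    has_logical_rationale: bool    # manual judgement, cannot be computed


def production_ready(r: WalkForwardReport, max_drawdown: float = 0.20) -> List[str]:
    """Return the failed criteria; an empty list means every check passed."""
    failures = []
    if r.wfe <= 0.50:
        failures.append("WFE not above 50%")
    if not r.parameters_stable:
        failures.append("parameters unstable across iterations")
    if not r.os_period_returns or any(x <= 0 for x in r.os_period_returns):
        failures.append("not positive in every OS period")
    if r.os_max_drawdown > max_drawdown:  # assumed threshold
        failures.append("OS drawdown too high")
    if not 1.0 <= r.os_sharpe <= 2.5:
        failures.append("OS Sharpe outside 1.0-2.5")
    if not r.has_logical_rationale:
        failures.append("no logical strategy rationale")
    return failures


report = WalkForwardReport(wfe=0.60, parameters_stable=True,
                           os_period_returns=[8.0, 5.5, 11.2, 3.1, 6.4],  # hypothetical
                           os_max_drawdown=0.12, os_sharpe=1.4, has_logical_rationale=True)
print(production_ready(report) or "production-ready")
```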
Krzysiek's case
Conclusions
Walk-forward analysis = an anti-curve-fitting backtest framework, a professional standard since the 1990s.
Problem: an ordinary backtest optimizes historical noise. Backtest 70% WR → live 45% WR is typical.
Solution: optimize in the IS window (4 years), test in the OS window (1 year), roll forward, aggregate.
WFE = OS / IS performance. > 50% = production-ready. < 25% = curve-fit, abandon.
Anna's case: backtest 75% WR, live 50% WR = curve-fit catastrophe. Walk-forward would have caught it.
Rolling (fixed-length IS shifts forward) vs anchored (IS grows). Rolling is standard for active trading.
Tools: MetaTrader, TradeStation, NinjaTrader, Python backtrader, R quantstrat, custom Excel.
Best practices: OS ≥ 20% of IS, at least 5 iterations, parameter stability, Monte Carlo overlay.
Red flags: WFE < 25%, parameters jumping between iterations, OS performance < 30% of IS, negative results in multiple OS periods.
Production criteria: WFE > 50%, parameters stable, positive OS, reasonable Sharpe, logical rationale.
Krzysiek's case: WFE 60%, OS Sharpe 1.4, live Sharpe 1.2. Walk-forward correctly predicted real-world performance.
Related: Monte Carlo (a complement), backtesting in practice (the baseline), the expectancy formula (a baseline metric).
Sources and bibliography
- Robert Pardo, Evaluation and Optimization of Trading Strategies: the walk-forward bible (www.amazon.com)
- CFA Institute, Backtesting best practices: industry standard (www.cfainstitute.org)
- TradeStation, Walk-forward optimization: platform documentation (www.tradestation.com)
Frequently asked questions
What is the curve-fitting problem in backtesting?
Curve-fitting is the standard backtesting problem. Mechanics: a trader optimizes parameters (MA periods 14/50, RSI thresholds 30/70, etc.) on historical data from 2018-2023 and finds the "best" combination: 80% WR, +50% per year. Looks amazing! Reality: the parameters fit NOISE in the historical data, NOT a real pattern. Live in 2024: 45% WR, -10% per year. The performance gap is massive. Why it happens: (1) Over-optimization: testing 100+ parameter combinations finds a spurious "best". (2) Lookahead bias: accidentally using future data. (3) Survivorship bias: backtesting only currently existing pairs. (4) Selection bias: picking the strategies that worked and ignoring those that failed. Symptoms of a curve-fit strategy: (a) The backtest equity curve is TOO smooth (95% of trades win). (b) Historical Sharpe > 4 (rare in reality). (c) Parameters are very specific (MA 14.7 vs 14.0 makes a big difference). (d) The strategy breaks under small parameter shifts (high sensitivity). Anna's case: 5-year backtest = 75% WR, +40% per year; 6 months live = 50% WR, -8%. A curve-fit catastrophe. Walk-forward analysis is how you avoid it.
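Over-optimization, cause (1) above, is easy to reproduce on pure noise. This simulation is an illustration under simplifying assumptions: the "market" is a fair coin flip, each "parameter combination" is a rule with no edge at all, and the sizes (100 IS periods, 100 OS periods, 200 combinations) are arbitrary. The best-of-200 IS win rate typically lands well above 50%, while the same rule has no reason to keep that win rate out of sample.

```python
import random
import statistics


def win_rate(signals, outcomes):
    """Fraction of trades won: a trade wins when the signal matches the coin-flip outcome."""
    return sum(s == o for s, o in zip(signals, outcomes)) / len(outcomes)


rng = random.Random(42)
n_is, n_os, n_strategies = 100, 100, 200  # arbitrary sizes

# The market is pure noise: each period's direction is a fair coin flip.
market_is = [rng.choice((1, -1)) for _ in range(n_is)]
market_os = [rng.choice((1, -1)) for _ in range(n_os)]

# Each "parameter combination" produces signals with no real edge.
strategies = [([rng.choice((1, -1)) for _ in range(n_is)],
               [rng.choice((1, -1)) for _ in range(n_os)])
              for _ in range(n_strategies)]

is_scores = [win_rate(s_is, market_is) for s_is, _ in strategies]
best = max(range(n_strategies), key=lambda i: is_scores[i])  # "optimize" on IS
best_os = win_rate(strategies[best][1], market_os)            # same rule, OS

print(f"Median IS win rate of all rules: {statistics.median(is_scores):.0%}")
print(f"Best rule, IS win rate:          {is_scores[best]:.0%}")
print(f"Best rule, OS win rate:          {best_os:.0%}")
```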
How does walk-forward work?
Walk-forward is an anti-curve-fitting framework. Steps: (1) Divide history: e.g. 6 years of data, 2018-2023. (2) In-sample (IS) window: 4 years, 2018-2021. Optimize parameters here. (3) Out-of-sample (OS) window: 1 year, 2022. Test the fixed parameters without changing them. (4) Roll forward: shift the window to IS 2019-2022, OS 2023. (5) Repeat: aggregate OS performance across all rolls. The aggregated OS approximates live performance. Walk-forward efficiency (WFE): the ratio OS performance / IS performance, computed on annualized figures because the two windows differ in length. WFE = 100%: OS same as IS (very rare, suspect). WFE 50-75%: robust strategy. WFE 25-50%: mild curve-fit. WFE < 25%: serious curve-fit. Example strategy: IS 4-year: 70% WR, +30% per year. OS 1-year: 55% WR, +12% per year. WFE = 12/30 = 40%. Mild curve-fit. Live expected: ~55% WR, +10% per year. Confidence: WFE 40% is OK, not great. WFE > 50% = production-ready strategy.
Anchored vs rolling walk-forward?
There are two walk-forward variants. Rolling walk-forward: a fixed-length IS window slides forward, typically a 4-year IS. Rolls: 2018-2021 IS → 2022 OS; 2019-2022 IS → 2023 OS. Every iteration uses the same window size. Advantages: parameters adapt to changing market regimes (e.g. low-vol 2017-2019 vs high-vol 2020-2023); the strategy "learns" the recent regime. Disadvantages: more parameter switching (less stable). Anchored walk-forward: the IS window grows from a fixed start. 2018-2021 IS → 2022 OS; 2018-2022 IS → 2023 OS; 2018-2023 IS → 2024 OS. Every iteration starts from the same beginning, and the IS grows. Advantages: more data = more robust parameters; better for stable strategies. Disadvantages: slower to adapt to regime changes. Decision framework: (1) Trend-following strategies: rolling preferred (regime-adaptive). (2) Stable mean-reversion: anchored is OK (more data). (3) Limited data, under 5 years: anchored (squeezes the most out of the data). (4) Long history, 10+ years: rolling (regime sensitivity). Industry practice: rolling walk-forward is the most common, especially among active traders; anchored is used for institutional long-term work.
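The two variants differ only in whether the IS start moves. A sketch with a hypothetical `wf_windows` helper that lists inclusive year ranges; with 2018-2024 data and a 4-year IS it reproduces the rolling and anchored sequences described above.

```python
from typing import List, Tuple

YearRange = Tuple[int, int]  # inclusive (first_year, last_year)


def wf_windows(first_year: int, last_year: int, is_len: int, os_len: int = 1,
               anchored: bool = False) -> List[Tuple[YearRange, YearRange]]:
    """Return (IS range, OS range) pairs for rolling or anchored walk-forward."""
    windows = []
    is_start, is_end = first_year, first_year + is_len - 1
    while is_end + os_len <= last_year:
        windows.append(((is_start, is_end), (is_end + 1, is_end + os_len)))
        is_end += os_len          # both variants extend the IS end
        if not anchored:
            is_start += os_len    # rolling: the start moves too, so IS length stays fixed
    return windows


for anchored in (False, True):
    label = "anchored" if anchored else "rolling"
    for (a, b), (c, d) in wf_windows(2018, 2024, is_len=4, anchored=anchored):
        print(f"{label}: IS {a}-{b} -> OS {c}" + (f"-{d}" if d != c else ""))
```

Rolling keeps the IS length at 4 years (2018-2021, 2019-2022, 2020-2023), while anchored keeps 2018 as the start and lets the IS grow (2018-2021, 2018-2022, 2018-2023).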
Which tools, and which best practices?
Walk-forward tools: MetaTrader Strategy Tester: built-in walk-forward option with an easy interface; it handles EA optimization and OS validation automatically. TradeStation EasyLanguage: walk-forward optimization as a native feature, an industry standard since the 1990s. NinjaTrader: advanced built-in walk-forward. Python backtrader: free and programmatic, with full control but a long initial setup. Custom Excel: possible for simple strategies via VBA macros. R quantstrat: academic standard, free. Best practices: (1) OS window at least 20% of IS: e.g. a 4-year IS → roughly a 1-year OS minimum. A larger OS gives more confidence. (2) At least 5 rolling iterations: 5+ OS periods for statistical significance. (3) Parameter stability test: parameters changing by more than 50% per iteration = strategy too sensitive. (4) Combinatorially Symmetric Cross-Validation (CSCV): an advanced, academic anti-curve-fitting method. (5) Monte Carlo overlay: apply Monte Carlo to the OS results to get confidence intervals. Red flags: WFE < 25%, parameters jumping wildly between iterations, OS performance < 30% of IS. Production-ready criteria: WFE > 50%, stable parameters, positive across multiple iterations, low OS drawdown, reasonable OS Sharpe (1.0-2.5). Krzysiek's case: his backtested strategy had a WFE of 60% and an OS Sharpe of 1.4; live performance came in at a Sharpe of 1.2. Walk-forward correctly predicted real-world performance.
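One common way to do the Monte Carlo overlay (practice 5) is to bootstrap the OS trades: resample them with replacement many times and read percentiles off the simulated totals. The article does not say which Monte Carlo method it means, so treat this as one possible variant; `monte_carlo_ci` and the trade returns are hypothetical, and `statistics.quantiles` needs Python 3.8+.

```python
import random
import statistics
from typing import List, Tuple


def monte_carlo_ci(os_trade_returns: List[float], n_sims: int = 5000,
                   seed: int = 0) -> Tuple[float, float, float]:
    """95% bootstrap interval (2.5th pct, median, 97.5th pct) for the summed OS return."""
    rng = random.Random(seed)
    n = len(os_trade_returns)
    # Each simulation resamples the same number of trades, with replacement.
    totals = [sum(rng.choices(os_trade_returns, k=n)) for _ in range(n_sims)]
    cuts = statistics.quantiles(totals, n=40)  # cut points every 2.5%
    return cuts[0], statistics.median(totals), cuts[-1]


# Hypothetical OS trade returns (% per trade) pooled from all walk-forward OS periods.
os_trades = [1.2, -0.8, 0.5, 2.1, -1.5, 0.9, -0.4, 1.7, -1.1, 0.6, 0.3, -0.9]
lo, med, hi = monte_carlo_ci(os_trades)
print(f"OS total return, 95% interval: {lo:+.1f}% .. {hi:+.1f}% (median {med:+.1f}%)")
```

A wide interval that dips below zero is a warning even when the point estimate of the OS return looks positive.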