
Ensemble Forecasting

Released
algorithm

Combine multiple forecasting models into ensemble predictions for improved accuracy. Use this skill when the user needs to improve forecast reliability, combine ARIMA/Prophet/ETS outputs, or build a robust forecasting pipeline — even if they say 'combine forecasts', 'model averaging', or 'which forecast should I trust'.

Algorithm skill: analysis and application of Ensemble Forecasting.

View on GitHub

Overview

Ensemble forecasting combines predictions from multiple models to reduce variance and improve accuracy. A simple average of 3-5 diverse models often outperforms the best individual model. Common methods: equal-weight averaging, inverse-error weighting, and stacking with a meta-learner. The "forecast combination puzzle" shows that simple averaging is hard to beat.

When to Use

Trigger conditions:

  • Multiple forecasting models are available and perform similarly
  • Reducing forecast risk is more important than maximum accuracy
  • Building a production pipeline that's robust to model failure

When NOT to use:

  • When one model clearly dominates all others (just use that model)
  • When computational budget only allows one model

Algorithm

IRON LAW: Simple Average Often Beats Complex Combination
The "forecast combination puzzle" (Stock & Watson, 2004): equal-weight
averaging of diverse models frequently outperforms sophisticated
weighting schemes. This is because weight estimation introduces noise
that offsets the theoretical gain. Start with simple average and only
move to weighted combination if you have abundant validation data.

Phase 1: Input Validation

Generate forecasts from 3+ diverse models (e.g., ARIMA, ETS, Prophet, ML-based). Ensure models are truly diverse (different assumptions/approaches). Gate: 3+ model forecasts available, models use different methodologies.
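The Phase 1 gate can be enforced mechanically. A minimal sketch (the dict-of-forecasts shape and the equal-horizon check are assumptions, not part of the original spec):

```python
def validate_ensemble_inputs(model_forecasts, min_models=3):
    """Gate: require forecasts from at least `min_models` distinct models.

    model_forecasts: dict mapping method name (e.g. "arima", "ets",
    "prophet") to a list of point forecasts over the same horizon.
    """
    if len(model_forecasts) < min_models:
        raise ValueError(
            f"need {min_models}+ model forecasts, got {len(model_forecasts)}"
        )
    horizons = {len(f) for f in model_forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all models must forecast the same horizon")
    return True
```

Note this only checks counts and shapes; methodological diversity (different model assumptions) still needs human judgment.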

Phase 2: Core Algorithm

Simple average: ŷ_ensemble = (1/M) × Σ ŷ_m
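The equal-weight average above, as a short sketch (NumPy assumed; forecasts are stacked as M models × H horizon steps):

```python
import numpy as np

def simple_average(forecasts):
    """ŷ_ensemble = (1/M) · Σ ŷ_m over M model forecast vectors."""
    return np.asarray(forecasts, dtype=float).mean(axis=0)

# Three hypothetical models over a 2-step horizon
print(simple_average([[1180, 1150], [1220, 1250], [1200, 1200]]))  # → [1200. 1200.]
```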

Inverse-error weighting: w_m = (1/MSE_m) / Σ(1/MSE_j), ŷ_ensemble = Σ w_m × ŷ_m
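Inverse-error weighting can be sketched as follows (the validation-set MSEs are assumed inputs, computed beforehand per model):

```python
import numpy as np

def inverse_error_ensemble(forecasts, mses):
    """w_m = (1/MSE_m) / Σ_j (1/MSE_j); ŷ_ensemble = Σ w_m · ŷ_m."""
    inv = 1.0 / np.asarray(mses, dtype=float)
    weights = inv / inv.sum()
    return weights, weights @ np.asarray(forecasts, dtype=float)

# A model with half the MSE receives double the weight
w, yhat = inverse_error_ensemble([1180, 1220, 1200], mses=[800, 1600, 1600])
# weights: 0.5, 0.25, 0.25
print(yhat)  # 1180·0.5 + 1220·0.25 + 1200·0.25 = 1195.0
```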

Stacking: Train a meta-model (linear regression) that learns optimal weights from cross-validated individual model predictions.
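A minimal stacking sketch: fit the linear meta-model by least squares on out-of-fold base-model predictions. The synthetic data below stands in for real cross-validated predictions; noise levels and sample size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(1000.0, 50.0, size=500)  # held-out actuals

# Out-of-fold predictions from three base models with different error levels
X = np.column_stack([y + rng.normal(0.0, s, 500) for s in (20.0, 35.0, 50.0)])

# Meta-learner: linear regression with intercept, solved by least squares
A = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, weights = coef[0], coef[1:]
print(weights)  # the least-noisy model should receive the largest weight
```

In practice the meta-model must be trained only on predictions the base models made for data they did not see, otherwise the learned weights overfit to in-sample accuracy.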

Phase 3: Verification

Compare ensemble vs individual models on held-out data. Ensemble should: have lower average error AND lower maximum error (more robust). Gate: Ensemble RMSE ≤ best individual model RMSE.
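The Phase 3 gate as a sketch, assuming held-out actuals and per-model predictions are available:

```python
import numpy as np

def rmse(y_true, y_pred):
    diff = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def ensemble_passes_gate(y_true, ensemble_pred, model_preds):
    """Gate: ensemble RMSE ≤ best individual model RMSE on held-out data."""
    best_individual = min(rmse(y_true, p) for p in model_preds.values())
    return rmse(y_true, ensemble_pred) <= best_individual

actual = [100, 110, 120]
preds = {"arima": [98, 108, 118], "ets": [103, 113, 124]}
avg = [(a + b) / 2 for a, b in zip(preds["arima"], preds["ets"])]
print(ensemble_passes_gate(actual, avg, preds))  # → True
```

Here the two models err in opposite directions, so their average nearly cancels the errors, a toy illustration of why uncorrelated errors help.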

Phase 4: Output

Return ensemble forecast with component model contributions.

Output Format

{
  "ensemble_forecast": [{"period": "2025-04", "forecast": 1200, "lower_95": 1050, "upper_95": 1350}],
  "model_forecasts": {"arima": 1180, "prophet": 1220, "ets": 1200},
  "weights": {"arima": 0.35, "prophet": 0.30, "ets": 0.35},
  "metadata": {"method": "inverse_error_weighted", "ensemble_rmse": 42, "best_individual_rmse": 48}
}

Examples

Sample I/O

Input: ARIMA forecast = 1180, Prophet = 1220, ETS = 1200 for next month's sales.
Expected: Simple average = 1200. If ARIMA is historically best (lowest MSE), the weighted average shifts toward 1180.
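The sample above, worked through in plain Python (the MSEs used for weighting are hypothetical, chosen so ARIMA has the lowest error):

```python
forecasts = {"arima": 1180, "prophet": 1220, "ets": 1200}

# Simple average
simple = sum(forecasts.values()) / len(forecasts)
print(simple)  # → 1200.0

# Inverse-error weighting: ARIMA's lower MSE earns it a larger weight
mses = {"arima": 1000, "prophet": 2000, "ets": 2000}
inv = {m: 1.0 / e for m, e in mses.items()}
total = sum(inv.values())
weighted = sum((inv[m] / total) * forecasts[m] for m in forecasts)
print(weighted)  # → 1195.0, pulled below 1200 toward ARIMA's 1180
```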

Edge Cases

  • All models agree → ensemble equals the individual forecasts (consensus, high confidence)
  • Models wildly disagree → ensemble lands on a compromise with a wide CI (high uncertainty; flag for review)
  • One model is an outlier → the average dampens the outlier (ensemble robustness benefit)

Gotchas

  • Diversity is key: Combining 5 ARIMA variants adds little. Combine fundamentally different approaches (statistical + ML + judgmental).
  • Weight instability: Optimal weights estimated on past data may not be optimal in the future. Simple average avoids this instability.
  • Correlation between errors: If model errors are correlated (they often are), ensemble improvement is limited. Seek models with uncorrelated errors.
  • Confidence intervals: Combining point forecasts is easy. Combining prediction intervals properly requires knowledge of error correlation structure.
  • Over-engineering risk: For stable, well-understood series, a single well-tuned model may outperform an ensemble. Ensembles shine for uncertain or volatile series.

References

  • For forecast combination methods survey, see references/combination-survey.md
  • For stacking meta-learner implementation, see references/stacking.md

Tags

forecasting · ensemble · model-combination · accuracy