arXiv:2601.10863v3 Announce Type: replace
Abstract: Traditional time series forecasting methods optimize for accuracy alone. This objective neglects temporal consistency: how consistently a model predicts the same future event as the forecast origin changes. We introduce the forecast accuracy and coherence score (forecast AC score for short) for measuring the quality of probabilistic multi-horizon forecasts in a way that accounts for both multi-horizon accuracy and stability. Our score additionally allows user-specified weights to balance accuracy and consistency requirements. As an example application, we implement the score as a differentiable objective function for training seasonal auto-regressive integrated models and evaluate it on the M4 Hourly benchmark dataset. Results demonstrate consistent improvements over traditional maximum likelihood estimation. Regarding stability, the AC-optimized model generated out-of-sample forecasts with 15.8% lower variance among forecasts targeting the same timestamp. In terms of accuracy, the AC-optimized model achieved considerable improvements for medium-to-long-horizon forecasts. While one-step-ahead forecasts exhibited a 3.9% increase in MSE, forecasts from horizon three onward showed improved accuracy, with a peak improvement of approximately 6% in MSE at horizons 9-12. These results indicate that our metric successfully trains models to produce more stable and accurate multi-step forecasts in exchange for a relatively small degradation in one-step-ahead performance.
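The accuracy/consistency trade-off the abstract describes can be sketched for a simplified point-forecast setting. This is an illustrative approximation only, not the paper's probabilistic score: the function name `ac_loss`, the weighting parameter `lam`, and the use of plain MSE and cross-origin variance are all assumptions for the sake of the example.

```python
import numpy as np

def ac_loss(forecasts, actuals, lam=0.5):
    """Hypothetical sketch of an accuracy-and-consistency objective.

    forecasts[i, h] is the (h+1)-step-ahead forecast issued at origin i,
    so it targets actuals[i + h + 1]. `lam` weights accuracy against
    consistency (a user-specified balance, as in the abstract).
    """
    n_origins, n_horizons = forecasts.shape
    sq_errors = []
    per_target = {}  # target timestamp -> all forecasts aimed at it
    for i in range(n_origins):
        for h in range(n_horizons):
            t = i + h + 1
            sq_errors.append((forecasts[i, h] - actuals[t]) ** 2)
            per_target.setdefault(t, []).append(forecasts[i, h])
    # Accuracy term: mean squared error over all origin/horizon pairs.
    accuracy = np.mean(sq_errors)
    # Consistency term: variance among forecasts that target the same
    # timestamp but were issued from different origins.
    variances = [np.var(v) for v in per_target.values() if len(v) > 1]
    consistency = np.mean(variances) if variances else 0.0
    return lam * accuracy + (1 - lam) * consistency
```

A perfectly accurate model scores zero on both terms; a model whose overlapping forecasts for the same timestamp disagree is penalized even if each forecast is individually reasonable, which is the stability property the score rewards.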
