Evaluating Forecast Accuracy
Measure and interpret forecast quality to choose the best models and identify improvements.
Why Evaluation Matters
Bad evaluation → Wrong model chosen → Terrible production performance
Good evaluation → Confident model selection → Reliable forecasts
Evaluate forecasts against held-out data before deployment so you know how they will perform in production.
Core Evaluation Approach
1. Split Data: Train vs. Test
Historical data (365 days)
│
├─ Training set (300 days) - Use to fit model
│
└─ Test set (65 days) - Use to evaluate
Why test on different data?
- Training metrics are optimistic (model has seen this data)
- Test metrics show real-world performance (new data)
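A minimal split sketch, assuming the raw series lives in a table named sales_data with date and sales columns (both names, and the cutoff date, are placeholders):

-- Split by date so the test set covers the most recent 65 days
-- sales_data and the cutoff '2024-10-27' are illustrative placeholders
CREATE TABLE training_data AS
SELECT date, sales FROM sales_data WHERE date <= DATE '2024-10-27';

CREATE TABLE test_data AS
SELECT date, sales FROM sales_data WHERE date > DATE '2024-10-27';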
2. Generate Forecast on Test Set
-- Fit the model on training data and forecast the 65-day test horizon
CREATE TABLE forecast AS
SELECT * FROM TS_FORECAST(
    'training_data',          -- source table (training split only)
    'date',                   -- date column
    'sales',                  -- value column
    'AutoETS',                -- model
    65,                       -- forecast horizon = length of the test set
    {'seasonal_period': 7}    -- model parameters (weekly seasonality)
);
-- Compare against actual test data
SELECT
t.date,
t.sales as actual,
f.point_forecast as predicted,
ABS(t.sales - f.point_forecast) as absolute_error
FROM test_data t
LEFT JOIN forecast f ON t.date = f.date_col
ORDER BY t.date;
3. Calculate Accuracy Metrics
-- Aggregate errors into metrics
SELECT
TS_MAE(...) as MAE,
TS_RMSE(...) as RMSE,
TS_MAPE(...) as MAPE
FROM ...;
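Filled in, the aggregation might look like the sketch below. It assumes the metric functions take (actual_list, predicted_list), as described under Evaluation Metrics, and that a DuckDB-style list() aggregate is available to build those lists; adapt to your engine's syntax.

-- Sketch: collect actuals and predictions into lists, then aggregate
-- Assumes TS_MAE/TS_RMSE/TS_MAPE accept (actual_list, predicted_list)
-- and that list() is available as an aggregate (DuckDB-style)
SELECT
    TS_MAE(list(t.sales), list(f.point_forecast))  as MAE,
    TS_RMSE(list(t.sales), list(f.point_forecast)) as RMSE,
    TS_MAPE(list(t.sales), list(f.point_forecast)) as MAPE
FROM test_data t
JOIN forecast f ON t.date = f.date_col;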
4. Compare Models
-- Run evaluation for multiple models
-- Choose the one with best (lowest) metrics
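As a concrete sketch, the evaluation from step 3 can be repeated per candidate and the results stacked for comparison. The forecast_autoets and forecast_autoarima tables and the 'AutoARIMA' model name below are assumptions; substitute whichever models you actually ran in step 2.

-- Sketch: compare two candidate models on the same test set
-- forecast_autoets / forecast_autoarima are hypothetical outputs of step 2
SELECT 'AutoETS' as model,
       TS_MAE(list(t.sales), list(f.point_forecast)) as MAE
FROM test_data t
JOIN forecast_autoets f ON t.date = f.date_col
UNION ALL
SELECT 'AutoARIMA' as model,
       TS_MAE(list(t.sales), list(f.point_forecast)) as MAE
FROM test_data t
JOIN forecast_autoarima f ON t.date = f.date_col
ORDER BY MAE;  -- lowest error wins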
Evaluation Metrics
AnoFox Forecast provides 12 built-in metrics:
Level 1: Basic Metrics (Start Here)
MAE (Mean Absolute Error)
Average of absolute errors
TS_MAE(actual_list, predicted_list)
Formula: Σ|actual - predicted| / n
Example:
Actuals: [100, 110, 95, 105]
Predictions: [102, 108, 98, 104]
Absolute errors: [2, 2, 3, 1]
MAE = (2 + 2 + 3 + 1) / 4 = 2.0
Interpretation:
- MAE = 2.0 means average forecast error is 2 units
- Lower is better
- Same units as original data (interpretable)
Use when: Easy interpretation needed, outliers shouldn't dominate
Pros: ✅ Simple, interpretable
Cons: ❌ Doesn't penalize large errors heavily
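The worked example can be reproduced directly in SQL. The sketch below hard-codes the four example rows; the TS_MAE call assumes a DuckDB-style list() aggregate to build the input lists.

-- Sketch: verify the worked MAE example (expected value: 2.0)
WITH eval(actual, predicted) AS (
    VALUES (100, 102), (110, 108), (95, 98), (105, 104)
)
SELECT
    avg(abs(actual - predicted))          as mae_by_hand,  -- 2.0
    TS_MAE(list(actual), list(predicted)) as mae_builtin   -- should agree
FROM eval;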