Evaluation Metrics Reference
The built-in accuracy metrics for evaluating forecasts.
Quick Reference
| Metric | Formula | Better When | Use Case |
|---|---|---|---|
| MAE | Σ\|e\|/n | Lower | General purpose |
| RMSE | √(Σe²/n) | Lower | Penalize large errors |
| MAPE | (100/n) Σ\|e/a\| | Lower | Percentage error |
| SMAPE | (100/n) Σ 2\|e\|/(\|a\|+\|p\|) | Lower | When zeros present |
| MASE | MAE / MAE_naive | Lower | Relative to baseline |
| ME | Σe/n | Close to zero | Detect bias |
| Bias | Weighted ME | Close to zero | Over/underforecasting |
| R² | 1 - Σe²/Σ(a-ā)² | Higher | Overall goodness of fit |
| Coverage | % of actuals within interval | Close to target (e.g. 95%) | Interval validity |
| Directional | % of directions correct | Higher | Direction accuracy |
Error Metrics
MAE (Mean Absolute Error)
Average absolute difference
SELECT TS_MAE(LIST(actual), LIST(predicted)) as MAE
FROM comparison;
Formula:
MAE = Σ|actual - predicted| / n
Range: 0 to ∞
Interpretation: Average forecast error, in the original units of the data
Example:
Actual: [100, 110, 95, 105]
Predicted: [102, 108, 98, 104]
Absolute errors: [2, 2, 3, 1]
MAE = (2 + 2 + 3 + 1) / 4 = 2.0
Pros:
- Simple and interpretable
- Same units as the data
- Robust to outliers compared with RMSE
Cons:
- Doesn't penalize large errors heavily
- Scale-dependent, so it can't be compared across series with different units
When to use:
- Easy to explain to stakeholders
- When all errors equally important
- Retail/inventory forecasting
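If you want to sanity-check the result, MAE can also be computed with plain SQL aggregates. This is a sketch assuming the same comparison table with actual and predicted columns:
-- Manual MAE cross-check (should match TS_MAE)
SELECT AVG(ABS(actual - predicted)) AS mae_manual
FROM comparison;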
RMSE (Root Mean Squared Error)
Square root of mean squared error
SELECT TS_RMSE(LIST(actual), LIST(predicted)) as RMSE
FROM comparison;
Formula:
RMSE = √(Σ(actual - predicted)² / n)
Range: 0 to ∞
Interpretation: Typical error magnitude; large errors are penalized more heavily than in MAE
Example:
Errors: [2, 2, 3, 1]
Squared: [4, 4, 9, 1]
RMSE = √(18 / 4) = √4.5 = 2.12
Pros:
- Penalizes large errors (squared term)
- Standard in academia
- Good for optimization
Cons:
- Heavily influenced by outliers
- Less intuitive than MAE
- Scale-dependent, so it can't be compared across series with different units
When to use:
- When large errors costly
- Model optimization
- Financial forecasting
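A similar plain-SQL cross-check, under the same assumptions about the comparison table:
-- Manual RMSE cross-check (should match TS_RMSE)
SELECT SQRT(AVG((actual - predicted) * (actual - predicted))) AS rmse_manual
FROM comparison;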
MAPE (Mean Absolute Percentage Error)
Average percentage error
SELECT TS_MAPE(LIST(actual), LIST(predicted)) as MAPE_pct
FROM comparison;
Formula:
MAPE = (100 / n) × Σ ( |actual - predicted| / |actual| )
Range: 0 to ∞ (usually 0-100%)
Interpretation: Average error as a percentage of the actual value
Example:
Actuals: [100, 200, 50, 150]
Predicted: [105, 190, 55, 145]
% Errors: [5%, 5%, 10%, 3.3%]
MAPE = (5 + 5 + 10 + 3.3) / 4 = 5.8%
Pros:
- Scale-independent
- Percentage-based (easy to report)
- Standard for comparison
Cons:
- Undefined when actual = 0
- Asymmetric: over-forecasts can produce unbounded percentage errors, while under-forecasts are capped at 100%
- As a result, favors models that systematically under-forecast
When to use:
- Business reporting
- Comparing across products/scales
- Default choice for managers
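To handle the zero-actual problem explicitly, here is a manual sketch that simply excludes zero actuals (one possible convention, not necessarily what TS_MAPE does):
-- Manual MAPE, skipping rows where actual = 0
SELECT AVG(ABS(actual - predicted) / ABS(actual)) * 100 AS mape_pct_manual
FROM comparison
WHERE actual <> 0;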
SMAPE (Symmetric MAPE)
Symmetric percentage error
SELECT TS_SMAPE(LIST(actual), LIST(predicted)) as SMAPE_pct
FROM comparison;
Formula:
SMAPE = (100 / n) × Σ ( 2 × |actual - predicted| / (|actual| + |predicted|) )
Range: 0 to 200%
Interpretation: Symmetric percentage error; handles zero actuals better than MAPE
Pros:
- Symmetric (over/under equal)
- Defined when the actual is zero (as long as the forecast is not also zero)
- Bounded by 200%
Cons:
- Less widely known than MAPE
- Still undefined when both actual and predicted are zero
- Bounded at 200%, which can understate very large errors
When to use:
- Data contains many zeros
- Want symmetric penalties
- When MAPE biases present
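A manual sketch of the 0-200% definition above; excluding rows where both values are zero is an assumption here, not documented TS_SMAPE behavior:
-- Manual SMAPE (0-200% definition)
SELECT AVG(2 * ABS(actual - predicted) / (ABS(actual) + ABS(predicted))) * 100 AS smape_pct_manual
FROM comparison
WHERE ABS(actual) + ABS(predicted) > 0;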
MASE (Mean Absolute Scaled Error)
Error scaled relative to naive baseline
SELECT TS_MASE(LIST(actual), LIST(predicted), seasonal_period) as MASE
FROM comparison;
Formula:
MASE = MAE / MAE(naive_forecast)
Where MAE(naive) is the MAE of a SeasonalNaive baseline forecast
Range: 0 to ∞
Interpretation:
- MASE < 1: Better than naive
- MASE = 1: Same as naive
- MASE > 1: Worse than naive
Example:
MAE(my model) = 8.0
MAE(naive) = 10.0
MASE = 8.0 / 10.0 = 0.80
(My model is 20% better than naive)
Pros:
- Comparison to baseline
- Scale-independent
- Easy interpretation
Cons:
- Depends on naive benchmark
- Undefined if naive is perfect
When to use:
- Comparing to baseline
- Cross-product comparison
- Academic research
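To make the scaling explicit, here is a sketch that builds the seasonal naive baseline by hand. It assumes a date column for ordering, a seasonal period of 7, and uses the evaluation window itself as the baseline (many implementations scale by the training-set naive MAE instead):
-- Manual MASE sketch: MAE scaled by the MAE of a seasonal naive forecast (lag 7)
WITH with_naive AS (
  SELECT
    actual,
    predicted,
    LAG(actual, 7) OVER (ORDER BY date) AS naive_forecast
  FROM comparison
)
SELECT
  AVG(ABS(actual - predicted)) / AVG(ABS(actual - naive_forecast)) AS mase_manual
FROM with_naive
WHERE naive_forecast IS NOT NULL;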
ME (Mean Error)
Average signed error (with direction)
SELECT TS_ME(LIST(actual), LIST(predicted)) as ME
FROM comparison;
Formula:
ME = Σ(actual - predicted) / n
Range: -∞ to +∞
Interpretation:
- Positive: Forecast too low (pessimistic)
- Negative: Forecast too high (optimistic)
- Zero: Unbiased
Example:
Signed errors: [2, 2, 3, 1]
ME = (2 + 2 + 3 + 1) / 4 = 2.0
(Positive ME: the model is consistently underforecasting)
Pros:
- Simple, shows direction
- Detects systematic bias
Cons:
- Errors cancel (may hide large opposite errors)
- Not for magnitude comparison
When to use:
- Detect bias (over/under)
- Inventory planning (avoid stockouts)
- Pricing decisions
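The equivalent plain-SQL bias check, assuming the same comparison table:
-- Mean signed error: positive means actuals tend to exceed forecasts
SELECT AVG(actual - predicted) AS me_manual
FROM comparison;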
Bias
Weighted measure of systematic over/under forecasting
SELECT TS_BIAS(LIST(actual), LIST(predicted)) as Bias
FROM comparison;
Similar to ME, but a weighted version.
Interpretation:
- Positive: Overforecasting (optimistic)
- Negative: Underforecasting (pessimistic)
- Zero: Unbiased
Use when:
- Systematic bias needs to be weighted
- Different importance by period
Goodness of Fit
R² (Coefficient of Determination)
Proportion of variance explained
SELECT TS_R_SQUARED(LIST(actual), LIST(predicted)) as R_squared
FROM comparison;
Formula:
R² = 1 - (Σ(actual - predicted)²) / (Σ(actual - mean(actual))²)
Range: -∞ to 1
Interpretation:
- R² = 1.0: Perfect prediction
- R² = 0.8: Explains 80% of variance
- R² = 0.5: Explains 50% of variance
- R² = 0: No better than predicting mean
- R² < 0: Worse than mean
Example:
R² = 0.85
(Model explains 85% of variance in actual values)
Pros:
- Overall model quality
- Bounded interpretation
Cons:
- Can be negative
- Doesn't show error magnitude
- Sensitive to outliers
When to use:
- Overall model assessment
- Comparing different models
- Academic research
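The same ratio of sums of squares can be written out by hand as a sketch, assuming the comparison table above:
-- R² = 1 - SS_residual / SS_total
WITH stats AS (
  SELECT AVG(actual) AS mean_actual FROM comparison
)
SELECT
  1 - SUM((c.actual - c.predicted) * (c.actual - c.predicted))
        / SUM((c.actual - s.mean_actual) * (c.actual - s.mean_actual)) AS r2_manual
FROM comparison c
CROSS JOIN stats s;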
Interval Validation
Coverage
Percentage of actuals within prediction intervals
SELECT TS_COVERAGE(LIST(actual), LIST(lower), LIST(upper)) as coverage
FROM comparison;
Formula:
Coverage = % of actuals where: lower ≤ actual ≤ upper
Range: 0 to 1 (0% to 100%)
Interpretation:
- Expected: 95% for 95% confidence
- Too low: Intervals too narrow (risky)
- Too high: Intervals too wide (wasteful)
Example:
Set confidence_level = 0.95
Generate 95% prediction intervals
Coverage = 0.93 (93% of actuals in intervals)
Expected = 0.95 (95% of actuals in intervals)
Result: Slightly narrow but close ✓
Ideal range: 92-98% for 95% CI
Pros:
- Direct interval validation
- Actionable feedback
Cons:
- Only meaningful if the intervals were produced with a sound method
- Requires confidence_level specified
When to use:
- Validate prediction intervals
- Risk management
- Confidence interval checking
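A manual coverage check, assuming the interval columns are named lower_95 and upper_95 as in the complete example below:
-- Fraction of actuals inside the 95% interval (compare against 0.95)
SELECT AVG(CASE WHEN actual BETWEEN lower_95 AND upper_95 THEN 1.0 ELSE 0.0 END) AS coverage_95_manual
FROM comparison;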
Directional Accuracy
Directional
Percentage of correct direction predictions
SELECT TS_DIRECTIONAL_ACCURACY(LIST(actual), LIST(predicted)) as directional_pct
FROM comparison;
Interpretation:
- Does forecast direction match actual?
- 50% = Random guessing
- 100% = Perfect direction
Use when:
- Direction matters more than magnitude
- Trading decisions
- Trend following
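One common convention is to compare the sign of period-over-period changes. This sketch assumes that convention and a date column for ordering, and may differ from how TS_DIRECTIONAL_ACCURACY is defined:
-- Share of periods where forecast and actual move in the same direction
WITH changes AS (
  SELECT
    actual - LAG(actual) OVER (ORDER BY date) AS actual_change,
    predicted - LAG(predicted) OVER (ORDER BY date) AS predicted_change
  FROM comparison
)
SELECT
  AVG(CASE WHEN SIGN(actual_change) = SIGN(predicted_change) THEN 1.0 ELSE 0.0 END) * 100
    AS directional_pct_manual
FROM changes
WHERE actual_change IS NOT NULL AND predicted_change IS NOT NULL;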
Complete Metrics Example
-- Calculate all metrics
WITH comparison AS (
SELECT
actual,
forecast,
lower_95,
upper_95
FROM forecast_results
)
SELECT
ROUND(TS_MAE(LIST(actual), LIST(forecast)), 2) as MAE,
ROUND(TS_RMSE(LIST(actual), LIST(forecast)), 2) as RMSE,
ROUND(TS_MAPE(LIST(actual), LIST(forecast)), 2) as MAPE_pct,
ROUND(TS_SMAPE(LIST(actual), LIST(forecast)), 2) as SMAPE_pct,
ROUND(TS_MASE(LIST(actual), LIST(forecast), 7), 2) as MASE,
ROUND(TS_ME(LIST(actual), LIST(forecast)), 2) as ME,
ROUND(TS_R_SQUARED(LIST(actual), LIST(forecast)), 4) as R_squared,
ROUND(TS_COVERAGE(LIST(actual), LIST(lower_95), LIST(upper_95)), 3) as coverage_95,
ROUND(TS_DIRECTIONAL_ACCURACY(LIST(actual), LIST(forecast)), 2) as directional_pct
FROM comparison;
Metric Selection Guide
Choose based on your goal:
| Goal | Primary | Secondary |
|---|---|---|
| Minimize errors | MAE or RMSE | MAPE |
| Executive report | MAPE | MAE |
| Avoid big mistakes | RMSE | MAE |
| Compare products | MAPE or MASE | SMAPE |
| Confidence needed | Coverage | R² |
| Direction matters | Directional | MAE |
| Inventory planning | ME (bias) | MASE |
Typical Metric Ranges
What's "good"?
- MAPE: < 5% excellent; 5-10% good; 10-20% fair; > 20% poor
- RMSE (as % of the mean of actuals): < 10% excellent; 10-20% good; 20-30% fair; > 30% poor
- R²: > 0.95 excellent; 0.80-0.95 good; 0.60-0.80 fair; < 0.60 poor
- MASE: < 0.8 better than baseline; 0.8-1.0 similar to baseline; > 1.0 worse than baseline
- Coverage (for a 95% CI): 93-97% good; 90-98% acceptable; < 90% or > 99% investigate
Next Steps
- Production Deployment — Monitor metrics over time
- Model Comparison Guide — Use metrics to select models
- Evaluating Accuracy Concept — Deep dive on metrics
Key Takeaways
- ✅ Use MAE for simple, interpretable errors
- ✅ Use RMSE when large errors costly
- ✅ Use MAPE for percentage/scale comparison
- ✅ Always check multiple metrics
- ✅ Use MASE to compare against baseline
- ✅ Use ME/Bias to detect systematic over/underforecasting
- ✅ Validate coverage for prediction intervals
- ✅ Monitor metrics weekly in production