Evaluation Metrics Reference

All built-in accuracy metrics for evaluating forecasts.

Quick Reference

Metric        Formula                         Lower Better         Use Case
MAE           Σ|e| / n                        ✅ Yes               General purpose
RMSE          √(Σe² / n)                      ✅ Yes               Penalize large errors
MAPE          Σ|e/a| × 100 / n                ✅ Yes               Percentage error
SMAPE         2Σ|e| / (|a|+|p|) × 100 / n     ✅ Yes               When zeros present
MASE          MAE / MAE_naive                 ✅ Yes               Relative to baseline
ME            Σe / n                          ≈ Zero               Detect bias
Bias          Weighted ME                     ≈ Zero               Over/underforecasting
R²            1 - Σe² / Σ(y-ȳ)²               No (higher better)   Overall goodness
Coverage      % within interval               ≈ Target (95%)       Interval validity
Directional   % direction correct             No (higher better)   Direction accuracy

Error Metrics

MAE (Mean Absolute Error)

Average absolute difference

SELECT TS_MAE(LIST(actual), LIST(predicted)) as MAE
FROM comparison;

Formula:

MAE = Σ|actual - predicted| / n

Range: 0 to ∞ Interpretation: Average forecast error in original units

Example:

Actual:      [100, 110, 95, 105]
Predicted:   [102, 108, 98, 104]
Abs. errors: [2, 2, 3, 1]

MAE = (2 + 2 + 3 + 1) / 4 = 2.0
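
A minimal plain-SQL cross-check of the example above, using ordinary aggregate functions instead of TS_MAE (the inline VALUES list stands in for a real comparison table):

WITH comparison AS (
    SELECT * FROM (VALUES (100, 102), (110, 108), (95, 98), (105, 104)) AS t(actual, predicted)
)
SELECT AVG(ABS(actual - predicted)) as MAE   -- (2 + 2 + 3 + 1) / 4 = 2.0
FROM comparison;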

Pros:

  • Simple, interpretable
  • Same units as data
  • Robust to outliers

Cons:

  • Doesn't penalize large errors heavily
  • Scale-dependent: can't compare across series measured in different units

When to use:

  • Easy to explain to stakeholders
  • When all errors equally important
  • Retail/inventory forecasting

RMSE (Root Mean Squared Error)

Square root of mean squared error

SELECT TS_RMSE(LIST(actual), LIST(predicted)) as RMSE
FROM comparison;

Formula:

RMSE = √(Σ(actual - predicted)² / n)

Range: 0 to ∞ Interpretation: Typical error, penalizes large errors

Example:

Errors:     [2, 2, 3, 1]
Squared: [4, 4, 9, 1]

RMSE = √(18 / 4) = √4.5 = 2.12
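
A sketch (plain SQL, hypothetical values) of how RMSE reacts when a single error is large: MAE grows modestly while RMSE jumps, because each error is squared before averaging:

WITH comparison AS (
    SELECT * FROM (VALUES (100, 102), (110, 108), (95, 98), (105, 125)) AS t(actual, predicted)
)
SELECT
    AVG(ABS(actual - predicted))            as MAE,   -- (2 + 2 + 3 + 20) / 4 = 6.75
    SQRT(AVG(POWER(actual - predicted, 2))) as RMSE   -- √((4 + 4 + 9 + 400) / 4) ≈ 10.21
FROM comparison;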

Pros:

  • Penalizes large errors (squared term)
  • Standard in academia
  • Good for optimization

Cons:

  • Heavily influenced by outliers
  • Less directly interpretable than MAE
  • Not comparable across series with different scales

When to use:

  • When large errors costly
  • Model optimization
  • Financial forecasting

MAPE (Mean Absolute Percentage Error)

Average percentage error

SELECT TS_MAPE(LIST(actual), LIST(predicted)) as MAPE_pct
FROM comparison;

Formula:

MAPE = Σ|(actual - predicted) / actual| × 100 / n

Range: 0 to ∞ (usually 0-100%) Interpretation: Average error as percentage of actual

Example:

Actual:     [100, 200, 50, 150]
Predicted:  [105, 190, 55, 145]
% Errors:   [5%, 5%, 10%, 3.3%]

MAPE = (5 + 5 + 10 + 3.3) / 4 = 5.8%
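
A plain-SQL sketch of the same calculation; the NULLIF guard drops any rows where actual = 0, which is exactly where MAPE breaks down (values are from the example above):

WITH comparison AS (
    SELECT * FROM (VALUES (100, 105), (200, 190), (50, 55), (150, 145)) AS t(actual, predicted)
)
SELECT AVG(ABS(actual - predicted) * 100.0 / NULLIF(actual, 0)) as MAPE_pct   -- ≈ 5.8
FROM comparison;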

Pros:

  • Scale-independent
  • Percentage-based (easy to report)
  • Standard for comparison

Cons:

  • Undefined when actual = 0; unstable when actuals are near zero
  • Asymmetric: penalizes overforecasts more heavily than underforecasts

When to use:

  • Business reporting
  • Comparing across products/scales
  • Default choice for managers

SMAPE (Symmetric MAPE)

Symmetric percentage error

SELECT TS_SMAPE(LIST(actual), LIST(predicted)) as SMAPE_pct
FROM comparison;

Formula:

SMAPE = Σ[ 2 × |actual - predicted| / (|actual| + |predicted|) ] × 100 / n

Range: 0 to 200% Interpretation: Symmetric percentage, handles zeros better
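
Example (a plain-SQL sketch with hypothetical values, including a zero actual where MAPE would be undefined):

WITH comparison AS (
    SELECT * FROM (VALUES (0, 5), (100, 110), (50, 45)) AS t(actual, predicted)
)
SELECT AVG(2.0 * ABS(actual - predicted)
           / NULLIF(ABS(actual) + ABS(predicted), 0)) * 100 as SMAPE_pct   -- ≈ 73%
FROM comparison;
-- NULLIF guards the one remaining failure case: actual = predicted = 0.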

Pros:

  • Treats over- and underforecasts more evenly than MAPE
  • Defined when actuals are zero
  • Bounded by 200%

Cons:

  • Less commonly known than MAPE
  • Undefined when actual and predicted are both zero
  • The 200% bound can understate very large errors

When to use:

  • Data contains many zeros
  • Want symmetric penalties
  • When MAPE's asymmetry is a problem

MASE (Mean Absolute Scaled Error)

Error scaled relative to naive baseline

SELECT TS_MASE(LIST(actual), LIST(predicted), seasonal_period) as MASE
FROM comparison;

Formula:

MASE = MAE / MAE(naive_forecast)

Where MAE(naive_forecast) is the MAE of a seasonal naive forecast (each value predicted by the value one seasonal period earlier)

Range: 0 to ∞ Interpretation:

  • MASE < 1: Better than naive
  • MASE = 1: Same as naive
  • MASE > 1: Worse than naive

Example:

MAE(my model) = 8.0
MAE(naive) = 10.0

MASE = 8.0 / 10.0 = 0.80
(My model is 20% better than naive)
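
A rough plain-SQL sketch of the same idea, assuming a hypothetical forecast_results table with ts, actual, and forecast columns and a seasonal period of 7. It scales the model's MAE by the MAE of a seasonal naive forecast computed over the same window, which approximates (but may not exactly match) what TS_MASE returns:

WITH scored AS (
    SELECT
        actual,
        forecast,
        LAG(actual, 7) OVER (ORDER BY ts) as seasonal_naive   -- value one season (7 periods) earlier
    FROM forecast_results
)
SELECT AVG(ABS(actual - forecast)) / AVG(ABS(actual - seasonal_naive)) as MASE
FROM scored
WHERE seasonal_naive IS NOT NULL;   -- the first 7 rows have no naive reference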

Pros:

  • Comparison to baseline
  • Scale-independent
  • Easy interpretation

Cons:

  • Depends on naive benchmark
  • Undefined if naive is perfect

When to use:

  • Comparing to baseline
  • Cross-product comparison
  • Academic research

ME (Mean Error)

Average signed error (with direction)

SELECT TS_ME(LIST(actual), LIST(predicted)) as ME
FROM comparison;

Formula:

ME = Σ(actual - predicted) / n

Range: -∞ to +∞ Interpretation:

  • Positive: Forecast too low (pessimistic)
  • Negative: Forecast too high (optimistic)
  • Zero: Unbiased

Example:

Errors: [2, 2, 3, 1]

ME = (2 + 2 + 3 + 1) / 4 = 2.0
(Consistently underforecasting)
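
A sketch (plain SQL, hypothetical values) computing ME alongside MAE to show the main caveat listed under Cons below: large positive and negative errors cancel, so ME can look unbiased even when errors are big:

WITH comparison AS (
    SELECT * FROM (VALUES (100, 90), (100, 110), (100, 90), (100, 110)) AS t(actual, predicted)
)
SELECT
    AVG(actual - predicted)      as ME,   -- (10 - 10 + 10 - 10) / 4 = 0.0 (looks unbiased)
    AVG(ABS(actual - predicted)) as MAE   -- 10.0 (errors are actually large)
FROM comparison;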

Pros:

  • Simple, shows direction
  • Detects systematic bias

Cons:

  • Errors cancel (may hide large opposite errors)
  • Not for magnitude comparison

When to use:

  • Detect bias (over/under)
  • Inventory planning (avoid stockouts)
  • Pricing decisions

Bias

Weighted measure of systematic over/under forecasting

SELECT TS_BIAS(LIST(actual), LIST(predicted)) as Bias
FROM comparison;

Similar to ME, but weighted

Interpretation (same sign convention as ME):

  • Positive: Underforecasting (forecast too low)
  • Negative: Overforecasting (forecast too high)
  • Zero: Unbiased

Use when:

  • Systematic bias needs weighting
  • Different importance by period

Goodness of Fit

R² (Coefficient of Determination)

Proportion of variance explained

SELECT TS_R_SQUARED(LIST(actual), LIST(predicted)) as R_squared
FROM comparison;

Formula:

R² = 1 - (Σ(actual - predicted)²) / (Σ(actual - mean(actual))²)

Range: -∞ to 1 Interpretation:

  • R² = 1.0: Perfect prediction
  • R² = 0.8: Explains 80% of variance
  • R² = 0.5: Explains 50% of variance
  • R² = 0: No better than predicting mean
  • R² < 0: Worse than mean

Example:

R² = 0.85
(Model explains 85% of variance in actual values)
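
A plain-SQL sketch of the same formula; the window function supplies the mean of the actuals, and comparison stands in for any table of actual/predicted pairs:

WITH stats AS (
    SELECT
        actual,
        predicted,
        AVG(actual) OVER () as mean_actual   -- overall mean of the actuals
    FROM comparison
)
SELECT 1 - SUM(POWER(actual - predicted, 2))
         / SUM(POWER(actual - mean_actual, 2)) as R_squared
FROM stats;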

Pros:

  • Overall model quality
  • Bounded interpretation

Cons:

  • Can be negative
  • Doesn't show error magnitude
  • Sensitive to outliers

When to use:

  • Overall model assessment
  • Comparing different models
  • Academic research

Interval Validation

Coverage

Percentage of actuals within prediction intervals

SELECT TS_COVERAGE(LIST(actual), LIST(lower), LIST(upper)) as coverage
FROM comparison;

Formula:

Coverage = % of actuals where: lower ≤ actual ≤ upper

Range: 0 to 1 (0% to 100%) Interpretation:

  • Expected: 95% for 95% confidence
  • Too low: Intervals too narrow (risky)
  • Too high: Intervals too wide (wasteful)

Example:

Set confidence_level = 0.95
Generate 95% prediction intervals

Coverage = 0.93 (93% of actuals in intervals)
Expected = 0.95 (95% of actuals in intervals)

Result: Slightly narrow but close ✓

Ideal range: 92-98% for 95% CI
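
A plain-SQL equivalent, assuming a forecast_results table with actual, lower_95, and upper_95 columns (as in the complete example below):

SELECT AVG(CASE WHEN actual BETWEEN lower_95 AND upper_95 THEN 1.0 ELSE 0.0 END) as coverage_95
FROM forecast_results;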

Pros:

  • Direct interval validation
  • Actionable feedback

Cons:

  • Only meaningful if the intervals were generated at a stated confidence level
  • Requires confidence_level to be specified when forecasting

When to use:

  • Validate prediction intervals
  • Risk management
  • Confidence interval checking

Directional Accuracy

Directional

Percentage of correct direction predictions

SELECT TS_DIRECTIONAL_ACCURACY(LIST(actual), LIST(predicted)) as directional_pct
FROM comparison;

Interpretation:

  • Does forecast direction match actual?
  • 50% = Random guessing
  • 100% = Perfect direction
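
The exact definition used by TS_DIRECTIONAL_ACCURACY is not spelled out here; one common convention, sketched below against a hypothetical forecast_results table (ts, actual, forecast), compares the sign of the period-over-period change in the actuals with the sign of the change in the forecasts:

WITH moves AS (
    SELECT
        actual   - LAG(actual)   OVER (ORDER BY ts) as actual_change,
        forecast - LAG(forecast) OVER (ORDER BY ts) as forecast_change
    FROM forecast_results
)
SELECT AVG(CASE WHEN SIGN(actual_change) = SIGN(forecast_change) THEN 1.0 ELSE 0.0 END) * 100
           as directional_pct
FROM moves
WHERE actual_change IS NOT NULL AND forecast_change IS NOT NULL;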

Use when:

  • Direction matters more than magnitude
  • Trading decisions
  • Trend following

Complete Metrics Example

-- Calculate all metrics
WITH comparison AS (
    SELECT
        actual,
        forecast,
        lower_95,
        upper_95
    FROM forecast_results
)
SELECT
    ROUND(TS_MAE(LIST(actual), LIST(forecast)), 2) as MAE,
    ROUND(TS_RMSE(LIST(actual), LIST(forecast)), 2) as RMSE,
    ROUND(TS_MAPE(LIST(actual), LIST(forecast)), 2) as MAPE_pct,
    ROUND(TS_SMAPE(LIST(actual), LIST(forecast)), 2) as SMAPE_pct,
    ROUND(TS_MASE(LIST(actual), LIST(forecast), 7), 2) as MASE,
    ROUND(TS_ME(LIST(actual), LIST(forecast)), 2) as ME,
    ROUND(TS_R_SQUARED(LIST(actual), LIST(forecast)), 4) as R_squared,
    ROUND(TS_COVERAGE(LIST(actual), LIST(lower_95), LIST(upper_95)), 3) as coverage_95,
    ROUND(TS_DIRECTIONAL_ACCURACY(LIST(actual), LIST(forecast)), 2) as directional_pct
FROM comparison;

Metric Selection Guide

Choose based on your goal:

Goal                  Primary         Secondary
Minimize errors       MAE or RMSE     MAPE
Executive report      MAPE            MAE
Avoid big mistakes    RMSE            MAE
Compare products      MAPE or MASE    SMAPE
Confidence needed     Coverage        -
Direction matters     Directional     MAE
Inventory planning    ME (bias)       MASE

Typical Metric Ranges

What's "good"?

MAPE:      < 5%             (Excellent)
           5-10%            (Good)
           10-20%           (Fair)
           > 20%            (Poor)

RMSE:      < 10% of mean    (Excellent)
           10-20% of mean   (Good)
           20-30% of mean   (Fair)
           > 30% of mean    (Poor)

R²:        > 0.95           (Excellent)
           0.80-0.95        (Good)
           0.60-0.80        (Fair)
           < 0.60           (Poor)

MASE:      < 0.8            (Better than baseline)
           0.8-1.0          (Similar to baseline)
           > 1.0            (Worse than baseline)

Coverage:  93-97%           (Good for 95% CI)
           90-98%           (Acceptable range)
           < 90% or > 99%   (Investigate)

Key Takeaways

  • ✅ Use MAE for simple, interpretable errors
  • ✅ Use RMSE when large errors costly
  • ✅ Use MAPE for percentage/scale comparison
  • ✅ Always check multiple metrics
  • ✅ Use MASE to compare against baseline
  • ✅ Use ME/Bias to detect systematic over/underforecasting
  • ✅ Validate coverage for prediction intervals
  • ✅ Monitor metrics weekly in production