Evaluation Metrics Reference
The built-in accuracy metrics for evaluating forecasts.
Quick Reference
| Metric | Formula | Better When | Use Case |
|---|---|---|---|
| MAE | Σ\|e\|/n | Lower | General purpose |
| RMSE | √(Σe²/n) | Lower | Penalize large errors |
| MAPE | (100/n) Σ\|e/a\| | Lower | Percentage error |
| SMAPE | (100/n) Σ 2\|e\|/(\|a\|+\|p\|) | Lower | When zeros present |
| MASE | MAE / MAE_naive | Lower | Relative to baseline |
| ME | Σe/n | Close to zero | Detect bias |
| Bias | Weighted ME | Close to zero | Over/underforecasting |
| R² | 1 - Σe²/Σ(a-ā)² | Higher | Overall goodness of fit |
| Coverage | % of actuals within interval | Close to target (e.g. 95%) | Interval validity |
| Directional | % of directions correct | Higher | Direction accuracy |
Error Metrics
MAE (Mean Absolute Error)
Average absolute difference
SELECT TS_MAE(LIST(actual), LIST(predicted)) as MAE
FROM comparison;
Formula:
MAE = Σ|actual - predicted| / n
Range: 0 to ∞
Interpretation: Average forecast error, in the original units of the data
Example:
Actual: [100, 110, 95, 105]
Predicted: [102, 108, 98, 104]
Absolute errors: [2, 2, 3, 1]
MAE = (2 + 2 + 3 + 1) / 4 = 2.0
Pros:
- Simple and interpretable
- Same units as the data
- Robust to outliers compared with RMSE
Cons:
- Doesn't penalize large errors heavily
- Scale-dependent, so it can't be compared across series with different units
When to use:
- Easy to explain to stakeholders
- When all errors equally important
- Retail/inventory forecasting
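If you want to sanity-check the result, MAE can also be computed with plain SQL aggregates. This is a sketch assuming the same comparison table with actual and predicted columns:
-- Manual MAE cross-check (should match TS_MAE)
SELECT AVG(ABS(actual - predicted)) AS mae_manual
FROM comparison;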
RMSE (Root Mean Squared Error)
Square root of mean squared error
SELECT TS_RMSE(LIST(actual), LIST(predicted)) as RMSE
FROM comparison;
Formula:
RMSE = √(Σ(actual - predicted)² / n)
Range: 0 to ∞
Interpretation: Typical error magnitude; large errors are penalized more heavily than in MAE
Example:
Errors: [2, 2, 3, 1]
Squared: [4, 4, 9, 1]
RMSE = √(18 / 4) = √4.5 = 2.12
Pros:
- Penalizes large errors (squared term)
- Standard in academia
- Good for optimization
Cons:
- Heavily influenced by outliers
- Less intuitive than MAE
- Scale-dependent, so it can't be compared across series with different units
When to use:
- When large errors costly
- Model optimization
- Financial forecasting
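A similar plain-SQL cross-check, under the same assumptions about the comparison table:
-- Manual RMSE cross-check (should match TS_RMSE)
SELECT SQRT(AVG((actual - predicted) * (actual - predicted))) AS rmse_manual
FROM comparison;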
MAPE (Mean Absolute Percentage Error)
Average percentage error
SELECT TS_MAPE(LIST(actual), LIST(predicted)) as MAPE_pct
FROM comparison;
Formula:
MAPE = (100 / n) × Σ ( |actual - predicted| / |actual| )
Range: 0 to ∞ (usually 0-100%)
Interpretation: Average error as a percentage of the actual value
Example:
Actuals: [100, 200, 50, 150]
Predicted: [105, 190, 55, 145]
% Errors: [5%, 5%, 10%, 3.3%]
MAPE = (5 + 5 + 10 + 3.3) / 4 = 5.8%
Pros:
- Scale-independent
- Percentage-based (easy to report)
- Standard for comparison
Cons:
- Undefined when actual = 0
- Asymmetric: over-forecasts can produce unbounded percentage errors, while under-forecasts are capped at 100%
- As a result, favors models that systematically under-forecast
When to use:
- Business reporting
- Comparing across products/scales
- Default choice for managers
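To handle the zero-actual problem explicitly, here is a manual sketch that simply excludes zero actuals (one possible convention, not necessarily what TS_MAPE does):
-- Manual MAPE, skipping rows where actual = 0
SELECT AVG(ABS(actual - predicted) / ABS(actual)) * 100 AS mape_pct_manual
FROM comparison
WHERE actual <> 0;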
SMAPE (Symmetric MAPE)
Symmetric percentage error
SELECT TS_SMAPE(LIST(actual), LIST(predicted)) as SMAPE_pct
FROM comparison;
Formula:
SMAPE = (100 / n) × Σ ( 2 × |actual - predicted| / (|actual| + |predicted|) )
Range: 0 to 200%
Interpretation: Symmetric percentage error; handles zero actuals better than MAPE
Pros:
- Symmetric (over/under equal)
- Defined when the actual is zero (as long as the forecast is not also zero)
- Bounded by 200%
Cons:
- Less widely known than MAPE
- Still undefined when both actual and predicted are zero
- Bounded at 200%, which can understate very large errors
When to use:
- Data contains many zeros
- Want symmetric penalties
- When MAPE biases present
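A manual sketch of the 0-200% definition above; excluding rows where both values are zero is an assumption here, not documented TS_SMAPE behavior:
-- Manual SMAPE (0-200% definition)
SELECT AVG(2 * ABS(actual - predicted) / (ABS(actual) + ABS(predicted))) * 100 AS smape_pct_manual
FROM comparison
WHERE ABS(actual) + ABS(predicted) > 0;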
MASE (Mean Absolute Scaled Error)
Error scaled relative to naive baseline
SELECT TS_MASE(LIST(actual), LIST(predicted), seasonal_period) as MASE
FROM comparison;
Formula:
MASE = MAE / MAE(naive_forecast)
Where MAE(naive) is the MAE of a SeasonalNaive baseline forecast
Range: 0 to ∞
Interpretation:
- MASE < 1: Better than naive
- MASE = 1: Same as naive
- MASE > 1: Worse than naive
Example:
MAE(my model) = 8.0
MAE(naive) = 10.0
MASE = 8.0 / 10.0 = 0.80
(My model is 20% better than naive)
Pros:
- Comparison to baseline
- Scale-independent
- Easy interpretation
Cons:
- Depends on naive benchmark
- Undefined if naive is perfect
When to use:
- Comparing to baseline
- Cross-product comparison
- Academic research
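To make the scaling explicit, here is a sketch that builds the seasonal naive baseline by hand. It assumes a date column for ordering, a seasonal period of 7, and uses the evaluation window itself as the baseline (many implementations scale by the training-set naive MAE instead):
-- Manual MASE sketch: MAE scaled by the MAE of a seasonal naive forecast (lag 7)
WITH with_naive AS (
  SELECT
    actual,
    predicted,
    LAG(actual, 7) OVER (ORDER BY date) AS naive_forecast
  FROM comparison
)
SELECT
  AVG(ABS(actual - predicted)) / AVG(ABS(actual - naive_forecast)) AS mase_manual
FROM with_naive
WHERE naive_forecast IS NOT NULL;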
ME (Mean Error)
Average signed error (with direction)
SELECT TS_ME(LIST(actual), LIST(predicted)) as ME
FROM comparison;
Formula:
ME = Σ(actual - predicted) / n
Range: -∞ to +∞
Interpretation:
- Positive: Forecast too low (pessimistic)
- Negative: Forecast too high (optimistic)
- Zero: Unbiased
Example:
Signed errors: [2, 2, 3, 1]
ME = (2 + 2 + 3 + 1) / 4 = 2.0
(Positive ME: the model is consistently underforecasting)
Pros:
- Simple, shows direction
- Detects systematic bias
Cons:
- Errors cancel (may hide large opposite errors)
- Not for magnitude comparison
When to use:
- Detect bias (over/under)
- Inventory planning (avoid stockouts)
- Pricing decisions
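The equivalent plain-SQL bias check, assuming the same comparison table:
-- Mean signed error: positive means actuals tend to exceed forecasts
SELECT AVG(actual - predicted) AS me_manual
FROM comparison;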
Bias
Weighted measure of systematic over/under forecasting
SELECT TS_BIAS(LIST(actual), LIST(predicted)) as Bias
FROM comparison;
Similar to ME, but a weighted version.
Interpretation:
- Positive: Overforecasting (optimistic)
- Negative: Underforecasting (pessimistic)
- Zero: Unbiased
Use when:
- Systematic bias needs to be weighted
- Different importance by period
Goodness of Fit
R² (Coefficient of Determination)
Proportion of variance explained
SELECT TS_R_SQUARED(LIST(actual), LIST(predicted)) as R_squared
FROM comparison;
Formula:
R² = 1 - (Σ(actual - predicted)²) / (Σ(actual - mean(actual))²)
Range: -∞ to 1
Interpretation:
- R² = 1.0: Perfect prediction
- R² = 0.8: Explains 80% of variance
- R² = 0.5: Explains 50% of variance
- R² = 0: No better than predicting mean
- R² < 0: Worse than mean
Example:
R² = 0.85
(Model explains 85% of variance in actual values)
Pros:
- Overall model quality
- Bounded interpretation
Cons:
- Can be negative
- Doesn't show error magnitude
- Sensitive to outliers
When to use:
- Overall model assessment
- Comparing different models
- Academic research
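The same ratio of sums of squares can be written out by hand as a sketch, assuming the comparison table above:
-- R² = 1 - SS_residual / SS_total
WITH stats AS (
  SELECT AVG(actual) AS mean_actual FROM comparison
)
SELECT
  1 - SUM((c.actual - c.predicted) * (c.actual - c.predicted))
        / SUM((c.actual - s.mean_actual) * (c.actual - s.mean_actual)) AS r2_manual
FROM comparison c
CROSS JOIN stats s;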
Interval Validation
Coverage
Percentage of actuals within prediction intervals
SELECT TS_COVERAGE(LIST(actual), LIST(lower), LIST(upper)) as coverage
FROM comparison;
Formula:
Coverage = % of actuals where: lower ≤ actual ≤ upper
Range: 0 to 1 (0% to 100%)
Interpretation:
- Expected: 95% for 95% confidence
- Too low: Intervals too narrow (risky)
- Too high: Intervals too wide (wasteful)
Example:
Set confidence_level = 0.95
Generate 95% prediction intervals
Coverage = 0.93 (93% of actuals in intervals)
Expected = 0.95 (95% of actuals in intervals)
Result: Slightly narrow but close ✓
Ideal range: 92-98% for 95% CI
Pros:
- Direct interval validation
- Actionable feedback
Cons:
- Only meaningful if the intervals were produced with a sound method
- Requires confidence_level specified
When to use:
- Validate prediction intervals
- Risk management
- Confidence interval checking
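A manual coverage check, assuming the interval columns are named lower_95 and upper_95 as in the complete example below:
-- Fraction of actuals inside the 95% interval (compare against 0.95)
SELECT AVG(CASE WHEN actual BETWEEN lower_95 AND upper_95 THEN 1.0 ELSE 0.0 END) AS coverage_95_manual
FROM comparison;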
Directional Accuracy
Directional
Percentage of correct direction predictions
SELECT TS_DIRECTIONAL_ACCURACY(LIST(actual), LIST(predicted)) as directional_pct
FROM comparison;
Interpretation:
- Does forecast direction match actual?
- 50% = Random guessing
- 100% = Perfect direction
Use when:
- Direction matters more than magnitude
- Trading decisions
- Trend following
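One common convention is to compare the sign of period-over-period changes. This sketch assumes that convention and a date column for ordering, and may differ from how TS_DIRECTIONAL_ACCURACY is defined:
-- Share of periods where forecast and actual move in the same direction
WITH changes AS (
  SELECT
    actual - LAG(actual) OVER (ORDER BY date) AS actual_change,
    predicted - LAG(predicted) OVER (ORDER BY date) AS predicted_change
  FROM comparison
)
SELECT
  AVG(CASE WHEN SIGN(actual_change) = SIGN(predicted_change) THEN 1.0 ELSE 0.0 END) * 100
    AS directional_pct_manual
FROM changes
WHERE actual_change IS NOT NULL AND predicted_change IS NOT NULL;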
Complete Metrics Example
-- Calculate all metrics
WITH comparison AS (
SELECT
actual,
forecast,
lower_95,
upper_95
FROM forecast_results
)
SELECT
ROUND(TS_MAE(LIST(actual), LIST(forecast)), 2) as MAE,
ROUND(TS_RMSE(LIST(actual), LIST(forecast)), 2) as RMSE,
ROUND(TS_MAPE(LIST(actual), LIST(forecast)), 2) as MAPE_pct,
ROUND(TS_SMAPE(LIST(actual), LIST(forecast)), 2) as SMAPE_pct,
ROUND(TS_MASE(LIST(actual), LIST(forecast), 7), 2) as MASE,
ROUND(TS_ME(LIST(actual), LIST(forecast)), 2) as ME,
ROUND(TS_R_SQUARED(LIST(actual), LIST(forecast)), 4) as R_squared,
ROUND(TS_COVERAGE(LIST(actual), LIST(lower_95), LIST(upper_95)), 3) as coverage_95,
ROUND(TS_DIRECTIONAL_ACCURACY(LIST(actual), LIST(forecast)), 2) as directional_pct
FROM comparison;
Metric Selection Guide
Choose based on your goal:
| Goal | Primary | Secondary |
|---|---|---|
| Minimize errors | MAE or RMSE | MAPE |
| Executive report | MAPE | MAE |
| Avoid big mistakes | RMSE | MAE |
| Compare products | MAPE or MASE | SMAPE |
| Confidence needed | Coverage | R² |
| Direction matters | Directional | MAE |
| Inventory planning | ME (bias) | MASE |
Typical Metric Ranges
What's "good"?
- MAPE: < 5% excellent; 5-10% good; 10-20% fair; > 20% poor
- RMSE (as % of the mean of actuals): < 10% excellent; 10-20% good; 20-30% fair; > 30% poor
- R²: > 0.95 excellent; 0.80-0.95 good; 0.60-0.80 fair; < 0.60 poor
- MASE: < 0.8 better than baseline; 0.8-1.0 similar to baseline; > 1.0 worse than baseline
- Coverage (for a 95% CI): 93-97% good; 90-98% acceptable; < 90% or > 99% investigate
Next Steps
- Production Deployment — Monitor metrics over time
- Model Comparison Guide — Use metrics to select models
- Evaluating Accuracy Concept — Deep dive on metrics
Key Takeaways
- ✅ Use MAE for simple, interpretable errors
- ✅ Use RMSE when large errors costly
- ✅ Use MAPE for percentage/scale comparison
- ✅ Always check multiple metrics
- ✅ Use MASE to compare against baseline
- ✅ Use ME/Bias to detect systematic over/underforecasting
- ✅ Validate coverage for prediction intervals
- ✅ Monitor metrics weekly in production