Skip to main content

Diagnostics

Model validation, assumption checking, and residual analysis.

Quick Reference

FunctionDescriptionSQL Signature
VIFMulticollinearity detectionanofox_stats_vif(x) -> LIST(DOUBLE)
VIF (agg)VIF from row-wise dataanofox_stats_vif_agg(x) -> LIST(DOUBLE)
ResidualsComprehensive residual analysisanofox_stats_residuals_diagnostics_agg(y, y_hat, [x]) -> STRUCT
AICAkaike Information Criterionanofox_stats_aic(rss, n, k) -> DOUBLE
BICBayesian Information Criterionanofox_stats_bic(rss, n, k) -> DOUBLE

VIF (Variance Inflation Factor)

Detect multicollinearity among predictors.

Scalar Version

For matrix input.

Parameters

ParameterTypeRequiredDefaultDescription
xLIST(LIST(DOUBLE))Yes-Predictor matrix

Example

SELECT anofox_stats_vif(
[[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]]
) as vif_values;

Aggregate Version

For row-wise input.

Parameters

ParameterTypeRequiredDefaultDescription
xLIST(DOUBLE)Yes-Predictors per row

Output

Returns LIST(DOUBLE) - VIF for each predictor.

Interpretation

VIF RangeMulticollinearity
< 5Acceptable
5-10Moderate
> 10Severe (consider removing/combining)

Example

WITH vif_result AS (
SELECT anofox_stats_vif_agg([price, promotion, competitor_price]) as vif
FROM sales_data
)
SELECT
vif[1] as price_vif,
vif[2] as promotion_vif,
vif[3] as competitor_vif,
CASE WHEN max(v) > 10 THEN 'SEVERE'
WHEN max(v) > 5 THEN 'MODERATE'
ELSE 'OK' END as status
FROM vif_result, (SELECT unnest(vif) as v FROM vif_result) t
GROUP BY vif[1], vif[2], vif[3];

Residual Diagnostics

Comprehensive residual analysis including leverage and influence.

Scalar Version

Parameters

ParameterTypeRequiredDefaultDescription
yLIST(DOUBLE)Yes-Actual values
y_hatLIST(DOUBLE)Yes-Predicted values
xLIST(LIST(DOUBLE))No-Predictor matrix
optionsMAPNo-Configuration options

Aggregate Version

Parameters

ParameterTypeRequiredDefaultDescription
yDOUBLEYes-Actual value
y_hatDOUBLEYes-Predicted value
xLIST(DOUBLE)No-Predictors

Output

FieldTypeDescription
residualsLIST(DOUBLE)Raw residuals (y - y_hat)
standardized_residualsLIST(DOUBLE)Standardized residuals
studentized_residualsLIST(DOUBLE)Studentized residuals
leverageLIST(DOUBLE)Hat values (diagonal of H matrix)
cooks_distanceLIST(DOUBLE)Cook's D (influence measure)
dffitsLIST(DOUBLE)DFFITS (influence on fitted values)
outliersLIST(BOOLEAN)Flagged outliers

Interpretation

MetricThresholdMeaning
Leverage> 2(k+1)/nHigh leverage point
Cook's D> 4/nInfluential observation
|Studentized residual|> 3Potential outlier

Example

WITH model AS (
SELECT
y,
anofox_stats_predict([x1, x2], coefficients, intercept) as y_hat,
[x1, x2] as x
FROM sales_data, fitted_model
),
diag AS (
SELECT anofox_stats_residuals_diagnostics_agg(y, y_hat, x) as d
FROM model
)
SELECT
row_number() OVER () as obs,
d.residuals[obs] as residual,
d.leverage[obs] as leverage,
d.cooks_distance[obs] as cooks_d,
d.outliers[obs] as is_outlier
FROM diag
WHERE d.cooks_distance[obs] > 4.0 / count(*) OVER ();

Information Criteria

Model selection metrics.

AIC (Akaike Information Criterion)

Parameters

ParameterTypeRequiredDefaultDescription
rssDOUBLEYes-Residual sum of squares
nBIGINTYes-Number of observations
kBIGINTYes-Number of parameters

Formula: AIC = n × log(RSS/n) + 2k

Example

SELECT anofox_stats_aic(rss, n, k) as aic;

BIC (Bayesian Information Criterion)

Parameters

ParameterTypeRequiredDefaultDescription
rssDOUBLEYes-Residual sum of squares
nBIGINTYes-Number of observations
kBIGINTYes-Number of parameters

Formula: BIC = n × log(RSS/n) + k × log(n)

Example

SELECT anofox_stats_bic(rss, n, k) as bic;

Usage Guidelines

  • Lower values = better model
  • BIC penalizes complexity more than AIC
  • Use AIC for prediction, BIC for explanation

Model Comparison Example

WITH models AS (
SELECT
'Model 1' as name,
anofox_stats_ols_fit_agg(y, [x1]) as m
FROM data
UNION ALL
SELECT
'Model 2',
anofox_stats_ols_fit_agg(y, [x1, x2])
FROM data
UNION ALL
SELECT
'Model 3',
anofox_stats_ols_fit_agg(y, [x1, x2, x3])
FROM data
)
SELECT
name,
m.r_squared,
m.aic,
m.bic,
ROW_NUMBER() OVER (ORDER BY m.bic) as bic_rank
FROM models
ORDER BY m.bic;

Prediction

Generate predictions from fitted coefficients.

Parameters

ParameterTypeRequiredDefaultDescription
xLIST(LIST(DOUBLE))Yes-New predictor values
coefficientsLIST(DOUBLE)Yes-Fitted coefficients
interceptDOUBLEYes-Intercept term

Returns: LIST(DOUBLE) - predicted values

Example

WITH fitted AS (
SELECT
(anofox_stats_ols_fit_agg(y, [x1, x2])).coefficients as coef,
(anofox_stats_ols_fit_agg(y, [x1, x2])).intercept as intercept
FROM training_data
)
SELECT
anofox_stats_predict([[10, 5], [15, 8], [20, 10]], coef, intercept) as forecasts
FROM fitted;

Normality Tests

For testing residual normality, see Parametric Tests:

  • anofox_stats_shapiro_wilk_agg - Shapiro-Wilk test
  • anofox_stats_jarque_bera_agg - Jarque-Bera test
  • anofox_stats_dagostino_k2_agg - D'Agostino K² test

Complete Diagnostic Workflow

Full model validation workflow:

-- 1. Fit model
WITH fitted AS (
SELECT anofox_stats_ols_fit_agg(
revenue,
[marketing_spend, seasonality, competitor_activity],
true, true, 0.95
) as model
FROM sales_data
),

-- 2. Check multicollinearity
vif_check AS (
SELECT anofox_stats_vif_agg([marketing_spend, seasonality, competitor_activity]) as vif
FROM sales_data
),

-- 3. Get predictions
predictions AS (
SELECT
s.revenue as actual,
anofox_stats_predict(
[[s.marketing_spend, s.seasonality, s.competitor_activity]],
f.model.coefficients,
f.model.intercept
)[1] as predicted
FROM sales_data s, fitted f
),

-- 4. Test residual normality
normality AS (
SELECT anofox_stats_jarque_bera_agg(actual - predicted) as jb
FROM predictions
)

SELECT
'R-squared' as metric, round(model.r_squared, 3)::VARCHAR as value FROM fitted
UNION ALL
SELECT 'Max VIF', round(max(v), 2)::VARCHAR FROM vif_check, unnest(vif) as v
UNION ALL
SELECT 'Normality p-value', round(jb.p_value, 4)::VARCHAR FROM normality
UNION ALL
SELECT 'Model Valid',
CASE WHEN model.r_squared > 0.5
AND (SELECT max(v) FROM vif_check, unnest(vif) as v) < 10
AND jb.p_value > 0.05
THEN 'YES' ELSE 'NO' END
FROM fitted, normality;

🍪 Cookie Settings