Skip to main content

DuckDB Patterns

How the API Works

Every statistics function follows a consistent naming pattern:

anofox_stats_<method>_<variant>

Variants determine how the function integrates with SQL:

SuffixSQL PatternUse Case
_fitScalar functionFit a single model on your entire dataset
_fit_aggGROUP BYFit separate models per group (e.g., per region, per product)
_fit_predictOVER (...)Rolling or expanding window models (e.g., 60-day rolling regression)
_predict_aggGROUP BYBatch predictions per group

All model functions return a STRUCT containing coefficients, p-values, R², and diagnostics. Extract fields using (result).field_name syntax:

SELECT
(model).r_squared,
(model).coefficients[2] as slope,
(model).p_values[2] as p_value
FROM (
SELECT anofox_stats_ols_fit_agg(sales, [advertising]) as model
FROM data
);

Quick Reference

CategoryPatternDescription
Aggregates_agg suffixPer-group models with GROUP BY
Windows_fit_predict suffixRolling/expanding with OVER
Predictionanofox_stats_predictGenerate predictions from coefficients
Info Criteriaanofox_stats_aic, anofox_stats_bicModel selection metrics

Aggregate Function Pattern

All _agg functions work with GROUP BY for per-group models.

Basic GROUP BY

SELECT
group_column,
anofox_stats_ols_fit_agg(y, [x1, x2]) as model
FROM data
GROUP BY group_column;

Extracting Results

SELECT
region,
(model).r_squared as fit,
(model).coefficients[1] as intercept,
(model).coefficients[2] as slope,
(model).p_values[2] as p_value
FROM (
SELECT
region,
anofox_stats_ols_fit_agg(sales, [marketing_spend]) as model
FROM regional_data
GROUP BY region
);

Available Aggregate Functions

Regression Aggregates

FunctionDescription
anofox_stats_ols_fit_aggOLS regression
anofox_stats_ridge_fit_aggRidge regression
anofox_stats_wls_fit_aggWeighted least squares
anofox_stats_rls_fit_aggRecursive least squares
anofox_stats_elasticnet_fit_aggElastic Net
anofox_stats_poisson_fit_aggPoisson GLM
anofox_stats_alm_fit_aggAugmented linear model
anofox_stats_bls_fit_aggBounded least squares
anofox_stats_nnls_fit_aggNon-negative least squares

Prediction Aggregates

FunctionDescription
anofox_stats_ols_predict_aggBatch OLS predictions
anofox_stats_ridge_predict_aggBatch Ridge predictions
anofox_stats_wls_predict_aggBatch WLS predictions
anofox_stats_rls_predict_aggBatch RLS predictions
anofox_stats_elasticnet_predict_aggBatch Elastic Net predictions

Statistical Test Aggregates

FunctionDescription
anofox_stats_t_test_aggt-test
anofox_stats_one_way_anova_aggANOVA
anofox_stats_mann_whitney_u_aggMann-Whitney U
anofox_stats_kruskal_wallis_aggKruskal-Wallis
anofox_stats_pearson_aggPearson correlation
anofox_stats_spearman_aggSpearman correlation
anofox_stats_chisq_test_aggChi-square test

Diagnostic Aggregates

FunctionDescription
anofox_stats_vif_aggVariance Inflation Factor
anofox_stats_shapiro_wilk_aggShapiro-Wilk normality
anofox_stats_jarque_bera_aggJarque-Bera normality
anofox_stats_residuals_diagnostics_aggResidual analysis

Demand Analysis Aggregates

FunctionDescription
anofox_stats_aid_aggAID classification
anofox_stats_aid_anomaly_aggAID anomaly detection

Window Functions

Use _fit_predict functions with OVER for rolling/expanding analyses.

Rolling Regression

SELECT
date,
value,
anofox_stats_ols_fit_predict(y, [x1, x2]) OVER (
ORDER BY date
ROWS BETWEEN 60 PRECEDING AND CURRENT ROW
) as rolling_model
FROM time_series_data;

Expanding Window

SELECT
date,
anofox_stats_ols_fit_predict(y, [x1]) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as expanding_model
FROM data;

Partitioned Rolling

SELECT
region,
date,
anofox_stats_ridge_fit_predict(y, [x1, x2], 0.5) OVER (
PARTITION BY region
ORDER BY date
ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
) as regional_rolling
FROM multi_region_data;

Window Function Variants

FunctionDescription
anofox_stats_ols_fit_predictRolling/expanding OLS
anofox_stats_ridge_fit_predictRolling/expanding Ridge
anofox_stats_wls_fit_predictRolling/expanding WLS
anofox_stats_rls_fit_predictRolling/expanding RLS
anofox_stats_elasticnet_fit_predictRolling/expanding Elastic Net

Window Specifications

Rolling Window

Fixed-size moving window:

ROWS BETWEEN 60 PRECEDING AND CURRENT ROW
-- Uses last 61 observations (including current)

Expanding Window

Growing window from start:

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- Uses all observations up to current

Range-Based Window

Time-based window:

RANGE BETWEEN INTERVAL '30 days' PRECEDING AND CURRENT ROW
-- Uses observations within last 30 days

Prediction

Generate predictions from fitted coefficients.

Parameters

ParameterTypeRequiredDefaultDescription
xLIST(LIST(DOUBLE))Yes-New predictor values
coefficientsLIST(DOUBLE)Yes-Fitted coefficients
interceptDOUBLEYes-Intercept term

Returns: LIST(DOUBLE) - predicted values

Example

WITH fitted AS (
SELECT anofox_stats_ols_fit_agg(revenue, [marketing, seasonality]) as model
FROM historical_data
)
SELECT
scenario_name,
anofox_stats_predict(
[[marketing_plan, seasonal_factor]],
model.coefficients,
model.intercept
)[1] as predicted_revenue
FROM scenarios, fitted;

Information Criteria

AIC (Akaike Information Criterion)

SELECT anofox_stats_aic(
rss, -- DOUBLE: residual sum of squares
n, -- BIGINT: number of observations
k -- BIGINT: number of parameters (including intercept)
) as aic;

BIC (Bayesian Information Criterion)

SELECT anofox_stats_bic(
rss, -- DOUBLE: residual sum of squares
n, -- BIGINT: number of observations
k -- BIGINT: number of parameters
) as bic;

Example

WITH residuals AS (
SELECT sum(power(actual - predicted, 2)) as rss,
count(*) as n
FROM predictions
)
SELECT
anofox_stats_aic(rss, n, 3) as aic, -- 3 = intercept + 2 predictors
anofox_stats_bic(rss, n, 3) as bic
FROM residuals;

Working with STRUCT Results

All model functions return STRUCT types. Here's how to extract values.

Direct Field Access

SELECT
(model).r_squared,
(model).coefficients[2],
(model).p_values[2]
FROM (
SELECT anofox_stats_ols_fit_agg(y, [x1, x2]) as model
FROM data
);

Named Extraction

SELECT
result.coefficients[1] as intercept,
result.coefficients[2] as x1_effect,
result.coefficients[3] as x2_effect,
result.p_values[2] as x1_pvalue,
result.r_squared as fit
FROM (
SELECT anofox_stats_ols_fit_agg(y, [x1, x2]) as result
FROM data
);

Unnesting Arrays

WITH model AS (
SELECT anofox_stats_ols_fit_agg(y, [x1, x2, x3]) as m
FROM data
)
SELECT
idx,
m.coefficients[idx] as coefficient,
m.std_errors[idx] as std_error,
m.p_values[idx] as p_value
FROM model, generate_series(1, array_length(m.coefficients)) as idx;

Common Patterns

Filtering by Significance

SELECT
predictor_name,
coefficient,
p_value
FROM model_results
WHERE p_value < 0.05;

Model Comparison Table

WITH models AS (
SELECT 'Simple' as name, anofox_stats_ols_fit_agg(y, [x1]) as m FROM data
UNION ALL
SELECT 'Full', anofox_stats_ols_fit_agg(y, [x1, x2, x3]) FROM data
)
SELECT
name,
round(m.r_squared, 4) as r2,
round(m.adj_r_squared, 4) as adj_r2,
round(m.aic, 2) as aic,
round(m.bic, 2) as bic,
round(m.rmse, 4) as rmse
FROM models
ORDER BY m.bic;

Coefficient Summary Table

WITH model AS (
SELECT anofox_stats_ols_fit_agg(
revenue,
[marketing, seasonality, competitor],
true, true, 0.95
) as m
FROM sales_data
),
predictors AS (
SELECT * FROM (VALUES
(1, 'Intercept'),
(2, 'Marketing'),
(3, 'Seasonality'),
(4, 'Competitor')
) AS t(idx, name)
)
SELECT
p.name,
round(m.coefficients[p.idx], 4) as estimate,
round(m.std_errors[p.idx], 4) as std_error,
round(m.t_statistics[p.idx], 3) as t_stat,
round(m.p_values[p.idx], 4) as p_value,
CASE WHEN m.p_values[p.idx] < 0.001 THEN '***'
WHEN m.p_values[p.idx] < 0.01 THEN '**'
WHEN m.p_values[p.idx] < 0.05 THEN '*'
WHEN m.p_values[p.idx] < 0.1 THEN '.'
ELSE '' END as sig
FROM model, predictors p
ORDER BY p.idx;

Examples

Per-Product Price Elasticity

SELECT
product_id,
product_name,
(model).coefficients[2] as price_elasticity,
(model).r_squared as model_fit,
CASE
WHEN (model).p_values[2] < 0.05 THEN 'Significant'
ELSE 'Not Significant'
END as significance
FROM (
SELECT
product_id,
product_name,
anofox_stats_ols_fit_agg(
log(quantity_sold),
[log(price), log(competitor_price)]
) as model
FROM sales_data
GROUP BY product_id, product_name
)
ORDER BY abs((model).coefficients[2]) DESC;

Rolling Beta Estimation

SELECT
date,
stock_ticker,
(rolling_model).coefficients[2] as beta,
(rolling_model).r_squared as r_squared
FROM (
SELECT
date,
stock_ticker,
anofox_stats_ols_fit_predict(
stock_return,
[market_return]
) OVER (
PARTITION BY stock_ticker
ORDER BY date
ROWS BETWEEN 252 PRECEDING AND CURRENT ROW
) as rolling_model
FROM stock_returns
)
WHERE date >= '2024-01-01';

Regional A/B Test Analysis

SELECT
region,
(result).mean_diff as lift,
(result).p_value,
(result).cohens_d as effect_size,
CASE
WHEN (result).p_value < 0.05 AND (result).mean_diff > 0 THEN 'Winner'
WHEN (result).p_value < 0.05 AND (result).mean_diff < 0 THEN 'Loser'
ELSE 'Inconclusive'
END as decision
FROM (
SELECT
region,
anofox_stats_t_test_agg(conversion_rate, treatment_group) as result
FROM ab_test_data
GROUP BY region
);

Time-Varying Correlation

SELECT
date,
(corr).r as correlation,
(corr).p_value,
CASE
WHEN abs((corr).r) > 0.7 THEN 'Strong'
WHEN abs((corr).r) > 0.4 THEN 'Moderate'
ELSE 'Weak'
END as strength
FROM (
SELECT
date,
anofox_stats_pearson_agg(x, y) OVER (
ORDER BY date
ROWS BETWEEN 60 PRECEDING AND CURRENT ROW
) as corr
FROM paired_series
);

Type Reference

Common Output Types

FieldTypeDescription
coefficientsLIST(DOUBLE)Regression coefficients
std_errorsLIST(DOUBLE)Standard errors
t_statisticsLIST(DOUBLE)t-statistics
p_valuesLIST(DOUBLE)p-values
confidence_intervalsLIST(STRUCT)CI bounds per coefficient
r_squaredDOUBLER² (0-1)
adj_r_squaredDOUBLEAdjusted R²
rmseDOUBLERoot mean squared error
aicDOUBLEAkaike Information Criterion
bicDOUBLEBayesian Information Criterion
nBIGINTNumber of observations
kBIGINTNumber of predictors

Hypothesis Test Output Types

FieldTypeDescription
statisticDOUBLETest statistic
p_valueDOUBLEp-value
dfDOUBLE/BIGINTDegrees of freedom
effect_sizeDOUBLEEffect size measure
ci_lowerDOUBLECI lower bound
ci_upperDOUBLECI upper bound

🍪 Cookie Settings