DuckDB Patterns
How the API Works
Every statistics function follows a consistent naming pattern:
anofox_stats_<method>_<variant>
Variants determine how the function integrates with SQL:
| Suffix | SQL Pattern | Use Case |
|---|---|---|
_fit | Scalar function | Fit a single model on your entire dataset |
_fit_agg | GROUP BY | Fit separate models per group (e.g., per region, per product) |
_fit_predict | OVER (...) | Rolling or expanding window models (e.g., 60-day rolling regression) |
_predict_agg | GROUP BY | Batch predictions per group |
All model functions return a STRUCT containing coefficients, p-values, R², and diagnostics. Extract fields using (result).field_name syntax:
SELECT
(model).r_squared,
(model).coefficients[2] as slope,
(model).p_values[2] as p_value
FROM (
SELECT anofox_stats_ols_fit_agg(sales, [advertising]) as model
FROM data
);
Quick Reference
| Category | Pattern | Description |
|---|---|---|
| Aggregates | _agg suffix | Per-group models with GROUP BY |
| Windows | _fit_predict suffix | Rolling/expanding with OVER |
| Prediction | anofox_stats_predict | Generate predictions from coefficients |
| Info Criteria | anofox_stats_aic, anofox_stats_bic | Model selection metrics |
Aggregate Function Pattern
All _agg functions work with GROUP BY for per-group models.
Basic GROUP BY
SELECT
group_column,
anofox_stats_ols_fit_agg(y, [x1, x2]) as model
FROM data
GROUP BY group_column;
Extracting Results
SELECT
region,
(model).r_squared as fit,
(model).coefficients[1] as intercept,
(model).coefficients[2] as slope,
(model).p_values[2] as p_value
FROM (
SELECT
region,
anofox_stats_ols_fit_agg(sales, [marketing_spend]) as model
FROM regional_data
GROUP BY region
);
Available Aggregate Functions
Regression Aggregates
| Function | Description |
|---|---|
anofox_stats_ols_fit_agg | OLS regression |
anofox_stats_ridge_fit_agg | Ridge regression |
anofox_stats_wls_fit_agg | Weighted least squares |
anofox_stats_rls_fit_agg | Recursive least squares |
anofox_stats_elasticnet_fit_agg | Elastic Net |
anofox_stats_poisson_fit_agg | Poisson GLM |
anofox_stats_alm_fit_agg | Augmented linear model |
anofox_stats_bls_fit_agg | Bounded least squares |
anofox_stats_nnls_fit_agg | Non-negative least squares |
Prediction Aggregates
| Function | Description |
|---|---|
anofox_stats_ols_predict_agg | Batch OLS predictions |
anofox_stats_ridge_predict_agg | Batch Ridge predictions |
anofox_stats_wls_predict_agg | Batch WLS predictions |
anofox_stats_rls_predict_agg | Batch RLS predictions |
anofox_stats_elasticnet_predict_agg | Batch Elastic Net predictions |
Statistical Test Aggregates
| Function | Description |
|---|---|
anofox_stats_t_test_agg | t-test |
anofox_stats_one_way_anova_agg | ANOVA |
anofox_stats_mann_whitney_u_agg | Mann-Whitney U |
anofox_stats_kruskal_wallis_agg | Kruskal-Wallis |
anofox_stats_pearson_agg | Pearson correlation |
anofox_stats_spearman_agg | Spearman correlation |
anofox_stats_chisq_test_agg | Chi-square test |
Diagnostic Aggregates
| Function | Description |
|---|---|
anofox_stats_vif_agg | Variance Inflation Factor |
anofox_stats_shapiro_wilk_agg | Shapiro-Wilk normality |
anofox_stats_jarque_bera_agg | Jarque-Bera normality |
anofox_stats_residuals_diagnostics_agg | Residual analysis |
Demand Analysis Aggregates
| Function | Description |
|---|---|
anofox_stats_aid_agg | AID classification |
anofox_stats_aid_anomaly_agg | AID anomaly detection |
Window Functions
Use _fit_predict functions with OVER for rolling/expanding analyses.
Rolling Regression
SELECT
date,
value,
anofox_stats_ols_fit_predict(y, [x1, x2]) OVER (
ORDER BY date
ROWS BETWEEN 60 PRECEDING AND CURRENT ROW
) as rolling_model
FROM time_series_data;
Expanding Window
SELECT
date,
anofox_stats_ols_fit_predict(y, [x1]) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as expanding_model
FROM data;
Partitioned Rolling
SELECT
region,
date,
anofox_stats_ridge_fit_predict(y, [x1, x2], 0.5) OVER (
PARTITION BY region
ORDER BY date
ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
) as regional_rolling
FROM multi_region_data;
Window Function Variants
| Function | Description |
|---|---|
anofox_stats_ols_fit_predict | Rolling/expanding OLS |
anofox_stats_ridge_fit_predict | Rolling/expanding Ridge |
anofox_stats_wls_fit_predict | Rolling/expanding WLS |
anofox_stats_rls_fit_predict | Rolling/expanding RLS |
anofox_stats_elasticnet_fit_predict | Rolling/expanding Elastic Net |
Window Specifications
Rolling Window
Fixed-size moving window:
ROWS BETWEEN 60 PRECEDING AND CURRENT ROW
-- Uses last 61 observations (including current)
Expanding Window
Growing window from start:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- Uses all observations up to current
Range-Based Window
Time-based window:
RANGE BETWEEN INTERVAL '30 days' PRECEDING AND CURRENT ROW
-- Uses observations within last 30 days
Prediction
Generate predictions from fitted coefficients.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
x | LIST(LIST(DOUBLE)) | Yes | - | New predictor values |
coefficients | LIST(DOUBLE) | Yes | - | Fitted coefficients |
intercept | DOUBLE | Yes | - | Intercept term |
Returns: LIST(DOUBLE) - predicted values
Example
WITH fitted AS (
SELECT anofox_stats_ols_fit_agg(revenue, [marketing, seasonality]) as model
FROM historical_data
)
SELECT
scenario_name,
anofox_stats_predict(
[[marketing_plan, seasonal_factor]],
model.coefficients,
model.intercept
)[1] as predicted_revenue
FROM scenarios, fitted;
Information Criteria
AIC (Akaike Information Criterion)
SELECT anofox_stats_aic(
rss, -- DOUBLE: residual sum of squares
n, -- BIGINT: number of observations
k -- BIGINT: number of parameters (including intercept)
) as aic;
BIC (Bayesian Information Criterion)
SELECT anofox_stats_bic(
rss, -- DOUBLE: residual sum of squares
n, -- BIGINT: number of observations
k -- BIGINT: number of parameters
) as bic;
Example
WITH residuals AS (
SELECT sum(power(actual - predicted, 2)) as rss,
count(*) as n
FROM predictions
)
SELECT
anofox_stats_aic(rss, n, 3) as aic, -- 3 = intercept + 2 predictors
anofox_stats_bic(rss, n, 3) as bic
FROM residuals;
Working with STRUCT Results
All model functions return STRUCT types. Here's how to extract values.
Direct Field Access
SELECT
(model).r_squared,
(model).coefficients[2],
(model).p_values[2]
FROM (
SELECT anofox_stats_ols_fit_agg(y, [x1, x2]) as model
FROM data
);
Named Extraction
SELECT
result.coefficients[1] as intercept,
result.coefficients[2] as x1_effect,
result.coefficients[3] as x2_effect,
result.p_values[2] as x1_pvalue,
result.r_squared as fit
FROM (
SELECT anofox_stats_ols_fit_agg(y, [x1, x2]) as result
FROM data
);
Unnesting Arrays
WITH model AS (
SELECT anofox_stats_ols_fit_agg(y, [x1, x2, x3]) as m
FROM data
)
SELECT
idx,
m.coefficients[idx] as coefficient,
m.std_errors[idx] as std_error,
m.p_values[idx] as p_value
FROM model, generate_series(1, array_length(m.coefficients)) as idx;
Common Patterns
Filtering by Significance
SELECT
predictor_name,
coefficient,
p_value
FROM model_results
WHERE p_value < 0.05;
Model Comparison Table
WITH models AS (
SELECT 'Simple' as name, anofox_stats_ols_fit_agg(y, [x1]) as m FROM data
UNION ALL
SELECT 'Full', anofox_stats_ols_fit_agg(y, [x1, x2, x3]) FROM data
)
SELECT
name,
round(m.r_squared, 4) as r2,
round(m.adj_r_squared, 4) as adj_r2,
round(m.aic, 2) as aic,
round(m.bic, 2) as bic,
round(m.rmse, 4) as rmse
FROM models
ORDER BY m.bic;
Coefficient Summary Table
WITH model AS (
SELECT anofox_stats_ols_fit_agg(
revenue,
[marketing, seasonality, competitor],
true, true, 0.95
) as m
FROM sales_data
),
predictors AS (
SELECT * FROM (VALUES
(1, 'Intercept'),
(2, 'Marketing'),
(3, 'Seasonality'),
(4, 'Competitor')
) AS t(idx, name)
)
SELECT
p.name,
round(m.coefficients[p.idx], 4) as estimate,
round(m.std_errors[p.idx], 4) as std_error,
round(m.t_statistics[p.idx], 3) as t_stat,
round(m.p_values[p.idx], 4) as p_value,
CASE WHEN m.p_values[p.idx] < 0.001 THEN '***'
WHEN m.p_values[p.idx] < 0.01 THEN '**'
WHEN m.p_values[p.idx] < 0.05 THEN '*'
WHEN m.p_values[p.idx] < 0.1 THEN '.'
ELSE '' END as sig
FROM model, predictors p
ORDER BY p.idx;
Examples
Per-Product Price Elasticity
SELECT
product_id,
product_name,
(model).coefficients[2] as price_elasticity,
(model).r_squared as model_fit,
CASE
WHEN (model).p_values[2] < 0.05 THEN 'Significant'
ELSE 'Not Significant'
END as significance
FROM (
SELECT
product_id,
product_name,
anofox_stats_ols_fit_agg(
log(quantity_sold),
[log(price), log(competitor_price)]
) as model
FROM sales_data
GROUP BY product_id, product_name
)
ORDER BY abs((model).coefficients[2]) DESC;
Rolling Beta Estimation
SELECT
date,
stock_ticker,
(rolling_model).coefficients[2] as beta,
(rolling_model).r_squared as r_squared
FROM (
SELECT
date,
stock_ticker,
anofox_stats_ols_fit_predict(
stock_return,
[market_return]
) OVER (
PARTITION BY stock_ticker
ORDER BY date
ROWS BETWEEN 252 PRECEDING AND CURRENT ROW
) as rolling_model
FROM stock_returns
)
WHERE date >= '2024-01-01';
Regional A/B Test Analysis
SELECT
region,
(result).mean_diff as lift,
(result).p_value,
(result).cohens_d as effect_size,
CASE
WHEN (result).p_value < 0.05 AND (result).mean_diff > 0 THEN 'Winner'
WHEN (result).p_value < 0.05 AND (result).mean_diff < 0 THEN 'Loser'
ELSE 'Inconclusive'
END as decision
FROM (
SELECT
region,
anofox_stats_t_test_agg(conversion_rate, treatment_group) as result
FROM ab_test_data
GROUP BY region
);
Time-Varying Correlation
SELECT
date,
(corr).r as correlation,
(corr).p_value,
CASE
WHEN abs((corr).r) > 0.7 THEN 'Strong'
WHEN abs((corr).r) > 0.4 THEN 'Moderate'
ELSE 'Weak'
END as strength
FROM (
SELECT
date,
anofox_stats_pearson_agg(x, y) OVER (
ORDER BY date
ROWS BETWEEN 60 PRECEDING AND CURRENT ROW
) as corr
FROM paired_series
);
Type Reference
Common Output Types
| Field | Type | Description |
|---|---|---|
coefficients | LIST(DOUBLE) | Regression coefficients |
std_errors | LIST(DOUBLE) | Standard errors |
t_statistics | LIST(DOUBLE) | t-statistics |
p_values | LIST(DOUBLE) | p-values |
confidence_intervals | LIST(STRUCT) | CI bounds per coefficient |
r_squared | DOUBLE | R² (0-1) |
adj_r_squared | DOUBLE | Adjusted R² |
rmse | DOUBLE | Root mean squared error |
aic | DOUBLE | Akaike Information Criterion |
bic | DOUBLE | Bayesian Information Criterion |
n | BIGINT | Number of observations |
k | BIGINT | Number of predictors |
Hypothesis Test Output Types
| Field | Type | Description |
|---|---|---|
statistic | DOUBLE | Test statistic |
p_value | DOUBLE | p-value |
df | DOUBLE/BIGINT | Degrees of freedom |
effect_size | DOUBLE | Effect size measure |
ci_lower | DOUBLE | CI lower bound |
ci_upper | DOUBLE | CI upper bound |