
AnoFox Statistics Extension

In-Database Regression Analysis & Statistical Inference in SQL

Build predictive models directly in DuckDB with 5 regression types, comprehensive diagnostics, and hypothesis testing, all with zero Python overhead.

30-Second Example

-- Load the extension
LOAD anofox_statistics;

-- Linear regression of revenue on two predictors
SELECT
    coefficient,
    std_error,
    t_statistic,
    p_value,
    r_squared,
    rmse
FROM anofox_statistics_ols(
    'sales_data',
    'revenue',
    ARRAY['marketing_spend', 'team_size']
);

Output: Coefficients with statistical significance tests, goodness-of-fit metrics, and standard errors.
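
As an independent sanity check for a single-predictor fit, DuckDB's built-in `regr_slope`, `regr_intercept`, and `regr_r2` aggregates can be run without the extension. The sketch below is not part of AnoFox. The rows are toy values made up for illustration, and the CTE reuses the name `sales_data` only to mirror the example above. Because it regresses `revenue` on `marketing_spend` alone, its slope will generally differ from the two-predictor coefficients returned by `anofox_statistics_ols`.

```sql
-- Toy rows, invented purely for illustration; they lie on revenue = 8 + 2 * marketing_spend.
-- This CTE shadows any real table named sales_data for the duration of the query.
WITH sales_data(revenue, marketing_spend) AS (
    VALUES (10.0, 1.0), (12.0, 2.0), (14.0, 3.0), (16.0, 4.0)
)
SELECT
    regr_slope(revenue, marketing_spend)     AS slope,      -- built-in DuckDB aggregate: regr_slope(y, x)
    regr_intercept(revenue, marketing_spend) AS intercept,  -- built-in DuckDB aggregate: regr_intercept(y, x)
    regr_r2(revenue, marketing_spend)        AS r_squared   -- built-in DuckDB aggregate: regr_r2(y, x)
FROM sales_data;
```

To run the same check against real data, drop the CTE and query the actual table.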


What's Included

| Component | Coverage | Highlights |
|---|---|---|
| Regression Models | 5 types | OLS, Ridge, WLS, RLS, Elastic Net |
| Inference | 10+ functions | t-tests, p-values, confidence intervals, prediction intervals |
| Diagnostics | 5 types | Residuals, VIF, normality tests, outlier detection, AIC/BIC |
| Aggregates | GROUP BY & OVER | Per-group models, rolling regression, expanding windows |
| Utilities | Basic metrics | R², RMSE, MSE, information criteria |

Function Finder

| Goal | Function | Guide |
|---|---|---|
| Fit simple linear regression | anofox_statistics_ols | Basic Workflow |
| Test coefficient significance | anofox_statistics_ols_inference | Inference & Testing |
| Detect multicollinearity | anofox_statistics_vif | Handling Multicollinearity |
| Make predictions with uncertainty | anofox_statistics_ols_predict_interval | Prediction Intervals |
| Analyze per segment/group | anofox_statistics_ols_agg | Grouped Analysis |
| Rolling regression over time | anofox_statistics_ols_agg OVER (...) | Rolling Regression |
| Handle correlated predictors | anofox_statistics_ridge | Model Types |
| Adaptive online learning | anofox_statistics_rls | Model Types |
| Select best model | anofox_statistics_information_criteria | Model Selection |
| Check regression assumptions | Diagnostics functions | Diagnostics |

Why AnoFox Statistics?

Native In-Database Processing

  • No data export/import cycles
  • Direct SQL integration with your data pipeline
  • Zero Python dependency overhead

Production-Ready Algorithms

  • OLS: The statistical workhorse (best linear unbiased estimator when the Gauss-Markov assumptions hold)
  • Ridge: Handle multicollinearity with L2 regularization
  • WLS: Handle heteroscedastic data by weighting observations
  • RLS: Online adaptive learning with a forgetting factor
  • Elastic Net: Combined L1+L2 regularization for feature selection
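
For orientation, the five estimators are commonly written as the optimization problems below. This is a textbook sketch, not the extension's documented parameterization. The penalty weights λ, λ₁, λ₂, the observation weights wᵢ, and the forgetting factor ρ are notation assumed here, and implementations differ in how they scale these terms. Each xᵢ is taken to include a leading 1 for the intercept, which the penalties leave out.

```latex
\begin{aligned}
\text{OLS:} \quad & \hat\beta = \arg\min_\beta \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2 \\
\text{Ridge:} \quad & \hat\beta = \arg\min_\beta \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\text{WLS:} \quad & \hat\beta = \arg\min_\beta \sum_{i=1}^{n} w_i \,(y_i - x_i^\top \beta)^2 \\
\text{RLS:} \quad & \hat\beta_n = \arg\min_\beta \sum_{i=1}^{n} \rho^{\,n-i} (y_i - x_i^\top \beta)^2, \qquad 0 < \rho \le 1 \\
\text{Elastic Net:} \quad & \hat\beta = \arg\min_\beta \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2
\end{aligned}
```

With ρ = 1 the RLS objective weights all observations equally and reduces to OLS. Smaller ρ discounts older observations faster.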

Comprehensive Statistical Inference

  • Coefficient significance testing (t-tests, p-values)
  • Confidence intervals for parameters
  • Prediction intervals (individual vs. mean)
  • Hypothesis testing framework built-in
  • Multiple comparison corrections

Enterprise Diagnostics

  • Residual analysis (leverage, Cook's distance)
  • Multicollinearity detection (VIF)
  • Normality testing (Jarque-Bera)
  • Information criteria (AIC, BIC)
  • Per-group analysis via GROUP BY

Getting Started Paths

👤 New to Statistics?

Start with Understanding Regression → Quickstart → Basic Workflow

👨‍💼 Business Analyst

Quickstart → Basic Workflow → Production Deployment

👨‍🔬 Data Scientist

Model Types → Grouped Analysis → Advanced Guides

🏭 Production Focus

Installation → Quickstart → Production Deployment


Key Concepts at a Glance

Linear Regression Equation

y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ + ε
  • β₀: Intercept
  • βⱼ: Coefficient (marginal effect of xⱼ)
  • ε: Error term (unobserved noise)
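
In matrix form, with the n observations stacked into a response vector y and a design matrix X whose first column is all ones, the model and its least-squares solution can be written as below. This is the standard closed form, assuming XᵀX is invertible. The source does not say which numerical method the extension uses to compute it.

```latex
\begin{aligned}
y &= X\beta + \varepsilon \\
\hat\beta &= (X^\top X)^{-1} X^\top y \\
\hat y &= X\hat\beta, \qquad e = y - \hat y
\end{aligned}
```

Here ŷ holds the fitted values and e the residuals that the diagnostics operate on.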

R² (Coefficient of Determination)

Proportion of the variance in y explained by the model, from 0 (no fit) to 1 (perfect fit).
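
The usual definition compares the residual sum of squares with the total sum of squares around the mean ȳ. The numbers in the last line are made up to show the arithmetic.

```latex
\begin{aligned}
R^2 &= 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
     = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2} \\
\text{e.g.}\quad SS_{\text{res}} &= 20,\; SS_{\text{tot}} = 100 \;\Rightarrow\; R^2 = 1 - \tfrac{20}{100} = 0.8
\end{aligned}
```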

p-Value

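Probability of observing a coefficient estimate at least as extreme as the one obtained, assuming the null hypothesis (H₀: β = 0) is true. By convention, p < 0.05 is treated as statistically significant.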
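
For a classical OLS t-test, the p-value follows from the coefficient and its standard error as sketched below. The two-sided form and the n − p − 1 degrees of freedom (n observations, p predictors plus an intercept) are the textbook convention and are assumed here rather than taken from the extension's documentation.

```latex
\begin{aligned}
t_j &= \frac{\hat\beta_j}{\operatorname{SE}(\hat\beta_j)} \\
p_j &= 2 \, P\!\left( T_{n-p-1} \ge |t_j| \right)
\end{aligned}
```

T denotes a Student's t random variable with the stated degrees of freedom.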

Prediction Intervals vs. Confidence Intervals

  • Confidence Interval: Uncertainty about the mean prediction
  • Prediction Interval: Wider; includes individual variation
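
The textbook OLS formulas make the difference concrete. For a new predictor vector x₀ with fitted value ŷ₀, the two intervals differ only by the extra 1 under the square root, which accounts for the noise of a single new observation. The notation (s, α, x₀) is assumed here, not taken from the extension's documentation.

```latex
\begin{aligned}
\text{Confidence interval (mean response):} \quad
  & \hat y_0 \pm t_{1-\alpha/2,\;n-p-1} \; s \sqrt{x_0^\top (X^\top X)^{-1} x_0} \\
\text{Prediction interval (new observation):} \quad
  & \hat y_0 \pm t_{1-\alpha/2,\;n-p-1} \; s \sqrt{1 + x_0^\top (X^\top X)^{-1} x_0} \\
\text{where} \quad
  & s^2 = \frac{SS_{\text{res}}}{n-p-1}
\end{aligned}
```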

Multicollinearity

Correlated predictors inflate standard errors and weaken inference. As a rule of thumb, VIF > 10 signals a problem.
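
The variance inflation factor for predictor xⱼ is conventionally defined from Rⱼ², the R² obtained by regressing xⱼ on all the other predictors. The worked value shows where the threshold of 10 sits.

```latex
\begin{aligned}
\mathrm{VIF}_j &= \frac{1}{1 - R_j^2} \\
R_j^2 = 0.9 \;&\Rightarrow\; \mathrm{VIF}_j = \frac{1}{1 - 0.9} = 10
\end{aligned}
```

In other words, VIF > 10 means that more than 90% of the variance of xⱼ is explained by the other predictors.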


Key Takeaways

  • ✅ 5 regression models for different data types
  • ✅ Full statistical inference (tests, intervals, significance)
  • ✅ Comprehensive diagnostics (residuals, VIF, normality)
  • ✅ GROUP BY and window functions for complex analyses
  • ✅ Production-grade in-database processing
  • ✅ Zero Python/R integration friction