Linear Models

AnoFox implements three classical least squares variants: OLS (Ordinary), WLS (Weighted), and RLS (Recursive). Each is available in four SQL integration patterns: scalar fit, aggregate fit with GROUP BY, window predict with OVER, and batch predict. OLS provides full inference output, including coefficients, R-squared, and information criteria (AIC/BIC), plus t-statistics, p-values, and confidence intervals when the compute_inference option is enabled. WLS handles heteroscedastic data, where variance differs across observations. RLS enables real-time streaming analysis with a configurable forgetting factor between 0.9 and 1.0.

Linear regression models estimate the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. The three variants here differ in how they weight observations: OLS (Ordinary Least Squares) weights all observations equally, WLS (Weighted Least Squares) assigns different weights to handle heteroscedasticity, and RLS (Recursive Least Squares) adapts weights over time for streaming data.
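For reference, the three weighting schemes can be written as the objective each estimator minimizes. This is standard textbook notation rather than anything specific to AnoFox: beta is the coefficient vector, x_i the predictor row for observation i, w_i a nonnegative observation weight, and lambda the forgetting factor. The RLS line ignores the small regularization term introduced by the initial covariance.

```latex
\begin{aligned}
\text{OLS:}\quad & \hat\beta = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^{\top}\beta\right)^2 \\
\text{WLS:}\quad & \hat\beta = \arg\min_{\beta} \sum_{i=1}^{n} w_i \left(y_i - x_i^{\top}\beta\right)^2 \\
\text{RLS:}\quad & \hat\beta_n = \arg\min_{\beta} \sum_{i=1}^{n} \lambda^{\,n-i} \left(y_i - x_i^{\top}\beta\right)^2
\end{aligned}
```

With lambda = 1 the RLS objective coincides with the OLS objective; with lambda < 1 the weight of an observation shrinks geometrically with its age.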


OLS - Ordinary Least Squares

The statistical workhorse for linear regression with full inference support.

Variants

  • Scalar Fit: anofox_stats_ols_fit(y, x, [options]) -> STRUCT
  • Aggregate Fit: anofox_stats_ols_fit_agg(y, x, [options]) -> STRUCT
  • Window Predict: anofox_stats_ols_fit_predict(y, x, [options]) OVER (...) -> STRUCT
  • Batch Predict: anofox_stats_ols_predict_agg(y, x, [options]) -> LIST(STRUCT)

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| y | LIST(DOUBLE) / DOUBLE | Yes | - | Target values |
| x | LIST(LIST(DOUBLE)) / LIST(DOUBLE) | Yes | - | Predictor matrix |
| options | MAP | No | - | Configuration options |

Options MAP:

| Option | Type | Default | Description |
|---|---|---|---|
| fit_intercept | BOOLEAN | true | Include intercept term |
| compute_inference | BOOLEAN | false | Compute t-stats, p-values, CIs |
| confidence_level | DOUBLE | 0.95 | Confidence level for intervals |

Example

SELECT
    (model).coefficients[2] as marketing_effect,
    (model).p_values[2] as p_value,
    (model).r_squared as fit
FROM (
    SELECT anofox_stats_ols_fit_agg(
        revenue,
        [marketing_spend, seasonality_index],
        MAP {
            'fit_intercept': 'true',
            'compute_inference': 'true',
            'confidence_level': '0.95'
        }
    ) as model
    FROM sales_data
);
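To make the inference fields more concrete, here is a minimal Python sketch of what an OLS fit with one predictor computes: the coefficients, R-squared, and the slope's standard error and t-statistic. It uses textbook formulas only and is not AnoFox's implementation; the function name `ols_simple`, the returned keys, and the data are hypothetical. P-values and confidence intervals are left out because they need a t-distribution, which the Python standard library does not provide.

```python
import math

def ols_simple(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares (single predictor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - rss / tss
    # Residual variance with n - 2 degrees of freedom (intercept and slope).
    sigma2 = rss / (n - 2)
    se_b1 = math.sqrt(sigma2 / sxx)
    t_b1 = b1 / se_b1 if se_b1 > 0 else math.inf
    return {
        "intercept": b0,
        "slope": b1,
        "r_squared": r_squared,
        "se_slope": se_b1,
        "t_slope": t_b1,
    }

# Illustrative data only (not taken from any real sales table).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = ols_simple(spend, revenue)
print(fit)
```

A large absolute t-statistic relative to the residual degrees of freedom corresponds to a small p-value, which is the quantity the `p_values` field reports in the SQL example.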

WLS - Weighted Least Squares

Weighted Least Squares is a regression method that assigns different weights to observations, giving more influence to observations with lower variance. Heteroscedasticity is the condition where the variance of the residuals is not constant across observations. It violates an OLS assumption and leads to inefficient coefficient estimates and unreliable standard errors.

Variants

  • Scalar Fit: anofox_stats_wls_fit(y, x, weights, [options]) -> STRUCT
  • Aggregate Fit: anofox_stats_wls_fit_agg(y, x, weight, [options]) -> STRUCT
  • Window Predict: anofox_stats_wls_fit_predict(y, x, weight, [options]) OVER (...) -> STRUCT
  • Batch Predict: anofox_stats_wls_predict_agg(y, x, weight, [options]) -> LIST(STRUCT)

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| y | LIST(DOUBLE) / DOUBLE | Yes | - | Target values |
| x | LIST(LIST(DOUBLE)) / LIST(DOUBLE) | Yes | - | Predictor matrix |
| weights / weight | LIST(DOUBLE) / DOUBLE | Yes | - | Observation weights |
| options | MAP | No | - | Configuration options |

Options MAP:

| Option | Type | Default | Description |
|---|---|---|---|
| fit_intercept | BOOLEAN | true | Include intercept term |
| compute_inference | BOOLEAN | false | Compute inference statistics |
| confidence_level | DOUBLE | 0.95 | Confidence level |

Example

SELECT anofox_stats_wls_fit_agg(
    y,
    [x1, x2],
    sample_size,
    MAP {'compute_inference': 'true'}
) as model
FROM aggregated_data;

When to use WLS:

  • Residual variance changes with predictor values
  • Aggregated data with different sample sizes
  • Known measurement precision differences
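The effect of weighting can be sketched in a few lines of Python. This is a textbook weighted fit with one predictor, not AnoFox's implementation, and the function name `wls_simple` and the data are hypothetical. The example assumes each row is a group mean, in which case its variance shrinks with the group's sample size, so the sample size is a natural weight.

```python
def wls_simple(x, y, w):
    """Fit y = b0 + b1 * x, minimizing sum(w_i * residual_i ** 2)."""
    sw = sum(w)
    # Weighted means replace the plain means used in OLS.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Illustrative data: three well-sampled groups on the line y = 2x,
# plus one small, noisy group.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 20.0]
sample_size = [100.0, 100.0, 100.0, 1.0]

_, slope_equal = wls_simple(x, y, [1.0] * len(x))   # equal weights reproduce OLS
_, slope_weighted = wls_simple(x, y, sample_size)
print(f"equal weights: {slope_equal:.2f}, weighted by sample size: {slope_weighted:.2f}")
```

With equal weights the single noisy group pulls the slope far from 2, while weighting by sample size keeps the estimate close to the slope of the well-sampled groups.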

RLS - Recursive Least Squares

Recursive Least Squares is an adaptive regression method that updates coefficient estimates incrementally as new observations arrive, without needing to re-fit the entire model. The forgetting factor controls how quickly old observations lose influence, making RLS suitable for streaming data and non-stationary relationships where the true coefficients change over time (concept drift).

Variants

  • Scalar Fit: anofox_stats_rls_fit(y, x, [options]) -> STRUCT
  • Aggregate Fit: anofox_stats_rls_fit_agg(y, x, [options]) -> STRUCT
  • Window Predict: anofox_stats_rls_fit_predict(y, x, [options]) OVER (...) -> STRUCT
  • Batch Predict: anofox_stats_rls_predict_agg(y, x, [options]) -> LIST(STRUCT)

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| y | LIST(DOUBLE) / DOUBLE | Yes | - | Target values |
| x | LIST(LIST(DOUBLE)) / LIST(DOUBLE) | Yes | - | Predictor matrix |
| options | MAP | No | - | Configuration options |

Options MAP:

| Option | Type | Default | Description |
|---|---|---|---|
| forgetting_factor | DOUBLE | 1.0 | Weight decay (0.9-1.0) |
| fit_intercept | BOOLEAN | true | Include intercept term |
| initial_p_diagonal | DOUBLE | 100.0 | Initial covariance diagonal |

Example

SELECT
    timestamp,
    (adaptive_model).coefficients[2] as current_beta
FROM (
    SELECT
        timestamp,
        anofox_stats_rls_fit_predict(
            y,
            [x],
            MAP {'forgetting_factor': '0.95'}
        ) OVER (
            ORDER BY timestamp
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) as adaptive_model
    FROM sensor_data
);

When to use RLS:

  • Real-time model updates
  • Non-stationary relationships
  • Concept drift detection
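The incremental update behind RLS can be sketched as follows. This is the standard textbook recursion for an intercept plus one predictor, not AnoFox's internals. The keyword arguments borrow the option names from the table above for readability, while the function name `rls_fit`, the zero starting coefficients, and the synthetic data are assumptions made for illustration.

```python
def rls_fit(xs, ys, forgetting_factor=1.0, initial_p_diagonal=100.0):
    """Recursive least squares for y = b0 + b1 * x, one observation at a time."""
    lam = forgetting_factor
    theta = [0.0, 0.0]  # [intercept, slope]; starting at zero is an assumption
    p = [[initial_p_diagonal, 0.0], [0.0, initial_p_diagonal]]
    history = []
    for x, y in zip(xs, ys):
        phi = [1.0, x]  # regressor vector including the intercept term
        p_phi = [p[0][0] * phi[0] + p[0][1] * phi[1],
                 p[1][0] * phi[0] + p[1][1] * phi[1]]
        denom = lam + phi[0] * p_phi[0] + phi[1] * p_phi[1]
        gain = [p_phi[0] / denom, p_phi[1] / denom]
        error = y - (theta[0] * phi[0] + theta[1] * phi[1])  # one-step prediction error
        theta = [theta[0] + gain[0] * error, theta[1] + gain[1] * error]
        # P <- (P - gain * phi^T P) / lam; P is symmetric, so phi^T P equals p_phi^T.
        p = [[(p[i][j] - gain[i] * p_phi[j]) / lam for j in range(2)]
             for i in range(2)]
        history.append(tuple(theta))
    return history

# Synthetic drift: the true slope jumps from 2 to 5 halfway through.
xs = [float(i % 10) for i in range(200)]
ys = [2.0 * x for x in xs[:100]] + [5.0 * x for x in xs[100:]]

adaptive = rls_fit(xs, ys, forgetting_factor=0.95)
static = rls_fit(xs, ys, forgetting_factor=1.0)
print(f"final slope, lambda=0.95: {adaptive[-1][1]:.2f}")
print(f"final slope, lambda=1.00: {static[-1][1]:.2f}")
```

With a forgetting factor of 0.95 the final slope follows the new regime, whereas with 1.0 the estimate stays between the two regimes because every observation keeps full weight.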

The three linear model types cover distinct use cases in the S&OP (sales and operations planning) pipeline. OLS is the standard choice for batch regression with full inference output. WLS is essential when aggregating data from sources with different sample sizes or measurement precision, which is common when combining regional sales data where some regions have 10x more observations than others. RLS with a forgetting factor of 0.95 has an effective memory of roughly 1 / (1 - 0.95) = 20 observations, enabling real-time coefficient tracking for sensor data, pricing models, or any relationship that shifts over time.
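The 20-observation figure follows from the usual rule of thumb that exponential forgetting with factor lambda has an effective memory of about 1 / (1 - lambda) observations. A small Python helper (the name `effective_memory` is hypothetical) makes it easy to compare settings across the 0.9 to 1.0 range:

```python
def effective_memory(forgetting_factor):
    """Approximate number of observations RLS 'remembers': 1 / (1 - lambda)."""
    if not 0.0 < forgetting_factor <= 1.0:
        raise ValueError("forgetting_factor must be in (0, 1]")
    if forgetting_factor == 1.0:
        return float("inf")  # no forgetting: every observation keeps full weight
    return 1.0 / (1.0 - forgetting_factor)

for lam in (0.90, 0.95, 0.99, 1.0):
    print(f"lambda={lam:.2f} -> effective memory ~ {effective_memory(lam):.0f} observations")
```

Smaller values react faster to change but give noisier coefficients, since fewer observations effectively contribute to each estimate.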

