Linear Models
AnoFox implements three classical least squares variants (OLS, WLS for Weighted, and RLS for Recursive), each available in four SQL integration patterns: scalar fit, aggregate fit with GROUP BY, window predict with OVER, and batch predict. OLS provides full inference output: coefficients, t-statistics, p-values, confidence intervals, R-squared, and information criteria (AIC/BIC). The t-statistics, p-values and confidence intervals are computed when the compute_inference option is enabled. WLS handles heteroscedastic data, where the residual variance differs across observations. RLS enables real-time streaming analysis with a configurable forgetting factor between 0.9 and 1.0.
Linear regression models estimate the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. The three variants differ in how they weight observations:
- OLS (Ordinary Least Squares) weights all observations equally.
- WLS (Weighted Least Squares) assigns per-observation weights to handle heteroscedasticity.
- RLS (Recursive Least Squares) updates its estimates one observation at a time and uses a forgetting factor to down-weight older observations exponentially.
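All three variants can be read as minimising a weighted sum of squared residuals; only the weights differ. The minimal Python sketch below illustrates this for a single predictor with an intercept. `weighted_line_fit` is a hypothetical helper, not part of AnoFox. The RLS line shows only the exponentially weighted batch objective that RLS tracks recursively, and it ignores the small effect of the initial covariance setting.

```python
def weighted_line_fit(x, y, w):
    """Return (intercept, slope) minimising sum(w_i * (y_i - a - b * x_i) ** 2)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1]
n = len(x)

# OLS: every observation gets the same weight.
ols = weighted_line_fit(x, y, [1.0] * n)

# WLS: caller-supplied weights (illustrative values, e.g. inverse variances).
wls = weighted_line_fit(x, y, [1.0, 1.0, 4.0, 4.0, 4.0])

# RLS-style objective: weight lambda**age, so the newest observation has weight 1.
forgetting_factor = 0.95
rls_objective = weighted_line_fit(x, y, [forgetting_factor ** (n - 1 - i) for i in range(n)])

print("OLS", ols)
print("WLS", wls)
print("exponentially weighted", rls_objective)
```

Multiplying all weights by the same constant leaves the fit unchanged. Only the relative weights matter.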
OLS - Ordinary Least Squares
The statistical workhorse for linear regression with full inference support.
Variants
- Scalar Fit: anofox_stats_ols_fit(y, x, [options]) -> STRUCT
- Aggregate Fit: anofox_stats_ols_fit_agg(y, x, [options]) -> STRUCT
- Window Predict: anofox_stats_ols_fit_predict(y, x, [options]) OVER (...) -> STRUCT
- Batch Predict: anofox_stats_ols_predict_agg(y, x, [options]) -> LIST(STRUCT)
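In SQL terms, an aggregate returns one result per group, and a window function returns one result per row, computed over that row's frame. The Python sketch below mimics those two shapes with a plain OLS line fit. The data, the `fit_line` helper and the choice of an expanding frame are illustrative assumptions. The sketch does not model the batch-predict variant.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Plain OLS for y = a + b * x; returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Illustrative rows: (region, x, y).
rows = [
    ("north", 1.0, 2.0), ("north", 2.0, 4.0), ("north", 3.0, 6.0),
    ("south", 1.0, 1.0), ("south", 2.0, 4.0), ("south", 3.0, 7.0),
]

# Aggregate shape (like ... GROUP BY region): one model per group.
grouped = defaultdict(lambda: ([], []))
for region, x, y in rows:
    grouped[region][0].append(x)
    grouped[region][1].append(y)
per_group = {region: fit_line(xs, ys) for region, (xs, ys) in grouped.items()}

# Window shape (like OVER (ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)):
# one result per row, fitted on that row and every row before it.
north = [(x, y) for region, x, y in rows if region == "north"]
per_row = []
for t in range(1, len(north) + 1):
    xs, ys = zip(*north[:t])
    # A line needs at least two rows with distinct x; the first frame has one row, so emit None.
    per_row.append(fit_line(xs, ys) if t >= 2 else None)

print(per_group)
print(per_row)
```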
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| y | LIST(DOUBLE) / DOUBLE | Yes | - | Target values |
| x | LIST(LIST(DOUBLE)) / LIST(DOUBLE) | Yes | - | Predictor matrix |
| options | MAP | No | - | Configuration options |
Options MAP:
| Option | Type | Default | Description |
|---|---|---|---|
| fit_intercept | BOOLEAN | true | Include intercept term |
| compute_inference | BOOLEAN | false | Compute t-stats, p-values, CIs |
| confidence_level | DOUBLE | 0.95 | Confidence level for intervals |
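To make the inference outputs concrete, the sketch below computes them by hand for a one-predictor model. It reports the slope's standard error, t-statistic, two-sided p-value, a confidence interval at `confidence_level`, R-squared, and AIC/BIC. This is a Python illustration, not AnoFox's implementation. To stay within the standard library, it uses a normal approximation where exact OLS inference uses Student's t distribution with n - 2 degrees of freedom. The AIC/BIC values follow one common Gaussian-likelihood convention with constants dropped, so they may differ from the values AnoFox reports.

```python
import math
from statistics import NormalDist


def ols_with_inference(x, y, confidence_level=0.95):
    """Simple OLS (intercept + one slope) with hand-computed inference statistics."""
    n, k = len(x), 2  # k = number of estimated coefficients (intercept, slope)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = rss / (n - k)  # unbiased estimate of the residual variance

    se_slope = math.sqrt(sigma2 / sxx)
    t_slope = slope / se_slope

    # Normal approximation (assumption); exact inference would use Student's t with n - k df.
    z = NormalDist()
    p_slope = 2.0 * (1.0 - z.cdf(abs(t_slope)))
    crit = z.inv_cdf(0.5 + confidence_level / 2.0)

    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "t_slope": t_slope,
        "p_slope": p_slope,
        "ci_slope": (slope - crit * se_slope, slope + crit * se_slope),
        "r_squared": 1.0 - rss / tss,
        # Gaussian log-likelihood criteria with additive constants dropped (one common convention).
        "aic": n * math.log(rss / n) + 2 * k,
        "bic": n * math.log(rss / n) + k * math.log(n),
    }


result = ols_with_inference([1.0, 2.0, 3.0, 4.0, 5.0], [3.1, 4.9, 7.2, 8.8, 11.1])
print(result)
```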
Example
```sql
SELECT
    (model).coefficients[2] as marketing_effect,
    (model).p_values[2] as p_value,
    (model).r_squared as fit
FROM (
    SELECT anofox_stats_ols_fit_agg(
        revenue,
        [marketing_spend, seasonality_index],
        MAP {
            'fit_intercept': 'true',
            'compute_inference': 'true',
            'confidence_level': '0.95'
        }
    ) as model
    FROM sales_data
);
```
WLS - Weighted Least Squares
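Weighted Least Squares is a regression method that assigns a weight to each observation. Observations with lower variance get more influence, and the weights are typically proportional to the inverse of each observation's variance. Heteroscedasticity is the condition where the variance of the residuals is not constant across observations. It violates an OLS assumption. The OLS coefficients remain unbiased, but they are no longer the most efficient estimates. The usual OLS standard errors also become unreliable, and so do the t-statistics and p-values derived from them.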
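One way to see what WLS does: multiply each row of y and of the design matrix by the square root of its weight, including the intercept column of ones. The weighted problem then becomes an ordinary least squares problem. The minimal Python sketch below uses that identity, with weights set to the inverse of an assumed per-observation variance. The helper names and the variance model are illustrative, and AnoFox's own solver may work differently.

```python
import math


def ols_two_columns(c1, c2, y):
    """OLS of y on two regressor columns (no extra intercept), via the 2x2 normal equations."""
    a11 = sum(u * u for u in c1)
    a12 = sum(u * v for u, v in zip(c1, c2))
    a22 = sum(v * v for v in c2)
    b1 = sum(u * t for u, t in zip(c1, y))
    b2 = sum(v * t for v, t in zip(c2, y))
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def wls_line(x, y, w):
    """WLS for y = a + b * x: rescale each row by sqrt(w_i), then solve ordinary least squares."""
    s = [math.sqrt(wi) for wi in w]
    intercept_col = s  # column of ones, rescaled
    slope_col = [si * xi for si, xi in zip(s, x)]
    y_scaled = [si * yi for si, yi in zip(s, y)]
    return ols_two_columns(intercept_col, slope_col, y_scaled)


# Heteroscedastic setting: assume the noise variance grows with x (variance_i = x_i ** 2),
# so weights are inverse variances and the noisier high-x points count for less.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1]
weights = [1.0 / xi ** 2 for xi in x]

print("WLS (inverse-variance weights):", wls_line(x, y, weights))
print("OLS (equal weights):          ", wls_line(x, y, [1.0] * len(x)))
```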
Variants
- Scalar Fit: anofox_stats_wls_fit(y, x, weights, [options]) -> STRUCT
- Aggregate Fit: anofox_stats_wls_fit_agg(y, x, weight, [options]) -> STRUCT
- Window Predict: anofox_stats_wls_fit_predict(y, x, weight, [options]) OVER (...) -> STRUCT
- Batch Predict: anofox_stats_wls_predict_agg(y, x, weight, [options]) -> LIST(STRUCT)
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| y | LIST(DOUBLE) / DOUBLE | Yes | - | Target values |
| x | LIST(LIST(DOUBLE)) / LIST(DOUBLE) | Yes | - | Predictor matrix |
| weights / weight | LIST(DOUBLE) / DOUBLE | Yes | - | Observation weights |
| options | MAP | No | - | Configuration options |
Options MAP:
| Option | Type | Default | Description |
|---|---|---|---|
| fit_intercept | BOOLEAN | true | Include intercept term |
| compute_inference | BOOLEAN | false | Compute inference statistics |
| confidence_level | DOUBLE | 0.95 | Confidence level |
Example
```sql
SELECT anofox_stats_wls_fit_agg(
    y,
    [x1, x2],
    sample_size,  -- per-row observation weight
    MAP {'compute_inference': 'true'}
) as model
FROM aggregated_data;
```
When to use WLS:
- Residual variance changes with predictor values
- Aggregated data with different sample sizes
- Known measurement precision differences
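The "aggregated data with different sample sizes" case has a precise form when the predictor is constant within each group. Fitting WLS to the group means, with the group sizes as weights, gives exactly the same coefficients as OLS on the raw, disaggregated rows. An unweighted fit on the means generally does not. The Python sketch below checks this on made-up numbers. `weighted_line_fit` is a hypothetical helper, and the equivalence assumes x does not vary inside a group.

```python
def weighted_line_fit(x, y, w):
    """Return (intercept, slope) minimising sum(w_i * (y_i - a - b * x_i) ** 2)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up raw observations keyed by a group-level predictor value x.
raw = {1.0: [2.0, 4.0], 2.0: [6.0], 3.0: [6.0, 8.0, 7.0, 7.0]}

# OLS on every raw row (all weights 1).
raw_x = [x for x, ys in raw.items() for _ in ys]
raw_y = [y for ys in raw.values() for y in ys]
raw_fit = weighted_line_fit(raw_x, raw_y, [1.0] * len(raw_x))

# Aggregated table: one row per group with its mean and sample size.
agg_x = list(raw)
agg_y = [sum(ys) / len(ys) for ys in raw.values()]
sample_size = [float(len(ys)) for ys in raw.values()]

wls_fit = weighted_line_fit(agg_x, agg_y, sample_size)           # matches raw_fit
naive_fit = weighted_line_fit(agg_x, agg_y, [1.0] * len(agg_x))  # ignores sample sizes

print(raw_fit, wls_fit, naive_fit)
```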
RLS - Recursive Least Squares
Recursive Least Squares is an adaptive regression method that updates coefficient estimates incrementally as new observations arrive, without needing to re-fit the entire model. The forgetting factor controls how quickly old observations lose influence, making RLS suitable for streaming data and non-stationary relationships where the true coefficients change over time (concept drift).
Variants
- Scalar Fit: anofox_stats_rls_fit(y, x, [options]) -> STRUCT
- Aggregate Fit: anofox_stats_rls_fit_agg(y, x, [options]) -> STRUCT
- Window Predict: anofox_stats_rls_fit_predict(y, x, [options]) OVER (...) -> STRUCT
- Batch Predict: anofox_stats_rls_predict_agg(y, x, [options]) -> LIST(STRUCT)
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| y | LIST(DOUBLE) / DOUBLE | Yes | - | Target values |
| x | LIST(LIST(DOUBLE)) / LIST(DOUBLE) | Yes | - | Predictor matrix |
| options | MAP | No | - | Configuration options |
Options MAP:
| Option | Type | Default | Description |
|---|---|---|---|
| forgetting_factor | DOUBLE | 1.0 | Weight decay for past observations (0.9-1.0; 1.0 means no forgetting) |
| fit_intercept | BOOLEAN | true | Include intercept term |
| initial_p_diagonal | DOUBLE | 100.0 | Initial diagonal value of the covariance matrix P |
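The two numeric options correspond to the parameters of the textbook exponentially weighted RLS recursion. The Python sketch below shows one update step for each new (x, y) row:

- `forgetting_factor` (λ) divides the covariance matrix P after every step, so older information fades.
- `initial_p_diagonal` sets P = δI before the first row; a larger δ means weaker trust in the all-zero starting coefficients.

The class and method names are hypothetical. This is the standard algorithm, not AnoFox's source code.

```python
class RecursiveLeastSquares:
    """Textbook exponentially weighted RLS (illustrative; not AnoFox's implementation)."""

    def __init__(self, n_features, forgetting_factor=1.0, initial_p_diagonal=100.0, fit_intercept=True):
        self.lam = forgetting_factor
        self.fit_intercept = fit_intercept
        size = n_features + (1 if fit_intercept else 0)
        self.theta = [0.0] * size  # coefficient estimates, intercept first if fitted
        # P starts as initial_p_diagonal * identity.
        self.P = [[initial_p_diagonal if i == j else 0.0 for j in range(size)] for i in range(size)]

    def _regressors(self, x):
        return ([1.0] if self.fit_intercept else []) + list(x)

    def predict(self, x):
        phi = self._regressors(x)
        return sum(t * f for t, f in zip(self.theta, phi))

    def update(self, x, y):
        phi = self._regressors(x)
        size = len(phi)
        p_phi = [sum(self.P[i][j] * phi[j] for j in range(size)) for i in range(size)]  # P @ phi
        denom = self.lam + sum(f * v for f, v in zip(phi, p_phi))  # lambda + phi' P phi
        gain = [v / denom for v in p_phi]
        error = y - self.predict(x)  # prediction error before the update
        self.theta = [t + g * error for t, g in zip(self.theta, gain)]
        # P <- (P - gain * phi' P) / lambda; phi' P equals (P phi)' because P is symmetric.
        self.P = [[(self.P[i][j] - gain[i] * p_phi[j]) / self.lam for j in range(size)]
                  for i in range(size)]
        return list(self.theta)


model = RecursiveLeastSquares(n_features=1, forgetting_factor=1.0, initial_p_diagonal=100.0)
for t in range(20):
    model.update([float(t)], 1.0 + 2.0 * t)  # noiseless y = 1 + 2x
# Approximately [1, 2]; the prior P = 100 * I adds a small shrinkage toward zero.
print(model.theta)
```

With `forgetting_factor` at 1.0, this recursion reproduces a least squares fit over all rows seen so far, apart from the small pull toward zero caused by the initial P. Values below 1.0 trade some noise robustness for faster adaptation.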
Example
```sql
SELECT
    timestamp,
    (adaptive_model).coefficients[2] as current_beta
FROM (
    SELECT
        timestamp,
        anofox_stats_rls_fit_predict(
            y,
            [x],
            MAP {'forgetting_factor': '0.95'}
        ) OVER (
            ORDER BY timestamp
            -- expanding frame: all rows from the start up to the current row
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) as adaptive_model
    FROM sensor_data
);
```
When to use RLS:
- Real-time model updates
- Non-stationary relationships
- Concept drift detection
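The non-stationary case can be illustrated with a relationship whose slope jumps from 2 to 5 halfway through a stream. The Python sketch below runs the standard RLS recursion over noiseless, made-up data, once with forgetting_factor 1.0 and once with 0.95. With 1.0, every observation keeps full weight, so the final slope estimate lands between the old and new values. With 0.95, the pre-change rows are discounted and the estimate tracks the new slope. `rls_track` is a hypothetical helper written for this illustration.

```python
def rls_track(xs, ys, forgetting_factor, initial_p_diagonal=100.0):
    """Run exponentially weighted RLS for y = a + b * x; return (a, b) after every row."""
    lam = forgetting_factor
    theta = [0.0, 0.0]  # [intercept, slope]
    P = [[initial_p_diagonal, 0.0], [0.0, initial_p_diagonal]]
    history = []
    for x, y in zip(xs, ys):
        phi = [1.0, x]
        p_phi = [P[i][0] * phi[0] + P[i][1] * phi[1] for i in range(2)]  # P @ phi
        denom = lam + phi[0] * p_phi[0] + phi[1] * p_phi[1]
        gain = [v / denom for v in p_phi]
        error = y - (theta[0] * phi[0] + theta[1] * phi[1])  # prediction error before update
        theta = [theta[i] + gain[i] * error for i in range(2)]
        P = [[(P[i][j] - gain[i] * p_phi[j]) / lam for j in range(2)] for i in range(2)]
        history.append((theta[0], theta[1]))
    return history


# Made-up stream: y = 1 + 2x for the first 100 rows, then y = 1 + 5x (concept drift).
xs = [(t % 10) / 10.0 for t in range(200)]
ys = [1.0 + (2.0 if t < 100 else 5.0) * x for t, x in enumerate(xs)]

static = rls_track(xs, ys, forgetting_factor=1.0)
adaptive = rls_track(xs, ys, forgetting_factor=0.95)

print("final slope, forgetting_factor=1.0: ", round(static[-1][1], 2))
print("final slope, forgetting_factor=0.95:", round(adaptive[-1][1], 2))
```

Comparing the two coefficient histories, or watching the one-step prediction error jump, is one simple way to flag a drift point.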
The three linear model types cover distinct use cases in the S&OP pipeline. OLS is the standard choice for batch regression with full inference output. WLS is the right tool when combining data from sources with different sample sizes or measurement precision. This is common with regional sales data, where some regions have 10x more observations than others: an average built from more observations has lower variance and should carry more weight. RLS with a forgetting factor of 0.95 has an effective memory of 1/(1 - 0.95) = 20 observations. This enables real-time coefficient tracking for sensor data, pricing models, or any relationship that shifts over time.
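The 20-observation figure comes from summing the geometric weights λ^k over all past steps k = 0, 1, 2, ..., which gives 1/(1 - λ). The small Python helper below computes that effective memory together with the half-life, the number of steps after which an observation's weight has halved. Both are standard rules of thumb for choosing a forgetting factor, and the function names are illustrative.

```python
import math


def effective_memory(forgetting_factor):
    """Sum of the weights lambda**k for k = 0, 1, 2, ...: 1 / (1 - lambda)."""
    if forgetting_factor >= 1.0:
        return math.inf  # no forgetting: every past observation keeps full weight
    return 1.0 / (1.0 - forgetting_factor)


def half_life(forgetting_factor):
    """Number of steps h with lambda**h == 0.5, i.e. when an observation's weight has halved."""
    if forgetting_factor >= 1.0:
        return math.inf
    return math.log(0.5) / math.log(forgetting_factor)


for lam in (0.9, 0.95, 0.99, 1.0):
    print(f"lambda={lam}: effective memory {effective_memory(lam):.1f}, half-life {half_life(lam):.1f}")
```

For λ = 0.95, this gives an effective memory of 20 observations and a half-life of roughly 13.5 observations. The lower end of the allowed range, 0.9, corresponds to an effective memory of about 10 observations.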