Generalized Linear Models

AnoFox provides two generalized model types: a Poisson GLM for count data (defects, arrivals, event counts) and the Augmented Linear Model (ALM), which supports 24 error distributions, including Student-t, Cauchy, Laplace, Huber, Weibull, Gamma, and Log-Normal. Poisson GLM coefficients are on the log scale: exp(coefficient) gives the rate ratio for a one-unit change in the predictor. ALM is the natural choice when residuals violate the normality assumption, because it offers robust estimation that resists outlier contamination.

A Generalized Linear Model (GLM) extends ordinary linear regression to handle response variables that follow non-Gaussian distributions, such as counts, binary outcomes, or positive continuous values. It uses a link function to connect the linear predictor to the expected value of the response. The GLM framework unifies regression for exponential family distributions.
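To make the role of the link function concrete, here is a minimal Python sketch, independent of AnoFox. It computes the linear predictor and maps it back to the expected response through an inverse link. The function name `predict_mean`, the `INVERSE_LINKS` table, and all numbers are hypothetical and chosen only for illustration.

```python
import math

# Inverse link functions map the linear predictor eta back to the mean mu.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                        # ordinary (Gaussian) regression
    "log": lambda eta: math.exp(eta),                   # counts (Poisson)
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # binary outcomes
}

def predict_mean(intercept, coefficients, x, link="log"):
    """Return E[y | x] for a GLM with the given (hypothetical) coefficients."""
    eta = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return INVERSE_LINKS[link](eta)

# Illustrative numbers only, not fitted values.
mu = predict_mean(0.5, [0.1, -0.2], [2.0, 1.0], link="log")
print(mu)  # expected count implied by the linear predictor
```

With a log link the predicted mean is always positive, which is why that link suits count data.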


Poisson GLM

The Poisson GLM is a generalized linear model that uses a log link function and a Poisson distribution to model count data: non-negative integers such as event counts, defects per unit, or arrivals per hour.
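As a rough illustration of what fitting such a model involves, the following Python sketch estimates a one-predictor Poisson regression with a log link by Newton's method. It is not AnoFox's implementation, and the fitting algorithm AnoFox uses is not stated here. The function `fit_poisson_log_link` and the data are made up for this example; the `max_iterations` and `tolerance` arguments only mirror the option names listed in the parameter table.

```python
import math

def fit_poisson_log_link(x, y, max_iterations=50, tolerance=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton's method (one predictor)."""
    b0 = math.log(sum(y) / len(y))  # start at the intercept-only fit
    b1 = 0.0
    for _ in range(max_iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system in closed form.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tolerance:
            break
    return b0, b1

# Made-up counts for illustration.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 4, 7, 12]
b0, b1 = fit_poisson_log_link(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
print(b0, b1)
```

At convergence the fitted means sum to the observed total, a property of Poisson regression with an intercept and a log link.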

Parameters

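| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| y | DOUBLE | Yes | - | Count target (non-negative integers) |
| x | LIST(DOUBLE) | Yes | - | Predictors |
| options | MAP | No | - | fit_intercept, max_iterations, tolerance |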

Example

SELECT
    production_line,
    (model).coefficients[2] AS temperature_effect,  -- log-scale coefficient
    exp((model).coefficients[2]) AS rate_ratio,     -- multiplicative effect on the defect rate
    (model).p_values[2] AS pvalue
FROM (
    -- Fit one Poisson model per production line
    SELECT
        production_line,
        anofox_stats_poisson_fit_agg(
            defect_count,
            [temperature, humidity, shift_hours]
        ) AS model
    FROM quality_data
    GROUP BY production_line
);

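Interpretation: Coefficients are on the log scale, so exp(coefficient) is the rate ratio: the multiplicative change in the expected count for a one-unit increase in that predictor, holding the other predictors fixed.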
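The arithmetic behind this interpretation is small enough to sketch in Python. The helper names `rate_ratio` and `percent_change` and the coefficient value 0.05 are hypothetical, chosen only to show the conversion from a log-scale coefficient.

```python
import math

def rate_ratio(coefficient, delta=1.0):
    """Multiplicative change in the expected count for a delta-unit increase."""
    return math.exp(coefficient * delta)

def percent_change(coefficient, delta=1.0):
    """The same effect expressed as a percentage change in the expected count."""
    return (rate_ratio(coefficient, delta) - 1.0) * 100.0

rr = rate_ratio(0.05)         # about 1.051
pct = percent_change(0.05)    # about 5.1 percent more events per unit
rr10 = rate_ratio(0.05, 10)   # about 1.649 for a 10-unit increase
print(rr, pct, rr10)
```

Effects multiply rather than add: a 10-unit increase scales the expected count by the one-unit rate ratio raised to the tenth power, not by ten times the one-unit percentage.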


ALM - Augmented Linear Model

The Augmented Linear Model (ALM) is a robust regression framework that replaces the fixed Gaussian error assumption with a choice of 24 error distributions. By fitting the model under an error distribution that matches the data (e.g., Student-t for heavy tails, Huber for outlier resistance, Weibull for survival data), ALM produces more efficient and reliable estimates than OLS when the normality assumption is violated.

Parameters

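| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| y | DOUBLE | Yes | - | Target values |
| x | LIST(DOUBLE) | Yes | - | Predictors |
| options | MAP | Yes | - | distribution, fit_intercept, max_iterations, tolerance |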

Supported Distributions

| Distribution | Use Case |
|--------------|----------|
| normal | Standard regression |
| student_t | Heavy tails, outliers |
| cauchy | Extreme outliers |
| laplace | LAD (median) regression |
| huber | Robust with breakdown |
| weibull | Survival, reliability |
| gamma | Positive, right-skewed |
| log_normal | Multiplicative errors |
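To see why the choice of error distribution matters, consider the simplest case, an intercept-only model. Under normal errors the maximum-likelihood estimate is the mean, under Laplace errors it is the median, and a Huber loss gives a compromise between the two. The Python sketch below compares the three on made-up data with one outlier. It is an illustration of the idea, not AnoFox's estimator; `huber_location` is a hypothetical helper, and `k = 1.345` is the conventional Huber tuning constant.

```python
import statistics

def huber_location(values, k=1.345, iterations=100):
    """Huber M-estimate of location via iteratively reweighted averaging."""
    center = statistics.median(values)
    # Robust scale from the median absolute deviation, rescaled for normal data.
    mad = statistics.median([abs(v - center) for v in values])
    scale = mad / 0.6745 or 1.0  # fall back to 1.0 if the MAD is zero
    for _ in range(iterations):
        # Points within k * scale keep full weight; farther points are downweighted.
        weights = [
            1.0 if v == center else min(1.0, k * scale / abs(v - center))
            for v in values
        ]
        center = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return center

# Made-up measurements with a single gross outlier.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 55.0]
normal_estimate = statistics.mean(data)     # pulled far toward the outlier
laplace_estimate = statistics.median(data)  # barely affected by it
huber_estimate = huber_location(data)       # close to the bulk of the data
print(normal_estimate, laplace_estimate, huber_estimate)
```

The same logic carries over to regression with predictors: heavier-tailed or bounded-influence error models keep a few extreme observations from dominating the fitted coefficients.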

Example

SELECT anofox_stats_alm_fit_agg(
    revenue,
    [marketing_spend, competitor_activity],
    MAP {
        'distribution': 'student_t',
        'fit_intercept': 'true'
    }
) AS model
FROM sales_with_outliers;

When to use ALM:

  • Data has heavy tails or outliers
  • Non-normal error distributions
  • Robust estimation needed
