
Regression Model Types

Compare OLS, Ridge, WLS, RLS, and Elastic Net, and learn when to use each.

1. OLS (Ordinary Least Squares)

The Foundation - Linear regression workhorse

SELECT * FROM anofox_statistics_ols(
'sales_data',
'revenue',
ARRAY['marketing_spend', 'team_size']
);

Properties:

  • ✅ BLUE (Best Linear Unbiased Estimator) under Gauss-Markov assumptions
  • ✅ Simple, interpretable, fast
  • ✅ Standard statistical inference (t-tests, p-values)
  • ❌ Sensitive to multicollinearity
  • ❌ Sensitive to outliers

When to use:

  • First choice for most problems
  • Clean data without extreme outliers
  • Few correlated predictors
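As a rough illustration of what an OLS fit computes, the sketch below solves the normal equations (X'X) b = X'y in plain Python. It is not the extension's implementation; the helper names `solve` and `ols_fit` and the toy data are assumptions made for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_fit(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared residuals."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical (marketing_spend, team_size) rows; revenue is generated
# exactly as 10 + 2*spend + 3*team, so the fit should recover those numbers.
rows = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5)]
revenue = [10 + 2 * s + 3 * t for s, t in rows]
coef = ols_fit(rows, revenue)
print(coef)
```

Because the toy data contains no noise, the recovered coefficients match the generating ones up to floating-point error. With real data the same equations give the least-squares estimate, and they become ill-conditioned when predictors are strongly correlated.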

2. Ridge Regression

Handle Multicollinearity - L2 Regularization

SELECT * FROM anofox_statistics_ridge(
'sales_data',
'revenue',
ARRAY['marketing_spend', 'team_size'],
MAP_CREATE(ARRAY['lambda'], ARRAY['0.5'])
);

Properties:

  • ✅ Handles correlated predictors
  • ✅ Reduces overfitting
  • ✅ Stable coefficient estimates
  • ❌ Coefficients biased (but lower variance)
  • ❌ Less interpretable than OLS

When to use:

  • VIF > 10 (multicollinearity detected)
  • Many predictors relative to observations
  • Prediction priority over interpretation
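To make the shrinkage effect concrete, here is a minimal sketch of the ridge solution b = (X'X + lambda*I)^-1 X'y for two centered predictors. It assumes lambda is added directly to the diagonal without any scaling by the number of rows; the extension may scale it differently. The function name `ridge_2d` and the data are hypothetical.

```python
def ridge_2d(x1, x2, y, lam):
    """Ridge coefficients for two centered predictors (no intercept).

    Solves (X'X + lam*I) b = X'y using the explicit 2x2 inverse.
    """
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Two nearly collinear, centered predictors (hypothetical values).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.0, 1.1, 1.9]
y = [-4.2, -1.9, 0.1, 2.0, 4.0]

b_ols = ridge_2d(x1, x2, y, lam=0.0)    # lam = 0 reduces to OLS
b_ridge = ridge_2d(x1, x2, y, lam=0.5)  # same lambda as the query above


def norm_sq(b):
    return b[0] ** 2 + b[1] ** 2


print(b_ols, b_ridge)
```

With lam = 0 the two coefficients differ noticeably even though the predictors carry almost the same information. With lam = 0.5 they move toward each other and the coefficient vector gets shorter, which is the bias-for-variance trade described above.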

3. WLS (Weighted Least Squares)

Handle Heteroscedasticity - Variable Reliability

SELECT * FROM anofox_statistics_wls(
'sales_data',
'revenue',
ARRAY['marketing_spend'],
'reliability_weight'
);

Properties:

  • ✅ Handles unequal variance
  • ✅ Emphasizes reliable observations
  • ✅ More efficient than OLS on heteroscedastic data, provided the weights reflect each observation's precision (ideally inverse variance)
  • ❌ Requires weight specification
  • ❌ More complex interpretation

When to use:

  • Data quality varies across observations
  • Financial data (volatility clustering)
  • When some measurements are more reliable
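The effect of weighting can be sketched for a single predictor, where WLS minimizes sum(w_i * (y_i - a - b*x_i)^2) and has a closed form based on weighted means. The function name `wls_fit`, the data, and the weights are assumptions chosen for illustration.

```python
def wls_fit(x, y, w):
    """Weighted least squares for one predictor; returns (intercept, slope)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# The first four points follow y = 2x; the last one is an unreliable measurement.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 20.0]

equal_weights = [1.0] * 5
reliability_weight = [1.0, 1.0, 1.0, 1.0, 0.01]  # down-weight the last point

_, slope_equal = wls_fit(x, y, equal_weights)        # identical to OLS
_, slope_weighted = wls_fit(x, y, reliability_weight)
print(slope_equal, slope_weighted)
```

With equal weights the unreliable point pulls the slope well away from 2; once it is down-weighted the slope stays close to the trend of the reliable observations.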

4. RLS (Recursive Least Squares)

Online Learning - Adaptive Updates

SELECT * FROM anofox_statistics_rls(
'streaming_data',
'revenue',
ARRAY['marketing_spend'],
MAP_CREATE(ARRAY['forgetting_factor'], ARRAY['0.98'])
);

Properties:

  • ✅ Real-time streaming updates
  • ✅ Captures concept drift
  • ✅ Memory efficient
  • ❌ Requires initial window
  • ❌ Less statistical precision than batch

When to use:

  • Streaming data (online learning)
  • Changing relationships over time
  • Real-time model updates
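A minimal sketch of one RLS step follows, assuming the textbook exponentially weighted form in which a forgetting factor below 1 discounts older rows. Whether the extension uses exactly this update is not confirmed by this page; `rls_update`, the initial covariance, and the simulated stream are hypothetical.

```python
def rls_update(theta, P, x, y, lam=0.98):
    """One recursive-least-squares step with forgetting factor lam (0 < lam <= 1)."""
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    k = [v / denom for v in Px]                        # gain vector
    err = y - sum(t * xi for t, xi in zip(theta, x))   # error before the update
    theta = [t + ki * err for t, ki in zip(theta, k)]
    # P <- (P - k x'P) / lam; x'P is Px transposed because P is symmetric.
    P = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return theta, P


# Simulated stream: revenue = 1 + 2*spend for the first 50 rows,
# then the relationship drifts to revenue = 1 + 5*spend.
theta = [0.0, 0.0]                      # [intercept, slope]
P = [[1000.0, 0.0], [0.0, 1000.0]]      # large initial covariance = weak prior
for t in range(450):
    spend = float(t % 10)
    slope = 2.0 if t < 50 else 5.0
    revenue = 1.0 + slope * spend
    theta, P = rls_update(theta, P, [1.0, spend], revenue, lam=0.98)

print(theta)
```

Each step touches only the current row, the coefficient vector, and a small matrix, which is why memory use stays constant. After the drift, the estimate follows the new slope because rows from the old regime are discounted by 0.98 per step.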

5. Elastic Net

Combined Regularization - L1 + L2

SELECT * FROM anofox_statistics_elastic_net(
'sales_data',
'revenue',
ARRAY['marketing_spend', 'team_size', 'has_campaign'],
MAP_CREATE(
ARRAY['alpha', 'lambda'],
ARRAY['0.5', '0.1']
)
);

Properties:

  • ✅ Feature selection (like LASSO): the L1 penalty can set coefficients exactly to zero
  • ✅ Handles multicollinearity (like Ridge)
  • ✅ Selection happens during fitting, with no separate stepwise procedure
  • ❌ More complex tuning
  • ❌ Interpretability reduced

When to use:

  • High-dimensional data (many predictors)
  • Want automatic feature selection
  • Combination of ridge and LASSO benefits
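The sketch below shows how the L1 part of the penalty produces exact zeros. It assumes the glmnet-style objective (1/2n)*||y - Xb||^2 + lambda*(alpha*|b|_1 + (1 - alpha)/2*||b||^2), where alpha mixes L1 and L2 and lambda sets the overall strength; the page does not state which parametrization the extension uses. The function names and the small orthogonal design are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def elastic_net(X, y, alpha=0.5, lam=0.1, n_iter=200):
    """Coordinate descent for the elastic net (no intercept, centered data)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that ignores column j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
    return b


# Three orthogonal predictors; the third barely influences y.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3 * r[0] + 1 * r[1] + 0.02 * r[2] for r in X]

coef = elastic_net(X, y, alpha=0.5, lam=0.1)
print(coef)
```

The two informative coefficients are shrunk below their generating values of 3 and 1, while the third is dropped because its correlation with the residual falls under the L1 threshold lam * alpha.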

Model Comparison Matrix

Feature | OLS | Ridge | WLS | RLS | Elastic Net
Multicollinearity | ❌ | ✅ | | | ✅
Outlier Robust | ❌ | | | |
Feature Selection | | | | | ✅
Streaming Data | | | | ✅ |
Interpretability | ✅ | ⚠️ | | | ⚠️
Speed | | | | |
Inference | ✅ | | | |

Decision Tree

START: Choosing a Regression Model

├─ Is data streaming/online?
│   ├─ YES → Use RLS (Recursive Least Squares)
│   └─ NO → Continue
│
├─ Do you have many correlated predictors?
│   ├─ YES → Use Ridge or Elastic Net
│   └─ NO → Continue
│
├─ Is variance unequal across observations?
│   ├─ YES → Use WLS (Weighted Least Squares)
│   └─ NO → Continue
│
└─ Want automatic feature selection?
    ├─ YES → Use Elastic Net
    └─ NO → Use OLS (default)
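The same sequence of questions can be written as a small helper. This is only a sketch of the tree as drawn, with the checks applied in the same order; the function name `choose_model` and its boolean arguments are hypothetical, and how you answer each question (for example via VIF or a residual plot) is up to you.

```python
def choose_model(streaming, correlated_predictors, unequal_variance,
                 want_feature_selection):
    """Walk the decision tree top to bottom and return the suggested model."""
    if streaming:
        return "RLS"
    if correlated_predictors:
        return "Ridge or Elastic Net"
    if unequal_variance:
        return "WLS"
    if want_feature_selection:
        return "Elastic Net"
    return "OLS"


# Batch data, correlated channels, no other concerns.
print(choose_model(streaming=False, correlated_predictors=True,
                   unequal_variance=False, want_feature_selection=False))
```

Because the checks run in order, an earlier YES wins: streaming data leads to RLS even if the predictors are also correlated.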

Example: Choosing for Marketing ROI

Scenario: Model revenue from 5 marketing channels + team size

Check 1: Multicollinearity?

SELECT * FROM anofox_statistics_vif(
'marketing_data',
ARRAY['channel1', 'channel2', 'channel3', 'channel4', 'channel5', 'team_size']
);
-- If any VIF > 10 → Use Ridge
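For intuition about the threshold, VIF for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. With only two predictors, R_j^2 is the squared Pearson correlation, which the sketch below uses. The function names and the channel values are hypothetical, and the two-predictor shortcut does not generalize to the six-column query above.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def vif_two_predictors(a, b):
    """VIF shared by both predictors when the model has exactly two of them."""
    r = pearson(a, b)
    return 1.0 / (1.0 - r * r)


channel1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
channel2 = [1.1, 2.0, 2.9, 4.2, 5.0, 5.9]      # moves almost in lockstep with channel1
team_size = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # weakly related to channel1

vif_high = vif_two_predictors(channel1, channel2)
vif_low = vif_two_predictors(channel1, team_size)
print(vif_high, vif_low)
```

The near-duplicate channel pair lands far above the threshold of 10, while the weakly related pair stays close to 1, the value for uncorrelated predictors.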

Check 2: Outliers?

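-- Count rows more than 3 standard deviations from the mean, in either direction
SELECT COUNT(*) FROM marketing_data
WHERE ABS(revenue - (SELECT AVG(revenue) FROM marketing_data))
      > 3 * (SELECT STDDEV(revenue) FROM marketing_data);
-- If many outliers → Use WLS with weights that down-weight them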

Decision: the largest VIF is 12 (above 10, so use Ridge), and there are no outliers (so WLS is not needed).

SELECT * FROM anofox_statistics_ridge(...);

