Regression Model Types
Compare OLS, Ridge, WLS, RLS, and Elastic Net—understand when to use each.
1. OLS (Ordinary Least Squares)
The Foundation - Linear regression workhorse
SELECT * FROM anofox_statistics_ols(
'sales_data',
'revenue',
ARRAY['marketing_spend', 'team_size']
);
Properties:
- ✅ BLUE (Best Linear Unbiased Estimator) under Gauss-Markov assumptions
- ✅ Simple, interpretable, fast
- ✅ Standard statistical inference (t-tests, p-values)
- ❌ Sensitive to multicollinearity
- ❌ Sensitive to outliers
When to use:
- First choice for most problems
- Clean data without extreme outliers
- Few correlated predictors
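To make the least-squares fit concrete, here is a minimal Python sketch that solves the normal equations (XᵀX)β = Xᵀy by Gaussian elimination. It only illustrates the technique and is not the implementation behind anofox_statistics_ols; the helper names and the toy data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data: revenue = 10 + 2 * marketing_spend + 5 * team_size
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
revenue = [10 + 2 * m + 5 * t for m, t in rows]
coefficients = fit_ols(rows, revenue)
print([round(c, 6) for c in coefficients])
```

Because the toy data is noise-free, the fit should recover the generating coefficients (intercept 10, slopes 2 and 5) up to floating-point error.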
2. Ridge Regression
Handle Multicollinearity - L2 Regularization
SELECT * FROM anofox_statistics_ridge(
'sales_data',
'revenue',
ARRAY['marketing_spend', 'team_size'],
MAP_CREATE(ARRAY['lambda'], ARRAY['0.5'])
);
Properties:
- ✅ Handles correlated predictors
- ✅ Reduces overfitting
- ✅ Stable coefficient estimates
- ❌ Coefficients biased (but lower variance)
- ❌ Less interpretable than OLS
When to use:
- VIF > 10 (multicollinearity detected)
- Many predictors relative to observations
- Prediction priority over interpretation
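The stabilizing effect of the L2 penalty can be sketched in a few lines of Python. The example assumes two centered predictors with no intercept and solves (XᵀX + λI)β = Xᵀy directly; the function name, the data, and the way λ is scaled are assumptions for illustration and may not match how anofox_statistics_ridge treats its lambda option.

```python
def fit_two_predictors(x1, x2, y, lam=0.0):
    """Solve (X'X + lam * I) beta = X'y for two centered predictors, no intercept."""
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    r1 = sum(a * c for a, c in zip(x1, y))
    r2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12  # close to zero when x1 and x2 are nearly collinear
    return ((r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det)


# Hypothetical data: x2 is x1 plus a tiny wiggle, so the predictors are almost collinear.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
wiggle = [0.01, -0.01, 0.0, 0.01, -0.01]
x2 = [a + d for a, d in zip(x1, wiggle)]
# The response is x1 + x2 plus a small disturbance.
y = [a + b + 10 * d for a, b, d in zip(x1, x2, wiggle)]

ols = fit_two_predictors(x1, x2, y, lam=0.0)
ridge = fit_two_predictors(x1, x2, y, lam=0.5)
print("OLS:  ", [round(b, 3) for b in ols])
print("Ridge:", [round(b, 3) for b in ridge])
```

On this data the unpenalized fit returns large coefficients of opposite sign, while the penalized fit returns two coefficients close to 1. That is the trade described above: a little bias in exchange for much lower variance.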
3. WLS (Weighted Least Squares)
Handle Heteroscedasticity - Variable Reliability
SELECT * FROM anofox_statistics_wls(
'sales_data',
'revenue',
ARRAY['marketing_spend'],
'reliability_weight'
);
Properties:
- ✅ Handles unequal variance
- ✅ Emphasizes reliable observations
- ✅ More efficient than OLS on heteroscedastic data, provided the weights reflect each observation's inverse error variance
- ❌ Requires weight specification
- ❌ More complex interpretation
When to use:
- Data quality varies across observations
- Financial data (volatility clustering)
- When some measurements are more reliable
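As a rough illustration of how weights change the fit, the Python sketch below implements weighted simple regression in closed form, minimizing Σ wᵢ(yᵢ − a − b·xᵢ)². The data and the reliability_weight values are hypothetical, and the sketch is not the extension's implementation.

```python
def fit_wls(x, y, w):
    """Weighted simple regression; returns (intercept, slope)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: y = 1 + 2x, except the last measurement is unreliable.
marketing_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [3.0, 5.0, 7.0, 9.0, 20.0]

equal_weight = [1.0] * 5
reliability_weight = [1.0, 1.0, 1.0, 1.0, 0.01]  # trust the last point far less

_, ols_slope = fit_wls(marketing_spend, revenue, equal_weight)
_, wls_slope = fit_wls(marketing_spend, revenue, reliability_weight)
print(round(ols_slope, 3), round(wls_slope, 3))
```

With equal weights the routine reduces to OLS and the unreliable point pulls the slope well away from 2. Downweighting that point brings the slope back close to 2.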
4. RLS (Recursive Least Squares)
Online Learning - Adaptive Updates
SELECT * FROM anofox_statistics_rls(
'streaming_data',
'revenue',
ARRAY['marketing_spend'],
MAP_CREATE(ARRAY['forgetting_factor'], ARRAY['0.98'])
);
Properties:
- ✅ Real-time streaming updates
- ✅ Captures concept drift
- ✅ Memory efficient
- ❌ Requires initial window
- ❌ Less statistical precision than batch
When to use:
- Streaming data (online learning)
- Changing relationships over time
- Real-time model updates
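The role of the forgetting factor is easier to see in code. The following Python sketch runs the standard RLS recursion for a single predictor with no intercept. It assumes the textbook update equations and a large initial covariance; both are assumptions about initialization, and none of this is taken from anofox_statistics_rls.

```python
def rls_stream(xs, ys, forgetting_factor=0.98, initial_p=1000.0):
    """Yield the slope estimate after each observation (one predictor, no intercept)."""
    theta = 0.0    # current coefficient estimate
    p = initial_p  # inverse of the discounted sum of x^2; large means "uninformed"
    for x, y in zip(xs, ys):
        gain = p * x / (forgetting_factor + x * p * x)
        theta += gain * (y - x * theta)  # correct by the prediction error
        p = (p - gain * x * p) / forgetting_factor  # discount old information
        yield theta


# Hypothetical stream: the true slope is 2 for 200 steps, then drifts to 5.
xs = [float(1 + i % 3) for i in range(400)]
ys = [(2.0 if i < 200 else 5.0) * x for i, x in enumerate(xs)]

estimates = list(rls_stream(xs, ys))
print(round(estimates[199], 3), round(estimates[-1], 3))
```

Each update touches only two scalars, which is why the method is memory efficient. A forgetting factor below 1 lets the estimate move from the old slope toward the new one after the change; values closer to 1 adapt more slowly but give smoother estimates.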
5. Elastic Net
Combined Regularization - L1 + L2
SELECT * FROM anofox_statistics_elastic_net(
'sales_data',
'revenue',
ARRAY['marketing_spend', 'team_size', 'has_campaign'],
MAP_CREATE(
ARRAY['alpha', 'lambda'],
ARRAY['0.5', '0.1']
)
);
Properties:
- ✅ Feature selection (like LASSO)
- ✅ Handles multicollinearity (like Ridge)
- ✅ Tends to keep or drop groups of correlated predictors together
- ❌ More complex tuning
- ❌ Interpretability reduced
When to use:
- High-dimensional data (many predictors)
- Want automatic feature selection
- You want the benefits of both Ridge and LASSO
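How the L1 part produces exact zeros while the L2 part shrinks the rest can be sketched with coordinate descent in Python. The sketch assumes the common objective (1/2n)·‖y − Xβ‖² + λ·[α·‖β‖₁ + (1 − α)/2·‖β‖²], centered predictors, and no intercept. Whether the extension's alpha and lambda options follow this convention is an assumption, and the feature names and data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; anything inside the band becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(columns, y, alpha=0.5, lam=0.1, sweeps=200):
    """Coordinate descent on centered data, no intercept."""
    n = len(y)
    beta = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Partial residual: remove every predictor's contribution except j's.
            partial = [
                yi - sum(beta[k] * columns[k][i] for k in range(len(columns)) if k != j)
                for i, yi in enumerate(y)
            ]
            rho = sum(c * r for c, r in zip(col, partial)) / n
            z = sum(c * c for c in col) / n
            # L1 part thresholds, L2 part enlarges the denominator.
            beta[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
    return beta


# Hypothetical centered features; the third has only a very weak effect.
feature_a = [-2.0, -1.0, 0.0, 1.0, 2.0]
feature_b = [2.0, -1.0, -2.0, -1.0, 2.0]
feature_c = [-1.0, 2.0, 0.0, -2.0, 1.0]
y = [3 * a + 1 * b + 0.02 * c for a, b, c in zip(feature_a, feature_b, feature_c)]

beta = fit_elastic_net([feature_a, feature_b, feature_c], y, alpha=0.5, lam=0.1)
print([round(b, 4) for b in beta])
```

With these settings the weak third predictor is set to exactly zero, and the two strong coefficients are shrunk slightly below their generating values of 3 and 1.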
Model Comparison Matrix
| Feature | OLS | Ridge | WLS | RLS | Elastic Net |
|---|---|---|---|---|---|
| Multicollinearity | ❌ | ✅ | ❌ | ❌ | ✅ |
| Outlier Robust | ❌ | ❌ | ⚠️ (via weights) | ❌ | ❌ |
| Feature Selection | ❌ | ❌ | ❌ | ❌ | ✅ |
| Streaming Data | ❌ | ❌ | ❌ | ✅ | ❌ |
| Interpretability | ✅ | ⚠️ | ✅ | ✅ | ⚠️ |
| Speed | ⚡⚡⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡⚡ | ⚡ |
| Inference | ✅ | ⚠️ | ✅ | ✅ | ⚠️ |
Decision Tree
START: Choosing a Regression Model
│
├─ Is data streaming/online?
│  ├─ YES → Use RLS (Recursive Least Squares)
│  └─ NO → Continue
│
├─ Do you have many correlated predictors?
│  ├─ YES → Use Ridge or Elastic Net
│  └─ NO → Continue
│
├─ Is variance unequal across observations?
│  ├─ YES → Use WLS (Weighted Least Squares)
│  └─ NO → Continue
│
└─ Want automatic feature selection?
   ├─ YES → Use Elastic Net
   └─ NO → Use OLS (default)
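The tree above can be written as a small Python function, which makes the order of the checks explicit. The function and argument names are hypothetical. The tree leaves "Ridge or Elastic Net" open, so the sketch assumes the wish for feature selection breaks that tie.

```python
def choose_model(streaming, correlated_predictors, unequal_variance, want_feature_selection):
    """Walk the decision tree top to bottom and return the first matching model."""
    if streaming:
        return "RLS"
    if correlated_predictors:
        # Assumption: feature selection decides between the two options the tree offers.
        return "Elastic Net" if want_feature_selection else "Ridge"
    if unequal_variance:
        return "WLS"
    if want_feature_selection:
        return "Elastic Net"
    return "OLS"


print(choose_model(False, True, False, False))
print(choose_model(False, False, False, False))
```

Because the checks run in order, a streaming workload is routed to RLS even if its predictors are also correlated. Reorder the conditions if a different priority suits your data.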
Example: Choosing for Marketing ROI
Scenario: Model revenue from 5 marketing channels + team size
Check 1: Multicollinearity?
SELECT * FROM anofox_statistics_vif(
'marketing_data',
ARRAY['channel1', 'channel2', 'channel3', 'channel4', 'channel5', 'team_size']
);
-- If any VIF > 10 → Use Ridge
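For intuition about what the VIF threshold measures: VIF for predictor j is 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing that predictor on all the others. With only two predictors, R²ⱼ is simply their squared correlation, which the Python sketch below uses. The general case needs a full regression per predictor. The column values are hypothetical, and this is not how anofox_statistics_vif is implemented.

```python
import math


def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(a, b):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


# Hypothetical columns: channel2 tracks channel1 closely, team_size does not.
channel1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
channel2 = [1.1, 2.0, 2.9, 4.2, 4.8, 6.1]
team_size = [3.0, 1.0, 2.0, 3.0, 1.0, 2.0]

vif_channels = vif_two_predictors(channel1, channel2)
vif_team = vif_two_predictors(channel1, team_size)
print(round(vif_channels, 1), round(vif_team, 2))
```

The nearly duplicate channels produce a VIF far above 10, while the unrelated pair stays close to 1, the value for perfectly uncorrelated predictors.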
Check 2: Outliers?
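SELECT COUNT(*) FROM marketing_data
WHERE ABS(revenue - (SELECT AVG(revenue) FROM marketing_data))
      > 3 * (SELECT STDDEV(revenue) FROM marketing_data);
-- Counts revenue values more than 3 standard deviations from the mean, in either direction
-- If many outliers → Use WLS with weights that downweight them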
Decision: VIF = 12 exceeds 10, so use Ridge; no outliers were found, so WLS is not needed.
SELECT * FROM anofox_statistics_ridge(...);
Next Steps
- Inference & Testing — Statistical significance
- Basic Workflow — Hands-on example
- Handling Multicollinearity — Ridge in practice