Regularized Models
Penalty-based regression for multicollinearity and high-dimensional data: Ridge and Elastic Net.
Ridge - L2 Regularization
Handle multicollinearity by shrinking coefficients toward zero.
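To make the shrinkage concrete, here is a minimal pure-Python sketch of closed-form ridge for two nearly collinear predictors. It is not the extension's implementation: the helper `ridge_2d`, the sample data, and the unscaled penalty (alpha added directly to the diagonal of X'X on centered data) are assumptions chosen for illustration, and the extension may scale alpha differently.

```python
def ridge_2d(x1, x2, y, alpha):
    """Solve (X'X + alpha*I) b = X'y for two centered predictors (hypothetical helper)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    # Entries of X'X with the ridge penalty on the diagonal, and of X'y.
    a11 = sum(v * v for v in c1) + alpha
    a22 = sum(v * v for v in c2) + alpha
    a12 = sum(u * v for u, v in zip(c1, c2))
    b1 = sum(u * v for u, v in zip(c1, cy))
    b2 = sum(u * v for u, v in zip(c2, cy))
    det = a11 * a22 - a12 * a12  # positive whenever alpha > 0
    # Cramer's rule for the 2x2 system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Made-up data: x2 is almost a copy of x1, so the predictors are nearly collinear.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
y = [2.0, 4.1, 6.2, 7.9, 10.1]

for alpha in (0.01, 0.5, 10.0):
    b1, b2 = ridge_2d(x1, x2, y, alpha)
    print(alpha, round(b1, 3), round(b2, 3))
```

As alpha grows, the length of the coefficient vector can only decrease, which is what stabilizes the estimates when predictors are nearly collinear.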
Variants
- Scalar Fit: anofox_stats_ridge_fit(y, x, alpha, [options]) -> STRUCT
- Aggregate Fit: anofox_stats_ridge_fit_agg(y, x, alpha, [options]) -> STRUCT
- Window Predict: anofox_stats_ridge_fit_predict(y, x, [options]) OVER (...) -> STRUCT
- Batch Predict: anofox_stats_ridge_predict_agg(y, x, [options]) -> LIST(STRUCT)
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| y | LIST(DOUBLE) / DOUBLE | Yes | - | Target values |
| x | LIST(LIST(DOUBLE)) / LIST(DOUBLE) | Yes | - | Predictor matrix |
| alpha | DOUBLE | Yes | - | Regularization strength (0.01-10.0) |
| options | MAP | No | - | Configuration options |
Options MAP:
| Option | Type | Default | Description |
|---|---|---|---|
| fit_intercept | BOOLEAN | true | Include intercept term |
| compute_inference | BOOLEAN | false | Compute inference statistics |
| confidence_level | DOUBLE | 0.95 | Confidence level |
Example
SELECT
    region,
    (model).r_squared as fit,
    (model).coefficients[2] as price_elasticity
FROM (
    SELECT
        region,
        anofox_stats_ridge_fit_agg(
            sales,
            [price, promotion],
            0.5,
            MAP {'compute_inference': 'true'}
        ) as model
    FROM regional_sales
    GROUP BY region
);
When to use Ridge:
- Variance inflation factor (VIF) > 5 for any predictor
- More predictors than observations
- Coefficients unstable across samples
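The VIF criterion above can be checked by hand in the two-predictor case, where the VIF of either predictor reduces to 1 / (1 - r^2), with r the correlation between the two predictors. The sketch below is illustrative only: `vif_two_predictors` is a hypothetical helper, not a function of the extension, and the data are made up.

```python
def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2).

    Raises ZeroDivisionError if the predictors are perfectly collinear
    or if either predictor is constant.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r_squared = (s12 * s12) / (s11 * s22)
    return 1.0 / (1.0 - r_squared)


price = [1.0, 2.0, 3.0, 4.0, 5.0]
promo_correlated = [1.1, 1.9, 3.2, 3.9, 5.1]    # tracks price closely
promo_unrelated = [1.0, -1.0, 0.0, -1.0, 1.0]   # uncorrelated with price

print(vif_two_predictors(price, promo_correlated))  # far above the threshold of 5
print(vif_two_predictors(price, promo_unrelated))   # no inflation at all
```

With more than two predictors, each VIF comes from regressing that predictor on all the others, so the shortcut above no longer applies.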
Elastic Net - Combined L1+L2
Feature selection with regularization for high-dimensional data.
Variants
- Scalar Fit: anofox_stats_elasticnet_fit(y, x, alpha, l1_ratio, [options]) -> STRUCT
- Aggregate Fit: anofox_stats_elasticnet_fit_agg(y, x, alpha, l1_ratio, [options]) -> STRUCT
- Window Predict: anofox_stats_elasticnet_fit_predict(y, x, [options]) OVER (...) -> STRUCT
- Batch Predict: anofox_stats_elasticnet_predict_agg(y, x, [options]) -> LIST(STRUCT)
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| y | LIST(DOUBLE) / DOUBLE | Yes | - | Target values |
| x | LIST(LIST(DOUBLE)) / LIST(DOUBLE) | Yes | - | Predictor matrix |
| alpha | DOUBLE | Yes | - | Overall regularization (0.01-10.0) |
| l1_ratio | DOUBLE | Yes | - | L1/L2 balance (0=Ridge, 1=Lasso) |
| options | MAP | No | - | Configuration options |
Options MAP:
| Option | Type | Default | Description |
|---|---|---|---|
| fit_intercept | BOOLEAN | true | Include intercept term |
| max_iterations | INTEGER | 1000 | Convergence limit |
| tolerance | DOUBLE | 1e-4 | Convergence threshold |
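The max_iterations and tolerance options point to an iterative solver. The documentation does not say which algorithm the extension uses. Coordinate descent is a common choice for Elastic Net, so the sketch below uses it purely as an illustration of how the two options typically interact. The function names, the objective scaling (squared error divided by 2n, as in scikit-learn), and the data are all assumptions.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(columns, y, alpha, l1_ratio, max_iterations=1000, tolerance=1e-4):
    """Coordinate descent (assumed algorithm) on centered, non-constant columns, no intercept.

    Minimizes (1/2n)*||y - Xb||^2 + alpha*l1_ratio*||b||_1
              + 0.5*alpha*(1 - l1_ratio)*||b||_2^2
    """
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    for _ in range(max_iterations):
        max_change = 0.0
        for j, col in enumerate(columns):
            col_sq = sum(v * v for v in col) / n
            # Covariance of column j with the residual that excludes its own contribution.
            rho = sum(c * r for c, r in zip(col, residual)) / n + col_sq * beta[j]
            new_bj = soft_threshold(rho, l1) / (col_sq + l2)
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                beta[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tolerance:  # stop once no coefficient moved by more than tolerance
            break
    return beta


# Made-up centered data: y depends on the first column only.
x1c = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2c = [1.0, -1.0, 0.0, -1.0, 1.0]
yc = [-4.0, -2.0, 0.0, 2.0, 4.0]

beta = elastic_net_cd([x1c, x2c], yc, alpha=0.5, l1_ratio=0.7)
print(beta)
```

In this example the irrelevant second coefficient lands on exactly zero, while the first is shrunk below its unpenalized value of 2. A looser tolerance or a smaller max_iterations stops the loop earlier at the cost of a less precise fit.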
Example
SELECT anofox_stats_elasticnet_fit_agg(
    y,
    [x1, x2, x3, x4, x5],
    0.5, -- alpha: regularization strength
    0.7  -- l1_ratio: 70% L1, 30% L2
) as model
FROM high_dim_data;
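The 70/30 split in the comments can be read as a weighting of the two penalty terms. The sketch below assumes the widely used scikit-learn-style parametrization; the extension may scale the terms differently, and `elastic_net_penalty` is a hypothetical helper that only shows how alpha and l1_ratio are usually combined.

```python
def elastic_net_penalty(coefs, alpha, l1_ratio):
    """Penalty added to the loss: alpha * (l1_ratio * L1 + 0.5 * (1 - l1_ratio) * L2)."""
    l1_term = sum(abs(b) for b in coefs)   # lasso part, encourages exact zeros
    l2_term = sum(b * b for b in coefs)    # ridge part, shrinks smoothly
    return alpha * (l1_ratio * l1_term + 0.5 * (1.0 - l1_ratio) * l2_term)


coefs = [1.0, -2.0, 0.0]  # made-up coefficients
print(elastic_net_penalty(coefs, alpha=0.5, l1_ratio=0.7))  # mixed penalty
print(elastic_net_penalty(coefs, alpha=0.5, l1_ratio=0.0))  # pure ridge penalty
print(elastic_net_penalty(coefs, alpha=0.5, l1_ratio=1.0))  # pure lasso penalty
```

Under this convention alpha scales the whole penalty, while l1_ratio only moves weight between the two terms.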
When to use Elastic Net:
- High-dimensional data (many predictors)
- Feature selection needed (sparse solutions)
- Correlated predictors where variable selection is still needed