Feature Engineering & Transformations
Create and engineer features for advanced time-series analysis and machine learning.
Lag Features
Create lagged versions of your series for autoregressive models.
TS_LAG
Create lagged features (past values)
SELECT
date,
value,
TS_LAG(value, 1) OVER (ORDER BY date) as lag_1,
TS_LAG(value, 7) OVER (ORDER BY date) as lag_7,
TS_LAG(value, 14) OVER (ORDER BY date) as lag_14
FROM sales_data;
Use for:
- ARIMA/ARIMAX models
- Feature engineering
- Autoregressive analysis
Parameters:
n: Number of periods to lag (1, 7, 14, etc.)
Rolling Statistics
Calculate statistics over sliding windows.
TS_ROLLING_MEAN
Moving average over window
SELECT
date,
value,
TS_ROLLING_MEAN(value, 7) OVER (ORDER BY date) as ma_7,
TS_ROLLING_MEAN(value, 30) OVER (ORDER BY date) as ma_30
FROM sales_data;
Use for:
- Trend identification
- Noise reduction
- Feature engineering
TS_ROLLING_STD
Rolling standard deviation (volatility)
SELECT
date,
value,
TS_ROLLING_STD(value, 7) OVER (ORDER BY date) as std_7,
TS_ROLLING_STD(value, 14) OVER (ORDER BY date) as std_14
FROM sales_data;
Use for:
- Volatility measurement
- Confidence interval width
- Risk assessment
TS_ROLLING_MIN / TS_ROLLING_MAX
Rolling minimum and maximum
SELECT
date,
value,
TS_ROLLING_MIN(value, 7) OVER (ORDER BY date) as min_7,
TS_ROLLING_MAX(value, 7) OVER (ORDER BY date) as max_7,
TS_ROLLING_MAX(value, 7) - TS_ROLLING_MIN(value, 7) as range_7
FROM sales_data;
Use for:
- Range tracking
- Support/resistance levels
- Anomaly detection
Seasonal Features
Extract seasonal patterns as features.
TS_FOURIER
Fourier terms for seasonality
SELECT
date,
TS_FOURIER(date, 7, 2) OVER () as fourier_weekly,
TS_FOURIER(date, 365, 3) OVER () as fourier_yearly
FROM calendar_dates;
Use for:
- Capturing seasonal patterns
- Machine learning models
- Non-linear seasonality
Parameters:
period: 7 (weekly), 365 (yearly), etc.K: Number of Fourier terms (higher = more detail)
TS_SEASONAL_DUMMY
Dummy variables for seasons
SELECT
date,
MONTH(date) as month,
TS_SEASONAL_DUMMY(date, 12) as is_dec_or_jan
FROM sales_data;
Use for:
- Categorical seasonal indicators
- Holiday dummies
- Month/quarter dummies
Trend Features
Extract and model trend components.
TS_TREND
Extract trend line
SELECT
date,
value,
TS_TREND(value) OVER (ORDER BY date) as trend
FROM sales_data;
Use for:
- Trend identification
- Detrending
- Growth analysis
TS_DETREND
Remove trend from series
SELECT
date,
value,
TS_DETREND(value) OVER (ORDER BY date) as detrended
FROM sales_data;
Use for:
- Removing trend before seasonality analysis
- Stationarizing data
- Pure seasonal component
Statistical Features
Extract statistical characteristics of the series.
TS_TSFRESH_FEATURES
Automatically calculate 76+ statistical features
SELECT * FROM TS_TSFRESH_FEATURES(
'sales_data',
'date',
'sales',
{
'features': 'all', -- or specific list
'rolling_windows': [7, 14, 30],
'lags': [1, 7, 14]
}
);
Returns 76+ features including:
- Mean, std, min, max
- Lag autocorrelation
- Trend strength
- Seasonality strength
- Entropy, complexity
- Many more...
Use for:
- Machine learning models
- Feature engineering at scale
- Automatic feature extraction
TS_AUTOCORR
Autocorrelation at specific lag
SELECT
CAST(TS_AUTOCORR(sales, 1) AS DOUBLE) as acf_lag1,
CAST(TS_AUTOCORR(sales, 7) AS DOUBLE) as acf_lag7,
CAST(TS_AUTOCORR(sales, 14) AS DOUBLE) as acf_lag14
FROM sales_data;
Use for:
- Seasonality detection
- Dependency analysis
- Model diagnostics
Transformations
Transform data for better model performance.
TS_DIFFERENCE
First differences (detrending)
SELECT
date,
value,
TS_DIFFERENCE(value, 1) OVER (ORDER BY date) as diff_1,
TS_DIFFERENCE(value, 7) OVER (ORDER BY date) as diff_7
FROM sales_data;
Use for:
- Making series stationary
- ARIMA preprocessing
- Removing trend
TS_LOG_TRANSFORM
Log transformation (variance stabilization)
SELECT
date,
value,
TS_LOG_TRANSFORM(value) as log_value
FROM sales_data
WHERE value > 0; -- Log undefined for ≤0
Use for:
- Reducing variance
- Multiplicative seasonality
- Financial data
Feature Engineering Example
Create a feature set for ML model:
CREATE TABLE feature_engineered AS
WITH base_data AS (
SELECT
date,
sales,
-- Lag features
TS_LAG(sales, 1) OVER (ORDER BY date) as lag_1,
TS_LAG(sales, 7) OVER (ORDER BY date) as lag_7,
TS_LAG(sales, 14) OVER (ORDER BY date) as lag_14,
-- Rolling statistics
TS_ROLLING_MEAN(sales, 7) OVER (ORDER BY date) as ma_7,
TS_ROLLING_STD(sales, 7) OVER (ORDER BY date) as std_7,
-- Seasonal features
MONTH(date) as month,
DAYOFWEEK(date) as day_of_week,
-- Trend
TS_TREND(sales) OVER (ORDER BY date) as trend
FROM sales_data
)
SELECT *
FROM base_data
WHERE date >= DATE '2023-02-01'; -- Remove NAs from lags
Multi-Series Feature Engineering
Create features for multiple series in parallel:
CREATE TABLE features_by_product AS
SELECT
product_id,
date,
sales,
TS_LAG(sales, 7) OVER (PARTITION BY product_id ORDER BY date) as lag_7,
TS_ROLLING_MEAN(sales, 7) OVER (PARTITION BY product_id ORDER BY date) as ma_7,
TS_ROLLING_STD(sales, 7) OVER (PARTITION BY product_id ORDER BY date) as std_7
FROM sales_by_product;
Feature Selection Guide
| Feature | When to Use | Example |
|---|---|---|
| Lag 1 | Previous value predictor | AutoRegressive models |
| Lag 7 | Weekly dependency | Retail with weekend effect |
| Lag 14 | Bi-weekly cycle | Pay-period patterns |
| MA 7 | Weekly trend | Smooth underlying direction |
| MA 30 | Monthly trend | Longer-term movement |
| STD 7 | Volatility | Risk assessment |
| Fourier terms | Seasonality | When period not fixed |
| Month dummy | Holiday effects | Retail, travel |
| Day of week | Weekly cycle | Service demand |
| Trend | Long-term direction | Growth analysis |
Feature Engineering Workflow
-- 1. Create base features
CREATE TABLE step1_lags AS
SELECT
date,
value,
LAG(value, 1) OVER (ORDER BY date) as lag_1,
LAG(value, 7) OVER (ORDER BY date) as lag_7
FROM raw_data;
-- 2. Add rolling statistics
CREATE TABLE step2_rolling AS
SELECT
date,
value,
lag_1,
lag_7,
AVG(value) OVER (ORDER BY date ROWS 6 PRECEDING) as ma_7,
STDDEV(value) OVER (ORDER BY date ROWS 6 PRECEDING) as std_7
FROM step1_lags;
-- 3. Add temporal features
CREATE TABLE step3_temporal AS
SELECT
date,
value,
lag_1,
lag_7,
ma_7,
std_7,
DAYOFWEEK(date) as day_of_week,
MONTH(date) as month
FROM step2_rolling;
-- 4. Use for forecasting
SELECT * FROM TS_FORECAST(
'step3_temporal',
'date',
'value',
'AutoETS',
30,
{}
);
Performance Considerations
Memory usage by feature type:
- Lag features: O(n)
- Rolling statistics: O(n)
- Fourier terms: O(n)
- Dummy variables: O(1)
For 1M+ observations:
- Use sparse implementations
- Select key lags only
- Aggregate to lower frequency
Common Patterns
Weekly Seasonality
SELECT
date,
value,
TS_LAG(value, 7) OVER (ORDER BY date) as lag_weekly,
TS_ROLLING_MEAN(value, 7) OVER (ORDER BY date) as ma_weekly,
DAYOFWEEK(date) as day_of_week
FROM sales_data;
Yearly Seasonality
SELECT
date,
value,
TS_LAG(value, 365) OVER (ORDER BY date) as lag_yearly,
MONTH(date) as month,
QUARTER(date) as quarter
FROM sales_data;
Multiple Seasonality
SELECT
date,
value,
TS_LAG(value, 24) OVER (ORDER BY date) as lag_daily,
TS_LAG(value, 168) OVER (ORDER BY date) as lag_weekly,
TS_ROLLING_MEAN(value, 24) OVER (ORDER BY date) as ma_daily,
TS_ROLLING_MEAN(value, 168) OVER (ORDER BY date) as ma_weekly
FROM hourly_data;
Next Steps
- Diagnostics Reference — Detect patterns
- Metrics Reference — Evaluate models
- Data Preparation Guide — Clean data first
Key Takeaways
- ✅ Lag features capture autoregressive patterns
- ✅ Rolling statistics reveal trends and volatility
- ✅ Fourier terms encode seasonality
- ✅ Use TS_TSFRESH_FEATURES for automatic extraction
- ✅ Combine features strategically (not all at once)
- ✅ Always remove NAs from lagged features
- ✅ Use PARTITION BY for multi-series features