Skip to main content

Feature Engineering & Transformations

Create and engineer features for advanced time-series analysis and machine learning.

Lag Features

Create lagged versions of your series for autoregressive models.

TS_LAG

Create lagged features (past values)

SELECT
date,
value,
TS_LAG(value, 1) OVER (ORDER BY date) as lag_1,
TS_LAG(value, 7) OVER (ORDER BY date) as lag_7,
TS_LAG(value, 14) OVER (ORDER BY date) as lag_14
FROM sales_data;

Use for:

  • ARIMA/ARIMAX models
  • Feature engineering
  • Autoregressive analysis

Parameters:

  • n: Number of periods to lag (1, 7, 14, etc.)

Rolling Statistics

Calculate statistics over sliding windows.

TS_ROLLING_MEAN

Moving average over window

SELECT
date,
value,
TS_ROLLING_MEAN(value, 7) OVER (ORDER BY date) as ma_7,
TS_ROLLING_MEAN(value, 30) OVER (ORDER BY date) as ma_30
FROM sales_data;

Use for:

  • Trend identification
  • Noise reduction
  • Feature engineering

TS_ROLLING_STD

Rolling standard deviation (volatility)

SELECT
date,
value,
TS_ROLLING_STD(value, 7) OVER (ORDER BY date) as std_7,
TS_ROLLING_STD(value, 14) OVER (ORDER BY date) as std_14
FROM sales_data;

Use for:

  • Volatility measurement
  • Confidence interval width
  • Risk assessment

TS_ROLLING_MIN / TS_ROLLING_MAX

Rolling minimum and maximum

SELECT
date,
value,
TS_ROLLING_MIN(value, 7) OVER (ORDER BY date) as min_7,
TS_ROLLING_MAX(value, 7) OVER (ORDER BY date) as max_7,
TS_ROLLING_MAX(value, 7) - TS_ROLLING_MIN(value, 7) as range_7
FROM sales_data;

Use for:

  • Range tracking
  • Support/resistance levels
  • Anomaly detection

Seasonal Features

Extract seasonal patterns as features.

TS_FOURIER

Fourier terms for seasonality

SELECT
date,
TS_FOURIER(date, 7, 2) OVER () as fourier_weekly,
TS_FOURIER(date, 365, 3) OVER () as fourier_yearly
FROM calendar_dates;

Use for:

  • Capturing seasonal patterns
  • Machine learning models
  • Non-linear seasonality

Parameters:

  • period: 7 (weekly), 365 (yearly), etc.
  • K: Number of Fourier terms (higher = more detail)

TS_SEASONAL_DUMMY

Dummy variables for seasons

SELECT
date,
MONTH(date) as month,
TS_SEASONAL_DUMMY(date, 12) as is_dec_or_jan
FROM sales_data;

Use for:

  • Categorical seasonal indicators
  • Holiday dummies
  • Month/quarter dummies

Trend Features

Extract and model trend components.

TS_TREND

Extract trend line

SELECT
date,
value,
TS_TREND(value) OVER (ORDER BY date) as trend
FROM sales_data;

Use for:

  • Trend identification
  • Detrending
  • Growth analysis

TS_DETREND

Remove trend from series

SELECT
date,
value,
TS_DETREND(value) OVER (ORDER BY date) as detrended
FROM sales_data;

Use for:

  • Removing trend before seasonality analysis
  • Stationarizing data
  • Pure seasonal component

Statistical Features

Extract statistical characteristics of the series.

TS_TSFRESH_FEATURES

Automatically calculate 76+ statistical features

SELECT * FROM TS_TSFRESH_FEATURES(
'sales_data',
'date',
'sales',
{
'features': 'all', -- or specific list
'rolling_windows': [7, 14, 30],
'lags': [1, 7, 14]
}
);

Returns 76+ features including:

  • Mean, std, min, max
  • Lag autocorrelation
  • Trend strength
  • Seasonality strength
  • Entropy, complexity
  • Many more...

Use for:

  • Machine learning models
  • Feature engineering at scale
  • Automatic feature extraction

TS_AUTOCORR

Autocorrelation at specific lag

SELECT
CAST(TS_AUTOCORR(sales, 1) AS DOUBLE) as acf_lag1,
CAST(TS_AUTOCORR(sales, 7) AS DOUBLE) as acf_lag7,
CAST(TS_AUTOCORR(sales, 14) AS DOUBLE) as acf_lag14
FROM sales_data;

Use for:

  • Seasonality detection
  • Dependency analysis
  • Model diagnostics

Transformations

Transform data for better model performance.

TS_DIFFERENCE

First differences (detrending)

SELECT
date,
value,
TS_DIFFERENCE(value, 1) OVER (ORDER BY date) as diff_1,
TS_DIFFERENCE(value, 7) OVER (ORDER BY date) as diff_7
FROM sales_data;

Use for:

  • Making series stationary
  • ARIMA preprocessing
  • Removing trend

TS_LOG_TRANSFORM

Log transformation (variance stabilization)

SELECT
date,
value,
TS_LOG_TRANSFORM(value) as log_value
FROM sales_data
WHERE value > 0; -- Log undefined for ≤0

Use for:

  • Reducing variance
  • Multiplicative seasonality
  • Financial data

Feature Engineering Example

Create a feature set for ML model:

CREATE TABLE feature_engineered AS
WITH base_data AS (
SELECT
date,
sales,
-- Lag features
TS_LAG(sales, 1) OVER (ORDER BY date) as lag_1,
TS_LAG(sales, 7) OVER (ORDER BY date) as lag_7,
TS_LAG(sales, 14) OVER (ORDER BY date) as lag_14,

-- Rolling statistics
TS_ROLLING_MEAN(sales, 7) OVER (ORDER BY date) as ma_7,
TS_ROLLING_STD(sales, 7) OVER (ORDER BY date) as std_7,

-- Seasonal features
MONTH(date) as month,
DAYOFWEEK(date) as day_of_week,

-- Trend
TS_TREND(sales) OVER (ORDER BY date) as trend
FROM sales_data
)
SELECT *
FROM base_data
WHERE date >= DATE '2023-02-01'; -- Remove NAs from lags

Multi-Series Feature Engineering

Create features for multiple series in parallel:

CREATE TABLE features_by_product AS
SELECT
product_id,
date,
sales,
TS_LAG(sales, 7) OVER (PARTITION BY product_id ORDER BY date) as lag_7,
TS_ROLLING_MEAN(sales, 7) OVER (PARTITION BY product_id ORDER BY date) as ma_7,
TS_ROLLING_STD(sales, 7) OVER (PARTITION BY product_id ORDER BY date) as std_7
FROM sales_by_product;

Feature Selection Guide

FeatureWhen to UseExample
Lag 1Previous value predictorAutoRegressive models
Lag 7Weekly dependencyRetail with weekend effect
Lag 14Bi-weekly cyclePay-period patterns
MA 7Weekly trendSmooth underlying direction
MA 30Monthly trendLonger-term movement
STD 7VolatilityRisk assessment
Fourier termsSeasonalityWhen period not fixed
Month dummyHoliday effectsRetail, travel
Day of weekWeekly cycleService demand
TrendLong-term directionGrowth analysis

Feature Engineering Workflow

-- 1. Create base features
CREATE TABLE step1_lags AS
SELECT
date,
value,
LAG(value, 1) OVER (ORDER BY date) as lag_1,
LAG(value, 7) OVER (ORDER BY date) as lag_7
FROM raw_data;

-- 2. Add rolling statistics
CREATE TABLE step2_rolling AS
SELECT
date,
value,
lag_1,
lag_7,
AVG(value) OVER (ORDER BY date ROWS 6 PRECEDING) as ma_7,
STDDEV(value) OVER (ORDER BY date ROWS 6 PRECEDING) as std_7
FROM step1_lags;

-- 3. Add temporal features
CREATE TABLE step3_temporal AS
SELECT
date,
value,
lag_1,
lag_7,
ma_7,
std_7,
DAYOFWEEK(date) as day_of_week,
MONTH(date) as month
FROM step2_rolling;

-- 4. Use for forecasting
SELECT * FROM TS_FORECAST(
'step3_temporal',
'date',
'value',
'AutoETS',
30,
{}
);

Performance Considerations

Memory usage by feature type:

  • Lag features: O(n)
  • Rolling statistics: O(n)
  • Fourier terms: O(n)
  • Dummy variables: O(1)

For 1M+ observations:

  • Use sparse implementations
  • Select key lags only
  • Aggregate to lower frequency

Common Patterns

Weekly Seasonality

SELECT
date,
value,
TS_LAG(value, 7) OVER (ORDER BY date) as lag_weekly,
TS_ROLLING_MEAN(value, 7) OVER (ORDER BY date) as ma_weekly,
DAYOFWEEK(date) as day_of_week
FROM sales_data;

Yearly Seasonality

SELECT
date,
value,
TS_LAG(value, 365) OVER (ORDER BY date) as lag_yearly,
MONTH(date) as month,
QUARTER(date) as quarter
FROM sales_data;

Multiple Seasonality

SELECT
date,
value,
TS_LAG(value, 24) OVER (ORDER BY date) as lag_daily,
TS_LAG(value, 168) OVER (ORDER BY date) as lag_weekly,
TS_ROLLING_MEAN(value, 24) OVER (ORDER BY date) as ma_daily,
TS_ROLLING_MEAN(value, 168) OVER (ORDER BY date) as ma_weekly
FROM hourly_data;

Next Steps


Key Takeaways

  • ✅ Lag features capture autoregressive patterns
  • ✅ Rolling statistics reveal trends and volatility
  • ✅ Fourier terms encode seasonality
  • ✅ Use TS_TSFRESH_FEATURES for automatic extraction
  • ✅ Combine features strategically (not all at once)
  • ✅ Always remove NAs from lagged features
  • ✅ Use PARTITION BY for multi-series features
🍪 Cookie Settings