Feature Engineering

Create and engineer features for advanced time-series analysis and machine learning.

Calculation Modes

Each feature can be calculated in three ways:

Whole Series - Calculate the feature for the entire time series (default)
Rolling - Calculate the feature over a sliding window for temporal patterns
Lag - Calculate the feature with a time lag for predictive modeling

Extract Time Series Features

Use anofox_fcst_ts_features to extract 117 statistical features from time series data.

Parameters

Parameter	Type	Required	Default	Description
`values`	DOUBLE[]	Yes	-	Time series values
`dates`	DATE[]/TIMESTAMP[]/INTEGER[]	Yes	-	Corresponding dates

Output

Returns MAP containing 117 statistical features across multiple categories.

Example

SELECT anofox_fcst_ts_features(
    LIST(value),
    LIST(date)
) as features
FROM sales_data;

Use for:

Machine learning models
Feature engineering at scale
Automatic feature extraction

List Available Features

SELECT * FROM anofox_fcst_ts_features_list();

Returns a table listing all available features that can be extracted.

All Available Features (117)

Feature	Parameters	Category
`abs_energy`	-	Basic Statistics
`absolute_maximum`	-	Basic Statistics
`absolute_sum_of_changes`	-	Basic Statistics
`mean`	-	Basic Statistics
`median`	-	Basic Statistics
`minimum`	-	Basic Statistics
`maximum`	-	Basic Statistics
`standard_deviation`	-	Basic Statistics
`variance`	-	Basic Statistics
`skewness`	-	Basic Statistics
`kurtosis`	-	Basic Statistics
`root_mean_square`	-	Basic Statistics
`sum_values`	-	Basic Statistics
`length`	-	Basic Statistics
`mean_abs_change`	-	Basic Statistics
`mean_change`	-	Basic Statistics
`mean_second_derivative_central`	-	Basic Statistics
`variation_coefficient`	-	Basic Statistics
`variance_larger_than_standard_deviation`	-	Basic Statistics
`quantile`	q: 0.1	Distribution
`count_above`	t: 0.0	Distribution
`count_above_mean`	-	Distribution
`count_below`	t: 0.0	Distribution
`count_below_mean`	-	Distribution
`range_count`	min: -1.0, max: 1.0	Distribution
`ratio_beyond_r_sigma`	r: 0.5	Distribution
`ratio_value_number_to_time_series_length`	-	Distribution
`large_standard_deviation`	r: 0.05	Distribution
`symmetry_looking`	r: 0.0	Distribution
`value_count`	value: 0	Distribution
`index_mass_quantile`	q: 0.1	Distribution
`autocorrelation`	lag: 1	Autocorrelation
`partial_autocorrelation`	lag: 1	Autocorrelation
`agg_autocorrelation`	f_agg: mean, maxlag: 40	Autocorrelation
`c3`	lag: 1	Autocorrelation
`time_reversal_asymmetry_statistic`	lag: 1	Autocorrelation
`linear_trend`	attr: pvalue	Trend
`linear_trend_timewise`	attr: pvalue	Trend
`agg_linear_trend`	attr: rvalue, chunk_len: 5, f_agg: max	Trend
`ar_coefficient`	coeff: 0, k: 10	Trend
`augmented_dickey_fuller`	attr: teststat	Trend
`friedrich_coefficients`	coeff: 0, m: 3, r: 30	Trend
`max_langevin_fixed_point`	m: 3, r: 30	Trend
`approximate_entropy`	m: 2, r: 0.1	Entropy
`sample_entropy`	-	Entropy
`permutation_entropy`	dimension: 3, tau: 1	Entropy
`binned_entropy`	max_bins: 10	Entropy
`fourier_entropy`	bins: 2	Entropy
`lempel_ziv_complexity`	bins: 2	Complexity
`cid_ce`	-	Complexity
`benford_correlation`	-	Complexity
`fft_coefficient`	attr: real, coeff: 0	Frequency
`fft_aggregated`	aggtype: centroid	Frequency
`spkt_welch_density`	coeff: 2	Frequency
`cwt_coefficients`	coeff: 0, w: 2, widths: [2,5,10,20]	Frequency
`energy_ratio_by_chunks`	num_segments: 10, segment_focus: 0	Frequency
`first_location_of_maximum`	-	Location
`first_location_of_minimum`	-	Location
`last_location_of_maximum`	-	Location
`last_location_of_minimum`	-	Location
`longest_strike_above_mean`	-	Runs
`longest_strike_below_mean`	-	Runs
`number_crossing_m`	m: 0	Runs
`has_duplicate`	-	Duplicates
`has_duplicate_max`	-	Duplicates
`has_duplicate_min`	-	Duplicates
`percentage_of_reoccurring_datapoints_to_all_datapoints`	-	Duplicates
`percentage_of_reoccurring_values_to_all_values`	-	Duplicates
`sum_of_reoccurring_data_points`	-	Duplicates
`sum_of_reoccurring_values`	-	Duplicates
`number_peaks`	n: 1	Peaks
`number_cwt_peaks`	n: 1	Peaks
`mean_n_absolute_max`	number_of_maxima: 3	Peaks
`change_quantiles`	f_agg: mean, isabs: false, qh: 0.2, ql: 0.0	Change
`matrix_profile`	feature: min, threshold: 0.98	Pattern
`query_similarity_count`	threshold: 0.0	Pattern

Showing 76 of 76

Load Feature Configuration

From JSON

SELECT anofox_fcst_ts_features_config_from_json('/path/to/config.json') as config;

From CSV

SELECT anofox_fcst_ts_features_config_from_csv('/path/to/config.csv') as config;

Feature Engineering Example

Extract features for machine learning models:

SELECT
    product_id,
    anofox_fcst_ts_features(
        LIST(sales ORDER BY date),
        LIST(date ORDER BY date)
    ) as features
FROM sales_by_product
GROUP BY product_id;

Feature Engineering Workflow

-- 1. Extract time series features
CREATE TABLE step1_features AS
SELECT
    product_id,
    anofox_fcst_ts_features(
        LIST(sales ORDER BY date),
        LIST(date ORDER BY date)
    ) as features
FROM sales_by_product
GROUP BY product_id;

-- 2. Use standard SQL for temporal features
CREATE TABLE step2_temporal AS
SELECT
    date,
    sales,
    DAYOFWEEK(date) as day_of_week,
    MONTH(date) as month,
    LAG(sales, 7) OVER (ORDER BY date) as lag_7,
    AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as ma_7
FROM sales_data;

-- 3. Use for forecasting
SELECT * FROM ts_forecast_by(
    'step2_temporal', NULL, date, sales,
    'AutoETS', 30, '1d',
    MAP{}
);

Standard SQL for Lag and Rolling

Use DuckDB window functions for lag and rolling statistics:

SELECT
    date,
    sales,
    -- Lag features
    LAG(sales, 1) OVER (ORDER BY date) as lag_1,
    LAG(sales, 7) OVER (ORDER BY date) as lag_7,
    LAG(sales, 28) OVER (ORDER BY date) as lag_28,

    -- Rolling features
    AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as ma_7,
    AVG(sales) OVER (ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) as ma_28,

    -- Other features
    DAYOFWEEK(date) as day_of_week,
    MONTH(date) as month,
    QUARTER(date) as quarter
FROM sales_data;

Extract Time Series Features​

Parameters​

Output​

Example​

List Available Features​

All Available Features (117)​

Load Feature Configuration​

From JSON​

From CSV​

Feature Engineering Example​

Feature Engineering Workflow​

Standard SQL for Lag and Rolling​