Feature Engineering
Create and engineer features for advanced time-series analysis and machine learning.
Calculation Modes
Each feature can be calculated in three ways:
- Whole Series - Calculate the feature for the entire time series (default)
- Rolling - Calculate the feature over a sliding window for temporal patterns
- Lag - Calculate the feature with a time lag for predictive modeling
Extract Time Series Features
Use anofox_fcst_ts_features to extract 87 statistical features from time series data.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
values | DOUBLE[] | Yes | - | Time series values |
dates | DATE[]/TIMESTAMP[]/INTEGER[] | Yes | - | Corresponding dates |
Output
Returns MAP containing 87 statistical features across multiple categories.
Example
SELECT anofox_fcst_ts_features(
LIST(value),
LIST(date)
) as features
FROM sales_data;
Use for:
- Machine learning models
- Feature engineering at scale
- Automatic feature extraction
List Available Features
SELECT * FROM anofox_fcst_ts_features_list();
Returns a table listing all available features that can be extracted.
All Available Features (87)
| Feature | Parameters | Category |
|---|---|---|
abs_energy | - | Basic Statistics |
absolute_maximum | - | Basic Statistics |
absolute_sum_of_changes | - | Basic Statistics |
mean | - | Basic Statistics |
median | - | Basic Statistics |
minimum | - | Basic Statistics |
maximum | - | Basic Statistics |
standard_deviation | - | Basic Statistics |
variance | - | Basic Statistics |
skewness | - | Basic Statistics |
kurtosis | - | Basic Statistics |
root_mean_square | - | Basic Statistics |
sum_values | - | Basic Statistics |
length | - | Basic Statistics |
mean_abs_change | - | Basic Statistics |
mean_change | - | Basic Statistics |
mean_second_derivative_central | - | Basic Statistics |
variation_coefficient | - | Basic Statistics |
variance_larger_than_standard_deviation | - | Basic Statistics |
quantile | q: 0.1 | Distribution |
count_above | t: 0.0 | Distribution |
count_above_mean | - | Distribution |
count_below | t: 0.0 | Distribution |
count_below_mean | - | Distribution |
range_count | min: -1.0, max: 1.0 | Distribution |
ratio_beyond_r_sigma | r: 0.5 | Distribution |
ratio_value_number_to_time_series_length | - | Distribution |
large_standard_deviation | r: 0.05 | Distribution |
symmetry_looking | r: 0.0 | Distribution |
value_count | value: 0 | Distribution |
index_mass_quantile | q: 0.1 | Distribution |
autocorrelation | lag: 1 | Autocorrelation |
partial_autocorrelation | lag: 1 | Autocorrelation |
agg_autocorrelation | f_agg: mean, maxlag: 40 | Autocorrelation |
c3 | lag: 1 | Autocorrelation |
time_reversal_asymmetry_statistic | lag: 1 | Autocorrelation |
linear_trend | attr: pvalue | Trend |
linear_trend_timewise | attr: pvalue | Trend |
agg_linear_trend | attr: rvalue, chunk_len: 5, f_agg: max | Trend |
ar_coefficient | coeff: 0, k: 10 | Trend |
augmented_dickey_fuller | attr: teststat | Trend |
friedrich_coefficients | coeff: 0, m: 3, r: 30 | Trend |
max_langevin_fixed_point | m: 3, r: 30 | Trend |
approximate_entropy | m: 2, r: 0.1 | Entropy |
sample_entropy | - | Entropy |
permutation_entropy | dimension: 3, tau: 1 | Entropy |
binned_entropy | max_bins: 10 | Entropy |
fourier_entropy | bins: 2 | Entropy |
lempel_ziv_complexity | bins: 2 | Complexity |
cid_ce | - | Complexity |
benford_correlation | - | Complexity |
fft_coefficient | attr: real, coeff: 0 | Frequency |
fft_aggregated | aggtype: centroid | Frequency |
spkt_welch_density | coeff: 2 | Frequency |
cwt_coefficients | coeff: 0, w: 2, widths: [2,5,10,20] | Frequency |
energy_ratio_by_chunks | num_segments: 10, segment_focus: 0 | Frequency |
first_location_of_maximum | - | Location |
first_location_of_minimum | - | Location |
last_location_of_maximum | - | Location |
last_location_of_minimum | - | Location |
longest_strike_above_mean | - | Runs |
longest_strike_below_mean | - | Runs |
number_crossing_m | m: 0 | Runs |
has_duplicate | - | Duplicates |
has_duplicate_max | - | Duplicates |
has_duplicate_min | - | Duplicates |
percentage_of_reoccurring_datapoints_to_all_datapoints | - | Duplicates |
percentage_of_reoccurring_values_to_all_values | - | Duplicates |
sum_of_reoccurring_data_points | - | Duplicates |
sum_of_reoccurring_values | - | Duplicates |
number_peaks | n: 1 | Peaks |
number_cwt_peaks | n: 1 | Peaks |
mean_n_absolute_max | number_of_maxima: 3 | Peaks |
change_quantiles | f_agg: mean, isabs: false, qh: 0.2, ql: 0.0 | Change |
matrix_profile | feature: min, threshold: 0.98 | Pattern |
query_similarity_count | threshold: 0.0 | Pattern |
Showing 76 of 76
Load Feature Configuration
From JSON
SELECT anofox_fcst_ts_features_config_from_json('/path/to/config.json') as config;
From CSV
SELECT anofox_fcst_ts_features_config_from_csv('/path/to/config.csv') as config;
Feature Engineering Example
Extract features for machine learning models:
SELECT
product_id,
anofox_fcst_ts_features(
LIST(sales ORDER BY date),
LIST(date ORDER BY date)
) as features
FROM sales_by_product
GROUP BY product_id;
Feature Engineering Workflow
-- 1. Extract time series features
CREATE TABLE step1_features AS
SELECT
product_id,
anofox_fcst_ts_features(
LIST(sales ORDER BY date),
LIST(date ORDER BY date)
) as features
FROM sales_by_product
GROUP BY product_id;
-- 2. Use standard SQL for temporal features
CREATE TABLE step2_temporal AS
SELECT
date,
sales,
DAYOFWEEK(date) as day_of_week,
MONTH(date) as month,
LAG(sales, 7) OVER (ORDER BY date) as lag_7,
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as ma_7
FROM sales_data;
-- 3. Use for forecasting
SELECT * FROM anofox_fcst_ts_forecast(
'step2_temporal',
date,
sales,
'AutoETS',
30,
MAP{}
);
Standard SQL for Lag and Rolling
Use DuckDB window functions for lag and rolling statistics:
SELECT
date,
sales,
-- Lag features
LAG(sales, 1) OVER (ORDER BY date) as lag_1,
LAG(sales, 7) OVER (ORDER BY date) as lag_7,
LAG(sales, 28) OVER (ORDER BY date) as lag_28,
-- Rolling features
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as ma_7,
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) as ma_28,
-- Other features
DAYOFWEEK(date) as day_of_week,
MONTH(date) as month,
QUARTER(date) as quarter
FROM sales_data;