Skip to main content

Feature Engineering

Create and engineer features for advanced time-series analysis and machine learning.

Calculation Modes

Each feature can be calculated in three ways:

  • Whole Series - Calculate the feature for the entire time series (default)
  • Rolling - Calculate the feature over a sliding window for temporal patterns
  • Lag - Calculate the feature with a time lag for predictive modeling

Extract Time Series Features

Use anofox_fcst_ts_features to extract 87 statistical features from time series data.

Parameters

ParameterTypeRequiredDefaultDescription
valuesDOUBLE[]Yes-Time series values
datesDATE[]/TIMESTAMP[]/INTEGER[]Yes-Corresponding dates

Output

Returns MAP containing 87 statistical features across multiple categories.

Example

SELECT anofox_fcst_ts_features(
LIST(value),
LIST(date)
) as features
FROM sales_data;

Use for:

  • Machine learning models
  • Feature engineering at scale
  • Automatic feature extraction

List Available Features

SELECT * FROM anofox_fcst_ts_features_list();

Returns a table listing all available features that can be extracted.


All Available Features (87)

FeatureParametersCategory
abs_energy-Basic Statistics
absolute_maximum-Basic Statistics
absolute_sum_of_changes-Basic Statistics
mean-Basic Statistics
median-Basic Statistics
minimum-Basic Statistics
maximum-Basic Statistics
standard_deviation-Basic Statistics
variance-Basic Statistics
skewness-Basic Statistics
kurtosis-Basic Statistics
root_mean_square-Basic Statistics
sum_values-Basic Statistics
length-Basic Statistics
mean_abs_change-Basic Statistics
mean_change-Basic Statistics
mean_second_derivative_central-Basic Statistics
variation_coefficient-Basic Statistics
variance_larger_than_standard_deviation-Basic Statistics
quantileq: 0.1Distribution
count_abovet: 0.0Distribution
count_above_mean-Distribution
count_belowt: 0.0Distribution
count_below_mean-Distribution
range_countmin: -1.0, max: 1.0Distribution
ratio_beyond_r_sigmar: 0.5Distribution
ratio_value_number_to_time_series_length-Distribution
large_standard_deviationr: 0.05Distribution
symmetry_lookingr: 0.0Distribution
value_countvalue: 0Distribution
index_mass_quantileq: 0.1Distribution
autocorrelationlag: 1Autocorrelation
partial_autocorrelationlag: 1Autocorrelation
agg_autocorrelationf_agg: mean, maxlag: 40Autocorrelation
c3lag: 1Autocorrelation
time_reversal_asymmetry_statisticlag: 1Autocorrelation
linear_trendattr: pvalueTrend
linear_trend_timewiseattr: pvalueTrend
agg_linear_trendattr: rvalue, chunk_len: 5, f_agg: maxTrend
ar_coefficientcoeff: 0, k: 10Trend
augmented_dickey_fullerattr: teststatTrend
friedrich_coefficientscoeff: 0, m: 3, r: 30Trend
max_langevin_fixed_pointm: 3, r: 30Trend
approximate_entropym: 2, r: 0.1Entropy
sample_entropy-Entropy
permutation_entropydimension: 3, tau: 1Entropy
binned_entropymax_bins: 10Entropy
fourier_entropybins: 2Entropy
lempel_ziv_complexitybins: 2Complexity
cid_ce-Complexity
benford_correlation-Complexity
fft_coefficientattr: real, coeff: 0Frequency
fft_aggregatedaggtype: centroidFrequency
spkt_welch_densitycoeff: 2Frequency
cwt_coefficientscoeff: 0, w: 2, widths: [2,5,10,20]Frequency
energy_ratio_by_chunksnum_segments: 10, segment_focus: 0Frequency
first_location_of_maximum-Location
first_location_of_minimum-Location
last_location_of_maximum-Location
last_location_of_minimum-Location
longest_strike_above_mean-Runs
longest_strike_below_mean-Runs
number_crossing_mm: 0Runs
has_duplicate-Duplicates
has_duplicate_max-Duplicates
has_duplicate_min-Duplicates
percentage_of_reoccurring_datapoints_to_all_datapoints-Duplicates
percentage_of_reoccurring_values_to_all_values-Duplicates
sum_of_reoccurring_data_points-Duplicates
sum_of_reoccurring_values-Duplicates
number_peaksn: 1Peaks
number_cwt_peaksn: 1Peaks
mean_n_absolute_maxnumber_of_maxima: 3Peaks
change_quantilesf_agg: mean, isabs: false, qh: 0.2, ql: 0.0Change
matrix_profilefeature: min, threshold: 0.98Pattern
query_similarity_countthreshold: 0.0Pattern
Showing 76 of 76

Load Feature Configuration

From JSON

SELECT anofox_fcst_ts_features_config_from_json('/path/to/config.json') as config;

From CSV

SELECT anofox_fcst_ts_features_config_from_csv('/path/to/config.csv') as config;

Feature Engineering Example

Extract features for machine learning models:

SELECT
product_id,
anofox_fcst_ts_features(
LIST(sales ORDER BY date),
LIST(date ORDER BY date)
) as features
FROM sales_by_product
GROUP BY product_id;

Feature Engineering Workflow

-- 1. Extract time series features
CREATE TABLE step1_features AS
SELECT
product_id,
anofox_fcst_ts_features(
LIST(sales ORDER BY date),
LIST(date ORDER BY date)
) as features
FROM sales_by_product
GROUP BY product_id;

-- 2. Use standard SQL for temporal features
CREATE TABLE step2_temporal AS
SELECT
date,
sales,
DAYOFWEEK(date) as day_of_week,
MONTH(date) as month,
LAG(sales, 7) OVER (ORDER BY date) as lag_7,
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as ma_7
FROM sales_data;

-- 3. Use for forecasting
SELECT * FROM anofox_fcst_ts_forecast(
'step2_temporal',
date,
sales,
'AutoETS',
30,
MAP{}
);

Standard SQL for Lag and Rolling

Use DuckDB window functions for lag and rolling statistics:

SELECT
date,
sales,
-- Lag features
LAG(sales, 1) OVER (ORDER BY date) as lag_1,
LAG(sales, 7) OVER (ORDER BY date) as lag_7,
LAG(sales, 28) OVER (ORDER BY date) as lag_28,

-- Rolling features
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as ma_7,
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) as ma_28,

-- Other features
DAYOFWEEK(date) as day_of_week,
MONTH(date) as month,
QUARTER(date) as quarter
FROM sales_data;

🍪 Cookie Settings