Skip to main content

Feature Engineering

Feature engineering is the process of extracting meaningful numerical summaries (features) from raw time series data to serve as inputs for machine learning models or downstream analysis. AnoFox computes 117 tsfresh-compatible statistical features -- spanning basic statistics, autocorrelation, trend, entropy, complexity, frequency domain, and pattern detection -- directly from your time series data in SQL.

Calculation Modes

Each feature can be calculated in three ways:

  • Whole Series - Calculate the feature for the entire time series (default)
  • Rolling - Calculate the feature over a sliding window for temporal patterns
  • Lag - Calculate the feature with a time lag for predictive modeling

Extract Time Series Features

Use anofox_fcst_ts_features to extract 117 statistical features from time series data.

Parameters

ParameterTypeRequiredDefaultDescription
valuesDOUBLE[]Yes-Time series values
datesDATE[]/TIMESTAMP[]/INTEGER[]Yes-Corresponding dates

Output

Returns MAP containing 117 statistical features across multiple categories.

Example

SELECT anofox_fcst_ts_features(
LIST(value),
LIST(date)
) as features
FROM sales_data;

Use for:

  • Machine learning models
  • Feature engineering at scale
  • Automatic feature extraction

List Available Features

SELECT * FROM anofox_fcst_ts_features_list();

Returns a table listing all available features that can be extracted.


All Available Features (117)

FeatureParametersCategory
abs_energy-Basic Statistics
absolute_maximum-Basic Statistics
absolute_sum_of_changes-Basic Statistics
mean-Basic Statistics
median-Basic Statistics
minimum-Basic Statistics
maximum-Basic Statistics
standard_deviation-Basic Statistics
variance-Basic Statistics
skewness-Basic Statistics
kurtosis-Basic Statistics
root_mean_square-Basic Statistics
sum_values-Basic Statistics
length-Basic Statistics
mean_abs_change-Basic Statistics
mean_change-Basic Statistics
mean_second_derivative_central-Basic Statistics
variation_coefficient-Basic Statistics
variance_larger_than_standard_deviation-Basic Statistics
quantileq: 0.1Distribution
count_abovet: 0.0Distribution
count_above_mean-Distribution
count_belowt: 0.0Distribution
count_below_mean-Distribution
range_countmin: -1.0, max: 1.0Distribution
ratio_beyond_r_sigmar: 0.5Distribution
ratio_value_number_to_time_series_length-Distribution
large_standard_deviationr: 0.05Distribution
symmetry_lookingr: 0.0Distribution
value_countvalue: 0Distribution
index_mass_quantileq: 0.1Distribution
autocorrelationlag: 1Autocorrelation
partial_autocorrelationlag: 1Autocorrelation
agg_autocorrelationf_agg: mean, maxlag: 40Autocorrelation
c3lag: 1Autocorrelation
time_reversal_asymmetry_statisticlag: 1Autocorrelation
linear_trendattr: pvalueTrend
linear_trend_timewiseattr: pvalueTrend
agg_linear_trendattr: rvalue, chunk_len: 5, f_agg: maxTrend
ar_coefficientcoeff: 0, k: 10Trend
augmented_dickey_fullerattr: teststatTrend
friedrich_coefficientscoeff: 0, m: 3, r: 30Trend
max_langevin_fixed_pointm: 3, r: 30Trend
approximate_entropym: 2, r: 0.1Entropy
sample_entropy-Entropy
permutation_entropydimension: 3, tau: 1Entropy
binned_entropymax_bins: 10Entropy
fourier_entropybins: 2Entropy
lempel_ziv_complexitybins: 2Complexity
cid_ce-Complexity
benford_correlation-Complexity
fft_coefficientattr: real, coeff: 0Frequency
fft_aggregatedaggtype: centroidFrequency
spkt_welch_densitycoeff: 2Frequency
cwt_coefficientscoeff: 0, w: 2, widths: [2,5,10,20]Frequency
energy_ratio_by_chunksnum_segments: 10, segment_focus: 0Frequency
first_location_of_maximum-Location
first_location_of_minimum-Location
last_location_of_maximum-Location
last_location_of_minimum-Location
longest_strike_above_mean-Runs
longest_strike_below_mean-Runs
number_crossing_mm: 0Runs
has_duplicate-Duplicates
has_duplicate_max-Duplicates
has_duplicate_min-Duplicates
percentage_of_reoccurring_datapoints_to_all_datapoints-Duplicates
percentage_of_reoccurring_values_to_all_values-Duplicates
sum_of_reoccurring_data_points-Duplicates
sum_of_reoccurring_values-Duplicates
number_peaksn: 1Peaks
number_cwt_peaksn: 1Peaks
mean_n_absolute_maxnumber_of_maxima: 3Peaks
change_quantilesf_agg: mean, isabs: false, qh: 0.2, ql: 0.0Change
matrix_profilefeature: min, threshold: 0.98Pattern
query_similarity_countthreshold: 0.0Pattern
Showing 76 of 76

Load Feature Configuration

From JSON

SELECT anofox_fcst_ts_features_config_from_json('/path/to/config.json') as config;

From CSV

SELECT anofox_fcst_ts_features_config_from_csv('/path/to/config.csv') as config;

Feature Engineering Example

Extract features for machine learning models:

SELECT
product_id,
anofox_fcst_ts_features(
LIST(sales ORDER BY date),
LIST(date ORDER BY date)
) as features
FROM sales_by_product
GROUP BY product_id;

Feature Engineering Workflow

-- 1. Extract time series features
CREATE TABLE step1_features AS
SELECT
product_id,
anofox_fcst_ts_features(
LIST(sales ORDER BY date),
LIST(date ORDER BY date)
) as features
FROM sales_by_product
GROUP BY product_id;

-- 2. Use standard SQL for temporal features
CREATE TABLE step2_temporal AS
SELECT
date,
sales,
DAYOFWEEK(date) as day_of_week,
MONTH(date) as month,
LAG(sales, 7) OVER (ORDER BY date) as lag_7,
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as ma_7
FROM sales_data;

-- 3. Use for forecasting
SELECT * FROM ts_forecast_by(
'step2_temporal', NULL, date, sales,
'AutoETS', 30, '1d',
MAP{}
);

Standard SQL for Lag and Rolling

Use DuckDB window functions for lag and rolling statistics:

SELECT
date,
sales,
-- Lag features
LAG(sales, 1) OVER (ORDER BY date) as lag_1,
LAG(sales, 7) OVER (ORDER BY date) as lag_7,
LAG(sales, 28) OVER (ORDER BY date) as lag_28,

-- Rolling features
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as ma_7,
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) as ma_28,

-- Other features
DAYOFWEEK(date) as day_of_week,
MONTH(date) as month,
QUARTER(date) as quarter
FROM sales_data;

🍪 Cookie Settings