Feature Engineering
Feature engineering is the process of extracting meaningful numerical summaries (features) from raw time series data to serve as inputs for machine learning models or downstream analysis. AnoFox computes 117 tsfresh-compatible statistical features -- spanning basic statistics, autocorrelation, trend, entropy, complexity, frequency domain, and pattern detection -- directly from your time series data in SQL.
Each feature can be calculated in three ways:
- Whole Series - Calculate the feature for the entire time series (default)
- Rolling - Calculate the feature over a sliding window for temporal patterns
- Lag - Calculate the feature with a time lag for predictive modeling
Extract Time Series Features
Use anofox_fcst_ts_features to extract 117 statistical features from time series data.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
values | DOUBLE[] | Yes | - | Time series values |
dates | DATE[]/TIMESTAMP[]/INTEGER[] | Yes | - | Corresponding dates |
Output
Returns MAP containing 117 statistical features across multiple categories.
Example
SELECT anofox_fcst_ts_features(
LIST(value),
LIST(date)
) as features
FROM sales_data;
Use for:
- Machine learning models
- Feature engineering at scale
- Automatic feature extraction
List Available Features
SELECT * FROM anofox_fcst_ts_features_list();
Returns a table listing all available features that can be extracted.
All Available Features (117)
| Feature | Parameters | Category |
|---|---|---|
abs_energy | - | Basic Statistics |
absolute_maximum | - | Basic Statistics |
absolute_sum_of_changes | - | Basic Statistics |
mean | - | Basic Statistics |
median | - | Basic Statistics |
minimum | - | Basic Statistics |
maximum | - | Basic Statistics |
standard_deviation | - | Basic Statistics |
variance | - | Basic Statistics |
skewness | - | Basic Statistics |
kurtosis | - | Basic Statistics |
root_mean_square | - | Basic Statistics |
sum_values | - | Basic Statistics |
length | - | Basic Statistics |
mean_abs_change | - | Basic Statistics |
mean_change | - | Basic Statistics |
mean_second_derivative_central | - | Basic Statistics |
variation_coefficient | - | Basic Statistics |
variance_larger_than_standard_deviation | - | Basic Statistics |
quantile | q: 0.1 | Distribution |
count_above | t: 0.0 | Distribution |
count_above_mean | - | Distribution |
count_below | t: 0.0 | Distribution |
count_below_mean | - | Distribution |
range_count | min: -1.0, max: 1.0 | Distribution |
ratio_beyond_r_sigma | r: 0.5 | Distribution |
ratio_value_number_to_time_series_length | - | Distribution |
large_standard_deviation | r: 0.05 | Distribution |
symmetry_looking | r: 0.0 | Distribution |
value_count | value: 0 | Distribution |
index_mass_quantile | q: 0.1 | Distribution |
autocorrelation | lag: 1 | Autocorrelation |
partial_autocorrelation | lag: 1 | Autocorrelation |
agg_autocorrelation | f_agg: mean, maxlag: 40 | Autocorrelation |
c3 | lag: 1 | Autocorrelation |
time_reversal_asymmetry_statistic | lag: 1 | Autocorrelation |
linear_trend | attr: pvalue | Trend |
linear_trend_timewise | attr: pvalue | Trend |
agg_linear_trend | attr: rvalue, chunk_len: 5, f_agg: max | Trend |
ar_coefficient | coeff: 0, k: 10 | Trend |
augmented_dickey_fuller | attr: teststat | Trend |
friedrich_coefficients | coeff: 0, m: 3, r: 30 | Trend |
max_langevin_fixed_point | m: 3, r: 30 | Trend |
approximate_entropy | m: 2, r: 0.1 | Entropy |
sample_entropy | - | Entropy |
permutation_entropy | dimension: 3, tau: 1 | Entropy |
binned_entropy | max_bins: 10 | Entropy |
fourier_entropy | bins: 2 | Entropy |
lempel_ziv_complexity | bins: 2 | Complexity |
cid_ce | - | Complexity |
benford_correlation | - | Complexity |
fft_coefficient | attr: real, coeff: 0 | Frequency |
fft_aggregated | aggtype: centroid | Frequency |
spkt_welch_density | coeff: 2 | Frequency |
cwt_coefficients | coeff: 0, w: 2, widths: [2,5,10,20] | Frequency |
energy_ratio_by_chunks | num_segments: 10, segment_focus: 0 | Frequency |
first_location_of_maximum | - | Location |
first_location_of_minimum | - | Location |
last_location_of_maximum | - | Location |
last_location_of_minimum | - | Location |
longest_strike_above_mean | - | Runs |
longest_strike_below_mean | - | Runs |
number_crossing_m | m: 0 | Runs |
has_duplicate | - | Duplicates |
has_duplicate_max | - | Duplicates |
has_duplicate_min | - | Duplicates |
percentage_of_reoccurring_datapoints_to_all_datapoints | - | Duplicates |
percentage_of_reoccurring_values_to_all_values | - | Duplicates |
sum_of_reoccurring_data_points | - | Duplicates |
sum_of_reoccurring_values | - | Duplicates |
number_peaks | n: 1 | Peaks |
number_cwt_peaks | n: 1 | Peaks |
mean_n_absolute_max | number_of_maxima: 3 | Peaks |
change_quantiles | f_agg: mean, isabs: false, qh: 0.2, ql: 0.0 | Change |
matrix_profile | feature: min, threshold: 0.98 | Pattern |
query_similarity_count | threshold: 0.0 | Pattern |
Load Feature Configuration
From JSON
SELECT anofox_fcst_ts_features_config_from_json('/path/to/config.json') as config;
From CSV
SELECT anofox_fcst_ts_features_config_from_csv('/path/to/config.csv') as config;
Feature Engineering Example
Extract features for machine learning models:
SELECT
product_id,
anofox_fcst_ts_features(
LIST(sales ORDER BY date),
LIST(date ORDER BY date)
) as features
FROM sales_by_product
GROUP BY product_id;
Feature Engineering Workflow
-- 1. Extract time series features
CREATE TABLE step1_features AS
SELECT
product_id,
anofox_fcst_ts_features(
LIST(sales ORDER BY date),
LIST(date ORDER BY date)
) as features
FROM sales_by_product
GROUP BY product_id;
-- 2. Use standard SQL for temporal features
CREATE TABLE step2_temporal AS
SELECT
date,
sales,
DAYOFWEEK(date) as day_of_week,
MONTH(date) as month,
LAG(sales, 7) OVER (ORDER BY date) as lag_7,
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as ma_7
FROM sales_data;
-- 3. Use for forecasting
SELECT * FROM ts_forecast_by(
'step2_temporal', NULL, date, sales,
'AutoETS', 30, '1d',
MAP{}
);
Standard SQL for Lag and Rolling
Use DuckDB window functions for lag and rolling statistics:
SELECT
date,
sales,
-- Lag features
LAG(sales, 1) OVER (ORDER BY date) as lag_1,
LAG(sales, 7) OVER (ORDER BY date) as lag_7,
LAG(sales, 28) OVER (ORDER BY date) as lag_28,
-- Rolling features
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as ma_7,
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) as ma_28,
-- Other features
DAYOFWEEK(date) as day_of_week,
MONTH(date) as month,
QUARTER(date) as quarter
FROM sales_data;