Data Quality
Assess time series data quality across multiple dimensions before forecasting.
| Function | Description |
|---|---|
ts_data_quality | Multi-dimensional quality assessment per series |
ts_data_quality_summary | Aggregate quality metrics across all series |
Showing 2 of 2
anofox_fcst_ts_data_quality
Evaluates time series data across four quality dimensions: Structural, Temporal, Magnitude, and Behavioural.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
table_name | VARCHAR | Yes | Source table |
unique_id_col | ANY | Yes | Series identifier column |
date_col | DATE/TIMESTAMP/INTEGER | Yes | Temporal column |
value_col | DOUBLE | Yes | Value column |
n_short | INTEGER | No | Short series threshold (default: 30) |
frequency | VARCHAR/INTEGER | Yes | Time frequency ('1d', '1w') or integer step |
Output
| Column | Type | Description |
|---|---|---|
unique_id | ANY | Series identifier |
dimension | VARCHAR | Quality category |
metric | VARCHAR | Specific quality measure |
value | BIGINT | Count or numeric result |
value_pct | DOUBLE | Percentage (0-100) |
Quality Dimensions
Structural:
key_uniqueness- Duplicate (id, date) pairsid_cardinality- Total distinct series IDs
Temporal:
series_length- Row count per seriestimestamp_gaps- Missing timestamps based on frequencyseries_alignment- Count of distinct start/end datesfrequency_inference- Different inferred frequencies
Magnitude:
missing_values- NULL count and percentagevalue_bounds- Negative value countstatic_values- Flag for no variation (1=constant)
Behavioural:
intermittency- Zero/NULL percentage (sparseness)seasonality_check- Flag for seasonality detectedtrend_detection- Trend strength correlation (0-1)
Example
-- Assess quality with daily frequency
SELECT * FROM anofox_fcst_ts_data_quality(
'sales',
product_id,
date,
amount,
30,
'1d'
);
-- Find series with timestamp gaps
SELECT unique_id, value, value_pct
FROM anofox_fcst_ts_data_quality(
'sales', product_id, date, amount, 30, '1d'
)
WHERE dimension = 'Temporal'
AND metric = 'timestamp_gaps'
AND value > 0;
-- Find series with missing values
SELECT unique_id, value_pct
FROM anofox_fcst_ts_data_quality(
'sales', product_id, date, amount, 30, '1d'
)
WHERE dimension = 'Magnitude'
AND metric = 'missing_values'
AND value_pct > 5;
Use when:
- Auditing data before forecasting
- Identifying series requiring data preparation
- Building data quality dashboards
anofox_fcst_ts_data_quality_summary
Aggregates quality metrics across all series, grouped by dimension and metric, for dataset-level insights.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
table_name | VARCHAR | Yes | Source table |
unique_id_col | ANY | Yes | Series identifier column |
date_col | DATE/TIMESTAMP/INTEGER | Yes | Temporal column |
value_col | DOUBLE | Yes | Value column |
n_short | INTEGER | No | Short series threshold (default: 30) |
Example
-- Get dataset-level quality summary
SELECT * FROM anofox_fcst_ts_data_quality_summary(
'sales',
product_id,
date,
amount,
30
);
Use when:
- Identifying systemic data quality issues
- Reporting on overall dataset health
- Comparing quality across different datasets
Quality Assessment Workflow
-- 1. Run quality assessment
CREATE TABLE quality AS
SELECT * FROM anofox_fcst_ts_data_quality(
'sales_raw',
'product_id',
'date',
'amount',
60,
'1d'
);
-- 2. Get summary by dimension
SELECT dimension, metric, SUM(value) as total_issues
FROM quality
GROUP BY dimension, metric
ORDER BY dimension, total_issues DESC;
-- 3. Identify series needing attention
SELECT DISTINCT unique_id
FROM quality
WHERE (dimension = 'Temporal' AND metric = 'timestamp_gaps' AND value_pct > 10)
OR (dimension = 'Magnitude' AND metric = 'missing_values' AND value_pct > 5)
OR (dimension = 'Magnitude' AND metric = 'static_values' AND value = 1);
-- 4. Get dataset-level summary
SELECT * FROM anofox_fcst_ts_data_quality_summary(
'sales_raw', 'product_id', 'date', 'amount', 60
);
Quality Dimension Guide
| Dimension | What It Checks | Action If Issues Found |
|---|---|---|
| Structural | Duplicate keys | Deduplicate data |
| Temporal | Missing timestamps | Use gap filling functions |
| Magnitude | NULLs, constants | Impute or filter |
| Behavioural | Intermittency | Consider intermittent models |