Data Quality
Assess time series data quality across multiple dimensions before forecasting.
| Function | Description |
|---|---|
ts_data_quality_by | Multi-dimensional quality scores per series |
ts_data_quality_summary | Aggregate quality metrics across all series |
Showing 2 of 2
ts_data_quality_by
Evaluates time series data across four quality dimensions, returning per-series quality scores. Also aliased as ts_data_quality.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
source | VARCHAR | Yes | Source table name |
group_col | COLUMN | Yes | Series identifier column (unquoted) |
date_col | COLUMN | Yes | Temporal column (unquoted) |
value_col | COLUMN | Yes | Value column (unquoted) |
n_short | INTEGER | No | Short series threshold (default: 30) |
frequency | VARCHAR | Yes | Time frequency ('1d', '1w', etc.) |
Output
Returns one row per series with quality scores:
| Column | Type | Description |
|---|---|---|
unique_id | (input) | Series identifier |
structural_score | DOUBLE | Data completeness (0-1, higher=better) |
temporal_score | DOUBLE | Timestamp regularity (0-1, higher=better) |
magnitude_score | DOUBLE | Value distribution health (0-1, higher=better) |
behavioral_score | DOUBLE | Predictability (0-1, higher=better) |
overall_score | DOUBLE | Overall quality (0-1, higher=better) |
n_gaps | UBIGINT | Gap count |
n_missing | UBIGINT | Missing values |
is_constant | BOOLEAN | Constant series? |
Quality Dimensions
| Dimension | Score | What It Checks |
|---|---|---|
| Structural | structural_score | Data completeness, duplicate keys |
| Temporal | temporal_score | Timestamp regularity, gaps, alignment |
| Magnitude | magnitude_score | NULLs, constants, value distribution |
| Behavioural | behavioral_score | Intermittency, seasonality, trend |
| Overall | overall_score | Weighted combination of all dimensions |
Examples
-- Assess quality with daily frequency
SELECT * FROM ts_data_quality_by(
'sales',
product_id,
date,
amount,
30,
'1d'
);
-- Find high-quality series only
SELECT unique_id
FROM ts_data_quality_by('sales', product_id, date, amount, 10, '1d')
WHERE overall_score > 0.8;
-- Find series with temporal issues
SELECT unique_id, temporal_score, n_gaps
FROM ts_data_quality_by('sales', product_id, date, amount, 30, '1d')
WHERE temporal_score < 0.7;
-- Find series needing attention
SELECT unique_id, overall_score, n_gaps, n_missing, is_constant
FROM ts_data_quality_by('sales', product_id, date, amount, 30, '1d')
WHERE overall_score < 0.5
OR is_constant
OR n_gaps > 10;
Use when:
- Auditing data before forecasting
- Identifying series requiring data preparation
- Building data quality dashboards
ts_data_quality_summary
Aggregates quality metrics across all series for dataset-level insights.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
source | VARCHAR | Yes | Source table name |
group_col | COLUMN | Yes | Series identifier column (unquoted) |
date_col | COLUMN | Yes | Temporal column (unquoted) |
value_col | COLUMN | Yes | Value column (unquoted) |
n_short | INTEGER | No | Short series threshold (default: 30) |
Example
-- Get dataset-level quality summary
SELECT * FROM ts_data_quality_summary(
'sales',
product_id,
date,
amount,
30
);
Use when:
- Identifying systemic data quality issues
- Reporting on overall dataset health
- Comparing quality across different datasets
Quality Assessment Workflow
-- 1. Run quality assessment
CREATE TABLE quality AS
SELECT * FROM ts_data_quality_by(
'sales_raw',
product_id,
date,
amount,
60,
'1d'
);
-- 2. Summarize by score dimension
SELECT
AVG(structural_score) AS avg_structural,
AVG(temporal_score) AS avg_temporal,
AVG(magnitude_score) AS avg_magnitude,
AVG(behavioral_score) AS avg_behavioral,
AVG(overall_score) AS avg_overall
FROM quality;
-- 3. Identify series needing attention
SELECT unique_id, overall_score, n_gaps, n_missing
FROM quality
WHERE overall_score < 0.5
ORDER BY overall_score;
-- 4. Get dataset-level summary
SELECT * FROM ts_data_quality_summary(
'sales_raw', product_id, date, amount, 60
);
Quality Dimension Guide
| Dimension | What It Checks | Action If Issues Found |
|---|---|---|
| Structural | Duplicate keys, completeness | Deduplicate data |
| Temporal | Missing timestamps, gaps | Use gap filling functions |
| Magnitude | NULLs, constants | Impute or filter |
| Behavioural | Intermittency, patterns | Consider intermittent models |