Skip to main content

Data Quality

Assess time series data quality across multiple dimensions before forecasting.

FunctionDescription
ts_data_quality_byMulti-dimensional quality scores per series
ts_data_quality_summaryAggregate quality metrics across all series
Showing 2 of 2

ts_data_quality_by

Evaluates time series data across four quality dimensions, returning per-series quality scores. Also aliased as ts_data_quality.

Parameters

ParameterTypeRequiredDescription
sourceVARCHARYesSource table name
group_colCOLUMNYesSeries identifier column (unquoted)
date_colCOLUMNYesTemporal column (unquoted)
value_colCOLUMNYesValue column (unquoted)
n_shortINTEGERNoShort series threshold (default: 30)
frequencyVARCHARYesTime frequency ('1d', '1w', etc.)

Output

Returns one row per series with quality scores:

ColumnTypeDescription
unique_id(input)Series identifier
structural_scoreDOUBLEData completeness (0-1, higher=better)
temporal_scoreDOUBLETimestamp regularity (0-1, higher=better)
magnitude_scoreDOUBLEValue distribution health (0-1, higher=better)
behavioral_scoreDOUBLEPredictability (0-1, higher=better)
overall_scoreDOUBLEOverall quality (0-1, higher=better)
n_gapsUBIGINTGap count
n_missingUBIGINTMissing values
is_constantBOOLEANConstant series?

Quality Dimensions

DimensionScoreWhat It Checks
Structuralstructural_scoreData completeness, duplicate keys
Temporaltemporal_scoreTimestamp regularity, gaps, alignment
Magnitudemagnitude_scoreNULLs, constants, value distribution
Behaviouralbehavioral_scoreIntermittency, seasonality, trend
Overalloverall_scoreWeighted combination of all dimensions

Examples

-- Assess quality with daily frequency
SELECT * FROM ts_data_quality_by(
'sales',
product_id,
date,
amount,
30,
'1d'
);

-- Find high-quality series only
SELECT unique_id
FROM ts_data_quality_by('sales', product_id, date, amount, 10, '1d')
WHERE overall_score > 0.8;

-- Find series with temporal issues
SELECT unique_id, temporal_score, n_gaps
FROM ts_data_quality_by('sales', product_id, date, amount, 30, '1d')
WHERE temporal_score < 0.7;

-- Find series needing attention
SELECT unique_id, overall_score, n_gaps, n_missing, is_constant
FROM ts_data_quality_by('sales', product_id, date, amount, 30, '1d')
WHERE overall_score < 0.5
OR is_constant
OR n_gaps > 10;

Use when:

  • Auditing data before forecasting
  • Identifying series requiring data preparation
  • Building data quality dashboards

ts_data_quality_summary

Aggregates quality metrics across all series for dataset-level insights.

Parameters

ParameterTypeRequiredDescription
sourceVARCHARYesSource table name
group_colCOLUMNYesSeries identifier column (unquoted)
date_colCOLUMNYesTemporal column (unquoted)
value_colCOLUMNYesValue column (unquoted)
n_shortINTEGERNoShort series threshold (default: 30)

Example

-- Get dataset-level quality summary
SELECT * FROM ts_data_quality_summary(
'sales',
product_id,
date,
amount,
30
);

Use when:

  • Identifying systemic data quality issues
  • Reporting on overall dataset health
  • Comparing quality across different datasets

Quality Assessment Workflow

-- 1. Run quality assessment
CREATE TABLE quality AS
SELECT * FROM ts_data_quality_by(
'sales_raw',
product_id,
date,
amount,
60,
'1d'
);

-- 2. Summarize by score dimension
SELECT
AVG(structural_score) AS avg_structural,
AVG(temporal_score) AS avg_temporal,
AVG(magnitude_score) AS avg_magnitude,
AVG(behavioral_score) AS avg_behavioral,
AVG(overall_score) AS avg_overall
FROM quality;

-- 3. Identify series needing attention
SELECT unique_id, overall_score, n_gaps, n_missing
FROM quality
WHERE overall_score < 0.5
ORDER BY overall_score;

-- 4. Get dataset-level summary
SELECT * FROM ts_data_quality_summary(
'sales_raw', product_id, date, amount, 60
);

Quality Dimension Guide

DimensionWhat It ChecksAction If Issues Found
StructuralDuplicate keys, completenessDeduplicate data
TemporalMissing timestamps, gapsUse gap filling functions
MagnitudeNULLs, constantsImpute or filter
BehaviouralIntermittency, patternsConsider intermittent models

🍪 Cookie Settings