Skip to main content

Understanding Data Quality

Data quality is the foundation of reliable business decisions.


What is Data Quality?

Data quality measures how fit your data is for its intended purpose. Poor data leads to poor decisions—bad forecasts, wrong analyses, failed campaigns, regulatory failures.

The six dimensions of data quality:


1. Accuracy

Definition: Data correctly represents the real-world entity it describes.

Examples:

  • Email "john@exampl.com" (typo) = inaccurate
  • Phone "+1-555-0123" (valid format, carrier verified) = accurate
  • VAT "DE999999999" (syntactically correct but not issued) = inaccurate

Check with Tabular:

SELECT email, anofox_email_validate(email, 'smtp') as smtp_verified
FROM customers;

2. Completeness

Definition: All required data is present; no critical fields are missing.

Examples:

  • Customer table missing 20% of phone numbers = 80% complete
  • All invoices have amounts = 100% complete
  • Required email field is NULL = incomplete

Check with Tabular:

SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE phone IS NOT NULL) as phone_count,
ROUND(COUNT(*) FILTER (WHERE phone IS NOT NULL) * 100.0 / COUNT(*), 1) as completeness_pct
FROM contacts;

3. Consistency

Definition: Data is uniform in format and structure across the system.

Examples:

  • Phone numbers in different formats: "+1 555 0123", "555-0123", "5550123" = inconsistent
  • Currency always in EUR, but one record in USD = inconsistent
  • Customer IDs with leading zeros sometimes, sometimes without = inconsistent

Check with Tabular:

SELECT phone, anofox_phone_format(phone, 'US') as normalized
FROM contacts
GROUP BY phone;

4. Validity

Definition: Data conforms to required syntax and business rules.

Examples:

  • Email "john@example.com" (valid syntax, valid domain) = valid
  • Phone "+1-555-0123" (valid US number) = valid
  • VAT "DE123456789" (passes EU validation rules) = valid
  • Amount "-100.00" when amounts must be positive = invalid

Check with Tabular:

SELECT
vat_id,
anofox_vat_is_valid(vat_id) as vat_valid
FROM vendors;

5. Timeliness

Definition: Data is fresh, current, and available when needed.

Examples:

  • Customer address updated yesterday = timely
  • Inventory count from 6 months ago = stale, not timely
  • Real-time stock prices vs. end-of-day prices = different timeliness requirements

Check with Tabular:

SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE last_updated > now() - interval '7 days') as updated_past_week,
MAX(last_updated) as most_recent
FROM customer_data;

6. Uniqueness

Definition: No unwanted duplicates; unique records are actually unique.

Examples:

  • Customer table with duplicate customer_id rows = not unique
  • Two invoices with the same invoice_number = not unique
  • Deduped email list (one email per person) = unique

Check with Tabular:

SELECT
COUNT(*) as total,
COUNT(DISTINCT customer_id) as unique_ids,
COUNT(*) - COUNT(DISTINCT customer_id) as duplicate_count
FROM customers;

The Cost of Poor Data

Business Impact

DimensionImpactExample
AccuracyWrong decisions, failed campaignsMarketing spend on fake emails → 0 ROI
CompletenessBiased analysis, missing segments20% missing phone numbers → incomplete customer profile
ConsistencyIntegration failures, schema mismatchesPhone format inconsistency → matching failures
ValidityRegulatory failures, system errorsInvalid VAT IDs → can't invoice EU customers
TimelinessStale insights, missed opportunitiesOld inventory data → stockouts or overstock
UniquenessReporting errors, double-countingDuplicate customer records → inflated churn metrics

The Cascade Effect

Poor Data → Bad Forecasts → Wrong Inventory → Stockouts/Overstock
→ Bad Analysis → Wrong Strategy → Lost Revenue
→ System Failures → Lost Trust → Churn

The Data Quality Workflow

Prevention → Detection → Correction → Monitoring

  1. Prevention: Validate data at the source (regex, DNS, VAT rules)
  2. Detection: Detect anomalies after ingestion (Z-Score, Isolation Forest)
  3. Correction: Flag and quarantine bad records
  4. Monitoring: Track quality metrics over time

Tabular helps with all four stages.


Next Steps

🍪 Cookie Settings