Skip to main content

Quality Metrics

Functions for profiling and measuring data quality.


Overview

Quality metrics provide insight into the health and usability of your data. They answer fundamental questions:

  • How complete is my data?
  • How unique are my records?
  • How fresh is my data?
  • Are my columns consistent?

Core Metrics (7 functions)

anofox_metric_nullness

Measure percentage of NULL values in a column.

Signature:

anofox_metric_nullness(column)DECIMAL

Returns: Percentage of NULLs (0.0-100.0)

Speed: O(n) full table scan

Example:

SELECT
'email' as column_name,
anofox_metric_nullness(email) as null_percentage
FROM customers;

-- Output: 2.5 (2.5% of emails are NULL)

Interpretation:

  • 0%: No missing values (complete)
  • 1-5%: Mostly complete
  • 5-20%: Significant gaps
  • > 20%: Unusable column

anofox_metric_distinctness

Measure percentage of unique values in a column.

Signature:

anofox_metric_distinctness(column)DECIMAL

Returns: Percentage of unique values (0.0-100.0)

Speed: O(n) with dedup

Example:

SELECT
anofox_metric_distinctness(customer_id) as id_uniqueness,
anofox_metric_distinctness(country) as country_uniqueness
FROM customers;

-- Output:
-- id_uniqueness: 100.0 (each customer has unique ID)
-- country_uniqueness: 5.2 (only 5 countries represented)

Interpretation:

  • 100%: All unique (key column)
  • 50-100%: High diversity
  • 10-50%: Moderate categorization
  • < 10%: Heavily concentrated

anofox_metric_freshness

Measure days since most recent update.

Signature:

anofox_metric_freshness(date_column)DECIMAL

Returns: Age in days since MAX(date_column)

Speed: O(n)

Example:

SELECT
anofox_metric_freshness(updated_at) as days_since_update
FROM customers;

-- Output: 3 (most recent update was 3 days ago)

Interpretation:

  • 0-1 days: Real-time (fresh)
  • 1-7 days: Current (acceptable)
  • 7-30 days: Stale (aging)
  • > 30 days: Very stale (likely unused)

anofox_metric_volume

Get row count of table.

Signature:

anofox_metric_volume(table_name VARCHAR)INTEGER

Returns: Number of rows

Speed: O(1)

Example:

SELECT
anofox_metric_volume('customers') as customer_count,
anofox_metric_volume('transactions') as transaction_count;

-- Output:
-- customer_count: 125000
-- transaction_count: 8500000

anofox_metric_consistency

Measure format consistency across column.

Signature:

anofox_metric_consistency(column)DECIMAL

Returns: Percentage of values matching dominant format (0.0-100.0)

Speed: O(n)

Example:

SELECT
anofox_metric_consistency(phone) as phone_format_consistency
FROM customers;

-- Output: 75.5 (75.5% of phones follow same format)

Interpretation:

  • > 95%: Highly consistent
  • 80-95%: Mostly consistent
  • 50-80%: Inconsistent (may need normalization)
  • < 50%: Very inconsistent (data quality issue)

anofox_metric_schema_match

Check if two tables have compatible schemas.

Signature:

anofox_metric_schema_match(table1 VARCHAR, table2 VARCHAR)BOOLEAN

Returns: TRUE if schemas match

Speed: O(1)

Example:

SELECT anofox_metric_schema_match('customers', 'customers_backup');
-- Output: true (both have same columns and types)

Use Case: Validate before ETL or migration


anofox_metric_row_count

Get row count (alias for volume).

Signature:

anofox_metric_row_count(table_name VARCHAR)INTEGER

Returns: Number of rows

Speed: O(1)

Example:

SELECT anofox_metric_row_count('transactions');
-- Output: 8500000

Data Quality Profile

Create a comprehensive health check:

CREATE VIEW data_quality_profile AS
SELECT
'customers' as table_name,
anofox_metric_row_count('customers') as row_count,
anofox_metric_nullness(email) as email_nullness_pct,
anofox_metric_nullness(phone) as phone_nullness_pct,
anofox_metric_distinctness(customer_id) as id_uniqueness_pct,
anofox_metric_distinctness(country) as country_diversity_pct,
anofox_metric_freshness(updated_at) as days_since_update,
anofox_metric_consistency(phone) as phone_format_consistency_pct,
CURRENT_TIMESTAMP as profiled_at
FROM customers;

SELECT * FROM data_quality_profile;

Output:

table_name | row_count | email_nullness | phone_nullness | id_uniqueness | country_diversity | days_since_update | phone_consistency | profiled_at
customers | 125000 | 2.5 | 3.1 | 100.0 | 5.2 | 3 | 75.5 | 2025-01-22 14:30:00

Quality Monitoring Dashboard

Track quality over time:

-- Daily quality tracking
CREATE TABLE quality_metrics_daily (
profile_date DATE,
table_name VARCHAR,
row_count INTEGER,
null_rate_pct DECIMAL,
uniqueness_pct DECIMAL,
age_days DECIMAL,
consistency_pct DECIMAL
);

-- Insert daily snapshots
INSERT INTO quality_metrics_daily
SELECT
CURRENT_DATE as profile_date,
'customers' as table_name,
anofox_metric_row_count('customers') as row_count,
anofox_metric_nullness(email) as null_rate_pct,
anofox_metric_distinctness(customer_id) as uniqueness_pct,
anofox_metric_freshness(updated_at) as age_days,
anofox_metric_consistency(phone) as consistency_pct
FROM customers
LIMIT 1;

-- Monitor trends
SELECT
profile_date,
row_count,
null_rate_pct,
LAG(row_count) OVER (ORDER BY profile_date) as prior_row_count,
LAG(row_count) OVER (ORDER BY profile_date) - row_count as row_count_change
FROM quality_metrics_daily
ORDER BY profile_date DESC;

Quality Scoring

Create a composite quality score:

-- Calculate quality score (0-100)
CREATE VIEW data_quality_score AS
SELECT
'customers' as table_name,
ROUND(
(
-- Completeness: 25 points
CASE WHEN anofox_metric_nullness(email) &lt; 5 THEN 25 ELSE 0 END +
-- Uniqueness: 25 points
CASE WHEN anofox_metric_distinctness(customer_id) &gt; 95 THEN 25 ELSE 0 END +
-- Freshness: 25 points
CASE WHEN anofox_metric_freshness(updated_at) &lt; 7 THEN 25 ELSE 0 END +
-- Consistency: 25 points
CASE WHEN anofox_metric_consistency(phone) &gt; 80 THEN 25 ELSE 0 END
) / 100 * 100,
1
) as quality_score
FROM customers;

-- Grade the data
SELECT
table_name,
quality_score,
CASE
WHEN quality_score >= 90 THEN 'A (Excellent)'
WHEN quality_score >= 80 THEN 'B (Good)'
WHEN quality_score >= 70 THEN 'C (Fair)'
WHEN quality_score >= 60 THEN 'D (Poor)'
ELSE 'F (Failing)'
END as grade
FROM data_quality_score;

Alerting Rules

Set thresholds for automated alerts:

-- Create alerts for quality degradation
CREATE VIEW quality_alerts AS
SELECT
CASE
WHEN anofox_metric_nullness(email) &gt; 10 THEN 'Alert: Email nullness &gt; 10%'
ELSE 'OK'
END as email_alert,
CASE
WHEN anofox_metric_freshness(updated_at) &gt; 30 THEN 'Alert: Data is &gt; 30 days old'
ELSE 'OK'
END as freshness_alert,
CASE
WHEN anofox_metric_consistency(phone) &lt; 70 THEN 'Alert: Phone format &lt; 70% consistent'
ELSE 'OK'
END as consistency_alert
FROM customers
WHERE 1=1
AND anofox_metric_nullness(email) &gt; 10 OR anofox_metric_freshness(updated_at) &gt; 30
OR anofox_metric_consistency(phone) &lt; 70;

Quality Thresholds

Recommended thresholds:

MetricGreenYellowRed
Nullness< 5%5-10%> 10%
Uniqueness (ID)> 99%95-99%< 95%
Freshness< 1 day1-7 days> 7 days
Consistency> 90%70-90%< 70%
Row Count Growth0-10% daily10-50% daily> 50% daily

Best Practices

✅ Profile data before analysis
✅ Track metrics over time (trend analysis)
✅ Set quality thresholds appropriate for your domain
✅ Alert on degradation (not just absolute values)
✅ Review quality scores before using data for decisions


Next Steps

🍪 Cookie Settings