Data Operations
Dataset comparison and diffing functions.
Quick Reference
| Function | Description | SQL Signature |
|---|---|---|
anofox_table_diff_hash | Hash-based diff | (table1, table2) -> TABLE |
anofox_table_diff_join | Join-based diff | (table1, table2, key_columns[]) -> TABLE |
Diffing Functions (2)
anofox_table_diff_hash
Hash-based table diff for change detection.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
table1 | VARCHAR | Yes | - | Source table name |
table2 | VARCHAR | Yes | - | Comparison table name |
Speed: O(n) per table, hash-based comparison
Output
| Column | Type | Description |
|---|---|---|
table_name | VARCHAR | Which table |
row_number | INTEGER | Row identifier |
change_type | VARCHAR | ADDED, MODIFIED, DELETED |
reason | VARCHAR | Explanation |
Example
SELECT * FROM anofox_table_diff_hash('customers', 'customers_backup');
Pros:
- Fast (hash-based)
- Works with any column types
- Good for "what changed?" queries
Cons:
- Doesn't show which columns changed
- Can't see detailed differences
anofox_table_diff_join
Join-based table diff with detailed column-level differences.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
table1 | VARCHAR | Yes | - | Source table |
table2 | VARCHAR | Yes | - | Comparison table |
key_columns | VARCHAR[] | Yes | - | Columns to join on |
Speed: O(n log n) join-based
Output
| Column | Type | Description |
|---|---|---|
key_column | ANY | Key value |
*_match | BOOLEAN | TRUE if column matches |
status | VARCHAR | SAME, DIFFERENT, ONLY_IN_TABLE1, ONLY_IN_TABLE2 |
Example
SELECT * FROM anofox_table_diff_join(
'customers',
'customers_backup',
ARRAY['customer_id']
);
Pros:
- Column-level detail
- See exactly which columns differ
- Better for data validation
Practical Patterns
Validate Before ETL
SELECT COUNT(*) as mismatch_count
FROM anofox_table_diff_join(
'raw_import',
'customers',
ARRAY['customer_id']
)
WHERE status != 'SAME';
Change Tracking
SELECT
customer_id,
'EMAIL_CHANGED' as change_type
FROM anofox_table_diff_join(
'customers_today',
'customers_yesterday',
ARRAY['customer_id']
)
WHERE email_match = FALSE;
Data Sync Verification
SELECT
COUNT(*) as total_rows,
COUNT(*) FILTER (WHERE status = 'SAME') as matching_rows,
COUNT(*) FILTER (WHERE status = 'DIFFERENT') as different_rows,
ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'SAME') / COUNT(*), 2) as sync_percentage
FROM anofox_table_diff_join(
'customers_primary',
'customers_replica',
ARRAY['customer_id']
);
Migration Validation
SELECT
COUNT(*) as total_differences,
COUNT(*) FILTER (WHERE status = 'ONLY_IN_TABLE1') as lost_in_migration,
COUNT(*) FILTER (WHERE status = 'ONLY_IN_TABLE2') as added_in_migration,
COUNT(*) FILTER (WHERE status = 'DIFFERENT') as data_corruption
FROM anofox_table_diff_join(
'customers_legacy_system',
'customers_new_system',
ARRAY['customer_id']
)
WHERE status != 'SAME';
Combining with Validation
SELECT
diff.*,
anofox_vat_is_valid(new.vat_id) as vat_valid
FROM anofox_table_diff_join(
'customers_old',
'customers_new',
ARRAY['customer_id']
) diff
LEFT JOIN customers_new new USING (customer_id)
WHERE diff.status = 'DIFFERENT'
AND NOT anofox_vat_is_valid(new.vat_id);
Performance Considerations
Hash-Based (Faster)
-- Use when you just need "are they different?"
SELECT * FROM anofox_table_diff_hash('t1', 't2');
Best for: Large tables (10M+ rows), any column types
Join-Based (More Detail)
-- Use when you need to know what changed
SELECT * FROM anofox_table_diff_join('t1', 't2', ARRAY['id']);
Best for: Medium tables (< 10M rows), detailed reports