Anofox Tabular
AnoFox Tabular packs 78 SQL functions across 9 modules into a single DuckDB extension for data validation, PII detection, quality profiling, and anomaly detection. It validates emails via RFC 5322, DNS, and SMTP checks, verifies VAT IDs across 29 European countries, detects and masks 17 types of personally identifiable information, and runs 5 anomaly detection algorithms including Isolation Forest, DBSCAN, and explainable OutlierTree. In the AnoFox unified workflow, Tabular is the first stage -- it stops bad data before it reaches your forecasting and analytics pipelines.
What is Anofox Tabular?
Anofox Tabular is the data validation engine for enterprise workflows. Data validation refers to the systematic process of checking data against defined rules and constraints to ensure accuracy, completeness, and consistency. Validate business data, detect PII, enforce quality rules, and detect anomalies directly in DuckDB - before it reaches your forecasting and analytics pipelines.
Functions now use the anofox_tab_* prefix (e.g., anofox_tab_email_validate). Short aliases without the prefix are also available (e.g., email_validate).
Key Features
- 78 SQL Functions across 9 modules
- Validation - Email, phone, address, VAT, and financial validation
- PII Detection - Detect and mask 17 types of personal data (SSN, credit cards, IBAN, names, etc.)
- Data Quality Metrics - Nullness, distinctness, freshness, and consistency
- Anomaly Detection - Z-Score, IQR, Isolation Forest, DBSCAN, and OutlierTree (explainable)
- Financial Operations - Currency conversion, amount validation, and arithmetic
- Dataset Comparison - Hash-based and join-based table diffing
Quick Links
| Documentation | Description |
|---|---|
| Installation | Setup and prerequisites |
| Function Finder | Find the right function for your task |
| Validation | Email, phone, address, VAT validation |
| Financial | Money and currency functions |
| Quality | Data profiling and metrics |
| Anomaly | Outlier detection methods |
| PII Detection | Detect and mask personal data |
| Operations | Dataset comparison and diffing |
Module Summary
| Module | Functions | Description |
|---|---|---|
| Email Validation | 3 | RFC 5322, DNS, SMTP verification |
| Address Parsing | 4 | International address normalization (libpostal) |
| Phone Numbers | 9 | Google libphonenumber integration |
| Money & Currency | 17 | Multi-currency operations, 10 currencies |
| VAT Validation | 10 | European VAT compliance, 29 countries |
| PII Detection | 20 | Detect & mask 17 PII types |
| Quality Metrics | 8 | Volume, nulls, freshness, schema checks |
| Anomaly Detection | 5 | Isolation Forest, DBSCAN, OutlierTree |
| Data Diffing | 2 | Table comparison, migration validation |
Basic Usage
-- Load the extension
LOAD anofox_tabular;
-- Validate emails, VAT IDs, and monetary amounts
SELECT
customer_id,
email,
vat_id,
amount,
anofox_tab_email_validate(email, 'dns') as email_valid,
anofox_tab_vat_is_valid(vat_id) as vat_valid,
anofox_tab_money_is_positive(amount) as amount_valid
FROM customers
WHERE anofox_tab_email_validate(email, 'dns') IS TRUE
AND anofox_tab_vat_is_valid(vat_id) IS TRUE;
-- Detect PII in customer notes
SELECT
customer_id,
anofox_tab_pii_detect(notes) as pii_found,
anofox_tab_pii_mask(notes, 'redact') as safe_notes
FROM customers
WHERE anofox_tab_pii_contains(notes);
Only valid records pass all validation gates. Invalid data is flagged.
Why Anofox Tabular?
In the AnoFox unified workflow, Tabular is the "Audit & Guard" stage. Before you forecast demand or analyze trends, your data must be trustworthy.
Poor data upstream breaks everything downstream:
- Invalid emails = Failed notifications
- Incorrect VAT IDs = Legal compliance failures
- Exposed PII = Privacy violations and GDPR fines
- Anomalous values = Biased forecasts and wrong coefficients
Tabular stops bad data at the gate.
The module breakdown tells the story: 3 email functions (regex in under 1ms, DNS in ~100ms, SMTP in ~500ms), 4 address parsing functions powered by libpostal, 9 phone number functions built on Google's libphonenumber, 17 money and currency functions covering 10 currencies, 10 VAT functions for 29 EU countries, 20 PII detection functions across 13 pattern-based and 4 NER-based entity types, 8 quality metrics for nullness/distinctness/freshness/consistency, 5 anomaly detectors, and 2 dataset comparison functions. Every function runs in-database with no external service dependencies for offline-capable validation.
Frequently Asked Questions
Can I use Tabular without the other AnoFox extensions?
Yes. Each AnoFox extension is independent. You can install and use Tabular on its own for data validation and quality checks. The extensions complement each other in a unified workflow, but none require the others to function.
Does email validation require an internet connection?
It depends on the validation level. RFC 5322 syntax validation ('syntax') runs entirely offline. DNS validation ('dns') requires network access to verify MX records. SMTP validation ('smtp') connects to the mail server for full deliverability checks. Choose the level that matches your environment.
How does PII detection work and what types does it catch?
PII detection uses a combination of pattern-based matching (regex for SSNs, credit card numbers, IBANs, phone numbers, etc.) and named entity recognition (NER) for names, addresses, and organizations. It detects 17 PII types in total. You can detect PII with anofox_tab_pii_detect() and mask it with anofox_tab_pii_mask() in a single SQL query.
Which anomaly detection method should I use?
Use Z-Score or IQR for fast univariate outlier detection on normally distributed data. Use Isolation Forest for multivariate anomalies in high-dimensional datasets. Use DBSCAN for density-based clustering of anomalies. Use OutlierTree when you need explainable anomaly detection with human-readable reasoning for each flagged record.