Tabular Extension
Data validation, quality metrics, and anomaly detection for enterprise workflows.
30-Second Example
-- Validate emails, VAT IDs, and monetary amounts
SELECT
    customer_id,
    email,
    vat_id,
    amount,
    anofox_email_validate(email, 'dns') AS email_valid,
    anofox_vat_is_valid(vat_id) AS vat_valid,
    anofox_money_is_positive(amount) AS amount_valid
FROM customers
WHERE anofox_email_validate(email, 'dns') IS TRUE
  AND anofox_vat_is_valid(vat_id) IS TRUE
  AND anofox_money_is_positive(amount) IS TRUE;
Output: Only rows that pass all three validation gates are returned. Rows that fail any gate are excluded by the WHERE clause.
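The query above returns only rows that pass every gate. When you would rather keep every row and record why it failed, the same three functions can feed a status column. The following is a minimal sketch, assuming the functions return BOOLEAN (or NULL) as the IS TRUE tests imply. The `validation_status` column and its labels are illustrative names, not part of Tabular.

```sql
-- Sketch: flag rows instead of filtering them out.
-- Assumes each function returns BOOLEAN or NULL; IS NOT TRUE treats NULL as a failure.
-- Only the first failing gate is reported per row.
SELECT
    customer_id,
    email,
    vat_id,
    amount,
    CASE
        WHEN anofox_email_validate(email, 'dns') IS NOT TRUE THEN 'invalid_email'
        WHEN anofox_vat_is_valid(vat_id) IS NOT TRUE THEN 'invalid_vat'
        WHEN anofox_money_is_positive(amount) IS NOT TRUE THEN 'invalid_amount'
        ELSE 'ok'
    END AS validation_status  -- hypothetical column name
FROM customers;
```

Rows with `validation_status = 'ok'` match what the filtering query returns. The remaining rows can be routed to a quarantine table for review.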
What You Get
Tabular is the data validation engine for the AnoFox unified workflow. It validates critical business data before it reaches forecasting and statistical analysis.
Use Tabular when you need to:
- Validate business identifiers (emails, phone numbers, postal codes, VAT IDs)
- Enforce financial data rules (currency formatting, positive amounts, exchange rates)
- Detect anomalies and outliers in tabular data
- Calculate data quality metrics (completeness, freshness, consistency)
- Compare datasets to identify changes
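To make the quality metrics in this list concrete, here is a rough sketch of what completeness, distinctness, volume, and freshness mean, written in plain DuckDB SQL. It does not call Tabular's own quality functions, whose names and signatures are not listed on this page. The `orders` data and its column names are hypothetical.

```sql
-- Sketch in plain SQL; the sample rows are made up for illustration.
WITH orders AS (
    SELECT *
    FROM (VALUES
        (1, 'a@example.com', TIMESTAMP '2024-01-10 08:00:00'),
        (2, NULL,            TIMESTAMP '2024-01-11 09:30:00'),
        (3, 'c@example.com', TIMESTAMP '2024-01-12 12:00:00'),
        (4, 'c@example.com', TIMESTAMP '2024-01-09 17:45:00')
    ) AS t(order_id, email, updated_at)
)
SELECT
    COUNT(*)                                   AS row_count,           -- volume
    COUNT(email) * 1.0 / COUNT(*)              AS email_completeness,  -- share of non-NULL emails
    COUNT(DISTINCT email) * 1.0 / COUNT(email) AS email_distinctness,  -- distinct share among non-NULL emails
    MAX(updated_at)                            AS latest_update        -- freshness: most recent change
FROM orders;
```

In this sample, three of the four emails are non-NULL, so completeness is 0.75. Comparing `latest_update` against the current time gives a simple staleness check.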
The 8 Validation Modules
| Module | Function Count | Purpose |
|---|---|---|
| Email | 3 | Syntax, DNS MX records, SMTP verification |
| Postal & Address | 4 | Street validation, city/postal code matching |
| Phone | 9 | International number parsing, carrier verification |
| Money & Currency | 17 | Amount validation, currency conversion, arithmetic |
| VAT | 10 | International VAT/GST validation, EU verification |
| Data Quality | 7 | Nullness, distinctness, freshness, volume metrics |
| Anomaly Detection | 4 | Isolation Forest, DBSCAN, Z-Score, IQR methods |
| Data Diffing | 2 | Hash-based and join-based dataset comparison |
| Total | 57 | |
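As an illustration of the IQR method named in the Anomaly Detection row, the sketch below flags values outside the usual 1.5 × IQR fences using only built-in DuckDB SQL. It is not Tabular's implementation or API, since the module's function signatures are not shown on this page. The `readings` data is hypothetical.

```sql
-- Sketch: IQR outlier rule in plain DuckDB SQL (not the Tabular function).
WITH readings AS (
    SELECT CAST(v AS DOUBLE) AS amount
    FROM (VALUES (10.0), (10.5), (11.0), (11.5), (12.0), (95.0)) AS t(v)
),
bounds AS (
    -- First and third quartiles with linear interpolation
    SELECT
        quantile_cont(amount, 0.25) AS q1,
        quantile_cont(amount, 0.75) AS q3
    FROM readings
)
SELECT
    r.amount,
    -- Outlier if below q1 - 1.5*IQR or above q3 + 1.5*IQR
    (r.amount < b.q1 - 1.5 * (b.q3 - b.q1)
     OR r.amount > b.q3 + 1.5 * (b.q3 - b.q1)) AS is_iqr_outlier
FROM readings r
CROSS JOIN bounds b
ORDER BY r.amount;
```

With these six values, only 95.0 falls outside the fences. The IQR rule relies on quartiles, so a single extreme value does not shift the thresholds the way it shifts a mean and standard deviation.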
Quick Navigation
Getting Started
- Installation — Prerequisites, build from source, verification
- Quickstart — 5-minute hands-on introduction
Understand the Concepts
- Understanding Data Quality — Six dimensions of data quality
- Validation Strategies — Syntax vs. semantic vs. external
- Anomaly Detection Methods — Statistical and ML approaches
- Data Profiling — Essential metrics for data health
Learn by Doing
- Basic Workflow — End-to-end validation pipeline
- Email Validation — Three modes: regex, DNS, SMTP
- Financial Compliance — B2B validation, VAT, multi-currency
- Production Deployment — Scaling, monitoring, CI/CD integration
Reference & API
- Function Finder — All 57 functions with quick lookup
- Validation Functions — Email, address, phone, VAT APIs
- Financial Functions — Money, currency, arithmetic functions
- Quality Metrics — Data profiling and health indicators
- Anomaly Detection — ML-based outlier methods
- Data Operations — Diffing and comparison functions
Why Tabular Matters
In the AnoFox unified workflow, Tabular is the "Audit & Guard" stage. Before you forecast demand or analyze trends, your data must be trustworthy.
Poor data upstream breaks everything downstream:
- Invalid emails → Failed notifications
- Incorrect VAT IDs → Legal compliance failures
- Anomalous values → Biased forecasts and wrong coefficients
Tabular stops bad data at the gate, so your forecasting and analytics pipelines run on clean, validated facts.
Next Steps
- Install Tabular on your DuckDB instance
- Run the Quickstart to validate your first dataset
- Explore Guides for your use case (email, VAT, anomalies, etc.)
Common Questions
Q: How does Tabular differ from DuckDB's built-in functions?
A: Tabular adds domain-specific checks (VAT ID syntax, email MX record lookups, phone carrier verification) that DuckDB's general-purpose SQL functions do not provide. It is purpose-built for business data.
Q: Can I use Tabular without Forecast or Statistics?
A: Yes. Tabular is independent and can run standalone as a data quality gate.
Q: What's the performance overhead?
A: Network-backed checks add latency: roughly 100ms per record for DNS and roughly 500ms per record for SMTP. Reserve them for critical validations. Regex validation takes under 1ms per record.
Performance & Limits
- Email validation (regex): <1ms per record, no network calls
- Email validation (DNS): ~100ms per record, batch-friendly
- Email validation (SMTP): ~500ms per record, use sparingly
- Phone validation: ~5-10ms per record
- VAT validation: ~2-5ms per record
- Anomaly detection: Linear in dataset size, vectorized
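Because the DNS check costs about 100ms per record, one way to reduce total latency is to validate each distinct address once and join the result back. The following is a sketch under assumptions: it reuses the `customers` table and the `anofox_email_validate(email, 'dns')` call from the example at the top of this page. Whether the extension already caches lookups internally is not stated here, so measure before relying on it.

```sql
-- Sketch: run the slow DNS check once per distinct email, then join back.
WITH distinct_emails AS (
    SELECT DISTINCT email
    FROM customers
    WHERE email IS NOT NULL
),
checked AS (
    SELECT
        email,
        anofox_email_validate(email, 'dns') AS email_valid
    FROM distinct_emails
)
SELECT
    c.customer_id,
    c.email,
    k.email_valid          -- NULL when the customer has no email
FROM customers c
LEFT JOIN checked k ON c.email = k.email;
```

The saving depends on how many duplicate addresses the table contains. With mostly unique emails, the cost is close to that of the direct query.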
Community & Support
- GitHub Discussions — Ask questions
- Discord Community — Chat with users
- Documentation Issues — Report docs bugs