
Tabular Extension

Data validation, quality metrics, and anomaly detection for enterprise workflows.

30-Second Example

-- Validate emails, VAT IDs, and monetary amounts
SELECT
    customer_id,
    email,
    vat_id,
    amount,
    anofox_email_validate(email, 'dns') AS email_valid,
    anofox_vat_is_valid(vat_id)         AS vat_valid,
    anofox_money_is_positive(amount)    AS amount_valid
FROM customers
WHERE anofox_email_validate(email, 'dns') IS TRUE
  AND anofox_vat_is_valid(vat_id) IS TRUE
  AND anofox_money_is_positive(amount) IS TRUE;

Output: Only records that pass all three validation gates are returned. Rows that fail any check are removed by the WHERE clause, so every flag column in the result is TRUE.
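To flag invalid rows rather than drop them, one option is to compute each check once in a CTE and keep every row. The sketch below reuses only the three functions from the example above. It assumes they return BOOLEAN (or NULL), as the IS TRUE comparisons suggest. The `all_valid` column name is made up for illustration.

```sql
-- Sketch: flag invalid rows instead of filtering them out.
-- Assumption: each anofox_* function returns BOOLEAN or NULL.
WITH checked AS (
    SELECT
        customer_id,
        email,
        vat_id,
        amount,
        anofox_email_validate(email, 'dns') AS email_valid,
        anofox_vat_is_valid(vat_id)         AS vat_valid,
        anofox_money_is_positive(amount)    AS amount_valid
    FROM customers
)
SELECT
    *,
    -- COALESCE treats a NULL (undetermined) result as a failed check
    COALESCE(email_valid AND vat_valid AND amount_valid, FALSE) AS all_valid
FROM checked
ORDER BY all_valid, customer_id;  -- failing rows sort first
```

Written this way, each validation function also appears once per row in the query text rather than once in the SELECT list and again in the WHERE clause.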


What You Get

Tabular is the data validation engine for the AnoFox unified workflow. It validates critical business data before it reaches forecasting and statistical analysis.

Use Tabular when you need to:

  • Validate business identifiers (emails, phone numbers, postal codes, VAT IDs)
  • Enforce financial data rules (currency formatting, positive amounts, exchange rates)
  • Detect anomalies and outliers in tabular data
  • Calculate data quality metrics (completeness, freshness, consistency)
  • Compare datasets to identify changes
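To make the quality metrics in the list above concrete, here is a minimal sketch in plain DuckDB SQL. It does not use Tabular's own Data Quality functions, whose names and signatures are not shown on this page. The table `customers_sample`, its `updated_at` column, and the sample rows are hypothetical.

```sql
-- Hypothetical sample data; names and values are illustrative only.
CREATE OR REPLACE TEMP TABLE customers_sample AS
SELECT * FROM (VALUES
    (1, 'a@example.com', TIMESTAMP '2024-01-01 08:00:00'),
    (2, NULL,            TIMESTAMP '2024-01-02 09:30:00'),
    (3, 'c@example.com', TIMESTAMP '2024-01-03 12:00:00'),
    (3, 'c@example.com', TIMESTAMP '2024-01-03 12:00:00')
) AS t(customer_id, email, updated_at);

SELECT
    COUNT(*) AS row_count,
    -- completeness: share of rows with a non-NULL email
    COUNT(email) * 1.0 / COUNT(*) AS email_completeness,
    -- distinctness: distinct customer_id values divided by row count
    COUNT(DISTINCT customer_id) * 1.0 / COUNT(*) AS id_distinctness,
    -- freshness: hours elapsed since the newest row was updated
    date_diff('hour', MAX(updated_at), CAST(now() AS TIMESTAMP)) AS hours_since_last_update
FROM customers_sample;
```

A completeness or distinctness value below 1.0 means some rows have missing or repeated values in that column. What counts as acceptable depends on the dataset.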

The 8 Validation Modules

| Module | Function Count | Purpose |
| --- | --- | --- |
| Email | 3 | Syntax, DNS MX records, SMTP verification |
| Postal & Address | 4 | Street validation, city/postal code matching |
| Phone | 9 | International number parsing, carrier verification |
| Money & Currency | 17 | Amount validation, currency conversion, arithmetic |
| VAT | 10 | International VAT/GST validation, EU verification |
| Data Quality | 7 | Nullness, distinctness, freshness, volume metrics |
| Anomaly Detection | 4 | Isolation Forest, DBSCAN, Z-Score, IQR methods |
| Data Diffing | 2 | Hash-based and join-based dataset comparison |
Total: 57 Functions
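The two simplest methods named under Anomaly Detection, Z-Score and IQR, can be sketched in plain DuckDB SQL to show what they compute. This is not Tabular's implementation, and its function names are not shown on this page. The table `amounts_sample`, the 3-sigma threshold, and the 1.5 × IQR fences are conventional choices assumed here for illustration.

```sql
-- Hypothetical sample data: seven typical amounts and one extreme value.
CREATE OR REPLACE TEMP TABLE amounts_sample AS
SELECT customer_id, CAST(amount AS DOUBLE) AS amount
FROM (VALUES
    (1, 100), (2, 102), (3, 98), (4, 101),
    (5, 99),  (6, 103), (7, 97), (8, 500)
) AS t(customer_id, amount);

WITH stats AS (
    SELECT
        AVG(amount)                 AS mean_amount,
        STDDEV_SAMP(amount)         AS sd_amount,
        QUANTILE_CONT(amount, 0.25) AS q1,
        QUANTILE_CONT(amount, 0.75) AS q3
    FROM amounts_sample
)
SELECT
    a.customer_id,
    a.amount,
    -- Z-score: distance from the mean in standard deviations
    (a.amount - s.mean_amount) / NULLIF(s.sd_amount, 0) AS z_score,
    ABS(a.amount - s.mean_amount) > 3 * s.sd_amount     AS z_outlier,
    -- IQR rule: outside [q1 - 1.5*IQR, q3 + 1.5*IQR]
    (a.amount < s.q1 - 1.5 * (s.q3 - s.q1)
     OR a.amount > s.q3 + 1.5 * (s.q3 - s.q1))          AS iqr_outlier
FROM amounts_sample AS a
CROSS JOIN stats AS s
ORDER BY a.customer_id;
```

On a sample this small, the extreme value inflates the standard deviation enough that the 3-sigma rule does not flag it, while the IQR rule does. This is one reason to compare methods rather than rely on a single threshold.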

Quick Navigation

Getting Started

  • Installation — Prerequisites, build from source, verification
  • Quickstart — 5-minute hands-on introduction

Understand the Concepts

Learn by Doing

Reference & API


Why Tabular Matters

In the AnoFox unified workflow, Tabular is the "Audit & Guard" stage. Before you forecast demand or analyze trends, your data must be trustworthy.

Poor data upstream breaks everything downstream:

  • Invalid emails → Failed notifications
  • Incorrect VAT IDs → Legal compliance failures
  • Anomalous values → Biased forecasts and wrong coefficients

Tabular stops bad data at the gate, so your forecasting and analytics pipelines run on clean, validated facts.


Next Steps

  1. Install Tabular on your DuckDB instance
  2. Run the Quickstart to validate your first dataset
  3. Explore Guides for your use case (email, VAT, anomalies, etc.)

Common Questions

Q: How does Tabular differ from DuckDB's built-in functions?
A: Tabular provides domain-specific validation (VAT syntax, email MX records, phone carriers) that DuckDB's generic SQL functions do not cover. It is purpose-built for business data.

Q: Can I use Tabular without Forecast or Statistics?
A: Yes. Tabular is independent and can be used standalone for data quality gates.

Q: What's the performance overhead?
A: DNS checks add roughly 100 ms per record and SMTP checks roughly 500 ms. Reserve them for critical validations. Regex validation takes under 1 ms per record.


Performance & Limits

  • Email validation (regex): <1ms per record, no network round trip
  • Email validation (DNS): ~100ms per record, batch-friendly
  • Email validation (SMTP): ~500ms per record, use sparingly
  • Phone validation: ~5-10ms per record
  • VAT validation: ~2-5ms per record
  • Anomaly detection: Linear in dataset size, vectorized
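At roughly 100 ms per DNS lookup, checking 1,000,000 rows one at a time would take on the order of 28 hours (1,000,000 × 0.1 s ≈ 100,000 s), so it helps to reduce the number of lookups. One possible pattern, sketched below, discards obviously malformed addresses with DuckDB's built-in `regexp_matches`, then runs the DNS check once per distinct address. The regular expression is a deliberately loose stand-in, not Tabular's regex mode, and the `email_dns_checked` table name is hypothetical.

```sql
-- Sketch: pay the DNS cost once per distinct, syntactically plausible address.
-- Materializing the result in a temp table keeps the lookup count explicit.
CREATE OR REPLACE TEMP TABLE email_dns_checked AS
SELECT
    email,
    anofox_email_validate(email, 'dns') AS email_valid
FROM (
    SELECT DISTINCT email
    FROM customers
    -- Loose syntax pre-filter (assumption): something@something.tld, no whitespace
    WHERE regexp_matches(email, '^[^@\s]+@[^@\s]+\.[^@\s]+$')
) AS candidates;

-- Join the results back; addresses that were NULL or failed the
-- pre-filter have no match and are reported as invalid.
SELECT
    c.customer_id,
    c.email,
    COALESCE(d.email_valid, FALSE) AS email_valid
FROM customers AS c
LEFT JOIN email_dns_checked AS d USING (email);
```

How much this saves depends on how many duplicate and malformed addresses the table contains.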

Community & Support
