Email Validation
Three modes: Regex, DNS, SMTP. Know when to use each.
The Three Email Validation Modes
Email validation is a classic trade-off between speed and accuracy.
Mode 1: Regex (Syntax Check)
Speed: <1ms per email
Accuracy: Syntax only (format is correct, domain may not exist)
Cost: Free
Use when: Need fast filtering of obviously malformed emails
Function
SELECT anofox_email_validate(email, 'regex') as is_valid
FROM users;
What It Checks
- Format:
local@domain.extension - Local part: Alphanumeric, dots, hyphens, underscores
- Domain: Valid DNS labels
- Extension: Valid TLD (.com, .org, .co.uk, etc.)
Examples
SELECT
email,
anofox_email_validate(email, 'regex') as is_valid
FROM (
SELECT 'john@example.com' as email
UNION ALL
SELECT 'jane.doe+tag@sub.domain.org'
UNION ALL
SELECT 'invalid.email@' -- Missing domain
UNION ALL
SELECT 'bad space@example.com' -- Space not allowed
UNION ALL
SELECT '@example.com' -- Missing local part
);
Output:
john@example.com | true
jane.doe+tag@sub.domain.org | true
invalid.email@ | false
bad space@example.com | false
@example.com | false
Use Cases
- ✅ Initial data load filtering
- ✅ Form validation (client-side and server-side)
- ✅ Email discovery (finding potential typos like
exampl.cominstead ofexample.com) - ✅ Bulk import cleanup
Mode 2: DNS (Domain Check)
Speed: ~100ms per email (network-dependent)
Accuracy: Domain + syntax (domain exists, has mail servers)
Cost: Network latency
Use when: Need moderate confidence; domain must exist
Function
SELECT anofox_email_validate(email, 'dns') as has_valid_mx
FROM users;
What It Checks
- Syntax (same as regex)
- Domain's DNS A/AAAA records (domain exists)
- Domain's MX records (mail servers registered for domain)
Examples
SELECT
email,
anofox_email_validate(email, 'regex') as syntax_valid,
anofox_email_validate(email, 'dns') as dns_valid
FROM (
SELECT 'john@example.com' as email -- Domain has MX records
UNION ALL
SELECT 'jane@thisdomain-does-not-exist-123456.com' -- Syntax OK, domain doesn't exist
UNION ALL
SELECT 'invalid.email@' -- Syntax fails, so DNS check skipped
);
Output:
john@example.com | true | true
jane@thisdomain-does-not-exist-123456.com | true | false
invalid.email@ | false | false
Use Cases
- ✅ Email list validation (B2B, B2C)
- ✅ CRM import (ensure customers have real email domains)
- ✅ Newsletter signup validation (prevent obviously fake domains)
- ✅ Fraud detection (reject emails from domains without mail infrastructure)
Important Notes
- Some domains use wildcard MX records and accept all emails
- Result may vary by DNS resolver (cached vs. uncached)
- Add retry logic for transient DNS failures
Mode 3: SMTP (Mailbox Check)
Speed: ~500ms per email (very slow)
Accuracy: Complete (domain + mailbox exists)
Cost: Very high latency, some servers block SMTP checks
Use when: Critical emails; can afford 500ms per email
Function
SELECT anofox_email_validate(email, 'smtp') as mailbox_exists
FROM users;
What It Checks
- Syntax (same as regex)
- Domain DNS (same as DNS mode)
- SMTP handshake: Connects to mail server, attempts HELO, attempts RCPT TO
- Returns true only if mail server accepts the email address
Examples
SELECT
email,
anofox_email_validate(email, 'smtp') as smtp_valid
FROM (
SELECT 'john@gmail.com' as email -- Gmail likely accepts
UNION ALL
SELECT 'noreply-12345@instagram.com' -- Likely rejects SMTP checks
UNION ALL
SELECT 'fake.email@real-domain.com' -- Domain exists, but mailbox doesn't
);
Output depends on mail server rules:
john@gmail.com | true (probably)
noreply-12345@instagram.com | false (likely blocked)
fake.email@real-domain.com | false (mailbox doesn't exist)
Use Cases
- ✅ Critical customer contact information (high-value B2B)
- ✅ Double opt-in verification (confirmation link sent to address)
- ✅ Payment/billing email (ensure invoice delivery)
- ⚠️ Limited use (many servers block SMTP checks)
Important Notes
- Many mail servers block SMTP checks (see SMTP blacklists)
- Timeout is ~500ms per email (can timeout)
- RFC 5321 compliance: SMTP check doesn't guarantee mailbox exists (servers can accept then bounce)
- Use with caution on third-party domains (may violate their policies)
Mode Comparison
| Mode | Speed | Syntax | Domain | Mailbox | False Negatives | False Positives |
|---|---|---|---|---|---|---|
| Regex | <1ms | ✅ | ❌ | ❌ | High (typos) | Very Low |
| DNS | ~100ms | ✅ | ✅ | ❌ | Medium | Low |
| SMTP | ~500ms | ✅ | ✅ | ✅ | Very Low | Very Low |
Progressive Validation Pipeline
Use all three modes strategically:
-- Step 1: Syntax pass/fail
CREATE TABLE step1_syntax AS
SELECT
customer_id,
email,
CASE
WHEN anofox_email_validate(email, 'regex') THEN 'PASS'
ELSE 'FAIL'
END as step1_result
FROM raw_customers;
-- Step 2: DNS check (only on passing emails)
CREATE TABLE step2_dns AS
SELECT
customer_id,
email,
step1_result,
CASE
WHEN step1_result = 'FAIL' THEN 'FAIL'
WHEN anofox_email_validate(email, 'dns') THEN 'PASS'
ELSE 'FAIL'
END as step2_result
FROM step1_syntax;
-- Step 3: SMTP check (only on passing emails, sample if > 1000)
CREATE TABLE step3_smtp AS
SELECT
customer_id,
email,
step2_result,
CASE
WHEN step2_result = 'FAIL' THEN 'FAIL'
WHEN MOD(customer_id, 10) = 0 THEN anofox_email_validate(email, 'smtp')::VARCHAR -- Sample 10%
ELSE NULL -- Skip SMTP for others (too slow)
END as step3_result
FROM step2_dns;
-- Final: Classify emails
SELECT
customer_id,
email,
CASE
WHEN step2_result = 'FAIL' THEN 'INVALID'
WHEN step3_result IS NOT NULL AND step3_result = 'false' THEN 'MAILBOX_ISSUE'
WHEN step3_result IS NULL THEN 'DNS_VERIFIED' -- Passed DNS, not SMTP checked
ELSE 'SMTP_VERIFIED'
END as verification_status
FROM step3_smtp;
Performance Tuning
1. Batch Processing
-- GOOD: Validate all at once
SELECT email, anofox_email_validate(email, 'dns') FROM users;
-- SLOW: Validate in a loop (avoid!)
FOR email IN (SELECT email FROM users):
validate_email_dns(email)
2. Sample Large Lists
-- For > 100k emails, sample 10% for SMTP
SELECT
customer_id,
email,
anofox_email_validate(email, 'smtp') as smtp_valid
FROM customers
WHERE RANDOM() < 0.1; -- Sample 10%
3. Cache Results
-- Don't re-validate the same email repeatedly
CREATE TABLE email_validation_cache AS
SELECT DISTINCT
email,
anofox_email_validate(email, 'dns') as is_valid
FROM raw_import;
-- Reuse for future imports
SELECT customers.*, cache.is_valid
FROM customers
LEFT JOIN email_validation_cache cache USING (email);
Common Patterns
Pattern 1: Validate on Import
-- Import new customers with validation
CREATE TABLE customers_imported AS
SELECT
customer_id,
email,
anofox_email_validate(email, 'dns') as email_valid
FROM external_import
WHERE anofox_email_validate(email, 'dns') IS TRUE;
-- Log failures
CREATE TABLE import_failures AS
SELECT
customer_id,
email,
'EMAIL_INVALID' as failure_reason
FROM external_import
WHERE anofox_email_validate(email, 'dns') IS NOT TRUE;
Pattern 2: Newsletter Signup Verification
-- User clicks signup link with email
UPDATE newsletter_signups
SET
email_verified = anofox_email_validate(email, 'dns'),
verified_at = CASE WHEN anofox_email_validate(email, 'dns') THEN NOW() ELSE NULL END
WHERE signup_id = ?;
Pattern 3: Re-validation on Update
-- When customer updates email, re-validate
UPDATE customers
SET
email = new_email,
email_verified = anofox_email_validate(new_email, 'dns'),
email_verified_at = CASE WHEN anofox_email_validate(new_email, 'dns') THEN NOW() ELSE NULL END
WHERE customer_id = ?;
Next Steps
- Validation Strategies — Understand regex vs. DNS vs. SMTP trade-offs
- Financial Compliance — Validate VAT and payment info
- Production Deployment — Scale email validation to production