Skip to main content

Email Validation

Three modes: Regex, DNS, SMTP. Know when to use each.


The Three Email Validation Modes

Email validation is a classic trade-off between speed and accuracy.


Mode 1: Regex (Syntax Check)

Speed: <1ms per email
Accuracy: Syntax only (format is correct, domain may not exist)
Cost: Free
Use when: Need fast filtering of obviously malformed emails

Function

SELECT anofox_email_validate(email, 'regex') as is_valid
FROM users;

What It Checks

  • Format: local@domain.extension
  • Local part: Alphanumeric, dots, hyphens, underscores
  • Domain: Valid DNS labels
  • Extension: Valid TLD (.com, .org, .co.uk, etc.)

Examples

SELECT
email,
anofox_email_validate(email, 'regex') as is_valid
FROM (
SELECT 'john@example.com' as email
UNION ALL
SELECT 'jane.doe+tag@sub.domain.org'
UNION ALL
SELECT 'invalid.email@' -- Missing domain
UNION ALL
SELECT 'bad space@example.com' -- Space not allowed
UNION ALL
SELECT '@example.com' -- Missing local part
);

Output:

john@example.com              | true
jane.doe+tag@sub.domain.org | true
invalid.email@ | false
bad space@example.com | false
@example.com | false

Use Cases

  • ✅ Initial data load filtering
  • ✅ Form validation (client-side and server-side)
  • ✅ Email discovery (finding potential typos like exampl.com instead of example.com)
  • ✅ Bulk import cleanup

Mode 2: DNS (Domain Check)

Speed: ~100ms per email (network-dependent)
Accuracy: Domain + syntax (domain exists, has mail servers)
Cost: Network latency
Use when: Need moderate confidence; domain must exist

Function

SELECT anofox_email_validate(email, 'dns') as has_valid_mx
FROM users;

What It Checks

  1. Syntax (same as regex)
  2. Domain's DNS A/AAAA records (domain exists)
  3. Domain's MX records (mail servers registered for domain)

Examples

SELECT
email,
anofox_email_validate(email, 'regex') as syntax_valid,
anofox_email_validate(email, 'dns') as dns_valid
FROM (
SELECT 'john@example.com' as email -- Domain has MX records
UNION ALL
SELECT 'jane@thisdomain-does-not-exist-123456.com' -- Syntax OK, domain doesn't exist
UNION ALL
SELECT 'invalid.email@' -- Syntax fails, so DNS check skipped
);

Output:

john@example.com                          | true  | true
jane@thisdomain-does-not-exist-123456.com | true | false
invalid.email@ | false | false

Use Cases

  • ✅ Email list validation (B2B, B2C)
  • ✅ CRM import (ensure customers have real email domains)
  • ✅ Newsletter signup validation (prevent obviously fake domains)
  • ✅ Fraud detection (reject emails from domains without mail infrastructure)

Important Notes

  • Some domains use wildcard MX records and accept all emails
  • Result may vary by DNS resolver (cached vs. uncached)
  • Add retry logic for transient DNS failures

Mode 3: SMTP (Mailbox Check)

Speed: ~500ms per email (very slow)
Accuracy: Complete (domain + mailbox exists)
Cost: Very high latency, some servers block SMTP checks
Use when: Critical emails; can afford 500ms per email

Function

SELECT anofox_email_validate(email, 'smtp') as mailbox_exists
FROM users;

What It Checks

  1. Syntax (same as regex)
  2. Domain DNS (same as DNS mode)
  3. SMTP handshake: Connects to mail server, attempts HELO, attempts RCPT TO
  4. Returns true only if mail server accepts the email address

Examples

SELECT
email,
anofox_email_validate(email, 'smtp') as smtp_valid
FROM (
SELECT 'john@gmail.com' as email -- Gmail likely accepts
UNION ALL
SELECT 'noreply-12345@instagram.com' -- Likely rejects SMTP checks
UNION ALL
SELECT 'fake.email@real-domain.com' -- Domain exists, but mailbox doesn't
);

Output depends on mail server rules:

john@gmail.com                 | true (probably)
noreply-12345@instagram.com | false (likely blocked)
fake.email@real-domain.com | false (mailbox doesn't exist)

Use Cases

  • ✅ Critical customer contact information (high-value B2B)
  • ✅ Double opt-in verification (confirmation link sent to address)
  • ✅ Payment/billing email (ensure invoice delivery)
  • ⚠️ Limited use (many servers block SMTP checks)

Important Notes

  • Many mail servers block SMTP checks (see SMTP blacklists)
  • Timeout is ~500ms per email (can timeout)
  • RFC 5321 compliance: SMTP check doesn't guarantee mailbox exists (servers can accept then bounce)
  • Use with caution on third-party domains (may violate their policies)

Mode Comparison

ModeSpeedSyntaxDomainMailboxFalse NegativesFalse Positives
Regex<1msHigh (typos)Very Low
DNS~100msMediumLow
SMTP~500msVery LowVery Low

Progressive Validation Pipeline

Use all three modes strategically:

-- Step 1: Syntax pass/fail
CREATE TABLE step1_syntax AS
SELECT
customer_id,
email,
CASE
WHEN anofox_email_validate(email, 'regex') THEN 'PASS'
ELSE 'FAIL'
END as step1_result
FROM raw_customers;

-- Step 2: DNS check (only on passing emails)
CREATE TABLE step2_dns AS
SELECT
customer_id,
email,
step1_result,
CASE
WHEN step1_result = 'FAIL' THEN 'FAIL'
WHEN anofox_email_validate(email, 'dns') THEN 'PASS'
ELSE 'FAIL'
END as step2_result
FROM step1_syntax;

-- Step 3: SMTP check (only on passing emails, sample if &gt; 1000)
CREATE TABLE step3_smtp AS
SELECT
customer_id,
email,
step2_result,
CASE
WHEN step2_result = 'FAIL' THEN 'FAIL'
WHEN MOD(customer_id, 10) = 0 THEN anofox_email_validate(email, 'smtp')::VARCHAR -- Sample 10%
ELSE NULL -- Skip SMTP for others (too slow)
END as step3_result
FROM step2_dns;

-- Final: Classify emails
SELECT
customer_id,
email,
CASE
WHEN step2_result = 'FAIL' THEN 'INVALID'
WHEN step3_result IS NOT NULL AND step3_result = 'false' THEN 'MAILBOX_ISSUE'
WHEN step3_result IS NULL THEN 'DNS_VERIFIED' -- Passed DNS, not SMTP checked
ELSE 'SMTP_VERIFIED'
END as verification_status
FROM step3_smtp;

Performance Tuning

1. Batch Processing

-- GOOD: Validate all at once
SELECT email, anofox_email_validate(email, 'dns') FROM users;

-- SLOW: Validate in a loop (avoid!)
FOR email IN (SELECT email FROM users):
validate_email_dns(email)

2. Sample Large Lists

-- For &gt; 100k emails, sample 10% for SMTP
SELECT
customer_id,
email,
anofox_email_validate(email, 'smtp') as smtp_valid
FROM customers
WHERE RANDOM() &lt; 0.1; -- Sample 10%

3. Cache Results

-- Don't re-validate the same email repeatedly
CREATE TABLE email_validation_cache AS
SELECT DISTINCT
email,
anofox_email_validate(email, 'dns') as is_valid
FROM raw_import;

-- Reuse for future imports
SELECT customers.*, cache.is_valid
FROM customers
LEFT JOIN email_validation_cache cache USING (email);

Common Patterns

Pattern 1: Validate on Import

-- Import new customers with validation
CREATE TABLE customers_imported AS
SELECT
customer_id,
email,
anofox_email_validate(email, 'dns') as email_valid
FROM external_import
WHERE anofox_email_validate(email, 'dns') IS TRUE;

-- Log failures
CREATE TABLE import_failures AS
SELECT
customer_id,
email,
'EMAIL_INVALID' as failure_reason
FROM external_import
WHERE anofox_email_validate(email, 'dns') IS NOT TRUE;

Pattern 2: Newsletter Signup Verification

-- User clicks signup link with email
UPDATE newsletter_signups
SET
email_verified = anofox_email_validate(email, 'dns'),
verified_at = CASE WHEN anofox_email_validate(email, 'dns') THEN NOW() ELSE NULL END
WHERE signup_id = ?;

Pattern 3: Re-validation on Update

-- When customer updates email, re-validate
UPDATE customers
SET
email = new_email,
email_verified = anofox_email_validate(new_email, 'dns'),
email_verified_at = CASE WHEN anofox_email_validate(new_email, 'dns') THEN NOW() ELSE NULL END
WHERE customer_id = ?;

Next Steps

🍪 Cookie Settings