Inference & Hypothesis Testing
Understand statistical significance, p-values, and confidence intervals.
Hypothesis Testing Framework
The Question
"Is this coefficient really different from zero, or just by chance?"
Hypothesis Structure
- H₀ (Null): βⱼ = 0 (no effect)
- H₁ (Alternative): βⱼ ≠ 0 (there is an effect)
Test Statistic
t = β̂ⱼ / SE(β̂ⱼ)
Large |t| → estimate is many standard errors from zero, which is unlikely if H₀ is true
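Worked example, using the coefficient from the marketing query below: t = 3.5 / 0.15 ≈ 23.3. With even a moderate sample size, the two-sided 5% critical value is roughly 2.0, so this t-statistic lies far inside the rejection region.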
P-Value Interpretation
p-value = Probability of observing this result (or more extreme) if H₀ is true
Common Thresholds
| p-value | Interpretation | Decision |
|---|---|---|
| < 0.001 | Very strong evidence | Reject H₀ ✅ |
| < 0.01 | Strong evidence | Reject H₀ ✅ |
| < 0.05 | Significant | Reject H₀ ✅ |
| < 0.10 | Marginally significant | Weak evidence; fail to reject at α = 0.05 |
| > 0.10 | Not significant | Fail to reject H₀ |
Key Point: p < 0.05 does NOT mean a 95% probability that the effect is real. The p-value is the probability of the data under H₀, not the probability that H₀ is false.
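For intuition, here is a minimal Python sketch of how a two-sided p-value is derived from a t-statistic. The degrees of freedom (df = 98, i.e. 100 observations minus 2 estimated parameters) are an assumption for illustration:

```python
from scipy import stats

t_stat = 23.3   # t-statistic from the marketing example below
df = 98         # assumed: n = 100 observations minus 2 estimated parameters

# Two-sided p-value: probability of a |t| at least this large under H0
p_val = 2 * stats.t.sf(abs(t_stat), df)
print(p_val)    # effectively zero -- very strong evidence against H0
```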
Example: Marketing Coefficient Test
SELECT
    coefficient[2] as spend_effect,             -- [2]: marketing_spend ([1] is typically the intercept)
    std_error[2] as se,
    t_statistic[2] as t_stat,
    p_value[2] as p_val,
    confidence_interval_lower[2] as ci_lower,
    confidence_interval_upper[2] as ci_upper
FROM anofox_statistics_ols_inference(
    'marketing_data',
    'revenue',
    ARRAY['marketing_spend']
);
Output:
spend_effect | se | t_stat | p_val | ci_lower | ci_upper
-------------|------|--------|----------|----------|----------
3.5 | 0.15 | 23.3 | 0.000001 | 3.2 | 3.8
Interpretation:
- Each $1 spend → $3.50 revenue increase
- This effect is highly significant (p < 0.001)
- The 95% confidence interval for the true effect is $3.20 to $3.80
Confidence Intervals
CI = β̂ ± t_critical × SE(β̂)
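A minimal sketch of this formula in Python, using the estimates from the marketing example above; the degrees of freedom (df = 98) are an assumption:

```python
from scipy import stats

beta_hat, se, df = 3.5, 0.15, 98   # estimate and SE from the example; df assumed

t_crit = stats.t.ppf(0.975, df)    # two-sided 95% critical value (~1.98)
ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)
print(ci)                          # approximately (3.2, 3.8), matching the output above
```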
Interpretation
"If we repeated this study 100 times, the true parameter would fall in the confidence interval 95 times"
Confidence Levels
- 90% CI: Narrower, less confident
- 95% CI: Standard choice
- 99% CI: Wider, more confident
Prediction Intervals
Wider than confidence intervals, because they cover individual outcomes rather than just the mean
SELECT
    point_estimate,
    lower_80,  -- 80% prediction interval
    upper_80,
    lower_95,  -- 95% prediction interval
    upper_95
FROM anofox_statistics_ols_predict_interval(
    'marketing_data',
    'revenue',
    ARRAY['marketing_spend'],
    new_data_point,
    0.95
);
Confidence vs. Prediction
Confidence Interval: ±$50K (mean prediction uncertainty)
Prediction Interval: ±$150K (individual variation)
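The gap arises because the prediction-interval variance adds the residual variance σ² to the variance of the fitted mean. A sketch of the width comparison, with assumed numbers:

```python
import numpy as np

se_mean = 0.5   # assumed: standard error of the fitted mean at a given x
sigma = 1.5     # assumed: residual standard deviation of individual outcomes

# CI half-width scales with se_mean; PI half-width with sqrt(se_mean^2 + sigma^2)
se_pred = np.sqrt(se_mean**2 + sigma**2)
print(se_pred / se_mean)   # PI is ~3x wider here -- individual variation dominates
```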
Multiple Testing Problem
Issue: More tests = higher chance of false positives
Example: Test 20 coefficients at α=0.05
- Expected false positives: 20 × 0.05 = 1
Solution: Bonferroni Correction
α_adjusted = α / number_of_tests = 0.05 / 20 = 0.0025
Use stricter threshold for each test
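A sketch of the correction using statsmodels; the p-values here are hypothetical:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([0.001, 0.02, 0.04, 0.3] + [0.5] * 16)  # hypothetical: 20 tests

# Bonferroni: compare each p-value against alpha / m (equivalently, multiply p by m)
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")
print(reject[:4])   # only p = 0.001 survives the stricter 0.0025 threshold
```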
Common Mistakes
❌ Mistake 1: Interpreting the p-value as the probability the effect is real
- Wrong: "95% probability the effect is real"
- Right: "If the effect is truly zero, there is a 5% chance of observing a result this strong or stronger"

❌ Mistake 2: Treating a non-significant result as proof of no effect
- Wrong: "p = 0.06 means there is no effect"
- Right: "Insufficient evidence; the effect could still be detected with a larger sample"

❌ Mistake 3: Ignoring practical significance
- Wrong: "p < 0.001, so the effect is large"
- Right: "The test is significant, but the effect size is tiny (β = 0.001)"
Effect Size
Always report effect size alongside p-values
-- Effect size in standard-deviation units (standardized coefficient)
SELECT
    coefficient[2] as effect_size,
    p_value[2],
    coefficient[2] * stddev(x) / stddev(y) as standardized_effect  -- β̂ · sd(x) / sd(y)
FROM ...;
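For reference, the standardized coefficient rescales β̂ by the standard deviations of the predictor and the outcome. A sketch with hypothetical data arrays:

```python
import numpy as np

# hypothetical data; in practice x and y come from your table
rng = np.random.default_rng(7)
x = rng.normal(100, 20, 500)
y = 3.5 * x + rng.normal(0, 50, 500)

beta_hat = 3.5                                               # unstandardized coefficient
std_beta = beta_hat * np.std(x, ddof=1) / np.std(y, ddof=1)  # β̂ · sd(x) / sd(y)
print(std_beta)   # effect of a 1-SD change in x, measured in SDs of y
```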
Next Steps
- Diagnostics — Check assumptions
- Basic Workflow — Hands-on example
- Prediction Intervals — Forecasting with uncertainty