Inference & Hypothesis Testing
Understand statistical significance, p-values, and confidence intervals.
Hypothesis Testing Framework
The Question
"Is this coefficient really different from zero, or just by chance?"
Hypothesis Structure
- H₀ (Null): βⱼ = 0 (no effect)
- H₁ (Alternative): βⱼ ≠ 0 (there is an effect)
Test Statistic
t = β̂ⱼ / SE(β̂ⱼ)
Large |t| → estimate is many standard errors from zero, which is unlikely if H₀ is true
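Worked example, using the coefficient from the marketing query below: t = 3.5 / 0.15 ≈ 23.3. With even a moderate sample size, the two-sided 5% critical value is roughly 2.0, so this t-statistic lies far inside the rejection region.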
P-Value Interpretation
p-value = Probability of observing this result (or more extreme) if H₀ is true
Common Thresholds
| p-value | Interpretation | Decision |
|---|---|---|
| < 0.001 | Very strong evidence | Reject H₀ ✅ |
| < 0.01 | Strong evidence | Reject H₀ ✅ |
| < 0.05 | Significant | Reject H₀ ✅ |
| < 0.10 | Marginally significant | Weak evidence; fail to reject at α = 0.05 |
| > 0.10 | Not significant | Fail to reject H₀ |
Key Point: p < 0.05 does NOT mean a 95% probability that the effect is real. The p-value is the probability of the data under H₀, not the probability that H₀ is false.
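For intuition, here is a minimal Python sketch of how a two-sided p-value is derived from a t-statistic. The degrees of freedom (df = 98, i.e. 100 observations minus 2 estimated parameters) are an assumption for illustration:

```python
from scipy import stats

t_stat = 23.3   # t-statistic from the marketing example below
df = 98         # assumed: n = 100 observations minus 2 estimated parameters

# Two-sided p-value: probability of a |t| at least this large under H0
p_val = 2 * stats.t.sf(abs(t_stat), df)
print(p_val)    # effectively zero -- very strong evidence against H0
```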
Example: Marketing Coefficient Test
SELECT
    coefficient[2] as spend_effect,             -- [2]: marketing_spend ([1] is typically the intercept)
    std_error[2] as se,
    t_statistic[2] as t_stat,
    p_value[2] as p_val,
    confidence_interval_lower[2] as ci_lower,
    confidence_interval_upper[2] as ci_upper
FROM anofox_statistics_ols_inference(
    'marketing_data',
    'revenue',
    ARRAY['marketing_spend']
);
Output:
spend_effect | se | t_stat | p_val | ci_lower | ci_upper
-------------|------|--------|----------|----------|----------
3.5 | 0.15 | 23.3 | 0.000001 | 3.2 | 3.8
Interpretation:
- Each $1 spend → $3.50 revenue increase
- This effect is highly significant (p < 0.001)
- The 95% confidence interval for the true effect is $3.20 to $3.80
Confidence Intervals
CI = β̂ ± t_critical × SE(β̂)
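A minimal sketch of this formula in Python, using the estimates from the marketing example above; the degrees of freedom (df = 98) are an assumption:

```python
from scipy import stats

beta_hat, se, df = 3.5, 0.15, 98   # estimate and SE from the example; df assumed

t_crit = stats.t.ppf(0.975, df)    # two-sided 95% critical value (~1.98)
ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)
print(ci)                          # approximately (3.2, 3.8), matching the output above
```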
Interpretation
"If we repeated this study 100 times, the true parameter would fall in the confidence interval 95 times"
Confidence Levels
- 90% CI: Narrower, less confident
- 95% CI: Standard choice
- 99% CI: Wider, more confident
Prediction Intervals
Wider than confidence intervals, because they cover individual outcomes rather than just the mean
SELECT
    point_estimate,
    lower_80,  -- 80% prediction interval
    upper_80,
    lower_95,  -- 95% prediction interval
    upper_95
FROM anofox_statistics_ols_predict_interval(
    'marketing_data',
    'revenue',
    ARRAY['marketing_spend'],
    new_data_point,
    0.95
);
Confidence vs. Prediction
Confidence Interval: ±$50K (mean prediction uncertainty)
Prediction Interval: ±$150K (individual variation)
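The gap arises because the prediction-interval variance adds the residual variance σ² to the variance of the fitted mean. A sketch of the width comparison, with assumed numbers:

```python
import numpy as np

se_mean = 0.5   # assumed: standard error of the fitted mean at a given x
sigma = 1.5     # assumed: residual standard deviation of individual outcomes

# CI half-width scales with se_mean; PI half-width with sqrt(se_mean^2 + sigma^2)
se_pred = np.sqrt(se_mean**2 + sigma**2)
print(se_pred / se_mean)   # PI is ~3x wider here -- individual variation dominates
```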
Multiple Testing Problem
Issue: More tests = higher chance of false positives
Example: Test 20 coefficients at α=0.05
- Expected false positives: 20 × 0.05 = 1
Solution: Bonferroni Correction
α_adjusted = α / number_of_tests = 0.05 / 20 = 0.0025
Use stricter threshold for each test
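A sketch of the correction using statsmodels; the p-values here are hypothetical:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([0.001, 0.02, 0.04, 0.3] + [0.5] * 16)  # hypothetical: 20 tests

# Bonferroni: compare each p-value against alpha / m (equivalently, multiply p by m)
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")
print(reject[:4])   # only p = 0.001 survives the stricter 0.0025 threshold
```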
Common Mistakes
❌ Mistake 1: Interpreting the p-value as the probability the effect is real
- Wrong: "95% probability the effect is real"
- Right: "If the effect is truly zero, there is a 5% chance of observing a result this strong or stronger"

❌ Mistake 2: Treating a non-significant result as proof of no effect
- Wrong: "p = 0.06 means there is no effect"
- Right: "Insufficient evidence; the effect could still be detected with a larger sample"

❌ Mistake 3: Ignoring practical significance
- Wrong: "p < 0.001, so the effect is large"
- Right: "The test is significant, but the effect size is tiny (β = 0.001)"
Effect Size
Always report effect size alongside p-values
-- Effect size in standard-deviation units (standardized coefficient)
SELECT
    coefficient[2] as effect_size,
    p_value[2],
    coefficient[2] * stddev(x) / stddev(y) as standardized_effect  -- β̂ · sd(x) / sd(y)
FROM ...;
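For reference, the standardized coefficient rescales β̂ by the standard deviations of the predictor and the outcome. A sketch with hypothetical data arrays:

```python
import numpy as np

# hypothetical data; in practice x and y come from your table
rng = np.random.default_rng(7)
x = rng.normal(100, 20, 500)
y = 3.5 * x + rng.normal(0, 50, 500)

beta_hat = 3.5                                               # unstandardized coefficient
std_beta = beta_hat * np.std(x, ddof=1) / np.std(y, ddof=1)  # β̂ · sd(x) / sd(y)
print(std_beta)   # effect of a 1-SD change in x, measured in SDs of y
```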
Next Steps
- Diagnostics — Check assumptions
- Basic Workflow — Hands-on example
- Prediction Intervals — Forecasting with uncertainty