Inference & Hypothesis Testing

Understand statistical significance, p-values, and confidence intervals.

Hypothesis Testing Framework

The Question

"Is this coefficient really different from zero, or is the nonzero estimate just due to chance?"

Hypothesis Structure

  • H₀ (Null): βⱼ = 0 (no effect)
  • H₁ (Alternative): βⱼ ≠ 0 (there is an effect)

Test Statistic

t = β̂ⱼ / SE(β̂ⱼ)

Large t-statistic → unlikely to occur by chance if H₀ is true
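The ratio above can be checked by hand. A minimal Python sketch (the coefficient, standard error, and degrees of freedom are assumed illustrative values; for large samples the t distribution is approximated by the standard normal):

```python
from statistics import NormalDist

beta_hat = 3.5   # estimated coefficient (assumed for illustration)
se = 0.15        # standard error of the estimate (assumed)

t_stat = beta_hat / se
# Two-sided p-value: probability of a |t| this large or larger under H0.
# With many degrees of freedom, t is well approximated by the standard normal.
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(round(t_stat, 2))  # 23.33
print(p_value < 0.001)   # True
```

A t-statistic above ~2 in absolute value is the usual informal signal of significance at the 5% level.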


P-Value Interpretation

p-value = Probability of observing this result (or more extreme) if H₀ is true

Common Thresholds

p-value | Interpretation         | Decision
--------|------------------------|------------------
< 0.001 | Very strong evidence   | Reject H₀ ✅
< 0.01  | Strong evidence        | Reject H₀ ✅
< 0.05  | Significant            | Reject H₀ ✅
< 0.10  | Marginally significant | Weak evidence
> 0.10  | Not significant        | Fail to reject H₀

Key Point: p < 0.05 does NOT mean 95% probability the effect is real


Example: Marketing Coefficient Test

SELECT
  coefficient[2] as spend_effect,
  std_error[2] as se,
  t_statistic[2] as t_stat,
  p_value[2] as p_val,
  confidence_interval_lower[2] as ci_lower,
  confidence_interval_upper[2] as ci_upper
FROM anofox_statistics_ols_inference(
  'marketing_data',
  'revenue',
  ARRAY['marketing_spend']
);

Output:

spend_effect | se   | t_stat | p_val    | ci_lower | ci_upper
-------------|------|--------|----------|----------|----------
3.5          | 0.15 | 23.3   | 0.000001 | 3.2      | 3.8

Interpretation:

  • Each additional $1 of spend is associated with a $3.50 revenue increase
  • The effect is highly significant (p < 0.001)
  • The 95% confidence interval for the true effect is $3.20 to $3.80
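The reported interval can be reproduced from the estimate and standard error alone. A quick check in Python, using the values from the output above and the normal approximation for the critical value:

```python
from statistics import NormalDist

beta_hat, se = 3.5, 0.15           # estimate and SE from the example output
z = NormalDist().inv_cdf(0.975)    # ≈ 1.96: two-sided 95% critical value

ci_lower = beta_hat - z * se
ci_upper = beta_hat + z * se
print(round(ci_lower, 1), round(ci_upper, 1))  # 3.2 3.8
```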

Confidence Intervals

CI = β̂ ± t_critical × SE(β̂)

Interpretation

"If we repeated this study 100 times and computed a 95% CI each time, about 95 of those intervals would contain the true parameter"

Confidence Levels

  • 90% CI: Narrower, less confident
  • 95% CI: Standard choice
  • 99% CI: Wider, more confident
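The trade-off above is visible in the critical values: higher confidence means a larger multiplier and a wider interval. A short illustration in Python (normal approximation; the standard error is the assumed value from the earlier example):

```python
from statistics import NormalDist

se = 0.15  # standard error from the earlier example (assumed)
for level in (0.90, 0.95, 0.99):
    # Two-sided critical value for the given confidence level
    z = NormalDist().inv_cdf((1 + level) / 2)
    print(f"{level:.0%} CI half-width: {z * se:.3f}")
# 90% CI half-width: 0.247
# 95% CI half-width: 0.294
# 99% CI half-width: 0.386
```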

Prediction Intervals

Prediction intervals are wider than confidence intervals because they account for individual variation around the mean, not just uncertainty in the mean itself.

SELECT
  point_estimate,
  lower_80,  -- 80% prediction interval
  upper_80,
  lower_95,  -- 95% prediction interval
  upper_95
FROM anofox_statistics_ols_predict_interval(
  'marketing_data',
  'revenue',
  ARRAY['marketing_spend'],
  new_data_point,
  0.95
);

Confidence vs. Prediction

Confidence Interval:  ±$50K   (mean prediction uncertainty)
Prediction Interval:  ±$150K  (individual variation)
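The extra width comes from adding the residual variance to the mean-prediction variance before taking the square root. A minimal sketch in Python (`se_mean` and `sigma` are assumed illustrative values in $K, not outputs of the function above):

```python
import math
from statistics import NormalDist

se_mean = 25.0  # standard error of the mean prediction, $K (assumed)
sigma = 70.0    # residual standard deviation (individual noise), $K (assumed)
z = NormalDist().inv_cdf(0.975)  # 95% two-sided critical value

ci_half = z * se_mean                           # mean uncertainty only
pi_half = z * math.sqrt(se_mean**2 + sigma**2)  # plus individual variation
print(round(ci_half), round(pi_half))  # 49 146
```

Because the residual variance usually dominates, the prediction interval stays wide even as more data shrinks the confidence interval.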

Multiple Testing Problem

Issue: More tests = higher chance of false positives

Example: Test 20 coefficients at α=0.05

  • Expected false positives: 20 × 0.05 = 1

Solution: Bonferroni Correction

α_adjusted = 0.05 / number_of_tests = 0.0025 (20 tests)

Use stricter threshold for each test
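The arithmetic behind both numbers, in a few lines of Python (the family-wise error rate calculation assumes the 20 tests are independent):

```python
n_tests = 20
alpha = 0.05

# Probability of at least one false positive across 20 independent tests
fwer = 1 - (1 - alpha) ** n_tests
# Bonferroni: divide the threshold by the number of tests
alpha_bonferroni = alpha / n_tests

print(round(fwer, 2))    # 0.64
print(alpha_bonferroni)  # 0.0025
```

So with 20 uncorrected tests there is roughly a 64% chance of at least one false positive, which is why each individual test needs the stricter 0.0025 threshold.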


Common Mistakes

❌ Mistake 1: Interpreting p-value as probability

Wrong: "There is a 95% probability the effect is real."
Right: "If the effect is truly zero, there is a 5% chance of observing a result this strong or stronger."

❌ Mistake 2: Non-significant ≠ No effect

Wrong: "p = 0.06 means there is no effect."
Right: "There is insufficient evidence; the effect could still exist and might be detected with a larger sample."

❌ Mistake 3: Ignoring practical significance

Wrong: "p < 0.001, so the effect is large."
Right: "The test is significant, but the effect size is tiny (β = 0.001)."


Effect Size

Always report effect size alongside p-values

-- Effect size: change in y per one-standard-deviation change in x
SELECT
  coefficient[2] as effect_size,
  p_value[2],
  (coefficient[2] * stddev(x2)) as standardized_effect
FROM ...;
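The standardized effect can also be computed directly from raw data. A minimal Python sketch with toy data (all values assumed for illustration): the OLS slope is rescaled by the standard deviations of x and y so it reads as "SDs of y per SD of x", a unit-free measure of effect size.

```python
from statistics import stdev

# Toy data (assumed): y rises roughly 3.4 units per unit of x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [4.1, 7.3, 11.2, 14.0, 17.6]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
# Simple OLS slope: cov(x, y) / var(x)
beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        / sum((xi - mx) ** 2 for xi in x))
# Fully standardized effect: SDs of y per SD of x (equals Pearson's r here)
standardized = beta * stdev(x) / stdev(y)
print(round(standardized, 3))  # ≈ 0.999
```

A raw coefficient of 0.001 can still be a large standardized effect if x has a huge spread, and vice versa, which is why both should be reported.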
