P-Value and Confidence Interval Over Time

Delivering Growth

How it works

What is this?

This tool helps you visualize how both the p-value and the confidence interval of your experiment evolve over time as you accumulate more data.

Why use it?

P-values can fluctuate wildly in the early days of an experiment, when sample sizes are still small. Confidence intervals give you an additional lens: they help you understand not just whether there is a statistically significant difference, but how large that difference might plausibly be.

Together, these visualizations help you interpret:

  • Whether your test is stabilizing
  • How confident you can be in the observed lift
  • Whether your uplift is meaningful or likely noise

This is especially useful in borderline, underpowered, or long-running experiments, when you need to decide whether to stop, extend, or rerun.

How it works

Each day, you enter the number of conversions and visitors for both the control and variant groups. The calculator:

  • Aggregates data cumulatively by day
  • Calculates the p-value using a two-proportion z-test
  • Calculates the confidence interval for the difference in conversion rates using the Wald method
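
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function name `cumulative_stats`, the row keys, and the sample counts are hypothetical, the p-value is assumed to be two-sided, and each group is assumed to have at least one visitor on the first day.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cumulative_stats(control, variant, confidence=0.95):
    """Return one result row per day from daily (conversions, visitors) pairs.

    `control` and `variant` are equal-length lists of per-day counts,
    not running totals; the function accumulates them itself.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    rows = []
    xc = nc = xv = nv = 0
    for day, ((dxc, dnc), (dxv, dnv)) in enumerate(zip(control, variant), start=1):
        # Step 1: aggregate cumulatively by day.
        xc, nc, xv, nv = xc + dxc, nc + dnc, xv + dxv, nv + dnv
        pc, pv = xc / nc, xv / nv
        diff = pv - pc  # variant - control

        # Step 2: two-proportion z-test with a pooled standard error.
        pooled = (xc + xv) / (nc + nv)
        se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / nc + 1 / nv))
        if se_pooled > 0:
            p_value = 2 * (1 - STD_NORMAL.cdf(abs(diff / se_pooled)))
        else:
            p_value = 1.0  # no variation yet, so no evidence of a difference

        # Step 3: Wald interval with an unpooled standard error.
        se_wald = math.sqrt(pc * (1 - pc) / nc + pv * (1 - pv) / nv)
        rows.append({
            "day": day,
            "diff": diff,
            "p_value": p_value,
            "ci_low": diff - z_crit * se_wald,
            "ci_high": diff + z_crit * se_wald,
        })
    return rows


# Hypothetical daily counts: (conversions, visitors)
control = [(50, 500), (50, 500)]
variant = [(60, 500), (60, 500)]
for row in cumulative_stats(control, variant):
    print(row)
```

Because the test uses a pooled standard error and the interval uses an unpooled one, the two can disagree slightly in borderline cases, for example a p-value just under the threshold while the interval still touches zero.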

It generates two charts:

  1. P-Value Over Time
    • A line showing the p-value on each day
    • A red dashed horizontal line for your significance threshold (e.g., 0.05 for a 95% confidence level)
  2. Confidence Interval Over Time
    • A line showing the observed difference in conversion rates (variant - control)
    • A shaded band showing the confidence interval for that difference
    • A gray dashed horizontal line at 0 (no difference)
    • Purple dashed lines showing your Minimum Detectable Effect (MDE) reference at +MDE and -MDE, to help you see whether your observed effect is meaningful relative to your design target

About the MDE Reference

The MDE (Minimum Detectable Effect) Reference is a visual aid that helps you interpret your results relative to the effect size you originally designed your experiment to detect.

When you enter an MDE value (in absolute percentage points, e.g., 0.1 for a 0.1 percentage point difference), purple dashed horizontal lines appear on the Confidence Interval chart at +MDE and -MDE. This helps you see:

  • Whether your observed effect is larger, smaller, or similar to what you planned to detect
  • Whether your confidence interval extends beyond the MDE lines, suggesting the true effect could be meaningfully different from your target
  • Whether your experiment was appropriately powered for the effect you're actually seeing

Important: The MDE reference lines are purely for visualization and do not affect the statistical calculations (p-values or confidence intervals). They help you contextualize your results but are not used in any statistical tests.
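
If you want to make the same visual comparison in code, a small helper can describe where an interval sits relative to the ±MDE lines. This is only an illustrative reading aid under assumed names (`compare_ci_to_mde` and its keys are hypothetical); the calculator itself just draws the lines and performs no such check.

```python
def compare_ci_to_mde(ci_low, ci_high, mde):
    """Describe a confidence interval relative to +MDE and -MDE.

    All arguments share one unit (e.g. percentage points); `mde` is positive.
    """
    return {
        # Interval lies wholly on one side of zero.
        "excludes_zero": ci_low > 0 or ci_high < 0,
        # Every plausible effect is larger than the design target.
        "entirely_above_plus_mde": ci_low > mde,
        # Every plausible effect is a loss larger than the design target.
        "entirely_below_minus_mde": ci_high < -mde,
        # Every plausible effect is smaller in size than the design target.
        "entirely_within_mde_band": -mde < ci_low and ci_high < mde,
        # Interval straddles at least one of the two reference lines.
        "crosses_mde_line": ci_low <= mde <= ci_high or ci_low <= -mde <= ci_high,
    }


# Hypothetical interval of +0.2 to +1.5 points against an MDE of 0.8 points
print(compare_ci_to_mde(0.2, 1.5, 0.8))
```

An interval that excludes zero but still crosses the +MDE line, as in the hypothetical call above, says the effect is probably real while leaving open whether it reaches the size you planned for.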

What to look for

  • If the p-value drops below the threshold, your result is statistically significant at that time point.
  • If the confidence interval excludes zero, the difference is statistically significant, and the interval tells you how large the effect might realistically be.

What this doesn't mean

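Crossing the p-value threshold, or seeing the confidence interval exclude zero, on a single day doesn't mean the result is conclusive. Each daily check is another chance for a false positive, so the more often you look, the more likely a chance crossing becomes. Always pair these insights with: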

  • Good experiment design
  • No peeking / minimal monitoring bias
  • Reasonable power and sample size
  • External context and business judgment
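
To see why peeking matters, you can simulate A/A experiments, where both groups share the same true conversion rate, and count how often the p-value dips below the threshold on any day versus on the final day only. The sketch below is a rough illustration with hypothetical names and parameters (`simulate_aa`, 10 days, 200 visitors per group per day, a 10% base rate); it assumes a two-sided pooled z-test.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_value(xc, nc, xv, nv):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (xc + xv) / (nc + nv)
    se = math.sqrt(pooled * (1 - pooled) * (1 / nc + 1 / nv))
    if se == 0:
        return 1.0
    z = (xv / nv - xc / nc) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def simulate_aa(n_experiments=300, days=10, visitors_per_day=200,
                rate=0.10, alpha=0.05, seed=1):
    """Return (share significant on any day, share significant on last day)."""
    rng = random.Random(seed)
    ever_sig = final_sig = 0
    for _ in range(n_experiments):
        xc = xv = n = 0
        crossed = False
        p = 1.0
        for _ in range(days):
            # Both groups draw from the same true rate, so any
            # "significant" result here is a false positive.
            n += visitors_per_day
            xc += sum(rng.random() < rate for _ in range(visitors_per_day))
            xv += sum(rng.random() < rate for _ in range(visitors_per_day))
            p = p_value(xc, n, xv, n)
            crossed = crossed or p < alpha
        ever_sig += crossed
        final_sig += p < alpha
    return ever_sig / n_experiments, final_sig / n_experiments


any_day, last_day = simulate_aa()
print(f"significant on any day: {any_day:.1%}, on the last day only: {last_day:.1%}")
```

The first share can never be lower than the second, and in runs like this it is typically well above the nominal threshold, which is the monitoring bias the checklist above warns about.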

Formulas used

P-Value (Two-Proportion Z-Test)

$$Z = \frac{p_1 - p_2}{\sqrt{p(1 - p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$

Where:

  • $p_1$, $p_2$ = observed conversion rates
  • $p$ = pooled conversion rate
  • $n_1$, $n_2$ = sample sizes
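
For completeness, the pooled rate and the step from Z to a p-value are usually written as below. The symbols $x_1$, $x_2$ (conversion counts in each group) and $\Phi$ (the standard normal CDF) are introduced here for illustration and do not appear in the original; the two-sided form is an assumption about how the calculator reports its p-value.

```latex
p = \frac{x_1 + x_2}{n_1 + n_2},
\qquad
\text{p-value} = 2 \left( 1 - \Phi\left( |Z| \right) \right)
```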

Confidence Interval (Wald Method)

$$CI = (\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \cdot SE$$

Where:

  • $\hat{p}_1 - \hat{p}_2$ = observed lift
  • $Z_{\alpha/2}$ = z-score for desired confidence
  • $SE$ = standard error of the difference in proportions
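
The page does not spell out SE. Under the Wald method it is usually the unpooled standard error shown below, which is assumed here rather than confirmed by the source. The worked line uses hypothetical numbers (12% of 1,000 visitors versus 10% of 1,000 visitors, 95% confidence, so $Z_{\alpha/2} \approx 1.96$).

```latex
SE = \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}

% Worked example with hypothetical numbers:
SE = \sqrt{\frac{0.12 \cdot 0.88}{1000} + \frac{0.10 \cdot 0.90}{1000}} \approx 0.0140,
\qquad
CI \approx 0.02 \pm 1.96 \cdot 0.0140 \approx (-0.0074,\ 0.0474)
```

In this example the interval includes zero, so a two-percentage-point observed lift would not yet count as statistically significant at the 95% level.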

Use this calculator to get both statistical significance and directional confidence as your experiment evolves.
