
A/B Test Significance Calculator

Did your A/B test actually win? Returns p-value, statistical significance, and minimum sample size for 95% confidence.


About the A/B Test Significance Calculator

The A/B Test Significance Calculator tells you whether the apparent winner in your conversion test is a real result or random noise. Enter the visitors and conversions for each variant; the tool returns the p-value, confidence level, the absolute and relative lift, and whether you have reached the conventional 95% confidence threshold required to declare a winner.
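Under the hood, calculators like this one typically run a two-proportion z-test. The sketch below is an illustration of that approach rather than the tool's actual source; the function name and the "confidence = 1 - p-value" display convention are assumptions for the example:

    import math

    def ab_test(visitors_a, conversions_a, visitors_b, conversions_b):
        # Observed conversion rates for each variant
        p_a = conversions_a / visitors_a
        p_b = conversions_b / visitors_b
        # Pooled rate under the null hypothesis that A and B perform the same
        p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
        z = (p_b - p_a) / se
        # Two-tailed p-value from the standard normal distribution
        p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
        return {
            "p_value": p_value,
            "confidence": 1 - p_value,          # display convention used by many calculators
            "absolute_lift": p_b - p_a,
            "relative_lift": (p_b - p_a) / p_a,
            "significant_at_95": p_value < 0.05,
        }

    # Example: 5,000 visitors per variant, 400 vs 460 conversions
    print(ab_test(5000, 400, 5000, 460))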

The calculator also shows the minimum sample size required to detect your observed effect with statistical reliability. Most A/B tests are called too early - the team sees variant B winning by 10% at 500 visitors and ships, only to find the lift evaporates at 5,000. The minimum sample size view prevents this trap by setting a clear stop-point before you start the test.
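The usual way to pre-compute that stop-point is the standard sample-size formula for comparing two proportions. The sketch below assumes 95% confidence and 80% power, which are common defaults; the calculator's exact parameters and rounding may differ:

    import math
    from statistics import NormalDist

    def min_sample_size(p_a, p_b, alpha=0.05, power=0.80):
        # Visitors needed per variant to detect a change from p_a to p_b
        # with a two-tailed test at the given significance level and power
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for 95% confidence
        z_beta = NormalDist().inv_cdf(power)            # 0.84 for 80% power
        variance = p_a * (1 - p_a) + p_b * (1 - p_b)
        return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2)

    # Example: detecting a lift from 8% to 9% needs roughly 12,200 visitors per variant
    print(min_sample_size(0.08, 0.09))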

Common use cases

  • Decide whether to ship a tested variant or keep iterating
  • Determine how long to run a test before evaluating
  • Audit past "winning" experiments to see if they were actually significant
  • Compare two-tailed vs one-tailed tests when you have directional priors

Tips for accurate results

Set your sample size before starting the test, then evaluate exactly once at the end. Peeking at the results daily and stopping when something looks significant inflates false-positive rates dramatically - what looks like a 95% confident win can have an actual 60% chance of being noise. The minimum sample size in the calculator is your stopping rule; respect it.
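If that warning sounds abstract, a quick simulation makes it concrete. The sketch below is purely illustrative and not part of the calculator: it runs many A/A tests in which both variants are identical, peeks every 250 visitors, and stops at the first look that crosses p < 0.05; the resulting false-positive rate lands far above the nominal 5%:

    import random
    from statistics import NormalDist

    def p_value(conv_a, n_a, conv_b, n_b):
        # Two-tailed p-value from a two-proportion z-test
        p_pool = (conv_a + conv_b) / (n_a + n_b)
        se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
        if se == 0:
            return 1.0
        z = abs(conv_b / n_b - conv_a / n_a) / se
        return 2 * (1 - NormalDist().cdf(z))

    def peeking_trial(rate=0.05, looks=20, visitors_per_look=250):
        # Both variants share the same true rate; stop at the first "significant" look
        conv_a = conv_b = n = 0
        for _ in range(looks):
            n += visitors_per_look
            conv_a += sum(random.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(random.random() < rate for _ in range(visitors_per_look))
            if p_value(conv_a, n, conv_b, n) < 0.05:
                return True   # declared a "winner" that cannot be real
        return False

    trials = 2000
    hits = sum(peeking_trial() for _ in range(trials))
    print(f"False-positive rate with peeking: {hits / trials:.0%}")   # well above the nominal 5%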

Privacy & data handling

The A/B Test Significance Calculator runs entirely in your browser. Nothing you enter is uploaded, logged, or shared with third parties - the math happens locally and your inputs disappear when you close the tab. There is no signup, no email collection, and no daily-use limit.

Frequently asked questions

What does statistical significance mean?
A result is statistically significant when the probability of seeing a difference at least this large by random chance alone (the p-value) falls below your chosen threshold - typically 5%. Reaching 95% confidence means that, if A and B truly performed the same, a gap this big would appear by chance less than 5% of the time. Without significance, you can't reliably tell the variants apart.

How many conversions do I need before I can call it?
A reasonable rule of thumb: at least 100 conversions per variant before significance is even meaningful, and at least 250 to make stable directional decisions. The tool shows you the minimum sample size required for your detected effect size and desired confidence level - usually larger than people expect.

Why does the result change when I add more visitors?
Because conversion rates fluctuate randomly in small samples. A test that looked significant at 500 visitors per variant may not be at 5,000 - the early "win" was noise. This is called peeking bias. Pre-compute the sample size you need, run the test that long, then evaluate once. Don't check daily and stop on the first green result.

One-tailed or two-tailed?
Two-tailed is the safe default - it tests whether B is different from A in either direction. One-tailed only tests whether B is better; it reaches significance sooner for effects in that direction (the p-value is halved), but it is only justified when you've genuinely committed to "we will not launch B if it's worse, period." Most teams should stick with two-tailed.
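The difference is mechanical: for an effect in the expected direction, the one-tailed p-value is half the two-tailed one. A tiny illustrative snippet, using a hypothetical observed z-score:

    from statistics import NormalDist

    # For an effect in the expected direction, the one-tailed p-value
    # is half the two-tailed one, so borderline results cross 0.05 sooner.
    z = 1.8   # hypothetical observed z-score
    p_two_tailed = 2 * (1 - NormalDist().cdf(z))   # ~0.072: not significant at 95%
    p_one_tailed = 1 - NormalDist().cdf(z)         # ~0.036: significant, if you pre-committed
    print(p_two_tailed, p_one_tailed)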

What if my variants aren't significantly different?
You can't conclude they're the same - only that you don't have evidence of a difference. The honest interpretation is: the effect, if any, is smaller than what your sample size could detect. Either run longer, accept the no-decision and pick the simpler variant, or move on to a higher-impact test.
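If you want to put a number on "smaller than what your sample size could detect", the standard minimum-detectable-effect approximation does it. The sketch below is illustrative, assuming 95% confidence and 80% power:

    from statistics import NormalDist

    def minimum_detectable_effect(p_baseline, n_per_variant, alpha=0.05, power=0.80):
        # Smallest absolute lift a test of this size can reliably detect
        # (approximation treating both variants as having the baseline's variance)
        z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
        return z * (2 * p_baseline * (1 - p_baseline) / n_per_variant) ** 0.5

    # Example: 5,000 visitors per variant at an 8% baseline can only resolve
    # lifts of about 1.5 percentage points or more
    print(minimum_detectable_effect(0.08, 5000))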
