About the Statistical Power Calculator
The Statistical Power Calculator computes required sample size (or achievable power, minimum detectable effect, or the alpha you are implicitly using) for four common designs: t-tests (one-sample, paired, two independent samples), two-proportion z-tests (A/B / conversion tests), one-way ANOVA, and chi-squared tests. Effect-size helpers translate Cohen’s benchmarks (small / medium / large for d, h, f, w) into the parameters each test actually needs.
It is built for growth marketers planning A/B tests before they spend the traffic budget, researchers running grant-required a priori power analyses, clinical trialists computing N for proposed RCTs, product analysts justifying experiment runtime to leadership, and statistics students learning that effect size matters more than they thought (halving the effect quadruples N).
All calculations run locally in JavaScript. Test selection, alpha, power target, and effect-size inputs never leave your device. The page makes no network call after first load. Pre-experiment power analysis often encodes the strategic hypotheses driving a business; the calculator never sees them.
Results match G*Power and the R pwr package to three decimals on standard designs. For sequential analysis (early stopping with adjusted alpha), multilevel / hierarchical models, longitudinal / repeated-measures designs, or Bayesian alternatives, use specialized R packages (lme4, Stan / brms) instead. Don’t game the inputs by relaxing alpha or lowering the power target just to shrink N; the convention is α = 0.05 and power = 0.80, and any deviation should be justified in advance, not discovered after the data are in.
- Multiple comparisons: running several pairwise tests inflates the type-I error rate; Bonferroni and Holm corrections shrink the per-test alpha to compensate.
- Sequential analysis: interim looks under an O’Brien-Fleming alpha-spending function allow legitimate early stopping.
- Bayesian framing: frequentist power translates into evidence thresholds (Bayes factor > 3, 10, 30).
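The Bonferroni and Holm corrections can be sketched in a few lines of JavaScript (an illustrative helper, not the calculator's own code; `holmReject` is a hypothetical name):

```javascript
// Holm step-down correction: sort p-values ascending and compare the i-th
// smallest against alpha / (m - i). It rejects everything plain Bonferroni
// rejects, and sometimes more, while still controlling family-wise error.
function holmReject(pValues, alpha = 0.05) {
  const m = pValues.length;
  const order = pValues.map((p, i) => [p, i]).sort((a, b) => a[0] - b[0]);
  const reject = new Array(m).fill(false);
  for (let i = 0; i < m; i++) {
    const [p, idx] = order[i];
    if (p <= alpha / (m - i)) reject[idx] = true;
    else break; // stop at the first non-significant p-value
  }
  return reject;
}
// Plain Bonferroni simply tests every p-value against alpha / m.
```

For example, with p-values [0.01, 0.02, 0.04] Holm rejects all three, while Bonferroni would reject only the first (0.02 and 0.04 both exceed 0.05 / 3).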
How to Use the Power Calculator
Pick the test that matches your design, then choose what to solve for: sample size, power, or minimum detectable effect. Set alpha and the number of tails. For tests beyond the simple t-test, the relevant extra inputs appear automatically (the two proportions for the z-test, the number of groups for ANOVA, degrees of freedom for chi-squared). The hero card shows the required per-group sample size; the secondary cards show total N, the critical statistic, and the noncentrality parameter.
Alpha, Beta, Power — The Three Error Rates
Alpha (α) is the type-I error rate: probability of finding a significant effect when there isn’t one. Beta (β) is the type-II error rate: probability of missing a real effect. Power equals 1 − β: probability of detecting a real effect. Convention pegs α at 0.05 and target power at 0.80. Reducing one inflates the other unless N grows.
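The trade-off can be made concrete with a normal-approximation sketch (illustrative only, not the calculator's own code): given per-group n and effect size d, approximate the power of a two-sided, two-sample t-test at α = 0.05.

```javascript
// Standard normal CDF via the Abramowitz–Stegun erf approximation
// (absolute error below 1.5e-7, plenty for power calculations).
function phi(x) {
  const z = Math.abs(x) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * z);
  const erf = 1 - t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429)))) * Math.exp(-z * z);
  return x >= 0 ? (1 + erf) / 2 : (1 - erf) / 2;
}

// Approximate power of a two-sided, two-sample t-test at alpha = 0.05.
function twoSamplePower(nPerGroup, d) {
  const zCrit = 1.959964;                   // z_{0.975} for alpha = 0.05
  const ncp = d * Math.sqrt(nPerGroup / 2); // noncentrality parameter
  return phi(ncp - zCrit) + phi(-ncp - zCrit);
}
```

With d = 0.5 and 64 per group this returns roughly 0.80, matching the conventional pairing of a medium effect with the standard power target.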
Cohen's Effect Size Benchmarks Explained
- t-tests (d): 0.2 small, 0.5 medium, 0.8 large.
- Proportions (h): 0.2 small, 0.5 medium, 0.8 large.
- ANOVA (f): 0.10 small, 0.25 medium, 0.40 large.
- Chi-squared (w): 0.1 small, 0.3 medium, 0.5 large.
- Correlation (r): 0.1 small, 0.3 medium, 0.5 large.
The Effect Size Trap — Small Effects, Huge Samples
Required sample size grows roughly with the inverse square of effect size: detecting a Cohen’s d of 0.2 takes about 4× the sample of d = 0.4, and 16× the sample of d = 0.8. Industry A/B tests of click-through rates often chase small relative lifts (h on the order of 0.01–0.02 or below), which can require tens of thousands to hundreds of thousands of users per arm. Run the calculation before launching the experiment, not after.
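The inverse-square scaling can be checked directly with the normal approximation n ≈ 2(z₁₋α/₂ + z₁₋β)² / d² (a sketch; exact t-based answers, as returned by G*Power, run one or two higher):

```javascript
// Per-group n for a two-sided, two-sample t-test at alpha = 0.05,
// power = 0.80, using the normal approximation n = 2 * (z_a + z_b)^2 / d^2.
function nPerGroup(d) {
  const z = 1.959964 + 0.841621; // z_{0.975} + z_{0.80}
  return Math.ceil(2 * z * z / (d * d));
}

nPerGroup(0.8); // → 25
nPerGroup(0.4); // → 99   (about 4x)
nPerGroup(0.2); // → 393  (about 16x)
```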
One-Tailed vs. Two-Tailed Reasoning
Two-tailed tests detect effects in either direction. One-tailed tests detect effects only in a pre-specified direction and reduce the required sample size by roughly 20% at conventional settings (α = 0.05, power = 0.80), at the cost of zero ability to detect effects in the opposite direction and diminished credibility in many fields. Reviewers and journals routinely demand two-tailed tests for novel claims.
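The size of the saving follows from the (z₁₋α + z₁₋β)² factor in the sample-size formula, as this sketch shows:

```javascript
// Sample size is proportional to (z_alpha + z_beta)^2, so comparing the
// one- and two-tailed critical values gives the reduction directly.
const zBeta = 0.841621; // z_{0.80}, for power = 0.80
const twoTailed = (1.959964 + zBeta) ** 2; // z_{0.975}: alpha = 0.05, two-tailed
const oneTailed = (1.644854 + zBeta) ** 2; // z_{0.95}:  alpha = 0.05, one-tailed
const reduction = 1 - oneTailed / twoTailed; // ≈ 0.21, i.e. about a 21% saving
```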
Power Analysis for A/B Testing in Industry
The proportion test is the workhorse of conversion-rate experimentation. Treat baseline conversion as p1 and the lift you care about as p2. The tool computes Cohen’s h via the arcsine transform and returns per-arm sample size. Common pitfalls: underestimating real-world variance (week-of-month effects), confounding new-user vs. returning-user populations, and peeking at results — which inflates the effective alpha. Use the Pro sequential-test mode for legitimate early stopping.
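The arcsine transform the tool uses is only a few lines; this sketch (hypothetical `cohensH` / `nPerArm` helpers, normal approximation, α = 0.05 two-sided, power = 0.80) mirrors the computation described above:

```javascript
// Cohen's h: the difference of arcsine-transformed proportions.
function cohensH(p1, p2) {
  return 2 * Math.asin(Math.sqrt(p2)) - 2 * Math.asin(Math.sqrt(p1));
}

// Per-arm sample size for a two-sided two-proportion z-test
// at alpha = 0.05, power = 0.80: n = ((z_a + z_b) / h)^2.
function nPerArm(p1, p2) {
  const z = 1.959964 + 0.841621; // z_{0.975} + z_{0.80}
  const h = Math.abs(cohensH(p1, p2));
  return Math.ceil((z / h) ** 2);
}
```

For example, a 5% baseline against a 6% variant (a 20% relative lift) gives h ≈ 0.044 and a little over 4,000 users per arm.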
When Pre-Registered Power Analyses Improve Research
Registering a power analysis in advance (with effect size, alpha, target power) prevents the post-hoc reasoning that produces irreproducible findings. The Open Science Framework and Center for Open Science both encourage pre-registration; major medical journals require it for trials. The tool’s exported result, including effect size and N, is appropriate for a pre-registration document.
For descriptive statistics, see the Statistics Calculator, or browse all Math & Science tools.
Frequently Asked Questions
What is statistical power?
Power is the probability of correctly detecting a real effect: 1 minus the type-II error rate. The conventional target is 0.80, meaning an 80% chance of a statistically significant result when the true effect exists at the assumed size.
What is a medium effect size?
Cohen benchmarks: d = 0.2 small, 0.5 medium, 0.8 large for t-tests; h = 0.2 / 0.5 / 0.8 for proportions; f = 0.10 / 0.25 / 0.40 for ANOVA. Real research often produces small-to-medium effects.
Should I run a one-tailed or two-tailed test?
Two-tailed, unless you have a strong theoretical reason to predict the direction in advance. A one-tailed test cuts the required sample size by roughly 20% at conventional settings but is considered weaker evidence.
How does sample size scale with effect size?
Roughly with the inverse square of effect size: halving the effect quadruples the required sample. Tiny effects need enormous samples, which is why A/B tests chasing a 1% relative lift on a low baseline conversion rate can need millions of users.
Does this replace G*Power or R pwr package?
For standard tests, yes — it matches G*Power to three decimals. For advanced designs like sequential analysis, multilevel models, or longitudinal data, use G*Power or specialized R packages.