About the Statistical Power Calculator
The Statistical Power Calculator computes required sample size (or achievable power, minimum detectable effect, or the alpha you are implicitly using) for four common designs: t-tests (one-sample, paired, two independent samples), two-proportion z-tests (A/B / conversion tests), one-way ANOVA, and chi-squared tests. Effect-size helpers translate Cohen’s benchmarks (small / medium / large for d, h, f, w) into the parameters each test actually needs.
It is built for growth marketers planning A/B tests before they spend the traffic budget, researchers running grant-required a priori power analyses, clinical trialists computing N for proposed RCTs, product analysts justifying experiment runtime to leadership, and statistics students learning that effect size matters more than they thought (halving the effect quadruples N).
All calculations run locally in JavaScript. Test selection, alpha, power target, and effect-size inputs never leave your device. The page makes no network call after first load. Pre-experiment power analysis often encodes the strategic hypotheses driving a business; the calculator never sees them.
Results match G*Power and the R pwr package to three decimals on standard designs. For sequential analysis (early stopping with adjusted alpha), multilevel / hierarchical models, longitudinal / repeated-measures designs, or Bayesian alternatives, use specialized R packages (lme4, Stan / brms) instead. Don’t game the inputs by relaxing alpha or lowering the power target just to shrink N; the convention is α = 0.05 and power = 0.80, and any deviation should be justified in advance, not discovered after the data are in.
- Multiple comparisons: running several pairwise tests inflates the type-I error rate; Bonferroni and Holm corrections shrink the per-test alpha to compensate.
- Sequential analysis: interim looks under an O’Brien-Fleming alpha-spending function allow legitimate early stopping.
- Bayesian framing: frequentist power translates into evidence thresholds (Bayes factor > 3, 10, 30).
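The Bonferroni and Holm corrections can be sketched in a few lines of JavaScript (an illustrative helper, not the calculator's own code; `holmReject` is a hypothetical name):

```javascript
// Holm step-down correction: sort p-values ascending and compare the i-th
// smallest against alpha / (m - i). It rejects everything plain Bonferroni
// rejects, and sometimes more, while still controlling family-wise error.
function holmReject(pValues, alpha = 0.05) {
  const m = pValues.length;
  const order = pValues.map((p, i) => [p, i]).sort((a, b) => a[0] - b[0]);
  const reject = new Array(m).fill(false);
  for (let i = 0; i < m; i++) {
    const [p, idx] = order[i];
    if (p <= alpha / (m - i)) reject[idx] = true;
    else break; // stop at the first non-significant p-value
  }
  return reject;
}
// Plain Bonferroni simply tests every p-value against alpha / m.
```

For example, with p-values [0.01, 0.02, 0.04] Holm rejects all three, while Bonferroni would reject only the first (0.02 and 0.04 both exceed 0.05 / 3).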
How to Use the Power Calculator
Pick the test that matches your design, then choose what to solve for: sample size, power, or minimum detectable effect. Set alpha and the number of tails. For tests beyond the simple t-test, the relevant extra inputs appear automatically (the two proportions for the z-test, the number of groups for ANOVA, degrees of freedom for chi-squared). The hero card shows the required per-group sample size; the secondary cards show total N, the critical statistic, and the noncentrality parameter.
Alpha, Beta, Power — The Three Error Rates
Alpha (α) is the type-I error rate: probability of finding a significant effect when there isn’t one. Beta (β) is the type-II error rate: probability of missing a real effect. Power equals 1 − β: probability of detecting a real effect. Convention pegs α at 0.05 and target power at 0.80. Reducing one inflates the other unless N grows.
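The trade-off can be made concrete with a normal-approximation sketch (illustrative only, not the calculator's own code): given per-group n and effect size d, approximate the power of a two-sided, two-sample t-test at α = 0.05.

```javascript
// Standard normal CDF via the Abramowitz–Stegun erf approximation
// (absolute error below 1.5e-7, plenty for power calculations).
function phi(x) {
  const z = Math.abs(x) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * z);
  const erf = 1 - t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429)))) * Math.exp(-z * z);
  return x >= 0 ? (1 + erf) / 2 : (1 - erf) / 2;
}

// Approximate power of a two-sided, two-sample t-test at alpha = 0.05.
function twoSamplePower(nPerGroup, d) {
  const zCrit = 1.959964;                   // z_{0.975} for alpha = 0.05
  const ncp = d * Math.sqrt(nPerGroup / 2); // noncentrality parameter
  return phi(ncp - zCrit) + phi(-ncp - zCrit);
}
```

With d = 0.5 and 64 per group this returns roughly 0.80, matching the conventional pairing of a medium effect with the standard power target.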
Cohen's Effect Size Benchmarks Explained
- t-tests (d): 0.2 small, 0.5 medium, 0.8 large.
- Proportions (h): 0.2 small, 0.5 medium, 0.8 large.
- ANOVA (f): 0.10 small, 0.25 medium, 0.40 large.
- Chi-squared (w): 0.1 small, 0.3 medium, 0.5 large.
- Correlation (r): 0.1 small, 0.3 medium, 0.5 large.
The Effect Size Trap — Small Effects, Huge Samples
Required sample size grows roughly with the inverse square of effect size: detecting a Cohen’s d of 0.2 takes about 4× the sample of d = 0.4, and 16× the sample of d = 0.8. Industry A/B tests of click-through rates often chase small relative lifts (h on the order of 0.01–0.02 or below), which can require tens of thousands to hundreds of thousands of users per arm. Run the calculation before launching the experiment, not after.
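The inverse-square scaling can be checked directly with the normal approximation n ≈ 2(z₁₋α/₂ + z₁₋β)² / d² (a sketch; exact t-based answers, as returned by G*Power, run one or two higher):

```javascript
// Per-group n for a two-sided, two-sample t-test at alpha = 0.05,
// power = 0.80, using the normal approximation n = 2 * (z_a + z_b)^2 / d^2.
function nPerGroup(d) {
  const z = 1.959964 + 0.841621; // z_{0.975} + z_{0.80}
  return Math.ceil(2 * z * z / (d * d));
}

nPerGroup(0.8); // → 25
nPerGroup(0.4); // → 99   (about 4x)
nPerGroup(0.2); // → 393  (about 16x)
```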
One-Tailed vs. Two-Tailed Reasoning
Two-tailed tests detect effects in either direction. One-tailed tests detect effects only in a pre-specified direction and reduce the required sample size by roughly 20% at conventional settings (α = 0.05, power = 0.80), at the cost of zero ability to detect effects in the opposite direction and diminished credibility in many fields. Reviewers and journals routinely demand two-tailed tests for novel claims.
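The size of the saving follows from the (z₁₋α + z₁₋β)² factor in the sample-size formula, as this sketch shows:

```javascript
// Sample size is proportional to (z_alpha + z_beta)^2, so comparing the
// one- and two-tailed critical values gives the reduction directly.
const zBeta = 0.841621; // z_{0.80}, for power = 0.80
const twoTailed = (1.959964 + zBeta) ** 2; // z_{0.975}: alpha = 0.05, two-tailed
const oneTailed = (1.644854 + zBeta) ** 2; // z_{0.95}:  alpha = 0.05, one-tailed
const reduction = 1 - oneTailed / twoTailed; // ≈ 0.21, i.e. about a 21% saving
```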
Power Analysis for A/B Testing in Industry
The proportion test is the workhorse of conversion-rate experimentation. Treat baseline conversion as p1 and the lift you care about as p2. The tool computes Cohen’s h via the arcsine transform and returns per-arm sample size. Common pitfalls: underestimating real-world variance (week-of-month effects), confounding new-user vs. returning-user populations, and peeking at results — which inflates the effective alpha. Use the Pro sequential-test mode for legitimate early stopping.
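The arcsine transform the tool uses is only a few lines; this sketch (hypothetical `cohensH` / `nPerArm` helpers, normal approximation, α = 0.05 two-sided, power = 0.80) mirrors the computation described above:

```javascript
// Cohen's h: the difference of arcsine-transformed proportions.
function cohensH(p1, p2) {
  return 2 * Math.asin(Math.sqrt(p2)) - 2 * Math.asin(Math.sqrt(p1));
}

// Per-arm sample size for a two-sided two-proportion z-test
// at alpha = 0.05, power = 0.80: n = ((z_a + z_b) / h)^2.
function nPerArm(p1, p2) {
  const z = 1.959964 + 0.841621; // z_{0.975} + z_{0.80}
  const h = Math.abs(cohensH(p1, p2));
  return Math.ceil((z / h) ** 2);
}
```

For example, a 5% baseline against a 6% variant (a 20% relative lift) gives h ≈ 0.044 and a little over 4,000 users per arm.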
When Pre-Registered Power Analyses Improve Research
Registering a power analysis in advance (with effect size, alpha, target power) prevents the post-hoc reasoning that produces irreproducible findings. The Open Science Framework and Center for Open Science both encourage pre-registration; major medical journals require it for trials. The tool’s exported result, including effect size and N, is appropriate for a pre-registration document.
For descriptive statistics, see the Statistics Calculator, or browse all Math & Science tools.
Frequently Asked Questions
What is statistical power?
Power is the probability of correctly detecting a real effect: 1 minus the type-II error rate. The conventional target is 0.80, meaning an 80% chance of a statistically significant result when the true effect exists at the assumed size.
What is a medium effect size?
Cohen benchmarks: d = 0.2 small, 0.5 medium, 0.8 large for t-tests; h = 0.2 / 0.5 / 0.8 for proportions; f = 0.10 / 0.25 / 0.40 for ANOVA. Real research often produces small-to-medium effects.
Should I run a one-tailed or two-tailed test?
Two-tailed, unless you have a strong theoretical reason to predict the direction in advance. A one-tailed test cuts the required sample size by roughly 20% at conventional settings but is considered weaker evidence.
How does sample size scale with effect size?
Roughly with the inverse square of effect size: halving the effect quadruples the required sample. Tiny effects need enormous samples, which is why A/B tests chasing a 1% relative lift on a low baseline conversion rate can need millions of users.
Does this replace G*Power or R pwr package?
For standard tests, yes — it matches G*Power to three decimals. For advanced designs like sequential analysis, multilevel models, or longitudinal data, use G*Power or specialized R packages.