
Detrimental Effects of Underpowered or Overpowered Studies

The most straightforward consequence of underpowered studies (i.e., those with low probability of detecting an effect of practical importance) is that effects of practical importance are not detected.

But there is a second, more subtle consequence: underpowered studies result in a larger variance of the estimates of the parameter being estimated. For example, in estimating a population mean, the sample means of studies with low power have high variance; in other words, the sampling distribution of sample means is wide. This is illustrated in the following picture, which shows the sampling distributions of the sample mean for a variable with zero mean when sample size n = 25 (red) and when n = 100 (blue). The vertical lines toward the right of each sampling distribution show the cut-off for a one-sided hypothesis test with null hypothesis µ = 0 and significance level alpha = .05. Notice that the sampling distribution for n = 25 is wider than the one for n = 100, so its cut-off lies farther from zero.
[Figure: Sampling distributions for n = 25 and n = 100]

This reflects the general phenomenon that studies with low power are more likely to produce a large estimated effect size (e.g., sample mean) than studies with high power.[1]
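The comparison between the two sampling distributions can be checked numerically. The following is a minimal sketch, assuming a normally distributed sample mean and a population standard deviation of 1 (the text does not give one); the threshold 0.3 and the function names are illustrative choices, not part of the original example.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 1.0   # assumed population standard deviation (not given in the text)
ALPHA = 0.05  # one-sided significance level, as in the text


def sampling_summary(n, sigma=SIGMA, alpha=ALPHA):
    """Return (standard error, one-sided rejection cut-off) for H0: mu = 0."""
    se = sigma / sqrt(n)
    cutoff = NormalDist(0.0, se).inv_cdf(1.0 - alpha)
    return se, cutoff


def prob_mean_exceeds(threshold, n, mu=0.0, sigma=SIGMA):
    """P(sample mean > threshold) when the true mean is mu."""
    return 1.0 - NormalDist(mu, sigma / sqrt(n)).cdf(threshold)


for n in (25, 100):
    se, cutoff = sampling_summary(n)
    print(f"n={n:3d}  standard error={se:.3f}  cut-off={cutoff:.3f}  "
          f"P(mean > 0.3)={prob_mean_exceeds(0.3, n):.4f}")
```

Under these assumptions the standard error and the cut-off for n = 25 are both twice those for n = 100, and a sample mean above 0.3 is far more likely with the smaller sample.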

In particular, when there is a Type I error (falsely rejecting the null hypothesis), the effect will appear to be stronger with a small sample size (lower power) than with a large sample size (higher power).[2]
This may suggest an effect that is not there. Such a mistake may go undetected because of the File Drawer Problem. Thus, when studies are underpowered, the literature is likely to be inconsistent and often misleading. Here is an example that appears to show this phenomenon in a research survey. 
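A small simulation can make the Type I error point concrete. This sketch assumes the true mean is zero, the sample mean is normally distributed, and the population standard deviation is 1; the number of trials, the seed, and the function name are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def mean_significant_estimate(n, sigma=1.0, alpha=0.05, trials=200_000, seed=0):
    """Simulate studies where H0: mu = 0 is true.

    Returns (average sample mean among false rejections, rejection rate).
    """
    rng = random.Random(seed)
    se = sigma / sqrt(n)
    cutoff = NormalDist(0.0, se).inv_cdf(1.0 - alpha)
    # Draw sample means directly from their sampling distribution N(0, se^2).
    draws = (rng.gauss(0.0, se) for _ in range(trials))
    significant = [m for m in draws if m > cutoff]
    return fmean(significant), len(significant) / trials


for n in (25, 100):
    avg, rate = mean_significant_estimate(n)
    print(f"n={n:3d}  false-rejection rate={rate:.3f}  "
          f"average 'significant' sample mean={avg:.3f}")
```

Both sample sizes falsely reject at about the same rate (close to alpha), but the falsely significant estimates from the smaller sample are roughly twice as far from zero, which is why they can look like a stronger effect.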

Overpowered studies waste resources. When human or animal[3] subjects are involved, an overpowered study can be considered unethical. More generally, an overpowered study may be considered unethical if it wastes resources.

A common compromise between overpowering and underpowering a study is to aim for power of about .80. However, power needs to be considered case by case, balancing the risks of Type I and Type II errors.
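As a rough illustration of what aiming for power .80 involves, the sketch below computes power and the required sample size for a one-sided z-test of µ = 0. It assumes a known population standard deviation of 1 and a hypothetical effect of practical importance of 0.5; neither value comes from the text, and a real study would usually need a test suited to its own design.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_one_sided_z(n, effect, sigma=1.0, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = 0 when the true mean is `effect` > 0."""
    z_alpha = Z.inv_cdf(1.0 - alpha)
    return 1.0 - Z.cdf(z_alpha - effect * sqrt(n) / sigma)


def sample_size_for_power(effect, power=0.80, sigma=1.0, alpha=0.05):
    """Smallest n for which the one-sided z-test reaches the target power."""
    z_alpha = Z.inv_cdf(1.0 - alpha)
    z_beta = Z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / effect) ** 2)


effect = 0.5  # hypothetical effect of practical importance, in units of sigma
n = sample_size_for_power(effect)
print(f"n = {n}, power = {power_one_sided_z(n, effect):.3f}")
```

Changing `alpha` or `power` in the call shows the trade-off between Type I and Type II error risks: a stricter significance level or a higher target power both increase the required sample size.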

1. For more discussion, see Andrew Gelman and David Weakliem, "Of Beauty, Sex, and Power," American Scientist, 97(4), July-August 2009.

2. This sentence was misstated in the original version of this page, but the misstatement was corrected Sept. 23, 2013. Thanks to Stefan Wiens for pointing out the error.

3. For more on ethical considerations in study design for research on animals, see:
Last updated Sept 23, 2013