Detrimental Effects of Underpowered or Overpowered Studies

COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them



The most straightforward consequence of underpowered studies (i.e., those with low probability of detecting an effect of practical importance) is that effects of practical importance are not detected.
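To make "probability of detecting an effect" concrete, here is a minimal Python sketch of the power of a one-sided z-test of µ = 0 against µ > 0. It assumes normally distributed data with a known population standard deviation sigma. The function name power_one_sided_z and the example effect delta = 0.3 are hypothetical choices for illustration, not part of the original discussion.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = 0 against Ha: mu > 0.

    delta: true population mean (the effect we hope to detect)
    sigma: population standard deviation, assumed known
    n:     sample size
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # rejection cut-off on the z scale
    shift = delta * sqrt(n) / sigma         # true mean measured in standard errors
    return 1 - STD_NORMAL.cdf(z_crit - shift)


if __name__ == "__main__":
    # Hypothetical effect of practical importance: delta = 0.3 when sigma = 1.
    for n in (25, 100):
        print(f"n = {n}: power = {power_one_sided_z(0.3, 1.0, n):.2f}")
```

With these hypothetical numbers, a study with n = 25 detects the effect less than half the time (power about 0.44), while n = 100 gives power of about 0.91.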

But there is a second, more subtle consequence: underpowered studies result in a larger variance of the estimates of the parameter being estimated. For example, in estimating a population mean, the sample means of studies with low power have high variance; in other words, the sampling distribution of sample means is wide. This is illustrated in the following picture, which shows the sampling distributions of the sample mean for a variable with population mean zero when the sample size is n = 25 (red) and when n = 100 (blue). The vertical line toward the right of each sampling distribution shows the cut-off for a one-sided hypothesis test with null hypothesis µ = 0 and significance level alpha = .05. Notice that:

The sampling distribution for the smaller sample size (n = 25) is wider than the sampling distribution for the larger sample size (n = 100).
Thus, when the null hypothesis is rejected with the smaller sample size n = 25, the sample mean tends to be noticeably larger than when the null hypothesis is rejected with the larger sample size n = 100.

[Figure: Sampling distributions for n = 25 and n = 100]
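The widths and cut-offs shown in the picture can be computed directly. This sketch assumes normal data with a known population standard deviation sigma = 1 (the picture does not state the population standard deviation); the helper names sampling_sd and rejection_cutoff are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sampling_sd(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / sqrt(n)


def rejection_cutoff(sigma, n, alpha=0.05):
    """Smallest sample mean that rejects H0: mu = 0 in a one-sided z-test."""
    return NormalDist().inv_cdf(1 - alpha) * sampling_sd(sigma, n)


if __name__ == "__main__":
    # sigma = 1 is an assumption; the picture does not state the population sd.
    for n in (25, 100):
        print(f"n = {n}: sd of sample mean = {sampling_sd(1.0, n):.3f}, "
              f"cut-off = {rejection_cutoff(1.0, n):.3f}")
```

Quadrupling the sample size halves both numbers, so the smallest sample mean that is significant with n = 25 is twice the smallest one that is significant with n = 100.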

This reflects the general phenomenon that studies with low power have a larger chance of producing a large estimated effect size (e.g., a large sample mean) than studies with high power.¹
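A small simulation can illustrate this when a real but small effect exists. The sketch below is an illustration under assumptions, not a reproduction of the cited analysis: it assumes normal data with known sigma = 1, a hypothetical true mean delta = 0.2, and a one-sided z-test at alpha = .05, and it reports how often studies reach significance and how large the significant estimates are on average. The function name simulate_significant_means is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_significant_means(delta, sigma, n, alpha=0.05, reps=20_000, seed=1):
    """Simulate many studies; return (estimated power, mean of significant estimates).

    Each study estimates a population mean delta from n normal observations
    with known sd sigma and runs a one-sided z-test of H0: mu = 0.
    """
    rng = random.Random(seed)
    se = sigma / sqrt(n)
    cutoff = NormalDist().inv_cdf(1 - alpha) * se
    # For normal data the sample mean is exactly N(delta, se**2),
    # so each study's sample mean is drawn directly.
    estimates = [rng.gauss(delta, se) for _ in range(reps)]
    significant = [m for m in estimates if m > cutoff]
    return len(significant) / reps, fmean(significant)


if __name__ == "__main__":
    # Hypothetical small true effect: delta = 0.2 with sigma = 1.
    for n in (25, 100):
        power, avg = simulate_significant_means(0.2, 1.0, n)
        print(f"n = {n}: power ~ {power:.2f}, "
              f"average significant estimate ~ {avg:.2f} (true value 0.2)")
```

Under these assumptions the n = 25 studies reach significance only about a quarter of the time, and their significant estimates average roughly 0.45, more than twice the true value; with n = 100 power is roughly 0.64 and the significant estimates average roughly 0.26.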

In particular, when there is a Type I error (falsely rejecting the null hypothesis), the effect will appear to be stronger with a small sample size (lower power) than with a large sample size (higher power).² This may suggest an effect that is not there. Such a mistake may go undetected because of the File Drawer Problem: studies that fail to reject the null hypothesis tend not to be published, so nothing in the published record counterbalances the spurious result. Thus, when studies are underpowered, the literature is likely to be inconsistent and often misleading. Here is an example that appears to show this phenomenon in a research survey.
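For the Type I error case there is a closed form. If the true mean is 0 and the sample mean is normal with standard deviation σ/√n, then among studies that reject, the average sample mean is the mean of a truncated normal distribution, (σ/√n)·φ(z)/alpha, where z is the one-sided cut-off on the standard normal scale and φ is the standard normal density. The sketch below assumes normal data with known sigma and the one-sided test from the picture; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_given_type_one_error(sigma, n, alpha=0.05):
    """Average sample mean among studies that falsely reject H0: mu = 0.

    Assumes the true mean is 0, the data are normal with known sd sigma,
    and a one-sided z-test at level alpha; uses the mean of a truncated normal.
    """
    std = NormalDist()
    se = sigma / sqrt(n)
    z_crit = std.inv_cdf(1 - alpha)
    # Under H0, P(sample mean > cut-off) = alpha, which is the denominator here.
    return se * std.pdf(z_crit) / alpha


if __name__ == "__main__":
    for n in (25, 100):
        print(f"n = {n}: average falsely significant mean = "
              f"{mean_given_type_one_error(1.0, n):.3f}")
```

With sigma = 1 (an assumption), a false rejection with n = 25 shows an average sample mean of about 0.41, twice the roughly 0.21 seen with n = 100, even though the true mean is 0 in both cases.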

Overpowered studies waste resources. When human or animal³ subjects are involved, an overpowered study can be considered unethical. More generally, any overpowered study may be considered unethical insofar as it wastes resources.

A common compromise between overpower and underpower is to try for power around .80. However, power needs to be considered case by case, balancing the risks of Type I and Type II errors.
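For the simple one-sided z-test, the sample size needed to reach a target power has a standard closed form: n = ((z₁₋α + z_power)·σ/δ)², rounded up, where δ is the effect of practical importance. The sketch below assumes normal data with known sigma; the effect delta = 0.3 and the function names are hypothetical. Changing alpha or target_power is where the case-by-case balancing of Type I and Type II errors enters.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = 0 when the true mean is delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_crit - delta * sqrt(n) / sigma)


def sample_size_for_power(delta, sigma, target_power=0.80, alpha=0.05):
    """Smallest n whose power reaches target_power (normal data, known sigma)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)     # one-sided rejection cut-off
    z_power = STD_NORMAL.inv_cdf(target_power)  # quantile matching the target power
    return ceil(((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical effect of practical importance: delta = 0.3, sigma = 1.
    n = sample_size_for_power(0.3, 1.0)
    print(n, round(power_one_sided_z(0.3, 1.0, n), 3))
```

With these hypothetical inputs the sketch returns n = 69; one fewer subject would leave power just below .80, and many more would push power toward 1, which is where overpowering begins.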

Notes:
1. For more discussion, see Andrew Gelman and David Weakliem, "Of Beauty, Sex, and Power," American Scientist, 97(4), July-August 2009, www.stat.columbia.edu/~gelman/research/published/power4r.pdf

2. This sentence was misstated in the original version of this page, but the misstatement was corrected Sept. 23, 2013. Thanks to Stefan Wiens for pointing out the error.

3. For more on ethical considerations in study design for research on animals, see:

Michael Festing, "Statistics and animals in biomedical research," Significance, 7(4), December 2010.
Kilkenny et al., "Improving bioscience research reporting: The ARRIVE guidelines for reporting animal research," PLoS Biology, 8, 2010.

Last updated Sept 23, 2013