COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them



Power of a Statistical Procedure

"... power calculations ... in general are more delicate than questions relating to Type I error."
 B. Efron (2010), Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, Cambridge, p. 85
Overview

The power of a statistical procedure can be thought of as the probability that the procedure will detect a true difference of a specified type. As in talking about p-values and confidence levels, the reference category for "probability" is the sample.
So spelling this out in detail:

Power is the probability that a randomly chosen sample satisfying the model assumptions will detect a difference of the specified type when the procedure is applied, if the specified difference does indeed occur in the population being studied.

Note also that power is a conditional probability: the probability of detecting a difference, if indeed the difference does exist.
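
Because power is a probability over possible samples, it can be estimated by simulation. The following sketch (not from the original page; the population mean, standard deviation, sample size, and significance level are made-up illustrative values) repeatedly draws samples from a population in which the specified difference really holds and counts how often a one-sided, one-sample t-test detects it:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    mu_null, mu_true, sigma, n, alpha = 100, 105, 15, 30, 0.05   # illustrative assumptions

    n_sims = 10_000
    detections = 0
    for _ in range(n_sims):
        sample = rng.normal(mu_true, sigma, size=n)     # a random sample satisfying the model assumptions
        t_stat, p_two_sided = stats.ttest_1samp(sample, popmean=mu_null)
        p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
        detections += p_one_sided < alpha               # did this sample detect the difference?

    print(f"Estimated power: {detections / n_sims:.3f}")  # fraction of samples that detect it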

In many real-life situations, there are reasonable conditions that we would be interested in being able to detect, and others that would not make a practical difference.

Examples:
In cases such as these, neglecting power could result in one or more of the following:

Elaboration

For a confidence interval procedure, power can be defined as the probability (Note 1) that the procedure will produce an interval with a half-width of at most a specified amount (Note 2).
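
As a rough illustration of this definition, the sketch below simulates the probability that the half-width of a t-based confidence interval for a mean (centered at the sample mean, as in Note 2) comes out no larger than a specified amount; the standard deviation, sample size, confidence level, and target half-width are made-up values:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    sigma, n, conf_level, w = 15, 40, 0.95, 5.0               # illustrative assumptions

    t_crit = stats.t.ppf(1 - (1 - conf_level) / 2, df=n - 1)  # two-sided t critical value
    n_sims = 10_000
    narrow_enough = 0
    for _ in range(n_sims):
        sample = rng.normal(0.0, sigma, size=n)
        half_width = t_crit * sample.std(ddof=1) / np.sqrt(n)
        narrow_enough += half_width <= w

    print(f"P(half-width <= {w}): {narrow_enough / n_sims:.3f}")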

For a hypothesis test, power can be defined as the probability (Note 1) of rejecting the null hypothesis under a specified condition.
Example: For a one-sample t-test for the mean of a population, with null hypothesis H0: µ = 100, you might be interested in the probability of rejecting H0 when µ ≥ 105, or when |µ - 100| > 5, etc.
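
For this example, the power against the specific alternative µ = 105 (the boundary of the first condition above) can be computed from the noncentral t distribution, as in the sketch below; the population standard deviation and sample size are illustrative assumptions, not values from the original example:

    import numpy as np
    from scipy import stats

    mu0, mu_alt, sigma, n, alpha = 100, 105, 15, 30, 0.05    # illustrative assumptions
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha, df)                      # one-sided rejection cut-off
    ncp = (mu_alt - mu0) / (sigma / np.sqrt(n))              # noncentrality parameter
    power = 1 - stats.nct.cdf(t_crit, df, ncp)               # P(reject H0 | mu = 105)
    print(f"Power of the one-sided test against mu = {mu_alt}: {power:.3f}")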

As with Type II error, we need to think of power in terms of power against a specific alternative rather than against a general alternative.

Example: If we are performing a hypothesis test for the mean of a population, with null hypothesis H0: µ = 0, and are interested in rejecting H0 when µ > 0, we might (depending on the situation -- i.e., on what difference is of practical significance) calculate the power of the test against the specific alternative H1: µ = 1, or against the specific alternative H3: µ = 3, etc. The picture below shows three sampling distributions:
[Figure: Sampling distributions for three specific hypotheses, showing the cut-off for rejection]

The red line marks the cut-off corresponding to a significance level α = 0.05.
This illustrates the general phenomenon that the farther an alternative is from the null hypothesis, the higher the power of the test to detect it (Note 3).
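
The same phenomenon can be seen numerically. The sketch below is an assumption-laden illustration, not the calculation behind the figure: it treats the test statistic as having a normal sampling distribution with standard deviation 1 and computes the power of a one-sided test of H0: µ = 0 against several specific alternatives:

    from scipy import stats

    alpha = 0.05
    cutoff = stats.norm.ppf(1 - alpha)       # the cut-off marked by the red line

    for mu_alt in (1, 2, 3):
        # power = area under that alternative's sampling distribution to the right of the cut-off
        power = 1 - stats.norm.cdf(cutoff, loc=mu_alt, scale=1)
        print(f"specific alternative mu = {mu_alt}: power = {power:.3f}")

The printed powers increase as the alternative moves farther from the null value.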

Note: For most tests, it is possible to calculate the power against a specific alternative, at least to a reasonable approximation, if the relevant information (or a good approximation to it) is available. It is not usually possible to calculate the power against a general alternative, since a general alternative is made up of infinitely many possible specific alternatives.

Power and Type II Error

Recall that the Type II error rate β of a test against a specific alternate hypothesis is represented in the diagram above as the area under the sampling distribution curve for that alternate hypothesis and to the left of the cut-off line for the test. Thus

(Power of a test against a specific alternate hypothesis) + β = total area under the sampling distribution curve = 1,

so

Power = 1 - β
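
A quick numerical check of this relationship, in the same illustrative normal setup as above (all numbers are assumptions for illustration):

    from scipy import stats

    alpha, mu_alt = 0.05, 2
    cutoff = stats.norm.ppf(1 - alpha)

    beta = stats.norm.cdf(cutoff, loc=mu_alt)   # area to the left of the cut-off under the alternative
    power = 1 - beta                            # area to the right of the cut-off
    print(f"beta = {beta:.3f}, power = {power:.3f}, beta + power = {beta + power:.3f}")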

Factors that Influence Power

In addition to the specific alternative (or other degree of difference, e.g., the desired width of a confidence interval) that one wants to detect, power is influenced by sample size, variance, and experimental design.
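
The sketch below illustrates two of these factors for the one-sample setting used earlier (H0: µ = 100, specific alternative µ = 105, one-sided α = 0.05; all numbers are illustrative assumptions): power rises with sample size and falls as the population standard deviation grows.

    import numpy as np
    from scipy import stats

    def one_sample_t_power(mu0, mu_alt, sigma, n, alpha=0.05):
        """Power of a one-sided, one-sample t-test against a specific alternative."""
        df = n - 1
        t_crit = stats.t.ppf(1 - alpha, df)
        ncp = (mu_alt - mu0) / (sigma / np.sqrt(n))
        return 1 - stats.nct.cdf(t_crit, df, ncp)

    for n in (10, 30, 100):                      # larger samples -> higher power
        print(f"n = {n:3d}, sigma = 15: power = {one_sample_t_power(100, 105, 15, n):.3f}")
    for sigma in (10, 15, 25):                   # more variability -> lower power
        print(f"n =  30, sigma = {sigma}: power = {one_sample_t_power(100, 105, sigma, 30):.3f}")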


Detrimental Effects of Underpowered or Overpowered Studies

Common Mistakes Involving Power


Notes:
1.  Again, the reference category for the probability is the samples.

2. This assumes a confidence interval procedure that results in a confidence interval centered at the parameter estimate. Other characterizations may be needed for other types of confidence interval procedures.

3. The Rice Virtual Lab in Statistics' Robustness Simulation can be used to illustrate, in an interactive manner, the effect of the difference to be detected (and also of the standard deviation) on power for the two-sample t-test.

Last updated August 28, 2012