"... power calculations ... in general are more delicate than questions relating to Type I error."

B. Efron (2010), Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, Cambridge, p. 85

Overview

The power of a statistical procedure can be thought of as the probability that the procedure will detect a true difference of a specified type. As with p-values and confidence levels, the reference category for "probability" is the sample. Spelling this out in detail:

Power is the probability that a randomly chosen sample satisfying the model assumptions will detect a difference of the specified type when the procedure is applied, if the specified difference does indeed occur in the population being studied.

Note also that power is a conditional probability: the probability of detecting a difference, if indeed the difference does exist.

In many real-life situations, there are reasonable conditions that we would be interested in being able to detect, and others that would not make a practical difference.

Examples:

- If you can only measure the response to within 0.1 units, it doesn't really make sense to worry about falsely rejecting a null hypothesis for a mean when the actual value of the mean is within 0.1 units of the value specified in the null hypothesis.
- Some differences are of no practical importance -- for example, a medical treatment that extends life by 10 minutes is probably not worth it.

Not taking such considerations into account when planning a study can result in:

- Doing much more work than necessary
- Obtaining results which are meaningless
- Obtaining results that don't answer the question of interest

Elaboration

For a confidence interval procedure, power can be defined as the probability

For a hypothesis test, power can be defined as the probability of rejecting the null hypothesis under a specified condition.

Example: For a one-sample t-test for the mean of a population, with null hypothesis H_{0}: µ = 100, you might be interested in the probability of rejecting H_{0} when µ ≥ 105, or when |µ - 100| > 5, etc.
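The definition of power as a probability over samples can be illustrated by simulation. The sketch below estimates, for the one-sample t-test example above, how often H_{0}: µ = 100 is rejected when the true mean is 105. The sample size (25), the population standard deviation (10), and the function names are assumptions made for illustration only; none of them come from the text. The critical value 2.064 is the two-sided 5% value for 24 degrees of freedom from a standard t table.

```python
import random
from math import sqrt

N = 25          # assumed sample size (not given in the text)
SIGMA = 10.0    # assumed population standard deviation (not given in the text)
T_CRIT = 2.064  # two-sided 5% critical value of t with N - 1 = 24 df (t table)


def rejects_h0(sample, mu0=100.0):
    """Two-sided one-sample t-test of H0: mu = mu0 at the 5% level."""
    n = len(sample)
    xbar = sum(sample) / n
    s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    t = (xbar - mu0) / (s / sqrt(n))
    return abs(t) > T_CRIT


def estimated_power(true_mu, n_sims=10_000, seed=1):
    """Fraction of simulated normal samples for which the test rejects H0."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_h0([rng.gauss(true_mu, SIGMA) for _ in range(N)])
        for _ in range(n_sims)
    )
    return rejections / n_sims


print("rejection rate when mu = 100:", estimated_power(100.0))
print("estimated power against mu = 105:", estimated_power(105.0))
```

Under these assumptions the rejection rate when µ = 100 should come out close to the significance level 0.05, while the rejection rate when µ = 105 (the estimated power against that alternative) should be considerably higher.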

As with Type II error, we need to think of power in terms of power against a specific alternative rather than against a general alternative.

Note: For most tests, it is possible to calculate the power against a specific alternative, at least to a reasonable approximation, if the relevant information (or a good approximation to it) is available. It is not usually possible to calculate the power against a general alternative, since the general alternative is made up of infinitely many possible specific alternatives.

Example: If we are performing a hypothesis test for the mean of a population, with null hypothesis H_{0}: µ = 0, and are interested in rejecting H_{0} when µ > 0, we might (depending on the situation -- i.e., on what difference is of practical significance) calculate the power of the test against the specific alternative H_{1}: µ = 1, or against the specific alternative H_{3}: µ = 3, etc. The picture below shows three sampling distributions (see note 3):

- The sampling distribution assuming H_{0} (blue; leftmost curve)
- The sampling distribution assuming H_{1} (green; middle curve)
- The sampling distribution assuming H_{3} (yellow; rightmost curve)

The red line marks the cut-off corresponding to a significance level α = 0.05.

- Thus the area under the blue curve to the right of the red line is 0.05.
- The area under the green curve to the right of the red line is the probability of rejecting the null hypothesis (µ = 0) if the specific alternative H_{1}: µ = 1 is true. In other words, this area is the power of the test against the specific alternative H_{1}: µ = 1. We can see in the picture that in this case, this power is greater than 0.05, but noticeably less than 0.50.
- Similarly, the area under the yellow curve to the right of the red line is the power of the test against the specific alternative H_{3}: µ = 3. Notice that it is much larger than 0.5.
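The three areas described above can be computed directly if the sampling distributions are taken to be normal. The sketch below assumes a standard error of 1, which is not stated in the text; it was chosen only because it gives areas consistent with the description of the picture. The function name is likewise hypothetical.

```python
from statistics import NormalDist


def power_upper_tailed(mu_alt, mu_null=0.0, se=1.0, alpha=0.05):
    """Area under the sampling distribution centered at mu_alt that lies
    to the right of the cut-off for an upper-tailed test of H0: mu = mu_null.

    se = 1.0 is an assumed standard error, not a value from the text.
    """
    # Cut-off (the "red line"): leaves area alpha to its right under H0.
    cutoff = NormalDist(mu_null, se).inv_cdf(1 - alpha)
    # Rejection probability when the true mean is mu_alt.
    return 1 - NormalDist(mu_alt, se).cdf(cutoff)


for mu in (0, 1, 3):
    print(f"mu = {mu}: P(reject H0) = {power_upper_tailed(mu):.3f}")
```

With this assumed standard error, the area for µ = 0 is the significance level 0.05, the power against µ = 1 is roughly 0.26, and the power against µ = 3 is roughly 0.91, matching the qualitative statements in the list above.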

Power and Type II Error

Recall that the Type II error rate β of a test against a specific alternate hypothesis is represented in the diagram above as the area under the sampling distribution curve for that alternate hypothesis and to the left of the cut-off line for the test. Thus

(Power of a test against a specific alternate hypothesis) + β = total area under sampling distribution curve = 1,

so

Power = 1 - β

Factors that Influence Power

In addition to the specific alternative or other degree of difference (e.g., width of confidence interval) that it is desirable to detect, power is influenced by sample size, variance, and experimental design.
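The effect of sample size and variance can be sketched for the simplest case, an upper-tailed z-test with known standard deviation. The difference to detect (5), the standard deviations (15 and 30), and the function names below are illustrative assumptions, not values from the text. Experimental design is not covered by this sketch.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test against a true mean shift of delta."""
    se = sigma / sqrt(n)  # standard error shrinks as n grows
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - delta / se)


def sample_size(delta, sigma, target_power=0.80, alpha=0.05):
    """Smallest n whose power against a shift of delta reaches target_power."""
    z_sum = Z.inv_cdf(1 - alpha) + Z.inv_cdf(target_power)
    return ceil((z_sum * sigma / delta) ** 2)


# Power rises with sample size and falls as the standard deviation grows.
for n in (10, 25, 50, 100):
    print(n, round(z_test_power(5, 15, n), 3), round(z_test_power(5, 30, n), 3))

print("n needed for 80% power (delta = 5, sigma = 15):", sample_size(5, 15))
```

Doubling the standard deviation has the same effect on the standard error as dividing the sample size by four, which is why a noisier response can demand a much larger study to reach the same power.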

Detrimental Effects of Underpowered or Overpowered Studies

Common Mistakes Involving Power

Notes:

1. Again, the reference category for the probability is the sample.

2. This assumes a confidence interval procedure that results in a confidence interval centered at the parameter estimate. Other characterizations may be needed for other types of confidence interval procedures.

3. The Rice Virtual Lab in Statistics' Robustness Simulation can be used to illustrate, in an interactive manner, the effect of the difference to be detected (and also of standard deviation), on power for the two-sample t-test.

Last updated August 28, 2012