COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them
Common Mistakes in Interpretation of Regression Coefficients
1. Interpreting a coefficient as a rate of
change in Y instead of as a rate of change in the conditional mean of Y.
2. Not taking confidence intervals for coefficients into account.
Even when a regression coefficient is (correctly) interpreted as
a rate of change of a conditional mean (rather than a rate of change of
the response variable), it is important to take into account the
uncertainty in the estimation of the regression coefficient. To
illustrate, in the example used in item 1 above, the computed regression
line has equation ŷ = 0.56 + 2.18x. However, a 95% confidence interval
for the slope is (1.80, 2.56). So saying, "The rate of change of the
conditional mean of Y with respect to x is estimated to be between 1.80
and 2.56" is usually¹ preferable to saying, "The rate of change of the
conditional mean of Y with respect to x is about 2.18."
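As a rough illustration of where such an interval comes from, the sketch below fits a simple linear regression in plain Python and reports the slope together with a 95% confidence interval. The data are made up for illustration (they are not the data behind the example above), the function name `slope_confidence_interval` is hypothetical, and the critical value 2.306 is the tabulated 97.5th percentile of the t distribution with 8 degrees of freedom, which fits only a sample of size 10.

```python
import math

def slope_confidence_interval(xs, ys, t_crit):
    """Return (slope, lower, upper) for a simple linear regression of ys on xs.

    t_crit is the two-sided critical value of the t distribution with
    len(xs) - 2 degrees of freedom; the caller must supply it.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and the estimated standard error of the slope.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    half_width = t_crit * se_slope
    return slope, slope - half_width, slope + half_width

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.9, 4.1, 7.6, 9.0, 11.8, 13.1, 16.4, 17.2, 20.5, 22.3]
T_CRIT_95_DF8 = 2.306  # assumption: n = 10, so 10 - 2 = 8 degrees of freedom

slope, lower, upper = slope_confidence_interval(xs, ys, T_CRIT_95_DF8)
print(f"Point estimate of the slope: {slope:.2f}")
print(f"95% confidence interval for the slope: ({lower:.2f}, {upper:.2f})")
```

Reporting both lines of output, rather than only the point estimate, is what keeps the uncertainty in the slope visible to the reader.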
3. Interpreting a coefficient that is not statistically significant.²
Interpretations of results that are not statistically significant are
made surprisingly often. If the t-test for a regression coefficient is
not statistically significant, it is not appropriate to interpret the
coefficient. A better alternative might be to say, "No statistically
significant linear dependence of the mean of Y on x was detected."
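One way to make this habit concrete is to let the t-test decide which wording gets reported. The sketch below is a minimal illustration: the function name `describe_slope` and the numerical estimates are hypothetical, and the critical value 2.306 assumes a two-sided 5% test with 8 degrees of freedom.

```python
def describe_slope(slope, se_slope, t_crit):
    """Pick wording for a slope based on a two-sided t-test of slope = 0."""
    t_stat = slope / se_slope
    if abs(t_stat) < t_crit:
        # Not significant: do not interpret the coefficient itself.
        return ("No statistically significant linear dependence of the "
                "mean of Y on x was detected.")
    lower = slope - t_crit * se_slope
    upper = slope + t_crit * se_slope
    return ("The rate of change of the conditional mean of Y with respect "
            f"to x is estimated to be between {lower:.2f} and {upper:.2f}.")

# Hypothetical estimates; 2.306 is the 5% two-sided critical value for 8 df.
print(describe_slope(0.40, 0.50, 2.306))  # |t| = 0.8, not significant
print(describe_slope(2.00, 0.25, 2.306))  # |t| = 8.0, significant
```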
4. Interpreting coefficients in multiple regression with the same
language used for a slope in simple linear regression.
Even when there is an exact linear dependence of one variable on
two others, the interpretation of coefficients is not as simple as for
a slope with one independent variable. For example, if
y = 1 + 2x1 + 3x2, it is not accurate to say, "For each change of 1 unit
in x1, y changes 2 units." What is correct is to say, "If x2 is fixed,
then for each change of 1 unit in x1, y changes 2 units."
Similarly, if the computed regression equation is ŷ = 1 + 2x1 + 3x2,
with confidence interval (1.5, 2.5) for the coefficient of x1, then a
correct interpretation would be, "The estimated rate of change of the
conditional mean of Y with respect to x1, when x2 is fixed, is between
1.5 and 2.5 units."
For more on interpreting coefficients in multiple regression, see
Section 4.3 (pp. 161-175) of Ryan.³
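The role of the "x2 is fixed" qualifier in the exact relationship y = 1 + 2x1 + 3x2 can be checked with a few lines of arithmetic. In the sketch below, the second case, in which x2 happens to equal x1, is a hypothetical scenario chosen only to show what goes wrong when the qualifier is dropped.

```python
def y(x1, x2):
    """The exact relationship from the example: y = 1 + 2*x1 + 3*x2."""
    return 1 + 2 * x1 + 3 * x2

# Case 1: x2 is held fixed at 10 while x1 increases by 1 unit (3 -> 4).
change_x2_fixed = y(4, 10) - y(3, 10)

# Case 2 (hypothetical): x2 moves together with x1, here x2 = x1.
change_x2_moves = y(4, 4) - y(3, 3)

print(change_x2_fixed)  # 2
print(change_x2_moves)  # 5
```

In the second case a 1-unit change in x1 is accompanied by a 5-unit change in y, so the statement "y changes 2 units" holds only when x2 is fixed.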
5. Multiple inference on coefficients.
When interpreting more than one coefficient in a regression
equation, it is important to use appropriate methods for multiple
inference, rather than using just the individual confidence intervals
that are automatically given by most software. One technique for
multiple inference in regression is using confidence regions.⁴
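For orientation, the usual normal-theory form of a joint confidence region for all the coefficients can be written as follows. The notation is an assumption introduced here, not taken from the text above: X is the n × p design matrix (including the column of ones for the intercept), β̂ is the least-squares estimate, s² is the residual mean square, and F is the upper-α quantile of the F distribution with p and n − p degrees of freedom.

```latex
\[
  \left\{ \beta \;:\;
    (\hat{\beta} - \beta)^{\mathsf{T}} \, X^{\mathsf{T}} X \, (\hat{\beta} - \beta)
    \;\le\; p \, s^{2} \, F_{\alpha;\, p,\, n-p}
  \right\},
  \qquad
  s^{2} = \frac{\mathrm{RSS}}{n - p}.
\]
```

This region is an ellipsoid centered at β̂. Unlike the rectangle formed by the individual intervals, it has joint coverage 1 − α for all p coefficients at once under the usual model assumptions.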
Notes:
1. The decision needs to be made on the basis of what difference is
practically important. For example, if the width of the
confidence interval is less than the precision of measurement, there is
no harm in neglecting the range. Another factor that is also important
in deciding what level of accuracy to use is what level of accuracy
your audience can handle; this, however, needs to be balanced with the
possible consequences of not communicating the uncertainty in the
results of the analysis.
2. This is really just a special case of the
mistake in item 2. However, it is frequent enough to deserve explicit
mention.
3. T. Ryan (2009), Modern Regression Methods, Wiley.
4. Many texts on regression discuss confidence regions. See, for
example, S. Weisberg (2005), Applied Linear Regression, Wiley,
Section 5.5 (pp. 108-110), or R. D. Cook and S. Weisberg (1999),
Applied Regression Including Computing and Graphics, Wiley,
Section 10.8 (pp. 250-255).