COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them
Mistakes in Thinking About Causation
Confusing correlation and causation
Any statistics text worth its salt will caution the reader not to
confuse correlation with causation. Yet the mistake is very common. As
a refresher, here's an example I often give my classes:
Consider elementary school students' shoe sizes and scores on a
standard reading exam. They are correlated, but saying that larger
shoe size causes higher reading scores is as absurd as saying that
high reading scores cause larger shoe size.
In this example, there is a clear lurking variable, namely, age. As
the child gets older, both their shoe size and reading ability increase.
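The role of the lurking variable can be sketched with a small simulation. Everything in it is an assumption made for illustration: the age range, the made-up formulas linking age to shoe size and reading score, and the noise levels. In this toy model shoe size and reading score each depend on age but not on each other, so the overall correlation should be strong while the correlation among children of the same age should be close to zero.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_children(n=2000, seed=0):
    """Hypothetical children: age drives both shoe size and reading score."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.randint(6, 11)                 # assumed elementary ages
        shoe = 0.8 * age + rng.gauss(0, 1.0)     # made-up relationship
        reading = 10 * age + rng.gauss(0, 8.0)   # made-up relationship
        rows.append((age, shoe, reading))
    return rows


rows = simulate_children()

# Correlation over all children, ignoring age.
overall_r = pearson([shoe for _, shoe, _ in rows],
                    [reading for _, _, reading in rows])

# Correlation within each age group, i.e. holding the lurking variable fixed.
by_age = {}
for age, shoe, reading in rows:
    by_age.setdefault(age, []).append((shoe, reading))
within_age_r = {
    age: pearson([s for s, _ in pairs], [r for _, r in pairs])
    for age, pairs in sorted(by_age.items())
}

print(f"all ages together: r = {overall_r:.2f}")
for age, r in within_age_r.items():
    print(f"age {age}: r = {r:.2f}")
```

Grouping by age is only one simple way to hold the lurking variable fixed; with real data, age would rarely be the only such variable.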
Elaborating on this situation:
If you agree that increasing age
(for elementary school children) causes increasing foot size, and
therefore increasing shoe size, then you expect a correlation between
age and shoe size. Correlation is symmetric, so shoe size and age are
correlated. But it would be absurd to say that shoe size causes age.
In other words, even when there is a causal relationship, the
causality typically only goes one way. (Of course, it could go both
ways, as in a feedback loop.)
One situation where people slip into confusing correlation and
causality is in regression. For example, one might regress college GPA
on SAT scores, obtaining a positive coefficient β of SAT score
in the regression equation. Consider the following two statements:
- A. An increase of one point in SAT scores causes, on average,
an increase of β points in college GPA.
- B. For every increase of one point in SAT scores, the increase
in average college GPA is β points.
Statement B is correct (assuming, of course, that the regression
has been carried out correctly). Statement A is incorrect: the
regression equation gives no information about causality. Indeed, there
is likely a lurking variable (or probably a bunch of
lurking variables) that affects both GPA and SAT score; SAT score is
considered to be a (perhaps crude) measure [1] of
this lurking variable.
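The difference between the two readings of the coefficient can be sketched with simulated data. The model below is hypothetical: a single lurking variable (called `prep` here) drives both SAT score and GPA, GPA never depends on SAT score directly, and all numbers are invented. The regression slope still comes out positive, yet adding 100 points to every SAT score leaves GPA exactly as it was.

```python
import random
from statistics import mean


def simulate_students(n=5000, seed=0, sat_boost=0.0):
    """Hypothetical students; `prep` is an assumed lurking variable.

    GPA is generated from `prep` only, so SAT score has no causal
    effect on GPA in this toy model.
    """
    rng = random.Random(seed)
    sat, gpa = [], []
    for _ in range(n):
        prep = rng.gauss(0, 1)
        sat.append(1000 + 150 * prep + rng.gauss(0, 80) + sat_boost)
        gpa.append(3.0 + 0.4 * prep + rng.gauss(0, 0.3))
    return sat, gpa


def ols_slope(xs, ys):
    """Least-squares slope from regressing ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


sat, gpa = simulate_students()
beta = ols_slope(sat, gpa)
print(f"regression slope of GPA on SAT: {beta:.4f}")

# Same students (same seed), but every SAT score raised by 100 points.
boosted_sat, boosted_gpa = simulate_students(sat_boost=100.0)
print(f"causal reading predicts a GPA gain of {100 * beta:.2f}")
print(f"GPA gain in the model: {mean(boosted_gpa) - mean(gpa):.2f}")
```

The zero gain is built into the model by construction; the point is only that the regression output alone cannot distinguish this situation from one in which SAT score does affect GPA.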
[...] deterministically when the evidence is [...]
"[...] from designed experiments, when analyzed appropriately, allow stronger
(almost) causative inferences, which incubate further scientific
inspiration and hypothesis generation, and so forth, through the cycle.
In the right hands, and with a component of luck, this cycle leads to [...]"
Noel Cressie and Christopher K. Wikle,
Statistics for Spatio-Temporal Data,
Wiley, 2011, p. 9
After pointing out problems such as confusing correlation and
causation, most statistics textbooks include a statement such as:
"The only legitimate way to try to establish a causal connection
statistically is through the use of randomized experiments." [2]
Unfortunately, such discussions usually come early in the book, and are
not revisited for elaboration later after statistical inference has
been discussed. When a well-designed, carefully analyzed experiment
(or, better yet, series of experiments) has established good evidence
of causality, there is still room for misinterpretation, since usually
the analysis is in terms of a summary statistic such as an average.
When this is the case, the results do not give evidence of
deterministic causation -- that is, they do not prove that "If this is
done, then this will be the result in all cases." Instead, what they
say is, "If this is done, under these circumstances, then on average
this will be the result."
Thus, for example, it is rare that an experiment will support an
assertion such as "If you take this medication, your blood pressure
will go down" or "If you do this type of exercise this frequently you
will not have a heart attack." All that can be concluded are statements
such as, "On average, people who take this medication have a
decrease in blood pressure" or "Fewer people who do this type of
exercise this frequently have heart attacks than people who don't." [3] And,
as noted in the quote from Cressie and Wikle above, even this requires
careful design of the experiment and appropriate statistical analysis.
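The gap between an average effect and a deterministic one can be illustrated with a simulated randomized experiment. This is a minimal sketch under invented assumptions: the medication's effect on each person is drawn from a normal distribution with mean -5 and standard deviation 10 (in mmHg), and everyone also has an unrelated fluctuation with standard deviation 5. The last quantity computed is a rough estimate of the probability mentioned in note 3, obtained by pairing treated and control subjects in the arbitrary order in which the simulation generated them.

```python
import random
from statistics import mean


def run_trial(n=4000, seed=1):
    """Simulate blood-pressure changes in a hypothetical randomized trial."""
    rng = random.Random(seed)
    # Random assignment: exactly half of the subjects get the medication.
    assignment = [True] * (n // 2) + [False] * (n - n // 2)
    rng.shuffle(assignment)
    treated, control = [], []
    for gets_drug in assignment:
        change = rng.gauss(0, 5)            # assumed natural fluctuation
        if gets_drug:
            change += rng.gauss(-5, 10)     # assumed individual drug effect
            treated.append(change)
        else:
            control.append(change)
    return treated, control


treated, control = run_trial()

# The summary statistic a typical analysis reports.
average_effect = mean(treated) - mean(control)

# Share of treated subjects whose blood pressure did NOT go down.
share_not_lowered = sum(c >= 0 for c in treated) / len(treated)

# Estimated probability that a treated subject has a lower (better)
# change than a control subject, using arbitrary pairs.
p_treated_better = sum(t < c for t, c in zip(treated, control)) / len(treated)

print(f"average effect: {average_effect:.1f} mmHg")
print(f"treated subjects with no decrease: {share_not_lowered:.0%}")
print(f"P(treated change < control change): {p_treated_better:.2f}")
```

Under these assumptions the medication lowers blood pressure on average, yet a sizable minority of treated subjects see no decrease, which is why "your blood pressure will go down" is not supported.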
1. The discussion in the
page is framed in terms of outcome variables, but the considerations
apply to predictor variables, such as SAT score, as well.
2. Utts, Jessica (2005) Seeing
(Thompson), p. 211. Use of this quote here is not intended as a criticism of this
text; the quote is extracted from the context of a very
good two-page discussion on establishing causation.
3. What would be of more interest than a difference in means would be
the probability that assignment to treatment gives better outcome than
assignment to no treatment. This is discussed in Richard H.
Browne, The t-Test p Value and Its Relationship to the Effect Size and
P(X>Y), The American
Statistician, February 1, 2010, 64(1), 30 - 33.
Last updated Sept. 25,