As the title of this post indicates, I think the Gigerenzer paper is a mix of good points and one big questionable point.
The good points (for which I refer you to the appropriate parts of his paper for elaboration):
- The “identification” of the “null ritual” (Section 1)
- The discussion of what Fisher and Neyman-Pearson actually said (Section 2)
- The discussion of different interpretations of the phrase “level of significance” (Section 3) [1]
- The discussion of Oakes’ and Haller and Krause’s studies of misinterpretations of statistical significance (Section 4)
- The account of G. Loftus’ efforts to change journal policies (Section 5)
- The discussion of Meehl’s conjecture (Section 7)
- The discussion of Feynman’s conjecture (Section 8)
- The repeated references to the “statistical toolbox” (discussed most explicitly in Section 1) and “statistical thinking”
But I have serious reservations about his Section 6, which starts,
“Why do intelligent people engage in statistical rituals rather than in statistical thinking? Every person of average intelligence can understand that p(D|H) is not the same as p(H|D). That this insight fades away when it comes to hypothesis testing suggests that the cause is not intellectual but social and emotional. Here is a hypothesis …: The conflict between statisticians, both suppressed by and inherent in the textbooks, has become internalized in the minds of researchers. The statistical ritual is a form of conflict resolution, like compulsive hand washing which makes it resistant to arguments. To illustrate this thesis, I use the Freudian unconscious conflicts as an analogy.”
I do appreciate that he labels his hypothesis as such. However, I see several problems with it:
- He gives no evidence to support it.
- He “illustrates” his thesis via the Freudian theory of unconscious conflicts, a theory I can’t see as anything but a religious-type belief.
- Similarly, his analogy with compulsive hand-washing seems overextended.
- He rather cavalierly dismisses (without giving any substantive supporting evidence) what I believe are genuine intellectual problems in understanding frequentist inference and acquiring a good facility with statistical thinking.
I would not argue that there are no emotions, conflicts, or conflicting incentives involved in the widespread acceptance of statistical “rituals,” but my experience as a teacher of both mathematics and statistics indicates that there are also intellectual challenges that contribute to the problem, and that addressing these challenges is important in improving statistical practice.
First, Gigerenzer gives no evidence to back up his claim that “every person of average intelligence can understand that p(D|H) is not the same as p(H|D)”. I can’t say for sure that he is wrong, since I don’t have much experience teaching people of average intelligence: some of my teaching has been at elite universities, and most of it has been at the University of Texas at Austin, which, although not “elite,” draws most of its students from the top ten percent of their high school graduating classes; they are thus presumably above average in intelligence. Most of the students I have taught have also been majoring in STEM fields. So I can give only anecdotal evidence, based on students who are of above-average intelligence and usually interested in math, science, or technology. But if these students have intellectual difficulties, then such difficulties are likely more widespread among people of average intelligence.
I have found, in teaching advanced math courses for math majors, that a substantial number of these students have difficulty (at least at first) distinguishing between a statement and its converse – that is, distinguishing between “A implies B” and “B implies A”. Sure, if the implication is stated that way, then they pick up on the idea pretty quickly. But there are lots of ways of stating an implication. (For example, “A whenever B” says “B implies A”.) Identifying hypothesis and conclusion in an implication (which is necessary to distinguish a statement from its converse) is much harder in these less straightforward situations; a fair percentage of students struggle a lot with this, particularly when the problem is embedded in a context.
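For readers who want to see the statement/converse distinction spelled out, here is a minimal sketch in Python. It simply enumerates the truth table for “A implies B” and “B implies A”; the helper name `implies` is my own choice for illustration and does not come from Gigerenzer’s paper.

```python
from itertools import product

def implies(p, q):
    """Material implication: 'p implies q' is false only when p is true and q is false."""
    return (not p) or q

# Enumerate every truth assignment for A and B, recording the statement and its converse.
rows = [(a, b, implies(a, b), implies(b, a))
        for a, b in product([True, False], repeat=2)]

for a, b, forward, converse in rows:
    print(f"A={a!s:5} B={b!s:5}  A implies B: {forward!s:5}  B implies A: {converse}")

# The assignments where the statement and its converse disagree.
differ = [(a, b) for a, b, forward, converse in rows if forward != converse]
```

The two implications disagree exactly when one of A and B is true and the other is false, which is why knowing that one of them holds tells you nothing about whether the other does.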
I have found in teaching probability and statistics that the same type of difficulty arises in distinguishing between p(D|H) and p(H|D): Students catch on fairly easily for simple situations, but as situations become more complex, they make more mistakes. And the situation is really very complex for frequentist hypothesis testing. So at the very least, very careful teaching is needed to build true understanding. That’s why I have learned to “test” students on questions more-or-less like those in Section 4 of Gigerenzer’s paper when teaching hypothesis testing. Most students struggle intellectually with such questions – but having their incorrect answers pointed out and explained does seem to help some of them understand.
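To make the gap between p(D|H) and p(H|D) concrete, here is a small sketch using Bayes’ rule. All of the numbers are hypothetical, chosen only for illustration (they are not from Gigerenzer’s paper or from my classes), and the function name `posterior` is likewise my own.

```python
from fractions import Fraction

def posterior(prior_h, p_d_given_h, p_d_given_not_h):
    """Return p(H|D) from p(H), p(D|H), and p(D|not H) using Bayes' rule."""
    # Total probability of the data, averaged over H and not-H.
    p_d = p_d_given_h * prior_h + p_d_given_not_h * (1 - prior_h)
    return p_d_given_h * prior_h / p_d

# Hypothetical inputs (assumptions for illustration only).
prior_h = Fraction(1, 100)          # p(H): the hypothesis is rare a priori
p_d_given_h = Fraction(9, 10)       # p(D|H): the data are likely if H is true
p_d_given_not_h = Fraction(5, 100)  # p(D|not H): the data are unlikely otherwise

p_h_given_d = posterior(prior_h, p_d_given_h, p_d_given_not_h)

print("p(D|H) =", float(p_d_given_h))
print("p(H|D) =", float(p_h_given_d))
```

With these made-up inputs, p(D|H) is 0.9 while p(H|D) works out to 2/13, or roughly 0.15. The two can be far apart even in a simple setup, and frequentist hypothesis testing adds further layers on top of this.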
Indeed, I have learned that few students really understand the concepts of sampling distribution, p-value, and confidence interval after just one course. More seem to catch on in a second course, but some still get seriously stuck in misinterpretations. I always reviewed these basics in a graduate course (e.g., regression or ANOVA), because I was aware of this phenomenon. And bear in mind,
- These students are of above average intelligence
- They are (with very few exceptions) not psychology students – they typically are in fields such as math, statistics, biology, engineering, or business.
- Most of them have not yet been immersed in a “publish or perish” environment.
So from this experience, I believe that there are substantive intellectual challenges involved in understanding frequentist statistics, especially to the point of applying it meaningfully in a specific context. Ignoring these challenges in favor of emotional explanations such as the one Gigerenzer hypothesizes won’t solve the problem.
Footnote to title: The first mixed bag I blogged about was Simmons et al.’s now-famous paper; see http://www.ma.utexas.edu/blogs/mks/2013/01/09/a-mixed-bag/, http://www.ma.utexas.edu/blogs/mks/2013/01/13/more-re-simmons-et-al-part-i-uncertainty/, http://www.ma.utexas.edu/blogs/mks/2013/01/23/more-re-simmons-et-al-part-ii-not-far-enough/, and http://www.ma.utexas.edu/blogs/mks/2013/01/24/more-re-simmons-et-al-part-iii-huh/
1. However, I think this section would have been better without the introduction of Dr. Publish-Perish’s superego and feelings of guilt.