This is the second in a series of posts discussing concerns that arose in reading two of the papers in the special issue of Social Psychology on Registered Replications. (The first was here; this one will be continued in the next post.)
Every statistical hypothesis test has certain model assumptions. For example, the model assumptions for one-way, fixed effects ANOVA, with k groups and response variable Y, can be stated as follows (see note 1):
- The groups are independent of each other.
- The variances of Y on each group are all the same.
- Y is normal on each group.
For example, if Y is the score on a certain test, and the groups are groups of students, then the model assumptions would say:
- The groups of students are independent.
- The variance of the test scores is the same for each group of students.
- The distribution of test scores for each group is normal.
The model assumptions are what make the hypothesis test valid: if they are not true, we have no assurance that the logic behind the test holds up. (For more detail in a simpler case, see pp. 10 – 26 of this.) In particular:
- If the model assumptions are not true, then the actual type I error rate might be smaller or larger than the intended one (the “alpha level”). For example, if the researcher sets alpha = .05 for rejection of the null hypothesis, the actual type I error rate might be smaller (e.g., .03) or larger (e.g., .07), depending on the departures from model assumptions and other particulars of the test.
- Similarly, if the model assumptions are not true, then power calculations (which are necessarily based on the assumption that the model assumptions are true) are unreliable: actual power could be smaller or larger than calculated. Even if the type of departure from the model assumptions is known, accurate power calculations could be, practically speaking, impossible to carry out.
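The point about type I error can be illustrated with a small simulation. What follows is only a sketch under stated assumptions: it uses three groups of normally distributed scores with equal means (so the null hypothesis is true), and the group sizes, standard deviations, seed, and function names are all made up for illustration. With exactly three groups the F statistic has 2 numerator degrees of freedom, and in that special case the upper tail of the F distribution has the closed form (1 + 2F/m)^(-m/2), where m is the denominator degrees of freedom, so no statistics library is needed.

```python
import random
from statistics import fmean


def anova_p_value_3_groups(groups):
    """p-value of the classical one-way ANOVA F test for exactly three groups."""
    assert len(groups) == 3, "the closed-form tail below needs 2 numerator df"
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_within = n_total - 3
    f_stat = (ss_between / 2) / (ss_within / df_within)
    # For an F distribution with (2, m) degrees of freedom,
    # P(F > f) = (1 + 2f/m) ** (-m/2) exactly.
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


def rejection_rate(sizes, sds, alpha=0.05, n_sims=4000, seed=1):
    """Fraction of simulated data sets, all with a TRUE null, that get rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Every group has mean 0, so any rejection is a type I error.
        groups = [[rng.gauss(0.0, sd) for _ in range(n)] for n, sd in zip(sizes, sds)]
        if anova_p_value_3_groups(groups) < alpha:
            rejections += 1
    return rejections / n_sims


# Hypothetical scenarios (illustrative values only).
rate_ok = rejection_rate(sizes=(20, 20, 20), sds=(1, 1, 1))            # assumptions hold
rate_inflated = rejection_rate(sizes=(40, 10, 10), sds=(1, 1, 4))      # small group, big variance
rate_conservative = rejection_rate(sizes=(40, 10, 10), sds=(4, 1, 1))  # big group, big variance

print("assumptions hold:          ", rate_ok)
print("small group most variable: ", rate_inflated)
print("large group most variable: ", rate_conservative)
```

In runs of this sketch, the first rate should land close to the intended .05, the second well above it, and the third well below it. The direction of the error depends on how the unequal variances are paired with the group sizes, which is one of the "other particulars" mentioned above.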
However, hypothesis tests might still be fairly credible if the model assumptions are not too far from true. The technical terminology for this is that “the test is robust to some violations of the model assumptions.”
- It is usually impossible to tell whether or not the model assumptions are true in any particular case.
- There may be lots of ifs, ands, and buts involved in determining when a hypothesis test is robust to some violations of the model assumptions.
- A lot is known about robustness (see note 2).
- There are some fairly standard “checks” that can often help a researcher make an informed decision as to whether model assumptions are so far off that using the test would be like building a house of cards, or whether it would be reasonable to proceed with the hypothesis test. (More on this in my next post.)
- There are in many cases alternate tests which have different model assumptions that might apply (see note 3).
- Many textbooks (and websites and software guides) ignore model assumptions.
- Some mention them but give “folklore” reasons to ignore them (see note 4).
In other words, the “game of telephone” phenomenon and TTWWADI (“that’s the way we’ve always done it”) tend to foster a lack of attention to model assumptions and robustness in the use of statistics.
However, there are some textbooks that do a good job of discussing model assumptions and robustness. (More on this in my next post.)
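As a rough illustration of the earlier point that alternate tests with different model assumptions exist, the sketch below compares the classical one-way ANOVA F test with Welch's heteroscedastic F test, which drops the equal-variance assumption but still assumes normality and independence. Welch's test is chosen here purely for illustration. The scenario (three groups, sizes 40, 10, 10, standard deviations 1, 1, 4, all means equal), the seed, and the function names are assumptions of this sketch. As before, it relies on the fact that an F distribution with 2 numerator degrees of freedom has the closed-form upper tail (1 + 2F/m)^(-m/2), which also holds for the non-integer m that Welch's approximation produces.

```python
import random
from statistics import fmean


def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def f_tail_2df(f_stat, df2):
    """P(F > f_stat) for an F distribution with (2, df2) degrees of freedom."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


def classic_p_value(groups):
    """Classical one-way ANOVA for three groups (assumes equal variances)."""
    n_total = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand_mean = sum(len(g) * m for g, m in zip(groups, means)) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((len(g) - 1) * sample_variance(g) for g in groups)
    df_within = n_total - 3
    return f_tail_2df((ss_between / 2) / (ss_within / df_within), df_within)


def welch_p_value(groups):
    """Welch's F test for three groups (does not assume equal variances)."""
    k = 3
    weights = [len(g) / sample_variance(g) for g in groups]
    total_weight = sum(weights)
    means = [fmean(g) for g in groups]
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight
    numerator = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    t = sum((1 - w / total_weight) ** 2 / (len(g) - 1) for w, g in zip(weights, groups))
    f_stat = numerator / (1 + 2 * (k - 2) / (k ** 2 - 1) * t)
    df2 = (k ** 2 - 1) / (3 * t)  # approximate denominator df, usually not an integer
    return f_tail_2df(f_stat, df2)


def rejection_rates(sizes, sds, alpha=0.05, n_sims=4000, seed=2):
    """Type I error rates of both tests on the same simulated null data sets."""
    rng = random.Random(seed)
    classic = welch = 0
    for _ in range(n_sims):
        groups = [[rng.gauss(0.0, sd) for _ in range(n)] for n, sd in zip(sizes, sds)]
        classic += classic_p_value(groups) < alpha
        welch += welch_p_value(groups) < alpha
    return classic / n_sims, welch / n_sims


classic_rate, welch_rate = rejection_rates(sizes=(40, 10, 10), sds=(1, 1, 4))
print("classical ANOVA type I error rate:", classic_rate)
print("Welch test type I error rate:     ", welch_rate)
```

In this made-up scenario the classical test should reject a true null hypothesis far more often than the intended .05, while Welch's test should stay reasonably close to it. That does not make Welch's test a universal fix: it has model assumptions of its own, and whether they fit is still up to the user to figure out.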
In the next post, I will discuss model assumptions in the context of the papers on stereotype susceptibility that I mentioned in the last post, and will propose some recommendations concerning model assumptions and research proposals. Meanwhile, here is an example of neglecting model assumptions that is also related to the special issue of Social Psychology on replications:
In his May 20 Guardian article Psychology’s “registration revolution”, Chris Chambers quotes psychologist Don Simon as saying that study preregistration “keeps us from convincing ourselves that an exploratory analysis was a planned one.” One commenter responded,
“Why do we even split this stuff. I mean, if I study something and do hypothesis testing and THEN something interesting comes up by a few clicks in SPSS/PSPP, shouldn’t we just integrate it? Why write another research report?”
This comment gives an example of how the “game of telephone” phenomenon has worked to drop model assumptions (as well as other concerns such as multiple testing, to be discussed in a later post) from consideration: SPSS (as with other statistical software) can only perform calculations that the user tells it to perform. It has no way of checking if those calculations are appropriate. In particular, the software just spits out the results of a hypothesis test it is told to do, regardless of whether or not the test is appropriate; it has no way of knowing whether or not the model assumptions fit the context. That is up to the user to figure out. So “something interesting” that “comes up by a few clicks in SPSS/PSPP” may be simply an artifact of the user’s choosing to do tests that are not appropriate in the context of the data being used.
1. There are many ways of stating the model assumptions; I have chosen the form above to minimize use of notation. However, some statements of the model assumptions in the literature and (especially) on the web are incorrect. For example, the page http://en.wikipedia.org/wiki/One-way_analysis_of_variance (as of this writing) says, “Response variable residuals are normally distributed.” This would be correct if “residuals” were replaced by “errors”. The problem is that the residuals depend on the data; the word “errors” in this context refers to the difference between the value of Y and the mean of Y on the subgroup. The errors, with this definition, do not depend on the data; they are unknown.
Also, the first assumption above (independence of the groups) is stated in a somewhat fuzzy manner; technically, what it means is that the random variables Y_1, Y_2, …, Y_k are independent, where Y_i is the random variable Y restricted to group i. (In the example, Y_i would be the test score for students in group i only.)
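For readers who prefer notation, here is one common way of writing the model and the error/residual distinction. This is a sketch, not the only formulation: the symbols μ_i, σ², ε_ij, e_ij and n_i are introduced here for illustration, and the statement that the errors are independent within groups as well as between them is an additional assumption that is commonly included (it corresponds to random sampling within each group).

```latex
% Y_{ij}: response of the j-th observation in group i; n_i: size of group i.
\[
  Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad
  \varepsilon_{ij} \sim N(0, \sigma^2) \ \text{independently}, \qquad
  i = 1, \dots, k, \quad j = 1, \dots, n_i .
\]
% Errors use the unknown group means \mu_i; residuals use the sample means.
\[
  \varepsilon_{ij} = Y_{ij} - \mu_i \quad \text{(error, unobservable)}, \qquad
  e_{ij} = Y_{ij} - \bar{Y}_{i}, \quad
  \bar{Y}_{i} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij} \quad \text{(residual, computed from the data)}.
\]
```

The normality assumption is about the ε_ij, which involve the unknown μ_i; the e_ij are what software can actually compute and plot.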
2. For fixed effects ANOVA, see, for example, Harwell, M.R., E.N. Rubinstein, W.S. Hayes, and C.C. Olds. 1992. Summarizing Monte Carlo results in methodological research: the one- and two-factor fixed effects ANOVA cases. J. Educ. Stat. 17: 315-339.
3. See, for example, Wilcox, Rand R. (2005 and 2012), Introduction to Robust Estimation and Hypothesis Testing, Elsevier, and Huber and Ronchetti (2009), Robust Statistics, Wiley.
4. For example, Wilcox (2005) (see note 3 above) comments (p. 9), “For many years, conventional wisdom held that standard analysis of variance (ANOVA) methods are robust, and this point of view continues to dominate applied research,” and explains how that misunderstanding appeared to have come about.