My last post discussed the importance of model assumptions in using hypothesis tests, and the potential difficulties in checking them. This post will focus on what can (and should) be done to check model assumptions before one plunges into performing a hypothesis test. It will focus on One-Way ANOVA, using examples from the literature on stereotype susceptibility.
Before proceeding, I repeat the caveat from the first post in this series:
Important Caveat: I cannot emphasize enough that the comments I will be making in the posts in this series are not intended, and should not be construed, as singling out for criticism the authors of those two papers or of any papers referred to, nor their particular area of research. Indeed, my central point is that registered reports are not, by themselves, enough to substantially improve the quality of published research. In particular, the proposals for registered reports need to be reviewed carefully with the aim of promoting best research practices.
Recall from the preceding post that the model assumptions for one-way, fixed effects ANOVA can be stated as:
- The groups are independent of each other.
- The variances of Y in each group are all the same.
- Y is normally distributed in each group.
Checks that can and should be done include [1]:
A. Plotting [2] the data for each group to check for possible evidence of violations of the model assumptions
B. Checking how similar (or different) the sample variances for the individual groups are
C. Being especially cautious if group sizes differ appreciably
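For readers who want a concrete starting point, checks A–C can be sketched in a few lines of Python. The data below are hypothetical, purely for illustration; in practice one would substitute the actual group measurements (and use real dot plots, histograms, or box plots rather than printed sorted values).

```python
import statistics

# Hypothetical accuracy scores for three groups; substitute the real data.
groups = {
    "control": [0.71, 0.64, 0.80, 0.55, 0.68, 0.73, 0.62],
    "threat":  [0.52, 0.47, 0.60, 0.38, 0.55, 0.49, 0.41, 0.45, 0.58],
    "boost":   [0.75, 0.82, 0.69, 0.88, 0.77, 0.71, 0.84],
}

summaries = {}
for name, values in groups.items():
    # Check A: look at the raw data for each group (a dot plot or box
    # plot would be better; printing the sorted values is a bare start)
    print(f"{name:8s} n={len(values):2d}  data={sorted(values)}")
    summaries[name] = (len(values),
                       statistics.mean(values),
                       statistics.stdev(values))

# Check B: how similar are the group standard deviations?
for name, (n, m, sd) in summaries.items():
    print(f"{name:8s} mean={m:.3f}  sd={sd:.3f}")

# Check C: appreciably different group sizes deserve extra caution,
# since unequal variances are more serious when group sizes differ
sizes = [n for n, _, _ in summaries.values()]
print("group sizes:", sizes)
```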
Unfortunately, none of the papers mentioned in Part I discussed these checks.
Also unfortunately, this is not surprising to me, since model checks are all too often not discussed in the literature (TTWWADI?), although best practice definitely requires such discussion.
However, all of these papers that used an ANOVA analysis did provide group sample standard deviations, so it was possible for the reader to check that variances across groups were fairly constant (item B above).
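That reader's check requires no raw data at all. The standard deviations below are invented for illustration; plug in the values actually reported in a paper. The "less than about twice" cutoff is a common rule of thumb, not a sharp criterion.

```python
# Invented group standard deviations, as they might be reported in a paper.
reported_sds = {"group1": 0.12, "group2": 0.15, "group3": 0.11}

ratio = max(reported_sds.values()) / min(reported_sds.values())
# Rule of thumb: the equal-variance assumption is usually not a serious
# worry if the largest SD is less than about twice the smallest
print("SD ratio:", round(ratio, 2), "-> OK" if ratio < 2 else "-> investigate")
```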
One violation of model assumptions that requires particular attention is a skewed Y. ANOVA compares means, but for skewed distributions, means are not good measures of what is “typical.” [3] A paper on stereotype threat that I looked at a few years ago reported some means and standard deviations that strongly suggest that the distributions of the response variable were, in some cases, skewed to the right. For example, one group had mean .04 and standard deviation .13. A normal distribution with this mean and standard deviation would have a substantial proportion of its values less than zero, which could not happen with the response variable in these studies. A plot of the values would have helped bring attention to this problem.
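The claim about that mean and standard deviation is easy to verify from the normal CDF: if a variable really were Normal(.04, .13), a substantial fraction of its values would fall below zero.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# The summary statistics reported for one group in the paper discussed above
p_below_zero = normal_cdf(0.0, mu=0.04, sigma=0.13)
print(f"P(X < 0) = {p_below_zero:.3f}")  # roughly 38% of values below zero
```

Since the response variable cannot be negative, a distribution with these summary statistics cannot be anywhere close to normal.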
Another model assumption check can sometimes be used: Using information about the response variable to help decide whether it is (close to) normal.
1. Standardized tests are often constructed to have normal distributions of scores
- However, scores on such a test cannot be assumed to have a normal distribution on a subgroup.
2. If a random variable is binomial with parameters n and p, then, provided n is large enough and p is not too extreme (i.e., not too close to 0 or 1), the variable is approximately normal.
3. Variables that are quotients of random variables can be very messy (see, e.g., http://en.wikipedia.org/wiki/Ratio_distribution). They are often skewed or have kurtosis (a measure of how sharp or flat the “peak” of the distribution is) very different from that of a normal distribution. Both of these deviations from normality can affect the alpha-level of the ANOVA test [4]. The accuracy response variable in the stereotype susceptibility studies is a quotient of random variables (number correct over number attempted), and thus might have properties that make an ANOVA test not robust.
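Items 2 and 3 can be made concrete with a short computation and simulation. The binomial skewness formula, (1 − 2p)/√(np(1 − p)), quantifies what "p is not extreme" means; the accuracy variable is then simulated under a purely hypothetical model (each participant attempts between 5 and 20 items and answers each correctly with probability 0.9 — invented parameters, not taken from any of the papers).

```python
import random
from math import sqrt
from statistics import mean

def binomial_skewness(n, p):
    """Skewness of Binomial(n, p); values near 0 mean a shape closer to normal."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

# Item 2: moderate p is near-normal; extreme p is skewed unless n is large
print(binomial_skewness(20, 0.50))    # 0.0 -- symmetric
print(binomial_skewness(20, 0.95))    # about -0.92 -- clearly skewed
print(binomial_skewness(2000, 0.95))  # about -0.09 -- a much larger n helps

def sample_skewness(xs):
    """Moment-based sample skewness of a list of numbers."""
    m = mean(xs)
    s2 = mean((x - m) ** 2 for x in xs)
    return mean((x - m) ** 3 for x in xs) / s2 ** 1.5

# Item 3: the accuracy variable as a quotient of random variables.
# Hypothetical model: attempted ~ Uniform{5..20}, correct ~ Binomial(attempted, 0.9)
random.seed(1)
accuracies = []
for _ in range(20_000):
    attempted = random.randint(5, 20)
    correct = sum(random.random() < 0.9 for _ in range(attempted))
    accuracies.append(correct / attempted)

print(round(sample_skewness(accuracies), 2))  # negative: skewed left, not normal
```

Under this (assumed) model the simulated accuracy scores pile up near 1 and trail off to the left, exactly the kind of skewness that a plot of the real data would reveal.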
Again, I do not suggest that lack of attention to model assumptions and robustness is a problem just in the area of research on stereotype susceptibility, or even just in the subject of psychological research; I have seen it frequently in a variety of areas, including many cases in biology. I invite readers to select a few papers of their choice (that use frequentist statistics) and look at how well (or how poorly) they address the problems of model assumptions and robustness.
So I propose:
RECOMMENDATION #2 for improving research proposals (and thereby improving research quality):
1. Proposers should include in their proposals:
- How the study design is planned to increase chances that the model assumptions of the proposed analysis methods will be satisfied.
- What checks on model assumptions will be performed after collecting data.
- Contingency plans in case model assumptions cannot be adequately met.
2. Reviewers of research proposals should check that each of the above points is addressed soundly.
1. Two textbooks that I am familiar with that are strong on checks for model assumptions and discussion of robustness are:
DeVeaux, Velleman, and Bock (2012), Stats: Data and Models, Addison Wesley.
The book does not use the term “robustness,” but for each statistical procedure it includes a list of “conditions” (along with the model assumptions) that summarize the practical implications of robustness considerations. It gives such “conditions” for all the hypothesis tests and confidence-interval procedures it covers.
Dean and Voss (1999), Design and Analysis of Experiments, Springer
This discusses types of ANOVA other than one-way, as well as some alternatives when model assumptions are not satisfied.
2. This might be done via dot plots, histograms, or box plots.
3. For example, real estate information by locality usually lists median prices, rather than mean prices, since the mean is influenced by higher-end houses to give a value higher than the typical price, which is better indicated by the median. Similarly, when I discussed exam scores with a class when returning graded exams, I would give the median rather than the “average,” since the latter would be influenced by the “tail” of a few low-performing students to give a value that would not be typical of class performance overall.
There are alternative hypothesis tests that do compare medians. Also, in some cases, medians can be compared by first taking logs, then using ANOVA on the transformed variable. (If log Y is normal, then the mean of log Y will also be the median of log Y, which will be the log of the median of Y.)
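The log-transform remark can be checked numerically. With simulated lognormal data (log Y normal by construction), the mean of log Y agrees with the median of log Y, and exponentiating either one approximately recovers the median of Y:

```python
import math
import random
from statistics import mean, median

random.seed(42)

# Simulated right-skewed Y: if log Y ~ Normal(mu, sigma), then Y is lognormal.
# The parameters are arbitrary, chosen only to produce visible skew.
mu, sigma = 1.0, 0.8
y = [math.exp(random.gauss(mu, sigma)) for _ in range(100_000)]

log_y = [math.log(v) for v in y]
# Mean of log Y ~= median of log Y, since log Y is (approximately) normal...
print(mean(log_y), median(log_y))
# ...and exp(mean(log Y)) ~= median of Y, so an ANOVA on log Y
# effectively compares the medians of Y across groups
print(math.exp(mean(log_y)), median(y))
```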
4. See p. 316 of Harwell, M.R., E.N. Rubinstein, W.S. Hayes, and C.C. Olds (1992), “Summarizing Monte Carlo results in methodological research: the one- and two-factor fixed effects ANOVA cases,” J. Educ. Stat. 17: 315–339.