Beyond the Buzz Part VIII: Outliers

Most of my posts so far in the Beyond the Buzz series have been more critical than positive, so I’m happy to post about a couple of things that are strongly positive.

Both stem from Monday’s SPSP* blog post, Without Replications We Will All Die**, by Jelte Wicherts.

I. Wicherts’ concluding paragraph is worth quoting, both for its own sake and as a teaser to encourage you to read his entire blog post:

 “Science is not free from errors, big egos, competition, and personal biases. But in science honest researchers should not have anything to hide or to worry about. Science is the place where we all make errors and try to deal with that somehow. Science is where disagreements sharpen our thoughts but should not make us angry. Science is where we replicate to deal with our necessary doubts. And science is where a PhD student sitting in a small university office can set out to replicate findings to contradict the statements made by a Nobel prize winner, without there ever being any animosity or fear. That is the true DNA of science and that is what makes it thrive.”

II. Reading the post prompted me to follow the link provided at the end to Wicherts’ home page, and so to look at some of his work. In particular, I found the article Bakker, M. & Wicherts, J. M. (2014). Outlier removal, sum scores, and the inflation of the type I error rate in independent samples t tests: The power of alternatives and recommendations. Psychological Methods, in press (available here) particularly interesting. Here are some of its highlights, with some comments in footnotes:

1. The authors surveyed publications in a few thoughtfully chosen journals from 2001 to 2010, identified the articles that used the word “outlier” in the text, and selected a random sample of 25 such articles from each journal involved***. The selected articles were examined in more detail. Of these, 77% reported that outliers had been removed before analysis.**** Forty-six percent of the articles identified outliers on the basis of having a z-score exceeding some threshold.***** The authors also discuss other questionable criteria for removing outliers. Only 18% of the articles reported analyses both with and without the outliers.
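The z-score criterion the surveyed articles used can be sketched as follows. This is a minimal illustration, not code from the paper; the threshold of 2 and the sample data are my own illustrative choices.

```python
import numpy as np

def zscore_outlier_flags(x, threshold=2.0):
    """Flag values whose |z-score| exceeds the threshold -- the
    questionable criterion reported in 46% of the surveyed articles."""
    z = (x - x.mean()) / x.std(ddof=1)  # z-scores using the sample SD
    return np.abs(z) > threshold

x = np.array([1.2, 0.8, 1.1, 0.9, 1.0, 4.5])
print(x[zscore_outlier_flags(x)])  # → [4.5]
```

Note that one extreme value inflates the sample SD used to compute the z-scores themselves, which is one reason this criterion behaves worse than intuition suggests.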

2. The authors performed and reported a variety of simulations investigating the Type I error rate when outliers are removed before performing a t test on two independent samples from the same distribution. One simulation assumed a normal distribution. Others were devised to simulate distributions likely to be encountered in psychological research. Two large real datasets were also used in simulations. Simulations were performed for four different sample sizes and for different thresholds for removing outliers, as well as for a “data-snooping” choice of threshold. The nominal Type I error rate was .05. Some results:

  • In the case of a normal distribution, the actual Type I error rate could exceed .10 (e.g., for threshold 2 and large sample size), but it approached .05 as the threshold approached 4.
  • In some other cases, the actual Type I error rate was as large as .15, and as large as .45 in a “data snooping” simulation.
  • The conclusion: “The removal of outliers is therefore not recommended.”
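The normal-distribution case can be reproduced in a few lines. This is not the authors’ simulation code, just a minimal sketch under my own assumptions: standard normal data, 100 observations per group, per-group removal at z-score threshold 2, and a Welch t statistic compared against the approximate large-sample two-sided .05 cutoff of 1.96.

```python
import numpy as np

rng = np.random.default_rng(0)

def rejects_after_removal(n=100, z_thresh=2.0, crit=1.96):
    # Two independent samples from the SAME distribution, so any
    # rejection is a Type I error; the nominal rate is .05.
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    # Remove "outliers" per group by the z-score criterion.
    a = a[np.abs(a - a.mean()) <= z_thresh * a.std(ddof=1)]
    b = b[np.abs(b - b.mean()) <= z_thresh * b.std(ddof=1)]
    # Welch t statistic; 1.96 is the large-sample two-sided .05 cutoff.
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return abs(a.mean() - b.mean()) / se > crit

sims = 5000
rate = sum(rejects_after_removal() for _ in range(sims)) / sims
print(f"actual Type I error rate with removal at z = 2: {rate:.3f}")
```

The mechanism is that trimming the tails shrinks the sample standard deviations, so the estimated standard error understates the actual variability of the trimmed means and the test rejects too often, consistent with the paper’s finding that the rate can exceed .10 at threshold 2 with large samples.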

3. The authors report more simulations regarding power, and end with a list of recommendations for good practice.

Notes:

*Society for Personality and Social Psychology

** The title is a kind of pun, referring both to DNA replication and to replication of studies in psychology.

*** Except for one journal, which had only 12 articles mentioning outliers; all 12 of these were examined, giving a total of 137 examined articles.

**** I try to teach my students not to remove outliers unless there is good reason to believe that they are recording or measurement errors, so I groaned here.

***** More groans – my gut reaction here was, “Oh no! This is likely to mess up the Type I error rate!” Happily, the authors explain intuitive reasons why removing outliers identified in this way is not a good idea, before proceeding, as noted, with simulations providing evidence that the actual Type I error rate can indeed increase considerably when outliers beyond a threshold are removed.