
COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them


Using a Two-sample Test Comparing Means when Cases Are Paired

One of the model assumptions of the two-sample t-test for means is that the observations are independent, both within each group and between the groups. Thus if the samples are chosen so that there is some natural pairing (each observation in one group is matched with a specific observation in the other), the two-sample t-test is not appropriate.

Example 1: A random sample of heterosexual married couples is chosen. Each spouse of each pair takes a survey on marital happiness. The intent is to compare husbands' and wives' scores.

The two-sample t-test would compare the average of the husbands' scores with the average of the wives' scores. However, the samples of husbands and wives are not independent: whatever factors influence a particular husband's score may also influence his wife's score, and vice versa. Thus the two-sample t-test's assumption of independence between groups is violated.
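To see why the violated assumption matters, consider the variance of the difference in sample means for n couples, assuming the couples are independent of one another but the two scores within a couple may be correlated. The notation is introduced here for illustration: X_i and Y_i are the husband's and wife's scores in couple i, with variances σ_X² and σ_Y² and within-couple covariance σ_XY.

```latex
\operatorname{Var}(\bar{X} - \bar{Y})
  = \operatorname{Var}(\bar{X}) + \operatorname{Var}(\bar{Y}) - 2\operatorname{Cov}(\bar{X}, \bar{Y})
  = \frac{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}{n}
  = \operatorname{Var}(\bar{D}), \qquad D_i = X_i - Y_i .
```

The two-sample t-test's standard error estimates (σ_X² + σ_Y²)/n, which is correct only when σ_XY = 0. If spouses' scores are positively correlated, that standard error is too large and the test loses power; if they are negatively correlated, it is too small and the test rejects too often. Working with the differences D_i accounts for the covariance automatically.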

In this example, we can instead consider the difference in scores for each couple: (husband's score) - (wife's score). If the questions of interest can be expressed in terms of these differences, then we can consider using the one-sample t-test on the differences (this is what is usually called the paired t-test), or perhaps a non-parametric test such as the sign test or the Wilcoxon signed-rank test if the model assumptions of the t-test are not met.
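A minimal sketch of the idea, using only the Python standard library: compute the one-sample t statistic on the per-couple differences, and, for contrast only, the two-sample (Welch) statistic that wrongly treats husbands and wives as independent groups. The function names and the six couples' scores are hypothetical, made up for illustration; the sketch computes the statistic and degrees of freedom but not a p-value (in practice `scipy.stats.ttest_rel` computes the paired test with a p-value).

```python
import math
import statistics


def paired_t_statistic(x, y):
    """One-sample t statistic on the paired differences x[i] - y[i].

    Tests H0: mean difference = 0. Returns (t, degrees_of_freedom).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    if sd_d == 0:
        raise ValueError("differences have zero variance")
    return mean_d / (sd_d / math.sqrt(n)), n - 1


def two_sample_t_statistic(x, y):
    """Two-sample (Welch) t statistic treating x and y as independent groups.

    Shown only for contrast: it ignores the pairing.
    """
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


# Hypothetical scores for six couples (illustrative only, not real data).
husbands = [30, 25, 38, 42, 28, 35]
wives = [28, 24, 35, 41, 25, 33]

t_paired, df = paired_t_statistic(husbands, wives)
print(f"paired (one-sample on differences): t = {t_paired:.2f}, df = {df}")  # t = 5.48, df = 5
print(f"two-sample, ignoring pairing: t = {two_sample_t_statistic(husbands, wives):.2f}")  # t = 0.53
```

Both statistics have the same numerator (a mean difference of 2), but the couple-to-couple variation that spouses share inflates the two-sample standard error: the individual scores have SDs of about 6.5, while the differences have an SD of about 0.89. With equal group sizes the pooled-variance two-sample statistic equals the Welch statistic here, so the contrast does not depend on which two-sample version is used.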

Example 2: A test is given to each subject before and after a certain treatment. (For example, a blood test before and after receiving a medical treatment, or a test on some subject matter before and after a lesson on that subject.)

This type of example poses the same problem as Example 1: the "before" and "after" test results for each subject are not independent. The solution is the same: analyze the differences in scores, (after score) - (before score), for each subject.
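If the t-test's assumptions are doubtful for the differences, a non-parametric test on the same differences is an option. The sketch below swaps in the exact sign test, one simple choice (the Wilcoxon signed-rank test is a common alternative that also uses the sizes of the differences). The function name and the before/after scores are hypothetical, made up for illustration; only the standard library is used.

```python
import math


def sign_test(before, after):
    """Exact two-sided sign test on the paired differences after[i] - before[i].

    Pairs with zero difference are dropped. Returns
    (number of positive differences, number of nonzero differences, p-value).
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("all differences are zero")
    k = sum(1 for d in nonzero if d > 0)
    # Under H0 (median difference 0), k ~ Binomial(n, 1/2).
    # Two-sided p-value: double the smaller tail probability, capped at 1.
    tail = min(k, n - k)
    p_one_tail = sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return k, n, min(1.0, 2 * p_one_tail)


# Hypothetical before/after scores for eight subjects (illustrative only).
before = [12, 15, 11, 14, 13, 16, 10, 12]
after = [14, 18, 11, 17, 15, 19, 13, 15]

print(sign_test(before, after))  # (7, 7, 0.015625)
```

The pairing is preserved because each subject contributes exactly one difference; the "before" and "after" columns are never compared as if they were separate, independent samples.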

Example 2 is a special case of what are called repeated measures: some measurement is taken more than once on the same unit. Because repeated measurements on the same unit are not independent, the analysis of such data needs a method that takes this lack of independence into account. There are various ways to do this (for example, repeated-measures ANOVA or mixed-effects models); which one is best depends on the particular situation.