COMMON MISTEAKS MISTAKES IN USING STATISTICS: Spotting and Avoiding Them

- Model assumptions (e.g., for the t-test for the mean, the model assumptions can be phrased as: simple random sample^{1} of a random variable with a normal distribution)
- Null and alternative hypothesis
- A test statistic. This needs to have the property that extreme values of the test statistic cast doubt on the null hypothesis.
- A mathematical theorem saying, "If the model assumptions and the null hypothesis are both true, then the sampling distribution of the test statistic has this particular form."^{2}

1. Model assumptions: We are dealing with simple random samples of the random variable X, which has a normal distribution.^{1}

2. Null hypothesis: The mean of the random variable in question is a certain value µ_{0}. The alternative hypothesis could be either "The mean of the random variable X is not µ_{0}," or "The mean of the random variable X is less than µ_{0}," or "The mean of the random variable X is greater than µ_{0}." For this example, we will use the first alternative, "The mean of the random variable is not µ_{0}." (This is called the two-sided alternative.)

3. Test statistic: x-bar, the sample mean.

We now step back and consider all possible simple random samples of X of size n. For each simple random sample of X of size n, we get a value of x-bar. We thus have a new random variable X-bar. (X-bar stands for the new random variable; x-bar stands for the value of X-bar for a particular sample of size n.) The distribution of X-bar is called the sampling distribution of X-bar.

4. The theorem states: If the model assumptions are true and if the mean of X is µ_{0}, then the sampling distribution of X-bar is normal, with mean µ_{0} and standard deviation σ/√n, where σ (sigma) is the standard deviation of the random variable X. (Note: σ is called the population standard deviation of X; it is not the same as the sample standard deviation s, although s is an estimate of σ.)
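The theorem can be checked informally by simulation. The sketch below is only an illustration: the values µ_{0} = 10, σ = 2, and n = 25, and the function name simulate_xbars, are assumptions chosen for the example and do not come from the text. It draws many simple random samples from a normal distribution, records x-bar for each, and compares the mean and standard deviation of those x-bar values with µ_{0} and σ/√n.

```python
import math
import random
import statistics


def simulate_xbars(mu0, sigma, n, num_samples, seed=0):
    """Return one x-bar per simulated simple random sample of size n
    drawn from a normal distribution with mean mu0 and sd sigma."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xbars = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbars.append(statistics.fmean(sample))
    return xbars


# Illustrative values only (assumptions, not taken from the text).
mu0, sigma, n = 10.0, 2.0, 25
xbars = simulate_xbars(mu0, sigma, n, num_samples=5000)

print("mean of simulated x-bar values:", round(statistics.fmean(xbars), 3))
print("value stated by the theorem:   ", mu0)
print("sd of simulated x-bar values:  ", round(statistics.stdev(xbars), 3))
print("value stated by the theorem:   ", sigma / math.sqrt(n))
```

With a large number of simulated samples, the two pairs of numbers should agree closely. A simulation of this kind illustrates the theorem; it does not prove it.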

The validity of the hypothesis test depends on the truth of the conclusion of the theorem; the only way we know the conclusion is true is if we know the hypotheses of the theorem are true. Thus: If the model assumptions are not true, then we do not know that the theorem is true, so we do not know that the hypothesis test is valid.
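To see concretely how the test leans on the theorem, here is a minimal sketch of the calculation for the two-sided alternative. It assumes σ is known, and the function name two_sided_p_value and the numbers used are hypothetical, chosen only for illustration. Every probability it returns is read off the normal sampling distribution that the theorem supplies, so the result is meaningful only if the model assumptions hold.

```python
import math
from statistics import NormalDist


def two_sided_p_value(xbar, mu0, sigma, n):
    """Probability, computed under the null hypothesis, of an x-bar at
    least as far from mu0 as the observed one (in either direction).

    Relies entirely on the theorem's conclusion: X-bar is normal with
    mean mu0 and standard deviation sigma / sqrt(n).
    """
    sampling_dist = NormalDist(mu=mu0, sigma=sigma / math.sqrt(n))
    distance = abs(xbar - mu0)
    # By symmetry, the two tails beyond mu0 +/- distance have equal area.
    return 2.0 * sampling_dist.cdf(mu0 - distance)


# Illustrative numbers only (assumptions, not taken from the text).
p = two_sided_p_value(xbar=10.9, mu0=10.0, sigma=2.0, n=25)
print("two-sided p-value:", round(p, 4))
```

Nothing in this calculation checks whether the sample was a simple random sample or whether X is normal; the code returns a number either way.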

In the example, this translates to: If the sample is not a simple random sample, or if X does not have a normal distribution, then the reasoning establishing the validity of the hypothesis test breaks down.
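A simulation can show what this breakdown looks like in practice. The sketch below is a constructed illustration, not a description of any real study: it assumes σ is known, uses a 5% two-sided cutoff on the standardized x-bar, and invents a "clustered" sampling scheme in which each independent draw is recorded four times. All names and numbers are hypothetical. The null hypothesis is true in every simulated sample, so a valid test should reject about 5% of the time.

```python
import math
import random
from statistics import NormalDist, fmean


def rejects_null(sample, mu0, sigma, alpha=0.05):
    """Two-sided test that treats the data as a simple random sample
    from a normal distribution with known sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    return abs(z) > critical


def simple_random_sample(mu, sigma, n, rng):
    """n independent draws: the model assumptions hold."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


def clustered_sample(mu, sigma, n_clusters, cluster_size, rng):
    """NOT a simple random sample: each independent draw is repeated
    cluster_size times, so the observations are not independent."""
    values = []
    for _ in range(n_clusters):
        v = rng.gauss(mu, sigma)
        values.extend([v] * cluster_size)
    return values


def rejection_rate(make_sample, mu0, sigma, trials=5000, seed=0):
    """Fraction of simulated samples (null true) in which the test rejects."""
    rng = random.Random(seed)
    rejections = sum(rejects_null(make_sample(rng), mu0, sigma)
                     for _ in range(trials))
    return rejections / trials


mu0, sigma = 10.0, 2.0  # illustrative values only
rate_srs = rejection_rate(
    lambda rng: simple_random_sample(mu0, sigma, 20, rng), mu0, sigma)
rate_clustered = rejection_rate(
    lambda rng: clustered_sample(mu0, sigma, 5, 4, rng), mu0, sigma)

print("rejection rate, simple random samples:", rate_srs)
print("rejection rate, clustered samples:    ", rate_clustered)
```

Both schemes produce 20 observations, but the clustered one contains only 5 independent values, so x-bar varies more than σ/√n suggests and the test rejects a true null hypothesis far more often than the nominal 5%.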

Comments:

- Different hypothesis tests have different model assumptions. Some tests apply to random samples that are not simple; see Other Types of Random Samples. For many tests, the model assumptions consist of several assumptions. If any one of these model assumptions is not true, we do not know that the test is valid.
- Many techniques are robust to departures from at least some model assumptions. This means that if the particular assumption is not too far from true, then the technique is still approximately valid.
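Robustness can also be explored by simulation. The sketch below is a rough illustration under stated assumptions: σ is treated as known, the sample size is n = 30, and the uniform and exponential populations are choices made for this example only. It estimates how often a two-sided 5% test on the standardized x-bar rejects a true null hypothesis when the population is normal and when it is not.

```python
import math
import random
from statistics import NormalDist, fmean

CRITICAL = NormalDist().inv_cdf(0.975)  # two-sided 5% cutoff, about 1.96


def rejection_rate(draw, mu0, sigma, n, trials=5000, seed=0):
    """Share of samples, generated with the null hypothesis true, for
    which the standardized x-bar falls beyond the cutoff."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        xbar = fmean([draw(rng) for _ in range(n)])
        z = (xbar - mu0) / (sigma / math.sqrt(n))
        if abs(z) > CRITICAL:
            rejections += 1
    return rejections / trials


# Each entry: (how to draw one value, true mean, true standard deviation).
# The population shapes are illustrative assumptions.
populations = {
    "normal": (lambda rng: rng.gauss(0.0, 1.0), 0.0, 1.0),
    "uniform": (lambda rng: rng.random(), 0.5, math.sqrt(1 / 12)),
    "exponential": (lambda rng: rng.expovariate(1.0), 1.0, 1.0),
}

rates = {}
for name, (draw, mean, sd) in populations.items():
    rates[name] = rejection_rate(draw, mean, sd, n=30)
    print(name, rates[name])
```

Rejection rates near 0.05 for the non-normal populations would suggest that, in this particular setting, the test tolerates those departures from normality. The result says nothing about other assumptions (such as independence), other sample sizes, or other tests.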

1. This refers to a simple random sample of a random variable; see the page More Precise Definition of Simple Random Sample for more information.

2. The distribution of the test statistic, when considering all possible suitably random samples of the same size, is called a sampling distribution. For additional discussion of sampling distributions, see Overview of Frequentist Confidence Intervals and Frequentist Hypothesis Tests, p-values, and Type I Error. Those two pages and this one are best read as a unit.

Last modified May 10, 2012