"...
all models are limited by the validity of the assumptions on which they ride."

David Collier, Jasjeet S. Sekhon, and Philip B. Stark, Preface (p. xi) to David A. Freedman, Statistical Models and Causal Inference: A Dialogue with the Social Sciences

"Assumptions behind models are rarely articulated, let alone defended. The problem is exacerbated because journals tend to favor a mild degree of novelty in statistical procedures. Modeling, the search for significance, the preference for novelty, and the lack of interest in assumptions -- these norms are likely to generate a flood of nonreproducible results."

David Freedman, Chance 2008, v. 21 No 1, p. 60

Many techniques are robust to departures from at least some model assumptions: if the particular assumption is not too far from true, the technique is still approximately valid. When using a statistical technique, it is therefore important to ask:

- What are the model assumptions for that technique?
- Is the technique robust to some departures from the model assumptions?
- What reason is there to believe that the model assumptions (or something close enough, if the technique is robust) are true for the situation being studied?
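To give a feel for what "robust" means here, the following is a minimal simulation sketch, not part of the sources quoted above. It applies an equal-variance two-sample t-test to two groups drawn from the same distribution, so every rejection is a false positive. The exponential distribution, the sample size of 30 per group, the approximate critical value of 2.00 (for 58 degrees of freedom at the 5% level), and all function names are assumptions chosen only for illustration.

```python
import math
import random
import statistics

def pooled_t(a, b):
    """Equal-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))

def false_positive_rate(draw, n=30, trials=2000, critical=2.00, seed=0):
    """Fraction of trials in which the test rejects although both groups
    come from the same distribution (draw is a function of the generator)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [draw(rng) for _ in range(n)]
        b = [draw(rng) for _ in range(n)]
        if abs(pooled_t(a, b)) > critical:
            hits += 1
    return hits / trials

# Normality assumption satisfied.
normal_rate = false_positive_rate(lambda rng: rng.gauss(0.0, 1.0))
# Normality assumption violated: strongly skewed data.
skewed_rate = false_positive_rate(lambda rng: rng.expovariate(1.0))

print(f"normal data: {normal_rate:.3f}")
print(f"skewed data: {skewed_rate:.3f}")
```

In a setup like this, both rates should land near the nominal 5%, which is the sense in which the test is "still approximately valid" under this particular departure. That does not carry over automatically to other departures, smaller samples, or unequal group sizes.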

Unfortunately, the model assumptions vary from technique to technique, so there are few if any general rules. One general rule of thumb, however, is:

Techniques are least likely to be robust to departures from assumptions of independence.^{3, 4}

Note: Assumptions of independence are often phrased in terms of "random sample" or "random assignment", so these are very important.

"The independence assumption is fragile. ... Even modest violations of independence can introduce substantial biases into conventional procedures."

David A. Freedman, Statistical Models and Causal Inference: A Dialogue with the Social Sciences, p. 31

"The independence assumption ... is a dangerous assumption in practice!"

Bradley Efron, Large Scale Inference, p. 26
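As a rough illustration of why independence matters so much, the sketch below (not taken from the sources quoted here) applies a one-sample t-test to data whose true mean is zero, so every rejection is a false positive. The AR(1) dependence model, the lag-1 correlation of 0.5, the sample size of 50, the approximate critical value of 2.01 (for 49 degrees of freedom at the 5% level), and all function names are assumptions chosen only for illustration.

```python
import math
import random
import statistics

def t_statistic(sample):
    """One-sample t statistic for the null hypothesis that the mean is 0."""
    n = len(sample)
    return statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))

def ar1_sample(n, rho, rng):
    """n values from a stationary AR(1) process with mean 0, variance 1,
    and lag-1 correlation rho (rho = 0 gives independent observations)."""
    innovation_sd = math.sqrt(1.0 - rho * rho)
    x = rng.gauss(0.0, 1.0)
    values = []
    for _ in range(n):
        values.append(x)
        x = rho * x + innovation_sd * rng.gauss(0.0, 1.0)
    return values

def rejection_rate(rho, n=50, trials=2000, critical=2.01, seed=0):
    """Fraction of simulated samples in which the t-test rejects a true null."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        if abs(t_statistic(ar1_sample(n, rho, rng))) > critical:
            rejections += 1
    return rejections / trials

independent_rate = rejection_rate(rho=0.0)
dependent_rate = rejection_rate(rho=0.5)

print(f"independent observations: {independent_rate:.3f}")
print(f"correlated observations:  {dependent_rate:.3f}")
```

With independent observations the rejection rate should sit near the nominal 5%. With positively correlated observations the standard error is underestimated, so the test rejects a true null hypothesis several times more often than advertised.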

1. When selecting samples or dividing into treatment groups, be very careful in randomizing according to the requirements of the method of analysis to be used.

See What is a Random Sample? and further links from that page for more detail.

See also:

Biased Sampling and Extrapolation for some examples of how the sampling method may result in a problematical sample.

Analyzing Data Without Regard to How They Were Collected

2. Sometimes (not too often) model assumptions can be justified plausibly by well-established^{5} facts, mathematical theorems, or theory that is well-supported by sound empirical evidence.

3. Sometimes a rough idea of whether or not model assumptions might fit can be obtained by either plotting the data or plotting residuals obtained from a tentative use of the model.

Note: Unfortunately, these methods are typically better at telling you when the model assumption does not fit than when it does.
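The residual check in guideline 3 can be sketched as follows. This is a minimal example under stated assumptions: the data are simulated from a curved (quadratic) relationship, a straight line is fitted tentatively, and the residuals are summarized by thirds of the x range as a numerical stand-in for a residual plot. The data-generating model, the noise level, and the helper name `fit_line` are all hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

rng = random.Random(1)
xs = [i / 10 for i in range(61)]                 # 0.0, 0.1, ..., 6.0
ys = [x * x + rng.gauss(0.0, 0.5) for x in xs]   # the truth is curved, not linear

# Tentative use of the (wrong) straight-line model.
a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Average residual in the lower, middle, and upper third of the x range.
third = len(xs) // 3
low_mean = statistics.mean(residuals[:third])
mid_mean = statistics.mean(residuals[third:2 * third])
high_mean = statistics.mean(residuals[2 * third:])

print(f"lower third:  {low_mean:+.2f}")
print(f"middle third: {mid_mean:+.2f}")
print(f"upper third:  {high_mean:+.2f}")
```

Here the residuals are positive at both ends and negative in the middle, the kind of systematic pattern that signals the linearity assumption does not fit. In line with the note above, the reverse does not hold: residuals with no visible pattern do not establish that the model assumptions are true.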

Examples, Guidelines, and Cautions

Using a two-sample test comparing means when cases are paired (also includes discussion of repeated measures)

Comparisons of treatments applied to people, animals, etc (Intent to Treat; Comparisons involving Drop-outs)

Fixed vs Random Factors

Analyzing Data without Regard to How the Data Were Collected

Dividing a Continuous Variable into Categories ("Chopped Data")

Pseudoreplication

Mistakes in Regression
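To make the first item in the list above concrete, here is a small sketch of what can go wrong when a two-sample test is applied to paired data. The simulated before/after measurements, the effect size, and the function names are assumptions made only for illustration. The two-sample statistic shown is the Welch (unequal-variance) version.

```python
import math
import random
import statistics

def two_sample_t(a, b):
    """Welch two-sample t statistic; treats a and b as independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se

def paired_t(a, b):
    """Paired t statistic; works with the within-subject differences."""
    diffs = [y - x for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

rng = random.Random(2)
# 20 hypothetical subjects: large differences between subjects,
# and a small, consistent change within each subject.
baseline = [rng.gauss(100.0, 15.0) for _ in range(20)]
followup = [x + 4.0 + rng.gauss(0.0, 2.0) for x in baseline]

t_unpaired = two_sample_t(baseline, followup)
t_paired = paired_t(baseline, followup)

print(f"two-sample t (ignores pairing): {t_unpaired:.2f}")
print(f"paired t:                       {t_paired:.2f}")
```

The two measurements on each subject are not independent of each other, so the two-sample test's independence assumption fails. In this setup the between-subject variation swamps the within-subject change, and the two-sample statistic stays small while the paired statistic is large.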

For More Discussion of Inappropriate Methods of Analysis

- Freedman, David A., ed. by David Collier, Jasjeet S. Sekhon, and Philip B. Stark (2010), Statistical Models and Causal Inference: A Dialogue with the Social Sciences, Cambridge University Press. I heartily recommend this. (Many of the articles in this book are also available in preprint form at http://www.stat.berkeley.edu/~census/)
- Harris, A. H. S., R. Reeder and J. K. Hyun (2009), Common statistical and research design problems in manuscripts submitted to high-impact psychiatry journals: What editors and reviewers want authors to know, Journal of Psychiatric Research, vol. 43, no. 15, 1231-1234

1. Bayesian statistical techniques also involve assumptions; this web site focuses mostly on frequentist techniques.

2. The Rice Virtual Lab in Statistics' Robustness Simulation can be used to demonstrate the effect of some violations of model assumptions on the two-sample t-test.

3. However, there is some robustness to some types of departures from independence. One is that, for large enough populations, sampling without replacement is good enough, even though "independent" technically means sampling with replacement; see More Precise Definition of Simple Random Sample.

4. For more discussion of the independence assumption and possible effects of violations of it, see the Freedman (2010) reference above, especially chapters 1 - 3 and 19.

5. Here, "well established" means well established by sound empirical evidence and/or sound mathematical reasoning. This is not the same as "well-accepted," since sometimes things may be well-accepted without sound evidence or reasoning.

Last updated August 28, 2012