Summary Statistics for Distributions with Large Variability

A measure of center (such as the mean or median) of a random variable tells us little on its own when the variability of the distribution is large. In most cases, therefore, we need a measure of variability as well as a measure of center.

The standard deviation is one commonly used measure of variability. It is especially appropriate for normal or near-normal distributions, but less helpful for skewed distributions, where the spread is not symmetric about the center.
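To see why the standard deviation can mislead for skewed data, here is a minimal Python sketch. It assumes a hypothetical sample drawn from an exponential distribution (a choice made only for illustration, since its values are never negative and it is strongly right-skewed) and uses only the standard library.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical skewed data: 10,000 draws from an exponential distribution
# with population mean 1. Every value is non-negative.
skewed = [random.expovariate(1.0) for _ in range(10_000)]

mean = statistics.mean(skewed)
median = statistics.median(skewed)
sd = statistics.stdev(skewed)  # sample standard deviation

print(f"mean   = {mean:.3f}")
print(f"median = {median:.3f}")
print(f"sd     = {sd:.3f}")

# For a normal distribution, mean +/- 2 sd would cut off roughly equal
# tails on both sides. Check what happens for this skewed sample.
lower, upper = mean - 2 * sd, mean + 2 * sd
below = sum(x < lower for x in skewed) / len(skewed)
above = sum(x > upper for x in skewed) / len(skewed)
print(f"mean - 2 sd = {lower:.3f}, fraction of data below = {below:.3f}")
print(f"mean + 2 sd = {upper:.3f}, fraction of data above = {above:.3f}")
```

In this sketch the lower limit falls below zero, a value the data can never take, so no observations lie beneath it, while all of the observations outside the band sit in the upper tail. A single symmetric number such as the standard deviation cannot convey that asymmetry.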

Confidence intervals are another way of summarizing variability. The endpoints of, say, a 95% confidence interval for a mean are summary statistics (see Notes 1 and 2). However, they describe the variability of the sampling distribution of the mean, not the variability of the original distribution itself.
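The distinction between the two kinds of variability can be illustrated with a short Python sketch. It is written under stated assumptions: the data are hypothetical draws from a normal distribution with mean 50 and standard deviation 10, the helper name `mean_ci` is made up for this example, and the interval uses a normal approximation (a z-interval) rather than the t-interval that would usually be preferred for small samples.

```python
import math
import random
import statistics

def mean_ci(data, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return mean - z * se, mean + z * se

random.seed(1)  # fixed seed so the sketch is reproducible

# Two hypothetical samples from the same distribution, differing only in size.
small = [random.gauss(50, 10) for _ in range(100)]
large = [random.gauss(50, 10) for _ in range(10_000)]

for name, data in (("n = 100", small), ("n = 10000", large)):
    left, right = mean_ci(data)
    sd = statistics.stdev(data)
    print(f"{name}: sd = {sd:.2f}, 95% CI = ({left:.2f}, {right:.2f}), "
          f"width = {right - left:.2f}")
```

Both samples should have a standard deviation near 10, because that describes the original distribution. The confidence interval, by contrast, narrows as the sample grows, because it describes how much the sample mean varies from sample to sample.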

Notes

1. This uses the word "statistic" in the technical sense of "something calculated by a specific rule from data." The left and right endpoints of a 95% confidence interval satisfy this definition. Many (but not all) statistics are estimates of parameters. (A parameter is a number that depends only on the distribution, that is, on the random variable itself, and not on the data.) For example, the (sample) mean of a random sample of data from a distribution is an estimate of the (population) mean (also known as the expected value, or expectation) of the distribution. Similarly, the sample median is an estimate of the population median, the sample standard deviation is an estimate of the population standard deviation, and so on. The endpoints of confidence intervals are unusual in that they do not estimate parameters.

2. The left and right endpoints of a 90% confidence interval for the mean would be different statistics from the endpoints of a 95% confidence interval for the mean.
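
As a rough illustration of Notes 1 and 2, the following Python sketch computes the endpoints of a 90% and a 95% confidence interval from the same data. The sample is hypothetical, and the intervals again rely on a normal approximation, which is an assumption of this sketch rather than part of the notes above.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical sample of 50 observations.
data = [random.gauss(0, 1) for _ in range(50)]

mean = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean

# Each endpoint is computed from the data by a fixed rule, so each is a statistic.
intervals = {}
for level in (0.90, 0.95):
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    intervals[level] = (mean - z * se, mean + z * se)
    left, right = intervals[level]
    print(f"{level:.0%} CI: left = {left:.3f}, right = {right:.3f}")
```

The same data give four different endpoint values, and the 90% interval lies inside the 95% interval, which is why the endpoints for different confidence levels count as different statistics.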