Preliminaries
Formulas for sampling with replacement (the usual textbook formulas)
Formulas for sampling without replacement
Comparison and discussion
Reference: John A. Rice, Mathematical Statistics and Data Analysis, Wadsworth, 1988 (2nd ed., 1995). All proofs of the results for sampling without replacement that are in these web pages are included in the "Survey Sampling" chapter.
Preliminaries:
Assume that we have a population of size $N$. The values of the population are numbers $x_1, x_2, \ldots, x_N$. When we take a sample, it is a simple random sample (SRS) of size $n$, where $n \le N$.
Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Population standard deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$
Unbiased estimator of the population mean (sample mean): $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
If the individual values of the population are "successes"
or "failures", we code those as 1 or 0, respectively.
Then the parameter of interest is usually called the population
proportion, even though, strictly speaking, it is also the population
mean.
Population proportion: $p = \frac{1}{N}\sum_{i=1}^{N} x_i$ (the fraction of 1s in the population)
Population standard deviation: $\sigma = \sqrt{p(1-p)}$
Unbiased estimator of the population proportion (sample proportion): $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$
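As a concrete illustration of these definitions, here is a small Python sketch (not part of the original page; the population values are made up for illustration):

```python
import math
import random

random.seed(0)

# Hypothetical population of N = 10 numeric values (made up for illustration).
population = [2, 4, 4, 4, 5, 5, 7, 9, 1, 3]
N = len(population)

# Population mean: mu = (1/N) * sum(x_i)
mu = sum(population) / N

# Population standard deviation: sigma = sqrt((1/N) * sum((x_i - mu)^2))
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)

# A simple random sample of size n, and its sample mean, which is the
# unbiased estimator of mu.
n = 4
sample = random.sample(population, n)
x_bar = sum(sample) / n

# For a 0/1 ("failure"/"success") population, the population proportion p
# is just the population mean of the 0/1 values, and sigma = sqrt(p(1-p)).
zero_one = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
p = sum(zero_one) / len(zero_one)
sigma_01 = math.sqrt(p * (1 - p))

print(mu, sigma, x_bar, p, sigma_01)
```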
If we assume the simple random sampling is with
replacement, then the sample values are independent, so the
covariance between any two different sample values is zero. This
fact is used to derive these formulas for the standard deviation
of the estimator and the estimated standard deviation of the estimator.
The first two columns are the parameter and the statistic which
is the unbiased estimator of that parameter.
| parameter | unbiased estimator | standard deviation of the estimator | usual estimator of the standard deviation of the estimator |
| $\mu$ | $\bar{X}$ | $\frac{\sigma}{\sqrt{n}}$ | $\frac{s}{\sqrt{n}}$, where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ |
| $p$ | $\hat{p}$ | $\sqrt{\frac{p(1-p)}{n}}$ | $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ |
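The with-replacement formula for the standard deviation of $\bar{X}$ can be checked by simulation. The sketch below (a hypothetical example, not part of the original page; the population values are made up) draws many samples with replacement and compares the empirical standard deviation of the sample means to $\sigma/\sqrt{n}$:

```python
import math
import random

random.seed(0)

# Made-up population for illustration.
population = [2, 4, 4, 4, 5, 5, 7, 9, 1, 3]
N = len(population)
mu = sum(population) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)

n = 4
reps = 200_000

# Draw many samples WITH replacement and record the sample mean each time.
means = []
for _ in range(reps):
    sample = [random.choice(population) for _ in range(n)]
    means.append(sum(sample) / n)

# Empirical standard deviation of the sample means...
grand = sum(means) / reps
sd_empirical = math.sqrt(sum((m - grand) ** 2 for m in means) / reps)

# ...should be close to the formula sigma / sqrt(n).
sd_formula = sigma / math.sqrt(n)
print(round(sd_empirical, 3), round(sd_formula, 3))
```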
If we assume the simple random sampling is without
replacement, then the sample values are not independent,
so the covariance between any two different sample values is not
zero. In fact, one can show that
Covariance between two different sample values: $\mathrm{Cov}(X_i, X_j) = -\frac{\sigma^2}{N-1}$ for $i \neq j$.
This fact is used to derive these formulas for the standard deviation
of the estimator and the estimated standard deviation of the estimator.
The first two columns are the parameter and the statistic which
is the unbiased estimator of that parameter.
| parameter | unbiased estimator | standard deviation of the estimator | estimator of the standard deviation of the estimator |
| $\mu$ | $\bar{X}$ | $\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$ | $\frac{s}{\sqrt{n}}\sqrt{1-\frac{n}{N}}$, where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ |
| $p$ | $\hat{p}$ | $\sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}}$ | $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{1-\frac{n}{N}}$ |
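Both the negative covariance and the without-replacement standard deviation of $\bar{X}$ can be checked by simulation. This sketch (a hypothetical example with made-up population values, not from the original page) compares the empirical values to $\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$ and $-\frac{\sigma^2}{N-1}$:

```python
import math
import random

random.seed(0)

population = [2, 4, 4, 4, 5, 5, 7, 9, 1, 3]   # made-up values for illustration
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N

n = 4
reps = 200_000

means = []
prods = []   # products X_1 * X_2 of the first two sample values, to estimate Cov
for _ in range(reps):
    sample = random.sample(population, n)      # SRS WITHOUT replacement
    means.append(sum(sample) / n)
    prods.append(sample[0] * sample[1])

# Empirical SD of the sample mean vs. the without-replacement formula.
grand = sum(means) / reps
sd_empirical = math.sqrt(sum((m - grand) ** 2 for m in means) / reps)
sd_formula = math.sqrt(sigma2 / n) * math.sqrt((N - n) / (N - 1))

# Empirical covariance Cov(X_1, X_2) = E[X_1 X_2] - mu^2 vs. -sigma^2/(N-1).
cov_empirical = sum(prods) / reps - mu ** 2
cov_formula = -sigma2 / (N - 1)

print(round(sd_empirical, 3), round(sd_formula, 3))
print(round(cov_empirical, 3), round(cov_formula, 3))
```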
Discussion:
- Notice that the main difference between the two sets of formulas is the extra finite population correction factor $\sqrt{\frac{N-n}{N-1}}$ (or $\sqrt{1-\frac{n}{N}}$ in the estimated versions) on each formula when we are sampling without replacement. In each case, the extra factor is a number between 0 and 1, so it makes the standard deviation smaller than it is for sampling with replacement.
- If we actually do sampling without replacement (as we usually do), but we analyze the results as if we sampled with replacement (the easier formulas that we all learned), how are our results in error? (Answer: Our estimates of the standard deviation are a little larger than they really should be, so we don't claim as much accuracy in our estimators as we really should.)
- How much is the error? Answer: If the sample size is a lot smaller than the population size, then the extra factor is very close to 1, so there isn't much error. If the sample size is a significant fraction of the population size, like half or so, then the extra factor is about 0.71 ($\sqrt{1/2} \approx 0.707$). So we really need to use the correct formula there to get reasonably accurate results.
- In the textbook Basic Practice of Statistics, the author (David Moore) says we can use the simple formulas for the standard deviation as long as the population is at least 10 times as big as the sample. If we use the simple formulas when the population is exactly 10 times as big as the sample, how large is the factor? (Answer: $\sqrt{1 - 1/10} \approx 0.95$.) So how much off are we? (Answer: about 5%.) What if the population is 20 times as big as the sample? (Answer: the factor is $\sqrt{1 - 1/20} \approx 0.975$, for an error of about 2.5%.)
- Why do we use the $n-1$ in the denominator of the estimator $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$? Answer: That's what it takes to make this an unbiased estimator of $\sigma^2$.
- Why isn't there an equivalent $n-1$ in a denominator when we're estimating the variance of $\hat{p}$? Answer: Strictly speaking, there should be, if we were using exactly as accurate mathematics in this case. However, because the numerators here are very small, and usually the values of $n$ are very large, there is only a very tiny difference between using $n$ and $n-1$. Thus, most textbooks just put $n$ here.
- Do we always use unbiased estimators? Answer: No. Part of
the mathematics we learn to do in M378K is about proving which
estimators are unbiased. And we find that sometimes slightly biased
estimators are even better than strictly unbiased ones. What do
we mean by better? That they have smaller Mean Squared Error (MSE).
One of the main topics of a theoretical mathematical statistics
class is the theory of estimation. So if you want to really learn
about how to determine which estimators are better, you should
take M378K. In introductory courses, you just have to take the word of the statisticians about which estimators people have found to be most useful.
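The arithmetic in the discussion points above can be checked directly. The sketch below (not from the original page; the population sizes and values are made up for illustration) computes the finite population correction factor for several sample fractions, and verifies by simulation that dividing by $n-1$ makes $s^2$ unbiased for $\sigma^2$ under sampling with replacement:

```python
import math
import random

# Finite population correction factor sqrt((N - n)/(N - 1)) for several
# sample-to-population ratios (the population size 1000 is made up).
for N, n in [(1000, 500), (1000, 100), (1000, 50)]:
    fpc = math.sqrt((N - n) / (N - 1))
    print(f"N={N}, n={n}: factor = {fpc:.3f}")
# n = N/2 gives about 0.707, n = N/10 about 0.949, n = N/20 about 0.975.

# Check by simulation that dividing by n - 1 makes s^2 an unbiased
# estimator of sigma^2 under sampling WITH replacement.
random.seed(0)
population = [2, 4, 4, 4, 5, 5, 7, 9, 1, 3]   # made-up values
Npop = len(population)
mu = sum(population) / Npop
sigma2 = sum((x - mu) ** 2 for x in population) / Npop

n, reps = 4, 200_000
total = 0.0
for _ in range(reps):
    s = [random.choice(population) for _ in range(n)]
    x_bar = sum(s) / n
    total += sum((x - x_bar) ** 2 for x in s) / (n - 1)

# The average of s^2 over many samples should be close to sigma^2.
print(round(total / reps, 2), "vs sigma^2 =", round(sigma2, 2))
```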
Mary Parker