Sampling is an accepted way of making estimates at a much lower cost than measuring an entire population. When done properly, statistics can indicate how likely it is that information obtained from the sample is representative of the entire population from which the sample was taken. This article is a primer on the considerations that anyone planning a sample will need to evaluate. We include interactive calculators that determine how large the sample will need to be in light of these key considerations.
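The first step is to determine the type of information being sought. There are two types of information, each with its own sampling method. The two sampling types are attribute sampling, which records discrete possibilities such as off or on, and variable sampling, which records a measured quantity such as a size or a time.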
Since the whole point of using a sample (versus measuring the entire population) is to save money, those paying for the statistical work naturally prefer the smallest sample that will meet their objectives. Sample size is determined by the following factors, which the person conducting the sample gets to set (at least initially):
Note that population size is not included in the above list. For any population large enough to warrant sampling, population size has a negligible effect on the required sample size.
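To illustrate why population size drops out, the sketch below applies the standard finite population correction to a baseline sample size. The baseline of 385 is an assumed starting point (it corresponds to roughly 95% confidence and a 5% margin of error for an attribute), and the function name is hypothetical.

```python
import math


def fpc_adjusted_size(n_infinite: float, population: int) -> int:
    """Shrink a sample size computed for an unlimited population
    using the finite population correction."""
    return math.ceil(n_infinite / (1 + (n_infinite - 1) / population))


# Assumed baseline: 385 observations for an effectively unlimited population.
for population in (1_000, 10_000, 1_000_000):
    print(population, fpc_adjusted_size(385, population))
```

Under these assumptions the correction matters for a population of 1,000 but has essentially vanished by 1,000,000, which is the sense in which population size stops influencing the result.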
Of the above factors, the standard deviation is often the most difficult to define. If insufficient information exists regarding the standard deviation, a pilot test can be useful. The pilot is itself a sample, although much smaller than the one needed to perform the ultimate estimate.
For attribute sampling, calculating the expected results is simple. As noted above, attribute sampling involves discrete possibilities such as off or on. Accordingly, to estimate the standard deviation, run a small pilot test and calculate the percentage of observations that have the desired attribute; for a two-outcome attribute, the standard deviation follows directly from that percentage. In the sample size calculator provided below, this percentage is called the “probability of success”.
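As a rough sketch of the pilot calculation for an attribute, the snippet below computes the probability of success and the standard deviation it implies, using the standard formula sqrt(p(1 - p)) for a two-outcome variable. The function name and the pilot data are made up for illustration.

```python
import math


def pilot_proportion(outcomes):
    """Return (probability of success, implied standard deviation)
    for a pilot of True/False observations."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("the pilot needs at least one observation")
    p = sum(1 for outcome in outcomes if outcome) / len(outcomes)
    # A two-outcome (Bernoulli) variable has standard deviation sqrt(p * (1 - p)).
    return p, math.sqrt(p * (1 - p))


# Hypothetical pilot: 12 of 40 items have the desired attribute.
pilot = [True] * 12 + [False] * 28
p, sd = pilot_proportion(pilot)
print(f"probability of success = {p:.2f}, standard deviation = {sd:.3f}")
```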
For variable sampling, the following calculator calculates the standard deviation for the pilot test. If each observation is input in size (or time) order, the calculator will also plot the best trend line for your data.
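The same pilot figures can be approximated offline. This is a minimal sketch, not the calculator's actual method: it assumes the trend line is an ordinary least-squares fit of each observation against its position in the input order, and the function name is hypothetical.

```python
import statistics


def pilot_summary(observations):
    """Return (sample standard deviation, trend slope, trend intercept).

    Assumption: the trend line regresses each value on its 1-based
    position in the input order, using ordinary least squares.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    sd = statistics.stdev(observations)  # sample (n - 1) standard deviation
    xs = range(1, n + 1)
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(observations)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, observations))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return sd, slope, intercept


# Hypothetical pilot measurements entered in time order.
sd, slope, intercept = pilot_summary([2.0, 4.0, 6.0, 8.0, 10.0])
print(f"sd = {sd:.3f}, trend: y = {slope:.2f} * position + {intercept:.2f}")
```

A slope that is clearly different from zero suggests the variation follows a pattern, which is relevant to the stratification discussion.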
Sometimes the standard deviation within a population is predictable; that is, the variation follows a pattern. When this occurs, one can break the population into two or more groups and treat each group as its own population. This has the advantage of decreasing the standard deviation within each group. If we treat these groups, or strata, as separate subpopulations, draw a separate sample from each stratum, and combine the results, our sampling method is called stratified sampling. To ensure that the stratified sample is representative of the entire population, the proportion of the sample taken from each stratum should equal the proportion of the entire population that the stratum represents.
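Proportional allocation can be sketched as follows. The strata names and sizes are invented, and the largest-remainder rounding is one reasonable choice (an assumption here) for making the per-stratum counts add up to the total sample.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    exact = {name: total_sample * size / population
             for name, size in stratum_sizes.items()}
    # Round down first, then hand leftover units to the strata
    # with the largest fractional remainders.
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = total_sample - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical population of 10,000 items split into three strata.
strata = {"small": 6000, "medium": 3000, "large": 1000}
print(proportional_allocation(strata, 200))
```

Here the "small" stratum holds 60% of the population, so it receives 60% of the 200 sampled items.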
With both of the following calculators, you can alter each of the four variables to immediately see the impact on the sample size.
The following calculator determines the sample size for attribute sampling. As noted in “Sample Size Determinants” above, you will need four inputs.
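For readers without access to the calculator, here is a hedged sketch of the usual textbook formula for an attribute sample, n = z² · p(1 − p) / E². It is not necessarily what the calculator implements; the parameter names (`confidence`, `margin_of_error`, `p_success`) are assumptions, and only three inputs are modeled.

```python
import math
from statistics import NormalDist


def attribute_sample_size(confidence, margin_of_error, p_success):
    """Sample size for estimating a proportion (large-population formula)."""
    # Two-sided z value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p_success * (1 - p_success) / margin_of_error ** 2
    return math.ceil(n)  # round up so the precision target is met


# 95% confidence, +/- 5 percentage points, pilot probability of success 0.30.
print(attribute_sample_size(0.95, 0.05, 0.30))
# With no pilot information, p = 0.5 is the most conservative choice.
print(attribute_sample_size(0.95, 0.05, 0.50))
```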
The following similar calculator determines the sample size for variable sampling.
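A comparable sketch for variable sampling uses the common formula n = (z · s / E)², where s is the pilot standard deviation and E is the acceptable error in the same units as the measurement. As before, the function and parameter names are illustrative assumptions rather than the calculator's own.

```python
import math
from statistics import NormalDist


def variable_sample_size(confidence, margin_of_error, std_dev):
    """Sample size for estimating a mean (large-population formula)."""
    # Two-sided z value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * std_dev / margin_of_error) ** 2)


# 95% confidence, error of +/- 2 units, pilot standard deviation of 15 units.
print(variable_sample_size(0.95, 2.0, 15.0))
```

Because the standard deviation is squared, halving it (for example through stratification) cuts the required sample to roughly a quarter.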