Random Sampling

Random sampling is the least preferred sampling method. In random sampling, you specify the population size p, the number of items to select, n, and a random seed. Analyzer uses the random seed to initialize a random number generator. It then generates n numbers between zero and p and sorts them in ascending order. Analyzer never uses the same random number twice: if a number that has already been generated comes up again, it is discarded and replaced with a new one.
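The selection procedure described above can be sketched in Python. This is a minimal illustration, not Analyzer's implementation. Python's `random.Random` generator is an assumed stand-in, so the same seed will not reproduce Analyzer's numbers, and items are assumed to be numbered 1 through p. The function name `random_selections` is hypothetical.

```python
import random


def random_selections(p: int, n: int, seed: int) -> list[int]:
    """Draw n distinct item numbers from 1..p and return them sorted ascending.

    Hypothetical sketch: Python's generator stands in for Analyzer's own,
    so a given seed will NOT reproduce Analyzer's selections.
    """
    if not 0 < n <= p:
        raise ValueError("sample size must be between 1 and the population size")
    rng = random.Random(seed)          # the seed initializes the generator
    chosen: set[int] = set()
    while len(chosen) < n:
        candidate = rng.randint(1, p)  # inclusive on both ends
        if candidate in chosen:
            continue                   # duplicate: discard it and draw a replacement
        chosen.add(candidate)
    return sorted(chosen)              # selections in ascending order


print(random_selections(1000, 5, 9084633983))
```

Because duplicates are discarded and redrawn, the function always returns exactly n distinct numbers, mirroring the record-sampling behavior described above.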

Note: The maximum sample size that can be specified for random samples (Record or Monetary Unit Sampling) is 5000.

Remember that in a monetary unit sample the item selected is actually a cent, not a dollar. Because the population (the total number of cents) is typically very large, it is unlikely that any numbers in a monetary unit sample will be discarded as duplicates. As a result, in record sampling the same record is never selected twice, but in monetary unit sampling the same record may be selected more than once, because two different cents can belong to the same record.
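To see why a monetary unit sample can hit the same record twice, the following minimal sketch maps selected cents to records using running totals. The record amounts, the unit numbering (1 through the total number of cents) and the function name `records_for_units` are assumptions for illustration, and only positive amounts are handled.

```python
from bisect import bisect_left
from itertools import accumulate


def records_for_units(amounts_in_cents: list[int], units: list[int]) -> list[int]:
    """Return the index of the record containing each selected cent.

    Assumes positive amounts and units numbered 1..total cents: record i
    covers the cents after the running total of records 0..i-1, up to and
    including its own running total.
    """
    cumulative = list(accumulate(amounts_in_cents))  # running totals in cents
    return [bisect_left(cumulative, unit) for unit in units]


amounts = [500, 20_000, 300]            # hypothetical records: $5.00, $200.00, $3.00
units = [120, 4_500, 19_000, 20_700]    # four distinct cents, as if drawn at random
print(records_for_units(amounts, units))  # [0, 1, 1, 2]
```

Four distinct cent numbers produce only three distinct records, because cents 4,500 and 19,000 both fall within the second record.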

Once the list of selections has been established, Analyzer selects those specific items. For example, if the population is 1,000, the sample size is 5 and the random seed is 9084633983, Analyzer might generate the numbers 244, 261, 339, 874 and 985. These specific items would then be selected.

If you use random sampling, be aware that although each item has an equal chance of selection, there is no guarantee that the selections will be evenly distributed. In the example above, there is a gap of more than 500 items between 339 and 874 from which no selections were made. An equivalent fixed-interval sample, with an interval of 200 (1,000 ÷ 5), would ensure that no gap exceeded 200.
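The gap comparison can be checked with a few lines of Python. This sketch assumes a fixed-interval sample takes every (p ÷ n)th item from a chosen starting point; the starting point of 37 and the function names are hypothetical, and Analyzer's fixed-interval method may differ in detail.

```python
def largest_gap(selections: list[int]) -> int:
    """Largest distance between consecutive selections (input sorted ascending)."""
    return max(b - a for a, b in zip(selections, selections[1:]))


def fixed_interval_selections(p: int, n: int, start: int) -> list[int]:
    """Every (p // n)th item beginning at `start`, assumed to satisfy 1 <= start <= p // n."""
    interval = p // n
    return [start + k * interval for k in range(n)]


random_sample = [244, 261, 339, 874, 985]            # the example above
fixed_sample = fixed_interval_selections(1000, 5, 37)

print(largest_gap(random_sample))  # 535
print(fixed_sample)                # [37, 237, 437, 637, 837]
print(largest_gap(fixed_sample))   # 200
```

For the random example the largest distance between consecutive selections is 535 (874 − 339), while the fixed-interval selections are always exactly 200 apart.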

There is also no top stratum cutoff in random sampling. If our example were a monetary unit sample, an item representing over half of the file could go unselected if it fell entirely within the gap noted above. A random sample may therefore under-sample certain segments of the population, and material transactions may be skipped completely.
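The missing top stratum can be illustrated with the same 1,000-unit example. The sketch below assumes a hypothetical large item occupying units 340 through 873 (534 units, more than half the file) and compares the random selections above with a hypothetical fixed-interval sample starting at 37 with an interval of 200.

```python
def hits(selections: list[int], first_unit: int, last_unit: int) -> int:
    """Count selected units that fall inside one item's unit range (inclusive)."""
    return sum(first_unit <= s <= last_unit for s in selections)


random_sample = [244, 261, 339, 874, 985]   # from the example above
fixed_sample = [37, 237, 437, 637, 837]     # hypothetical start of 37, interval 200

# Hypothetical large item: units 340..873, i.e. 534 of the 1,000 units.
print(hits(random_sample, 340, 873))  # 0: the item is skipped entirely
print(hits(fixed_sample, 340, 873))   # 3: units 437, 637 and 837 fall inside it
```

In this example, because fixed-interval selections are never more than 200 units apart, any item spanning at least 200 units receives at least one selection; the random sample offers no such guarantee.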

In monetary unit samples, there is no way to prevent the selection of numbers that are “close” (falling within the same record) as opposed to “the same”, so the same entry might be selected more than once or even many times. For these reasons, random sampling is generally the least suitable sampling method, although it is available if you need it.

If you select the No Repeats checkbox to prevent the same record from being selected more than once, a random sample may produce fewer selections than you requested. In a random MUS, two of the random numbers generated may be close enough to fall within the same record. When this occurs, the record is selected only once and Analyzer does not select another item to replace the duplicate. For more information, see Sampling With and Without Replacement.
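The effect of No Repeats on a random MUS can be sketched as follows. This is an assumed model of the behavior described above, not Analyzer's code: selected cents are mapped to records, and when No Repeats is on, duplicate records are dropped without drawing a replacement. The amounts, cent numbers and the `select_records` name are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def select_records(amounts_in_cents: list[int], units: list[int], no_repeats: bool) -> list[int]:
    """Map selected cents to record indexes, optionally dropping repeated records."""
    cumulative = list(accumulate(amounts_in_cents))         # running totals in cents
    records = [bisect_left(cumulative, u) for u in units]    # record containing each cent
    if no_repeats:
        # Keep the first occurrence of each record; no replacement item is drawn.
        records = list(dict.fromkeys(records))
    return records


amounts = [500, 20_000, 300]
units = [120, 4_500, 19_000, 20_700]   # four selections requested
print(select_records(amounts, units, no_repeats=False))  # [0, 1, 1, 2]
print(select_records(amounts, units, no_repeats=True))   # [0, 1, 2]: one fewer than requested
```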

When performing a random sample with field output (not record output), you can request that an additional column called ORDER be included in the sample output file. This column lists the order in which the sample items were drawn, which is useful when you oversample. You can sort or index the sample output file on the ORDER column to see the sample items in selection order, select the first x items, and, if necessary, go back and select the next x items.
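The ORDER column workflow can be sketched in Python. The sketch assumes the output rows are stored in item-number order, with ORDER starting at 1 for the first item drawn; `sample_with_order`, the seed and the batch size of 5 are hypothetical. Sorting on ORDER recovers the draw sequence, so the first batch can be examined before the second.

```python
import random


def sample_with_order(p: int, n: int, seed: int) -> list[tuple[int, int]]:
    """Return (item, order) rows sorted by item number; order 1 = drawn first."""
    rng = random.Random(seed)
    drawn = rng.sample(range(1, p + 1), n)  # n distinct items, in draw order
    return sorted((item, order) for order, item in enumerate(drawn, start=1))


rows = sample_with_order(1000, 10, 42)              # oversample: 10 items drawn
by_order = sorted(rows, key=lambda row: row[1])      # "sort on the ORDER column"
first_batch = [item for item, _ in by_order[:5]]     # examine these first
next_batch = [item for item, _ in by_order[5:10]]    # use only if needed
```

Taking items strictly in ORDER sequence means any prefix of the oversample is itself a valid random sample of that size.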