Cell (Random Interval) Sampling


Like fixed-interval sampling, cell (or random interval) sampling is an interval selection method. In cell sampling, you specify an interval and a random seed. Because Analyzer uses the random seed only to initiate a series of pseudo-random numbers, the seed can be any value. Analyzer generates one random number for each selection. The data is treated as a stream and divided into groups of items, each the size of the interval. For each group, Analyzer generates a random number greater than zero and less than or equal to the interval size. The item at that position within the group is selected, and the process is repeated for the next group. For example, if the interval is 1,100 and the random seed is 349870322, item 429 might be selected from the first group of 1,100 items, then item 1,844 (the 744th item of the second group) from the second group, and so on.
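The selection process described above can be sketched in a few lines of Python. This is an illustration only, not Analyzer's implementation: the function name cell_sample_positions, the population size of 5,500, and the handling of a short final group are assumptions made for the example.

```python
import random


def cell_sample_positions(population_size, interval, seed):
    """Return 1-based item positions chosen by cell (random interval) selection."""
    rng = random.Random(seed)  # the seed only initializes the pseudo-random series
    selected = []
    for group_start in range(0, population_size, interval):
        # One random number per group: greater than zero, at most the interval.
        offset = rng.randint(1, interval)
        position = group_start + offset
        # Assumption: a pick that lands past the end of a short final group is dropped.
        if position <= population_size:
            selected.append(position)
    return selected


# Hypothetical population of 5,500 items, using the interval and seed from the text.
picks = cell_sample_positions(population_size=5500, interval=1100, seed=349870322)
print(picks)
```

Each position falls within its own group of 1,100 items, and the same seed always produces the same positions. Python's generator is not Analyzer's, so the positions will not match those Analyzer produces for the same seed.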

The main advantage of cell sampling is that it automatically avoids problems relating to patterns in the data. A disadvantage is that, for monetary unit samples, the entries selected in cell sampling might not be as consistent as those selected in fixed-interval sampling.

The lack of consistency arises because an item can span the dividing point between two groups, so that it appears in both groups for sampling purposes. One implication is that when you apply cell sampling to a monetary unit sample, the same record can be selected twice.

This can happen if a record straddles the limit between two intervals, the random number generated for the first interval is high, and the random number generated for the second interval is low. Depending on the nature of the transactions, selecting the same item twice may result in under-sampling, because two selections yield only one distinct record.
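A minimal sketch of how this double selection arises, assuming positive whole-dollar amounts. The function cell_mus_hits and the three sample amounts are hypothetical, and the random numbers are passed in explicitly so the outcome is easy to follow. It does not represent Analyzer's implementation.

```python
from bisect import bisect_left
from itertools import accumulate


def cell_mus_hits(amounts, interval, offsets):
    """Return the index of the record hit in each cell of a monetary unit sample.

    offsets holds one random number per cell, each between 1 and interval.
    """
    cumulative = list(accumulate(amounts))  # running total at the end of each record
    hits = []
    for cell, offset in enumerate(offsets):
        unit = cell * interval + offset  # 1-based monetary unit picked in this cell
        if unit > cumulative[-1]:
            break  # the pick lies beyond the last monetary unit in the data
        # First record whose running total reaches the picked unit.
        hits.append(bisect_left(cumulative, unit))
    return hits


# The record at index 1 covers units 61 to 140, so it straddles the boundary at 100.
amounts = [60, 80, 60]

# High number in the first cell, low number in the second: same record twice.
print(cell_mus_hits(amounts, interval=100, offsets=[90, 10]))  # [1, 1]

# Low number in the first cell, high number in the second: the record is skipped.
print(cell_mus_hits(amounts, interval=100, offsets=[30, 70]))  # [0, 2]
```

The second call shows the opposite outcome: with a low number followed by a high one, the straddling record is not selected in either cell.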

In addition, large items that are smaller than the top stratum cutoff have a slightly reduced chance of being selected. As an example, consider a “worst case” scenario in which you have an interval of 100 and a $99 item that exactly straddles the border between two groups. Depending on the random numbers chosen, the $99 item might not be selected in either of the two groups. Because about half of each group lies outside the item, each group has roughly a 50% chance of missing it, so there is about a 25% chance the item will not be chosen at all. An item that falls mostly into one group is much less likely to be missed. This example illustrates one of the possible implications of cell sampling.
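The figure of about 25% can be checked with a short calculation. The sketch below assumes whole-dollar units, so the $99 item is split as 50 units in the first group and 49 in the second; the variable names are illustrative only.

```python
from fractions import Fraction

interval = 100
units_in_first_group = 50   # assumed split of the $99 item across the boundary
units_in_second_group = 49

# Chance that the random number in each group lands outside the item.
miss_first = Fraction(interval - units_in_first_group, interval)
miss_second = Fraction(interval - units_in_second_group, interval)

# The two random numbers are independent, so the chances multiply.
p_miss = miss_first * miss_second
print(float(p_miss))  # 0.255
```

For comparison, if 90 of the 99 units fell in one group, the same calculation would give (10/100) × (91/100), or about 9%.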