Sampling: How is it done?
1. What questions are being asked of the data?
Before collecting any data, it is essential to define clearly what information is required. It is easy to waste time and resources by collecting the wrong data or by failing to collect enough information at the time of data collection. Try to anticipate questions that will be asked when analyzing the data. What additional information would be desirable? When collecting data, it is easy to record additional information; trying to track information down later is far more difficult, and may not be possible.
2. Determine the frequency of sampling.
The frequency of sampling refers to how often a sample should be taken. A sample should be taken at least as often as the process is expected to change. Examine all factors that are expected to cause change, and identify the one that changes most frequently. Sampling must occur at least as often as the most frequently changing factor in the process. For example, if a process has exhibited the behavior shown in the diagram below, how often should sampling occur in order to get an accurate picture of the process?
Factors to consider might be changes of personnel, equipment, or materials. The questions identified in step 1 may give guidance to this step.
Common frequencies of sampling are hourly, daily, weekly, or monthly. Although frequency is usually stated in time, it can also be stated in number: every tenth part, every fifth purchase order, every other invoice, for example. If it is not clear how frequently the process changes, collect data frequently, examine the results, and then set the frequency accordingly.
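A frequency stated in number rather than time can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `every_nth`, the `start` offset, and the purchase order labels are all made up for the example.

```python
from itertools import islice

def every_nth(items, n, start=0):
    """Return every n-th item of `items`, beginning at index `start`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return list(islice(items, start, None, n))

# Hypothetical data: twenty purchase orders in the order they were processed.
purchase_orders = [f"PO-{i:03d}" for i in range(1, 21)]

# "Every fifth purchase order": start at index 4 so the 5th, 10th, ... are picked.
print(every_nth(purchase_orders, 5, start=4))
```

Setting `start` to `n - 1` selects the n-th, 2n-th, 3n-th item and so on; a different offset only shifts which items are picked, not how often.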
3. Determine the actual frequency times.
The purpose of this step is to state the actual time at which samples are taken. For instance, if the frequency were determined to be daily, what time of day should the sample be taken: in the morning at 8:00 am, around midday, or late in the day around 5:00 pm? This is important because inconsistent intervals between samples lead to data that is unreliable for further analysis. For example, if a sample is to be taken daily, and on one day it is taken at 8:00 am, the next day at 5:00 pm, and the following day at midday, the intervals between samples are inconsistent and the collected data will also be inconsistent. The data will exhibit unusual patterns and will be less meaningful. Stating the time that the sample is to be taken will reduce this type of error. The actual time should be as close as possible to any expected changes in the process, and at a point when taking a sample is convenient. Avoid difficult times, such as during a shift change or lunch break.
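The effect of inconsistent timing is easy to see by computing the gap between consecutive samples. The sketch below uses the 8:00 am, 5:00 pm, midday example from the text; the calendar dates and the helper name `gaps_in_hours` are assumptions chosen only for illustration.

```python
from datetime import datetime

def gaps_in_hours(sample_times):
    """Return the elapsed hours between each pair of consecutive samples."""
    return [
        (later - earlier).total_seconds() / 3600
        for earlier, later in zip(sample_times, sample_times[1:])
    ]

# Hypothetical dates: a "daily" sample taken at 8:00 am, 5:00 pm, then midday.
inconsistent = [
    datetime(2024, 1, 1, 8, 0),
    datetime(2024, 1, 2, 17, 0),
    datetime(2024, 1, 3, 12, 0),
]

# The same three days with the sample time fixed at 8:00 am.
consistent = [datetime(2024, 1, day, 8, 0) for day in (1, 2, 3)]

print(gaps_in_hours(inconsistent))  # [33.0, 19.0]
print(gaps_in_hours(consistent))    # [24.0, 24.0]
```

Although both schedules are nominally daily, the first one leaves 33 hours between the first two samples and only 19 hours between the next two.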
4. Select the subgroup (sample) size.
A subgroup (or sample) is the set of items examined at the same time, and the subgroup size is the number of items in it. The terms “subgroup” and “sample” may be used interchangeably. When doing calculations, subgroup size is denoted by the letter n. To choose the most appropriate subgroup size, determine first whether the data being collected is “variables data” or “attributes data.”
For variables data:
When measuring variables data, a subgroup size larger than one is preferable because larger subgroup sizes yield greater possibilities for analysis. However, it may not be possible to get a subgroup size larger than one. Some examples of this are electricity usage per month, profit per month, sales per month, temperature of a room, and the viscosity of a fluid. In situations such as these, when a subgroup size larger than one does not make sense, the subgroup (or sample) size is equal to one.
If a subgroup size larger than one can be chosen, the size is usually between three and eight. A subgroup size between three and eight has been determined to be statistically efficient. The most commonly used subgroup size is five. When more data is desired, the frequency of taking samples, not the subgroup size, should be increased.
When a sample is taken, it should be selected so that conditions within the sample are similar. If gathering a sample size of five, for example, take all five pieces in a row as they are produced in the process. This is known as a rational subgroup.
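A rational subgroup of five consecutive pieces can be sketched as a slice of the production stream. The average and range computed here are common subgroup summaries rather than something this excerpt prescribes, and the function names and measurement values are invented for the example.

```python
from statistics import mean

def take_subgroup(stream, start, n=5):
    """Take n consecutive pieces from `stream`, beginning at index `start`."""
    subgroup = list(stream[start:start + n])
    if len(subgroup) < n:
        raise ValueError("not enough consecutive pieces for a full subgroup")
    return subgroup

def summarize(subgroup):
    """Return the (average, range) of one subgroup."""
    return mean(subgroup), max(subgroup) - min(subgroup)

# Hypothetical measurements, listed in the order the pieces were produced.
diameters = [10, 12, 11, 13, 9, 10, 11, 12, 10, 11]

# Five pieces in a row, starting at the sampling time (index 0 here).
subgroup = take_subgroup(diameters, start=0)
print(subgroup, summarize(subgroup))
```

Taking the pieces as one contiguous slice, rather than picking five scattered pieces, is what keeps conditions within the subgroup similar.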
For attributes data:
The subgroup size for attributes data depends on the process being sampled. The general rule of thumb is to gather a large enough sample so that all possible characteristics being investigated will appear. That is, the sample is large enough that a “0” occurrence is rare.
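One way to put a number on "a 0 occurrence is rare" is sketched below. It rests on an assumption the text does not state: each item independently shows the characteristic with the same probability `p`, so the chance that a sample of `n` items shows none is `(1 - p) ** n`. The 5% threshold and the function names are likewise illustrative choices.

```python
import math

def prob_zero_occurrences(p, n):
    """Chance that none of n independent items shows the characteristic."""
    return (1.0 - p) ** n

def min_sample_size(p, max_zero_prob=0.05):
    """Smallest n for which a zero-occurrence sample is at most max_zero_prob likely."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 < max_zero_prob < 1:
        raise ValueError("max_zero_prob must be strictly between 0 and 1")
    return math.ceil(math.log(max_zero_prob) / math.log(1.0 - p))

# Hypothetical process: about 10% of items show the characteristic.
n = min_sample_size(0.10, max_zero_prob=0.05)
print(n, prob_zero_occurrences(0.10, n))
```

The rarer the characteristic, the larger the sample has to be before a sample with no occurrences becomes unusual.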
Begin by answering the question, “How many items does this process produce during the frequency interval (per hour, week, etc.)?” When that number is determined, the sample size should be at least the square root of that number. For instance, if a purchasing department processes 100 purchase orders per week, an appropriate sample size would be 10 purchase orders per week (the square root of 100 is 10).
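The square-root rule can be written as a short function. Rounding up when the count is not a perfect square is an assumption made here so that the result stays "at least" the square root; the function name is hypothetical.

```python
import math

def min_attribute_sample_size(items_per_interval):
    """Smallest whole sample size that is at least sqrt(items_per_interval)."""
    if items_per_interval < 0:
        raise ValueError("items_per_interval must not be negative")
    root = math.isqrt(items_per_interval)  # integer square root, rounded down
    return root if root * root == items_per_interval else root + 1

print(min_attribute_sample_size(100))  # 10, as in the purchase order example
print(min_attribute_sample_size(150))  # 13, because 12 * 12 = 144 falls short
```

Using `math.isqrt` keeps the arithmetic in whole numbers, which avoids floating-point rounding for large counts.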
The above article is an excerpt from the "Sampling" chapter of Practical Tools for Continuous Improvement Volume 1 Statistical Tools. The full chapter provides more details on setting sample size and frequency. This reference book is available from PQ Systems.