NAEP Technical Documentation

Quality Control of NAEP Data Entry

         

Summary of Quality Control Error Analysis for NAEP 2000 Data Entry


To evaluate the effectiveness of the quality control of the data entry process, the corresponding portion of the final integrated database is verified in detail against a sample of the original instruments received from the field. Overall, observed error rates are comparable from year to year.

Quality Control Analysis

The purpose of a quality control analysis is to assess the quality of the data resulting from the complete data entry system, beginning with the actual instruments collected in the field and ending with the final machine-readable database used in the analyses. The process involves the selection of instruments at random from among those returned from the field and the comparison of each entire instrument, character by character, with its representation in the final database. In this way, it is possible to measure the error rates in the data as well as the success of the data entry system.
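The sampling and comparison step can be sketched in Python. This is a minimal illustration under stated assumptions, not NAEP's actual software. It assumes that each original instrument and its database record are available as plain character strings keyed by a booklet ID. The names `count_character_errors` and `quality_control_sample` are hypothetical. Characters are compared position by position, and each missing or extra character counts as one error.

```python
import random


def count_character_errors(original, keyed):
    """Compare an original instrument with its database representation,
    character by character (position by position).

    Returns (errors, characters_checked). Assumption: each character that is
    missing from, or extra in, the keyed record counts as one error.
    """
    errors = sum(1 for a, b in zip(original, keyed) if a != b)
    errors += abs(len(original) - len(keyed))
    return errors, max(len(original), len(keyed))


def quality_control_sample(originals, database, k, rng=random):
    """Select k instruments at random and tally errors over the whole sample.

    `originals` and `database` are hypothetical dicts mapping a booklet ID to
    the instrument's text as collected and as stored in the final database.
    """
    ids = rng.sample(sorted(originals), k)
    total_errors = total_chars = 0
    for booklet_id in ids:
        e, n = count_character_errors(originals[booklet_id], database[booklet_id])
        total_errors += e
        total_chars += n
    return total_errors, total_chars


if __name__ == "__main__":
    originals = {"B001": "1234A", "B002": "99XY"}
    keyed = {"B001": "1244A", "B002": "99XY"}
    # Both booklets are sampled, so the result does not depend on the seed.
    print(quality_control_sample(originals, keyed, k=2, rng=random.Random(0)))  # (1, 9)
```

The pair returned (errors found, characters checked) corresponds to the error count and sample size used in the confidence-limit calculation that follows.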

The observed error rate cannot be taken at face value. For example, suppose that the school questionnaires selected for close inspection contained two errors in a total of 2,251 characters. To conclude that the entire school questionnaire database has an error rate of 2 per 2,251, or .0009, would be too optimistic; we may simply have been lucky (or unlucky) with this particular random sample. What is needed is an indication of how bad the true error rate might be, given what we observed. Confidence limits provide such an indication. They specify a range of values that, under an assumed probability distribution, contains the true value with a stated level of confidence. In this analysis, the range runs from zero to a maximum error rate, and we are confident at a specified level (traditionally 99.8 percent) that the true error rate does not exceed that maximum. The relevant distribution turns out to be the cumulative binomial probability distribution. An example will demonstrate this technique:

Let us say that 1,000 booklets were processed, each with 100 characters of data transcribed, for a total of 100,000 characters. Let us say further that a random sample of 50 booklets was completely checked and five characters were found to be in error; in other words, five errors were found in a sample of 5,000 characters. The observed rate is one in a thousand (.001). Rather than relying on this single-value estimate, we ask whether a true error rate as high as .0025 is consistent with what we observed. The following expression gives the probability of finding five or fewer errors in a sample of 5,000 characters when the true error rate is .0025:

$$\sum_{j=0}^{5} \binom{5000}{j} (.0025)^{j} (1 - .0025)^{5000-j} = .0147$$

This is the sum of the probabilities of finding five errors, four errors, and so on down to zero errors in a sample of 5,000 characters with a true error rate of .0025. In other words, it is the probability of finding five or fewer errors by chance when the true error rate is .0025.

Notice that we did not use the size of the database in this expression. In effect, the expression assumes that our sample of 5,000 characters was drawn from an infinite database. The smaller the actual database is, the more confidence we can have in the observed error rate. For example, had there been only 5,000 characters in the total database, our sample would have included all the data, and the observed error rate would have been the true error rate.

The result of the above computation allows us to say, conservatively, that .0025 is an upper limit on the true error rate with 98.53 percent (i.e., 100 - 1.47) confidence; that is, we can be quite sure that the true error rate is no larger than .0025. For NAEP quality control we use a more stringent confidence level of 99.8 percent, which yields an even more conservative upper bound on the true error rate. With 99.8 percent confidence, we would state that the true error rate in this example is no larger than .0031, rather than .0025.
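The arithmetic in this example can be checked with a short Python sketch. It evaluates the cumulative binomial sum shown above. It then finds the upper limit by bisection: the error rate at which the probability of finding the observed number of errors or fewer falls to 1 - confidence (1 - .998 = .002 for the NAEP level). This is the one-sided Clopper-Pearson upper limit. The function names and the use of bisection are illustrative choices, not part of any NAEP system.

```python
from math import exp, lgamma, log, log1p


def binomial_cdf(x, n, p):
    """Probability of finding x or fewer errors in n characters when the
    true error rate is p (cumulative binomial distribution, 0 < p < 1)."""
    total = 0.0
    for j in range(x + 1):
        # log of C(n, j) * p**j * (1 - p)**(n - j); working in log space
        # keeps the terms from overflowing when n is large
        log_term = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                    + j * log(p) + (n - j) * log1p(-p))
        total += exp(log_term)
    return total


def upper_error_limit(errors, n, confidence=0.998, tol=1e-10):
    """Upper confidence limit on the true error rate: the rate p at which the
    chance of seeing `errors` or fewer errors drops to 1 - confidence.
    Bisection works because binomial_cdf decreases as p increases."""
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binomial_cdf(errors, n, mid) > alpha:
            lo = mid  # a rate this high is still plausible; the limit lies above
        else:
            hi = mid
    return hi


print(round(binomial_cdf(5, 5000, 0.0025), 4))  # 0.0147
print(round(upper_error_limit(5, 5000), 4))     # 0.0031
```

Setting `confidence=0.9853` instead recovers a limit of about .0025. This matches the 98.53 percent statement above.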

Exact probabilities for sampling without replacement from a finite population, based on combinatorial analysis, have been calculated (e.g., Grant 1964). Even when the sample was as large as 10 percent of a population of 50, the probability estimated from the binomial distribution did not differ much from the exact probability. NAEP does not sample at a rate greater than about 2 percent, so upper limits on the true error rates computed from the binomial distribution are likely to be highly accurate approximations.
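The size of the finite-population effect can be illustrated in Python. The sketch compares the exact (hypergeometric) probability with the binomial approximation for the 100,000-character example above. The number of erroneous characters in the database (250, a true rate of .0025) is a hypothetical value chosen for illustration. The model also treats characters as if they were sampled individually, whereas the quality control sample actually consists of whole instruments. It is a sketch of the comparison, not a reproduction of Grant's calculations.

```python
from math import comb


def binomial_cdf(x, n, p):
    """Chance of x or fewer errors in n characters at true error rate p,
    as if sampling from an infinite database (with replacement)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(x + 1))


def hypergeometric_cdf(x, population, defects, sample):
    """Exact chance of x or fewer errors when `sample` characters are drawn
    without replacement from `population` characters, `defects` of which are
    in error. Uses exact integer arithmetic and converts to float at the end."""
    favourable = sum(comb(defects, j) * comb(population - defects, sample - j)
                     for j in range(x + 1))
    return favourable / comb(population, sample)


# Hypothetical finite database: 100,000 characters, 250 of them in error
# (true rate .0025), checked with a 5,000-character (5 percent) sample.
population, defects, sample = 100_000, 250, 5_000
exact = hypergeometric_cdf(5, population, defects, sample)
approx = binomial_cdf(5, sample, defects / population)
print(f"exact {exact:.4f}  binomial {approx:.4f}")  # exact 0.0130  binomial 0.0147
```

Under these assumptions the binomial value is the larger of the two, so it leads to a somewhat higher, and therefore more conservative, upper limit. The gap narrows as the sampling fraction falls, because sampling without replacement from a large population behaves increasingly like sampling from an infinite one.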

Last updated 02 June 2008 (TS)


1990 K Street, NW
Washington, DC 20006, USA
Phone: (202) 502-7300