NAEP Technical Documentation

Trend Scoring

To measure the comparability of current-year scoring with the scoring of the same items in prior assessment years, a set number of student responses per item from prior years is retrieved from image archives or rescanned from prior-year booklets and loaded into the system with the prior-year scores attached as first scores. These responses are kept in a separate application so that the data remain separate from current-year scoring.

At staggered intervals during the scoring process, the scoring supervisor releases items from prior assessment years for raters to score. Because the prior-year scores are preloaded as first scores, the current year's teams are, in effect, second-scoring 100 percent of the prior-year papers. After the trend rescore items are scored, scoring supervisors and trainers review reliability reports, t-statistic reports, and backreading to gauge consistency with prior-year scoring and to adjust scoring where appropriate.

The score given to each response is captured, retained, and provided for data analysis at the end of scoring. For each item one of the following decisions is made based on these data:

  • continue scoring the current year responses without changing course;

  • stop scoring and retrain the current group of raters; or

  • stop scoring, delete the current scores, and train a different group of raters.

For the 2000 and 2001 NAEP assessments, the initial release of trend item responses on the image-based scoring system took place very soon after training was completed. Scoring supervisors controlled the release by asking each rater to score enough responses that the team, collectively, covered the required number. As soon as the set was complete, the scoring supervisor consulted the t-statistic report; the acceptable range for the t value was within ±1.5 of zero. If the t value fell outside that range, raters were not allowed to begin scoring current-year responses until a further group of trend responses had been released and scored; usually this next group was scored successfully. Scoring of current-year responses began only after a successful t-test.
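As a minimal sketch of this check, the t value can be read as a paired t-statistic on the differences between the current raters' rescores and the preloaded prior-year scores. The documentation does not give the exact formula, so the pairing, the variance estimate, the 1-4 score scale, and the sample data below are assumptions for illustration only; a mean difference near zero keeps t inside the ±1.5 window, while a consistently harsher or more lenient team pushes it outside.

```python
import math

def paired_t_statistic(prior_scores, rescores):
    """Paired t-statistic on per-response differences (rescore minus prior score).

    Assumed formulation: a value near zero means the current raters are, on
    average, neither harsher nor more lenient than the prior-year scoring.
    """
    diffs = [new - old for old, new in zip(prior_scores, rescores)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)  # sample variance
    std_error = math.sqrt(variance / n)
    return 0.0 if std_error == 0 else mean_diff / std_error

# Hypothetical scores on a 1-4 rubric: prior-year first scores vs. current rescores.
prior    = [2, 3, 1, 4, 2, 3, 2, 1, 4, 3]
rescores = [2, 3, 2, 4, 2, 3, 1, 1, 4, 3]

t = paired_t_statistic(prior, rescores)
print(f"t = {t:.2f}:",
      "proceed to current-year scoring" if abs(t) <= 1.5 else "hold and review")
```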

These trend items were also released after every break longer than fifteen minutes (first thing in the morning and after lunch) to recalibrate raters. When such a release revealed a problem, scoring resumed only after the trainer and scoring supervisor had determined a plan of action. This was usually done by studying scored papers from prior assessment years to find trends in the scoring, which helped determine what needed to be communicated to the raters before scoring could begin again.

The t-statistic was printed out at the end of every trend release, and an interrater agreement (IRA) matrix was also reviewed after every trend release. The matrix was used to determine whether the team was scoring too harshly or too leniently. IRAs were required to be within ±7 of the prior assessment year's reliability, and trainers and scoring supervisors had access to the reliabilities for each item from prior years.
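The agreement check can be sketched in the same spirit, assuming that the IRA is the percent of responses on which the current rescore exactly matches the prior-year score and that the matrix is a cross-tabulation of prior score against current score; the report layout, the tolerance units, and the sample reliability below are assumptions, not the actual NAEP reports.

```python
from collections import Counter

def exact_agreement(prior_scores, rescores):
    """Percent of responses on which the current rescore matches the prior-year score."""
    matches = sum(old == new for old, new in zip(prior_scores, rescores))
    return 100.0 * matches / len(prior_scores)

def agreement_matrix(prior_scores, rescores):
    """Cross-tabulation of (prior score, current score) pairs.

    Counts above the diagonal (new > old) suggest lenient scoring;
    counts below it suggest harsh scoring.
    """
    return Counter(zip(prior_scores, rescores))

# Hypothetical data and prior-year reliability (percent agreement) for one item.
prior    = [2, 3, 1, 4, 2, 3, 2, 1, 4, 3]
rescores = [2, 3, 2, 4, 2, 3, 1, 1, 4, 3]
prior_year_reliability = 85.0

ira = exact_agreement(prior, rescores)                    # 80.0 for this sample
within_tolerance = abs(ira - prior_year_reliability) <= 7.0
print(f"IRA = {ira:.1f}%, within tolerance: {within_tolerance}")
print(agreement_matrix(prior, rescores))
```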

This "trend scoring" is not related to the long-term trend assessment. Trend scoring looks at changes over time using main NAEP item responses (e.g., 2000 reading assessment scores for an item compared to the 1998 reading assessment scores for that item). View a table that lists the differences between main NAEP assessment and the long-term trend NAEP assessment.

Last updated 18 June 2008 (MH)
