NAEP Scoring
Three types of cognitive items are scored for NAEP. Multiple-choice item responses are captured by high-speed scanners during student booklet processing. Short constructed-response items (those with two or three valid score points) and extended constructed-response items (those with four or more valid score points) are scored by trained personnel using images of student responses, also captured during processing. Scoring a large number of short and extended constructed responses with a high level of accuracy and reliability within a limited time frame is essential to the success of NAEP. To ensure reliable, efficient scoring, NAEP takes the steps described in the sections below.
The table below presents a general overview of recent NAEP scoring activities.
The table below presents a general overview of recent NAEP Long Term Trend scoring activities.
As new NAEP items are created, tested, and improved, test development staff create scoring guides, using a range of actual student responses captured by the materials-processing staff as specific examples. Aided by the scoring staff, the test development staff create training materials that match the assessment framework criteria. For future assessments, continuous documentation ensures that the scoring staff will train and score each item the same way it was originally implemented. This repeatability allows reporting on trends in student performance over time.

NAEP Scoring Staff

Scorers score student responses. Scoring supervisors provide logistical support to the trainers and help monitor team activities. Trainers are responsible for training both scorers and supervisors on specific content and for ensuring that team scoring performance meets expectations. Content leads for each subject area (Reading, Science, etc.) oversee the trainers and provide support as needed.

Scorers must hold at least a baccalaureate degree from a four-year college or university; an advanced degree, scoring experience, and/or teaching experience are preferred. In some subjects, scorers must complete a placement test used to identify scorers with appropriate content knowledge. During the training process, scoring teams are trained so that each student response can be scored consistently. Following training, for all extended-response items and some other items, each scorer qualifies by passing a qualification set consisting of pre-scored student responses. Individual scorer performance data are retained.

Scoring supervisors and trainers are selected based on many factors, including previous experience, educational and professional backgrounds, a demonstrated strong understanding of the scoring criteria, and strong interpersonal communication and organizational skills.
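The qualification step described above can be sketched as a simple exact-agreement check against a pre-scored set. The 90 percent threshold and the data layout here are illustrative assumptions, not NAEP's actual qualification criteria.

```python
# Illustrative sketch of a scorer qualification check.
# The agreement threshold and score format are assumptions for
# demonstration; NAEP's actual qualification criteria may differ.

def qualifies(scorer_scores, key_scores, threshold=0.90):
    """Return True if the scorer's exact-agreement rate with the
    pre-scored qualification set meets the (assumed) threshold."""
    if len(scorer_scores) != len(key_scores):
        raise ValueError("score lists must be the same length")
    matches = sum(s == k for s, k in zip(scorer_scores, key_scores))
    return matches / len(key_scores) >= threshold

# Example: the scorer matches the key on 9 of 10 responses (90%).
print(qualifies([1, 2, 2, 3, 1, 0, 2, 3, 1, 2],
                [1, 2, 2, 3, 1, 0, 2, 3, 1, 1]))  # True
```

In practice the pre-scored responses would come from the item's scoring history rather than hard-coded lists; the function only illustrates the pass/fail decision.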
NAEP scoring teams usually consist of 10-12 scorers led by a scoring supervisor and a trainer. Prior to the scoring effort, all personnel are intensively trained: the trainers who train the individual scorers, the supervisors who oversee a group of scorers, and the scorers themselves all receive both general scoring training and item-specific content training.

NAEP Scoring System

The NAEP electronic scoring system couples current technology with secure network communications to transmit images of student responses to the trained scorers and to receive back the scores they assign. Student responses are scanned from the original test booklets; the actual booklets can be accessed and referenced if needed. The scorer sees each student response in isolation on a computer screen and assigns a score. As each response is scored, another student response is shown for scoring, until all responses for an item have been scored.

During scoring, the NAEP electronic scoring system documents numerous scoring metrics. Reports on item and scoring performance can be retrieved as needed. In addition, custom reports of daily activities are sent out nightly to development, scoring, and analysis staff to monitor NAEP scoring quality and progress. All assessments are scored item by item so that scorers train on one item and one scoring guide at a time; this method is efficient only with electronic presentation of student responses.

NAEP Scoring Procedures

During the scoring of a particular item, a percentage of scored responses are randomly recirculated by the system to be rescored by a second scorer in order to check the consistency of current-year scoring. (Five percent of responses are second-scored for large state samples, and 25 percent of responses are second-scored for smaller national samples.) This comparison of first and second scores yields the within-year interrater agreement.
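The within-year interrater agreement described above can be computed as the share of second-scored responses on which both scorers gave the same score. This is a minimal sketch; the paired scores here are invented, and the actual NAEP metrics may include additional statistics beyond exact agreement.

```python
# Illustrative computation of within-year interrater agreement:
# the percent of second-scored responses where the second scorer
# assigned the same score as the first. Data are invented.

def exact_agreement(first_scores, second_scores):
    """Percent of paired responses on which two scorers agree exactly."""
    pairs = list(zip(first_scores, second_scores))
    if not pairs:
        raise ValueError("no paired scores")
    agree = sum(a == b for a, b in pairs)
    return 100.0 * agree / len(pairs)

first  = [2, 1, 3, 0, 2, 2, 1, 3]   # scores from the first scorer
second = [2, 1, 3, 1, 2, 2, 1, 3]   # scores from the second scorer
print(exact_agreement(first, second))  # 7 of 8 agree -> 87.5
```

The same function applies whether 5 percent or 25 percent of responses were recirculated; the sampling rate only determines how many pairs feed into the calculation.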
In addition, NAEP trend scoring is used to monitor the consistency of scoring over time (i.e., cross-year interrater agreement). During trend scoring, the NAEP electronic scoring system presents a pool of scored responses from a prior assessment to current scorers; comparing the current scores with those given in the prior assessment yields the cross-year reliability.

Backreading of current-year responses provides frequent monitoring of scorer decision-making by supervisory staff. Through the backreading process, supervisors review the student responses and scores given by each scorer to check that each scorer is applying the scoring guides correctly. About five percent of each scorer's output is monitored through backreading.

During training and scoring, any changes to existing documentation are captured by scoring staff, shared across scoring teams, and incorporated into the history of the NAEP item. This history is reviewed prior to the next scoring effort.

Last updated 04 March 2009 (RF)