Scoring NAEP Mathematics Assessments

The NAEP mathematics items that are not scored by machine are constructed-response items: those for which the student must write a response rather than select from a printed list of multiple choices. Each constructed-response item has a unique scoring guide that identifies the range of possible scores for the item. To measure longitudinal trends in mathematics, NAEP requires trend scoring, a replication of scoring from prior assessment years, to demonstrate statistically that scoring is comparable across years. Students' constructed responses are scored on computer workstations using an image-based scoring system, which allows for item-by-item scoring and online, real-time monitoring of mathematics interrater reliabilities, as well as of each individual rater's performance. In the 2000 assessment, some of these items (those that appeared in large-print booklets) required scoring by hand. The 2000 mathematics assessment included 199 discrete constructed-response items, and the total number of constructed responses scored was 3,856,211. The number of raters working on the mathematics assessment and the location of the scoring are listed here:
Each constructed-response item has a unique scoring guide that identifies the range of possible scores for the item and defines the criteria to be used in evaluating student responses. During the course of the project, each team scores constructed-response items using a 2-, 3-, or 5-point scale, as outlined below:

Dichotomous Items
Short Three-Point Items
Extended Five-Point Items

Early (1990) mathematics constructed-response items used a rating scale in which 1 = incorrect and 7 = correct. Several of these items also tracked how a student approached the problem by expanding the rating 1 to 1, 2, and 3, or by expanding the rating 7 to 6 and 7. For example, if a student was asked to draw a figure with four 90-degree angles, a response rated 6 or 7 was correct: 6 tracked the "square" response and 7 the "rectangle" response. An example of a response rated incorrect would be one in which the student renamed incorrectly in a subtraction problem and therefore arrived at a wrong answer; this might be tracked as a 2.

In some cases, student responses do not fit into any of the categories listed on the scoring guide. Special coding categories for unscorable responses are assigned to these types of responses. These categories are assigned only if no aspect of the student's response can be scored, and scoring supervisors and/or trainers are consulted prior to the assignment of any of the special coding categories. The unscorable categories for mathematics are outlined below.
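The expanded 1990 rating scheme described above can be illustrated with a short sketch. This is a hypothetical example, not NAEP's actual scoring code: the function name, the treatment of out-of-range ratings, and the sample data are assumptions; only the mapping itself (ratings 1, 2, and 3 tracking incorrect responses, and 6 and 7 tracking correct responses) comes from the text.

```python
def collapse_rating(rating: int) -> str:
    """Collapse an expanded 1990-style mathematics rating to incorrect/correct.

    Hypothetical helper: ratings 1-3 are variants of incorrect (tracking the
    student's approach, e.g. 2 = renamed incorrectly in subtraction), and
    ratings 6-7 are variants of correct (e.g. 6 = square, 7 = rectangle).
    """
    if rating in (1, 2, 3):
        return "incorrect"
    if rating in (6, 7):
        return "correct"
    # Anything else falls outside the expanded scale described in the text.
    raise ValueError(f"unexpected rating: {rating}")

# Illustrative batch of expanded ratings collapsed for trend comparison.
ratings = [1, 2, 7, 6, 3]
print([collapse_rating(r) for r in ratings])
# → ['incorrect', 'incorrect', 'correct', 'correct', 'incorrect']
```

Collapsing the expanded ratings this way preserves the dichotomous correct/incorrect score while the extra rating values retain information about how the student approached the problem.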
Special studies are also included in the mathematics assessment. When the special study item is the same as the operational item, the responses are scored together within one team.

Last updated 15 April 2008 (TS)