Scoring NAEP Science Assessments
The NAEP science items that are not scored by machine are constructed-response items: those for which the student must write a response rather than select from a printed list of choices. Each constructed-response item has a unique scoring guide that identifies the range of possible scores for the item. To measure longitudinal trends in science, NAEP requires trend scoring (replication of scoring from prior assessment years) to demonstrate statistically that scoring is comparable across years.
Students' constructed responses are scored on computer workstations using an image-based scoring system. This allows for item-by-item scoring and online, real-time monitoring of science interrater reliabilities and of each individual rater's performance. In the 2000 assessment, some constructed-response items (those that appeared in large-print booklets) required scoring by hand.
The 2000 science assessment included 295 discrete constructed-response items. The total number of constructed responses scored was 4,398,021. The number of raters working on the science assessment and the location of the scoring are listed here:
One unique aspect of the science assessment is the use of "hands-on" tasks that are given to students as part of the assessment. Each student who performs a hands-on task is given a kit with all of the materials needed to conduct the experiment. For the 2000 assessment, a total of 9 hands-on tasks (3 per grade) originally designed for the 1996 assessment were chosen for use, although the actual kits used by the students were new. As part of their training for scoring the hands-on task items, raters performed the experiment themselves. Each student's experiment was scored as a unit because the questions the student had to answer were interconnected.
Each item's scoring guide identifies the range of possible scores for the item and defines the criteria to be used in evaluating student responses. During the course of the project, each team scores the items using a 2-, 3-, 4-, or 5-point scale as outlined below:
Dichotomous Items
Short Three-Point Items
Extended Four-Point Items
Extended Five-Point Items
In some cases, student responses do not fit into any of the categories listed in the scoring guide. Such responses are assigned special coding categories for unscorable responses. These categories are assigned only if no aspect of the student's response can be scored. Scoring supervisors and/or trainers are consulted before any of the special coding categories is assigned. The unscorable categories used for science are outlined below.
Last updated 15 April 2008 (TS)