Design Goals: NAEP 2002 and Beyond
In preparation for the 2002 assessment, NCES commissioned several authors to propose ways of streamlining and redesigning NAEP so that it would become possible to report results within 6 months of the completion of data collection. Based on the ideas originally developed for these design papers, NAEP implemented several design changes in 2002. These changes are too new to have been documented in NAEP's technical reports or other publications and are briefly summarized below.
With the expansion of NAEP under the No Child Left Behind
Act, NAEP began conducting biennial state-level assessments,
administered by contractor staff (not local teachers). The
newly redesigned NAEP has four important features. First,
NAEP now administers tests for different subjects (such
as mathematics, science, and reading) in the same classroom,
thereby simplifying and speeding up sampling, administration,
and weighting. Second, NAEP conducts pilot tests of candidate
items for the next assessment two years in advance and field
tests of items for pre-calibration one year in advance of
data collection, thereby speeding up the scaling process.
Third, NAEP conducts bridge studies, administering tests
both under the new and the old conditions, thereby providing
the possibility of linking old and new findings. Finally,
NAEP is adding more test questions at the upper and
lower ends of the difficulty spectrum, thereby increasing
NAEP's power to measure performance gaps.
Testing different subjects in the same classroom
Previously, NAEP tests in different subjects were designed independently. For some subjects, the short background questionnaire was given before the test, while for others, it was given afterwards. For some subjects, the test was given in three 15-minute blocks; for others, it was given in two 20-, 25-, or 30-minute blocks. As a result, tests for different subjects could not be administered in the same room without the instructions for one test interrupting the administration of another, which made data collection and analysis unnecessarily complicated.
To solve this administrative problem, NAEP has adopted
a standard test structure for all subjects: two 25-minute
blocks of test questions, followed by two short blocks of
background questions. Common block timings permit assessing
different subjects in the same classroom, reduce the number
of classrooms required, require fewer students per
subject in each school (increasing the precision of the
findings), permit simultaneously pretesting questions that
are not yet operational, and simplify the development of
sampling weights. In U.S. history, geography, and reading,
the only required change is shifting the order of the background
and test question blocks; but in mathematics and science,
the blocks of test questions must be reconfigured.
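The practical effect of a common template can be illustrated with a short sketch. The Python fragment below is purely illustrative; the booklet labels, subject mix, and spiraling scheme are hypothetical, not NAEP's actual procedure. It shows how same-format booklets for different subjects can be distributed across the students in a single classroom session.

```python
from itertools import cycle, islice

# Common session template shared by every subject (illustrative labels).
TEMPLATE = ("test block, 25 min", "test block, 25 min",
            "background block", "background block")

# Hypothetical booklets for three subjects, all built on the same
# template, spiraled across the students seated in one classroom.
booklets = ["math-A", "math-B", "reading-A", "reading-B", "science-A"]
seating = list(islice(cycle(booklets), 20))  # one booklet per student

print(seating[:6])
# Because every booklet follows TEMPLATE, the administrator reads one
# set of timing instructions for the whole room, whatever the subject mix.
```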
Shortened time for report production
Previously, NAEP conducted weighting, scoring, and
scaling only after the completion of data collection. No further
reductions in reporting time could be squeezed out of that
design. Weighting has been sped up by reducing
the number of different sets of sample weights. Scoring
is now conducted in parallel in distributed scoring centers.
To speed scaling, NAEP will pretest questions twice, two
years and one year in advance. The latter pretest, with
larger, representative samples, will permit calibration
of items prior to operational testing and thereby accelerate
scaling. NCES will look for ways to streamline its checking
and approval of draft reports. Pretesting questions two
years in advance will require a longer lead time for development
of tests in each subject.
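To make the pre-calibration step concrete, here is a minimal sketch of item calibration under a Rasch (one-parameter logistic) model fit by marginal maximum likelihood. The data are simulated and the model choice is an assumption made for illustration; NAEP's operational scaling uses more elaborate IRT models and procedures.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulate field-test responses under a Rasch model (hypothetical data
# standing in for a pre-calibration sample; all values are illustrative).
n_persons, n_items = 2000, 10
true_b = np.linspace(-2.0, 2.0, n_items)          # item difficulties
theta = rng.normal(size=n_persons)                # person abilities
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - true_b)))
x = (rng.random((n_persons, n_items)) < p).astype(float)

# Marginal maximum likelihood: integrate ability out with Gauss-Hermite
# quadrature against a standard-normal prior.
nodes, weights = np.polynomial.hermite_e.hermegauss(21)
weights = weights / weights.sum()

def neg_loglik(b):
    pq = 1.0 / (1.0 + np.exp(-(nodes[:, None] - b[None, :])))  # (Q, J)
    ll = x @ np.log(pq).T + (1.0 - x) @ np.log(1.0 - pq).T     # (N, Q)
    return -np.log(np.exp(ll) @ weights).sum()

fit = minimize(neg_loglik, np.zeros(n_items), method="L-BFGS-B")
print(np.round(fit.x, 2))  # pre-calibrated difficulties, close to true_b
```

With item parameters fixed in advance from the field-test sample, operational scaling no longer has to wait for a full calibration pass after data collection.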
Bridge studies to ensure comparability
Reconfiguring NAEP's test questions into blocks of different lengths in mathematics and science, and changing the order of the background and assessment questions in reading, could change scale parameters, reducing the comparability of current NAEP scores with those of past assessments. The solution to this problem is to conduct supplemental NAEP surveys in which the same test questions are administered under both the old and the new designs. The resulting data permit measurement of the impact of the design changes and bridge past NAEP results to those of the new design.
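As an illustration of the linking step, the sketch below applies a simple mean/sigma linear linking to simulated bridge-study data. The score distributions are made up, and operational NAEP linking is more involved; the idea is only that randomly equivalent groups take the assessment under the old and new designs, and the new scale is transformed to match the old one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bridge-study data: two randomly equivalent groups take
# the same questions under the old and the new design (numbers made up).
old_design = rng.normal(150.0, 35.0, size=5000)  # reported, old scale
new_design = rng.normal(0.45, 1.10, size=5000)   # new design's raw scale

# Mean/sigma linking: choose A and B so that A*new + B reproduces the
# old scale's mean and standard deviation in the equivalent groups.
A = old_design.std() / new_design.std()
B = old_design.mean() - A * new_design.mean()

linked = A * new_design + B
print(f"A = {A:.2f}, B = {B:.2f}")
print(f"linked mean/sd: {linked.mean():.1f} / {linked.std():.1f}")  # ~150/35
```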
Capacity to measure gaps in achievement
The accuracy of NAEP scores for a subgroup depends principally
on two factors: the size of the subgroup sample and the
accuracy of the test in the range in which the subgroup
scores. NAEP ensures adequate sample sizes for the groups
whose gaps it measures by targeting the needed students for
oversampling and, if necessary, by increasing state sample
sizes. In addition, NAEP ensures adequate precision in the
upper and lower ranges of the NAEP tests by adding more
test questions at both ends of the difficulty range.
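The precision argument can be made concrete with the test information function. In the Rasch-model sketch below (a simplifying assumption; operational NAEP uses richer models), the standard error of measurement at ability theta is 1/sqrt(I(theta)), where each item contributes p(1 - p) to the information I(theta). Adding items at the extremes of the difficulty range visibly shrinks the standard error for very low- and very high-scoring groups.

```python
import numpy as np

def rasch_se(theta, b):
    """Standard error of measurement 1/sqrt(I) for a Rasch test."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    info = (p * (1.0 - p)).sum(axis=1)  # each item contributes p(1-p)
    return 1.0 / np.sqrt(info)

theta = np.array([-2.5, 0.0, 2.5])      # low, middle, high performers
mid_range = np.linspace(-1.0, 1.0, 20)  # items clustered mid-range
widened = np.concatenate([mid_range, [-2.5, -2.0, 2.0, 2.5]])

print(np.round(rasch_se(theta, mid_range), 2))  # large SE at theta = +/-2.5
print(np.round(rasch_se(theta, widened), 2))    # SE shrinks at the extremes
```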
For more detailed information on each of the design principles
implemented for 2002 under the expansion of NAEP, see Design
Principles: 2002 and Beyond (120K PDF file)
by Andrew Kolstad, Andrew.Kolstad@ed.gov.