Chapter 1: Elementary and Secondary Education - Student Performance in Mathematics and Science

Available data on U.S. student performance in mathematics and science present a mixed picture. Although data show some overall gains in achievement, most students still perform below levels considered proficient or advanced by a national panel of experts. Furthermore, achievement gaps, some of them substantial, persist between various U.S. student subpopulations, and U.S. students continue to do poorly in international comparisons, particularly in the higher grades. This section describes long-term trends based on curriculum frameworks developed in the late 1960s, recent trends based on frameworks more closely aligned with current standards, and the performance of U.S. students relative to their peers in other countries.

The National Assessment of Educational Progress (NAEP), also known as "The Nation's Report Card," has charted U.S. student performance for the past 3 decades (Campbell, Hombo, and Mazzeo 2000) and is the only nationally representative, continuing assessment of what students know and can do in a variety of academic subjects, including reading, writing, history, civics, mathematics, and science. NAEP consists of three separate testing programs. The "long-term trend" assessment of 9-, 13-, and 17-year-olds has remained substantially the same since it was first given in mathematics in 1973 and in science in 1969, and it therefore provides a good basis for analyzing achievement trends. [More detailed explanations of the NAEP long-term trend study are available in Science and Engineering Indicators — 2002 (National Science Board 2002) and at http://www.nces.ed.gov/naep3/mathematics/trends.asp.] A second testing program, the national or "main" NAEP, is based on more contemporary standards of what students should know and be able to do in a subject. It assesses students in grades 4, 8, and 12. A third program, "state" NAEP, is similar to national NAEP but involves representative samples of students from participating states. The NAEP data summarized here come from the long-term trend assessment and the national NAEP. Chapter 8 covers the considerable variation by state.

The most recent NAEP long-term trend assessment took place in 1999. Because the 1999 NAEP data have already been reported widely (including in the 2002 version of this report), this chapter only summarizes the main findings.

Trends in Mathematics and Science Performance: Early 1970s to Late 1990s

The NAEP trend assessment shows that student performance in mathematics improved overall from 1973 to 1999 for 9-, 13-, and 17-year-olds, although not at a consistent rate across the 3 decades (Campbell, Hombo, and Mazzeo 2000) (figure 1-1). In general, declines occurred in the 1970s, followed by increases in the 1980s and early 1990s and relative stability since then.[1] The average performance of 9-year-olds held steady in the 1970s, increased from 1982 to 1990, and showed additional modest increases after that. For 13-year-olds, average scores improved from 1978 to 1982, with additional improvements in the 1990s. The average performance of 17-year-olds dropped from 1973 to 1982, rose from 1982 to 1992, and has since remained about the same, resulting in an overall gain from 1973 to 1999.

Average student performance in science also improved from the early 1970s to 1999 for 9- and 13-year-olds, although again not consistently over the 3 decades. Achievement declined in the 1970s and increased in the 1980s and early 1990s, and it has held relatively stable since then. By 1999, these increases had offset the declines of the 1970s for the two younger age groups. In 1999, 9-year-olds' average performance was higher than in 1970. Among 13-year-olds, average performance in 1999 was higher than in 1973 and essentially the same as in 1970. By contrast, 17-year-olds had not recouped by 1999 the decreases in average scores that took place during the 1970s and early 1980s, so their performance in 1999 was lower than in 1969, when NAEP first assessed 17-year-olds in science.

The No Child Left Behind (NCLB) Act of 2001 requires every student, regardless of poverty level, sex, race, ethnicity, disability status, or English proficiency, to meet challenging standards in mathematics and science. Patterns in the NAEP long-term trend data can show whether the nation's school systems are providing similar learning outcomes for all students and whether performance gaps between different groups of students have narrowed, remained steady, or grown.

Performance Trends for Males and Females

In general, the average performance of both males and females in mathematics improved from the early 1970s to the late 1990s, including the period from 1990 to 1999 (Campbell, Hombo, and Mazzeo 2000). For 9- and 13-year-olds, differences in average mathematics scores shifted from favoring females in the 1970s to favoring males by the 1990s (figure 1-2 and appendix table 1-1). Among 17-year-olds, the performance gap that favored males in 1973 had narrowed by 1999. By 1999, none of the apparent sex differences in mathematics performance were statistically significant. In science, average scores tended to favor males through 1999, although the apparent difference in 1999 for 9-year-olds was not statistically significant. The gender gap in science has remained relatively stable for 9- and 13-year-olds, but it narrowed for 17-year-olds between 1969 and 1999.

Performance Trends for Racial/Ethnic Subgroups

In every racial/ethnic subgroup, a general trend of improved mathematics performance occurred over the past 3 decades. Scores for white, black, and Hispanic students, regardless of age, were higher in 1999 than in 1973 (Campbell, Hombo, and Mazzeo 2000). (Trends for other racial/ethnic groups are not reported because the samples for these groups are too small to analyze separately.) However, during the 1990s, although the performance of white students increased in each age group, the performance of black students in each age group and of Hispanic 9- and 13-year-olds remained flat. The performance of Hispanic 17-year-olds increased from 1990 to 1999.

In science, scores for 9- and 13-year-olds from each racial/ethnic subgroup in 1999 were higher than in the year NAEP first assessed a particular subgroup (1970 for whites and blacks, 1977 for Hispanics) but held steady from 1990 to 1999. Among 17-year-olds, science performance trends varied. White students in that age group had lower scores in 1999 than in 1969, although the average score did increase between 1990 and 1999. The performance of black 17-year-old students was about the same in 1969, 1990, and 1999. Science scores of Hispanic 17-year-olds were higher in 1999 than in 1969 and increased from 1990 to 1999.

Despite improved performance overall from the 1970s to the late 1990s for all racial/ethnic subgroups studied, significant performance gaps persist among these subgroups (figure 1-3 and appendix table 1-2). In mathematics, the sizable gap between white and black students of all ages in 1973 narrowed until 1986 but remained relatively stable in the 1990s. Even larger performance gaps exist between white and black students in science. These gaps narrowed somewhat from 1970 to 1999 for 9- and 13-year-olds but remained essentially unchanged among 17-year-olds from 1969 to 1999. To place these gaps in perspective, in 1999 in mathematics, black students averaged about 30 points lower than did white students; in science, scores ranged from 39 to 52 points lower than those of white students, depending on the age level. These differences are roughly the same size as the differences between the average 13-year-old and 17-year-old in these subjects (figure 1-1).

Substantial gaps also exist between Hispanic and white students at each age level in both mathematics and science. Among 9-year-olds, the mathematics gap favoring white students widened between 1982 and 1999. Hispanic-white mathematics performance differences for 13- and 17-year-olds persist but have lessened over the past 3 decades. In science, the gaps are even larger. For 9-year-olds, the science gap did not narrow overall. The 1977 science gap for 13-year-olds narrowed during the 1980s and early 1990s, but by 1999 it had returned to nearly its 1977 size. The score difference between white and Hispanic 17-year-olds widened at several points in time, but by the end of the 1990s it was about the same as in the late 1970s. In 1999, white-Hispanic differences in average scale scores ranged from 22 to 26 points in mathematics and from 30 to 39 points in science (figure 1-3).

Racial/ethnic subgroups differ in several characteristics generally agreed to influence academic achievement. For example, the parents of black and Hispanic students have, on average, less education than the parents of white students, and black and Hispanic students are more likely to live in poverty (Peng, Wright, and Hill 1995). Economic hardship and low education levels can limit parents' ability to provide stimulating educational materials and experiences for their children (Hao 1995; Smith, Brooks-Gunn, and Klebanov 1997). Appendix table 1-3 illustrates the persistent achievement gaps between students whose parents have different levels of education.

Recent Performance in Mathematics and Science

Thus far, this section has presented NAEP results based on the long-term trend assessments, which use the same items each time. The next analysis uses data from the national NAEP program, which updates instruments to measure the performance of students based on more current standards. These assessments are based on frameworks developed through a national consensus process involving educators, policymakers, assessment and curriculum experts, and representatives of the public, then approved by the National Assessment Governing Board (NAGB).

NAEP first developed a mathematics framework in 1990, then refined it in 1996 (NCES 2001c).[2] It contains five broad content strands (number sense, properties, and operations; measurement; geometry and spatial sense; data analysis, statistics, and probability; and algebra and functions). The assessment also tests mathematics abilities (conceptual understanding, procedural knowledge, and problem solving) and mathematical power (reasoning, connections, and communication). Along with multiple-choice questions, assessments include constructed-response questions that require students to provide answers to computation problems or describe solutions in sentence form.

NAEP developed the science framework in 1991 and used it in the 1996 and 2000 assessments (NCES 2003c). It includes a content dimension divided into three major fields of science (earth, life, and physical) and a cognitive dimension covering conceptual understanding, scientific investigation, and practical reasoning. The science assessment also relies on both multiple-choice and constructed-response test questions. In addition, a subsample of students in each school conducts a hands-on task and answers questions related to that task.

Student performance on the national NAEP is classified according to three achievement levels developed by NAGB that are based on judgments about what students should know and be able to do. The basic level represents partial mastery of the knowledge and skills needed to perform proficient work at each grade level. The proficient level represents solid academic performance at grade level, and the advanced level signifies superior performance. Disagreement exists as to whether NAEP has appropriately defined these levels, but they do provide a useful benchmark for examining recent changes in achievement.[3]

The proportion of fourth and eighth grade students reaching at least the proficient level in mathematics increased by a few percentage points from 1996 to 2000, when just over one-fourth of fourth and eighth grade students scored at or above that level (NCES 2001c) (figure 1-4). Among 12th graders, only 17 percent reached that level. Approximately one-third of students at each grade level scored below the basic level in 2000. The proportion of fourth and eighth grade students scoring below the basic level decreased from 1996 to 2000, but the proportion for 12th graders increased.

In general, the 2000 science results mirror the mathematics results (NCES 2003c). Only a minority of students reached the proficient level, and at least one-third of students at each grade level did not reach the basic level. Among 12th graders, that figure approached half, an increase from 1996. Across both subjects, very few students performed at the advanced level (only 2 to 5 percent).

Mathematics and Science Proficiency for Males and Females

Like the NAEP long-term trend assessment, the national NAEP reports results by subgroup, which allows comparisons of achievement levels among different subgroups. In 2000, similar percentages of males and females in each grade reached at least the basic level in mathematics (figure 1-5). However, more males scored at or above the proficient level. The 2000 mathematics results show improvement over 1996 for both sexes in the percentage scoring at or above the basic level in grade 4, but a decline in grade 12 (appendix table 1-4).

The 2000 science results show that a greater percentage of males than females in both grades 4 and 8 attained at least the basic level, and higher percentages of males at each grade level scored at or above the proficient level. The period between 1996 and 2000 saw no significant change in the proportion of females scoring at or above basic, or at or above proficient. Males in grade 12 registered a decline in the percentage at or above the basic level, and males in grade 8 registered an increase in the percentage at or above proficient (appendix table 1-4).

Mathematics and Science Proficiency by Racial/Ethnic Subgroups

Variations in performance levels across racial/ethnic groups are more apparent than variations between males and females (figure 1-6). At each grade level in mathematics in 2000, higher proportions of white and Asian/Pacific Islander students (when scores for the latter group were reported) scored at or above the basic and proficient levels compared with black, Hispanic, and American Indian/Alaskan Native students. Among 12th grade students, 74 percent of white students and 80 percent of Asian/Pacific Islander students scored at or above the basic level, compared with 31 percent of blacks, 44 percent of Hispanics, and 57 percent of American Indians/Alaskan Natives. Overall, black students had the lowest percentage scoring both at or above the basic level and at or above the proficient level. Only one statistically significant change occurred from 1996 to 2000: the proportion of white fourth grade students scoring at or above the proficient level in mathematics increased (appendix table 1-5).

These differences in mathematics performance across racial/ethnic groups are evident even when children begin school (Denton and West 2002). Children from low-income and minority family backgrounds start kindergarten at a disadvantage in mathematics knowledge and skills. This disadvantage persists throughout kindergarten and into the first grade. By the first grade, black and Hispanic children are less likely than white children to be able to solve addition, subtraction, multiplication, and division problems, and children from poor families are also less likely than those from nonpoor families to demonstrate proficiency in these areas.

Similar racial/ethnic differences hold true for science. In 2000, higher percentages of white and Asian/Pacific Islander students scored at or above the basic level and at or above the proficient level at each grade level compared with their black, Hispanic, and American Indian/Alaskan Native counterparts. Black students at all grade levels were least likely to reach these performance goals. Only one statistically significant change occurred from 1996 to 2000: a decrease in the proportion of white 12th graders reaching or exceeding the basic level (appendix table 1-5).

Mathematics Achievement in High-Poverty Schools

Poverty is negatively associated with student achievement. Analyses of NAEP 2000 mathematics data show that fourth graders in schools with higher proportions of students eligible for the Free/Reduced-Price Lunch Program, a commonly used indicator of poverty, tend to have lower scores (NCES 2002a) (figure 1-7).[4] This pattern held for both eligible and ineligible students. These high-poverty schools also enrolled a greater percentage of black and Hispanic students and had higher rates of absenteeism, a lower proportion of students with a very positive attitude toward academic achievement, and lower levels of parent involvement in school activities (NCES 2002a).

International Comparisons of Mathematics and Science Performance

Two international assessment programs collected data on student performance in mathematics and science during the past decade. The 1995 Third International Mathematics and Science Study (TIMSS) involved 41 nations and studied the performance of fourth and eighth grade students as well as students in their final year of secondary school (12th grade in the United States). Four years later, a repeat study (TIMSS-R) focused on the performance of eighth graders in 38 countries. In 2000, the Program for International Student Assessment (PISA), organized by the Organisation for Economic Co-operation and Development (OECD), assessed 15-year-olds from 32 countries in reading, mathematics, and science.

The design and purpose of the two assessment programs differ somewhat (Nohara 2001). TIMSS and TIMSS-R measured students' mastery of curriculum-based scientific and mathematical knowledge and skills. PISA assessed students' scientific and mathematical "literacy," with the aim of understanding how well students can apply scientific and mathematical concepts and thinking skills to real-life challenges and nonschool situations. The TIMSS and TIMSS-R findings have been reported extensively, including in the two most recent editions of Science and Engineering Indicators (National Science Board 2000 and 2002). Therefore, this section only briefly reviews the main findings from TIMSS and TIMSS-R, and devotes more coverage to the PISA findings.

Achievement of Fourth and Eighth Grade U.S. Students on TIMSS and TIMSS-R

In 1995, U.S. students performed slightly better than the international average in mathematics and science in grade 4, but by grade 8, their relative international standing had declined, and it continued to erode through grade 12 (figure 1-8). Of the 25 other countries participating in the fourth grade component of the assessment, 12 had lower average mathematics scores than the United States, 6 had equivalent average scores, and 7 had higher average scores. In science, 19 countries had lower scores, 5 had equivalent scores, and 1 had a higher score. Not all nations participated in every component of the TIMSS assessment.

U.S. eighth graders scored below the international average in mathematics but above the international average in science (NCES 1997b). However, in eighth grade science, nine countries outperformed the United States, compared with only one in the fourth grade science assessment.

The fourth and eighth grade results from the 1995 TIMSS study suggest that U.S. students perform less well on international comparisons as they advance through school. TIMSS-R, by enabling comparisons between the relative international standing of U.S. fourth grade students in 1995 and U.S. eighth grade students 4 years later, tended to confirm this interpretation (NCES 2000b).

Achievement of 12th Grade U.S. Students on TIMSS

TIMSS assessed the mathematics and science performance of students in their final year of secondary school (12th grade in the United States).[5] It included a test of general knowledge of mathematics and science for all students and a more specialized assessment for students enrolled in advanced courses. U.S. 12th graders performed below the 21-country international average on the TIMSS test of general knowledge in mathematics and science (NCES 1998).

U.S. students taking advanced mathematics and science courses also did not fare well in comparison with their international counterparts. The advanced mathematics assessment was administered to students in 15 other countries who were taking or who had taken advanced mathematics courses and to U.S. students who were taking or who had taken precalculus, calculus, or Advanced Placement (AP) calculus. Among students who participated in the advanced assessment, U.S. students registered lower average scores compared with their international counterparts, even though the United States tends to have fewer young people taking advanced mathematics and science courses relative to other countries. A total of 11 nations outperformed the United States, and 4 nations scored similarly. No nation scored significantly below the United States.

TIMSS administered the advanced science assessment, a physics assessment, to students in 15 other countries who were taking science courses and to U.S. students who were taking or had taken physics I and II, advanced physics, or AP physics. U.S. students performed below the international average, with 14 countries having average scores higher than the United States, and 1, Australia, having an average score equivalent to that of the United States.

Mathematics and Science Literacy of U.S. 15-Year-Olds on PISA

OECD first conducted PISA in 2000 and plans two additional assessments at 3-year intervals (NCES 2001d). Although PISA 2000 concentrated on reading, it did include some mathematics and science items.

In both mathematics and science literacy, U.S. student performance did not differ from the average performance of students in the other OECD countries (appendix tables 1-6 and 1-7). Of the seven countries that had significantly higher average science scores, all also had higher average mathematics scores (Australia, Canada, Finland, Japan, New Zealand, South Korea, and the United Kingdom). In addition, Switzerland significantly outperformed the United States in mathematics. A common set of six countries had average scores significantly lower than the United States in both mathematics and science: Brazil, Greece, Latvia, Luxembourg, Mexico, and Portugal.

Subgroup Differences in Mathematics and Science Literacy

A recent report released by the U.S. Department of Education (NCES 2001d) considers PISA score differences by sex, parents' education, parents' occupation, parents' national origin, and language spoken in the home. Findings reveal no statistically significant sex difference among U.S. 15-year-olds in mathematics. This was also true for 16 other countries that participated in PISA; however, males outperformed females in mathematics in 14 countries. In science literacy, male and female students in the United States, as in most other nations, performed equally well. This absence of sex differences in mathematics and science literacy in the United States is generally consistent with findings from the NAEP, TIMSS, and TIMSS-R assessments, all of which assess more curriculum- and school-based achievement.

PISA also collected information on parents' education levels and occupation, both of which have been linked to student achievement (Coleman et al. 1966; NCES 2000b and 2001c; West, Denton, and Reaney 2000; and Williams et al. 2000). PISA data indicate that parents' education level and occupation are more strongly associated with mathematics and science literacy in the United States than in some other countries, although links between parents' education level and student achievement existed in all PISA countries (NCES 2001d). For example, in every country, students whose parents had college degrees outperformed students whose parents did not have a high school diploma. However, in only 12 of 29 countries, including the United States, did students whose parents graduated from college score higher in science literacy than students whose parents completed high school but not college. In the remaining countries, science performance did not differ between these two groups of students. Parents' occupation was also more strongly associated with student mathematics and science literacy in the United States than in some other PISA countries. In Finland, Iceland, Japan, Latvia, and South Korea, the relationship between parents' occupation and mathematics and science literacy was weaker than in the United States; for mathematics, the relationship was also weaker in Canada and Italy. No country had a stronger relationship than the United States between parents' occupation and student performance on PISA's mathematics and science portions.

Students who are foreign born or who have foreign-born parents face challenges in adjusting to a new country and a new school system. According to PISA data, approximately 13 percent of U.S. students have parents who were both born outside the United States. In about half of the participating countries that reported these data (15 of 26), including the United States, students whose parents were both native born scored significantly higher in mathematics. In the United States, no difference in science literacy by parent nativity existed, although differences did exist in 17 of 26 participating countries.

U.S. schools educate many students who speak a language other than English at home. In 19 of the 28 nations that reported data on students' home language, including the United States, students who spoke the language of the assessment at home scored better in mathematics literacy than students who did not. U.S. students registered a greater difference in mathematics performance by home language than the average OECD difference. In science, in 21 of 28 participating nations, including the United States, students who spoke the language of the assessment at home scored better than those who did not. Many PISA items impose a fairly high reading (and sometimes writing) load, which contributes to home language effects.

Footnotes

[1] The NAEP data are based on sample surveys. All trends and changes reported in this section are statistically significant at the .05 level.

[2] The revision to the 1990 framework reflects recent curricular changes, but assessments are connected to permit trend measurement through 2003. The 2005 assessment will have a new framework.

[3] A study commissioned by the National Academy of Sciences judged the process used to set these levels "fundamentally flawed" (Pellegrino, Jones, and Mitchell 1998), and NAGB acknowledges that considerable controversy remains over the setting of achievement levels (Bourque and Byrd 2000). NCES considers the achievement levels developmental and warns that they should be used and interpreted with caution (NCES 2001c). Because the levels are set by panels of experts separately by grade level and subject, meaningful comparisons across grades or subjects are not possible.

[4] Similar analyses were not conducted using the grade 8 and grade 12 data. Using participation in the Free/Reduced-Price School Lunch Program as a proxy for poverty level is not reliable at higher grades because older students may attach stigma to receiving a school lunch subsidy and choose not to participate.

[5] NAEP has identified problems related to testing 12th grade students (NCES 2001c). Compared with students in fourth and eighth grades, they are less likely to participate, more likely to omit responses, and much less likely to indicate that they thought it either important or very important to do well on the test. If students do not try their best, NAEP may underestimate their achievement. Whether similar patterns exist in other countries is not known.
