Trends in International Mathematics and Science Study (TIMSS)

Frequently Asked Questions About the Assessment

  1. Why does the United States participate in international assessments? We already know how well our students are doing based on results from state tests and from the National Assessment of Educational Progress.
  2. What international assessments do we participate in, and what do they measure?
  3. How valid are international assessments? Are test questions that are appropriate for students in one country necessarily appropriate for students in another country?
  4. How can you be sure that countries administer the test in the same way?
  5. How do international assessments deal with the fact that educational systems are so different—for example, students start school at different ages, are tracked or streamed differently, etc.?
  6. How do international assessments deal with the fact that student populations in participating countries are so different—for example, the U.S. has higher percentages of immigrant students than some other countries?
  7. What if countries select only their best students to participate? Won't they look better than the rest?
  8. Are schools and students required to participate in these assessments?
  9. How different are TIMSS assessments from what students are expected to learn in the classroom?
  10. Have there been changes in the countries participating in TIMSS?
  11. If the makeup of the countries changes across the years, how can one compare countries to the TIMSS scale average?
  12. Why does the U.S. report different findings for the same subjects on different assessments?
  13. Why do U.S. boys outperform girls in mathematics at grade 4 but not at grade 8, and U.S. boys outperform girls in science at grade 8 but not at grade 4? Why aren't differences between the sexes more consistent?
  14. How do the results of TIMSS 2007 compare with the results in other recent international studies, such as PISA 2006?
  15. How does the mathematics and science achievement of U.S. students on TIMSS compare with achievement on NAEP?
  16. How does the mathematics and science achievement of U.S. students on TIMSS compare with achievement on PISA?
  17. Can you use the international data to report scores for states?
  18. Can you directly compare scores at grade 4 to scores at grade 8?
  19. Why don't TIMSS, PISA, and PIRLS report differences between U.S. minority students and other countries' minority students?
  20. Where can I get a copy of the TIMSS U.S. Report?
  21. When is TIMSS next scheduled?

1. Why does the United States participate in international assessments? We already know how well our students are doing based on results from state tests and from the National Assessment of Educational Progress.

We want to know how our students perform compared with students in other countries.
Assessments are a common feature of school systems that are concerned about accountability and assuring students' progress throughout their educational careers. National assessments allow us to know how well students are doing in a variety of subjects and at different ages and grade levels compared to other students nationally or within their own state. International assessments, on the other hand, offer a unique opportunity to benchmark our students' performance to the performance of students in other countries. It is important to know how our students fare in an internationally competitive environment.

We can learn from the experiences of other countries.
We can learn from each other about the variety of approaches to schooling and identify promising practices and policies to consider for our own schools. We learn a great deal about patterns of achievement by looking at student achievement in an international context, including comparisons of average achievement and performance at the high and low ends of the distribution within and across countries.


2. What international assessments do we participate in, and what do they measure?

The United States participates in:

TIMSS – Trends in International Mathematics and Science Study.
TIMSS is a study of student performance in mathematics and science at the fourth and eighth grades. TIMSS provides trend data on a four-year cycle, beginning in 1995, on these subjects as they are commonly taught. Through participation in TIMSS, the United States has gained reliable and timely data on the mathematics and science achievement of our students compared to that of students in other countries. TIMSS is organized by the International Association for the Evaluation of Educational Achievement (IEA), working with an increasing number of countries across the world. TIMSS data were most recently collected in 2007, and were released on December 9, 2008.

PISA – Program for International Student Assessment.
PISA is a system of international assessments that focus on 15-year-olds' capabilities in reading literacy, mathematics literacy, and science literacy. PISA also includes measures of general or cross-curricular competencies such as learning strategies. PISA emphasizes functional skills that students have acquired as they near the end of mandatory schooling. PISA is organized by the Organization for Economic Cooperation and Development (OECD), an intergovernmental organization of industrialized countries. PISA was administered in 2000, 2003, and 2006, and will be administered again in the fall of 2009.

PIRLS – Progress in International Reading Literacy Study.
The PIRLS study focuses on the achievement and reading experiences of children in fourth grade. The study includes a written test of reading comprehension and a series of questionnaires focusing on the factors associated with the development of reading literacy. PIRLS was conducted in 2001 and 2006, and will be conducted again in 2011. PIRLS is organized by the International Association for the Evaluation of Educational Achievement (IEA) with national sponsors in each participating education system (most education systems are countries, but subnational entities, such as some Canadian provinces, participate as well).


3. How valid are international assessments? Are test questions that are appropriate for students in one country necessarily appropriate for students in another country?

Test questions for each assessment are developed in a collaborative, international process.
For each study, an international subject area expert group is convened by the organization conducting the assessment. This expert group drafts an initial framework, which reflects a multinational consensus on the assessment of a subject area. Based on the framework, the national representatives from each country review every test item to be included in the assessment. While not every item may be equally familiar to all students, if any questions are considered inappropriate or offensive for a participating country or an identified subgroup within a country, that item is eliminated.

Test items are field-tested prior to administration.
Before the administration of the assessment, a field test is conducted in the participating countries. An expert panel convenes after the field test to review the results and look at the items to see if any were biased due to national, social or cultural differences. If such items exist, they are not included in the full assessment. Only after this thorough process, in which every participating country is involved, are the actual items administered to the students.

There is an extensive translation verification process.
Each participating country is responsible for translating the assessment into its own language or languages, unless the original test items are already in the language of the country. Each country identifies translators to translate the English (and sometimes French) source versions into its own language. External translation companies independently review each country's translations. Instruments are verified twice, once before the field test and again before the main data collection. Statistical analyses of the item data are then conducted to check for evidence of differences in student performance across countries that could indicate a translation problem. If a translation problem with an item is discovered in the field test, the item is removed from the full assessment.


4. How can you be sure that countries administer the test in the same way?

Procedures for administration are standardized and independently verified.
These assessments are designed, developed and implemented by international organizations that have extensive experience in international assessments and data collection projects. These coordinating organizations produce a number of manuals that are provided to each country's representative for the administration of the assessment. These manuals specify standardized procedures that all countries must follow on all aspects of assessment sampling, preparation, administration, and scoring. To further ensure standardization, independent international quality control monitors visit a sample of schools in each country. In addition, the countries themselves organize their own quality control monitors to visit an additional number of schools.


5. How do international assessments deal with the fact that educational systems are so different—for example, students start school at different ages, are tracked or streamed differently, etc.?

Target populations are comparable across countries.
The fact that education systems differ across countries is one of the main reasons we are interested in making comparisons among them. However, these differences make it important to carefully designate the populations to be compared, so that comparisons are as fair and valid as possible. Depending in large part on when students first start school, students of a given age may have more or less schooling in different countries, and students in a given grade may be of different ages in different countries.

In TIMSS, the two target populations are defined as follows: all students enrolled in the grade that represents 4 years of schooling (fourth grade in most countries), provided that the mean age at the time of testing is at least 9.5 years; and all students enrolled in the grade that represents 8 years of schooling (eighth grade in most countries), provided that the mean age at the time of testing is at least 13.5 years. At grade four in 2007, only England, Scotland, and New Zealand included students who had 5 years of formal schooling at the time of testing. At grade eight, England, Malta, Scotland, and Bosnia and Herzegovina included students who had 9 years of formal schooling at the time of testing. In addition, at grade eight, the Russian Federation and Slovenia included some students who had fewer than 8 years of formal schooling. However, in all of these cases, the assessed students were of comparable average age to those in other participating countries.

For PIRLS 2006, the target population represents students in the grade that corresponds to four years of schooling, counting from the first year of International Standard Classification of Education (ISCED) Level 1 – fourth grade in most countries, including the United States. This population represents an important stage in the development of reading.

Another approach, used in PISA, is to assess students of a particular age (15), regardless of grade. Both approaches are suited to addressing particular research questions posed by the assessments. The focus of TIMSS and PIRLS is on content as commonly taught in classrooms, while PISA emphasizes the skills and knowledge that students have acquired throughout their education both in and out of school.


6. How do international assessments deal with the fact that student populations in participating countries are so different—for example, the U.S. has higher percentages of immigrant students than some other countries?

International assessments help us understand similarities and differences across countries.
Student population characteristics may explain part of why differences in achievement occur, along with differences in curriculum, teacher preparation, and other educational or societal factors. In addition, student populations in other countries are perhaps not as different as you might think. For example, in the TIMSS 2007 fourth-grade sample, Algeria, Australia, Hong Kong, Kuwait, New Zealand, Qatar, and Singapore had percentages of immigrant students greater than that of the United States (15 percent). In some of these cases, the immigrant students were assessed in a language other than their native language. For example, in Singapore, students were assessed in English, not in their native language.


7. What if countries select only their best students to participate? Won't they look better than the rest?

Sampling of schools and students is carefully planned and monitored.
Students in each country are selected from a national probability sample of all students in the particular grade or of a particular age. The rules of participation require that countries submit a sampling plan for international approval. Subsequent international quality control procedures ensure that the approved sampling plan is implemented properly.

Once a sample of schools is selected and the schools agree to participate, they are asked to provide a list of students of a particular age within the school or a list of classes of a particular kind (for example, 8th-grade math classes or 4th-grade classrooms). From those lists, a group or whole class of students is then randomly selected for assessment. Each study establishes a set of guidelines for excluding individual students from assessment; typically, a student with a verifiable mental or physical disability can be excluded. However, total student exclusions (at the school level and within schools) cannot exceed established levels, and exclusion rates are reported in international publications.

Samples for each country are verified by an international sampling referee. Once its sample is selected, each country must contact the sampled schools to solicit participation in the assessment. Every assessment establishes response rate targets for selected schools and students that countries must meet in order to have their data reported. If the response rate target is not met, countries may be able to assess students from replacement schools, following international guidelines. For example, TIMSS guidelines specify that substitute schools be identified at the time the original sample is selected, by designating the two schools neighboring each sampled school on the sampling frame as its substitutes. If an original school refuses to participate, a substitute school is contacted, although there are several constraints on the use of substitute schools in order to prevent bias. If participation levels, even using substitute schools, still fall short of international or national guidelines, a special nonresponse bias analysis is conducted to determine whether the schools that did not participate differ systematically from those that did. If the analysis shows no evidence of bias, the data for a country may still be included in the reported results of the international assessment, but the problem with participation rates is noted.
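The neighbor-based substitute rule can be sketched in a few lines. This is only an illustration of the idea, not official TIMSS code; the function name and school labels are made up.

```python
# Illustrative sketch of TIMSS-style substitute assignment (not official code).
# Schools are listed in sampling-frame order; each sampled school's two
# frame-neighbors are pre-assigned as its substitutes, skipping any neighbor
# that was itself sampled.

def assign_substitutes(frame, sampled_indices):
    """Map each sampled school to its neighbors on the sampling frame."""
    subs = {}
    for i in sorted(sampled_indices):
        neighbors = []
        if i - 1 >= 0 and (i - 1) not in sampled_indices:
            neighbors.append(frame[i - 1])
        if i + 1 < len(frame) and (i + 1) not in sampled_indices:
            neighbors.append(frame[i + 1])
        subs[frame[i]] = neighbors
    return subs

frame = ["A", "B", "C", "D", "E", "F", "G"]  # hypothetical sampling frame
print(assign_substitutes(frame, {1, 4}))
# {'B': ['A', 'C'], 'E': ['D', 'F']}
```

Because substitutes sit next to the sampled school on the ordered frame, they tend to resemble it on whatever characteristics the frame was sorted by, which is what limits the bias a refusal can introduce.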

Across the participating countries in 2007, the net enrollment ratio of students in school (that is, the ratio of children of official school age who are enrolled in school to the population of the corresponding official school age) ranged from 66 to 100 percent at primary school and 37 to 100 percent at secondary school. In general, developing countries have lower net enrollment ratios than developed countries. Based on data compiled by the United Nations Educational, Scientific, and Cultural Organization (UNESCO) Institute for Statistics, the United States had a net enrollment ratio of 92 percent in primary school and 88 percent in secondary school in 2007 (UNESCO Institute for Statistics 2007)1. The range among countries that outperformed the United States at grade four (in mathematics or science) was 90 to 100 percent (Singapore did not report data). At grade eight, the range among countries that outperformed the United States (in mathematics or science) was 86 to 100 percent (again, Singapore did not report data).
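The net enrollment ratio defined above is simple arithmetic, as this sketch shows. The enrollment and population counts here are hypothetical, chosen only to illustrate the calculation.

```python
# Net enrollment ratio: children of official school age who are enrolled,
# divided by the total population of that official school age, as a percent.

def net_enrollment_ratio(enrolled_of_official_age, population_of_official_age):
    return 100 * enrolled_of_official_age / population_of_official_age

# Hypothetical figures: 4.6 million children of primary-school age enrolled,
# out of a population of 5.0 million children of that age.
print(round(net_enrollment_ratio(4_600_000, 5_000_000)))  # prints 92
```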

Detailed information on net enrollment ratios for countries participating in TIMSS 2007 can be found in exhibit 3 of the TIMSS 2007 Encyclopedia (Mullis et al. 2008).


8. Are schools and students required to participate in these assessments?

To our knowledge, few, if any, countries require all schools and students to participate in TIMSS. However, some countries give more prominence to these assessments than do others. In the United States, TIMSS is a voluntary assessment.


9. How different are TIMSS assessments from what students are expected to learn in the classroom?

The TIMSS assessment is curriculum-based and is designed to assess what students have been taught in school about mathematics and science.


10. Have there been changes in the countries participating in TIMSS?

In TIMSS the makeup of the participating countries has changed somewhat over the four cycles between 1995 and 2007, as some countries have dropped out and others have joined.
In 2007, more than 60 separate nations participated in TIMSS. TIMSS also allows subnational entities to participate as full partners in the assessment. The subnational entities that participated in TIMSS 2007 are: the Basque Country in Spain, four Canadian provinces (Alberta, British Columbia, Ontario, and Quebec), Dubai, and two states in the United States (Massachusetts and Minnesota). In the case of the Canadian provinces, the Basque Country, and Dubai, the larger nation in which they are located chose not to participate. In the case of Massachusetts and Minnesota, students in these states were eligible for participation in the U.S. national sample as well as in the separate samples that these states drew for the study.


11. If the makeup of the countries changes across the years, how can one compare countries to the TIMSS scale average?

Achievement results from TIMSS are reported on a scale from 0 to 1000, with a TIMSS scale average of 500 and a standard deviation of 100. The scale is based on the 1995 results, and the results of all subsequent TIMSS administrations have been placed on this same scale. This allows countries to compare their performance over time as well as against a set standard, the TIMSS scale average. Across TIMSS administrations, the average score of the participating countries varies with the makeup of those countries, as well as with progress in education systems. The TIMSS scale average, however, is a fixed standard.
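The distinction above can be illustrated with made-up numbers: the average across whichever countries happen to participate moves as the mix changes, while the TIMSS scale average stays anchored at 500.

```python
# Hypothetical country averages on the TIMSS scale (illustrative numbers only).
TIMSS_SCALE_AVERAGE = 500  # fixed standard, anchored to the 1995 scaling

cycle_a = {"Country1": 520, "Country2": 480, "Country3": 460}
cycle_b = {"Country1": 520, "Country2": 480, "Country4": 560}  # makeup changed

def mean(scores):
    """Average score across the countries that participated in a cycle."""
    return sum(scores.values()) / len(scores)

print(round(mean(cycle_a), 1))  # 486.7, below the fixed 500
print(round(mean(cycle_b), 1))  # 520.0, above 500 only because the mix changed
```

Comparing each country to the fixed 500 therefore gives a stable benchmark across cycles, whereas comparing to the average of that cycle's participants would not.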


12. Why does the U.S. report different findings for the same subjects on different assessments?

While these different assessments may appear to have significant similarities, each was designed to serve a different purpose and each is based on a separate and unique framework and set of items. Thus, not surprisingly, there may be differences in results for a given year or in trend estimates among the studies, each giving a slightly different view into U.S. students' performance in these subjects. Possible differences can stem from a number of sources. The goals of the assessments have some subtle but important distinctions in regard to U.S. curricula. For the international assessments, the groups of countries in the comparisons may be different. The students being studied may represent different grade or age groups. The reading, mathematics, and science being assessed may be different in terms of content coverage and item format.


13. Why do U.S. boys outperform girls in mathematics at grade 4 but not at grade 8, and U.S. boys outperform girls in science at grade 8 but not at grade 4? Why aren't differences between the sexes more consistent?

The seeming inconsistencies between the achievement scores of U.S. boys and girls in mathematics and science are not easily explainable. Research into differences in achievement by sex has been unable to offer any definitive explanation for these differences. For example, Xie and Shauman (2003), in examining sex differences primarily at the high school level, find that "differences in math and science achievement cannot be explained by the individual and familial influences that we examine." Indeed, that sex differences vary in the participating TIMSS countries—some in favor of males and others in favor of females—would appear to support the idea that the factors related to sex differences in mathematics and science achievement are complicated.


14. How do the results of TIMSS 2007 compare with the results in other recent international studies, such as PISA 2006?

TIMSS and PISA differ in a number of important, but complementary ways. Direct comparisons are not very meaningful.
While both international studies measure the mathematics and science achievement of students, they do this in somewhat different ways in different sets of countries for different sets of students. TIMSS focuses on the mathematics and science achievement of students in the fourth and eighth grades. The assessment draws its content directly from the school curriculum and is designed to assess how well students have learned what they have been taught. TIMSS emphasizes the links between achievement, mathematics and science curricula, and classroom practices.

PISA aims to assess the mathematics and science literacy of students near the end of their compulsory schooling. The intent is to measure the "yield" of education systems--the skills and competencies acquired and applied in real-world contexts by students at age 15. The literacy concept emphasizes the mastery of processes, understanding of concepts, and application of knowledge. PISA draws not only from school curricula but also from learning that may occur outside of school. PISA does not explicitly examine mathematics and science curricula and classroom practices, though it does collect information on school resources.

TIMSS assesses students in fourth and eighth grades and selects whole classrooms of students for this purpose. PISA assesses a sample of 15-year-olds in each school. These students range across several grades in most countries. While about 60 nations participate in each study, PISA focuses on the 30 OECD-member nations, treating the non-OECD jurisdictions separately. Comparing these 30 PISA nations with the 60 TIMSS nations highlights the different makeup of each study. For example: European countries make up about two-thirds of all PISA countries but only one-third of TIMSS countries; and, Middle-Eastern countries comprise about 3 percent of all PISA countries but 25 percent of TIMSS countries. About 25 percent of TIMSS countries also participate in PISA, and about one-half of PISA countries are in TIMSS as well. The U.S. participates in both studies.


15. How does the mathematics and science achievement of U.S. students on TIMSS compare with achievement on NAEP?

Both TIMSS and NAEP provide a measure of fourth- and eighth-grade mathematics and science. Both PISA and NAEP provide measures of mathematics and science performance for older students (grade 12 and 15 years old, respectively). It is natural to compare them, but the distinctions described in other Frequently Asked Questions about TIMSS need to be kept in mind in understanding the converging or diverging results.

MATHEMATICS: The most recent results from NAEP and TIMSS include information on trends over time in fourth- and eighth-grade mathematics for a largely similar time interval: in NAEP between 1996 and 2007 and in TIMSS between 1995 and 2007. For both grades, the trends shown by NAEP and TIMSS are largely consistent with one another.

Both assessments showed statistically significant increases in the mathematics performance of fourth- and eighth-grade students: overall, among boys, and among girls.

NAEP also reported increases for each of four racial/ethnic groups (White, Black, Hispanic, and Asian), for students at the top and bottom extremes of the distribution (at the 10th and 90th percentiles), and for students receiving free and reduced-price lunch, at both grades.2 TIMSS detected increases in mathematics performance only for some of these groups (e.g., White and Black students in both grades, students at the 10th percentile in both grades) and no change for others (e.g., Hispanic fourth-grade students). This is likely due to NAEP's larger sample sizes, which make it more sensitive than TIMSS to small changes among nationally relevant subgroups; TIMSS is designed primarily to detect differences among countries.

SCIENCE: The most recent results from NAEP and TIMSS also provide trend information for fourth- and eighth-grade science, although covering a slightly shorter time interval in NAEP than in TIMSS. NAEP provides trends for the period 1996 to 2005 and TIMSS for the period 1995 to 2007. Compared with mathematics, the trends shown by NAEP and TIMSS in science are less consistent with one another, which may not be surprising given the differing time periods and the relatively greater differences in the assessments discussed in the previous sections.

For example, in fourth grade, NAEP shows that there was an increase in students' science performance both overall and among boys between 1996 and 2005, whereas TIMSS did not detect any change in performance for either of those groups from 1995 to 2007.

NAEP also reported increases in science performance for four of five racial/ethnic subgroups3 whereas TIMSS reported increases only for Black and Asian students in the fourth grade. At the eighth-grade level, neither NAEP nor TIMSS showed any change in science performance among students overall. But in contrast to the fourth-grade results, TIMSS reported increases for Black, Hispanic, and Asian eighth-grade students, whereas NAEP reported increases only among Black students. This suggests that Hispanic and Asian eighth-grade students performed relatively better on the content unique to TIMSS than on the content unique to NAEP.


16. How does the mathematics and science achievement of U.S. students on TIMSS compare with achievement on PISA?

As we have seen, the TIMSS 2007 results at 8th grade, the grade closest to the age of the PISA students, showed U.S. average scores higher than the TIMSS scale average in both mathematics and science. In PISA 2006, the average scores of U.S. 15-year-old students were below the OECD average—the average score of students in the 30 Organization for Economic Cooperation and Development countries. How do we reconcile the apparent differences?

The differences are difficult to reconcile, but also difficult to compare, because the assessments are so different in at least three key ways that could influence results. First, TIMSS assesses 8th- and 4th-graders, while PISA is an assessment of 15-year-old students, regardless of grade level. In the United States, PISA data collection occurs in the autumn, when most 15-year-olds are in 10th grade. So, the grade levels of students in PISA and TIMSS differ. Second, the knowledge and skills measured in the two assessments differ. PISA is focused on application of knowledge to "real-world" situations, while TIMSS is more academically designed and is intended to measure how well students have learned the mathematics and science curricula in participating countries. Third, the partner countries in the two assessments differ. Both assessments cover much of the world, but the overlap between them is not complete. For instance, 26 of the 48 countries that participated in TIMSS 2007 at the 8th grade level participated in PISA 2006. Both assessments include key economic competitors and partners, but the overall makeups of the countries participating in the two assessments differ.


17. Can you use the international data to report scores for states?

The U.S. data are representative of the nation as a whole and are not used to report scores for states, unless states elect to participate as individual jurisdictions.
Drawing a sample that would be state-representative would require a much larger sample, requiring considerable amounts of additional time and money. For TIMSS 2007 this means that statistics for the U.S. as a whole can be reported, along with statistics for Massachusetts and Minnesota, two states that independently funded their participation.


18. Can you directly compare scores at grade 4 to scores at grade 8?

The scaling of TIMSS data is conducted separately for each grade and each content domain. While the scales were created to each have a mean of 500 and a standard deviation of 100, the subject matter and the level of difficulty of items necessarily differ between the assessments at both grades. Therefore, direct comparisons between scores across grades should not be made.


19. Why don't TIMSS, PISA, and PIRLS report differences between U.S. minority students and other countries' minority students?

There are certain demographic characteristics that are not easy to collect across countries. Race/ethnicity is one of these. Moreover, even if the data were collected by all countries, comparisons might not be meaningful because the makeup of minority populations differs from country to country.


20. Where can I get a copy of the TIMSS U.S. Report?

The U.S. TIMSS 2007 report can be downloaded or viewed online from the NCES website. Printed copies of the report can be obtained by ordering online at http://www.edpubs.org, by calling 1-800-4ED-PUBS, or by writing to the U.S. Department of Education, ED Pubs, PO Box 1398, Jessup, MD 20794-1398. Reference report number NCES 2009001.


21. When is TIMSS next scheduled?

TIMSS is next scheduled for spring 2011. Reporting of TIMSS 2011 results will occur at the end of 2012.



1 UNESCO Institute for Statistics. (2007). Global education digest 2007: Comparing education statistics across the world. Montreal: Author. Available at http://www.uis.unesco.org/template/pdf/ged/2007/EN_web2.pdf.
2 There was one exception: there was no change in the performance of Asian eighth-grade students, although this was calculated over a different time period (1992 to 2007) than the other NAEP trends.
3 NAEP's race/ethnicity categories include: White, Black, Hispanic, Asian/Pacific Islander, and American Indian/Alaska Native—the first four of which saw the increases in science performance between 1996 and 2005 referred to in this section. Race categories exclude Hispanic origin.
