A Content Examination of the TIMSS Items
By Jianjun Wang

With all due respect to the distinguished scholars of the TIMSS Subject Matter Advisory Committee, Mr. Wang notes that not all TIMSS items are true reflections of student achievement in mathematics and science.

Illustration © 1998 by Deborah Zemke
The Third International Mathematics and Science Study (TIMSS) is a collaborative research project sponsored by the International Association for the Evaluation of Educational Achievement (IEA). With more than 40 nations participating and five grade levels assessed in two subject areas, TIMSS is the largest and most ambitious study of comparative educational achievement ever undertaken. However, no general consensus has been reached in the research community on the interpretation of the TIMSS results.1
So far, the debate has focused mainly on confounding variables such as differences in curricula, influences of cultural traditions, and differences in school enrollments. Few researchers have closely examined the academic content of the TIMSS instrument. This omission can be partly explained by the endorsement of the items by the TIMSS Subject Matter Advisory Committee, which included "distinguished scholars from 10 countries."2 In this article I take a close look at the items that have been released to the public.
The development of the TIMSS instrument was documented in a technical report.3 In the past, international comparisons were based on the results from multiple-choice tests; in TIMSS, about one-third of students' response time was devoted to free-response questions.
To date, two-thirds of the TIMSS items have been disseminated on a website (http://www.csteep.bc.edu/timss1/database.html). I have examined these items and wish to share five potential problems that I have discovered. Since the TIMSS results have drawn considerable attention in the United States, my examination might prove useful to policy makers and the American public and should cause them to take a more thoughtful approach to some of the TIMSS outcomes.
Not all free-response scores reflect student science achievement. By definition, a free-response item admits a range of answers, and the answer type is a nominal variable, with a separate category for each kind of response. Because "different" says nothing about "better" or "worse," a natural dilemma emerges: how should higher or lower scores be assigned to different student responses?
According to the TIMSS technical report cited above, specific rubrics were developed for each free-response item, using a two-digit coding system. The first digit indicated the correctness score (including levels of partial credit), and the second digit recorded diagnostic information about the specific type of response. Each response was scored on its correctness, and its answer type was recorded separately. In this way the two-digit scheme avoided the general problem of ranking answer types within the item score.4 However, the scoring rubrics themselves contain problems. For instance, one TIMSS question reads as follows:
The water level in a small aquarium reaches up to a mark A. After a large ice cube is dropped into the water, the cube floats and the water level rises to a new mark B. What will happen to the water level as the ice melts? Explain your reasoning. (Item #G11, http://www.csteep.bc.edu/timss1/Items.html)
This item was included in the 12th-grade physics test. Without some information about the temperature at which the experiment takes place, it is unanswerable. Melting could take several hours, and over that time evaporation could have a significant effect on the water level. Students who ignored evaporation would reach the desired answer of "the same level." On the other hand, students who considered evaporation could well decide that the water would end up at "a lower level." Indeed, the second answer seems likely to come from the more thoughtful students. According to the TIMSS scoring rubrics, however, the second answer is deemed incorrect and would earn zero credit!
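The two-digit scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the TIMSS scoring software. As the technical report is summarized above, it assumes that the first digit carries the credit level and the second a diagnostic label. The helper names (`split_code`, `item_score`, `diagnostic_profile`) are hypothetical, and the response codes are invented rather than taken from any TIMSS rubric. What the sketch shows is the limit the aquarium item exposes: the diagnostic digit records how a student answered, but only the first digit reaches the score.

```python
from collections import Counter


def split_code(code: int) -> tuple[int, int]:
    """Split a two-digit response code into (credit digit, diagnostic digit).

    Assumption for illustration: the tens digit is the credit level and
    the units digit labels the type of response.
    """
    if not 0 <= code <= 99:
        raise ValueError(f"expected a two-digit code, got {code}")
    return divmod(code, 10)


def item_score(codes: list[int]) -> int:
    """Total credit for an item: only the first (credit) digit counts."""
    return sum(split_code(code)[0] for code in codes)


def diagnostic_profile(codes: list[int]) -> Counter:
    """Tally (credit, diagnostic) pairs so different answer types stay distinct."""
    return Counter(split_code(code) for code in codes)


# Hypothetical codes for one free-response item (invented, not TIMSS data).
# 20 and 21: two different full-credit explanations.
# 10 and 12: two different partial-credit explanations.
# 00 and 03 (written 0 and 3 as integers): two different zero-credit
# answers, e.g. an unrelated answer versus a reasoned one the rubric rejects.
responses = [20, 21, 21, 10, 12, 0, 3]

print(item_score(responses))                  # 8
print(diagnostic_profile(responses)[(2, 1)])  # 2
print(split_code(3))                          # (0, 3)
```

Two students who give the same keyed answer with different explanations share a first digit and differ only in the second. A response that the rubric places among the zero-credit codes adds nothing to the score, however careful the reasoning that its diagnostic digit records.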
Not all science items have only one correct response. While the aquarium item shows TIMSS scoring that missed a correct answer, other TIMSS items were potentially confusing because they listed more than one correct choice. The following item is quoted from the TIMSS science test for the third/fourth-grade level:
Seeds develop from which part of a plant?
A. Flower
B. Leaf
C. Root
D. Stem
(Item #P9, http://www.csteep.bc.edu/timss1/Items.html)
A good deal of information about seeds and plants is presented in many widely used elementary science textbooks, which typically introduce two groups of seed plants. Specifically, according to George Mallinson and his colleagues, "One of these groups is made up of seed plants that have cones. . . . The second group of seed plants is made up of seed plants that have flowers."5 Thus even elementary school students could know enough to recognize that "flower," the response coded as correct, is not the only correct answer. Although "cones" is not among the responses, these seed-bearing parts of gymnosperms develop from the stem of the plant. Therefore, better science students had to choose between response A and response D. And to receive credit, they had to choose correctly!
Not all TIMSS scores are grounded in students' levels of cognitive development. One question in the TIMSS science test at the third/fourth-grade level states:
The Sun is bigger than the Moon, but they appear to be about the same size when you look at them from the Earth. Why is this?
(Item #Y1, http://www.csteep.bc.edu/timss1/Items.html)
Clearly, the answer hinges on the fact that the Sun is farther from the Earth than the Moon is. However, according to the TIMSS coding, a student who "refers to the sun being higher up than the moon" receives a score of zero. Yet many third- and fourth-graders use "higher" and "farther" interchangeably to describe differences in distance in the sky. Using language in this way is typical for students at the level of "concrete operations."6 Unfortunately, the TIMSS grading system failed to take into account students' level of cognitive development.
Not all TIMSS items reflect collaboration between mathematics and science educators. TIMSS is the only IEA project that assessed mathematics and science achievement at the same time. Indeed, the TIMSS items covered mathematics applications in science. However, not all the items are free of misconceptions. Consider, for example, the following item from the seventh/eighth-grade mathematics test:
A chemist mixes 3.75 milliliters of solution A with 5.625 milliliters of solution B to form a new solution. How many milliliters does this new solution contain?
(Item #K2, http://www.csteep.bc.edu/timss1/Items.html)
The answer, according to the TIMSS coding, is 9.375 ml. Apparently, the item writer assumed that when any two solutions are mixed, their volumes simply add. However, this is not true in many cases. The discrepancy matters all the more when the answer is given to three decimal places. For instance, if one solution is mainly alcohol and the other mainly water, the combined volume would be less than 9.375 ml.
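The arithmetic behind the keyed answer, and the correction that real mixing requires, can be written out as follows. The symbols $V_A$, $V_B$, and $\Delta V_{\text{mix}}$ are introduced here for illustration only. In general the sign and size of $\Delta V_{\text{mix}}$ depend on the liquids being mixed; for alcohol and water it is negative, which is why the combined volume falls short of the simple sum.

$$
V_{\text{keyed}} = V_A + V_B = 3.75\ \text{ml} + 5.625\ \text{ml} = 9.375\ \text{ml}
$$

$$
V_{\text{actual}} = V_A + V_B + \Delta V_{\text{mix}}, \qquad \Delta V_{\text{mix}} < 0 \ \text{for alcohol and water} \;\Rightarrow\; V_{\text{actual}} < 9.375\ \text{ml}
$$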
While this item is certainly incorrect as stated, some readers might conclude that my complaint is merely a quibble. And it is true that many middle-schoolers would not have the background to think so deeply about the item. Yet in my own middle school, the teacher conducted just such a demonstration. Because my secondary schooling took place in China, however, I consulted John Staver to see what he thought of giving such a question to students in the U.S. He directs the Center for Science Education at Kansas State University and is president-elect of the Association for Education of Teachers of Science. He concurred with my view that such poorly conceived mathematics problems could lead students to develop misconceptions in science. We also agreed that such items could have been avoided in TIMSS, since both mathematics and science educators were involved.
The TIMSS findings have drawn great public interest, as they should, for the study cost U.S. taxpayers some $51 million. Many researchers continue to debate both the "horse race" aspect of TIMSS and the value of the information the study gathered about mathematics and science education in the developed world. Yet few have questioned the instruments used. With all due respect to the distinguished scholars of the TIMSS Subject Matter Advisory Committee, my examination of the content of the tests indicates that not all TIMSS items are true reflections of student achievement in mathematics and science.
2. Albert Beaton et al., Mathematics Achievement in the Middle School Years (Chestnut Hill, Mass.: Boston College, 1996), p. A-9.
3. Michael O. Martin and Dana L. Kelly, eds., Third International Mathematics and Science Study: Technical Report (Chestnut Hill, Mass.: Boston College, 1996).
4. Svein Lie, Alan Taylor, and Maryellen Harmon, "Scoring Techniques and Criteria," in Martin and Kelly, pp. 7-1 to 7-16.
5. George Mallinson et al., Science (San Carlos, Calif.: Silver Burdett, 1984), p. 29.
6. Jean Piaget, Noam Chomsky, and Massimo Piattelli-Palmarini, Language and Learning: The Debate Between Jean Piaget and Noam Chomsky (Cambridge, Mass.: Harvard University Press, 1980).
Kappan Professional Journal
Last updated 9 September 1998
URL: http://www.pdkintl.org/kappan/kwan9809.htm
Copyright 1998 Phi Delta Kappa International