The Uses and Limits of Performance Assessment
By Elliot W. Eisner
In his introduction, Mr. Eisner, who served as guest editor of this special section, puts performance assessment into a broad educational and social context.
Illustration © 1999 by Karen Stolper
PERFORMANCE assessment is one of the "hot topics" on the agenda of education reform -- and for good reason. Performance assessment is a closer measure of our children's ability to achieve the aspirations we hold for them than are conventional forms of standardized testing. Indeed, our educational aspirations have been influenced by the fact that our children will inhabit a world requiring far more complex and subtle forms of thinking than children needed three or four decades ago. For example, our children will need to know how to frame problems for themselves, how to formulate plans to address them, how to assess multiple outcomes, how to consider relationships, how to deal with ambiguity, and how to shift purposes in light of new information.
These modes of thought will be critical in a society in which citizens are apt to change vocations several times during the course of a worklife,1 in which mobility has increased and new forms of adaptability are required, and in which choosing a course of action requires the consideration of diverse and sometimes conflicting information. No longer will most jobs, particularly those that are the most desirable, require the use of routine skills and rote memory.2 As Edward Haertel informs us in his article in this special section, these changing expectations for the outcomes of education reflect a nonbehaviorist view of human nature. When learning was conceived of as the acquisition and aggregation of reinforced units of information, "practice makes perfect" could serve as a guiding principle for teaching. The kind of thinking that students are now being encouraged to engage in requires much more than what Edward Thorndike, the father of American psychological connectionism, dreamed of. Context matters, judgment counts, and the opportunity to act in order to try out one's speculations is of critical importance.
The demise of behaviorism and the emergence of constructivism in our view of human nature are not the only sources of our changing conception of children and education. We have come to realize that meaning matters and that it is not something that can be imparted from teacher to student. In a sense, all teachers can do is to "make noises in the environment." By this I mean that we have in education no main line into the brains of our students. We are shapers of the environment, stimulators, motivators, guides, consultants, resources. But in the end, what children make of what we provide is a function of what they construe from what we offer.3 Meanings are not given, they are made. And we are interested in enabling students to make their activities in school meaningful, not merely because of the grades they receive but, more important, because of the satisfactions and insights their efforts make possible.
We have also come to realize that the kinds of meanings that our students can make are related to the forms of representation they can employ themselves or can decode when others have used them.4 Each of the forms of representation that exist in our culture -- visual forms in art, auditory forms in music, quantitative forms in mathematics, propositional forms in science, choreographic forms in dance, poetic forms in language -- is a vehicle through which meaning is conceptualized and expressed. A life driven by the pursuit of meaning is enriched when the meanings sought and secured are multiple.
In addition to these considerations, we have also begun to recognize that the aim of schooling is not merely to enable our children to do well in school. The stakes are considerably higher. What we are after is to enable our children to do well in life outside of school; the scores generated by the kinds of tests we have been using are proxies, but, alas, we have found that as proxies they are most useful for making inferences about the scores students are likely to receive on other tests.5
We want more. What we want is an approach to assessment that possesses what psychometricians call concurrent or predictive validity. That is, we want test scores to tell us about how students address tasks beyond the classroom.
These factors -- the virtual demise of behaviorism, the emergence of constructivism, the importance of meaning, the desire for concurrent and predictive validity -- have provided the ground for interest in performance assessment. Despite the lack of a single definition, performance assessment is aimed at moving away from testing practices that require students to select the single correct answer from an array of four or five distractors to a practice that requires students to create evidence through performance that will enable assessors to make valid judgments about "what they know and can do" in situations that matter. Performance assessment is the most important development in evaluation since the invention of the short-answer test and its extensive use during World War I.
The Army Alpha6 intelligence test was, among other things, a representation of our view of learning, a view promulgated by Thorndike, the father of stimulus-response theory. This form of testing was given a great boost by the creation of optical scoring devices and was regarded as a sound procedure for determining what students had learned. Standardized achievement testing fit comfortably not only within our conception of learning but also within our conception of schooling. Consider the ways in which schools are organized, the tacit assumptions underlying that organization, and how conventional achievement tests fit school organization.
In 1847 the first graded school was invented in America.7 The assumptions about the course of learning upon which the school was built were straightforward: students would be grouped by age, and each age level would be assigned to a grade. Age grading in our schools became the dominant organizational structure. It was further assumed that, since students were grouped by age, the content and aims for each grade should be the same for all children in that grade. Effective teaching was defined as the ability to enable all children in a grade to achieve the goals for that grade level. Like an army marching in tandem, at the end of an eight- or a 12-year period students would exit the school having mastered the content assigned to each of the previous grade levels.
These assumptions about human learning and these features of school organization are alive and well in American schools today. Indeed, perhaps more than in the past, the specification of grade-level standards is more than a tacit embrace of age grading. If we were to take President Clinton's advice, we would take grade-level standards so seriously that no child would be promoted without having met grade-level expectations, despite research indicating that retention is not generally a good remedy.8 The roots of the problem are deeper.
We have learned that human development does not proceed in a tidy manner. An 8-year-old is not an 8-year-old is not an 8-year-old. Children differ not only in the rate of their development but also in the particular areas of work they are expected to perform. Some students have high-level aptitudes in the arts; others, in the sciences. Some children are gifted in social skills; others, in the use of language. If by some magic, teaching could be made optimal for every student in a class of 25, the variability of student performance in that class would increase in each subsequent grade. Optimal forms of teaching for those gifted in the arts would enable them to move farther and faster in artistic pursuits than those gifted in other areas; those gifted in mathematics would move farther and faster in quantitative areas than those not so gifted. Under optimal teaching, variability would surely increase. Yet much of what we attempt to do in education is predicated on standardization and uniformity, on homogenization and on a model of learning represented by a tidy procession of students marching in unison through the grade levels toward fixed targets that have been defined well in advance. Standardized testing has always fit such a model very well.
Although, as Linda Mabry points out in her article in this special section, performance assessment practices that employ restrictive rubrics for scoring can be congruent with a standardized, age-graded approach to schooling, performance assessment practices that provide opportunities for open-ended responses and that enable youngsters to play to their strengths fly in the face of assumptions about uniformity. Uniformity can be pursued and assessed if assessment practices significantly constrain the way data are secured and analyzed. But when performance assessment tasks have an open-ended quality and thus make possible the expression of individuality, the assessment of standardized outcomes is considerably more complex. But the problem is even more complicated.
One of the motivations behind the standards movement is the desire to hold schools accountable, and that accountability is facilitated if schools, classrooms, and students can be compared. The ability to compare is compromised if students move through different curricular tracks or if assessment practices provide tasks that are open-ended; classrooms, schools, districts, and states can be more easily compared when assessment practices, goals, and curriculum content are uniform and closed. But when, for example, the amount of time allocated to the study of a subject differs from school to school, when the content of a subject taught differs from one school to another, or when the aims of a field of study differ in different schools, comparisons are more than treacherous. Thus, while assessment practices that are open-ended make possible the assessment of individuals, they may not provide the kind of global comparative "temperature taking" that a public anxious about the educational productivity of its schools seems to want.9 Is there any prospect of developing an approach to performance assessment that will make the particular achievements of individual students visible and, at the same time, provide information about a class or a school that is useful for comparing it with other classes and other schools?
One option, of course, is to employ two different kinds of assessment. One of these would be the continued use of large-scale, temperature-taking testing intended to provide comparative data on the performance of schools or school districts. Such a practice would not allow for the description of distinctive forms of individual student performance. A second assessment would need to be designed to reveal the distinctive talents of individual students and the effects of school practice on their development. One form of assessment would focus on the general; the other, on the particular.
And yet I cannot help but wonder whether an assessment oriented to the revelation of individual talents could survive in a "race" with an assessment bent on enabling the public to make comparisons between schools and school districts. Indeed, one feature of President Clinton's most recent proposal for school reform is the provision of a "report card" that would rate the performance of schools.
The desire to compare is implicit every time we talk about "world-class schools." The basic assumption is that not only can we compare schools, school districts, and states, but we can also make meaningful comparisons of the educational quality of schools in different nations. If we can't make those comparisons meaningfully and at the same time make the kinds of assessments that would describe individual student performance with respect to a student's distinctive features, is it likely that we will decide to undertake the latter and to forgo the former approach to assessment? Will we give up our inclination to seek comparisons, to form rankings, to establish hierarchies, to position ourselves in relation to others? We are, after all, a meritocracy, and we tend to use test scores in education to identify the meritorious. The use of standardized testing is conducive to such aims.
Under these circumstances, what is required to give pride of place to an assessment system that both facilitates teaching and learning and reveals the distinctive intellectual achievements of individual students? From my perspective, what we need is a change in the public's conception of the mission of the schools. Of course, bringing about such a change is no small task. Yet a shift needs to be made from a conception of schooling as a horse race or a kind of educational Olympics to a conception of schools as places that foster students' distinctive talents. The good school, as I have suggested, does not diminish individual differences; it increases them. It raises the mean and increases the variance.
Bringing about such a change in the conception of schooling cannot be achieved, in my view, without changes in the structure of schooling in America and in the criteria that institutions of higher education employ in making admissions decisions. Universities affect secondary, middle, and even elementary schools in the most conservative of ways. We need to have admissions criteria that are considerably more flexible and open-ended and that do not continue to privilege a narrow range of competencies that are fundamentally reflective of social class advantages. Parents, always concerned about their children's social and educational mobility, will continue to adapt to the existing criteria if they are not changed. Our priorities for schools in America have not grown out of a deep public debate about the mission of education; they have grown out of a desire to improve test scores. Test scores are widely regarded as proxies for the quality of education. But they are utterly inadequate -- as all the authors in this special section testify.
My aim in introducing this special section is to put performance assessment into a broad educational and social context. Performance assessment affords us, in principle, an opportunity to develop ways of revealing the distinctive features of individual students. It affords us an opportunity to secure information about learning that can help improve the quality of both curriculum and teaching. In short, it affords us an opportunity to use evaluation formatively and to treat assessment as an educational medium. But it is unlikely that such opportunities will be realized if the public's attitudes and expectations toward schooling are not changed, and it is unlikely that they will change without revision of the policies that affect the educational and social mobility of students in schools. Our form of school organization gives comfort and support to a comparative, competitive model of education: the bell-shaped curve is an encomium to competition. Our economy is built on a competitive model, and too many education reformers wish to liken schooling to business and to conceive of the factors that they believe should infuse education as similar to those that animate the business world.
My own inclinations are quite the opposite. I believe we need not be motivated through competition in order to provide educational conditions conducive to our children's development. We derive the most satisfaction not from competition, but from the quality of experience afforded by meaningful work. Alfred North Whitehead once remarked that most people believe that scientists inquire in order to know. Quite the contrary, he said. Scientists know in order to inquire. Whitehead's point was that, for scientists -- certainly for the best of them -- the joy is in the journey. That observation is not a bad ideal for education. Whether our meritocratic society will embrace such an ideal and make the changes in schooling and in education policy that are congruent with Whitehead's observation remains to be seen. But we should not kid ourselves about what's at stake. In the end, what's at stake is not only the quality of life our children might enjoy, but also the quality of the culture that they will inhabit.
2. Ibid.
3. From John Dewey to Jean Piaget to Jerome Bruner, major psychologists and philosophers have emphasized the constructive character of cognition. Humans make sense of the events of the world they inhabit. How those events are constructed is influenced by the resources with which people have to work. The transmission model of teaching is built upon a premise that assumes that transmission is possible.
4. For a discussion of the effects of forms of representation on cognition, see Elliot W. Eisner, Cognition and Curriculum Reconsidered, 2nd ed. (New York: Teachers College Press, 1994).
5. David C. McClelland, "Testing for Competence Rather Than for Intelligence," American Psychologist, January 1973, pp. 1-14.
6. Clarence S. Yoakum and Robert M. Yerkes, eds., Army Mental Tests (New York: Henry Holt & Co., 1920).
7. David Tyack, The One Best System (Cambridge, Mass.: Harvard University Press, 1974).
8. John I. Goodlad, The Nongraded Elementary School, rev. ed. (New York: Teachers College, Columbia University, 1987).
9. The concept of "temperature taking" is especially applicable to the approach taken by the National Assessment of Educational Progress in its sampling of student learning.

Kappan Professional Journal
Last updated 1 June 1999
URL: http://www.pdkintl.org/kappan/keis9905.htm
Copyright 1999 Phi Delta Kappa International