
Illustration © 2001 Jem Sullivan
Quantum Theory, the Uncertainty Principle, and the Alchemy of Standardized Testing
By Selma Wassermann
It is not the tests that should drive what we do in the act of teaching, Ms. Wassermann reminds us. Rather, all of us in the profession must say: this is what is important to us in the education of our youth. Let the ways of assessment we devise determine whether we have achieved our goals.
IT WAS LATE spring during my first year of teaching when I was initiated into the world of standardized testing. The school board had ruled, in its infinite wisdom and over strong teacher protest, to administer a districtwide standardized achievement test to all first-grade classes.
On the scheduled day of testing, the principal came to my room to deliver the sealed package of tests for my class. She placed the package in my hands as if she were offering me the Dead Sea Scrolls. With the package came her repeated admonitions about following explicitly the instructions in the Manual of Directions. No deviations were to be allowed. Time limits were to be faithfully obeyed: 10:00, pencils up; 10:30, pencils down. Big Brother was watching us.
At the designated hour, along with every other first-grade teacher in every school in the district, I broke the seal of the test package and distributed copies to each of the children, whose desks were now arranged in rows, as per the instructions. Two sharpened pencils were in the pencil slot on each desk. According to the instructions, the children were bathroomed and watered before the test was to begin. The preparations alone were exhausting and nerve-wracking.
This first-grade class of 26 children from a working-class suburban community just outside New York City was a heterogeneous group of delightful, energetic 6-year-olds, and, like children everywhere, they were full of surprises. Normally spirited, they were now extraordinarily quiet, the stress in the air palpable. Doubtless, they were reflecting my own anxieties, for at this stage of their lives, they hardly knew the difference between a standardized test and an artichoke. But it was clear that I had communicated my feeling that much was riding on the outcome of their performance.
The test began with a list of vocabulary words that the children were to define by blacking-in the space for the correct choice from among four options. In my class of 26 children, about three-quarters had made their way through preprimers and primers and were now considered to be reading at the "first-grade level." Four children were still chugging along in their primers and were likely to move into their first readers by the end of the year. At the far end of the spectrum, Judy Foley and Benny Camareri were struggling to make sense of a few basic words. For them, decoding was one of the great mysteries of life. As instructed, I walked the aisles, ensuring that the children were making their marks properly and seeing that they kept their eyes on their own papers -- not that that was ever an issue.
Jimmy Tully, one of the better readers in the class, took his time, carefully decoding the words in the first column of the test booklet. I could see him struggle with the unknown words, bravely sounding them out and finding the appropriate match. By the time he had gotten to the seventh word on the list, his anxiety level had peaked and his face had turned a deep rose. I touched his shoulder to comfort him, and, in response, he laid his pencil down, lowered his head, and wept. Torn between my human feelings and the detailed prohibitions about test taking, I felt like a killer teacher.
Judy Foley, for whom decoding words was an Everest yet to be conquered, sped through the test as if she had been given all the answers in advance. She blackened slots one after another, never bothering herself with the act of reading. In less than three minutes, she had filled in choices for the entire list of 25 words. For Judy, the test was more like a coloring exercise. For Jimmy, it was a trial by fire.
When the test papers were marked, Jimmy's paper showed that he had correctly defined all of the first seven words on the list -- that is, all of those he had time to complete. Judy, filling in spaces without a clue as to what the words were, managed to defy probability and got 13 correct. It didn't take an expert to know that something was terribly rotten about the use of such a test and about the educational significance of the results.
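How far Judy's 13 correct answers "defy probability" can be estimated with a short calculation. The sketch below assumes that she guessed independently and uniformly among the four options on each of the 25 items. That is an idealization, since the account above gives only her score and the test format. The names prob_at_least, N_ITEMS, and P_GUESS are illustrative only.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more successes in n independent trials,
    each succeeding with probability p (binomial upper tail)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Assumptions (not stated as such in the article): pure random guessing,
# four equally likely options per item, items independent of one another.
N_ITEMS = 25
P_GUESS = 1 / 4

expected_correct = N_ITEMS * P_GUESS            # average score from guessing alone
p_13_or_more = prob_at_least(13, N_ITEMS, P_GUESS)

print(f"Expected correct by guessing: {expected_correct}")
print(f"Chance of 13 or more correct: {p_13_or_more:.4f}")
```

Under these assumptions a guesser averages about six correct answers, and a score of 13 or better turns up only a few times in a thousand attempts. The score is unlikely but not impossible, which is the difficulty with reading meaning into a single number.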
Ever since we humans began counting, we have become increasingly obsessed with accuracy in measurement. Today, we measure everything -- time, distance, speed, size, weight, temperature, humidity, sound, barometric pressure -- even the wind-chill factor so we can know how cold we should feel. We measure angles, curves, and spatial relationships. Global positioning systems allow us to measure where we are in space. We can now be assured that a 6.5-ounce can of tuna contains 106 calories, 24 grams of protein, and 1.1 grams of fat (of which 0.4 are polyunsaturated fats, 0.2 monounsaturated fats, 0.2 saturated fats), 30 mg. of cholesterol, and 0 grams of carbohydrates per 100-gram serving. What's more, with the use of statistics and probability, we measure things that we cannot even see: the likelihood that the stock market will recover during this fiscal year, the chances for rain during the coming year in the Midwest, the numbers of salmon anticipated in the spring run, the force of a hurricane, the rate of global warming, the risk involved in a certain investment.
Our obsession with numbers has led us into the deep waters of quantification, and we now have rating scales for nearly everything -- relying on these measures to guide our life choices. I look to Zagat's, for example, for the ratings of restaurants in San Francisco as I plan for a holiday; to Ebert and Roeper for the "two thumbs up" sign that a film is worth seeing; to the Internet for performance ratings of airlines before buying a ticket. "How well did you like it -- on an 11-point scale?" I used to ask my grandchildren when they were very young, in order to move them from the superficial adjudications of "It was good." "I liked it seven, Gramma," Arlo would say. He loved the idea of numerical assessment, and he is far from alone.
The Dodecahedron of The Phantom Tollbooth describes our love affair with numbers: "If you had high hopes, how would you know how high they were? And did you know that narrow escapes come in all different widths? Would you travel the whole wide world without ever knowing how wide it was? And how could you do anything at long last without knowing how long the last was? Why, numbers are the most beautiful and valuable things in the world."1
Yet the more we strive for accuracy in our numerical ratings, the more it eludes us. Take, for example, the measurement of time, in which nanosecond precision matters for worldwide communications systems and calculations must be constantly adjusted to accommodate for the anomalies in astronomical reference points. "It is the heavens that cannot be relied on. Stars drift. The Earth shivers ever so slightly. With the oceanic tides acting as brakes, the planet slows in its rotation by fractions of a second each year. To compensate, the official clocks must every so often perform a grudging two-step, adding an odd second -- a 'leap second' -- to the world's calendar."2 At the Directorate of Time, Gernot M. R. Winkler, whose position requires the keeping of exact time, by worldwide consensus and decree, says, "A man with a watch knows what time it is. A man with two watches is never sure."3
Although we make implicit assumptions about the accuracy of quantitative measurements that we take largely for granted, our performance on too many measurements gives us no reason to be so certain that what has been said to be x is not actually x - 1. "Society runs on numbers," writes Gina Kolata in the New York Times --
the number of people residing in the United States, the number of people in Florida who voted for each of the Presidential candidates, the number of unemployed, the percent chance of a crippling blizzard. Those who come up with these all-important digits know they aren't perfect, even if it sometimes comes as a surprise to the public. And some diehard believers in progress cling to the hope that technology will fix our numbers problem. But some numbers can't be fixed; the expectation of endless improvement has faltered before the Law of Unintended Consequences. What looks like a slick solution to a technical problem can introduce new problems, sometimes just as bad as the original ones. Instead of getting a better number, the result is a different number, with its own inaccuracies.4
When it comes to assessing quality, accuracy in measurement is even more elusive. Take, for example, "expert" opinions on which nominee should win the Academy Award for best actor. Or which actors should have been nominated and were ignored. Take, for example, the widely disparate judgments of informed opinion about a particular work of art. A piece of music. A novel. A candidate's ability to govern. A politician's performance in office. An impeachable offense. A prison sentence for a nonviolent offender. Take, for example, the widely differing views about what constitutes adequate health care, adequate provision for the homeless, adequate funding for education. In some instances, judgments are so far apart that it's hard to believe we are assessing the same person, place, or event. Our history is rife with anecdotes about the failed judgments of experts who could not envision, for example, the quality of a Van Gogh painting, the beauty of a Beethoven quartet, the phenomenal commercial success of the Harry Potter books, or the potential for Harry Truman to win the 1948 Presidential election.
It is understandable that there would be a range of differing opinions in qualitative assessment, for, as an Ojibway chief told us, "It depends on where you sit on the medicine wheel." Assessment of quality lies in the eye of the beholder -- and when looking for a definitive answer, such variability is crazy-making. Perhaps that is why we turn to the presumed certainty of numbers. When even informed opinion is all over the place, numbers give us a sense of security. If parents are upset, angry, doubtful about a teacher's assessment of their daughter's performance in physics ("not working up to her potential"), then a quantitative test that provides a firm, hard number can put doubts to rest. We put our faith in numbers because they remove us from the stressful world of uncertainty into what we believe is the definitive world of truth. Numbers provide us with a sense of security in an uncertain world.
The Quest for Certainty
There is little doubt that we live in times of turbulence -- when things all around us seem to be accelerating beyond our control. The speed at which things change creates great stress. Our lives are full of dissonance and upheaval, and it seems that we no longer can count on anything to be stable and enduring. Even the electoral process, the bedrock of our democracy, turned out in the most recent Presidential election to be full of flaws. We just don't seem to be able to count on anything these days!
We show our anger in new, 21st-century ways. Road rage and air rage tell of humans who have been pushed beyond their limits. Like Peter Finch in the film Network, we seem to be saying: "I'm mad as hell, and I'm not going to take it anymore!" Back in 1971, Louis Raths wrote that "there have been other periods in history characterized by rapidity of change" and that "it seems to be true that in a time of many and significant changes, a great many people feel a sense of strain."5 We want to combat this instability with a new sense of order. And if we need to get that by moving backward, then that is the price some of us are willing to pay. There is the widespread belief that the unalterably fixed and the absolutely certain are one, and change is the source from which all our uncertainties and woes come.6
In the area of human judgment, which is at the heart of qualitative assessment, the risks of uncertainty are very great. "Judgment and belief regarding actions to be performed can never attain more than a precarious probability," Dewey reminded us.7 And this "creates a desire to escape from the vicissitudes of existence by means of measures which do not demand an active coping with conditions."8
It is easy to see how numbers help to relieve the frustrations of the unknown, for nothing feels more certain or gives greater security than a number -- "the invariant rational element within an uncertain physical existence."9 It doesn't seem to matter if neither the numbers nor the act of measurement can be relied on as precise or accurate; it's just that we feel more comfortable in believing that there is something that anchors us, something we can count on with assurance, something we can cling to as we speed ahead into uncertain times.
As educators, we are more comfortable, feel more certain, have fewer disagreements with parents and students when we can say about a student: she's in the 86th percentile, or he scored 6.1 on the standardized reading test, or her SAT score is 770. Hearing the numbers takes us from the unknown to the known. There is a palpable sigh as our tension is released. For once, we can be certain that we have a glimpse into the truth.
The Standards Movement: If It Moves, Test It!
In my file labeled "Evaluation" are three large boxes containing a 40-year collection of material chronicling the ups and downs of educational testing and measurement: articles clipped from respected education journals, articles from newspapers and literary magazines, and other miscellany (including high school report cards that my mother kept all these years -- don't ask!). It was instructive to reread these materials in preparation for writing this article, to see that the issues written about 10, 20, 30, and 40 years ago continue to resurface today; that, despite the serious flaws, limitations, and narrow outcomes of standardized testing documented by substantial educational research, a new standardized testing movement reappears, with nauseating regularity, every few years.
Among dozens of articles in my file telling "war stories" about the many flaws and inaccuracies of standardized tests are those that discuss the narrow outcomes measured by these tests and the potentially unhealthy consequences for student learning. Oscar Buros, editor of the Mental Measurements Yearbooks, made the following comments when he spoke to a special seminar at the University of Iowa in July 1977: "Many of you know that I consider that most standardized tests are poorly constructed, of questionable or unknown validity, pretentious in their claims, and likely to be misused more often than not."10
Buros then went on to provide chapter and verse about the inadequacies of standardized tests, expressing his concern about the "unwarranted optimism about the values of standardized tests in general and in specific kinds of tests." He admonished teachers not to "mistake statistical significance for educational significance," which results in a "great deal of sloppy thinking not only in testing, but in all areas of research in the behavioral sciences."11 Standardized tests, Buros stated, "rarely correspond closely to local instructional programs; they are greatly influenced by instructional materials closely resembling the test items, and they cannot be used to measure the attainment of specific growth over short periods of time."12
A few years later, in a special section on standardized testing in the May 1981 Kappan, George Madaus raised serious questions about the claims for the cogency and relevance of standardized tests in the schools.13 In that same issue, Leslie Salmon-Cox reported that her research found that "student scores on standardized tests are not very useful to the classroom teacher" and that teachers prefer to rely on their own judgment about student weaknesses and areas of needed help.14 Lee Sproull and David Zubrow found that administrators are not major users of test information and argued that test results are not very important to central office administrators.15
Edward Fiske, then education editor of the New York Times, criticized standardized tests for the following reasons:
Les McLean, head of the Educational Evaluation Center at the Ontario Institute for Studies in Education, cautioned in 1982 that "achievement tests as we have known them are obsolete and teachers should discontinue their use as soon as possible."17 But even two decades before that, Willard Waller had questioned the value of tests: "The learning product which is assured by examinations is of the lowest and basest sort. Examinations favor parrot learning, for the most part; and parrot learning is undesirable not only because it is useless but also because the psittaceous habit of mind inhibits deeper learning."18
Reading through more recent issues of education journals, we can see the same criticisms reappearing. Tests are notoriously unreliable; they measure only a small aspect of student learning; what is being measured is not what is significant. Elliot Eisner, writing in the January 2001 Kappan, reminds us once again about the narrow and shortsighted outcomes measured by standardized tests: "Curriculum gets narrowed as school district policies make it clear that what is to be tested is what is to be taught. Tests come to define our priorities."19
For almost half a century, criticisms of standardized tests have remained remarkably consistent. In the face of these criticisms, efforts to reintroduce standardized testing into school settings have also remained remarkably consistent.
A variety of theories can be generated to explain our enduring insistence that tests will do the job of serving accountability and standards, even when these claims fly in the face of what we have already documented as false. Marilyn French writes of our education system built on corporate lines, with centralized control administered through a bureaucracy that was intended for the control of the many by the few. She sees our education system as a way of exercising power and control over the students, a system that teaches students to bend to the will of authority: "Only extraordinary education is concerned with learning; most is concerned with achieving: and for young minds, these two are very nearly opposite. One is dedicated to experience, the other to control."20
One need not buy into conspiracy theories, but French's superbly documented book, Beyond Power, cuts too close to the bone to be rejected out of hand. Whatever is driving the reemergence of standardized testing movements, it is clear that the impetus does not come from teachers. And French may indeed have a point when she expresses the concern that, "in the contention between those who uphold qualitative goals for education and those who focus on quantitative efficiency of administration, the big guns are all on the side of the heavily concentrated controls of the managers."21
Should it come as any surprise that William Bennett -- the former U.S. secretary of education, a promoter of conservative values, and now a budding Internet entrepreneur -- is offering online diagnostic tests, one for every major subject in every grade, that parents can administer to their children and have scored online, for a projected fee of $50 to $100 a test?22 Will he be laughing all the way to the bank?
In reading through my files, I had the feeling that I really was having déjà vu all over again.
Numbers and the Uncertainty Principle
In 1986 Richard Feynman, Nobel laureate in physics, was invited by the National Aeronautics and Space Administration (NASA) to join a Presidential committee investigating the disastrous accident involving the space shuttle Challenger. Feynman, one of the most creative theoretical physicists of his time, is perhaps best known to the general public for conducting a simple demonstration at the committee hearings. He immersed an O-ring gasket in a glass of ice water and showed that the rubber gasket did not immediately bounce back to its original shape. Consequently, it would not be a reliable seal in cold weather. The Challenger tragedy is one of the most infamous incidents of mismeasurement in recent times, in which the reports of expert engineers attested, inaccurately, to the safety of the design and led ultimately to the deaths of the shuttle crew in a violent explosion that shook the world and destroyed public confidence in space research.23
During the data-gathering interval, Feynman's investigations led him to confer with an engineer who worked for a company that had contracted with NASA to build the shuttle. The engineer told Feynman that the company's engineers had come to the conclusion that low temperatures had something to do with the problem of the O-ring seals and that they had been very worried about it before the scheduled flight. On the evening preceding the launch, during the flight readiness review, the engineers told NASA that the shuttle shouldn't fly if the temperature was below 53° F. On the morning of the launch, the temperature was just 29° F.
Regrettably, NASA officials chose to ignore such discrepant data. Feynman states that there was a failure not only in the seals, but in management too.24
It is instructive to examine Feynman's description of the measurement inaccuracies he observed as he investigated the temperature problem:
Then I investigated something we were looking into as a possible contributing cause of the accident: when the booster rockets hit the ocean, they become out of round a little bit from the impact. At Kennedy they're taken apart, and the sections -- four for each rocket -- are sent by rail to Thiokol, in Utah, where they are packed with new propellant. Then they're put back on a train to Florida. During transport, the sections (which are hauled on their side) get squashed a little bit -- the softish propellant is very heavy. The total amount of squashing is only a fraction of an inch, but when you put the rocket sections back together, a small gap is enough to let hot gases through: the O-rings are only a quarter of an inch thick and compressed only two-hundredths of an inch.
I thought I'd do some calculations. NASA gave me all the numbers on how far out of round the sections can get, so I tried to figure out how much the resulting squeeze was, and where it was located -- maybe the minimum squeeze was where the leak occurred. The numbers were measurements taken along three diameters, every 60 degrees. But three matching diameters won't guarantee that things will fit; six diameters, or any other number of diameters, won't do, either.
For example, you can make a figure something like a triangle with rounded corners, in which three diameters, 60 degrees apart, have the same length. I remembered seeing such a trick at a museum when I was a kid. There was a gear rack that moved back and forth perfectly smoothly, while underneath it were some noncircular, funny-looking, crazy-shaped gears turning on shafts that wobbled. It looked impossible, but the reason it worked was that the gears were shapes whose diameters were always the same. So the numbers NASA gave me were useless.25
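Feynman's point about rounded triangles can be checked numerically. The sketch below uses a Reuleaux triangle, a standard example of a curve of constant width; it is chosen here for illustration, and the passage does not say this was the exact shape he had in mind. It also assumes that a "diameter" is measured caliper-style, as the distance between two parallel jaws closed on the shape. The function names are hypothetical.

```python
import math

def reuleaux_boundary(side=1.0, points_per_arc=2000):
    """Sample the boundary of a Reuleaux triangle: three 60-degree circular
    arcs of radius `side`, each centered on one vertex of an equilateral
    triangle and joining the other two vertices."""
    a = (0.0, 0.0)
    b = (side, 0.0)
    c = (side / 2.0, side * math.sqrt(3.0) / 2.0)
    # (arc center, starting angle); every arc sweeps pi/3 radians.
    arcs = [(a, 0.0), (b, 2.0 * math.pi / 3.0), (c, 4.0 * math.pi / 3.0)]
    points = []
    for (cx, cy), start in arcs:
        for i in range(points_per_arc + 1):
            t = start + (math.pi / 3.0) * i / points_per_arc
            points.append((cx + side * math.cos(t), cy + side * math.sin(t)))
    return points

def caliper_width(points, angle):
    """Extent of the shape along the direction `angle` (radians), i.e. the
    gap between two parallel caliper jaws perpendicular to that direction."""
    ux, uy = math.cos(angle), math.sin(angle)
    projections = [x * ux + y * uy for x, y in points]
    return max(projections) - min(projections)

SIDE = 1.0
boundary = reuleaux_boundary(SIDE)

# Measure the "diameter" every 5 degrees, far more often than every 60.
widths = [caliper_width(boundary, math.radians(d)) for d in range(0, 180, 5)]
width_spread = max(widths) - min(widths)

# Distance from the center to the boundary; constant only for a true circle.
gx, gy = SIDE / 2.0, SIDE * math.sqrt(3.0) / 6.0
radii = [math.hypot(x - gx, y - gy) for x, y in boundary]
radius_spread = max(radii) - min(radii)

print(f"Spread in measured diameters: {width_spread:.2e}")
print(f"Spread in center-to-edge distance: {radius_spread:.3f}")
```

Every measured diameter agrees to within the sampling error, yet the distance from the center to the edge varies by roughly 15 percent of the width. Identical diameter readings, however many are taken, do not show that a section is round.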
In yet another incident in which the human variable confounded the accuracy of the measurements, Feynman related the story of how the rocket sections were put together:
The regular procedure was to stack one section on top of the other and match the upper section to the lower one. If a section needed to be reshaped a little bit, the procedure was to first pick up the section with a crane and let it hang sideways a few days. It's rather simple-minded.
If they couldn't make a section round enough by the hanging method, there was another procedure: use the "rounding machine" -- a rod with a hydraulic press on one end and a nut on the other -- and increase the pressure. The engineer said that the pressure shouldn't exceed 1,200 pounds per square inch (psi). One time, a section wasn't round enough at 1,200 psi, so the workers took a wrench and began turning the nut on the other end. When they finally got the section round enough, the pressure was up to 1,350 psi. "This was another example of the lack of discipline among the workers," the engineer reported.26
In her recent article about inaccuracies in measurement, "Why Some Numbers Are Only Very Good Guesses," Gina Kolata points out how computers record votes:
As with any computer solution, the problem is that not only can the machines introduce errors, but the errors can be much harder to detect. How would you know if there is a tiny glitch in the software or the hardware? How would you know if one out of every 100,000 votes goes to someone other than the intended candidate? What if the misdirected votes were created by programmer sabotage? There is no guarantee that your vote goes into the computer the way it looked on the touch screen.27
I have taken the long way around to show that so many of us in so many professions seem to rely strongly on numbers. Yet, even among experts with the most advanced high-tech tools, measurement is subject to both human and technological error. Moreover, in delivering the numbers, the needs to be self-serving and to protect one's own interests are also critical factors in contributing to error. Even with very sophisticated measurers and measuring devices, accuracy is -- and should be -- suspect.
If we cannot rely on the accuracy of engineers who are building space shuttles or on computer programs that are tallying votes for Presidential elections, what can be said about the less sophisticated measurers and measuring devices of standardized testing? What can be said about our investment in their accuracy and reliability far beyond what the data allow? For not only is there great potential for mismeasurement, but the human and technological factors that enter into the construction, administration, and scoring of these tests give us much reason to suspend our beliefs about their significance and their accuracy. We talk about students' test scores with the arrogance of the self-assured; we are certain when we should, at the very least, be suspending judgment.
In Michael Frayn's brilliant play Copenhagen, actors play Niels Bohr, Werner Heisenberg, and Bohr's wife, Margrethe. They bring to life an account of what might have happened at the critical meeting in Copenhagen when physicists Bohr and Heisenberg, former teacher and student, and now Dane and German on opposite sides in World War II, discussed the splitting of the atom, nuclear fission, and the potential of building an atomic bomb. Although some parts of the play rely on Frayn's conjectures about the physicists' discussions, there are certain events that are known and that have been written into the historical record.
In the play, Heisenberg, the originator of the uncertainty principle, describes its meaning to an audience largely untrained in science: "You can never know everything about the whereabouts of a particle, or anything else . . . because we can't observe it without introducing some new element into the situation -- things which have an energy of their own, and which, therefore, have an effect on what they hit."28 Because of this, he continues, "We have no absolutely determinate situation in the world." Heisenberg's uncertainty principle "shatters the objective universe." The uncertainty principle limits the simultaneous measurement of conjugate variables, such as position and momentum or energy and time. The more precisely you measure one variable, the less precise your measurement of the related variable will be.
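In its standard textbook form, which is not quoted in the play, the trade-off between position and momentum is usually written as below. Here \Delta x and \Delta p denote the statistical spreads (standard deviations) of position and momentum, and \hbar is the reduced Planck constant. The energy-time version is a looser, heuristic analogue, since time is not an observable in the same sense.

```latex
\Delta x \,\Delta p \;\ge\; \frac{\hbar}{2},
\qquad
\Delta E \,\Delta t \;\gtrsim\; \frac{\hbar}{2}
```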
Frayn himself tackles the definition:
According to the Copenhagen interpretation of quantum mechanics, the interconnected set of theories that was developed by Heisenberg, Bohr, and others in the Twenties -- the whole possibility of saying or thinking anything about the world, even the most apparently objective, abstract aspects of it studied by the natural sciences, depends upon human observation and is subject to the limitations which the human mind imposes. This uncertainty in our thinking is fundamental to the nature of the world.29
"It starts with Einstein," Bohr explains. "He shows that measurement -- measurement on which the whole possibility of science depends -- measurement is not an impersonal event that occurs with impartial universality. It's a human act, carried out from a specific point of view in time and space, from the one particular viewpoint of a possible observer.
"Then," he continues, "here in Copenhagen, we discover that there is no precisely determinable objective universe. That the universe exists only as a series of approximations. Only within the limits determined by our relationship with it. Only through the understanding lodged inside the human head."30
If quantum theory has any validity, then it should cause us to pause and consider the kinds of numbers that serve as indicators of student performance on standardized tests. What do the numbers mean? In plain language? According to Bohr, the numbers "mean what they mean" in mathematics. They have no meaning beyond that. So long as the mathematics works out, he argues in the play, "the sense doesn't matter." It may be the very height of arrogance to insist that a student's numerical score on a standardized test is free of the variables introduced by human behavior and that the score can mean anything at all beyond the number itself. In investing a student's numerical score with meaning beyond the number itself, we are guilty of performing acts of alchemy -- trying to turn dross into gold.
Where Does This Leave Us?
If we should not trust numbers to mean anything by themselves, if subjective evaluations of professional teachers are unreliable, and if we still need ways and means to ensure that students are learning and that high standards are upheld, what's left for us to do?
First, I think professionals need to make a concerted effort, as Alfie Kohn has told us, to stand up and be counted in opposition to the standardized testing movement.31 We should do so not just for all the reasons we know and for what the data tell us, but because using these tests and pretending to believe in their educational significance demeans us. The tests put the emphasis in education not on what is important but on what is diminishing for students' learning and their overall growth. Instead of raising academic standards, these tests do much to lower them. Beyond just subverting the tests in various ways, perhaps we should all stand up and "Just say no." No, we won't do this anymore. What if they gave a test and nobody came?
Second, I think we need to do more to help parents and students and teachers and school administrators to live more comfortably in a world of uncertainty. Unless some unforeseen disaster overtakes the planet, the high-speed world in which we live is not going to slow down soon. And tension, stress, and uncertainty will, inevitably, increase. Perhaps we are not ready to embrace the uncertainties of life wholeheartedly, but could we not learn more about how uncertainty affects our lives? And could the study of uncertainty not teach us something about our world and ourselves? As far as I know, living with uncertainty is not high on the agenda of any school's curriculum. How about "uncertainty training"? Maybe we should give it more than passing thought.
Feynman applied the uncertainty principle to the government of the United States, in his eloquent and cheeky The Meaning of It All. Feynman was comfortable in the murky waters of uncertainty, and, in fact, thrived there. "Scientists are used to dealing with doubt and uncertainty," he said. "I believe that, to solve any problem that has never been solved before, you have to leave the door to the unknown ajar. You have to permit the possibility that you do not have it exactly right."32
Feynman argued that the scientific appreciation of uncertainty is reflected in the thinking behind the U.S. Constitution:
The Government of the United States was developed under the idea that nobody knew how to make a government, or how to govern. The result is to invent a system to govern when you don't know how. And the way to arrange it is to permit a system, like we have, wherein new ideas can be developed and tried out and thrown away. The writers of the Constitution knew of the value of doubt. In the age that they lived, for instance, science had already developed far enough to show the possibilities and potentialities that are the result of having uncertainty, the value of having the openness of possibility.33
Third, I think that we need to take a stand for the use of informed, evaluative observations by teachers. Yes, it's true that these are flawed; yes, it's true that they are subjective; yes, it's true that teachers' judgments about student performance could be wrong, misguided, or biased. But instead of replacing these judgments with the far less useful alternative of standardized tests, maybe we need to direct our energies toward making teachers' evaluations more helpful to students, to parents, and to the overall learning process, beginning with clearer identification of how evaluative criteria are used to make more informed judgments. At least, efforts in that direction can put us on more solid ground with respect to student learning, for it is the professional teacher who really is in the best position to know, from multiple observations, in multiple contexts, how a student is actually performing and what kinds of assessment tools might be used to find out how better to help him or her.
In the criminal justice system in the United States and Canada, we have allowed ourselves to rely, wholeheartedly, on a "jury of our peers" to examine evidence and make determinations of guilt or innocence. The system is fallible, but we do not consider changing it because it is the best we have. If a jury of one's peers can be trusted to observe, listen, and weigh evidence, why can't we give equal regard to the professional judgment of teachers to observe evidence and make informed assessments of a student's ability to perform in school?
Finally, I believe that, at long last, we need to begin an extensive dialogue about what's important in student learning. This dialogue should include much more than narrowly formed behavioral objectives or the goal statements associated with individual subjects. We are, after all, in the business of preparing students to live fuller, healthier, better informed, and more purposeful lives. And students' competence might be better assessed from a more comprehensive perspective than merely from their ability to master a few facts about organic chemistry. "We must recognize that the education of the understanding -- the only kind of education that will ever give us a real mastery of our world -- does not proceed so much from acquaintance with a broad range of facts and principles as from the assimilation of a few."34
A professional dialogue that begins by asking "What's important?" leads to a clearer idea of how we should assess whether we have reached the important goals. It is not the tests that should drive what we do in the act of teaching. Rather, all of us in the profession must say: this is what is important to us in the education of our youth. Let the ways of assessment we devise determine whether we have achieved our goals.
Teachers Who Dare
A few years ago, a group of secondary school teachers in Coquitlam, British Columbia, working together to reform their own teaching practices and to bring new life to the social studies curriculum, chose to rethink their approach to evaluation. If students were going to be asked to work on the development of "higher-level skills" and if teachers were going to stress learning for understanding of big ideas, rather than for the accumulation of isolated facts, then evaluative procedures would have to reflect these new curriculum approaches.
Working as a group, these teachers undertook to design a professional handbook containing several hundred suggestions for assessing student performance in all curriculum areas -- suggestions that went beyond single, correct-answer exams.35 These assessment tools reflected Howard Gardner's notion of "multiple intelligences"36 and offered students options to "show their stuff" in ways that tapped their unique talents and skills. The teachers who used these alternative assessments felt satisfied that their evaluation procedures were more congruent with their educational goals. They also found that these assessment tools provided them with considerable data about student performance. An added factor was that grading -- the bane of teachers' lives -- was never boring because of the variety and appeal of the students' projects. The students reported that they felt more challenged and more interested and saw assessment as more connected to their overall learning.
Prior to beginning work on the assessment tools, the teachers set out to identify what they considered to be some educational principles on which the process of evaluation rested. They specified that whatever was undertaken as part of the evaluative process should first ensure that enhanced student learning would result. To that end, they decided that assessment tools should:
While the evaluative tools contained in this teacher-developed handbook are, in and of themselves, a rich and vital educational resource and a testament to the quality of professional wisdom that teachers bring to such a task, what is even more impressive is the teachers' identification of what they considered to be an evaluative standard -- a means of identifying significant learning outcomes against which student performance could be assessed. This standard would help to inform assessments and would guide the teachers in determining where help was needed.
In tackling this task, the teachers created Profiles of Student Behaviors -- identifying 20 behavioral patterns that were considered evidence of effective student functioning in knowledge, in attitude, and in skill areas at the secondary school level. The Profiles give teachers a diagnostic tool to determine where students need help and what kind of work might be needed to promote student growth. This diagnostic tool helps to pinpoint the strengths and weaknesses of student performance, but it also provides a key for writing meaningful evaluative reports to parents and conducting parent/teacher conferences. The Profiles come in two forms -- one for use by teachers and the other for students. They include the following standards for student growth and development in the secondary years.
I. INTELLECTUAL DEVELOPMENT
1. Quality of Thinking
Can see the big idea. These students are able to see the larger picture in the examination of topics or issues. Their arguments are centered in the big ideas, and they are able to appreciate the complexities of these big ideas. In presenting arguments or points of view, these students go after the issues of substance and are unafraid to deal with what is important.
Show tolerance for the ideas and opinions of others. These students are open to and respectful of the ideas of others. While others' ideas may not agree with theirs, they are able to listen respectfully, see other points of view, and consider those viewpoints rationally and thoughtfully. In addressing others with different ideas, these students are thoughtful, respectful, and rational in their responses.
Know the difference between fact and opinion, between assumption and fact. These students use such qualifying words as "perhaps" or "it seems to me" when there is reason to be cautious in offering opinions or stating assumptions. In their arguments, they indicate when they are sure and when they need to exercise caution in their reasoning.
Show tolerance for contrary data. While holding strong beliefs of their own, these students are nevertheless open to considering ideas and data that are discrepant with their own. When these students examine discrepant ideas, they do so thoughtfully and rationally, checking carefully to determine how the discrepant data may fit into their own belief system.
Are able to provide appropriate examples to support ideas. These students are able to provide relevant examples to support a point of view. When asked to give examples to demonstrate the validity of an idea, these students can provide them, and the examples they offer are relevant to the argument. There is a clear relationship between the point of view and the examples that support it.
Can interpret data intelligently. These students are able to read, view, or observe data and draw intelligent and rational meanings from that data. The interpretations they make are based in the data, and they are cautious about drawing conclusions from data where there is insufficient evidence. The meanings they derive from the data are consequential and reflect what is important.
Produce original, inventive, creative work. These students are able to go beyond the ordinary and to create new schemes, new forms, and new products. They are original and inventive, and what comes from them is fresh, new, and imaginative. These students are risk-takers in their creative attempts, and they push the limits of what is standard in generating what is new.
Embrace thinking as a way of life. These students value thinking as a means of solving problems and as a way to inform decision making. They want to think for themselves; they want to think their own ideas; they value the power that thinking brings to their own lives. These students are independent and self-initiating, and they value thinking as a tool to enrich their lives.
II. SKILLS
1. Communication of Ideas
Show quality of thinking in writing. These students are able to show evidence of quality of thought in their written work. Their written ideas are presented clearly and are rooted in fact, observations, details, images, quotations, statistics, and many other forms of information. They are able to give examples that clarify what they mean. Written material is well organized; sentences are constructed as units of thought; the mechanics (spelling, punctuation, capitalization, and other conventions) are handled well. They are able to communicate ideas in a way that is interesting to the reader.
Show quality of thinking in speaking. These students show evidence of quality of thought in the oral communication of their ideas. When they make oral presentations in class, their language is clear, and what they say is data-based. They are able to give legitimate examples from valid sources to support their points of view. When they speak, it is easy to understand what they say and to follow their reasoning. The way they communicate their ideas is interesting, even compelling. When they argue a point of view, they make sense.
2. Research Skills
Show ability to collect and organize information. These students are able to locate the data they need. They gather data from many sources, and they use many sources in informing their arguments. In their oral or written reports, the data are organized in a way that makes sense, and the key issues are dealt with logically. They are able to see both sides of an issue, their analyses are literate and intelligent, and their conclusions are data-based.
Can extract important meanings from data and record them accurately. These students are able to extract the big ideas from a variety of sources, such as books, lectures, films, graphs and tables, and statistics. Their ability to extract meaning demonstrates their intelligent determination of what is important and what is of minor consequence. In examining data sources, they are able to get to the heart of the issues of consequence and record them accurately.
3. Interpersonal Skills
Attend to the ideas of others. These students are able to attend to the ideas of other students. What's more, their responses to these students' statements reveal that they have heard and understood the important meanings in what has been said. They not only listen carefully, but they are able to perceive the meanings of the statements being made.
Contribute to the facilitation of group discussions. These students are "active listeners" -- ready and willing to hear the ideas of others. They are supportive of others in the group and show respect for all ideas, whether or not they are in agreement with their own. These students are good group members. They are not dogmatic in presenting their ideas, and their flexibility encourages a sharing of different viewpoints. These students facilitate the group process.
III. ATTITUDES
1. Personal Perspective
Have a positive outlook. These students tend to look at problems as challenges. They have a positive spirit, a sense of "can do" about what is possible. They are enthusiastic about learning, about school, and about opportunities. These students consistently challenge themselves and, even when unsuccessful, are undaunted.
Have a tolerance for ambiguity. These students have a high tolerance for uncertainty. They are able to suspend judgment and to wait until the data are in before making up their minds. They are able to see the complexities in complex situations, and the complexities do not defeat them. They have a vast appreciation for the "gray areas" and are able to see far beyond the black-and-white, either-or positions.
Have a global perspective. These students tend to see situations, problems, and issues from a wider perspective. They are able to see far beyond themselves, their school, their community, and their city and bring the larger picture into focus. They understand about interdependence and about how world conditions transcend geographical boundaries. They appreciate that all peoples, all countries, and all world events are interconnected in important ways.
2. Beliefs and Values
Beliefs inform their behavior. These students have thought a lot about their beliefs; they have talked about what is important to them; they know what they stand for. Their beliefs seem to have been chosen after some reflection, and there is a clear relationship between their beliefs and actions. If they express a strong belief, you can see that belief in the consistency with which they act on it.
3. Self-Evaluation
Are open to self-evaluation. These students see self-evaluation as a chance to learn more about themselves and as an opportunity to become lifelong learners. They are open to self-examination and are not defensive when they need to ask for help. They welcome the opportunity to participate in a self-evaluative process. Their lack of defensiveness allows them to see their performance more objectively and makes them open to continued growth.
Are skillful in self-evaluation. These students are able to look at their own performance in class with critical self-awareness. They understand what is required in self-evaluation and give thoughtful consideration to their performance. They are able to be self-critical and introspective in their assessments of self, and they do this without being falsely modest or defensively arrogant.38
Conclusion
In developing the Profiles of Student Behaviors, the Coquitlam teachers determined the kinds of behaviors they would value in their graduating classes. These Profiles are, as far as I know, the only attempt by teachers to tackle this difficult but significant question. In doing so, the teachers open themselves to criticism and objection. Surely, they will not have included every learning outcome on every teacher's list, but at least they have provided a template for any other group of teachers who might wish to embark on such an exciting journey into the heartland of professional judgment -- where we acknowledge that we live in a land of uncertainty and where our best professional efforts act as rudders that guide students' learning in the most educationally significant ways.
SELMA WASSERMANN is a professor emerita, Simon Fraser University, Burnaby, B.C.