Literacy in the Information Age
By Gerald W. Bracey
Illustration © 1998 by Mario Noche
AS THE candidates talk (and talk and talk) about education, it seems appropriate that we consider recent research on one of the most fundamental skills: reading. But that is easier said than done, in part because the definition of literacy just won't keep still.
Once upon a time, a person who could sign his or her name was deemed literate. Later, anyone who had completed third grade was considered literate. Then along came a mushy concept, "functional literacy," and its equally slushy counterpart, "functional illiteracy," about which lots was written in the days of the "minimum competency testing" madness (roughly 1975 to 1985).
Two April 2000 reports, Literacy in the Labor Force (from the U.S. Department of Education's Office of Educational Research and Improvement) and Literacy in an Information Age (from the Organisation for Economic Co-operation and Development), construct a scale that runs from 0 to 500 -- much like the scale used by the National Assessment of Educational Progress. However, both OERI and OECD often collapse the scale into five categories or levels.
The OECD document defines literacy as "the ability to understand and employ printed information in daily activities, at home, at work, and in the community -- to achieve one's goals and to develop one's knowledge and potential." Notice that, by this definition, there is no fun in literacy: reading a novel, a magazine, or a poem doesn't count.
Both studies differentiate three varieties of literacy: prose, document, and quantitative. While the three do present some differences, the correlations between them are quite high, and they share some highly similar patterns as well. Unless specifically noted, the results discussed below come from analyses of prose literacy.
The first results of the OECD study were discussed in the Seventh Bracey Report (October 1997). Since then, 13 other nations decided to participate, bringing the total to 22 countries. The U.S. average for prose literacy is significantly lower than that of four nations, not different from that of six, and significantly higher than that of 11. By contrast, our average for document literacy is significantly lower than that of 11 nations, the same as that of four, and higher than that of six.
The first "principle of data interpretation" in my book Bail Me Out! (Corwin Press, 2000) is "Beware of Averages," and that advice is very useful here. Only one nation, Sweden, has a 95th percentile in prose literacy that is higher than that of the U.S., and only four countries -- Portugal, Poland, Slovenia, and Chile -- have a 5th percentile that is lower. The difference between the U.S. 95th percentile and that of Sweden is just 13 points, but the difference between the 5th percentiles is 78 points. In other words, the dispersion of scores in the U.S. is enormous.
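To see why an average can hide a wide spread, here is a minimal sketch in Python. The two score lists are invented for illustration and are not data from either report; the function name percentile_spread is likewise hypothetical. The sketch assumes Python 3.8 or later, where statistics.quantiles is available.

```python
from statistics import mean, quantiles

def percentile_spread(scores):
    """Return (5th percentile, 95th percentile, spread) for a list of scores."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(scores, n=20, method="inclusive")
    p5, p95 = cuts[0], cuts[-1]
    return p5, p95, p95 - p5

# Hypothetical scores on a 0-500 scale (NOT taken from the OECD or OERI reports).
country_a = [240, 260, 280, 300, 320]
country_b = [140, 220, 280, 340, 420]

for name, scores in (("A", country_a), ("B", country_b)):
    p5, p95, spread = percentile_spread(scores)
    print(f"Country {name}: mean={mean(scores)}, 5th={p5}, 95th={p95}, spread={spread}")
```

The two invented countries have the same average, yet the gap between the 5th and 95th percentiles is far larger in the second, which is the pattern the averages alone would conceal.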
These results are similar to the findings of the 1992 international reading study, How in the World Do Students Read?, which measured reading skills of 9- and 14-year-olds. American students were near the top at both ages and had the highest 95th and 99th percentiles. Our best readers scored higher than any other nation's best readers.
Some of the data are difficult to interpret. For example, scores for people who completed college are hard to compare because the proportion of college graduates varies enormously across the nations. Age comparisons are also iffy. Many adults in the oldest age group would have lived through World War II and the subsequent rebuilding efforts. Informal reports also indicate that older Europeans, especially those who would be expected to be on the lower rungs, refused to participate, thereby raising sampling issues.
A nation's economic productivity is substantially related to the proportions of its adults at the lowest and at the highest literacy levels. The relationship is negative for the first group, positive for the second. The report suggests that this relationship works in two ways: rich countries can afford to invest more in literacy development, and the literacy skills themselves contribute to productivity. Similar correlations are found between the proportion of poor and good readers and a nation's life expectancy.
Literacy in the Labor Force is a bit dated in that the data come from the 1992 National Adult Literacy Survey, and there are no surprises. Still, it is useful to have the data in one place. Men make more money than women do, regardless of literacy level, but the discrepancy grows with increasing literacy. Women who read at level 1 (the lowest level) are better off compared to men at level 1 than are women at level 5 (the highest level) compared to men at level 5.
Interestingly, the gap between whites and blacks disappears at level 4. (There weren't enough blacks or other minorities at level 5 to provide a reliable estimate.) Asians at level 1 earn less than whites, but they earn more at levels 2, 3, and 4.
Literacy in the Labor Force is report no. NCES 1999-470. It costs $33 but can be downloaded free and printed from www.nces.ed.gov. In hardcover, it is nearly 350 pages long, so think about your toner cartridge before you print it.
Literacy in an Information Age can be obtained from the Washington, D.C., office of the Organisation for Economic Co-operation and Development, 2001 L Street N.W., Washington, DC 20036. It can also be purchased at www.oecd.org, but use French francs if you have them.
Modifying Angoff
DO YOU know what the Angoff procedure is? What a Modified Angoff is? Well, in these days of tests and accountability, you probably need to learn. You can find out easily enough by reading Cut Scores: Results May Vary, the first in a series of monographs from the National Board on Educational Testing and Public Policy. The board is an "independent monitoring system" that monitors testing programs, evaluates the costs and benefits of testing, and assesses whether or not professional standards are being met.
The monograph was written by Catherine Horn, Miguel Ramos, Irwin Blumer, and George Madaus, all of Boston College, where the board is housed. It describes not only the various Angoff procedures, but also how standards are (or should be) set and various other methods of setting cut scores. Then it looks at these procedures in action in several actual high-stakes programs, with emphasis on the MCAS (Massachusetts Comprehensive Assessment System).
The monograph first discusses performance standards and standard-setting procedures in general. Specifically, it raises the issue of the qualification of judges to set such standards and lauds Virginia's approach in its Standards of Learning (SOL) program. "The committees included teachers, curriculum experts, and educators from throughout Virginia and reflected a balance of geographic distribution, ethnicity and race, and knowledge of the grades and the content areas to be tested. Each committee had approximately 20 members, nominated by school district superintendents, educational organizations, institutions of higher education, and the business community."
This was certainly as it should have been, but the authors fail to note that the work of the committees was totally undone by the state board of education. That body essentially ignored the recommendations of the committees and set cut scores sufficiently high that 98% of the schools failed on the first administration and 93% failed on the second.
The committees in Virginia used a Modified Angoff procedure. In the original technique, designed by the late William Angoff of the Educational Testing Service, judges are shown items and asked to judge the probability that a minimally competent person (a mythical being, to be sure) would get each right. These probabilities are then summed across items, and the result is the recommended cut score for a given judge. Since it would be capricious for a single judge to determine the cut score, multiple judges (20 in Virginia) render their verdicts, and the cut score is usually set at the average of the individual judges' scores.
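The arithmetic of the original Angoff technique as described above can be sketched in a few lines of Python. This is only an illustration: the function names and the ratings (three judges, four items) are hypothetical, and real panels follow far more elaborate protocols.

```python
from statistics import mean

def judge_cut_score(item_probabilities):
    """Sum one judge's per-item probabilities to get that judge's recommended cut score."""
    return sum(item_probabilities)

def panel_cut_score(ratings_by_judge):
    """Average the individual judges' recommended cut scores."""
    return mean(judge_cut_score(ratings) for ratings in ratings_by_judge)

# Hypothetical ratings: each row is one judge, each entry is the judged
# probability that a minimally competent person answers that item correctly.
ratings = [
    [0.5, 0.75, 0.25, 0.5],
    [0.75, 0.75, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.0],
]

for number, judge in enumerate(ratings, start=1):
    print(f"Judge {number} recommends a cut score of {judge_cut_score(judge)} items")
print(f"Panel cut score (average): {panel_cut_score(ratings)} items")
```

With a real test the sums would be taken over every item on the form, and the panel would have many more judges (20 in Virginia).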
This is where the process in Virginia fell apart. The state board accepted only the most stringent cut score recommendations and set the cut score in the extreme upper range. For instance, recommendations for the cut score for eighth-grade science ranged from 18 items correct to 29 items correct, and the average recommendation was 20 items correct. The state board set the passing score at 29. It did this for all but two of the 27 tests; for those two, it set the passing score higher than any score recommended by the 20 judges.
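The difference between the usual rule (take the average) and what the board did (take the most stringent recommendation) is easy to see in a short Python sketch. The individual recommendations below are invented: only the low of 18, the high of 29, the average of 20, and the panel size of 20 come from the figures above. The function names are hypothetical as well.

```python
from statistics import mean

def average_rule(recommendations):
    """The usual practice: set the cut score at the judges' average."""
    return mean(recommendations)

def most_stringent_rule(recommendations):
    """Adopt the single highest recommendation as the cut score."""
    return max(recommendations)

# Hypothetical recommendations from 20 judges, constructed only so that the
# minimum (18), maximum (29), and average (20) match the eighth-grade science
# figures quoted in the text. The individual values are NOT real data.
recommendations = [18] * 4 + [19] * 9 + [20] * 3 + [21, 22, 25, 29]

usual = average_rule(recommendations)
adopted = most_stringent_rule(recommendations)
print(f"Average rule: {usual} items; most stringent rule: {adopted} items")
print(f"Gap: {adopted - usual} items")
```

In this invented panel, a single judge's outlying recommendation determines the passing score under the second rule, while 19 of the 20 judges asked for less.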
A reader might well ask why a state board of education would do such a thing. In my opinion, the reason is that the movement toward high-stakes testing is not about education; it is about power and control and ideological agendas. A number of people feel that, in Virginia, the state board deliberately adopted tough standards to make public schools look bad and thereby to grease the skids for voucher legislation.
The modification of Angoff most often used is to restrict the probabilities judges can use. In the original, judges are free to pick any probability from 0 to 1. In the Modified Angoff, they use seven specific categories from .05 to .95 and can also choose "I don't know." In another modification, not discussed in the monograph, judges pick a cut score and then see actual test results. Then they can modify their judgments based on this "reality check." (The monograph also describes the contrasting groups and the bookmark procedures for setting cut scores.)
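A minimal sketch of the restricted-probability version follows, again in Python. Two things here are assumptions and not taken from the monograph: the seven evenly spaced category values (the text says only that they run from .05 to .95), and the choice to leave "I don't know" responses out of the sum and merely count them. The names ALLOWED, DONT_KNOW, and modified_angoff_cut_score are hypothetical.

```python
# ASSUMPTION: seven evenly spaced categories. The source gives only the
# endpoints (.05 and .95), not the five values in between.
ALLOWED = (0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95)
DONT_KNOW = None  # stands in for the "I don't know" option

def modified_angoff_cut_score(ratings):
    """Return (cut_score, number_of_dont_knows) for one judge.

    ASSUMPTION: "I don't know" ratings are skipped, not imputed.
    """
    total = 0.0
    dont_knows = 0
    for rating in ratings:
        if rating is DONT_KNOW:
            dont_knows += 1
        elif rating in ALLOWED:
            total += rating
        else:
            raise ValueError(f"{rating!r} is not one of the allowed categories")
    return total, dont_knows

# Hypothetical ratings from one judge on four items.
cut_score, skipped = modified_angoff_cut_score([0.50, 0.95, DONT_KNOW, 0.05])
print(f"Cut score: {cut_score:.2f} items ({skipped} item(s) marked 'I don't know')")
```

How a panel actually treats "I don't know" matters: dropping those items, as this sketch does, lowers that judge's total, so a real procedure would need an explicit rule.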
Setting a performance level doesn't have to be a pass/fail matter. One can devise categories -- such as those used by the National Assessment of Educational Progress (NAEP) -- that classify students as below basic, basic, proficient, and advanced. Although the NAEP procedures have been roundly criticized, the categories they use have become popular and are being applied to standardized, commercial norm-referenced tests in arbitrary ways. In one state, "basic" is any score below the 50th percentile on the Stanford Achievement Test (SAT 9). In another state, it is anything below the 40th percentile. In Massachusetts, students are classified into the categories of failure, needs improvement, proficient, and advanced.
Once the cut scores are set and the results are in, the question becomes "Do the results make sense?" One cannot assume, as the media are wont to do, that the results are meaningful. One must look around for collateral or independent evidence. In the cases of Virginia and Massachusetts, such evidence is hard to find. For instance, in Massachusetts, 71% of the eighth-graders failed or were judged in need of improvement in science. And yet Massachusetts is one of 14 states that would have scored as well as or better than 40 of the 41 nations in eighth-grade science in the Third International Mathematics and Science Study.
Another source of potential validation of a given test is another test that measures similar subject matter. Do the results look similar? In the case of Massachusetts, the correlation between the MCAS and the SAT 9 was not perfect. More disturbing, students from about the 60th percentile to the 99th percentile on the SAT 9 were distributed across all four MCAS categories. That is, while some students with high percentile rankings on the SAT 9 were labeled "advanced," some were also labeled "failure." The last third of the monograph lays out and discusses a number of reasons why the results from the two tests might have been so different.

Kappan Professional Journal
Last updated 11 September 2000
URL: http://www.pdkintl.org/kappan/kbra0009.htm
Copyright 2000 Phi Delta Kappa International