Courts and Teacher Evaluation Systems
Writing in Education Week, September 18, 2013, James Popham and Marguerita DeSander suggest that “thousands of American teachers will lose their jobs in the next few years because of the recently designed, more demanding evaluation systems now found in most of our states.” They point out that these systems have been triggered by recent federal policies, and then note that most teachers think the dismissals can be reversed in court. The main point of their essay is that the courts, both state and federal, have “historically refused to substitute their judgement for that of a school board in cases where a teacher (whether tenured or probationary) has been terminated for substandard job performance. This has been true if the termination is based on even a scintilla of evidence.”
Popham is a well-known figure among education professors who are experts on evaluation and assessment, with several books, most recently (2013) Evaluating America’s Teachers: Mission Possible? from Corwin Press. DeSander is an educational administration professor and a former attorney who specialized in employment law. These are two folks who should be seen as authoritative on this topic.
New York Teachers More Protected?
When I first saw the outlines of New York’s response to Race to the Top teacher evaluation changes, I thought that it might be a good time for college grads to go into the practice of education law, particularly if they had any statistical talent and were willing to delve into the psychometrics of testing and its use in teacher evaluation. As New York’s teacher evaluation system, including the use of test scores, rolled out, I continued to believe the system would collapse under the weight of court challenges to its fairness and its validity. I still believe that, but I’m not quite so confident in my opinion after reading this piece. New York’s system has a unique additional element–teachers identified as low performing must be given opportunities for professional development by their school districts–and perhaps that district responsibility will prevent willy-nilly teacher dismissals. With the greater-than-anticipated costs of testing, and the Governor’s imposition of a 2% cap on growth in school tax levies, it’s already estimated that 40% of districts across the state will be insolvent within the next few years. So districts won’t have the money to provide professional development to their low-performing staff, meaning they might not be able to dismiss them regardless of evaluations. That’s not the case in other states, where these warnings may be far more telling.
Evaluation Systems Show No Evidence of Efficacy
Popham and DeSander recognize that across the nation, the new evaluation systems have serious flaws–relying “too heavily on traditional achievement tests…” They note that the tests “are unaccompanied by any evidence that they are able to distinguish between well-taught and badly taught students.” They also point out that courts, based on past patterns of refusing to rule on the merits or validity of evaluation systems, “will not rule on the appropriateness of a teacher-evaluation system, or the evidence-collection procedures incorporated in that system, if the procedures are applied in a fair and consistent manner to all teachers affected. Thus, even an inadequate evaluation system will avoid the rigor of court scrutiny when it is applied equally to all teachers.” (Emphasis mine.)
The authors also take issue with the validity of classroom observations, suggesting that when teacher classroom performance is rated on the basis of “a handful of 30-minute classroom visits…”, even looking at 40 or 50 dimensions of classroom performance, “the resultant observation data are often of little value.”
Courts May Not Overturn Bad Evaluation Systems
So where does that leave teachers all over the nation? I agree with the authors that the evaluation systems lack validity as they are being designed–the reliance on testing is not psychometrically warranted, and the tests were never designed for the purposes to which they are now being put. And if courts are likely to disregard expert opinion because they won’t rule on system appropriateness, there is no venue for expert testimony about why these systems should be set aside. If the legitimacy of the procedures is not going to be adjudicated, there’s no place for teachers to turn. One need only review the multiple interpretations of the evaluation designs across the 696 districts currently approved by the New York State Education Department to know that there is no consistency in their format across the state. If courts give deference to school boards, teachers might be in far more trouble than they currently expect. Finally, a good civics education will remind all of us that courts are not always about right and wrong–they interpret the law. Where courts defer to school boards and states that have implemented bad evaluation systems, teachers beware.
I’ve lauded Bruce Baker’s blog once already, but day by day I’m becoming an even more devoted follower of his wit and his brilliant exposés of fallacious research, bogus claims, and shallow thinking. I’ve learned/confirmed so much by subscribing to School Finance 101, particularly about value added models and student growth percentile models, as Dr. Baker reexamines the research on these topics and highlights the shortcomings and the unintended consequences of using these measures in states all over the country.
I’m very interested in his work on student growth percentiles, first created by Damian Betebenner at the National Center for the Improvement of Educational Assessment, www.nciea.org. First used in Colorado, and now spreading to Massachusetts, New Jersey and New York, SGPs are said by their proponents to fix the problems inherent in value added measures. Baker emphatically says not so, and uses the writings of the fathers of SGPs to highlight how SGPs are being misused.
To get up to speed on VAMs and SGPs and their problems as Baker sees them, go through the blog references found in his Value Added Teacher Evaluation category here, and as you read the articles, click on through to the links within each one. You’ll get quite a wonderful look at the limitations of these statistical models as they relate to teacher evaluation. Baker also takes on state education officials in several states, including Colorado, New Jersey and New York, with his irreverent commentary.
Of particular interest for New Yorkers is his analysis of the preliminary technical reports on NY’s first SGP assessment results. This analysis can be seen in the entry entitled “AIR Pollution in NY State…” He offers graphs showing how factors that cannot be attributed to teachers affect the patterns of SGP scores–students in low-income schools generally underperform students in higher-income schools, as is well known, but the SGPs of these students also lag behind those of their higher-income peers. This demonstrates that SGPs do not, in fact, account for peer-group effects, school effects, or poverty, as some suggest.
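For readers who want the intuition behind the statistic itself: an SGP asks, among students who started from a similar prior score, at what percentile did this student’s current score land? Here is a minimal sketch in Python with entirely invented data–a crude binning stand-in, not Betebenner’s actual methodology, which conditions on prior scores via quantile regression:

```python
import random

random.seed(0)

# Invented data: (prior_score, current_score) for 1,000 students.
students = []
for _ in range(1000):
    prior = random.gauss(500, 50)
    students.append((prior, prior + random.gauss(10, 30)))

def growth_percentile(prior, current, population, band=10):
    """Percentile rank of `current` among students whose prior score fell
    within `band` points of `prior`. (A crude binning stand-in for the
    quantile-regression conditioning the real SGP methodology uses.)"""
    peers = [cur for pr, cur in population if abs(pr - prior) <= band]
    if not peers:
        return None
    below = sum(1 for c in peers if c < current)
    return round(100 * below / len(peers))

# A student who went from 500 to 510, ranked against similar starters:
sgp = growth_percentile(500, 510, students)
```

Baker’s point survives even in this toy: conditioning on prior score says nothing about why low-income students’ growth percentiles cluster low. If poverty depresses growth itself, and not just starting points, the SGP inherits that bias rather than correcting for it.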
Read Baker and weep about how assessments are regularly being misused by people who should know better.
If you’re following educational reform, you are aware that some flavor of value-added statistical modeling is being pushed throughout the nation as a means to identify good/bad teachers or good/bad schools. Pushed now by the US Department of Education and Race to the Top, and promoted by a specialized cadre of assessment developers who stand to rake in massive profits from the testing required to generate data to be used in value-added analyses, this movement has been sweeping across the country. It has, on the other hand, always raised the suspicions of educational researchers, some economists, and numerous statisticians who have suggested that the models, and the assessments behind them, simply don’t work well as a basis to make high stakes decisions about educational policy, about schools and teachers, or about children.
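For readers who haven’t seen one, the core of the simplest value-added model fits in a few lines: predict each student’s current score from the prior score, then credit (or blame) each teacher with the average leftover. All names and numbers below are invented, and real VAMs add many covariates and statistical adjustments, but the logic is the same:

```python
import random
from collections import defaultdict

random.seed(1)

# Invented roster: three teachers, 400 students each; teacher "B" gets a
# simulated boost and teacher "C" a simulated penalty.
data = []  # (teacher, prior_score, current_score)
for teacher, boost in [("A", 0.0), ("B", 8.0), ("C", -5.0)]:
    for _ in range(400):
        prior = random.gauss(500, 50)
        current = 50 + 0.9 * prior + boost + random.gauss(0, 25)
        data.append((teacher, prior, current))

# Step 1: fit one OLS line, current ~ prior, across all students.
n = len(data)
mean_p = sum(p for _, p, _ in data) / n
mean_c = sum(c for _, _, c in data) / n
slope = (sum((p - mean_p) * (c - mean_c) for _, p, c in data)
         / sum((p - mean_p) ** 2 for _, p, _ in data))
intercept = mean_c - slope * mean_p

# Step 2: a teacher's "value added" is the mean residual of their students.
residuals = defaultdict(list)
for teacher, p, c in data:
    residuals[teacher].append(c - (intercept + slope * p))
vam = {t: sum(r) / len(r) for t, r in residuals.items()}
```

The critics’ objection is visible right in the sketch: everything the prediction misses (peer effects, school resources, who gets assigned to whose roster) lands in the residual and gets attributed to the teacher.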
In January, the Educational Policy Analysis Archives, a peer-reviewed online policy journal, ran several research articles in a special issue covering value-added, called “Value-Added: What America’s Policymakers Need to Know and Understand.” The title is very misleading, since my reading of this research suggests a better title might be “Value-Added: What America’s Policymakers Consistently Ignore.” I’m highlighting the articles here, with links, so readers can pick and choose what interests them. Each article offers a substantial list of references for further exploration.
The opening discussion by the editors is “Value-Added Model (VAM) Research for Educational Policy: Framing the Issue.” The editors provide background on, and establish the relevance of, the six additional papers included in the special issue.
Diana Pullin, of Boston College, a lawyer and legal scholar, offers “Legal Issues in the Use of Student Test Scores and Value-Added Models (VAM) to Determine Educational Quality.” Given that many states are implementing teacher evaluation models in which VAM measures will be used to dismiss teachers, Pullin analyzes the complexity of predictable court challenges as experts on both sides of the VAM movement offer conflicting testimony about the validity and reliability of tests and the statistics used for employment decisions.
Bruce Baker, Joseph Oluwole and Preston Green of Rutgers, Montclair State, and Penn State respectively, in “The Legal Consequences of Mandating High Stakes Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era,” review issues around the utility of student growth models and VA models. Their lengthy appendix outlines the VAM evaluation policies of several states.
Nicole Kersting and Mei-Kuang Chen of U of Arizona, and James Stigler of UCLA, offer a statistically heavy analysis titled “Value-Added Teacher Estimates as Part of Teacher Evaluations: Exploring the Effects of Data and Model Specifications on the Stability of Teacher Value-Added Scores.” They conclude there are several problems with stability and sample sizes which suggest a need for more work to improve the measures. If you’re not statistically competent, read the narrative–the supporting details are a second language for many of us.
Elizabeth Graue, Katherine Delaney, and Anne Karch of the U of Wisconsin, Madison provide a qualitative research paper entitled “Ecologies of Education Quality.” Their analysis suggests that VAMs can’t capture or control for all the variables of effective teaching.
“Sentinels Guarding the Grail: Value-Added Measurement and the Quest for Education Reform,” from Rachel Gabriel of the University of Connecticut and Jessica Nina Lester of Washington State, is another qualitative analysis. This is an interesting review of how proponents of VA suggest the methods are scientific and accurate, while the cautions of educational researchers with significantly opposite points of view are ignored or dismissed. This analysis of the effective public promotional efforts of VAM, and of the glib solutions to educational problems that VAM supporters say will come through its adoption, highlights how public policy has been shaped by politics and the media.
Finally, Moshe Adler of Columbia University, in “Findings vs. Interpretation in ‘The Long-Term Impacts of Teachers’ by Chetty et al.,” offers a refutation of the claims of an often-cited study suggesting that the positive effects of a single year with a good teacher last a lifetime. Since the Chetty et al. study is frequently cited as justification for VAM and teacher evaluation changes, this is an important commentary on what Adler says are the false claims of this seminal economic impact study.