New Research on Instability of Teacher Evaluation Metrics


Current Teacher Evaluation Metrics are Unstable

The Education Policy Analysis Archives has just published (Oct 6, 2014) “The Stability of Teacher Performance and Effectiveness: Implication for Policies Concerning Teacher Evaluation,” by Morgan, Hodge, Trepinski and Anderson.  This is a well-designed …

Bruce Baker on Value Added and Student Growth Percentiles

I’ve lauded Bruce Baker’s blog once already, but day by day I’m becoming an even more devoted follower of his wit and his brilliant exposés of fallacious research, bogus claims, and shallow thinking.  I’ve learned/confirmed so much by subscribing to School Finance 101, particularly about value added models and student growth percentile models, as Dr. Baker reexamines the research on these topics and highlights the shortcomings and the unintended consequences of using these measures in states all over the country.

I’m very interested in his work on student growth percentiles, first created by Damian Betebenner at the National Center for the Improvement of Educational Assessment.  SGPs were first used in Colorado and are now spreading to Massachusetts, New Jersey and New York, and their proponents suggest they fix the problems inherent in value added measures.  Baker emphatically says not so, and uses the writings of the fathers of SGPs to highlight how SGPs are being misused.

To get up to speed on VAMs and SGPs and their problems as Baker sees them, go through the blog references found in his Value Added Teacher Evaluation category here, and as you read the articles, click on through to the links within each one.  You’ll get quite a wonderful look at the limitations of these statistical models as they relate to teacher evaluation.  Baker also takes on state education officials in several states, including Colorado, New Jersey and New York, with his irreverent commentary.

Of particular interest for New Yorkers is his analysis of the preliminary technical reports on NY’s first SGP assessment results.  This analysis can be seen in the entry entitled “AIR Pollution in NY State…”  He offers graphs showing how factors that cannot be attributed to teachers affect the patterns of SGP scores: students in low income schools generally underperform students in higher income schools, as one already knows, but the SGPs of students in low income schools also lag behind those of students in higher income schools.  This demonstrates that SGPs do not, in fact, account for the effects of peer groups, of schools, or of poverty, as some suggest.

Read Baker and weep about how assessments are regularly being misused by people who should know better.

Value-Added Models – New Research

If you’re following educational reform, you are aware that some flavor of value-added statistical modeling is being pushed throughout the nation as a means to identify good/bad teachers or good/bad schools.  Pushed now by the US Department of Education and Race to the Top, and promoted by a specialized cadre of assessment developers who stand to rake in massive profits from the testing required to generate data to be used in value-added analyses, this movement has been sweeping across the country.  It has, on the other hand, always raised the suspicions of educational researchers, some economists, and numerous statisticians who have suggested that the models, and the assessments behind them, simply don’t work well as a basis to make high stakes decisions about educational policy, about schools and teachers, or about children.

In January, the Education Policy Analysis Archives, a peer-reviewed online policy journal, ran several research articles in a special issue covering value-added, called “Value-Added: What America’s Policymakers Need to Know and Understand.”  The title is very misleading, since my reading of this research suggests a better title might be “Value-Added:  What America’s Policymakers Consistently Ignore.”  I’m highlighting the articles here, with links, so you can pick and choose what interests you.  Each separate article offers a substantial list of references for further exploration.

The opening discussion by the editors is “Value-Added Model (VAM) Research for Educational Policy: Framing the Issue.”  The editors provide background on the six additional papers included in the special issue and explain their relevance.

Diana Pullin, of Boston College, a lawyer and legal scholar, offers “Legal Issues in the Use of Student Test Scores and Value-Added Models (VAM) to Determine Educational Quality.”  Given that many states are implementing teacher evaluation models in which VAM measures will be used to dismiss teachers, Pullin analyzes the complexity of predictable court challenges as experts on both sides of the VAM movement offer conflicting testimony about the validity and reliability of tests and the statistics used for employment decisions.

Bruce Baker, Joseph Oluwole and Preston Green of Rutgers, Montclair State, and Penn State respectively, in “The Legal Consequences of Mandating High Stakes Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era,” review issues around the utility of student growth models and VA models.  Their lengthy appendix outlines the VAM evaluation policies of several states.

Nicole Kersting and Mei-Kuang Chen of U of Arizona, and James Stigler of UCLA, offer a statistically heavy analysis titled “Value-Added Teacher Estimates as Part of Teacher Evaluations: Exploring the Effects of Data and Model Specifications on the Stability of Teacher Value-Added Scores.”  They conclude there are several problems with stability and sample sizes which suggest a need for more work to improve the measures.  If you’re not statistically competent, read the narrative–the supporting details are a second language for many of us.

Elizabeth Graue, Katherine Delaney, and Anne Karch of the U of Wisconsin, Madison provide a qualitative research paper entitled “Ecologies of Education Quality.”  Their analysis suggests that VAMs can’t capture or control for all the variables of effective teaching.

“Sentinels Guarding the Grail: Value-Added Measurement and the Quest for Education Reform,” from Rachel Gabriel of the University of Connecticut and Jessica Nina Lester of Washington State is another qualitative analysis.  This is an interesting review of how proponents of VA suggest the methods are scientific and accurate, while the cautions of educational researchers with sharply opposing points of view are ignored or dismissed.  By analyzing the effective public promotion of VAM, and the glib solutions to educational problems that its supporters say will come through its adoption, the authors highlight how public policy has been shaped by politics and the media.

Finally, Moshe Adler of Columbia University, in “Findings vs. Interpretation in ‘The Long-Term Impacts of Teachers’ by Chetty et al.,” offers a refutation of the claims of an often-cited study suggesting that the positive effects of a single year with a good teacher last a lifetime.  Since the Chetty et al. study is frequently cited as justification of VAM and teacher evaluation changes, this is an important commentary on what Adler says are the false claims of this seminal economic impact study.