Courts and Teacher Evaluation Systems
Writing in Education Week, September 18, 2013, James Popham and Marguerita DeSander suggest that “thousands of American teachers will lose their jobs in the next few years because of the recently designed, more demanding evaluation systems now found in most of our states.” They point out these systems have been triggered by recent federal policies, and then note that most teachers think the dismissals can be reversed in court. The main point of their essay is that the courts, both state and federal, have “historically refused to substitute their judgment for that of a school board in cases where a teacher (whether tenured or probationary) has been terminated for substandard job performance. This has been true if the termination is based on even a scintilla of evidence.”
Popham is a well-known figure among education professors who are experts on evaluation and assessment, with several books, most recently Evaluating America’s Teachers: Mission Possible? (Corwin Press, 2013). DeSander is an education administration professor and was formerly an attorney specializing in employment law. These are two people who should be seen as authoritative on this topic.
New York Teachers More Protected?
When I first saw the outlines of New York’s response to Race To The Top teacher evaluation changes, I thought it might be a good time for college grads to go into the practice of education law, particularly if they had any statistical talent and were willing to delve into the psychometrics of testing and its use in teacher evaluation. As New York’s teacher evaluation system, including the use of test scores, rolled out, I continued to believe the system would collapse under the weight of court challenges to its fairness and its validity. I still believe that, but I’m not quite so confident in my opinion after reading this piece. New York’s system has a unique additional element–teachers identified as low performing must be given opportunities for professional development by their school districts–and perhaps that district responsibility will prevent willy-nilly teacher dismissals. With greater-than-anticipated increases in the costs of testing, and the Governor’s imposition of a 2% cap on growth in school tax levies, it’s already estimated that 40% of districts across the state will be in bankruptcy in the next few years. So districts won’t have the money to provide professional development to their low-performing staff, meaning they might not be able to dismiss them regardless of evaluations. That’s not the case in other states, where these warnings may be far more telling.
Evaluation Systems Show No Evidence of Efficacy
Popham and DeSander recognize that across the nation, the new evaluation systems have serious flaws–relying “too heavily on traditional achievement tests…” They note that the tests “are unaccompanied by any evidence that they are able to distinguish between well-taught and badly taught students.” They also point out that courts, based on past patterns of refusing to rule on the merits or validity of evaluation systems, “will not rule on the appropriateness of a teacher-evaluation system, or the evidence-collection procedures incorporated in that system, if the procedures are applied in a fair and consistent manner to all teachers affected. Thus, even an inadequate evaluation system will avoid the rigor of court scrutiny when it is applied equally to all teachers.” (Emphasis mine.)
The authors also take issue with the validity of classroom observations, suggesting that rating teacher classroom performance on the basis of “a handful of 30-minute classroom visits…” is inadequate, and that even when looking at 40 or 50 dimensions of classroom performance, “the resultant observation data are often of little value.”
Courts May Not Overturn Bad Evaluation Systems
So where does that leave teachers all over the nation? I agree with the authors that the evaluation systems, as designed, lack validity–the reliance on testing is not warranted psychometrically, nor are the tests designed for the purposes for which they are being used. And if courts are likely to disregard expert opinion because they won’t rule on system appropriateness, there is no venue for expert testimony about why these systems should be disregarded. If the legitimacy of procedures is not about to be adjudicated, there’s no place for teachers to turn. One need only review the multiple interpretations of the evaluation designs across the 696 districts currently approved by the New York State Education Department to know that there is no consistency in their format across the state. If courts give deference to school boards, teachers might be in far more trouble than they currently expect. Finally, a good civics education will remind all of us that courts are not always about right and wrong–they interpret the law. Where courts defer to school boards and states that have implemented bad evaluation systems, teachers beware.
We are inundated every day with pundits talking about how our school reforms will improve our results, yet after years of more and more testing and more and more accountability based on public reporting of test scores, our schools appear to remain problematic. Two recent articles exemplify the ideas of researchers who offer insights on changing the direction of the reform movement.
David Berliner, in “Effects of Inequality and Poverty vs. Teachers and Schooling on America’s Youth,” provides a strong argument that economic inequality and poverty are the issues that must be addressed to see any widespread improvement in student achievement. Go to www.tcrecord.org, look under Articles, and search for Berliner to find an executive summary.
His content derives from his own and others’ research on how socio-economic issues are the driving forces that work against student achievement. He provides an analysis of American students’ test scores broken out by levels of poverty in schools, and makes a clear case that, when comparing like levels of student poverty, our scores rise to the top, or very near the top, on the international tests that are regularly used to claim our schools are failing. He makes it clear that our social support systems have failed, and our international competitors have done a far better job of providing supports to families and children in need. So, our schools are not failing upper- and middle-class students, but our society is failing lower-income families and schools.
What I find encouraging in his work (and he has a long history of debunking the critics of American education, which you might want to examine separately) is his willingness to identify sloppy reasoning and utter nonsense found in the media and in the arguments of reformers with political agendas. He recognizes there are “occasional” (the emphasis is Berliner’s) success stories: individual schools and families that rise from poverty, or the “occasional” teacher who breaks through with low-income children. But he points out that these are exceptions, not the norm. Americans want to believe in this idealized version of the American Dream, but as policy, it simply doesn’t work. He points out that while we have long-living overweight smokers, no one seriously suggests building public health policy around these exceptions. Such exceptions don’t drive public health policy anywhere except in education, where reformers hold on to the unrealistic notion that finding exceptional teachers for low-performing schools will render poverty irrelevant, and that testing to find great teachers will cure our educational ills.
Taking a different, more traditional view on reform, David L. Kirp, a public policy professor at the University of California, Berkeley, wrote “The Secret to Fixing Bad Schools” on February 10, 2013, in the New York Times. Kirp talks about the “striking achievement” of the Union City, N.J. schools, an urban district that has done school reform successfully. They enroll almost every 3- and 4-year-old in prekindergarten, work on character, and focus on the individual needs of children, “figuring out what’s best for each child, rather than batch-processing them.” Strong principals encourage teachers to raise expectations, and teachers have responded. In Kirp’s words, “What makes Union City remarkable is, paradoxically, the absence of pizazz. It hasn’t followed the herd by closing ‘underperforming’ schools or giving the boot to hordes of teachers. No Teach for America recruits toil in its classrooms, and there are no charter schools.”
They started transforming their schools years ago when poor performance threatened them with a state takeover. The district designed an evidence-based curriculum where “learning by doing replaced learning by rote.” Teachers were encouraged to work together, and coaching and mentoring supported staff who struggled. Principals became education leaders.
Now folks from all over are visiting to see what Union City has done. Frankly, the research on what works in education has been rather clear for many years–Union City simply read it and implemented it in their school improvement work. So, here’s an interesting example of a school district that is one of the “occasional” examples of success that Berliner talks about.
Where do these two articles overlap? While at first they seem to reflect opposite approaches, I find them more complementary than not. These researchers are not in the test-to-find-and-fire-weak-teachers camp. They both want to see schools address root causes of academic failure with responsible strategies–Union City brings in high expectations, parent involvement, support for weak teachers, and a cultural shift that reflects a long-term strategy of continuous improvement. They both reject the current direction of school reformers who want to reinvent education with unproven quick fixes.
Kirp’s recent book is Improbable Scholars: The Rebirth of a Great American School System and a Strategy for America’s Schools, a more detailed look at the Union City schools and what they represent as a legitimate model for school reform. Google him–his vita lists lots of articles he’s written, many for the Huffington Post, on education policy issues.
Now that value-added measures are commonplace throughout the country, what can schools do to effectively incorporate them into teacher evaluation? This is the central question that classroom teachers and building principals must answer as they meet the requirements of the federal funding that is driving this model. I’m enthusiastic about using data for school improvement, and it’s been central to the last 15 years of my career as data warehouses and analysis tools for classroom teachers have emerged. But the way states have adopted the use of VA assessments varies greatly, and the impact of a single round of testing on teacher evaluations also varies. This variance is problematic at several levels.
The critics of VA point out that student scores can vary from year to year–one source of variance. If you read teacher blogs or blogs from assessment critics about how VA assessments are used, or you read some progressive school improvement bloggers, you’ll find many examples of students who scored highly one year and scored significantly lower the next–changes that cannot be conveniently explained by the effects of teachers or schools. Scores of individual teachers also vary across classes in the same year and from year to year, and examples of this variance fill the pages of critical analyses of VA methodology. A given teacher of a single subject can show significant variance between students across multiple sections of the same course in the same year. A teacher might also score highly for a year or two and then plummet the following year. Such variance is not explained or accounted for particularly well in the statistical models. I’ve identified several research analyses on these topics in the Research section of this site.
The concept of Student Growth Percentiles, a variant of VA analysis, compares students with similar score histories and ranks them according to their growth. When the student scores are aggregated into teacher classes, the teacher can then be given a growth score that is intended to fairly account for the population in the teacher’s class. The system compares the students in the aggregated class groupings to similar students statewide, so that teacher X, with 10 ESL students and 20 regular students, has the scores of the ESL students compared to those of other ESL students in the state. This is intended to lessen the impact of second-language students on the teacher’s VA score. However, an informed Internet reader will still find examples of teachers in New York (which is using Student Growth Percentiles for the first time this year) who are rated as outstanding by principals, parents, and colleagues, yet have a mediocre VA score. So rating teachers on VA scores, even with Student Growth Percentiles, is fraught with problems.
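To make the mechanics concrete, here is a minimal, hypothetical sketch of a Student Growth Percentile-style calculation. Real SGP models use quantile regression over several prior years of scores; this toy version simply ranks each student’s current score against peers in the same subgroup and prior-score band, then summarizes a teacher by the median of their students’ percentiles. The teacher names, subgroups, and scores are invented for illustration.

```python
# Toy Student Growth Percentile sketch (NOT the actual state methodology).
from statistics import median

students = [
    # (teacher, subgroup, prior_score, current_score) -- hypothetical data
    ("X", "ESL", 60, 68), ("X", "ESL", 62, 61), ("X", "GEN", 80, 85),
    ("Y", "ESL", 61, 75), ("Y", "GEN", 79, 80), ("Y", "GEN", 81, 90),
]

def prior_band(score, width=5):
    """Bucket prior scores so 'similar' students are compared together."""
    return score // width

# Group students by (subgroup, prior-score band): these are the peer groups.
groups = {}
for teacher, sub, prior, cur in students:
    groups.setdefault((sub, prior_band(prior)), []).append((teacher, cur))

# Percentile rank of each student's current score within their peer group.
teacher_pcts = {}
for peers in groups.values():
    scores = sorted(c for _, c in peers)
    for teacher, cur in peers:
        if len(scores) > 1:
            below = sum(1 for s in scores if s < cur)
            pct = 100.0 * below / (len(scores) - 1)
        else:
            pct = 50.0  # a lone student has no peers to rank against
        teacher_pcts.setdefault(teacher, []).append(pct)

# A teacher's growth score is the median of their students' percentiles,
# so ESL students are only ever compared with other ESL students.
growth = {t: median(p) for t, p in teacher_pcts.items()}
```

Even in this tiny example the instability critics describe is visible: shifting one student’s score by a few points can move them across a band boundary and change the teacher’s median substantially.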
The widely adopted but simplistic solution is to use a VA score (or any other assessment score) as only a part of a teacher evaluation. Using a summary test score in a subject area for only part of the evaluation, while making classroom observations the largest component, is a commonly proposed means to lessen the impact of assessments–and a tacit recognition that there’s a problem with this unproven system. States have jumped into this complex process without any national research on what works, without agreement on how to use student test results to evaluate teachers, and without any real analysis of whether this has affected the quality of student learning where the programs have been implemented. We are dealing with a core shift that is virtually ungrounded in research. Were we to try this in other fields, we would be stopped by regulators and common sense at every level.
ASCD’s November 2012 edition of Educational Leadership is themed Teacher Evaluation: What’s Fair? What’s Effective? This edition presents a general overview of the current state of teacher evaluation, and particularly discusses value-added assessment measures. The range of articles reflects all the concerns I’ve mentioned. I particularly like several articles that offer recommendations about what to do with the results of all the testing and teacher evaluation going on. The common threads are very clear: lessen the impact of summative tests and devise a means to use more formative testing throughout the year, in a feedback loop centered on teachers identifying what kids need to learn; and use classroom observations as a means to coach teachers toward improving their instruction. This sounds easy, but in general principals are untrained to do so and have no time to spend in classrooms anyway. Those who have never been principals or building-level administrators, with responsibility for day-to-day operations and teacher evaluation, simply don’t have a clue about how problematic the new teacher evaluation requirements are.
Educational Leadership isn’t exactly educational research per se, but a few of the articles come from folks with solid research backgrounds, and in the references at the end of the articles are a few meaty morsels worthy of further reading if you have access to a decent library where you can find the books and journals.
My favorite article is “How to Use Value-Added Measures Right,” by Matthew Di Carlo. He offers suggestions on how VA can be an appropriate element of teacher evaluation. The importance of getting this right is summarized in his first few paragraphs. He writes: “…there is virtually no empirical evidence as to whether using value-added or other growth models–the types of models being used vary from state to state–in high-stakes evaluation can improve teacher performance or student outcomes. The reason is simple: It has never really been tried before.”
Your health insurance is unlikely to pay for experimental treatments. We have mandates all over America to implement a diverse range of experimental treatments for teacher evaluation. It would have made sense to try such programs in small scale pilots across several states in multiple district contexts. Instead, we are demoralizing the profession by using unproven and unreliable means to evaluate performance, and then often publishing the results and destroying individual teachers.
You can find these articles on the ASCD website, though if you are not an ASCD member you may only be able to read abstracts. If you are a member, you’ll have the journal in your mailbox; share your copy with colleagues, or check it out online. If you’re interested in or affected by VA assessments, Di Carlo’s bibliography is a good one–I’ll be adding some of the references he lists to my own research page here at k12edtalk.