Value Added and Teacher Evaluation

Now that value-added measures are commonplace throughout the country, what can schools do to incorporate them effectively into teacher evaluation?  This is the central question that classroom teachers and building principals must answer as they meet the federal funding requirements driving this model.  I'm enthusiastic about using data for school improvement; it has been central to the last 15 years of my career as data warehouses and analysis tools for classroom teachers have emerged.  But the way states have adopted VA assessments varies greatly, and so does the weight a single round of testing carries in teacher evaluations.  This variance is problematic at several levels.

Critics of VA point out that student scores can vary from year to year, one source of variance.  Read teacher blogs, assessment critics, or progressive school improvement bloggers on how VA assessments are used and you'll find many examples of students who scored highly one year and significantly lower the next, changes that cannot be conveniently explained by the effects of teachers or schools.  Scores for individual teachers also vary, both across classes in the same year and from year to year, and examples of this variance fill the pages of critical analyses of VA methodology.  A teacher of a single subject can show significant variance across multiple sections of the same course in the same year, or score highly for a year or two and then plummet the following year.  The statistical models do not explain or account for such variance particularly well.  I've identified several research analyses on these topics in the Research section of this site.

The concept of Student Growth Percentiles, a variant of VA analysis, compares similar students' scores and ranks each student according to his or her growth.  When student scores are aggregated by class, the teacher can be given a growth score intended to fairly reflect the population in that teacher's class.  The system weights the aggregated class groupings against similar students, so that teacher X, with 10 ESL students and 20 regular students, has the ESL students' scores compared to those of other ESL students in the state.  This is intended to lessen the impact of second-language students on the teacher's VA score.  Even so, an informed Internet reader will still find examples of teachers in New York (which is using Student Growth Percentiles for the first time this year) who are rated as outstanding by principals, parents, and colleagues yet receive a mediocre VA score.  So rating teachers on VA scores, even with Student Growth Percentiles, is fraught with problems.
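
To make the mechanics concrete, here is a minimal sketch in Python of the two steps described above: rank each student's growth against similar students, then roll those percentiles up into a class-level score for the teacher.  The student data, subgroup labels, and the simple gain-based percentile are all hypothetical and deliberately simplified; the SGP models actually used by states condition on students' prior-score histories with quantile regression rather than comparing a single year's gain within a peer group.

```python
from statistics import median

# Hypothetical student records: (student_id, subgroup, prior_score, current_score)
students = [
    ("s1", "ESL", 610, 640), ("s2", "ESL", 615, 620), ("s3", "ESL", 605, 655),
    ("s4", "GEN", 700, 730), ("s5", "GEN", 705, 710), ("s6", "GEN", 695, 750),
]

def growth_percentile(student, peers):
    """Percent of peers whose score gain was at or below this student's gain
    (simplified; real SGP models use quantile regression on prior scores)."""
    gain = student[3] - student[2]
    peer_gains = [p[3] - p[2] for p in peers if p[0] != student[0]]
    if not peer_gains:
        return 50.0  # no comparison group: fall back to the middle percentile
    return 100.0 * sum(g <= gain for g in peer_gains) / len(peer_gains)

# Each student is compared only to peers in the same subgroup, so the
# ESL students are ranked against other ESL students, not the whole class.
percentiles = []
for s in students:
    peers = [p for p in students if p[1] == s[1]]
    percentiles.append(growth_percentile(s, peers))

# The teacher's growth score is the median of the class's growth percentiles.
print("Median growth percentile for this class:", median(percentiles))
```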

The widely adopted but simplistic solution is to make a VA score (or any other assessment score) only one part of a teacher evaluation.  Counting a summative test score in a subject area for a fraction of the evaluation, while making classroom observations a large, or the largest, component, is a commonly proposed way to lessen the impact of assessments, and a tacit recognition that there's a problem with this unproven system.  States have jumped into this complex process without any national research on what works, without agreement on how to use student test results to evaluate teachers, and without any real analysis of whether the programs have affected the quality of student learning where they have been implemented.  We are dealing with a core shift that is virtually ungrounded in research.  Were we to try this in other fields, we would be stopped by regulators and common sense at every level.
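
As a back-of-the-envelope illustration of that weighting, a composite score might look like the sketch below.  The weights and component scores are entirely invented for illustration; every state and district sets its own formula, and many use different scales for each component.

```python
# Hypothetical evaluation weights -- each state sets its own percentages.
weights = {"value_added": 0.30, "observations": 0.50, "other_measures": 0.20}

# Component scores on a common 0-100 scale (illustrative numbers only).
scores = {"value_added": 42, "observations": 88, "other_measures": 75}

# Weighted composite: 0.30*42 + 0.50*88 + 0.20*75 = 71.6
composite = sum(weights[k] * scores[k] for k in weights)
print(f"Composite evaluation score: {composite:.1f}")
```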

ASCD's November 2012 edition of Educational Leadership is themed Teacher Evaluation: What's Fair? What's Effective?  This edition presents a broad overview of the current state of teacher evaluation and, in particular, discusses value-added assessment measures.  The range of articles reflects all the concerns I've mentioned.  I particularly like the several articles that offer recommendations about what to do with the results of all the testing and teacher evaluation going on.  The common threads are very clear: lessen the impact of summative tests, and devise a means to use more formative testing throughout the year in a feedback loop centered on teachers identifying what kids need to learn.  Use teacher observations as a means to coach teachers toward improving their instruction.  This sounds easy, but in general principals are untrained to do so and have no time to spend in classrooms anyway.  Those who have never been principals or building-level administrators with responsibility for day-to-day operations and teacher evaluation simply don't have a clue about how problematic the new teacher evaluation requirements are.

Educational Leadership isn't educational research per se, but a few of the articles come from folks with solid research backgrounds, and the references at the end of the articles hold a few meaty morsels worthy of further reading if you have access to a decent library where you can find the books and journals.

My favorite article is "How to Use Value-Added Measures Right," by Matthew Di Carlo.  He offers suggestions on how VA can be an appropriate element of teacher evaluation.  The importance of getting this right is summarized in his first few paragraphs.  He writes: "…there is virtually no empirical evidence as to whether using value-added or other growth models–the types of models being used vary from state to state–in high-stakes evaluation can improve teacher performance or student outcomes.  The reason is simple: It has never really been tried before."

Your health insurance is unlikely to pay for experimental treatments, yet we have mandates all over America to implement a diverse range of experimental treatments for teacher evaluation.  It would have made sense to try such programs in small-scale pilots across several states and multiple district contexts.  Instead, we are demoralizing the profession by using unproven and unreliable means to evaluate performance, and then often publishing the results and destroying individual teachers.

You can find these articles on the ASCD website, though if you are not an ASCD member you may only be able to read abstracts.  If you are a member, you'll have the journal in your mailbox; share your copy with colleagues or check it out online.  If you're interested in or affected by VA assessments, Di Carlo's bibliography is a good one, and I'll be adding some of the references he lists to my own research page here at k12edtalk.

Are schools really failing?

One of the charms of ‘retirement’ will be having the time to read and pass on commentary, blogs, research, and new articles that I find online day by day. An issue I have been following for my entire career revolves around the controversy of whether American schools are failing.

I recall reading Why Johnny Can't Read when I was in junior high in 1958, though I admit I read the Reader's Digest condensed version and didn't read the original until I started teaching.  The school success/failure debate rages on, reincarnating every few years, but it is generally fueled by the same basic questions: why don't all students succeed, and what can we do about it?  When you've worked with educators whose commentary on school reform spans 90 years, as I have (the educators I started working with in 1968 had 45 years of experience to share with me, and I have 45 years of my own), you hear again and again that the more things change, the more they stay the same.

I want to feature two authors today who appear to take opposite sides of this debate.  Michael Lind, in "Education Reform's Central Myth" (Salon.com, August 1), points out that when one considers the performance of the majority of American schools, those where poverty does not affect achievement, our schools are actually pretty good.  He uses South Korea and Finland as examples, pointing out that the percentage of GDP spent on education there exceeds that spent in the US.  He suggests that the 35% of students in poverty produce the low international rankings, not the 65% of US students who are actually doing well.

Taking the other side, Marilyn Rhames, a Chicago charter school teacher, in "Reforming the 'Myth' of America's Failing Schools (It's Actually True)" (Education Week Teacher, August 8), highlights her experiences in poor urban schools, suggests Lind is off base in implying that poor kids can't learn, and offers a coherent commentary on the danger of losing sight of the problems of bad schools in low-income communities by claiming that schools are not really failing.

Interestingly, both are fundamentally correct, though I think Rhames has misread Lind, which is so often the case when authors point out the links between community incomes and poor schools.  When the reality of poverty's connection to low achievement is made clear, those who are concerned about how to close this gap quickly raise the race card to say bad schools are the problem, not poverty.  But bad schools are found in communities where poor families have no political clout, fewer connections to schools, less awareness of what a quality education should look like, poor nutrition, and children who come to school with large vocabulary gaps compared to middle-class children… the list goes on.  In this chicken-or-egg question, I believe poverty 'causes' bad schools, and the effect is immensely challenging to overcome.

Lind does not suggest poor kids can't learn.  I'll offer a bit of extrapolation from Lind's argument: schools are failing for poor children, but not overall.  Schools in poor communities are under-resourced, often staffed by less experienced teachers, and have high turnover rates and leadership problems that are less common in schools in higher-income communities.  Schools are doing as well for middle-class and well-off kids in America as schools anywhere in the world.  The issue gets back to community poverty.  Where nations deal with poverty (Finland has a massive social welfare net, so poverty does not spill over to affect classrooms), they have good schools.  Where they don't, schools suffer.

Lind is correct to point out that America chooses to paint all schools as bad and to try economic models of competition to make up for inconsistent performance.  No nation anywhere has tried this successfully, but we are driven by the notion that competition is the solution to everything, including poverty.  Rhames correctly notes that good teachers can go a long way toward improving bad schools, and that we can't give up on kids because they are poor.  The two are not in disagreement; they are talking about two different issues.

In future commentaries, I'll dig deeper into the international tests to highlight how American kids are actually doing.  In the meantime, interested readers can Google materials by Gerald Bracey and get his 2004 book, Setting the Record Straight: Responses to Misconceptions About Public Education in the U.S.  While it's eight years old, he does a fantastic job of skewering the selective and misleading use of data by public education critics.