Why the Opposition to Testing and the Common Core?

I spent the last 20 years of my career dealing with data as a means to improve instruction, and I coached lots of people about how to make sense of data, standardized and local, as a useful component of daily instruction.  As teachers or administrators, we can’t know what our students have mastered without good tests, whether they are ungraded formative tests, unit tests, interim benchmark tests, or standardized accountability tests.  In my own state of New York, there is a rising movement among some parents, some of whom are teachers, to opt out of state NCLB tests.  Opposition to ‘national’ testing and the Common Core is growing in other states, supported by various groups who are concerned about increasing federal influence on local educational policies, rising costs, and the misuse of tests for purposes other than what they were designed to do.

The two national testing consortia are talking about designing the tests to improve their utility as vehicles for instructional improvement, but whether these features will be part of the basic package of new tests or available only at additional cost to local districts remains a bit fuzzy.  States are beginning to question the cost-benefit case for the programs–both consortia are designing online testing while many school districts, including the low-income districts that have always been the targets of the accountability movement, are going to struggle with paying for the Internet infrastructure and hardware required to support online testing.

So in broad terms, the growing opposition to the Common Core and the national testing programs reflects multiple issues of concern to a growing variety of stakeholders, and those issues overlap significantly, making this a very complex situation.  Let’s review some of them here.

1.  Use of data for instructional improvement.  In New York, where at one time all state-sponsored tests were released publicly after testing was complete, we could use the results for instructional improvement planning.  We had specific information about how the questions aligned to standards, we could see each question, and we had a p-value which told us how difficult the item was state-wide.  This was very useful for teachers, but that’s gone now that NCLB tests are secure.  Without seeing the questions, the rest of the information (how I did compared to anyone/everyone else) provides very poor instructional guidance.  I can still see whether I did well or poorly on a particular performance indicator, but I don’t have decent details about exactly what my kids have missed, and I have to guess about why.  If I am in a district that has purchased commercial assessments that give me access to the questions, I have this capability during the school year, but many publishers’ tests are also secure, and I have to rely on a generic description of academic trends that isn’t particularly useful.  I get far more information about a student’s weaknesses when I can review the test question and analyze the wrong-answer responses as I plan interventions.
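For readers who haven’t worked with item statistics: a p-value here is nothing exotic–it’s simply the proportion of students who answered an item correctly.  A quick sketch (the student responses and answer key are invented for illustration, not from any real test) shows the computation:

```python
# Hypothetical item-analysis sketch: compute each item's p-value,
# i.e. the proportion of students answering it correctly.
# All data below is made up for illustration.

def item_p_values(responses, answer_key):
    """responses: one list of answer choices per student;
    answer_key: the correct choice for each item."""
    n_students = len(responses)
    p_values = []
    for i in range(len(answer_key)):
        correct = sum(1 for r in responses if r[i] == answer_key[i])
        p_values.append(correct / n_students)
    return p_values

# Four students, three items
responses = [['A', 'B', 'D'],
             ['A', 'C', 'D'],
             ['B', 'B', 'D'],
             ['A', 'C', 'A']]
key = ['A', 'C', 'D']
print(item_p_values(responses, key))  # [0.75, 0.5, 0.75]
```

A low p-value flags a hard item–but as the paragraph above argues, without seeing the question itself, knowing an item was hard tells a teacher almost nothing about why students missed it.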

2.  Purpose of the tests.  Critics of NCLB testing write about this regularly.  The tests were not designed for either instructional improvement or teacher evaluation.  They have been co-opted by politicians and businessmen as a means to promote agendas other than school improvement.  Psychometricians who are not involved in developing or marketing these tests have written extensively about their concerns when the tests are used for high-stakes teacher evaluation or for determining student placement.  We don’t do the latter in NY, as far as I know, though there have been some examples of placement consequences for kids–it’s a local issue.  But in other states, testing occasionally has greater impact–promotion or retention, acceptance in accelerated programs, or placement in remediation–based on results.  Often the use of results doesn’t match the psychometric properties of the tests, which is problematic.  And in New York, where the state says its own tests count for only 20 of 100 points in a teacher evaluation, the rubric for the 100 points actually creates situations in which the 20 points on the state test can override the other 80.  This means everything about New York’s intricate evaluation system can be blown away by a weak performance on the state’s 20% measure, meaning this 20% can, for some teachers, amount to the only measure of teacher quality that counts.

3.  Lack of transparency.  I hope this issue will go away with time.  It’s rather difficult to get information about the actual behavior of state tests–how did the results break out by student sub group?  By district demographics?  By years of teacher experience?  By class size?  By SES factors?  How did low income kids do in a district where they are 6% of the population compared to a big city, where they are a majority?

Moreover, everyone involved in rolling out new teacher assessments and the new academic goals of the Common Core has been overworked and hard-pressed to clarify what is coming.  As we heard repeatedly from the highest levels of New York State Education officials early on, these changes are like an airplane being constructed during flight.  Sadly, that is about the closest thing to transparency I can think of early in the process–there wasn’t much more to be said.  They didn’t know how to get from where they started to the intended goal.

The proper answer to such comments from school boards and the public was often something like this:  “Where is the FAA?  Who approved the takeoff of this untested aircraft in the first place?  There’s too much being done here with no piloting, no research, no shakedown flights.”  But this concern was swept under the rug.  New York adopted these changes to get $700 million of federal funds over 4 years, which was less than half a percent of per-pupil spending on public education in the state.

And this is not just a New York issue; it’s national.  We don’t usually do this kind of take-off in other arenas, but it happens too often in education.  Let’s try the next new idea that some expert says is the solution to our problems, while we ignore the fact that it’s an untested and unproven program or methodology.  Did we know the Common Core was a singular organizational initiative from an educational think tank hired by the Council of Chief State School Officers, which has, in effect, unilaterally influenced the direction of public education?  Would truly empowered state education officials around the nation have adopted something this massive and unproven if they were not required to do so by other political forces, or if substantial federal funding during an economic downturn hadn’t bribed them into agreement with Race to the Top conditions?  Would the public have found this a good use of funds if it were clear that there’s no evidence it would work?

4.  New teacher and principal evaluation systems.  A few districts in my region of New York that closely monitor the new evaluation regulations are concerned about how honors-class teachers fared–they have all the high-performing kids–and they think their special-ed teachers got higher growth scores on the state testing measures because a point or two of improvement at the lowest level of performance is easier to get than increases among students already at the top.  This is one of the general criticisms of accountability programs that use any version of growth or value-added scoring.  From state to state, have the data been made available to independent researchers for an objective review?  Do we have multivariate analyses available to look deeply at the results?  If not, why not, or when will this happen?  And if we do, so what?  Most states are about to change their tests again, from what they have been using to the new PARCC or Smarter Balanced consortia tests under development.  So the nature of available data is changing from last year, before the Common Core curriculum implementation, to Common Core-based testing, and then, in two years, to tests from one of the two consortia.  This means three versions of accountability tests in as few as 4 years in many states, making comparisons of student achievement a statistical challenge for accountability purposes, to say the least.
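The ceiling-effect complaint those districts raise is easy to see in miniature.  This is a toy illustration on a bounded 100-point scale, not any state’s actual growth model: when scores can’t exceed the scale maximum, a student near the top simply cannot show the same measured gain as a student in the middle, even if both learned just as much.

```python
# Toy sketch of a ceiling effect on a bounded score scale.
# This is NOT a real state growth model--just an illustration
# of why measured "growth" is harder to show near the top.
SCALE_MAX = 100

def measured_growth(pretest, true_gain):
    # Observed post-test score is capped at the scale maximum.
    posttest = min(pretest + true_gain, SCALE_MAX)
    return posttest - pretest

# The same true gain of 5 points for two students:
print(measured_growth(40, 5))   # 5 -- the full gain is visible
print(measured_growth(98, 5))   # 2 -- the gain is truncated at the ceiling
```

A teacher whose roster starts at 98 looks like a weaker “grower” than one whose roster starts at 40, which is exactly the honors-class concern described above.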

5.  Narrowing of the curriculum.  How many art, music, and drama classes are gone, replaced by supports for ELA and math?  Why does the edu/politico establishment ignore the evidence about the utility of arts and music education and their connection to math and science success?  How many kids have no more recess because they are scheduled for support time?  How many schools have dropped career education options in high schools?  How have the fiscal pressures of the past few years forced districts to make narrowing curricular decisions out of fear of poor test results?  Is the focus on ELA and math appropriate for all students?

6.  What is college and career ready, anyway?  In the Atlantic Monthly of October 2012, Dana Goldstein wrote an important feature called “The Schoolmaster” about David Coleman, credited by many as the leading creator of the Common Core State Standards.  A nonprofit Coleman founded, Student Achievement Partners, provided the intellectual basis for the Council of Chief State School Officers’ effort.  If Goldstein’s work is as accurate as it feels, Coleman is far more important than our Secretary of Education.  He has clearly had more influence on the direction of US education than any other single individual, likely in our history.  And he’s now the head of the College Board, where his concepts of college and career readiness could transform the nature of the SAT in the near future.

I happen to agree with the concept of the Common Core State Standards, and I think their emphasis on critical thinking skills is long overdue, but defining success with a narrowly drawn concept of college and career readiness misses some important options for a substantial population of students.  Many educators think this focus rather significantly neglects the notion of career ready–we have been eliminating career and tech education all over, or transforming it into expensive tech honors programs at regional BOCES, Intermediate School, or Educational Service Agency locations as career education begins to emphasize forensics and high tech.  So where will we get our plumbers and carpenters and cabinet makers and auto mechanics and house painters and landscape gardeners?

7.  Local control of education.  Here’s the philosophical issue of the day: Should states and local school districts have virtually given up their role in determining the direction of their children’s education?  Constitutionally, this is a state role (which all but one state, Hawaii, turn over largely to local school boards) and not a federal one.  Though the Common Core is an initiative of states working together, it was not an initiative that involved local professional educators from the beginning.  It was farmed out to Coleman’s nonprofit, pushed by the business/political wing of K-12 reformers, supported by those concerned about the generic failures in urban centers across the country, and then offered to the educational establishment throughout the nation as, fundamentally, a take-it-or-leave-federal-money-on-the-table proposition.  The claim that all the states were directly involved in these plans is technically correct, but practically speaking the results were top-down impositions, not bottom-up reforms.

8.  Lack of funds for professional development.  In New York, teacher and principal evaluation rules mandate that districts provide professional development for low-performing teachers before they can terminate them.  Given that schools have to upgrade technology infrastructure and hardware to prepare for online testing, given a new state property tax cap, and given that increasing numbers of districts around New York are facing real economic stress and even bankruptcy, there won’t be any money to provide PD for low-performing teachers.  Even a mediocre lawyer can prevent the termination of a teacher because the district’s required support for a weak teacher will be missing.  So who thought this was a good idea?

Many states claim their test results will be used to guide professional development programs to improve the work of teachers.  However, funding for educational professional development is declining nationwide.  Schools are spending the money on assessments, not on professional development, and the tests themselves, as noted earlier, are both secure and not designed for teacher use to promote instructional objectives.  So two forces disconnect assessments from teacher improvement: funding and inappropriate test design.

Nationally, the movement to tie teacher evaluations to test scores was fueled significantly by federal Race to the Top eligibility requirements.  This is another ‘reform’ initiative launched without a quality research base of support.  Proponents of testing, and those wanting to pry public money out of the hands of school boards so that it’s available to alternative commercial programs, have jumped on the novel idea that one can appropriately predict a student’s future success based on the scores of a teacher’s students.  The model is appealing, as it suggests an easy metric on which to judge the performance of schools and individual teachers.  But having followed the arguments closely for 10 years, I can say it isn’t working particularly well anywhere.

Today, many groups are finally responding to some or all of these concerns by pushing back.  Some of the parent arguments arise out of fear for the well-being of their children, some out of frustration at loss of local control, some out of growing awareness of rising assessment costs and little demonstrable efficacy in using tests for instructional improvement.  Some opponents are teachers who are fearful of adverse consequences to themselves.  Academics, politicians and the occasional state education official are more openly questioning the speed of implementation, the lack of piloting, the unintended consequences, the public relations disasters in states where Common Core based testing produces a drop in test scores (teachers are not yet trained in the new curriculum objectives), and the mismatch between test design and the use of test scores.

Does the push-back accomplish much?  To date, not really.  Does it make a statement that might force politicians to take a second look at the unintended consequences of parental opposition?  The consequences of testing?  A real cost-benefit analysis of CCSS and national testing?  I personally hope it does, but I don’t think it will amount to much unless it continues to grow.  Get up to speed, and get involved!

Value-Added Models – New Research

If you’re following educational reform, you are aware that some flavor of value-added statistical modeling is being pushed throughout the nation as a means to identify good/bad teachers or good/bad schools.  Pushed now by the US Department of Education and Race to the Top, and promoted by a specialized cadre of assessment developers who stand to rake in massive profits from the testing required to generate data to be used in value-added analyses, this movement has been sweeping across the country.  It has, on the other hand, always raised the suspicions of educational researchers, some economists, and numerous statisticians who have suggested that the models, and the assessments behind them, simply don’t work well as a basis to make high stakes decisions about educational policy, about schools and teachers, or about children.

In January, the Educational Policy Analysis Archives, a peer-reviewed online policy journal, ran several research articles in a special issue covering value-added, called “Value-Added: What America’s Policymakers Need to Know and Understand.”  The title is very misleading, since my reading of this research suggests a better title might be “Value-Added: What America’s Policymakers Consistently Ignore.”  I’m highlighting the articles here, with links, so readers can pick and choose what interests them.  Each article offers a substantial list of references for further exploration.

The opening discussion by the editors is “Value-Added Model (VAM) Research for Educational Policy: Framing the Issue.”  The editors provide background and relevance of six additional papers included in the special issue.

Diana Pullin, of Boston College, a lawyer and legal scholar, offers “Legal Issues in the Use of Student Test Scores and Value-Added Models (VAM) to Determine Educational Quality.”  Given that many states are implementing teacher evaluation models in which VAM measures will be used to dismiss teachers, Pullin analyzes the complexity of predictable court challenges as experts on both sides of the VAM movement offer conflicting testimony about the validity and reliability of tests and the statistics used for employment decisions.

Bruce Baker, Joseph Oluwole and Preston Green of Rutgers, Montclair State, and Penn State respectively, in “The Legal Consequences of Mandating High Stakes Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era,” review issues around the utility of student growth models and VA models.  Their lengthy appendix outlines the VAM evaluation policies of several states.

Nicole Kersting and Mei-Kuang Chen of U of Arizona, and James Stigler of UCLA, offer a statistically heavy analysis titled “Value-Added Teacher Estimates as Part of Teacher Evaluations: Exploring the Effects of Data and Model Specifications on the Stability of Teacher Value-Added Scores.”  They conclude there are several problems with stability and sample sizes which suggest a need for more work to improve the measures.  If you’re not statistically competent, read the narrative–the supporting details are a second language for many of us.

Elizabeth Graue, Katherine Delaney, and Anne Karch of the U of Wisconsin, Madison provide a qualitative research paper entitled “Ecologies of Education Quality.”  Their analysis suggests that VAMs can’t capture or control for all the variables of effective teaching.

“Sentinels Guarding the Grail: Value-Added Measurement and the Quest for Education Reform,” from Rachel Gabriel of the University of Connecticut and Jessica Nina Lester of Washington State, is another qualitative analysis.  This is an interesting review of how proponents of VA suggest the methods are scientific and accurate, while the cautions of educational researchers with significantly opposite points of view are ignored or dismissed.  This analysis of the effective public promotion of VAM, and of the glib solutions to educational problems that VAM supporters say will come through its adoption, highlights how public policy has been shaped by politics and the media.

Finally, Moshe Adler of Columbia University, in “Findings vs. Interpretation in ‘The Long-Term Impacts of Teachers’ by Chetty et al.,” offers a refutation of the claims of an often-cited study suggesting that the positive effects of a single year with a good teacher last a lifetime.  Since the Chetty et al. study is frequently cited as justification for VAM and teacher evaluation changes, this is an important commentary on what Adler says are the false claims of this seminal economic impact study.

Is School Reform Failing?

We are inundated every day with pundits talking about how our school reforms will improve our results, yet after years of more and more testing and more and more accountability based on public reporting of test scores, our schools appear to remain problematic. Two recent articles exemplify the ideas of researchers who offer insights on changing the direction of the reform movement.

David Berliner, in “Effects of Inequality and Poverty vs. Teachers and Schooling on America’s Youth,” provides a strong argument that economic inequality and poverty are the issues that must be addressed to see any wide-spread improvement in student achievement.  Go to www.tcrecord.org, look under Articles, and search for Berliner to find an executive summary.

His content derives from his own and other research on how socio-economic issues are the driving forces that work against student achievement. He provides analysis of American student test scores broken out by levels of poverty in schools, and makes a clear case showing that our scores, when comparing like levels of student poverty, rise to the top, or very near the top on the international tests that are regularly used to claim our schools are failing. He makes it clear that our social support systems have failed, and our international competitors have done a far better job of providing supports to families and children in need. So, our schools are not failing upper and middle class students, but our society is failing lower-income families and schools.

What I find encouraging in his work (and he has a long history of debunking the critics of American education, which you might want to examine separately) is his willingness to identify sloppy reasoning and utter nonsense found in the media and in the arguments of reformers with political agendas. He recognizes there are examples of “occasional” (the emphasis is Berliner’s) success stories from individual schools and families who rise from poverty, or the “occasional” teacher who breaks through with low-income children, but he points out that these are exceptions, not the norms. Americans want to believe in this idealized version of the American Dream, but as policy, it simply doesn’t work. He points out that while we have long-living overweight smokers, no one seriously suggests building public health policy around these exceptions. Only in education do reformers hold on to the unrealistic notion that finding exceptional teachers for low-performing schools will render poverty irrelevant, and that testing to find great teachers will cure educational ills.

Taking a different, more traditional view on reform, David L. Kirp, a public policy professor at the University of California, Berkeley, wrote “The Secret to Fixing Bad Schools” on February 10, 2013, in the New York Times. Kirp talks about the “striking achievement” of Union City, N.J. schools, an urban district that has done school reform successfully. They enroll almost every 3- and 4-year-old in prekindergarten, work on character, and focus on the individual needs of children, “figuring out what’s best for each child, rather than batch-processing them.” Strong principals encourage teachers to raise expectations, and teachers have responded. In Kirp’s words, “What makes Union City remarkable is, paradoxically, the absence of pizazz. It hasn’t followed the herd by closing ‘underperforming’ schools or giving the boot to hordes of teachers. No Teach for America recruits toil in its classrooms, and there are no charter schools.”

They started transforming their schools years ago when poor performance threatened them with a state take-over. The district designed an evidence based curriculum where “learning by doing replaced learning by rote.” Teachers were encouraged to work together and coaching and mentoring supported staff who struggled. Principals became education leaders.

Now folks from all over are visiting to see what Union City has done. Frankly, the research on what works in education has been rather clear for many years–Union City simply read it and implemented it in their school improvement work. So, here’s an interesting example of a school district that is one of the “occasional” examples of success that Berliner talks about.

Where do these two articles overlap? While at first they seem to reflect opposite approaches, I find them more complementary than not. These researchers are not in the test-to-find-and-fire-weak-teachers camp. They both want to see schools address root causes of academic failure with responsible strategies–Union City brings in high expectations, parent involvement, support for weak teachers, and a cultural shift that reflects a long-term strategy of continuous improvement. They both reject the current direction of school reformers who want to reinvent education with unproven quick fixes.

Kirp’s recent book is Improbable Scholars: The Rebirth of a Great American School System and a Strategy for America’s Schools, a more detailed look at the Union City schools and what they represent as a legitimate model for school reform. Google him–his vita lists lots of articles he’s written, many for the Huffington Post, on education policy issues.

Race to the Top – A “Race … in the Right Direction?”

New York schools, despite having one of the largest percentages of students in poverty, consistently rate highly in Quality Counts, from Education Week.  The state was a second round winner of $700 million in Race to the Top funds from the Feds, and aggressively rammed new legislation through the state legislature to qualify.  The unintended consequences are now being felt by those responsible for implementation.  Teacher morale is down, the costs of implementation are proving to be unfunded mandates several times greater than what was gained from ‘winning’ the Race, and a tax cap pushed through by the Governor is wiping out district capacity to raise taxes even where the public would be willing to pay more to maintain quality schools.  Moreover, many are estimating that 40% of New York districts face bankruptcy in the next few years.

Dr. Kenneth Mitchell, the Superintendent of the South Orangetown Central School District in Rockland County, NY, contributed a wonderful analysis of the unintended consequences of Race to the Top in New York, published by the Center for Research, Regional Education and Outreach at the State University of New York at New Paltz.  Open ‘Discussion Brief 8’ from this link to read the full article.  With the cooperation of 18 school districts in Rockland, Westchester and Putnam Counties, the three counties bordering New York City to the north, Dr. Mitchell produced an eye-opening analysis of the impact of RTTT in this relatively wealthy region.  These three counties are among the richest in New York, and the 18 districts are representative of the range of wealth found in the 60 districts in the region, though, I think, they average somewhat above the mean in community wealth because some of the larger, lower-income districts did not contribute data to the study.

Financially, Dr. Mitchell reports that the four-year return to the 18 districts from RTTT funds will be $520,415, while the expense to districts is estimated to be $6,472,166–a deficit of $5,951,751, an unfunded mandate of nearly $400 per pupil, to be funded by local taxpayers.  And because of the tax cap, which now requires a 60% supermajority to pass a budget above the 2% cap, very few districts will be able to fund the mandates without substantial cuts in programs somewhere.  The cost of implementing the state’s new Annual Professional Performance Review, or APPR (teacher and principal evaluation), by itself represents a 3% increase in local costs to pay for the required testing and evaluation training.  That figure doesn’t even include the costs of redesigning instruction to move to the Common Core curriculum, which requires new texts, instructional materials, shifts of content from grade to grade, and additional teacher professional development.
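The brief’s arithmetic is easy to verify.  The dollar figures below come straight from the text; the implied pupil count is my own back-of-envelope inference from the “nearly $400 per pupil” figure, not a number from the brief:

```python
# Checking the brief's arithmetic.  Dollar figures are from the text;
# the implied pupil count is an inference, not a figure from the brief.
rttt_funds = 520_415       # four-year RTTT return to the 18 districts
mandate_cost = 6_472_166   # estimated four-year cost of the mandates

deficit = mandate_cost - rttt_funds
print(deficit)                     # 5951751 -- matches the brief

# "nearly $400 per pupil" implies roughly this many pupils:
print(round(deficit / 400))        # 14879
```

In other words, for every RTTT dollar received, these districts spend roughly twelve on the mandates that came with it.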

The effects, as gleaned from the 18 districts’ responses to the program, are disturbing.  Despite Race to the Top’s promotion as a school improvement program, districts will see staff cuts and larger class sizes, and non-mandated programs will be cut.  Districts are cutting maintenance despite this being a wonderful opportunity to get cost-effective pricing during an otherwise slow economy.  Priorities are shifting away from instructional services that districts have developed over years of successful local control in order to fund the external mandates.  Internal professional development, and the staff to provide it, is being cut, and training is focusing on teacher evaluation requirements, not on classroom instruction.  Some districts will be required to hire additional supervisory staff to complete the evaluation mandates, which require more time than existing programs.  Curriculum is narrowing as districts prepare for extensive testing that will be used for teacher accountability.  And finally, quoting directly from the brief: “…the hidden costs may be greater than the outlay in dollars.  Teachers and administrators, stressed by the rapid change, the demand for accountability via the new testing and observation requirements, and anxieties about receiving low scores, are very likely to abandon initiatives that may be innovative and beneficial for preparing the next generation, but are out of alignment with a narrowed professional agenda for staying within the ‘Effective’ range on the APPR.”

Dr. Mitchell proposes some reasonable shifts that New York politicians could make to back off and redirect this initiative.  He also identifies several sophisticated research groups that question the use of tests for teacher evaluation as Race to the Top demanded, and that question the massive national shift to the Common Core curriculum.  Both shifts are untested and lack a research base to justify the depth of the changes.  And since the changes were imposed by the legislature in New York, and by legislatures racing to get federal funds across the nation during the economic slowdown, the changes are out of the hands of state education departments even if those leaders were inclined to back off in the first place.  Since they were generally the designers of the system, they are unlikely to be responsive–it’s full steam ahead, into the chaos that will ensue.

Download the article pdf at the link above, and see the research citations in the “Works Cited” link as well.