Validity, Assessments, and Teacher Evaluation

Validity is being ignored by politicians and many reformers

Opposition continues to grow, both to the Common Core Standards and to the massive testing regime that started with NCLB and carries on through Race to the Top and the Common Core.  Teacher unions have garnered much controversy with their opposition to using the tests for teacher evaluation, and many reformers discount union opposition as self-serving.  A deeper look at testing and teacher accountability is merited here, and a discussion of validity is the right way to look at this issue.  Unfortunately, validity is a characteristic of assessments that is little understood, if at all, by the politicians backing assessments for teacher accountability, and it is carefully avoided by those who can profit by discrediting public education.

Validity Background

The March 12 back page commentary in EdWeek, by Madhabi Chatterji, is entitled “Validity Counts: Let’s Mend, Not End, Educational Testing.”  The anti-testing movement is finally growing, though many who oppose testing do so with little knowledge of the root problems with tests as current political movements have forced them upon schools and parents.  Chatterji, a professor of measurement, evaluation and education at Teachers College, makes the point that validity, which “deals with the meaningfulness of test scores and reports,” is important and relates strongly to the design features and purpose of a test.  The standardized tests we now have under NCLB were designed to measure, in some cases, the individual performance of students, and in others, the large group performance of students.

What kills validity is a mismatch between what the test designers want to measure, what is taught in classrooms, and ultimately how the results will be used.  Generally, the match is weak at best when tests are new.  It improves when the results are going to be used to judge teachers, because teachers then pay close attention to the test design and, quite naturally, teach to the test.  This distorts the purpose of the test and invalidates the results, an unintended but predictable and observable consequence.  Where do we all think that SAT and ACT prep tutoring programs come from?  We did not teach to the California Achievement Tests or the Iowas, because they were generally achievement tests, not accountability tests.  Teachers are motivated to teach to NCLB tests, and will be motivated to teach to CCS tests as well, so long as those tests are used judgmentally, for building or teacher evaluation.

Current tests were not designed for teacher evaluation

None of our current tests were designed to measure teacher performance.  They are too short and too narrow in focus to do anything like measuring the qualities and traits that make a good teacher.  So, if one were to follow the Standards for Educational and Psychological Testing from the American Educational Research Association, the American Psychological Association and the National Council on Measurement in Education, these tests, as currently written, would not be a part of teacher evaluation.  Politicians and educational profiteers don’t pay attention to professional testing standards.

Construct Validity

In general, psychometricians suggest that construct validity is the basis on which all validity issues are built.  It is “the degree to which a test measures what it claims to be measuring…”  Testing companies are profiting greatly by developing tests to measure the Common Core Standards.  But the Common Core is not a curriculum, and that greatly complicates the work of an assessment designer, who must construct an assessment that measures an outline of concepts rather than content.  And, given that every state is allowed to implement the CCSS individually, differences in how teachers are prepared to teach and in what curriculum they use add to the inability of assessment writers to construct a valid test.  What’s happening is that two large consortia are writing tests for curriculum that doesn’t yet exist, and the Feds and others are suggesting that the tests can be used to identify students with problems and teachers who are ineffective.  Nonsense!  What will happen is more likely this: As folks experience the PARCC and Smarter Balanced tests, the curricula used in schools will adapt to the content of the tests, and the tests will push the evolution of the Common Core.

Let me offer an analogy, exaggerated to make the point easier to understand, of why these tests are too important, too soon.  The US Department of Energy decides to build a cold fusion plant to produce cheap power (the CCSS), so they mandate that all the physicists in America will be graded by how well they can build a cold fusion plant (produce the plant or be called ineffective).  But they don’t have plans for the plant or a model of cold fusion that has been tested and proven workable.  They think accountability will motivate people to produce the desired results.  Sane folks won’t do that, of course, but education critics are pushing this method for ‘reform.’

Common Core is too new to be a basis for teacher or student evaluation

What we have is a movement that declares, I think rightly, that our standards should be higher and more uniform across the states, and that we need to move toward those higher standards.  Where this goes wrong is in the belief that testing students on the higher standards, now or in the near future, provides a valid basis for making judgments about students or teachers.  We are working backwards, as the cold fusion example suggests.  What the current round of testing will tell us for the next several years is how closely the taught curriculum matches the decisions of the test designers.  Without a national curriculum, that’s all the tests can tell us, and that’s useful only to tell us if teachers are teaching what the test measures.  That’s backwards, and invalid.

Rollout of new tests is a mismatch to available curricula

An even greater problem in this massive testing movement is the mismatch between the curriculum available to teachers and the tests that are being given to students.  States have begun widespread implementation of Common Core based tests before the Common Core itself has been widely implemented.  And even in states that have been working toward a Common Core curriculum for a few years, the grade level changes in content have created gaps in preparation: students have not had the harder prerequisite content in prior grades on which to build knowledge.  In simple terms, we are imposing tests that don’t match the content of what kids have been taught.  Experts have suggested that it will take 5 or 6 years for teachers, curricula, and students to make the transition to the new grade level content and fill in the sequential grade level gaps.  Material formerly covered in grade 8 is now found in grade 6, for example, and as a result, students being tested in grade 6 could look like they are two years behind.  That states naively think even the best teachers can enable a grade 6 student who has not learned the prerequisites in earlier grades to score well on the new grade 6 content demonstrates a misunderstanding of testing and measurement.  This catch-up takes real time, and, by any responsible psychometric standard, test results are invalid until curriculum and teachers catch up.  Using results this invalid to evaluate teachers and to identify students in need of special services or remediation cannot be justified.

Bravo to those who are standing against this foolishness.  Teacher unions are correct to complain and oppose this use of testing, whether they understand validity or not.

More resources on validity

In addition to picking up a basic education research textbook and reviewing the chapter(s) on validity, you might also check out the special edition of the Teachers College Record, volume 113, September 2013, where several authors discuss assessment.  One article in particular is by Eva Baker, of UCLA.  Dr. Baker is Co-director of the Center for Research on Evaluation, Standards, and Student Testing (CRESST), a wonderful source of information about assessments.  “The Chimera of Validity” is a good read if you can access it; otherwise, the article can be purchased for $12.
