A Focus on Value-Added Measures
An Introduction to the Carnegie Knowledge Network
The field of education is awash in data. There are numbers on academic proficiency, numbers on enrollment, numbers on graduation, and more. But until recently, no statistic measured what experts have come to agree is the single most important factor in boosting student achievement – the effectiveness of the person standing in front of the classroom. A great teacher, research shows, can add substantially to a student’s learning gains, and an especially poor one can actually reverse those gains. Spurred on by these facts, by public pressure, and by the incentives offered by federally funded programs, states and districts are developing ways to measure the value that a teacher adds to her students’ learning based on changes in their annual test scores. Compared to existing evaluation tools, value-added methodology offers the advantage of providing an objective measure of the progress students make from year to year rather than measuring solely their achievement level.
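To make the approach concrete, the sketch below shows, in Python, one simple member of the value-added family: a covariate-adjustment model that predicts each student’s score from the prior year’s score and credits (or debits) teachers with their students’ average departure from that prediction. The data, model form, and magnitudes here are all hypothetical, and operational state and district models are considerably more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: prior- and current-year test scores for students
# nested in classrooms (all names and magnitudes are invented).
n_teachers, n_students = 20, 25
teacher = np.repeat(np.arange(n_teachers), n_students)
prior = rng.normal(size=teacher.size)
true_effect = rng.normal(scale=0.3, size=n_teachers)
current = 0.7 * prior + true_effect[teacher] + rng.normal(scale=0.8, size=teacher.size)

# Step 1: regress current scores on prior scores to get each student's
# expected trajectory, regardless of classroom.
X = np.column_stack([np.ones_like(prior), prior])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
residual = current - X @ beta

# Step 2: a teacher's value-added estimate is the average residual of
# her students, i.e., how far they landed above or below expectation.
value_added = np.array([residual[teacher == t].mean() for t in range(n_teachers)])
print(np.round(value_added, 2))
```

The key design point is visible in step 1: the model measures growth relative to each student’s own starting point rather than the absolute level of achievement, which is what distinguishes value added from proficiency-based measures.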
But as with any system of ranking or scoring, value-added measures are far from perfect; they are subject to bias, distortion and error. Thus, many teachers, even as they embrace the idea of accountability based on more meaningful evaluations, have vocally opposed systems that judge them by value-added measures. In particular, they argue that there is no way to separate true teaching effectiveness from the many factors affecting achievement that lie outside of an educator’s control. Computing value-added estimates is indeed a complex task. And the consequences of those estimates can be huge. If we are going to use value-added measures to make judgments about teacher quality, we owe it to these teachers, to their students, and to the public to do all we can to ensure that the means of computing them are accurate, reliable and fair.
It is, as the following pages will show, a tall order.
When educators are given value-added estimates, the first question they face is how to interpret them. In Knowledge Brief 1, Stephen W. Raudenbush and Marshall Jean give educators some guidance on this essential topic. The key, they say, is to understand that as with a weather forecast or a medical diagnosis, these estimates are just that – estimates. They are subject to uncertainty, caused by bias and imprecision. It is essential to determine the cause and extent of this uncertainty, the authors say; only then can we judge the extent to which value added can inform different types of decisions.
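A toy calculation shows the imprecision half of that uncertainty. With a single class of 25 students, the standard error of the classroom-average residual gain is large enough that a point estimate alone overstates what we know. Note, too, that the interval below captures only statistical noise, not bias, which by construction never shows up in the data itself. All figures are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical residual gains (actual minus predicted score) for one
# teacher's 25 students.
residuals = rng.normal(loc=0.15, scale=0.8, size=25)

estimate = residuals.mean()
std_error = residuals.std(ddof=1) / np.sqrt(residuals.size)

# An approximate 95% interval: with only 25 students it is wide, often
# wide enough to leave even the sign of the estimate in doubt.
low, high = estimate - 1.96 * std_error, estimate + 1.96 * std_error
print(f"value-added estimate = {estimate:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```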
Another fundamental question is how well value-added models can level the playing field. How successfully can they isolate the contributions of individual teachers from student background characteristics like poverty or English language proficiency? In Knowledge Brief 2, Daniel F. McCaffrey analyzes the research on this crucial subject and finds that while value-added models do partially level the field, they can’t fully adjust for all the factors that lie outside a teacher’s influence and that differ among classrooms. What if a teacher is consistently assigned classes of students who are advanced or who are struggling? If we do not fully account for how students are assigned to classes, we distort the picture of teacher effectiveness.
In Knowledge Brief 3, Susanna Loeb and Christopher Candelaria take on the matter of consistency – or lack of it – in value-added scores. Teachers’ value-added estimates can vary widely from year to year, raising important concerns about their reliability and strongly suggesting that several years of data should be used. The researchers find considerable overlap in measures of a teacher’s value added across years and contexts, but they see substantial inconsistencies as well. Although further study is needed, the research suggests that much of the inconsistency results from true differences in educator effectiveness over time and across contexts.
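A small simulation illustrates why pooling years helps, under the deliberately simplified assumption, contrary to the brief’s finding, that all year-to-year variation is noise around a stable true effect. Even in that best case for averaging, one year’s estimate correlates only modestly with the next, while a three-year average tracks the stable effect much more closely. All parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: 500 teachers with a stable "true" effect, observed
# each year with classroom-level noise of comparable magnitude.
true_effect = rng.normal(scale=1.0, size=500)
yearly = true_effect[:, None] + rng.normal(scale=1.0, size=(500, 3))

# Any single year correlates only moderately with the next...
r_single = np.corrcoef(yearly[:, 0], yearly[:, 1])[0, 1]

# ...but the three-year average tracks the stable effect far better.
r_avg = np.corrcoef(yearly.mean(axis=1), true_effect)[0, 1]
print(f"year-to-year r = {r_single:.2f}; 3-year average vs. truth r = {r_avg:.2f}")
```

Of course, if part of the year-to-year movement reflects genuine changes in effectiveness, as the research suggests, averaging suppresses noise but also blurs those real changes, a trade-off the simulation above ignores.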
Although many states and districts agree that student achievement growth should figure in teacher evaluation, there is no universally accepted method for translating that growth into a measure of teacher performance. In Knowledge Brief 4, Dan Goldhaber and Roddy Theobald look at several models for doing so. Research shows that value-added models that do not account for student background factors (such as race, ethnicity and social class) correlate highly with models that do. Yet even when correlations are high, different models will categorize many teachers differently. The biggest differences between value-added estimates arise between models that ignore possible school effects and models that explicitly account for them.
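The coexistence of high correlation with meaningful reclassification is easy to reproduce in a toy simulation. Here two models, one ignoring a student background factor and one adjusting for it (everything below is hypothetical), produce estimates that correlate strongly yet still place a noticeable share of teachers in different quintiles.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: scores depend on prior achievement, a background
# factor that clusters by classroom, and the teacher herself.
n_teachers, n_students = 100, 25
teacher = np.repeat(np.arange(n_teachers), n_students)
prior = rng.normal(size=teacher.size)
background = rng.normal(size=teacher.size) + 0.3 * rng.normal(size=n_teachers)[teacher]
effect = rng.normal(scale=0.3, size=n_teachers)
score = (0.7 * prior + 0.4 * background + effect[teacher]
         + rng.normal(scale=0.8, size=teacher.size))

def value_added(X):
    """Average classroom residual after regressing scores on X."""
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    resid = score - X @ beta
    return np.array([resid[teacher == t].mean() for t in range(n_teachers)])

ones = np.ones_like(prior)
va_simple = value_added(np.column_stack([ones, prior]))                # ignores background
va_adjusted = value_added(np.column_stack([ones, prior, background]))  # controls for it

r = np.corrcoef(va_simple, va_adjusted)[0, 1]
# High correlation overall, yet individual teachers can still land in
# different quintiles under the two specifications.
q1 = np.digitize(va_simple, np.quantile(va_simple, [0.2, 0.4, 0.6, 0.8]))
q2 = np.digitize(va_adjusted, np.quantile(va_adjusted, [0.2, 0.4, 0.6, 0.8]))
print(f"correlation = {r:.2f}; teachers changing quintile: {(q1 != q2).mean():.0%}")
```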
Finally, how do value-added measures compare to other gauges of teacher effectiveness? In Knowledge Brief 5, Douglas N. Harris addresses the question of whether other measures, such as classroom observation, are likely to arrive at the same result. He finds that value-added measures are positively, but not strongly, correlated with other evaluation measures. The measures should be expected to yield some differences if they are weighing different aspects of teaching, but they also differ because each has its own problems with reliability and validity.
Value added, in short, is a sharp tool: some districts are already using it to make decisions about hiring, firing and promotion. And it is being wielded despite many unanswered questions. These questions can’t be answered solely with existing data, or without the exercise of sound judgment. In the following pages, we address the most important issues surrounding value-added measurement today. We review the existing research, draw conclusions, and try to provide educational leaders with guidance on how research can inform the prudent use of this important measure.