What Do We Know About the Long-term Impacts of Teacher Value-Added?
Committee on Education
University of Chicago
Stephen W. Raudenbush
- Two recent studies provide evidence that attending the class of a high-value-added teacher predicts higher-than-expected educational attainment, earnings, and other adult outcomes.
- In one study, part of the impact of attending an effective classroom may have been attributable to small class size; in the other, part of the effect may be attributable to the effectiveness of the school.
- Teacher value-added scores “fade out” over time: knowing that a student had a teacher with a high value-added score one year provides little information about how well that student will fare on achievement tests several years later.
- The studies provide important new evidence on the significance of early classroom experience to later success.
Proposals to evaluate teachers based on their “value-added” to student test scores generate intense debate. Underlying the debate are concerns about three factors: bias, precision, and relevance. Previous Carnegie Foundation briefs have detailed the reasons why the first two are significant concerns. But even if value-added scores were unbiased and reasonably precise, their usefulness for evaluating teaching would still depend on the third factor: their relevance to the aims of schooling. After all, helping children do well on an achievement test is of little value in itself. The question is whether a test score gain in a given year of schooling represents growth in skills that matter over the long term.
My aim in the current brief is to consider a key aspect of the relevance of value-added scores: their predictive validity—whether teachers who produce high value-added on achievement tests also engender lasting cognitive and non-cognitive skills that help prepare their students for success in later life.
One measure of the predictive validity of value-added is the extent to which it persists or “fades out” over time. At issue is whether elevated value-added scores displayed during the initial year persist in subsequent years. I review 10 studies of the persistence of value-added scores. In each case, researchers compute value-added scores in an “initial year” and then in subsequent years. All studies show that value-added scores tend to fade out over time. Five years after the initial year, it appears that 75 percent to 100 percent of the initial impact has disappeared.
One might infer from these results that attending the class of a teacher with high value-added in a given year has little consequence beyond that year. However, two careful, large-scale studies, reviewed in detail below, suggest that despite the lack of persistence of value-added on future test scores, one year of experience with a high-value-added teacher predicts higher rates of college attendance and adult earnings, as well as other important outcomes. While the effects are not large for individual students, they become substantial when they are aggregated over the students a teacher encounters. Moreover, the cumulative effects of a sequence of effective teachers may be substantial for an individual student.
The seeming contradiction between the lack of persistence of value-added on achievement test scores and the significant impacts on adult outcomes frames an important puzzle for future research. One possible explanation is that teachers who produce comparatively high gains on test scores are also effective in producing gains in other skills—deemed “non-cognitive” skills—that matter in the labor market and in other aspects of adult life. And one study suggests that high-value-added teachers are indeed comparatively effective in promoting high levels of effort, initiative, and classroom participation among their students. Another possibility is that teachers who produce high gains on test scores also produce high gains on deeper cognitive skills, such as reasoning and problem-solving, that current tests may not fully capture but that pay off in the labor market.
Skeptics may argue that value-added scores in the initial year and estimates of later impacts share a bias. For example, it might be that high-value-added teachers work in particularly effective schools, and that students who attend these schools for sustained periods see not only high initial test scores but also favorable long-term effects. Despite efforts by researchers to identify and remove such biases, they cannot be entirely discounted. More specifically, Chetty et al. (2013) attribute students’ labor market gains to the value-added of individual teachers, despite the fact that some of these gains may be attributable to attending an effective school. And teacher effects estimated by Chetty et al. (2011) appear to include the impact of reduced class size in addition to the impact of individual teacher skill.
The subsequent sections of this brief consider in more detail (a) the size of the initial value-added effects; (b) the persistence of initial value-added; (c) reported impacts on adult outcomes; (d) potential explanations for these findings and suggestions for further research; and (e) implications for school practice.
Magnitude of Initial Value-Added Effects
To calibrate the importance of teacher value-added for students’ future outcomes, we need to think a bit about how much teachers vary in the value-added score for a single year. If these initial scores vary a great deal, they might also strongly predict later outcomes. But if they vary only a little, one would expect them to be of little use in predicting future outcomes. The consensus seems to be that attending the class of a teacher who is one standard deviation above her peers in value-added is associated with a gain in achievement of 10 to 15 percent of a standard deviation in student achievement.
To make this meaningful, consider a comparatively “good” teacher—one who is in the 70th percentile of the teacher distribution in value-added. A teacher one standard deviation below such a teacher is at about the 30th percentile. Studies so far tell us that we can expect the “better” teacher’s students to score about 6 percentile points higher, on average, on a standardized achievement test than the students of the “worse” teacher. This is a difference between being at the 53rd percentile and the 47th percentile. While this impact may seem small, it would be quite significant when aggregated over all the students in a class, if it persisted and laid the basis for lasting differences in socially valued outcomes. Educational interventions that can produce an effect this large in one year are often regarded as quite successful.
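The percentile arithmetic above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes that teacher value-added and student test scores are both approximately normally distributed (an assumption made here, not a claim drawn from the studies), and it takes 0.15 standard deviations, the upper end of the range cited above, as the student-level effect. The function and variable names are hypothetical.

```python
from statistics import NormalDist

# Assumption: value-added and test scores are roughly standard normal.
STD_NORMAL = NormalDist()


def percentile(z):
    """Percentile rank (0-100) of a z-score under a standard normal curve."""
    return 100 * STD_NORMAL.cdf(z)


def z_from_percentile(p):
    """z-score corresponding to a percentile rank (0-100)."""
    return STD_NORMAL.inv_cdf(p / 100)


# A "good" teacher at the 70th percentile, and a peer one SD below.
z_good = z_from_percentile(70)
worse_teacher_pct = percentile(z_good - 1.0)

# A 0.15 SD gap between the two teachers' students, centred on the median.
effect_sd = 0.15
upper = percentile(effect_sd / 2)
lower = percentile(-effect_sd / 2)

print(round(worse_teacher_pct, 1), round(upper, 1), round(lower, 1))
```

Under these assumptions the comparison teacher falls near the 32nd percentile, which the text rounds to about the 30th, and the two groups of students sit near the 53rd and 47th percentiles, a gap of roughly 6 percentile points.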
Persistence of Value-Added
I found 10 studies of the persistence of initial value-added over subsequent years, and these are listed in Table 1. These studies are based on computation of a teacher’s value-added to the scores on tests taken one, two, or more years after a student has encountered that teacher. Looking at the table, consider again two teachers who differ by one standard deviation on value-added in the initial year. How much of this initial difference remains one year after? The lowest estimates suggest that only about 18-25 percent of the initial difference persists. The most optimistic estimates are that 50 percent does. And the median is 24 percent, meaning that about a quarter of the initial value-added remains after one year. Fewer studies follow students for more than one year, but those that do suggest that initial differences continue to fade, albeit perhaps at a slower rate. Two studies follow students for more than 3 years after the initial year, and these suggest that 25 percent or less of the initial difference persists. The consensus seems to be that there is a substantial decay over time in value-added to future achievement test scores.
Table 1: Persistence of Value-Added After Initial Year as Fraction of Value-Added During the Initial Year
|Study||Sample||Yr. 1||Yr. 2||Yr. 3||Yr. > 3|
|Kinsler (2012)||N=689,641 students, grades 3-5, 1998-2005, in North Carolina||.24 (math)|
|Master, Loeb, and Wyckoff (2014)||N=700,000 students, grades 3-8, 2005-2226 in New York City||.19 (math), .21 (language arts)|
|McCaffrey et al. (2004)||N=678, grades 3-5, large suburban district||.25||.15||–||–|
|Lockwood et al.||N=10,000, grades 1-5, large urban district||.18||.15||.14||.12|
|Kane and Staiger (2008)||97 pairs of teachers, grades 2-5, randomization of students to teachers within pairs||.50|
|Jacob, Lefgren, and Sims (2010)||N=18,240, grades 4-15, mid-size Western district||.20|
|Rothstein (2010)||N=99,071, grades 3-5, North Carolina statewide||.27 (math)|
|Measures of Effective Teaching (2012)||1,811 teachers randomized within schools to student rosters, grades 4-8 in 6 school districts||.45|
|Chetty et al. (2011)||10,992 students randomized to classes within 79 schools in Tennessee||0|
|Chetty et al. (2013)||2.5 million children grades 3-8 in New York||.50||.40||.20||.20|
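To put the fractions in Table 1 in concrete terms, the sketch below applies one row of the table to an initial effect and converts the result to percentile points. It is a rough illustration under stated assumptions: the initial effect is set to 0.15 standard deviations (the upper end of the range cited earlier), test scores are treated as approximately normal, and the function names are hypothetical. Only the persistence fractions come from the table.

```python
from statistics import NormalDist

# Assumption: initial one-year effect of a one-SD-better teacher, in student SDs.
INITIAL_EFFECT_SD = 0.15

# Persistence fractions copied from the Chetty et al. (2013) row of Table 1.
PERSISTENCE = {"Yr. 1": 0.50, "Yr. 2": 0.40, "Yr. 3": 0.20, "Yr. > 3": 0.20}


def remaining_effect(initial_sd, fraction):
    """Effect (in SDs) still visible after fade-out to the given fraction."""
    return initial_sd * fraction


def percentile_gap(effect_sd):
    """Percentile-point gap for an effect centred on the median (normal scores)."""
    nd = NormalDist()
    return 100 * (nd.cdf(effect_sd / 2) - nd.cdf(-effect_sd / 2))


remaining = {
    year: remaining_effect(INITIAL_EFFECT_SD, fraction)
    for year, fraction in PERSISTENCE.items()
}

for year, effect in remaining.items():
    print(year, round(effect, 3), "SD,", round(percentile_gap(effect), 1), "points")
```

With these inputs, an initial gap of about 6 percentile points shrinks to a little over 1 percentile point once only 20 percent of the initial value-added remains.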
The researchers cited in Table 1 have offered several explanations for the apparent fade-out of value-added scores. One is that high value-added scores in the initial year reflect teacher efforts to “teach to the test” rather than to produce meaningful skills. While plausible in light of the results in Table 1, this explanation would suggest that exposure to a teacher who produces high value-added would not improve a range of long-term outcomes. It seems implausible that the teachers who are best at teaching to the test are also best at fostering more general skills. Testing this explanation requires us to ask whether exposure to a high-value-added teacher has long-term benefits, and that is the next topic of this brief.
Second, it may be that teachers in the initial year produce lasting gains on the content that year’s test measures but that tests in subsequent years measure different skills. For example, the initial test might measure a child’s ability to add double-digit numbers, while the later test might assess the child’s ability to multiply and divide fractions. Knowing how to add may only modestly predict knowing how to manipulate fractions. This explanation would seem, however, to imply that we would see less persistence in math scores than in reading scores, because the skills that mathematics tests assess are believed to change more rapidly from grade to grade than the skills required for reading. Yet we see low persistence of value-added scores in reading as well as math.
A third explanation is that subsequent teachers may be constrained in their ability to capitalize on what earlier teachers have taught. This explanation would be plausible if a teacher’s class were composed of a small subset of students who gained considerable skill during the prior year and a larger subset of students whose previous gains were modest. The teacher of such a class might be inclined to pitch the instructional level to the larger set of lower-achieving students, preventing those who had benefited from prior good teaching from advancing quickly. This explanation seems to predict that high value-added in the initial year would not predict favorable long-term outcomes.
Long-Term Impact of Value-Added
Two of the 10 studies listed in Table 1 followed students from their elementary school classrooms into adulthood, obtaining data on long-term outcomes, including college attendance and the quality of the college attended around age 20, earnings at age 28, the quality of the neighborhood of residence during adulthood, and teen parenthood. The two studies differ in their design, but taken together, they suggest that early classroom experience influences long-term outcomes and that teacher skill is a key source of these impacts.
The first of these studies is based on a pioneering experiment in Tennessee that tested the impact of class size on student learning. In this study, which covered a period in the 1980s, students in 79 schools were assigned at random to kindergarten and first- and second-grade classrooms that were designated as large or small. Teachers were also assigned to classrooms at random. Remarkably, a team of researchers was able to obtain extensive administrative data on the long-term outcomes of this experiment. The random assignment of students to classrooms (within schools) provided a solid basis for assessing the long-term impacts of classroom membership. These are reported in Column 1 of Table 2 below.
Importantly for our purposes, these impacts are not necessarily the impact of teacher effectiveness alone. A classroom might be effective because of its small class size or because of random variation in peer composition. However ill-defined the differences, the beauty of this study is that researchers were able to characterize the distribution of classroom impacts. Column 1 indicates that two classrooms that differed by one standard deviation in value-added produced a student learning difference of nearly one-third of a standard deviation. In practical terms, this is a difference of about 9 percentile points. More remarkably, the results imply that students attending the more effective classroom will have earned, on average, about $1,520 per year more as young adults than will students who had attended the less effective classroom. The authors were able to establish that only a part of this effect can be explained by class size or peer composition. Moreover, because random assignment of students to teachers occurred within schools, this effect cannot be attributed to the overall effectiveness of the school. It is instead attributable to the impact of the classroom assignment.
The results in Column 1 suggest that the classroom to which students were assigned had an important influence on later outcomes. But they do not suggest that early gains on test scores help us explain those classroom effects. The next question the authors asked was whether classes that specifically boosted test scores were those that also improved long-term outcomes. The answer was that they did. Attending a classroom that increased test scores by one standard deviation (about 8.8 percentile points) during the initial year of the experiment was reported to increase college attendance, the quality of the college attended (as indicated by the mean adult earnings of its graduates), and earnings. The impact on college attendance was small (just over a quarter of one percentage point in a sample of whom 45.5 percent attended college) as was the impact on college quality. However, the earnings impact of $1,619 per year is large when aggregated over all of the students in the classroom.
Three features of this study are notable. First, because it is based on random assignment of students, it is free of key sources of bias (see Note 1). Second, it does not establish the predictive validity of teacher value-added scores per se, but rather, the more global effect of attending a classroom that worked well for a variety of possible reasons, including its size. But the study does suggest that classrooms that produce test score gains also produce valuable long-term outcomes. Third, it compares classrooms within the same school. In contrast, value-added scores are typically computed for teachers who work in different schools—comparisons that pose special challenges to the validity of conclusions drawn.
The second study of long-term outcomes directly addressed the question of teacher value-added by comparing teachers who work in different schools. It used a sample of 2.5 million children attending New York City schools in grades 3-8, and it used administrative records to follow them into adulthood. This study was not based on random assignment of children to classrooms. However, the authors took care to identify and control for the sources of bias that might arise when it is not possible to conduct a randomized experiment.
Table 2: Impacts of Value-Added on Adult Outcomes
|Outcome||Impact of classroom quality overall (Chetty et al. 2011)||Impact of classroom value-added (Chetty et al. 2011)||Impact of teacher value-added (Chetty et al. 2013)|
|Initial test scores||8.8 percentiles (.32 sd)||–||–|
|College attendance||–||0.28% above mean of 45.5%||0.82% above mean of 37.22%|
|College quality index||–||0.06 sd||0.02 sd|
|Earnings||$1,520 (8.8% above mean)||$1,619 (11.1% above mean)||$350 (1.65% above mean)|
|Teen parenthood||–||–||0.61% below mean of 14.3%|
|Other outcomes||–||–||Increases in neighborhood quality, saving with 401(k)|
The results of this study (Column 3 of Table 2) in some ways mirror those of the Tennessee experiment (Columns 1 and 2). We see small but statistically significant effects of teacher value-added on college attendance and college quality. We see a statistically significant but much smaller impact on earnings (having a teacher with one standard deviation higher value-added predicted earning $350 per year more than expected at age 28). The authors also reported a reduction in teenage parenthood and increases in neighborhood socioeconomic status (as measured by the fraction of neighbors with a college education) and savings (as indicated by having a 401k retirement account). These, too, were small but statistically significant effects.
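The point that small per-student earnings effects become substantial when summed over a class can be made concrete with simple arithmetic. The sketch below is illustrative only: the class size of 25 is a hypothetical value chosen for the example, the calculation covers a single year of adult earnings with no discounting, and the function name is hypothetical. The per-student figures of $350 and $1,619 are the ones reported in Table 2.

```python
# Assumption: a hypothetical class of 25 students; not a figure from either study.
HYPOTHETICAL_CLASS_SIZE = 25


def aggregate_gain(per_student_gain, class_size, years=1):
    """Total earnings gain across a class for the given number of earning years.

    No discounting or earnings growth is applied; this is a back-of-envelope sum.
    """
    return per_student_gain * class_size * years


# $350 per student per year: teacher value-added estimate (Column 3 of Table 2).
teacher_total = aggregate_gain(350, HYPOTHETICAL_CLASS_SIZE)

# $1,619 per student per year: classroom value-added estimate (Column 2 of Table 2).
classroom_total = aggregate_gain(1619, HYPOTHETICAL_CLASS_SIZE)

print(teacher_total, classroom_total)
```

For a class of this assumed size, the one-year totals come to $8,750 and $40,475 respectively, which is why effects that look small for an individual student can matter at the level of a classroom.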
The authors devised an ingenious test of the validity of these findings. They asked whether a cohort of children in a particular school in a particular grade achieved less the year after a high-value-added teacher left than did the previous cohort of students, and, conversely, whether children gained more the year after a high-value-added teacher joined the staff. Their findings essentially replicated the results shown in Table 2 (Column 3) for college attendance and college quality; however, the estimates for earnings were too imprecise to support conclusions. A key assumption in this analysis is that the movement in and out of schools by effective (or ineffective) teachers is a cause of subsequent student achievement more than a result of past school effectiveness. This research strategy is ingenious, yet we cannot rule out the possibility that at least part of the impact that the authors ascribe to teachers is actually generated by an effective school. I say this because the authors did not control for the school a child attended when they assessed the association between teacher value-added and long-term outcomes (see Note 11).
This remarkable study suggests that teacher value-added has long-term consequences for children. Although the results are non-experimental, they are consistent with the experimental findings (Columns 1 and 2 of Table 2) concerning the impacts of effective classrooms. Taking these studies together, I conclude that classroom effectiveness in the early grades and, more specifically, teacher effectiveness as indicated by value-added have some influence on important life outcomes.
Questions for Future Research
The research reviewed here suggests that teacher value-added explains a modest, but not negligible, fraction of variation in student test scores during the initial year. However, the effects of a teacher on later test scores are much smaller, and most of the initial effect on test scores has faded out after three years. Despite the failure of value-added impacts to persist with respect to test scores, the research reviewed here has established that indicators of classroom effectiveness predict a range of adult outcomes. How can we reconcile the lack of persistence of impacts on test scores with the later emergence of impacts on important life outcomes?
Several explanations for the fade-out of test score effects fail to account for the emergence of later outcomes. Teaching to the test might account for ephemeral effects on test scores but can hardly account for long-term benefits. The same is true of explanations that emphasize the inability of later teachers to capitalize on the gains produced by effective early teachers. The possibility that later tests measure skills not captured by earlier tests remains somewhat plausible.
The single explanation offered by the authors of the long-term studies reviewed here is that teachers who are effective at producing initial gains in test scores are also effective in producing gains in non-cognitive or “soft” skills. This is the same explanation that researchers have offered for the long-term effects of several experimental early childhood interventions. These interventions showed early impacts on test scores that faded completely over the next several years. Nevertheless, they produced favorable outcomes over the life course. Evidence suggests that the effects of early and sustained interventions on non-academic skills account at least in part for these long-term benefits.
Chetty et al. (2011) tested the impact of teacher value-added on an index of non-academic skills, including measures of initiative, effort, and collaboration. They found that teachers who produced high initial value-added on test scores also produced favorable non-academic skills and that these were correlated with the adult outcomes of interest. These findings are consistent with the notion that teacher impacts on non-academic skills may help us understand the puzzle of fading test score effects and the emergence of long-term impacts.
I would urge caution, however, in inferring that the skill gains not measured by later achievement tests are all “non-cognitive.” At best, achievement tests capture some important aspects of cognitive skills needed in the labor market. It is quite plausible that teachers who are effective at producing gains on a given test are also good at producing gains in deeper cognitive skills not captured by standardized tests.
Considerably more research is needed on how specific aspects of teaching contribute to the range of skills that pay off in adult life. The research reviewed here has shown that it is possible to trace long-term effects of classroom experience, so we can anticipate more studies of this type. I would encourage researchers to think about the connections between academic learning in various subjects and the development of non-academic skills such as effort, initiative, persistence, and collaboration. In school settings, much of what we ask of students concerns academic learning. Reasoning and problem-solving skills appear ever more important, and substantial effort, initiative, and collaboration are likely essential for developing these skills. At the same time, it is likely that success in developing academic skills reinforces determination, effort, and initiative. In sum, it seems likely that academic and non-academic skills that matter in the labor market develop together and are mutually reinforcing.
The researchers cited above note that achievement tests may poorly reflect these non-academic skills. I would emphasize that these tests may also fail to capture key cognitive skills, and, in particular, reasoning and problem-solving. In sum, it seems that we need a theory to explain how effective teaching fosters a range of skills and dispositions that, together, shape prospects for future success. It will take more long-term studies of the impact of teaching to test such theories. Moreover, the evidence that teachers vary in the extent to which initial value-added persists suggests the need to assess the impact of teachers and schools on a wide range of outcomes.
Implications for Policy and Practice
Teacher value-added scores, computed with care, should be taken seriously because these scores serve as meaningful signals of long-term benefit to students. The caveat “computed with care” is important. The researchers cited here took great care to identify and control for potential sources of bias. At the same time, teacher value-added scores are not precise. As a result, even those who advocate using value-added in teacher evaluation emphasize the importance of combining value-added with data from other measures of classroom effectiveness.
The research suggests another way that we can and should enrich data on effective teaching: examining the value that teachers add to outcomes other than standardized test scores. The evidence seems to suggest that teacher effectiveness contributes to long-term outcomes in ways that are imperfectly captured by test scores. Effective teachers likely assist their students by producing a range of skills that support later success. Many school districts already have data that can help them assess teacher contributions to achievement in later grades, course-taking, high school graduation, and even college attendance and completion. We can thus see potential for policymakers, practitioners, and researchers to collaborate in constructing a richer set of effectiveness indicators so we can better appreciate the impact of teaching.