Is Value-Added Accurate for Teachers of Students with Disabilities?
Daniel F. McCaffrey
and Heather Buzick
- Standardized test scores for students with disabilities pose challenges for calculating value-added for special education teachers and for some general educators who teach many students with disabilities.
- Scores for students with disabilities can be very low, so that misspecification of the value-added models might attribute their low scores to their teachers. Low scores can also increase the random errors in value-added.
- Inconsistent use of testing accommodations across years for students with disabilities can create variability in their growth that can be incorrectly attributed to teachers.
- Teachers’ value-added might not represent their contributions to the learning of students with disabilities because these students are less likely than others to have test scores that are used in calculating it.
- Including disability status and other factors, such as accommodation use, in the models may reduce systematic errors in the value-added for teachers with large proportions of students with disabilities, but doing so could create incentives for improper placement of students into special education.
- Value-added for many general education teachers changes very little when scores from students with disabilities are excluded from calculations.
- States and districts may find it useful to monitor teachers’ value-added for connections to the proportion of students with disabilities in their classes and other evidence of systematic error. They should be prepared to revise their accountability systems if needed.
Under laws providing for equal access to quality education, most American students with disabilities are taught largely in general education classrooms. And along with all other students, they are included in standards-based reforms—policies that hold their schools and teachers accountable for what they do or do not learn. In more and more cases, policies require that a teacher be judged partly by his value-added—a measure of what he contributes to student learning as determined by student scores on standardized tests. States and districts, of course, want to make sure they have accurate measures of these contributions. But students with disabilities pose several challenges for calculating value-added. They tend to score low—often very low—on regular state assessments, and most receive accommodations, such as extra time. The result can be scores that are unreliable or not comparable to those of other students or across years. In other cases, students with disabilities have test scores that cannot be used to calculate value-added because they take alternative assessments. At the same time, many students with disabilities are taught by multiple teachers; they may be taught by two teachers in the same classroom or by different teachers in separate general and special education classrooms. Students with disabilities also often receive help from aides and special services. Disentangling the contribution of each teacher from these other factors may be difficult.
In this brief, we discuss the challenges of using value-added to evaluate teachers of students with disabilities. We consider the limited empirical research on the potential for systematic errors in value-added for these teachers, either because the models do not adequately account for the likely achievement growth of their students, or because they do not account for teachers being more or less effective for students with disabilities than they are for other students. We also consider the comparability of value-added for special education teachers and the value-added for other teachers.
What is known about value-added for teachers of students with disabilities?
Students with disabilities contribute not only to the value-added of special education teachers, but also to that of many general education teachers. These students account for about 14 percent of all students in the U.S., and even more in urban districts; 18 percent of students in Detroit Public Schools, for instance, receive special education services. These 5.7 million students are a diverse group. The 13 disabilities listed under the federal Individuals with Disabilities Education Act (IDEA) are autism, deaf-blindness, deafness, emotional disturbance, hearing impairment, intellectual disability, multiple disabilities, orthopedic impairment, other health impairment, specific learning disability, speech or language impairment, traumatic brain injury, and visual impairment. Because the majority of students with disabilities spend most of their instructional time in a general education classroom, at least 80 percent of teachers will have one or more students with a disability in their classrooms.
Nearly all students with disabilities are tested in some way. Nationally, 79 percent take their state’s regular standardized assessment with or without accommodations and 21 percent take an alternative assessment. Historically, the alternative assessments for students with the most significant cognitive disabilities have not been designed to produce scores that can be directly compared with scores on the regular assessment. Sometimes the range of scores did not even overlap. When they did, equal scores did not indicate equal levels of achievement. Another type of alternative assessment was designed for students who demonstrate persistent academic difficulties. These assessments cover the same grade-level achievement standards as the regular assessments, and they use a familiar format of multiple choice and constructed response items. What sets them apart from the regular assessments is fewer items, simplified language, and fewer options on multiple choice questions. Unlike some other alternative tests, scores from these assessments for students with persistent academic difficulties can be linked to scores from the regular assessment and included in value-added modeling. But these assessments are being phased out within the year.
Of students with disabilities who take the regular assessments, around 60 to 65 percent receive testing accommodations. Accommodations take many forms. In addition to extra time allotted to complete the test, they include changes in the presentation of the test, different response types, and changes in scheduling and setting. The accommodations and the type of test can vary from year to year, depending on a student’s disability classification, changes in test rules, and degree of adherence to the rules.
Scores on regular assessments are typically used to calculate a teacher’s value-added, and accommodations are rarely taken into account. This is the case even though scores can be very sensitive to accommodations and even though inconsistent use of accommodations can result in predictable patterns in achievement growth. One study, for instance, found that average growth in math for students with disabilities was less than the average growth for all students, but students who received accommodations in the current year and not the previous year grew more in math than the average student did. Students who received an accommodation in the previous year, but not the current year, grew much less in math than the average student did. Overall, students with disabilities had greater than average growth in reading, but the patterns with respect to inconsistent use of accommodations were the same as in math. That is, year-to-year changes in the use of accommodations predicted achievement growth. This predictable variation in growth is unrelated to teacher effectiveness, and if value-added modeling does not account for it, the result can be errors.
For many general education teachers, the impact of the test scores of students with disabilities on their value-added scores is relatively limited. Because they have few of these students in their classes, their value-added scores do not change much when students with disabilities are included or not. The situation is different for teachers with substantial numbers of students with disabilities. The value-added for these teachers can be more sensitive to these students’ test scores and to the way the model accounts for disability status. It is also less accurate, because of the very low test scores earned by many students with disabilities. Most standard tests are designed to provide accurate scores for students near proficiency; because there are not enough items measuring performance at the lower end of the scale, the tests may provide very limited information about students who score substantially below average. Thus, scores for students with disabilities tend to have larger measurement errors; they deviate more from the students’ true level of achievement than do the scores of other students.
As a result, the value-added of teachers with a greater percentage of students with disabilities will have more random error than the value-added of teachers with similar-size classes but fewer students with disabilities. Random errors are differences between the value-added and a teacher’s true contribution to student achievement gains. They stem from many chance factors, such as the students who happen to be assigned to the teacher, students being sick at test time, or error in the test. Random errors are not systematic, in that they will not be similar across years or across teachers of similar students. They contribute to year-to-year instability in value-added, so value-added for teachers with more students with disabilities may fluctuate more from year to year than that of other teachers.
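The mechanics described above can be illustrated with a small simulation. This is a hypothetical sketch, not a model from the brief: it treats a teacher's value-added as the class-mean gain score, and all of the numbers (class sizes, error variances, the zero true effect) are invented for illustration. It shows how larger test measurement error, or a smaller class, inflates the year-to-year spread of the estimate.

```python
# Hypothetical simulation (all parameters invented): larger test
# measurement error or a smaller class inflates the random error in
# a simple value-added estimate (the class-mean gain score).
import random
import statistics

def simulate_va(n_students, measure_sd, n_years=2000, seed=1):
    """SD across simulated years of a teacher's estimated value-added,
    holding the teacher's true contribution fixed at zero."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_years):
        # Observed gain = student-level noise + test measurement error.
        gains = [rng.gauss(0, 1) + rng.gauss(0, measure_sd)
                 for _ in range(n_students)]
        estimates.append(statistics.mean(gains))  # class-mean estimate
    return statistics.stdev(estimates)

low_error = simulate_va(n_students=25, measure_sd=0.3)   # reliable scores
high_error = simulate_va(n_students=25, measure_sd=1.0)  # noisy scores
small_class = simulate_va(n_students=8, measure_sd=1.0)  # noisy + few students

print(round(low_error, 3), round(high_error, 3), round(small_class, 3))
```

Under these assumptions, the spread of the estimates grows with the measurement error and shrinks with class size, which is why teachers with many students with disabilities, or very few usable scores, see more year-to-year instability.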
If the value-added model does not appropriately control for disability status, it can overestimate the growth of students with disabilities. This failure may contribute to systematic errors in which teachers are disadvantaged by having more students with disabilities in their classrooms. One study, for example, found that under value-added models that had very limited controls for student background variables, the two percent of teachers who had half or more students with disabilities received value-added scores that, on average, were low; they ranked in the 20th to 25th percentile in math and the 25th to 33rd percentile in reading. Under a model that controlled for students’ special education status and for inconsistent use of accommodations, these same teachers ranked in the 38th to 49th percentile in math and the 40th to 54th percentile in reading. By contrast, the rankings of teachers who had classes with less than 20 percent students with disabilities changed very little under the two models. Similarly, adding controls to the model resulted in a modest change, from about the 48th to the 52nd percentile, for teachers whose classes were made up of 20 to 50 percent students with disabilities.
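A stylized example can make the mechanism concrete. The sketch below is hypothetical and not taken from the study cited above: all gains and class compositions are invented, and the "models" are reduced to comparing each student's gain against either one population-wide expected gain (no control) or group-specific expected gains (with a disability-status control).

```python
# Hypothetical illustration (all numbers invented): omitting a
# disability-status control depresses value-added for a teacher whose
# class has many students with disabilities (SWD).
import statistics

# (gain, is_swd): by assumption, expected gain is 1.0 for non-SWD
# students and 0.4 for SWD students.
population = [(1.0, False)] * 80 + [(0.4, True)] * 20
class_a = [(1.0, False)] * 16 + [(0.4, True)] * 4   # mirrors population
class_b = [(1.0, False)] * 4 + [(0.4, True)] * 16   # mostly SWD

def va_no_control(cls):
    """Mean residual against a single population-wide expected gain."""
    expected = statistics.mean(g for g, _ in population)
    return statistics.mean(g - expected for g, _ in cls)

def va_with_control(cls):
    """Mean residual against group-specific expected gains."""
    exp = {s: statistics.mean(g for g, swd in population if swd == s)
           for s in (False, True)}
    return statistics.mean(g - exp[swd] for g, swd in cls)

print(round(va_no_control(class_b), 3))    # negative: systematic penalty
print(round(va_with_control(class_b), 3))  # zero: penalty removed
```

In this sketch both teachers match their students' expected growth exactly, yet the model without a control penalizes the teacher of the mostly-SWD class; adding the control removes the gap. It also shows why the same control can create a referral incentive: classifying more low-gaining students as SWD raises the class's adjusted value-added.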
Having small numbers of students with useable test scores can also add to random errors. Because scores on alternative assessments may not be comparable to scores on the regular assessment, students with disabilities are more likely than other students to lack test scores that can be used for value-added calculations. In particular, special education teachers, who often teach small classes of students with the most severe disabilities, may have very few students whose scores can be used to calculate value-added; they may even have fewer than the minimum required by the state. Thus, their value-added may be very imprecise, as well as unstable across years.
A teacher may not be as effective with students with disabilities as with other students. Research has shown that direct, explicit, and systematic instruction, which includes practicing with students until they understand a concept, is particularly beneficial for students with disabilities. But this kind of instruction might not be used for general education students. Widely used observation protocols do not rate teachers on explicit systematic instruction, and instead emphasize “student-centered instruction” that gives students more control over their learning. Teachers whose classes include both students with and without disabilities will need to choose one or the other type of instruction, and, depending on the choice, they might be less effective with one or the other group of students. Research has also found that students with disabilities have fewer opportunities to learn in general education classrooms than do students without disabilities.
A possible implication of this research is that, to truly assess a teacher’s impact on students, his value-added needs to take account of the gains made by his students with disabilities. Excluding these students because they lack test data may distort the value-added measures. Calculating a separate value-added for students with disabilities and general students might seem desirable on the basis of this research, but because many teachers teach very few students with disabilities, the random errors from this subgroup would be too large for the measures to be useful.
Students with disabilities commonly receive instruction, as well as other services, from multiple education professionals, and this also poses challenges for calculating value-added. With co-teaching, for instance, a general education and special education teacher work together in the same classroom. In this case, two teachers contribute to a student’s learning. Dual contributions are also made when students with disabilities are taught in both general and special education classrooms. Students with disabilities may also benefit from aides and other staff. These situations complicate the computation of value-added because it is hard to separate one contribution from another.
These challenges are not unique to students with disabilities. In some schools, both general education students and students with disabilities are taught by multiple teachers and, although co-teaching is typically aimed at special education students, all students might be affected by multiple teachers. But these challenges may be most pronounced for special education teachers. Detailed data on teacher assignments and teachers’ shares of instruction are necessary to allow value-added models to account for students who are taught by multiple teachers. Information from the teacher himself is essential; in many states and districts, teachers can review enrollment rosters to certify that the lists include the students they actually taught. The teachers then report the share of instruction they provided to each student. Confirmation of rosters, with safeguards to ensure accuracy, gives us richer data on student-teacher assignments than now exists in most data warehouses. Value-added models that make adjustments for co-teaching have been proposed, but they cannot fully distinguish the contributions of each teacher to student scores.
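One simple way such adjustments can work is to apportion each student's gain across teachers in proportion to roster-confirmed shares of instruction. The sketch below is hypothetical: the roster data, teacher names, and gains are invented, and share-weighting is only one of several proposed approaches, not the method of any particular state.

```python
# Hypothetical sketch (roster data invented): weight each student's
# residual gain by a teacher's roster-confirmed share of instruction,
# one simple way to apportion credit under co-teaching.
from collections import defaultdict

# (student, residual_gain, {teacher: share_of_instruction})
rosters = [
    ("s1", 0.6, {"gen_ed": 0.5, "sped": 0.5}),   # co-taught classroom
    ("s2", -0.2, {"gen_ed": 1.0}),               # single teacher
    ("s3", 0.3, {"gen_ed": 0.7, "sped": 0.3}),   # pull-out instruction
]

def shared_va(rosters):
    """Share-weighted mean residual gain for each teacher."""
    totals, weights = defaultdict(float), defaultdict(float)
    for _, gain, shares in rosters:
        for teacher, share in shares.items():
            totals[teacher] += share * gain
            weights[teacher] += share
    return {t: totals[t] / weights[t] for t in totals}

print(shared_va(rosters))
```

Even under this clean accounting, the weights only split credit mechanically; they cannot reveal whether the co-teachers actually contributed in those proportions, which is the limitation the brief notes.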
What more needs to be known on this issue?
Much more needs to be known about the achievement growth of students with disabilities and their teachers’ value-added. We know little, for instance, about how services change over the years because a student’s needs change or because a student changes schools. We also know little about how certain services contribute to students’ achievement growth and how best to account for these services. We need information on the difference in the achievement growth between students with disabilities who do or do not contribute to value-added. We also know little about how a student’s use of accommodations varies across years, how students receiving different accommodations compare with each other, and how students who receive consistent accommodations compare (in achievement growth) with those who receive inconsistent accommodations.
We also need more information about how instruction in both general and special education classes correlates with the achievement growth of students with disabilities. This information would help us assess whether methods to instruct students with disabilities are distinct from those used to teach other students. The data could help states and districts determine whether to calculate separate value-added for students with disabilities. We would also benefit from knowing how teachers’ contributions to the achievement gains of their students with disabilities compare with their contributions to gains made by their other students. If the contributions are very similar, there would be little need to calculate separate values. Such a study might require multiple years of value-added measures to obtain accurate values using only students with disabilities because many classes have few such students.
Along these lines, we need to know how the material taught to students with disabilities differs from what is taught to other students, and how the material taught to each group aligns with the content covered on assessments. If it does not align, the teacher’s value-added might prove inaccurate. These inaccuracies may be more pronounced for teachers who teach more students with disabilities.
We know that accounting for detailed information on disability status can remove apparent systematic errors in value-added, especially for teachers whose classes have majorities of students with disabilities. Doing so, however, might affect the classification of students with disabilities or the placement of these students in classes. If disability status is included as a control variable, classifying more low-achieving students as students with disabilities might increase teachers’ value-added. Including disability status in value-added models might therefore create an incentive to increase special education referrals among low-achieving students, many of whom are minorities from low-income families and are already at risk for such referrals along with their sometimes negative consequences. Some evidence suggests that educators have manipulated special education referrals in response to accountability pressures. However, not controlling for disability status might reduce the value-added scores of teachers of students with disabilities, possibly discouraging teachers from teaching classes with many of these students. To understand the potential for such consequences, we need studies on changes to the classification of disabilities and on how students with disabilities are placed after a new teacher evaluation system has been introduced.
We do not know how students with disabilities will perform on the tests now being developed to align with the Common Core State Standards. The new tests aim to improve upon existing assessments by providing more reliable measures of student achievement for all students—especially very low- or high-achieving students—and they may provide better information about students with disabilities. But the new tests will be more difficult. They will also have new procedures for accommodations, which may make the use of accommodations more consistent. Further, because several states are dropping one type of alternative assessment, a greater number of students with disabilities are expected to complete the regular assessments. These changes could increase the number of students with disabilities who can contribute to value-added, thus improving the reliability of value-added for this group.
What can’t be resolved by empirical evidence on this issue?
As discussed, there is risk both in accounting for disability status and in not accounting for it. Accounting for disability status in value-added modeling might harm students by referring them to special education unnecessarily. Not accounting for disability status might harm teachers with large numbers of such students in their classrooms by unfairly depressing their value-added scores, and it could harm students with disabilities if teachers do not want them in their classes as a result. Ideally, schools or districts would prevent unnecessary referrals to special education, but if they cannot, the choice becomes which students or teachers to put at risk. Empirical data on the cost of either choice would be very difficult to obtain, and it would not help us make the choice in any case.
To what extent, and under what circumstances, does this issue impact decisions and actions that districts and states can make on teacher evaluation?
If we are to include students with disabilities in value-added measures, these students must take assessments that are comparable to the tests taken by other students. States may want to choose tests that can provide accurate data for low-achieving students, a group that includes many students with disabilities. They might also ask test developers for tools with which to compare scores on alternative assessments with those on regular assessments. States and districts might also promote consistent use of accommodations. Greater consistency will reduce the errors in measuring achievement gains and, thus, in value-added.
Value-added calculations will be most accurate if models use detailed information on disability status, as well as services received and accommodations used in the current and previous years. Many states and districts do not maintain this sort of data, at least not in a readily accessible format. They may need to update their data systems to include it and to make it more useful. Value-added for all teachers, but especially special education teachers, will also be more accurate if the models account for co-teaching and other forms of shared instruction. To obtain these data, districts may need to survey teachers.
States will need to decide how to account for disability status in their value-added models. Models that do not account for disability status may underestimate the achievement growth of students with disabilities and the contributions of their teachers to that growth. Such models include Student Growth Percentiles, which many states are using or are considering using for teacher evaluations. However, as noted, controlling for disability status could have negative consequences for students. So states that decide to control for disability status might want to monitor the number of special education referrals and how closely districts follow the guidelines for making them. Increases could indicate referrals motivated more by teacher accountability than by student need. States might want to re-evaluate students if these trends occur.
The new Common Core tests hold promise as more accurate assessments of lower-achieving students and students with disabilities. Thus, they should lead to more accurate value-added for almost all teachers. But better test data will not be enough to improve precision for special education teachers who have very few students or students who are tested by alternative means. For these teachers, other measures of teaching effectiveness, such as observations, may be particularly valuable. Special education classrooms may also demand a different observation protocol, since the exemplars of good practice used for general education may or may not apply.
There is still much work to be done on methods for assessing the teachers of students with disabilities. Student achievement growth may have a role to play, but using test scores of these students to determine that growth presents unique challenges to value-added modeling. These challenges may be particularly damaging to the accuracy of value-added for special education teachers and less of a problem for most general education teachers. However, decisions about how best to use the achievement growth of students with disabilities for evaluating their teachers also must consider the accuracy of other measures of teaching and the costs of relying on these rather than on value-added. States and districts may find it useful to monitor measures of teacher performance for relationships between these measures and the proportion of students with disabilities they teach. They might watch for systematic errors as well, and be prepared to revise their accountability systems if needed.