Getting Classroom Observations Right

Sep 23, 2014 by

It is widely understood that there are vast differences in the quality of teachers: we’ve all had really good, really bad, and decidedly mediocre ones. Until recently, teachers were deemed qualified, and were compensated, solely according to academic credentials and years of experience. Classroom performance was not considered. In the last decade, researchers have used student achievement data to quantify teacher performance and thereby measure differences in teacher quality. Among the recent findings is evidence that having a better teacher not only has a substantial impact on students’ test scores at the end of the school year, but also increases their chances of attending college and their earnings as adults (see “Great Teaching,” research, Summer 2012).


In response to these findings, federal policy goals have shifted from ensuring that all teachers have traditional credentials and are fully certified to creating incentives for states to evaluate and retain teachers based on their classroom performance. We contribute to the body of knowledge on teacher evaluation systems by examining the actual design and performance of new teacher-evaluation systems in four school districts that are at the forefront of the effort to evaluate teachers meaningfully.

We find, first, that the ratings assigned to teachers by the districts’ evaluation systems are sufficiently predictive of a teacher’s future performance to be used by administrators for high-stakes decisions. While evaluation systems that make use of student test scores, such as value-added methods, have been the focus of much recent debate, only a small fraction of teachers, just one-fifth in our four study districts, can be evaluated based on gains in their students’ test scores. The other four-fifths of teachers, who are responsible for classes not covered by standardized tests, have to be evaluated some other way. In our districts, the alternatives include classroom observations, achievement test gains for the whole school, performance on nonstandardized tests chosen and administered by each teacher to her own students, and some form of “team spirit” rating handed out by administrators. In the four districts in our study, classroom observations carry the bulk of the weight, accounting for between 50 and 75 percent of the overall evaluation scores for teachers in non-tested grades and subjects.

As a result, most of the action and nearly all the opportunities for improving teacher evaluations lie in the area of classroom observations rather than in test-score gains. Based on our analysis of system design and practices in our four study districts, we make the following recommendations:

1) Teacher evaluations should include two to three annual classroom observations, with at least one of those observations being conducted by a trained observer from outside the teacher’s school.

2) Classroom observations that make meaningful distinctions among teachers should carry at least as much weight as test-score gains in determining a teacher’s overall evaluation score when both are available.

3) Most important, districts should adjust teachers’ classroom-observation scores for the background characteristics of their students, a factor that can have a substantial and unfair influence on a teacher’s evaluation rating. Considerable technical attention has been given to wringing out of value-added scores the bias that arises because student ability is not evenly distributed across classrooms (see “Choosing the Right Growth Measure,” research, Spring 2014). Similar attention has not been paid to the impact of student background characteristics on classroom-observation scores.
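To make the third recommendation concrete, here is a minimal sketch of one simple way such an adjustment could work: fit a straight line relating observation scores to a classroom-level background measure, then remove the part of each score that the line explains. This is an illustration only, not the method used in the study. The function name, the choice of average prior achievement as the single background measure, and all numbers are hypothetical.

```python
from statistics import mean


def adjust_observation_scores(obs_scores, prior_achievement):
    """Remove the linear association between observation scores and a
    classroom-level background measure (hypothetical: the class's average
    prior achievement). The overall mean score is preserved."""
    if len(obs_scores) != len(prior_achievement):
        raise ValueError("inputs must have the same length")
    mean_x = mean(prior_achievement)
    mean_y = mean(obs_scores)
    sxx = sum((x - mean_x) ** 2 for x in prior_achievement)
    if sxx == 0:
        raise ValueError("background measure does not vary across classrooms")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(prior_achievement, obs_scores))
    slope = sxy / sxx  # ordinary least-squares slope
    # Subtract the part of each score predicted by the background measure.
    return [y - slope * (x - mean_x)
            for x, y in zip(prior_achievement, obs_scores)]


# Hypothetical data: one entry per teacher.
raw_scores = [2.0, 2.5, 3.0, 3.5]      # observation scores
class_prior = [-1.0, 0.0, 1.0, 2.0]    # class average prior achievement
adjusted = adjust_observation_scores(raw_scores, class_prior)
```

In this made-up example the raw scores rise in lockstep with the incoming achievement of each teacher's students, so the adjustment leaves every teacher with the same score. With real data the relationship would be noisier, and a district would likely adjust for several characteristics at once.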

Observations vs. Value-Added

The four urban districts we study are scattered across the country. Their enrollments range from about 25,000 to 110,000 students, and the number of schools ranges from roughly 70 to 220. We have from one to three years of individual-level data on students and teachers, provided to us by the districts and drawn from one or more of the years from 2009 to 2012. We begin our analysis by examining the extent to which the overall ratings assigned to teachers by the districts’ evaluation systems are predictive of the teacher’s ability to raise test scores and the extent to which they are stable from one year to the next. The former analysis can be conducted only for the subset of teachers with value-added ratings, that is, teachers in tested grades and subjects. In contrast, we can examine the stability of overall ratings for all teachers included in the districts’ evaluation systems.
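As a rough illustration of what "stable from one year to the next" can mean in practice, the sketch below computes the correlation between the same teachers' overall ratings in two consecutive years. The correlation coefficient is one common summary of stability and is assumed here for illustration. The excerpt does not say which statistic the study uses, and the ratings shown are invented.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    return sxy / sqrt(sxx * syy)


# Hypothetical overall ratings for the same five teachers in two years.
ratings_year1 = [2.1, 3.4, 2.8, 3.9, 3.0]
ratings_year2 = [2.4, 3.1, 2.9, 3.7, 3.2]
stability = pearson(ratings_year1, ratings_year2)
```

A value near 1 would mean that teachers keep roughly the same relative standing from year to year, while a value near 0 would mean that one year's rating says little about the next.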

via Getting Classroom Observations Right – Teacher evaluation research : Education Next.
