An Interview with Catherine McClellan: Lessons Learned from the Measures of Effective Teaching Project

Apr 10, 2012 by Michael F. Shaughnessy

1)       First of all, could one of you summarize the lessons learned from the Measures of Effective Teaching Project?

 Wow–there were so many, I’m sure we’ll miss some.  One that was hard to realize if you weren’t inside the project was just that it could be done at all–that something of this size, scale, and complexity could be done, and done well.  We were making up so many of the techniques and tools as we went, I don’t think we were certain it would work until it did.

We learned that we could train hundreds of observers to score video of teachers teaching their classes with high levels of accuracy and consistency. We learned just how important all the quality control measures we used are if you want to be sure the data are what you think they are. We were reminded of just how complex the act of teaching is; watching the videos was fascinating! We learned that students have important and valuable opinions about teaching quality that should be heard and used, and that the way questions are asked of students is essential to prevent the survey from being just a popularity contest. We learned that developing high-quality observation instruments is extraordinarily difficult, and that building the necessary structures to turn them into an observation system is a staggering amount of hard work, even for experts! And we learned that statistical modeling of these data isn’t perfect (well, we knew that), but that it is valuable and informative.

2)       Were the lessons different for elementary vs. middle school vs. high school?

I don’t think the broad lessons were different.  Obviously, there are some specific things that are different–most elementary teachers teach all subjects, so they have different data than teachers who specialize in one content area.  Some observation instruments had observers who specialized in elementary or middle/high school, but most did not.  We haven’t seen the high school results reported yet, so maybe we’ll learn more soon.

One thing to remember is that high school subjects like algebra and biology are what are called “untested” because in most jurisdictions there are no large-scale standardized tests that can be used to build statistical models like value-added. The untested subjects and grades present a significant challenge, since they account for the majority of teachers and students in the US. The “tested” subjects and grades are the “No Child Left Behind” ones, typically grades 4-8 math and ELA only; everything else, about 70% or so of the school population, is untested.

3)       Tough question: what about special education? Were different levels of special education observed and evaluated?

Special education classes were not selected for the study specifically, since MET focused on math and English Language Arts (ELA) classes in grades 4-8, adding in algebra and biology in grade 9.  There were some special-needs students in the “mainstream” classes in the study, obviously.  Different observation instruments had components that would capture how teachers and classmates interacted with each other: things like the climate or atmosphere of the classroom, engagement of all students in the class, formative assessment, and whether everyone was treated with respect.  Special education teachers present a challenge for value-added modeling on a couple of fronts: they tend to teach small numbers of students, and those students often do not take large-scale standardized assessments.

4)       How were these 3000 classrooms chosen for this MET project and how long did it last?

The classrooms were selected by other partners in the project, so for details you’d do better to ask them.  Teachers volunteered to be in the study–that’s important to know.  The districts that participated (Charlotte-Mecklenburg, Dallas, Denver, Hillsborough (Tampa), Memphis, and New York City) worked with the Gates Foundation and RAND to identify schools and teachers that met criteria they had set for participation in the study.  One very interesting aspect of selection that will be highlighted in the next MET report will be the “randomization” component.  Teachers were chosen in pairs (or small groups of 3 or 4) so that each partner in the pair taught a parallel, comparable schedule of classes.  Basically, they should be able to swap over and teach each other’s schedule.  Between the first and second year of the study, MET worked with the schools to “randomize” the assignment of classes of students to these pairs of teachers.  The goal was to examine something that comes up a lot in conversations about evaluating teaching: that some teachers get “better” or “worse” classes of students and that confers an advantage or disadvantage in the evaluation of teaching practice.  In year 2 of the MET study, since the classes of students were randomly assigned to the teachers, this factor could be eliminated as a driver of the evaluations.  I’m really excited to see the results of this analysis!
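Editor's illustration: the within-group randomization described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the procedure MET actually used. The group names, teacher IDs, roster labels, and the function `randomize_within_groups` are all hypothetical. The one idea it takes from the interview is that class rosters are shuffled only among teachers in the same pair or small group, since those teachers have comparable schedules.

```python
import random


def randomize_within_groups(groups, seed=None):
    """Randomly assign class rosters to teachers inside each exchange group.

    `groups` maps a (hypothetical) group id to a dict with two equal-length
    lists: "teachers" and "classes". Rosters never cross group boundaries.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible and auditable
    assignment = {}
    for group_id, group in groups.items():
        teachers = group["teachers"]
        rosters = list(group["classes"])  # copy so the input is not mutated
        if len(teachers) != len(rosters):
            raise ValueError(f"group {group_id}: need one roster per teacher")
        rng.shuffle(rosters)
        for teacher, roster in zip(teachers, rosters):
            assignment[teacher] = roster
    return assignment


# Hypothetical example: one pair and one group of three.
example_groups = {
    "school_A_grade6_math": {
        "teachers": ["T1", "T2"],
        "classes": ["roster_1", "roster_2"],
    },
    "school_B_grade4_ela": {
        "teachers": ["T3", "T4", "T5"],
        "classes": ["roster_3", "roster_4", "roster_5"],
    },
}

assignment = randomize_within_groups(example_groups, seed=2012)
for teacher, roster in sorted(assignment.items()):
    print(teacher, "->", roster)
```

Because the shuffle happens separately inside each group, a teacher can only receive a roster that a comparable colleague could also have received. That is what allows differences in class composition to be ruled out as a driver of the evaluations.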

5)       What would you say were some “practical insights”?

This stuff is HARD!  We had the good fortune to be working with some of the top talent anywhere in every aspect of this project, and it was still incredibly difficult.  It really illuminated the level of expertise required to build the tools to support a teaching evaluation system.

Even though building an excellent teaching evaluation system is hard, it can be done, and done well.  Don’t underestimate what it takes, but don’t believe it is impossible!

I hate to be a naysayer, but trying to build an observation instrument, and especially the supporting systems, at home is risky. The expertise and effort required to do a good job are amazing, and most districts don’t have the resources to devote to it for the time required to do it well. The people who truly are experts are experts for a reason: they have invested years and huge amounts of work into their instruments.

Training observers to do this work well is challenging, and observation is remarkably difficult. You have to keep a multifaceted instrument in mind while watching the complex interactions of 30 people. The cognitive demands are massive. Thorough training is essential. Checking for mastery at the end of training is essential. Monitoring on a regular basis to verify that observers’ skills are still on target is essential. Controlling sources of observer bias is essential. These observations are rapidly becoming high-stakes, and we need to treat them that way.

There are aspects of this work that are greatly facilitated by having video to watch rather than live observations.  You can rewind video, others can watch it later, and the teacher can watch him- or herself teach–it is not ephemeral like a live observation.

6)       Can we discuss how the lessons were chosen? Did you randomly pick and choose subject areas? Were math, science, music, art, and P.E. included? Or just the content areas of language arts and reading?

The teachers were chosen so that they taught ELA or math in grades 4-8, plus algebra and biology in grade 9. Those are the only content areas captured in MET. There was a set of “target” content areas that teachers were asked to capture if possible; otherwise the lessons were on a random topic. In Year 1, the lessons were concentrated in the second semester due to the timing of the start of the study; in Year 2 they were spread out throughout the academic year. Teachers were asked, for obvious reasons, to avoid classes where they gave a test!

 7)       Were there clear differences in how teachers coped with discipline and behavior problems?

Yes, definitely. Some of the differences likely were attributable to variation in school policies. Others seemed more based in the rules and expectations that individual teachers had set and enforced, and that is part of what the observation instruments were capturing. And anyone who believes that the students are angels when the camera is on has not seen some of those videos! As a former high-school teacher myself, I found the opportunity to watch the videos enthralling. I learned a great deal about how to handle a huge range of situations, and about some things that didn’t work, from watching our brave MET teachers do so on camera.

8)       Was there any evaluation of students being inappropriately placed in regular or general education?

That’s not a judgment we were in a position to make. Our assumption was that the students were placed as their IEP or 504 plan directs. We were observing how a teacher interacts with the students, conveys content, engages the class, manages time and behavior, and many other things, but all of those observations are made for the class sitting in front of the teacher on the video. Whether it is an advantage or disadvantage (and everyone has an opinion on this point), the fact was that the MET observers knew nothing about the teacher or the students or the school or the policies except whatever was on that video, period. Their judgments were made strictly on the evidence in front of them, as specified by the instrument they used.

9)       What have I neglected to ask?

This isn’t a question you missed, but a personal opinion: We need to learn from No Child Left Behind as we build these systems. Accountability measures in NCLB started with each state building an assessment system for its students and determining progress and achievement of standards. Ten years and who knows how many taxpayer dollars later, we have learned that a lack of comparability across those state systems and content standards is frustrating, and the Common Core Standards have arisen as a way to address that need. Yet, as we build a system of measuring teaching practice, we seem to be starting not even at the state level, but at the district level. The same issues with lack of comparability across districts and across states soon will frustrate us in the same way, not to mention the level of effort and cost expended on creating these hundreds or thousands of individual systems. It would be better to invest those resources in building a really terrific teaching evaluation system up front for everyone to use, rather than spend the next decade discovering, again, that we don’t have comparable data but we want them.

10)    Where can people get a copy of this report?

 This report can be found at:

