
Stop Grading Teachers. Start Trusting Them.

Teacher evaluation reform failed at scale—but not everywhere. Learn how principals can shift from compliance to growth by trusting the teachers who know students best.

School Solutions
February 1, 2026
Tags: teacher evaluation, instructional coaching, professional growth, classroom observation, teacher trust

10-minute read

Picture this. It's a Tuesday in March. You've got a stack of post-observation reports to finish by Friday, a parent meeting in twenty minutes, and a discipline referral waiting outside your door. You glance at the evaluation rubric on your screen—four domains, sixteen elements, dozens of indicators—and you think: I was in that classroom for thirty minutes. I watched a teacher do something remarkable with a room full of teenagers. And now I'm supposed to reduce it to a number.

If that sounds familiar, you're not alone. And you're not wrong to feel the tension.

Teacher evaluation is one of the most consequential tools in a building leader's toolkit. Done well, it drives professional growth, strengthens instruction, and retains the teachers students need most. Done poorly—or done merely as compliance—it becomes something else entirely: a bureaucratic exercise that damages trust, wastes time, and changes nothing.

Here's the uncomfortable truth the research now confirms: the way most of us have been doing evaluation hasn't worked.

  • $1.4 billion spent annually on teacher observations across the U.S. (Brookings, 2016)—with no measurable aggregate impact on student outcomes.

  • The landmark Bleiberg et al. study, published in the Journal of Political Economy: Microeconomics in 2024, analyzed data from 44 states and found precisely estimated null effects from the teacher evaluation reforms that swept the country between 2009 and 2017. Billions of dollars. A decade of effort. No aggregate impact on student achievement, graduation rates, or college enrollment.

But before you throw your rubric out the window, keep reading. Because the story doesn't end there.

Where Did We Go Wrong?

The Race to the Top era was built on a seductive premise: if we could just measure teaching precisely enough—with rubrics, value-added scores, and high-stakes consequences—we could engineer better outcomes at scale. Forty-three states adopted student-growth measures in teacher ratings by 2015. The Gates Foundation invested $575 million in its Intensive Partnerships for Effective Teaching initiative. The federal government dangled billions in incentive funding.

And it didn't work. Not because the idea of evaluation is flawed, but because the execution treated teaching like a prescription—something you could standardize, dose, and measure with clinical precision.

"I'm deeply troubled by the transformation of teaching from a complex profession requiring nuanced judgment to the performance of certain behaviors that can be ticked off on a checklist."

— Charlotte Danielson, creator of the most widely used teacher evaluation framework

That's the architect of the dominant evaluation framework saying the system missed the point. And the data backs her up. RAND's final evaluation of the Gates initiative found it did not achieve its goals for student achievement. Evaluation ratings remained inflated—fewer than 2% of teachers rated at the lowest level—even as independent observers identified far more teachers struggling. The "widget effect," first identified by TNTP in 2009, never really went away. By 2017, researchers found that evaluators privately judged more than three times as many teachers to be below proficient as they formally rated at that level.

What went wrong? Bleiberg and his colleagues identified five factors: political opposition from teachers who felt targeted rather than supported, the impossibility of implementing complex systems uniformly across thousands of decentralized districts, severe capacity constraints on evaluators, limited transferability of results from well-resourced pilot sites, and—critically—no increased compensation to offset the reduced job satisfaction that came with feeling surveilled.

In short, the reform era did something to teachers rather than with them. And thirty states have since rolled back at least one evaluation reform adopted during that period.

Education Is Not Prescriptive—So Why Should Evaluation Be?

Here's the thing we all know but the policy world has been slow to accept: teaching is not a standardized act. A brilliant Socratic seminar in a 10th-grade English class looks nothing like a brilliant guided reading lesson in first grade. A veteran teacher managing a complex behavioral dynamic does not use the same moves as a first-year teacher executing a carefully scripted lesson plan. Both can be excellent. Both can serve students powerfully.

The teachers in your building know their students. They know which kid needs a quiet redirect and which one needs to be pulled into the hallway for a two-minute conversation. They know when to push and when to hold back. That knowledge—earned daily, refined over years—is the engine of effective instruction. And it can't be captured on a checklist.

This doesn't mean evaluation is pointless. It means evaluation has to be designed around what we actually know about how professionals grow. And the research is remarkably clear on this: teachers improve when they receive specific, observation-based feedback in an environment of trust. IES research confirms that the specificity of feedback is positively associated with teacher job satisfaction, motivation, and self-efficacy. RAND found that 93% of teachers who believed their evaluation system was designed to promote growth rated it as fair. Only 66% of those who didn't believe that said the same.

93% of teachers who believed evaluation was designed for growth rated the system as fair (RAND).

Trust is not a soft concept. It is the operational infrastructure of growth. When teachers trust that their evaluator is there to help them improve—not to catch them failing—they open up about what's not working, they take risks in their instruction, and they engage in the kind of reflective practice that actually moves the needle.

The Proof: Where Evaluation Actually Works

If evaluation reform failed at scale, why did it succeed in specific places? Because those places invested in it as a professional growth system, not a compliance mechanism.

Washington, D.C.'s IMPACT system is the most studied example. Launched in 2009, IMPACT combines classroom observations by trained master educators (not just principals), value-added measures, and student surveys. Teachers rated Highly Effective earn bonuses up to $25,000. Those rated Ineffective face separation. More than a decade of independent research confirms the results: Minimally Effective teachers improved an average of 12.6 points on the IMPACT scale within one year. When low-performing teachers exited, student learning increased by at least four months in math and reading at high-poverty schools. A 2021 UVA/Brown/Stanford study found the system continued to improve teacher effectiveness more than eight years in.

But here's what makes IMPACT instructive for the trust argument: it evolved. After a comprehensive review in 2020 that addressed teacher concerns about stress and racial equity, IMPACT reduced emphasis on test scores, added informal observation opportunities teachers valued, and restructured coaching support. By 2021, 60% of teachers said IMPACT helped their growth—up from 54% in 2019. Not perfect, but improving, because the system listened.

Dallas ISD's Teacher Excellence Initiative (TEI) tells a complementary story. Dallas replaced its traditional salary schedule with a performance-based system using more than ten classroom observations per year—including unannounced visits. Master Teachers earn base salaries over $100,000. The results, published by Hanushek and colleagues in Education Next (2025), showed math scores improved by 0.16 standard deviations compared to similar districts, top-teacher retention hit 90%, and more than 90% of schools with state "Improvement Required" status were removed from the list within four years.

What do D.C. and Dallas have in common? Frequency of observation, investment in evaluator training, meaningful consequences paired with meaningful support, and—crucially—real compensation that honored teacher professionalism. They didn't just evaluate teachers. They invested in them.

Five Shifts You Can Make This Year

You may not have D.C.'s budget or Dallas's structure. But you can begin shifting evaluation in your building from compliance to growth, starting now. Here are five evidence-based moves.

1. Lead with purpose, not paperwork.

Before your next observation cycle, tell your staff directly: evaluation in this building exists to help every teacher improve. That's it. RAND's data shows this framing alone changes how teachers experience the process. Be explicit. Be consistent. Say it in faculty meetings, in pre-observation conversations, and in your written communications.

2. Make feedback specific and immediate.

A score without a conversation is just a judgment. IES research is unambiguous: the single strongest predictor of whether teachers find evaluation useful is the specificity of the feedback they receive. After every observation, provide at least one concrete, targeted action the teacher can apply in their next lesson. Not "your questioning could be stronger" but "I noticed three of your questions were recall-level—try opening tomorrow's discussion with a question that asks students to evaluate or connect."

3. Get into classrooms more often—informally.

Dallas's ten-plus annual observations may seem extreme, but the principle is sound: one or two formal observations per year cannot capture the complexity of a teacher's practice. Build a habit of short, informal walkthroughs. Five minutes, three times a week, followed by a brief note of something you noticed. This normalizes your presence, reduces the performance anxiety of formal observations, and gives you richer data for meaningful conversations.

4. Leverage peer observation and coaching.

RAND and NCTQ research show teachers respond more positively to feedback from experienced peers and instructional coaches than to feedback from formal administrative evaluations. You don't have to carry this alone. Build structures for peer walkthroughs, co-observations, or mentoring pairings. Research on peer mentoring models found that pairing high-performing teachers with those who were struggling led to measurable academic gains for the lower-performing teachers' students.

5. Connect evaluation to differentiated professional development.

NCTQ found that all six of their exemplar evaluation systems tied professional development directly to evaluation results—rather than offering a buffet of open-ended PD choices. Use your observation data to identify two or three high-leverage growth areas for each teacher, then direct resources accordingly. When a teacher sees their evaluation translate into targeted support rather than a filed-away form, the process earns credibility.

The Question Worth Asking

As states continue retreating from top-down mandates—Michigan capped student test scores at 20% of evaluations, New York replaced its entire system with locally designed plans, Illinois eliminated its student-growth mandate entirely—building leaders face a rare window of autonomy. The prescriptive era is fading. What comes next is up to us.

And what should come next is an honest reckoning with a simple question: Does our evaluation process help teachers get better?

Not: does it satisfy the state? Not: does it generate defensible documentation? Not: does it sort teachers into tidy categories? Those are administrative concerns, and they matter—but they are not the point.

"Most teacher evaluations are just another check box on an administrator's to-do list and a lesson in futility for those on both sides of the table."

— George Philhower, Superintendent, Eastern Hancock, Indiana

The teachers in your building chose this profession because they believe they can make a difference in the lives of children. They bring content knowledge, pedagogical skill, cultural awareness, and deep relational intelligence to their classrooms every day. They deserve an evaluation process that recognizes that complexity—one that trusts their professional judgment while still holding them to high standards.

Because education is not prescriptive. It never has been. The best teaching happens when skilled professionals are empowered to respond to the students in front of them, in real time, with the full range of their training and experience. Evaluation should reflect that reality, not fight it.

Your next step: Before your next observation, sit down with that teacher for ten minutes. Don't bring the rubric. Ask two questions: What are you working on right now in your practice? and How can I help? Then listen. That conversation—not the form, not the score, not the rating—is where evaluation starts to matter.