Tuesday, October 2, 2012

Why no one in his right mind should believe the school grades OR the teacher growth scores

UPDATE: Here is Gary Rubinstein's new graph of school ranks, this year compared to last, according to the Progress reports.  The "x" axis is 2011; the "y" axis is 2012.
The new DOE school grades for elementary and middle schools, euphemistically called “Progress reports,” came out with much fanfare, with 217 schools potentially put on the closing list because they either received failing grades or three “Cs” in a row, more than ever before.  As parent leader Shino Tanikawa pointed out in the New York Post, DOE is using unreliable grades to implement terribly misguided policies. (If you'd like to see your school's grade anyway, you can find it here.)

Last year’s headlines ran like this: School report cards stabilize after years of unpredictability. This year again, reporters cited claims of DOE officials, who “highlighted the stability of this year’s reports.”
It wasn’t true last year and it isn’t this year either.  Here is a figure from Gary Rubinstein’s blog from last year, showing no correlation between the rank order of schools in 2010 and 2011. I suspect Gary’s illustration will be similar for this year, once he gets around to making it.
As InsideSchools reported, 24 out of the 102 schools that received “D”s or “F”s this year had received top grades of “A” or “B” the year before. Other high-performing schools, such as PS 234 in Tribeca that received an “A” last year, fell precipitously to a “C” with the same principal, same staff and most of the same students.  The school plunged from the 81st to the 4th percentile, which would have meant a “D,” if not for the DOE rule that no school that performs in the top third citywide can receive a grade lower than C, as Michael Markowitz pointed out in a comment on GothamSchools.
According to the DOE formula, 80-85% of a school’s grade depends on last year’s test scores on the state exams.  Most of that figure is based on supposed “progress,” i.e. the change in test scores from the year before.  As I have pointed out many times, including in this 2007 Daily News oped, “Why parents and teachers should reject the new grades”, experts in statistics have found that the change in test scores at the school level from one year to the next is highly erratic and 32-80% random.
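To see why one-year changes are so erratic, here is a minimal simulation sketch in Python. Everything in it is a made-up assumption for illustration (the number of schools, the score scale, the noise level); it is not DOE data or the DOE formula. Each simulated school has a fixed true quality, so any measured "progress" is pure noise.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_progress(n_schools=1000, noise_sd=5.0, seed=0):
    """Hypothetical schools whose true quality never changes.

    Returns two lists of measured 'progress': year 1 to year 2,
    and year 2 to year 3. All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_quality = [rng.gauss(50, 10) for _ in range(n_schools)]
    # Observed score = unchanging true quality + independent yearly noise.
    years = [[q + rng.gauss(0, noise_sd) for q in true_quality]
             for _ in range(3)]
    progress_1 = [b - a for a, b in zip(years[0], years[1])]
    progress_2 = [b - a for a, b in zip(years[1], years[2])]
    return progress_1, progress_2

progress_1, progress_2 = simulate_progress()
r = pearson(progress_1, progress_2)
print(f"correlation between consecutive 'progress' measures: {r:.2f}")
```

In this toy setup the correlation comes out negative (about -0.5 in expectation), because the middle year's noise enters both differences with opposite signs. A school that appears to "improve" one year tends to appear to "decline" the next, even though nothing about the school has changed.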
In recognition of this fact, Jim Liebman, who developed the school grading system, originally told skeptical parents that the system would eventually incorporate three years of test scores, which would lessen the huge amount of volatility, a promise that DOE has failed to abide by.  For proof of this promise, see Beth Fertig’s book, Why Can’t U teach me 2 read?:
“…[Liebman] then proceeded to explain how the system would eventually include three years’ worth of data on every school, so the risk of big fluctuations from one year to the next wouldn’t be such a problem.” (p.121)
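The statistical logic behind that promise can be sketched in a few lines of Python. This is an illustration under made-up assumptions (a hypothetical noise level, and noise that is independent from year to year), not the DOE's method: averaging three noisy yearly scores shrinks the random component by roughly the square root of three.

```python
import random
import statistics

def yearly_noise_spread(n_schools=2000, noise_sd=5.0, seed=1):
    """Compare the spread of one year's noise with a three-year average.

    n_schools and noise_sd are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    single, averaged = [], []
    for _ in range(n_schools):
        # Three independent yearly measurement errors for one school.
        errors = [rng.gauss(0, noise_sd) for _ in range(3)]
        single.append(errors[0])
        averaged.append(sum(errors) / 3)
    return statistics.pstdev(single), statistics.pstdev(averaged)

sd_one_year, sd_three_year = yearly_noise_spread()
print(f"noise SD, one year:           {sd_one_year:.2f}")
print(f"noise SD, three-year average: {sd_three_year:.2f}")
```

Multi-year averaging reduces volatility but does not remove it, and it does nothing about errors that repeat every year, such as a flawed test or an unmeasured difference between schools.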
To make things worse, the state exams last year and the scoring guides were riddled with errors, as many parents, teachers and students noted. Finally, test scores are not a good way to assess school quality, for myriad reasons, even if the tests were perfect and the formula based on multiple years’ worth of data, as I pointed out in this NYT Room for Debate column.
LESSON: Anyone who believes in the accuracy of these school grades is sadly misinformed, and DOE’s attempts to dissuade parents from sending their kids to schools with low grades or to close schools based upon such an unreliable system are intellectually and morally bankrupt.
In another untenable move, the teacher “growth scores,” also based on one year’s change in test scores, but this time at the classroom level, have been released to principals outside NYC.  These growth scores will be incorporated in the new teacher evaluation system to be imposed statewide.  See this excellent column by Carol Burris about how the heedless use of growth scores is likely to damage the education of our kids.
Here are the detailed results of the statewide survey by principals, including their comments, showing that the vast majority believe that these scores are NOT an accurate reflection of the effectiveness of individual teachers.  More than 70% of principals also said they were “doubtful” or strongly opposed to any use of growth scores in teacher evaluations.  Aside from the annual volatility in these scores, there are many other problems with relying upon such flawed measures of teacher quality:
The growth scores, developed for the state by the consulting company AIR, attempted to adjust only for the following demographic factors: 

  • Economic disadvantage (ED), but without differentiating between free-lunch and reduced-price-lunch students, two groups with very different expected outcomes;
  • Students with disabilities (SWDs), but not types or severity of disability;
  • English language learners (ELLs).

And even though AIR did attempt to control for the above factors, they still admitted that teachers who work at schools with large numbers of students who were poor or had disabilities tended to have lower growth scores. 
AIR made NO attempt to control for classroom characteristics such as class size, or the racial/ethnic background of students, or any other variable that is known to affect achievement. This is different from the NYC value-added teacher data reports, released last year by DOE, widely derided as unfair and unreliable, which at least attempted to control for many of these factors.
Even so, Bruce Baker, professor at Rutgers, found that while the teacher data reports claimed to control for class size, teachers at schools with larger class sizes were significantly more likely to be found ineffective than those who taught at schools with small classes, and as “class size increases by one student, the likelihood that a teacher in that school gets back to back bad ratings goes up by nearly 8%.”  The situation with these growth scores is yet worse, with no attempt made to control for class size at all.  Is it fair for NYC teachers to be denied tenure or to risk losing their jobs because they are saddled with larger classes than teachers in the rest of the state?
AIR also found that teacher growth scores tended to be higher in “schools with students with higher mean levels of ability in both ELA and in Mathematics.” This shows that teachers who work in schools with large numbers of low-achieving students are more likely to be found ineffective. 
Apparently NYC principals have not yet been offered the opportunity to examine the growth scores of their teachers, unlike principals in the rest of the state. As far as I know, DOE has not explained why.  With far larger numbers of struggling students who are economically disadvantaged and crammed into larger classes, it is quite likely that a higher proportion of NYC teachers will be found ineffective than elsewhere in the state, and thus unfairly penalized for teaching disadvantaged students in worse conditions.
Here is the link where you can find the AIR technical manuals on growth scores.  


Anonymous said...

It is impossible for a school to get an A one year and then, with the same staff, children and curriculum, go to a C the next year. The problem is that the Bloomberg administration and its clueless followers use this incorrect information to rate schools for possible closure. Very sad!

Anonymous said...

That red-dotted graph has more value as a piece of artwork than informational data. Until we put the feathers in the right education caps, all we're doing is bending to the power whims of leaders who are in it for the wrong reasons. Teachers, parents and students...show the corporate misfits where the real power and value exists...in your efforts, intellect, will and determination to do well for the right reasons. Learning is a marathon and requires stamina and fitness. Sharpen your minds and follow your hearts into the success YOU want for yourself, not based on some silly data that gets us nowhere. So they know what you're doing, you will need to speak up. Politely say, "No thanks. I can do this my way."

Anonymous said...

While I don't agree with the formula determining the overall grade and progress, I think looking at the performance grade (yes, much maligned state tests, but at least it's a universal measuring stick) and at the learning environment survey has been very informative.