Only a few problems: the state tests this year were shorter, they were given untimed, and the translation from raw scores to proficiency levels was radically eased. We have apparently entered a new era of test score inflation.
See what happened in NYC schools between 2002 and 2009, with especially sharp gains in math scores:
Predictably, Bloomberg crowed and rode the wave to re-election and the renewal of mayoral control:
At a news conference at a school in the Bronx, Mr. Bloomberg trumpeted the results as evidence that mayoral control had produced revolutionary improvements and brought city students within spitting distance of state averages after years of mediocrity. “Our reforms are working,” Mr. Bloomberg said. “Our schools are heading in the right direction.” Even Randi Weingarten, the president of the teachers’ union, lavished praise on the mayor and his chancellor. “What we’ve seen in the last seven years is a cohesion and a stability and resources that we did not have beforehand,” she said.
Only, as many of us knew then and as was eventually confirmed, the gains weren't real. The improvements were not matched by results on the NAEP, the more reliable national exams, and both the tests and the scoring were shown to have become easier each year.
Erin Einhorn, then a reporter at the Daily News, did an eye-opening experiment in 2007. She gave the 2002 and 2005 4th grade math tests to a group of 4th and 5th graders attending summer school at Brooklyn College. The kids did far better on the 2005 exams, with 24 out of 34 getting higher scores and only eight getting lower ones. The p-values, that is, the percentage of students answering each question correctly on field tests (and thus a measure of how easy the questions are), also rose rapidly between 2002 and 2005.
The test score inflation continued through 2009, with sharp increases in proficiency similar to this year's, due to more predictable and easier tests. The scoring also got easier; Fred Smith, a testing expert, and others discovered that random guessing would yield a Level 2 on the reading exam.
When the bubble finally burst in 2010, and the scores were re-calibrated, this is what occurred:
So how do we know we have entered another era of test score inflation?
When one compares NYC and NY State scale scores on the more reliable NAEP exams between 2013 and 2015, the trend lines don't match up. On the NAEPs, NY State and NYC 4th grade ELA average scores declined slightly, while on the state tests they increased. Because the NAEPs are given only every two years, we have no results for 2016. (The state scores are in dark green and dark orange; the city's are in lighter colors.)
But what about the claim made by DOE and the media that for the first time, NYC scores matched the state's? See for example, this headline from Chalkbeat: NYC reading scores leap, matching state average for first time. This may have been true in proficiency rates, but certainly not in scores, which are considered a more reliable way to track achievement.
If you look at the charts above, you can easily see that NYC's average scores matched the state's last year in 4th grade reading and surpassed the state in 8th grade ELA. In addition, NYC matched the state's average scores in 4th grade math in 2013, and in 8th grade math in 2015.
Yet as we see, NYC did not match the State's scores on the NAEPs in 2013 or 2015 in any subject. This casts even more doubt on the reliability of the state metrics and provides evidence that we have entered a new era of test score inflation.
Yet another reason to doubt ANY comparisons between city and state scores or proficiency rates is that 95% of the state's districts had opt-out rates above 5% -- and a 95% participation rate is supposedly required for accurate conclusions. Meanwhile, NYC's opt-out rate remained relatively low at 2.4%, which makes any comparison between NYC and the rest of the state even more questionable.
But perhaps the smoking gun is this: the state set the cut scores much lower this year, with far lower raw scores translated into higher scale scores, which were then equated with higher proficiency levels, meaning Level 3 or above. [Note: these may not technically be called cut scores, which according to some testing experts refers to the proficiency levels set on the scale scores rather than on the raw scores. In any case, the effect is the same.]
See the analysis done by Michael O'Donnell of the New Paltz Board of Education and checked by NYSAPE members. See also the chart below, showing the systematic lowering of the number and percentage of raw-score points equated to proficiency, out of the total possible, in 11 of the 12 exams. [Clarification: the numbers in the boxes below are the percentage of points, out of the total number possible, needed to get a Level 3. For example, in 3rd grade ELA, a student had to get 34 points out of 55 (62%) to reach Level 3 in 2015, compared to 28 out of 47 (60%) in 2016. And so on. The state data showing the conversion of raw scores to scale scores and of scale scores to proficiency levels is here; feel free to do your own analysis!]
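The cut-score arithmetic above is easy to verify yourself. Here is a minimal sketch using only the 3rd grade ELA figures cited in the clarification; the other eleven exams can be checked the same way against the state's raw-score conversion tables:

```python
def pct_needed(points_needed, total_points):
    """Raw points required for Level 3, as a percent of the total possible."""
    return round(100 * points_needed / total_points)

# 3rd grade ELA, from the chart above:
ela3_2015 = pct_needed(34, 55)  # 2015: 34 of 55 points required
ela3_2016 = pct_needed(28, 47)  # 2016: 28 of 47 points required

print(ela3_2015)  # 62
print(ela3_2016)  # 60
```

A two-point drop may look small, but applied to every question on the exam it moves many students who would have scored Level 2 up into Level 3.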
The fact that the number of points needed for a student to be considered proficient was so much lower this year is very fishy, and justifiable only if the questions were much harder. Yet few if any teachers reported that the exams were more difficult this year, and the Commissioner herself insisted the exams were "comparably rigorous" to last year's. Indeed, she had tried to mollify parents who complained the exams were too long by shortening them and administering them untimed -- both changes that would be apt to boost results, all other things being equal.
What is somewhat different from the last time we experienced test score inflation is that the NYSED presentation to the public included a clear disclaimer that trends in test scores over time could not be ascertained. See this, from slide 5 of the NYSED PowerPoint:
But then the Commissioner proceeded to ignore this disclaimer and presented charts showing gains each year from 2013 to 2016, albeit with a tiny asterisk at the bottom that said:
What to make of these logical fallacies? What to make of the fact that not only the Commissioner but the NYC press corps almost uniformly glossed over the contradictions in the NYSED presentation, by omitting any mention of the state's history of test score inflation and confining any reservations to the 7th or 8th paragraph of their stories? Indeed, the only reporters to confront the glaring unreliability of the data head-on worked for the Buffalo and Rochester papers.
There are four obvious ways to inflate test scores:

1. Shorten the exams

2. Allow more time to take them
3. Make the questions easier
4. Change the cut scores and/or translation from raw scores to performance levels.
So yet again, we have a Mayor using these unreliable and possibly invalid test results to "prove" that mayoral control works, and exploiting them for political advantage. Again, we have a State Commissioner who says the results show that the state's educational "reforms" are leading to more learning. Again, the NYC Chancellor and the UFT president are drinking the Kool-Aid to justify their preferred policies. This time, in addition, the charter schools are touting the results to "prove" their superiority. Will we have to wait years, until a new Commissioner is appointed, for the state to admit the truth, as we did last time when Richard Mills was replaced by David Steiner?
Following Chancellor Fariña's press conference, a journalist wrote me, "Twilight zone. ...Collective self-delusion with the kids used as political chess pieces." It appears we are living through Groundhog Day over and over again. As the well-known saying goes, “Those who cannot remember the past are condemned to repeat it."