NYC Public School Parents: How the NYS Exams are Turning our Children into Zeroes

The following is from Fred Smith, a testing expert who in earlier years worked for the NYC Board of Education in its Assessment office. You can also check out news articles about the findings of this report here and here.

The latest report issued by SUNY, New Paltz’s Benjamin Center for Public Policy Initiatives, Tests are Turning Our Children into Zeroes: A Focus on Failing, confirms the misgivings of parents about problems with the state exams in 2012 and 2013, as well as the Common Core standards with which they were aligned. Robin Jacobowitz, the Center’s director of education projects, worked closely with me in preparing the report, but I take full responsibility for its contents.

Data for the study came from two sources, the New York State Education Department (SED) and the New York City Department of Education (DOE). Obtaining information from the state required several Freedom of Information Law (FOIL) requests and was difficult, time-consuming and only partially successful. The DOE provided our data promptly without a FOIL request.

Background and Rationale. We studied the results of state exams for grades 3-8 from 2012 through 2016, starting when SED transitioned to the Common Core Learning Standards.

In 2011, NCS Pearson, Inc. was awarded a $32 million contract to develop Core-aligned measures for students in grades 3 to 8. Pearson’s exams were billed as more “rigorous,” in keeping with the tougher learning standards, the heart of the so-called “education reform agenda” of Commissioner John B. King and Regents Chancellor Merryl Tisch.

Test results were to be used to see where students stood in English and math, and to follow their progress in meeting the standards. In addition, overreaching efforts were made to incorporate the scores into formulas for rating teacher effectiveness, judging principal performance and justifying actions to reorganize or close schools.

The very first day of testing in 2012 saw 192,000 students take Pearson’s English Language Arts (ELA) tests in Grade 8. An eye-opener came just two days later when Leonie Haimson reported on this blog about “The Hare and the Pineapple,” a preposterous reading passage, a story which quickly went viral. The next day, Education Commissioner King, a champion of the more stringent testing, threw out the rotten pineapple along with its six confusing multiple-choice (MC) items. This led to colorful parent-children protests outside Pearson’s New York City offices and gave impetus to the growth of the opt out movement.

The next year, 2013, was SED/Pearson’s foundational year, to establish a Common Core baseline against which to measure growth. Parents in growing numbers throughout the state began to realize that testing was becoming the expanding center of the school universe—too much classroom time spent in preparing students for the exams and conducting them; too much import given to the results of a single test on which to base high-stakes decisions; consequent concern over the pressure children felt during the lengthy testing period. A watchword was born: “My child is more than a test score!”

After the April 2013 ELA was given, further criticisms arose from a larger number of parents, teachers and principals, including that the tests were too long; many items didn’t have clear answers; reading passages were developmentally inappropriate for the students, especially the youngest ones; English Language Learners and students with disabilities were left in a daze; and too many children were experiencing stress before, during and after the exams. The scaling of these exams was also controversial and linked to excessively high SAT scores.

SED dismissed the negative reactions as being anecdotal, coming from the usual naysayers and not backed up by any evidence. Yet at the same time, SED had stopped revealing the kind of data needed to evaluate the quality of the exams and to determine whether the parents’ charges were well-founded. This Catch-22 fueled my indignation.

Up until 2011, SED had posted complete copies of the exams on its website shortly after they had been given. [See this page for 2006-2008 ELA exams, for example, when CTB/McGraw-Hill was the test publisher.] In addition, statistics for all questions were made available in annual Technical Reports a few months after test administration. Anyone with a certain level of expertise could see how many students chose the “right answer” on multiple choice questions, how many chose each distractor (wrong answer), as well as how many failed to answer at all.

The other part of the ELA consisted of open-ended questions, referred to as constructed response questions (CRQs). These require students to read texts and write answers, which are scored by teachers trained to rate them. In the years prior to the new standards, SED also released these questions and presented a breakdown of the scores for all CRQs, making them openly reviewable in a manner parallel to the multiple-choice items.

SED gave the score distributions for each CRQ, which was worth either two or four points, depending on the question. In all cases, children’s answers could be scored zero, if the writing was judged to be incoherent, irrelevant or unintelligible.

With the advent of Pearson and the emergence of Common Core-based testing, the disclosure of material and data contained in prior Technical Reports was abruptly curtailed, starting in 2011. After 2015, SED stopped posting the annual Technical Reports.

The Common Core-based exams attached increased importance to the CRQs. They represented the highest reflection of the learning standards intended to assess students’ ability to think critically, analyze reading matter, use evidence to support their answers, and respond in an organized and logical way. And, from 2012 through 2016, the CRQ scores assumed greater weight in determining overall performance on the ELA. They accounted for 30% of the points students could earn in 2012, rising to 41% in 2016. Their weight was heaviest (47%) in Grades 3 and 4 in 2016.

Because the ELA exams bore the brunt of parental complaints, coupled with my curiosity, CRQs became my area of interest. When we finally received data from the state and the city, we looked at the unanswered CRQs, in addition to the percentage of students who gave answers to questions but received scores of zero.

Findings.

Here are seven takeaway points with regard to student performance on the ELA exams, more specifically how many students received zeroes on the CRQs:

There was a steep and immediate increase in the percentage of zeroes that New York state students received on the CRQs in 2013, when the Common Core-aligned tests debuted.
Particularly sharp increases were evidenced over time in the percentage of zeroes for students in Grades 3 and 4 statewide; this was also true for English Language Learners and students with disabilities in New York City. (The data DOE provided allowed us to analyze the ELA’s impact on subgroups of students.)
For 3rd graders, the percentage of zero scores rose from 11% in 2012 to a plateau of 21% in the state and city from 2013 to 2016.
In addition to zero scores, there were increasing percentages of unanswered questions statewide (a different category than zeroes) in 2013 over 2012, particularly for Grade 3.
SED removed time limits from the state exams in 2016. This reduced the number of unanswered CRQs, but the number of zeroes stayed the same.
There were high percentages of students in Grade 3 citywide who got zeroes on half or more of the CRQs, ranging from 5% in 2012 to 13% in 2016. The percentages who got five or more zeroes were much higher for ELLs (33%) and students with disabilities (35%).
DOE data revealed that higher percentages of Black and Hispanic students wrote answers deemed incoherent, irrelevant or incomprehensible than did their Asian and Caucasian counterparts. The gap in zero scores between minority group and white students widened between 2012 and 2015, especially in Grades 3 and 4.

Though our findings for subgroups are based on New York City data, a reasonable assumption can be made that they hold for groups statewide since the city forms a major share (37%) of the state’s test population and its constituent groups. Parents throughout the state must let education officials in Albany, their legislators and the Governor know they are dissatisfied with an unaccountable, damaging testing system that has devolved over time. We cannot let another year go by without significant changes in the testing program.

Thoughts, Warnings and Suggested Follow-Up.

Lack of Transparency - Clearly, SED’s suppression of data and lack of transparency is a policy that must be reversed. Lack of timely information allows poor practices to continue unchallenged by objective data. While SED may be the custodian of the information, it is owned and paid for by New York taxpayers. We must demand that NYSED post complete Technical Reports, along with full information about item responses and scoring distributions, promptly after tests are given, as occurred in the past. Questar, the current test vendor, replaced Pearson in 2017, and must provide the kind of item-level data that were previously made public and eventually obtained concerning Pearson’s exams during its five-year run.

Many complaints about Questar’s current exams are similar to those concerning Pearson’s tests. See the observations of teachers who participated in NYSUT’s “Share Your Story” campaign, as compiled in their report, The Tyranny of Testing.

Yet new problems have emerged with the untimed exams. Some children took six hours or more to complete the exams, missing lunch in the process. In many of the 263 New York schools conducting Questar’s computer-based exams, there were glitches and disruptions. It was later revealed that there were Questar breaches of student personal data in New York and Mississippi. In other words, children were subjected to beta testing for SED and its vendor.

Tennessee has announced it will seek a new test vendor because of all the problems with the administration of the Questar computerized exams. What mechanism exists in New York State to recover money from Questar for poor performance or to terminate the contract? Is SED’s working relationship with test publishers so close that they are, in effect, partners and SED cannot reject Questar without implicating itself in any misdeeds? Can the offices of the State Comptroller and AG intervene?

Warning #1 – Stand-Alone Field Tests. SED continues to target a large number of schools to try out test material for future operational tests. Separate, no-count field tests are administered at the end of the year on school time. Data gained in this process comes from unmotivated students and yields items that don’t predict how students will respond when publishers select items to go on the “real” exams. This has led to poor operational tests being built by the likes of Pearson and Questar.

Warning #2 – Don’t Attach New Purposes To Bad Tests. Amidst the controversy over the Specialized High School Admissions Test (SHSAT), the Mayor has proposed using students’ state test scores as part of the selection process. This would put higher stakes on these unreliable exams and add an excuse to keep the state testing program alive in its present form.

In addition, the 4th grade ELA is used to screen children for middle schools. Our report finds this test has posed sharp problems for many fourth graders. In effect, the 4th grade test has become a gatekeeper for the “better” New York City middle schools—which are known for getting their kids into the selective high schools. Would any of this solve the underlying problem of quality education for all children?

As long as an ill-conceived testing program exists, reasons will be found to justify its continuance. Though federal law requires standardized tests for grades 3-8 and once in high school, we should ask ourselves what constitutes a legitimate assessment process. Eliminating standardized tests would save vast amounts of money, lower the fever caused by tests, focus on the whole child, and value teachers, other subject matter and different forms of expression.

Having spent 50 years in the testing trenches, and as I look at two dust-covered books, Jonathan Kozol’s Savage Inequalities (1991) and Andrew Hacker’s Two Nations (1992), I would hate to see us wake up in three years, up to our eyeballs in computerized testing, with an ever-increasing achievement gap, still enslaved by the simple-minded dogma that we must have quantitative data, however unreliable and invalid, to make decisions. We have to be smart enough to change an injurious mindset—one that has struck me over time as more intentional than misguided.

~Fred Smith 7-15-18