Wednesday, September 5, 2007

“Here are the numbers. Everything is improving. Trust us.”

“Tell me the result you want, and I’ll find a way to make the numbers show it.”

Anyone with a good feel for numbers and a background in math or statistics will recognize the ironic truth underlying this statement about mathematical deception. Nowhere in recent years has this approach seemed more evident than in the NYC Department of Education, most recently in the NY Daily News story about the decreasing level of difficulty in the NY State 4th Grade math exams.

As Erin Einhorn reported in yesterday's Daily News, NYC pass rates on the 4th Grade math exams since 2002 have tracked very closely with the difficulty of the exams themselves. As the tests have gotten easier, the NYC (and NYS) pass rates have risen almost in perfect lockstep. This determination was based on information obtained from the State Education Department, but the Daily News supplemented it with its own small-scale experiment: 34 youngsters attending a Brooklyn College summer program took both the 2002 and 2005 NYS math exams and, for the most part, found the 2005 test easier. In fact, 24 of the 34 students did better on the 2005 exam.

Even for those well-versed in mathematical statistics, psychometrics (the science of testing and intelligence measurement) is an arcane subject. Suffice it to say, all standardized tests are created from large banks of potential questions, each of which is field tested to determine its degree of difficulty, called a p-value. A question with a p-value of 0.65 means that, based on the field-test results, roughly 65% of students will likely get that question right. A test as a whole can therefore be “rated” with a p-value according to the values of the questions chosen to make it up.

Since 2002, the p-value of the state 4th Grade math exam has risen from 0.61 to 0.74 in 2007. Remarkably, the number of questions needed to earn a passing score has actually decreased over the same period, from 40 in 2002 to 39 in 2007. Not surprisingly, the NYC pass rate has increased from 52% to 74.1%, with the largest year-to-year jump occurring in 2003, when the p-value had its biggest increase (from 0.61 to 0.68). Defenders of the tests argue that the pass rate on an easier test can be adjusted by requiring more correct answers to pass, yet the State did the opposite. And even so, how many of us (adult or child) wouldn't prefer an easy test requiring 40 correct answers to a genuinely harder one requiring only 39 or 38? I certainly would.
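A back-of-the-envelope calculation shows why a one-question change in the cutoff cannot offset a jump in p-value. The 70-question exam length here is a hypothetical assumption for illustration; the p-values and cutoffs are the ones reported above.

```python
def expected_correct(num_questions, p_value):
    # A typical student's expected raw score on a test of this overall difficulty.
    # NOTE: 70 questions is an assumed, illustrative exam length.
    return num_questions * p_value

print(round(expected_correct(70, 0.61), 1))  # 42.7, against a cutoff of 40 (2002)
print(round(expected_correct(70, 0.74), 1))  # 51.8, against a cutoff of 39 (2007)
```

Under these assumptions, the typical student's expected score clears the cutoff by fewer than 3 questions in 2002 but by almost 13 questions in 2007, so a rising pass rate is built into the test itself.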

In The Truth about Testing, author W. James Popham points out that questions with p-values between 0.4 and 0.6 are best suited to creating the score spread a standardized test needs for its questions to reliably differentiate student performance (greater score spread is interpreted as an indicator of higher test reliability in measuring whatever the test is supposed to measure). It seems odd, then, that the State is creating tests with a p-value of 0.74, meaning the typical question should be answered correctly by 74% of students (and, remarkably, exactly 74.1% of NYC students passed last year). Yet the Mayor and Chancellor have been solely crediting their own new policies for increases that would be expected simply from the increasingly easy nature of the exam itself.

Mayor Bloomberg and Chancellor Klein have aggressively applied to the world of public school education the business-model mantra that whatever isn't measured can't be managed (or incentivized). This policy has resulted in more and more standardized testing of students and increased collection and manipulation of school-level data on every aspect of school operations. Success or failure of Mayoral control, we are led to believe by the Chancellor, is entirely contingent on these measures: test scores, attendance levels, graduation rates, school report cards, and the like.

Bad enough that this is the factory model par excellence applied to education – you can’t measure a student’s found passion for theater or chorus, a discovered pleasure in reading, the classroom excitement over a new insight into the process of history, the joy of seeing one’s poem or short story in print, or the deep self-satisfaction that comes from further, teacher-led independent exploration of a science topic. Worse, it is solely the DOE that collects, cleans, massages, selects, and controls this information behind closed computer room and office doors. Since no external oversight, audit agency, independent review or verification exists, the message is clear: “Here are the numbers. Everything is improving. Trust us.”

Disturbingly, it’s the fox who is not only running the hen house but also reporting the excellence of the results to the farmer.

-- Steve Koss, PTA President of Manhattan Center for Science and Math

Update: see today's Daily News, where leading experts call for an independent audit agency, as exists in other states, to verify the validity of these exams.

1 comment:

Anonymous said...

Thank you for alerting us. The amount of anecdotal evidence that something is wrong is great, as are the reasoned analyses. But having data that says that the tests are inconsistent is valuable.