I've looked at the information SED (Commissioner Elia) set forth last week on EngageNY. It consists of a grade-by-grade release of 75% of the reading passages and multiple-choice items that appeared on the April ELA exams, by way of Pearson to Questar. All of the material and questions that required constructed responses have been provided as well. I did not look at the math test.
It is true that the amount of operational test material and the number of items disclosed are greater than what was given out in each of the prior three years of Pearson's core-aligned testing. And this is the earliest such a release has happened since 2012. [Note: When CTB/McGraw-Hill was the test publisher during the NCLB years, the complete test was accessible to the public on SED's web site within weeks of its administration, along with answer keys. Item analysis data followed shortly thereafter.]
Upon review of the just-released spring 2016 testing output, however, certain useful data have not been made available. SED has been moved to offer us a translucent view of the exams, but it still is not being entirely transparent.
In order to make the SED information more accessible to reviewers, I re-cast it in the attached Excel workbook. It shows the names of the reading passages that were released, the type of item involved (M-C or CR), the sequence number of each item and the word count of each passage.
In addition, there are four separate measures of readability (also referred to by Pearson/SED as the "complexity metrics"). Alongside these numerical indexes, there is a column called Qualitative Review, where the appropriateness of the material is judged. The outcome of SED's review process was that all 55 ELA reading passages were deemed to be appropriate. The keyed correct answer to each M-C item is presented at the end.
I have four misgivings about SED's presentation:
1- Twenty-five percent (25%) of the passages and multiple-choice items that counted on the tests remain unreleased. This amounts to six reading selections and a combined 40 items that appeared in Book 1 on the operational exams given on April 5th. Why should that be the case in light of the Commissioner's goal of providing more information? Non-disclosure raises questions about the quality of the material being withheld. If you remember the contract, the 2016 operational material came from Pearson's item bank. Since this is supposedly Pearson's last year and since NYS owns the material, why hold back?
As a consumer issue, how can SED justify that we, the taxpayers, purchased a product we cannot see? [And now that Pearson is on the way out, what about releasing the mounds of 2013-2015 test content that no one has been allowed to talk about? Or did Pearson's 5-year contract concede ownership of the still-hidden material to the vendor?]
2- Item statistics have not been made available. This kind of overarching data, based on how the test population performed on each item, is useful to researchers, analysts and anyone interested in seeing how the items functioned. The items are the bricks that go into constructing the exams and on whose strength and quality decisions about children, teachers and schools have come to depend.
Prior to Pearson, CTB/McGraw-Hill posted the item-level analytic data within months of test administration. This included item difficulties (p-values = the percentage of children choosing the correct answer) on multiple-choice items or the average score on constructed-response questions. CTB also showed how students responded to each distractor, i.e., the proportion of students choosing each of the wrong answers. Such data provide insights into possible weaknesses in the items (ambiguous choices, more than one best answer, distractors that are non-functional). And a correlation was provided (known as the item discrimination index) showing the relation between students' performance on an item and their performance on the entire test. The expectation is that students who do well on an item also do well on the test.
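To make these quantities concrete, here is a minimal sketch of classic item analysis in Python. It is not SED's or CTB's actual code, and the student responses and total scores below are invented for illustration; it simply computes a p-value, the distractor proportions and a point-biserial discrimination index of the kind described above.

# Minimal sketch of classic item analysis for one multiple-choice item.
# All response data below are hypothetical, for illustration only.
from statistics import mean, pstdev

responses = ["B", "A", "B", "C", "B", "D", "B", "B", "A", "B"]  # hypothetical answers
total_scores = [28, 14, 31, 18, 25, 12, 30, 27, 16, 29]         # hypothetical whole-test scores
key = "B"
n = len(responses)

# p-value: proportion of students choosing the keyed answer.
p_value = sum(r == key for r in responses) / n

# Distractor analysis: proportion choosing each wrong answer.
distractors = {opt: sum(r == opt for r in responses) / n for opt in "ACD"}

# Item discrimination (point-biserial): correlation between the 0/1 item
# score and the total test score. Students who answer the item correctly
# should tend to score higher on the test as a whole.
item_scores = [1 if r == key else 0 for r in responses]
mx, my = mean(item_scores), mean(total_scores)
sx, sy = pstdev(item_scores), pstdev(total_scores)
r_pb = sum((x - mx) * (y - my) for x, y in zip(item_scores, total_scores)) / (n * sx * sy)

print(f"p-value: {p_value:.2f}")
print("distractor proportions:", distractors)
print(f"discrimination (point-biserial): {r_pb:.2f}")

With data like these in hand, a low p-value, a heavily chosen distractor or a weak discrimination index would flag an item for review.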
This full set of statistics—referred to as classic item analysis data—ceased being presented after Pearson won the testing contract. Since then, only some statistics have been provided—and more than a year after the operational tests have been given—ensuring that the exams could not undergo scrutiny until after Pearson's next round of testing had taken place.
In the absence of empirical data, a vacuum that SED created, the department has been able to blunt criticisms of the exams: at first dismissing them as anecdotal, and then, when the complaints became widespread, providing partial data (p-values and discrimination indexes) well past the time it had complete information readily on hand. SED never made that information available to those who might otherwise have had facts with which to challenge the exams. [Aside: I remember when a member of Governor Cuomo's Task Force attempted to stifle Lisa Rudley's critique that the Common Core Standards and core-aligned exams were being advanced without sound research data to prove their efficacy. He pointedly asked where her evidence was to support complaints about flaws in the exams. Given SED's reluctance to dispense information, this was a preposterous question.]
The demand for complete, timely data is not academic or trivial. Let's look at just-released Item #37 from the Grade 6 ELA passage Weed Wars. You may recall that Leonie Haimson came upon and brought to light the fact that one passage contained the confusing concept of "impossibly improbable". Here now, from SED's rush to divulge 2016 information, is the statement in question and the multiple-choice options 11-year-olds had to choose from in answering.
Once in a while, changes to a weed's DNA would allow that weed to survive the glyphosate. The chances of changes like this were very, very small. But when farmers used glyphosate year after year on millions of hectares of crops, "what seems almost impossibly improbable becomes more probable," Duke says.
37. What is the meaning of the phrase "impossibly improbable" as it is used in lines 21 through 23?
A usually certain
B highly unlikely
C extremely slow
D rarely noteworthy
This item won't reach the game-changing level of ridicule that Leonie's exposure of the Pineapple and the Hare did in 2012. But it underscores the value of having statistics available to evaluate items. What percentage of kids chose the correct answer (B)? How did the distractors work? That is, what proportion of children chose each of the wrong answers? Having analytic data would enable us to see how this dubious item played out.
According to the Learning Standards, #37 is coded RI.6.4, which means it was classified as a Reading for Information item to see whether sixth graders can "determine the meaning of words and phrases as they are used in a text, including figurative, connotative and technical meanings."
My guess is that most kids got the item right—making this an "easy" item—because the distractors seem implausible. I would mark it as a poor item for two reasons: it can be answered correctly without reading the passage from which it is drawn, and the distractors likely didn't carry much weight. Of course, my hunch may be off. Perhaps many children chose A in response to this convoluted question. We shouldn't be left to speculate, however, in the absence of data that SED has in its possession. Note: SED has the statistics sought virtually as soon as the tests are scored, or else it couldn't issue the instructional reports it just distributed, as referenced in Elia's June 2016 letter to colleagues.
3- SED took away information it provided from 2013-2015, when it released questions with annotations. The information was posted on EngageNY in August of those years. In this year's zeal to reveal more items sooner, SED has not presented the statewide p-values for the items as it had over the last three years. Significantly, the annotations, which SED described as teaching tools, are also gone. So, thus far this year SED has offered more items but has not included a rationale "to demonstrate why any of the released questions measures the intended standards; why the correct answer is correct; and why each wrong answer is plausible but incorrect." It is helpful to gain SED's perspective about the material and its defense of the correct answer choice. Unfortunately, these explanations have not come out. I think we should campaign to have SED's annotations for the 2016 material released immediately.
4- SED failed to follow its own decision-making rules regarding which reading passages were appropriate to include on the operational exams. Four ways to estimate the readability of potential passages were used in constructing the ELA tests: the Lexile Framework, Flesch-Kincaid, the Degrees of Reading Power and the Reading Maturity Metric (a Pearson measure). Each involves a scale that can be applied to reading material and sets forth a range that is appropriate for each grade. For example, the Lexile Framework indicates that reading material ranging from 740L to 1010L is appropriate for 4th and 5th graders. Ergo, material outside that band may not be right for children in these grades.
SED and Pearson applied three of the four methods to each selection and said in releasing this year's material that "to make the final determination as to whether a text is at grade-level and thus appropriate to be included on a grade 3-8 assessment, all prospective passages undergo quantitative text complexity analysis using three text complexity measures…. Only passages that are determined appropriate by at least two of three quantitative measures of complexity and are determined appropriate by the qualitative measure of complexity are deemed appropriate for use on the exam."
In reviewing SED's latest data on the 2016 exams, I counted 11 of the 55 operational passages as failing to meet the criterion that they had to be found appropriate by at least two of the methods. I don't know how SED will resolve this contradiction.
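For clarity, here is a minimal sketch in Python of the two-of-three screening rule SED describes in the passage quoted above. The grade bands (other than the 740L-1010L Lexile band cited earlier) and the sample passage scores are invented placeholders, not SED's actual cut points.

# Sketch of SED's stated screening rule: a passage is deemed appropriate
# only if at least two of three quantitative complexity measures place it
# in the grade band AND the qualitative review approves it.
# Bands and scores are illustrative, except the grades 4-5 Lexile band.

BANDS = {
    "lexile":           (740, 1010),  # grades 4-5 band cited in the text
    "flesch_kincaid":   (4.0, 5.9),   # placeholder band
    "reading_maturity": (5.0, 7.0),   # placeholder band
}

def in_band(measure, score):
    lo, hi = BANDS[measure]
    return lo <= score <= hi

def passage_appropriate(scores, qualitative_ok):
    """scores: dict mapping measure name -> the passage's score."""
    hits = sum(in_band(m, s) for m, s in scores.items())
    return hits >= 2 and qualitative_ok

# Hypothetical passage: in band on Lexile only, approved qualitatively.
passage = {"lexile": 980, "flesch_kincaid": 7.1, "reading_maturity": 7.8}
print(passage_appropriate(passage, qualitative_ok=True))  # False: 1 of 3

A passage like the hypothetical one above, failing two of the three quantitative measures, should have been screened out under SED's own rule, which is exactly the contradiction the 11 passages present.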
~~~~~~~~~~~~~~~~~~
Two final observations: The material just released by SED makes no mention of the Common Core Learning Standards, unlike the Released Questions with Annotations of 2013-2015. Instead, SED has kept the boilerplate found in those releases and reverts to the New York State P-12 Learning Standards as the framework it follows. Nor could I find any reference to "college and career readiness." I guess they have been discarded due to the botched implementation of the Common Core.
Finally, I think we should press SED for the missing information outlined above and keep demonstrating how the department and commissioner continue to pose as being responsive while taking business-as-usual actions. Once we let up, they will fall back to disdaining that messy part of democracy—the will of the people.
- Fred Smith
I'd like to ask Fred Smith to evaluate and comment on this question. It's only one item, but every year that we have had access to the tests there have been items with overly plausible distractors or question stems that ask for one thing and reward another. In this case the p-values may work out, but nonetheless, every child who gets this question wrong for the right reason has been unfairly assessed as weaker by one question. I would love to get an expert opinion on this question.
https://docs.google.com/presentation/d/1Pqnc3rSoaBTgW9fxkfoWfr9IuHymOfTLHrUv228726g/present?slide=id.gae5b27721_0_9