I've 
  looked at the information SED (Commissioner Elia) set forth last week on 
  EngageNY.  It consists of a grade-by-grade release of 75% of the reading
  passages and multiple-choice items that appeared on the April ELA exams, by
  way of Pearson to Questar.  All of the material and questions that 
  required constructed responses have been provided, as well. I did not 
  look at the math test.
  
It is 
  true that the amount of operational test material and the number of items 
  disclosed is more than was given out in each of the prior three years of 
  Pearson's core-aligned testing.  And since 2012, this is the earliest 
  this has happened.  [Note: When CTB/McGraw-Hill was the test publisher 
  during the NCLB years, the complete test was accessible to the public on SED's 
  web site within weeks of its administration, along with answer keys. Item 
  analysis data followed shortly thereafter.] 
On 
  reviewing the just-released spring 2016 testing output, however, I found that certain 
  useful data have not been made available. SED has been moved 
  to offer us a translucent view of the exams, but it still is not 
  being entirely transparent.
In 
  order to make the SED information more accessible to reviewers, I re-cast 
  it in the attached Excel workbook. It shows the name of the reading 
  passages that were released, the type of item involved (M-C or CR), the 
  sequence number of each item and the word count of each 
  passage.
In 
  addition, there are four separate measures of readability (also 
  referred to by Pearson/SED as the "complexity metrics"). And beside 
  these numerical indexes, there is a column called Qualitative Review, 
  where the appropriateness of the material is judged.  The outcome of 
  SED's review process was that all 55 ELA reading passages were deemed to 
  be appropriate. The keyed correct answer to each M-C item is presented at the 
  end.
 I 
  have four misgivings about SED's presentation:
1- 
  Twenty-five percent (25%) of the passages and multiple-choice items that 
  counted on the tests remain unreleased.  This amounts to six reading 
  selections and a combined 40 items that appeared in Book 1 on the operational 
  exams given on April 5th.  Why should that be the case in 
  light of the Commissioner's goal of providing more information?  
  Non-disclosure raises questions about the quality of the material being 
  withheld. If you remember the contract, the 2016 operational material 
  came from Pearson's item bank.  Since this is supposedly Pearson's 
  last year and since NYS owns the material, why hold 
back?
As a 
  consumer issue, how can SED justify that we, taxpayers, purchased a product we 
  cannot see?  [And now that Pearson is on the way out, what about 
  releasing the mounds of 2013-2015 test content that no one has been allowed to 
  talk about?  Or did Pearson’s 5-year contract concede ownership of the 
  still hidden material to the vendor?]
2- 
  Item statistics have not been made available.  This kind of 
  overarching data based on how the test population performed on each item is 
  useful to researchers, analysts and anyone interested in seeing how the items 
  functioned.  The items are the bricks that go into constructing the exams 
  and on whose strength and quality decisions about children, teachers and 
  schools have come to depend.  
Prior 
  to Pearson, CTB/McGraw-Hill posted the item-level analytic data within months 
  of test administration.  This included item difficulties (p-values = 
  percentage of children choosing the correct answer) on multiple-choice items 
  or the average score on constructed response questions.  CTB also showed 
  how students responded to each distractor, i.e., the proportion of students 
  choosing each of the wrong answers. Such data provide insights into possible 
  weaknesses in the items (ambiguous choices, more than one best answer, 
  distractors that are non-functional).  And a correlation was provided 
  (known as the item discrimination index) showing the relation between 
  performance by students on an item and their performance on the entire test. 
  The expectation is that students who do well on an item also do well on the 
  test.
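To make these statistics concrete, here is a minimal sketch of a classic item analysis for one multiple-choice item. The response data are invented for illustration, the function name `item_analysis` is hypothetical, and the point-biserial correlation is used as one common form of the discrimination index; nothing here is SED's or any vendor's actual procedure.

```python
from collections import Counter
from statistics import mean, pstdev


def item_analysis(responses, key, total_scores):
    """Return (p_value, distractor_proportions, discrimination) for one item.

    responses    -- option chosen by each student, e.g. "A".."D"
    key          -- the keyed correct option
    total_scores -- each student's total test score, in the same order
    """
    n = len(responses)
    counts = Counter(responses)

    # p-value: proportion of students choosing the keyed answer.
    p_value = counts[key] / n

    # Proportion of students choosing each wrong answer.
    distractors = {opt: c / n for opt, c in sorted(counts.items()) if opt != key}

    # Discrimination: correlation between item score (0/1) and total score.
    item_scores = [1 if r == key else 0 for r in responses]
    sd_item, sd_total = pstdev(item_scores), pstdev(total_scores)
    if sd_item == 0 or sd_total == 0:
        discrimination = None  # undefined when everyone scores the same
    else:
        m_item, m_total = mean(item_scores), mean(total_scores)
        cov = sum((i - m_item) * (t - m_total)
                  for i, t in zip(item_scores, total_scores)) / n
        discrimination = cov / (sd_item * sd_total)

    return p_value, distractors, discrimination


# Invented data: ten students, keyed answer "B".
responses = ["B", "B", "A", "B", "C", "B", "D", "B", "A", "B"]
total_scores = [40, 38, 20, 35, 18, 42, 22, 37, 25, 39]

p_value, distractors, discrimination = item_analysis(responses, "B", total_scores)
print(p_value, distractors, discrimination)
```

In this made-up sample, 60% of students chose the keyed answer, and the students who answered correctly also have the higher total scores, so the discrimination value is positive. A distractor that almost no one chooses, or a discrimination value near zero or negative, would flag the item for review.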
This 
  full set of statistics—referred to as classic item analysis data—ceased being 
  presented after Pearson won the testing contract.  Since then, only some 
  statistics have been provided—and more than a year after the operational tests 
  have been given—ensuring that the exams could not undergo scrutiny until after 
  Pearson’s next round of testing had taken place.
In 
  the absence of empirical data, a vacuum that SED created, the department has 
  been able to blunt criticisms of the exams: at first by dismissing them as 
  anecdotal, and then, when the complaints became widespread, by providing partial 
  data (p-values and discrimination indexes) well after it had 
  complete information on hand. That delay kept facts from those 
  who might otherwise have used them to challenge the exams. [Aside: 
  I remember when a member of Governor Cuomo’s Task Force attempted to stifle 
  Lisa Rudley’s critique that the Common Core Standards and core-aligned exams 
  were being advanced without sound research data to prove their efficacy.  
  He pointedly asked where her evidence was to support complaints about flaws in 
  the exams.  Given SED’s reluctance to dispense information, this was a 
  preposterous question.] 
The 
  demand for complete timely data is not academic or trivial.  Let’s look 
  at just-released Item# 37 from the Grade 6 ELA passage Weed Wars.  
  You may recall that Leonie Haimson came upon and brought to light information 
  that one passage contained the confusing concept of “impossibly improbable”.  
Here now, from SED’s rush to divulge 2016 information, are the 
  statement in question and the multiple-choice options 11-year-olds had to 
  choose from in answering.
Once 
  in a while, changes to a weed’s DNA would allow that weed to survive the 
  glyphosate.  The chances of changes like this were very, very 
  small.  But when farmers used glyphosate year after year on millions of 
  hectares of crops, “what seems almost impossibly improbable 
  becomes more probable,” Duke says.
  
37. What is the meaning of the phrase “impossibly improbable” as it is used in 
  lines 21 through 23? 

  A   usually certain
  B   highly unlikely
  C   extremely slow
  D   rarely noteworthy
This 
  item won’t reach the game-changing level of ridicule that Leonie’s exposure of 
  the Pineapple and the Hare did in 2012.  But it underscores the 
  value of having statistics available to evaluate items.  What percentage 
  of kids chose the correct answer (B)?  How did the distractors 
  work?  That is, what proportion of children chose each of the wrong 
  answers?  Having analytic data would enable us to see how this dubious 
  item played out.  
According 
  to the Learning Standards, #37 is coded RI.6.4, which means it was classified 
  as a Reading for Information item to see whether sixth graders can “determine 
  the meaning of words and phrases as they are used in a text, including 
  figurative, connotative and technical meanings.” 
My 
  guess is that most kids got the item right—making this an “easy” item—because 
  the distractors seem implausible. I would mark it as a poor item for two 
  reasons: It can be answered correctly without reading the passage from which 
  it is drawn; and the distractors likely didn’t carry much weight. Of course, 
  my hunch may be off.  Perhaps many children chose A in response to this 
  convoluted question. We shouldn’t be left to speculate, however, in the 
  absence of data that SED has in its possession.  Note: SED has 
  the statistics sought virtually as soon as the tests are scored; otherwise it 
  couldn’t have issued the instructional reports it just distributed, as referenced in 
  Elia’s June 2016 letter to colleagues.
3- 
  SED took away information it provided from 2013 – 2015 when it released 
  questions with annotations.  The information was posted in August of 
  those years in EngageNY. In this year’s zeal to reveal more items sooner, SED 
  has not presented the statewide p-values for the items as it had over the last 
  three years. Significantly, the annotations, which SED described as teaching 
  tools, are also gone.  So, thus far this year SED has offered more items 
  but has not included a rationale “to demonstrate why any of the released 
  questions measures the intended standards; why the correct answer is correct; 
  and why each wrong answer is plausible but incorrect.”  It is helpful to 
  gain SED’s perspective about the material and its defense of the correct 
  answer choice.  Unfortunately, these explanations have not come out.  
  I think we should campaign to have SED’s annotations for the 2016 material 
  released immediately.
4- 
  SED failed to follow its own decision-making rules regarding which reading 
  passages were appropriate to include on the operational exams.  Four ways 
  to estimate the readability of potential passages were used in constructing 
  the ELA tests: the Lexile Framework, Flesch-Kincaid, the Degrees of Reading 
  Power and the Reading Maturity Metric (a Pearson measure). Each involves a 
  scale that can be applied to reading material and sets forth a range that is 
  appropriate for each grade.  For example, the Lexile Framework indicates 
  that reading material ranging from 740L – 1010L is appropriate for 
  4th and 5th graders.  Ergo, material outside that 
  band may not be right for children in these grades.  
  
SED 
  and Pearson applied three of the four methods to each selection and said in 
  releasing this year’s material that “to 
  make the final determination as to 
  whether a text is at grade-level and thus appropriate to be included on a 
  grade 3-8 assessment, all prospective passages undergo quantitative text 
  complexity analysis using three text complexity measures…. Only passages that 
  are determined appropriate by at least two of three quantitative measures of 
  complexity and are determined appropriate by the qualitative 
  measure of complexity are deemed appropriate for use on the 
  exam.”
In 
  reviewing SED’s latest data on the 2016 exams, I counted 11 of the 55 
  operational passages as failing to meet the criterion that they had to be 
  found to be appropriate by at least two of the methods.  I don’t know how 
  SED will resolve this contradiction.
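The rule SED quotes can be written as a short check. The sketch below is illustrative only: the three passages and their scores are invented, the function names are hypothetical, and the only figure taken from the text is the 740L to 1010L Lexile band cited for grades 4 and 5.

```python
# Lexile band the text cites for 4th and 5th graders.
LEXILE_BAND_GRADES_4_5 = (740, 1010)


def in_band(score, band):
    """True if a readability score falls inside the grade band (inclusive)."""
    low, high = band
    return low <= score <= high


def meets_stated_rule(quantitative_ok, qualitative_ok, required=2):
    """Apply the quoted rule: appropriate on at least `required` of the
    quantitative measures AND appropriate on the qualitative review."""
    return qualitative_ok and sum(quantitative_ok) >= required


# Invented passages: one flag per quantitative measure, plus the review result.
passages = [
    {"title": "Passage A",
     "quantitative_ok": [in_band(820, LEXILE_BAND_GRADES_4_5), True, False],
     "qualitative_ok": True},
    {"title": "Passage B",
     "quantitative_ok": [in_band(1100, LEXILE_BAND_GRADES_4_5), False, True],
     "qualitative_ok": True},
    {"title": "Passage C",
     "quantitative_ok": [True, True, True],
     "qualitative_ok": False},
]

failing = [p["title"] for p in passages
           if not meets_stated_rule(p["quantitative_ok"], p["qualitative_ok"])]
print(failing)
```

Passage B fails because only one of its three quantitative measures is in range, and Passage C fails on the qualitative review alone. Running the same check over the released complexity metrics is how a reviewer could reproduce a count of passages that do not satisfy the stated criterion.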
~~~~~~~~~~~~~~~~~~
Two 
  final observations: The material just released by SED makes no mention of the 
  Common Core Learning Standards, unlike the Released 
  Questions with Annotations of 2013-2015. Instead, SED has kept the 
  boilerplate found in those releases and reverts to the New York State P-12 
  Learning Standards as the framework it follows.  Nor could I find any 
  reference to “college and career readiness.” I guess they have been discarded 
  due to the botched implementation of the Common Core.
Finally, 
  I think we should press SED for the missing information outlined above and 
  keep demonstrating how the department and commissioner continue to pose as 
  being responsive, while taking business-as-usual actions.  Once we let 
  up, they will fall back to disdaining that messy part of democracy—the will of 
  the people.
- Fred Smith
 
 
 
 
 
 
1 comment:
I'd like to ask Fred Smith to evaluate and comment on this question. It's only one item, but every year that we have had access to the tests, there have been items with overly plausible distractors or question stems that ask for one thing and reward another. In this case the p-values may work out, but nonetheless, every child who gets this question wrong for the right reason has been unfairly assessed as weaker by one question. I would love to get an expert opinion on this question.
https://docs.google.com/presentation/d/1Pqnc3rSoaBTgW9fxkfoWfr9IuHymOfTLHrUv228726g/present?slide=id.gae5b27721_0_9