How Common Core-Based Tests Damage Instruction in the Central Component of Reading Comprehension

Oct 20, 2015

Sandra Stotsky – We know from a hundred years of research that knowledge of word meanings is the key component of reading comprehension. So it stands to reason that a reading test should assess this component of reading instruction directly, in addition to assessing word knowledge indirectly in test items that purport to assess comprehension of selected passages. This essay looks at the choice of words and the format for direct assessment of vocabulary knowledge in Common Core-based online practice tests in PARCC’s 2015 PBA and EOY assessments (both computer- and paper-based) and in Common Core-based online practice tests provided by SBAC in 2015. A more detailed analysis will appear in a forthcoming Pioneer Institute White Paper titled How the false “rigor” of PARCC retards the growth of all students, co-authored by Mark McQuillan, Richard Phelps, and Sandra Stotsky.

Choice of Vocabulary and Format for Assessment in 2015 PARCC Practice Tests

PARCC claims it assesses “words that matter most in the texts, which include words essential to understanding a particular text and academic vocabulary that can be found throughout complex texts.” PARCC explains that “Assessment design will focus on student use of context to determine word and phrase meanings.” As a result, it uses one particular format for its vocabulary items fairly consistently. Part A of a two-part multiple-choice answer format typically asks directly for the meaning of a word or phrase, as on MCAS tests. PARCC then requires students in Part B to locate “evidence” or “support” in the text (guided by the content of the four optional answers—or Evidence-Based Selected Responses, or EBSR) for their choice of answer to Part A. PARCC consistently uses this two-part multiple-choice answer format for assessing both vocabulary and reading comprehension.

It should be pointed out that Part B counts for less than Part A in the test score. A test-taker who gets Part B correct must also get Part A correct to receive full credit. A test-taker who gets only Part B correct receives no credit for either part. How many children know this when they take the test is unknown.

The impetus for this design feature is most likely Common Core’s vocabulary standards. Its general standard is: “Determine or clarify the meaning of unknown and multiple-meaning words and phrases based on [grade-level] reading and content, choosing flexibly from a range of strategies.” Despite the word “flexibly,” the first strategy at every grade level is always “use context as a clue to the meaning of a word or phrase,” regardless of the source of the unknown word or genre in which it is used. As PARCC indicates, “Tier III vocabulary—also referred to as domain-specific vocabulary—may also be assessed, when the meaning of the word(s) can be determined through the context of the informational text.”

In examining its vocabulary practice test items, I encountered a number of issues in addition to PARCC’s consistent use of a misguided and misleading format for assessing vocabulary knowledge: developmentally inappropriate test directions; a puzzling choice of words for vocabulary assessment; and a mismatch between answer options and a dictionary meaning of the word. The following examples illustrate these issues.

First question set in grade 3
(For a story about animals by Thornton Burgess)

Part A. “What does cross mean….?” The answer to Part A has to be a synonym for cross in the context of this story. “Upset” is the only one that could make sense as the other options are “excited,” “lost,” and “scared.” However, that is not what cross means in a dictionary. Google gives us: “marked by bad temper, grumpy” or “annoyed, angry.” Moreover, a grade 3 student might have picked up some understanding of the meaning of the word from hearing about a cross grandmother or a cross look on someone’s face (or from hearing Burgess stories as a pre-schooler when read to from their picture book editions). Since the three wrong choices are much farther away in meaning than is “upset,” “upset” might well be chosen as the correct answer by a process of elimination even if not quite right in the reader’s own experience with the word.

The most serious issues concern the wording and meaning of the question in Part B. The question in Part B is: “Which statement best supports the answer to Part A?” The correct answer is: “…hadn’t found The Best Thing in the World.” But two of the four choices, including the correct answer, are phrases, not “statements.” Moreover, it is not clear what the question itself means. What can it mean to a third grader to have a question about a “statement” that “best supports the answer to Part A” (keeping in mind that the correct answer in Part A may not seem to be the correct answer to some children)? This is not child-friendly language.

A grade 3 teacher would have asked orally something like: “Why were all the animals unhappy or angry at the end of the story?” Granted, that question doesn’t force the reader to go back to the story to find specific words that some test-item writer thinks “supports” the answer. But a grade 3 teacher would have been unlikely to use “supports.” If a teacher had written the test question, she might have worded the Part B question as “What phrase (or words) in the story best explains why the animals were unhappy at the end of the story?” In this case, the child doesn’t have to look back at the answer to Part A. The question is about comprehension of the story, not the question and answer in Part A. And the point is still made that the answer to the Part B question is in the text.

First question set in grade 4
(For a 2012 story about children in an elementary classroom by Mathangi Subramanian)

Part A asks for the meaning of drift. The correct answer is “wander.” The other choices are “hover,” “consider,” and “change.” Google gives us: “to move slowly, esp. as a result of outside forces, with no control over direction.” As in: “He stopped rowing and let the boat drift.”

Part B asks “Which detail from the story helps the reader understand the meaning of drift?” Alas, none of the answers is correct. The intended correct answer is: “Lily, Jasper, and Enrique make comments about the drawings as the students come close enough to see them.” But only Lily and Jasper make comments in the story. Enrique asks a question. A careful reader would be very bothered by a poorly-worded question and no fully correct answer. These details (not just one detail, as the question implies) do not help any reader to understand the meaning of drift.

First question set in grade 8
(For a 2009 novel about a Hispanic American teen-ager by Diana Lopez)

Part A asks for the meaning of sarcasm as used in the Lopez novel. The correct answer is “a remark indicating mockery and annoyance.” However, Google defines the word as “the use of irony to mock or convey contempt. Synonyms: derision, mockery, ridicule, scorn, sneering, scoffing.” The word “annoyance” is not there. It is not clear why the test-writer didn’t use “contempt” instead of “annoyance.” The right answer would then have accurately pointed to the young girl’s lack of respect in speaking to her father, an attitude that helps to explain why she uses the book her father gave her as a coaster for a glass of soda. Although the right answer for another question points to resentment rather than contempt as the motivation for her behavior, her behavior may be better understood as contempt. The questions thus frame a somewhat inaccurate interpretation, confusing children who have been taught that sarcasm to parents or other elders is a sign of disrespect.

Question set for an informational article on elephants in grade 8
Confusion may also result from the answer to the Part A question about the meaning of anecdotal observations in an article on elephants. The only possible right answer is “a report that is somewhat unreliable because it is based on a personal account.”

The test developers had to have known they were skating on the edge of a precipice. Personal accounts of the texts students read or of the events in their neighborhood have been pedagogically elicited for decades in efforts to “engage” students through their own daily life while, at the same time, relieving them of learning how to make supported interpretations of texts, events, people, and/or movements in place of groundless or simply emotionally-driven personal opinions—something Common Core promised to remedy.

Part B asks for the “best evidence” for this meaning in the article. But the correct answers for both Part A and Part B are misleading and scientifically wrong. A report based on an anecdotal observation is unreliable not because it is based on a personal account (which is characteristic of observation-based field reports in many disciplines) but because it has too few subjects (maybe only one, idiosyncratically chosen) and does not include a large enough random sample to serve as the basis for a defensible generalization. Thus, the only sentence that seems to make sense as the right answer for Part B (“But it’s one thing to witness something that looks like consolation, and another to prove that this is what elephants are doing.”) is highly misleading. Unreliability does not necessarily result from someone witnessing as opposed to proving something. In this article it refers to claiming that some observed animal behavior demonstrates consolation, implying that animal behavior is motivated in the same way that human behavior is. Students are misled by Part A to think that the unreliability of an anecdotal observation is a result of its being a “personal account,” not a result of a false assumption. They are then led to choose a largely wrong answer because the author of the article didn’t indicate correctly why an anecdotal observation of elephant behavior is scientifically unreliable.

First question set in grade 10
(For an excerpt from a story by an American writer about Japanese children and cranes.)

Part A asks for the meaning of resonant. The choices are “intense,” “familiar,” “distant,” and “annoying.” Google offers this definition:

(of sound) deep, clear, and continuing to sound or ring, as in “a full-throated and resonant guffaw”
deep, low, sonorous, full, full-bodied, vibrant, rich, clear, ringing; loud, booming, thunderous, as in “a resonant voice”
(of a place) filled or resounding with (a sound), as in “alpine valleys resonant with the sound of church bells”
reverberating, reverberant, resounding, echoing, filled, as in “valleys resonant with the sound of church bells”

By a process of elimination, the word least likely to be wrong in the four choices is “intense,” even though resonant usually refers to the continuing nature of a sound. “Intense” is not the right answer to a student who knows from experience or reading that a resonant sound is one that continues long after the action that caused the sound.

Thus, there is no correct answer among the choices in Part B, which asks: “What quotation from Paragraph 3 helps clarify the meaning of resonant?” None does. The intended right answer, “they’re so loud…,” doesn’t clarify the meaning of resonant. A more relevant choice is: “I wasn’t sure where their calls were coming from,” although it does not so much clarify the meaning of resonant as reflect it. (In other words, the cranes’ calls last so long that it is difficult to figure out where they actually are as they fly around.) This sentence also precedes resonant in the text, so if the reader doesn’t already know the meaning of resonant as continuing sound, the child’s comment in the story makes little sense to the reader. In a reading lesson, the word would be one of the vocabulary items that are pre-taught before students read the selection (which is more suitable for middle than high school students).

Choice of Vocabulary and Format for Assessment in 2015 SBAC Practice Tests

I explored the practice tests provided by the other consortium developing tests of Common Core’s standards to find out its format for vocabulary assessment. SBAC does not seem to assess as many vocabulary words as PARCC does. When it does assess the meaning of a word or phrase, it may ask directly for the meaning or, in a format reversal, for a word in the selection that means what the question itself provides as its meaning (see the format for stacked in grade 3 and for whispered in grade 4). For vocabulary test items, SBAC does not use the two-part multiple-choice answer format that PARCC uses for all vocabulary test items and for all other multiple-choice test items. SBAC uses the two-part format occasionally but only for other types of items.

Interestingly, SBAC does expect an advanced vocabulary to be used by teachers and test-item writers even in the primary grades and even when below-grade-level reading passages are used, and it provides a long list of such words described as “construct relevant vocabulary.” This list “refers to any English language arts term that students should know because it is essential to the construct of English language arts. As such, these terms should be part of instruction.” For example, the list for grade 3 includes: source(s), specific word choice, spelling errors, stanza, supporting details, trustworthy source, and verb tense. SBAC notes that these words will not be explained on the test. It expects teachers to “embed” them in instruction.

These grade-level lists for teachers and test-item writers help to explain the absence of child-friendly language in both SBAC and PARCC test items. They also send the strong message that elementary teachers are going to have to learn precisely what these terms mean and use them regularly as part of daily instructional talk. It is not at all clear where that learning is to take place. Our elementary teaching force does not normally take the kind of linguistics coursework that helps them internalize the exact meanings of many of these terms (e.g., verb tense and tense shifts).

However, while SBAC is very clear about the language that teachers and test-item writers can use in test items and instruction, it is not clear about its criteria for choosing words to assess for student knowledge of their meanings. It is possible that the difficulty of the words chosen for assessment or for an answer option was determined by the reading level of the selections in which they were found. In addition to a readability formula, SBAC seems to be using a set of subjective variables for determining “text complexity” (e.g., knowledge demands, language features, and text structure). No clear statements can be found on criteria for determining grade-level difficulty of literary or informational reading passages or for words and phrases to be assessed.

Final Observations

Format for Assessing Vocabulary
It is not clear why PARCC consistently focuses on “student use of context to determine word and phrase meanings.” Such a focus assumes context can be relied on to determine word and phrase meanings, most if not all of the time. Indeed, the assumptions seem to be that the acquisition of most reading vocabulary depends on use of context and that context is there for most reading vocabulary, in literary as well as informational text. These two huge and different assumptions raise unanswered but answerable questions. First, is it the case that we learn the meaning of most new words by using context? (The general consensus is that we learn the meaning of most new words in context, a very different statement.) Second, is an informational text apt to provide a context, never mind enough context, for determining the meaning of new, domain-based words? And third, is the use of context a sound strategy to promote pedagogically for determining the meaning of unknown words in any text, regardless of genre, discipline, domain, or research evidence?

Unfortunately, no body of research shows that we learn the meaning of most new words by using context for that purpose. (We may learn them in context, but that is not the same thing as using context to learn them.) Or that most new and difficult words students encounter in their literary reading have sufficient context (never mind any relevant context) to enable students to determine the meaning of these words from this context. Indeed, it is more likely that some understanding of the meaning of a new word helps students to understand the context. How they learned its meaning we may never find out. This seems to be the theory underlying National Assessment of Educational Progress (NAEP) assessments of vocabulary knowledge (a new feature since 2009). “NAEP assesses vocabulary in a way that aims to capture students’ ability to use their understanding or sense of words to acquire meaning from the passages they read….Students are asked to demonstrate their understanding of words by recognizing what meaning the word contributes to the passage in which it appears.”

In sum, there is NO research showing that sufficient context exists in literary or informational texts to justify an assessment format implying that students of any age can determine the meaning of a hard word by using its context. Worse yet, the almost exclusive use of this format may encourage teachers to teach students not to use a dictionary or other references for determining the meaning of an unknown word or phrase but to use its context instead, as if there would always be useful and sufficient context available for that purpose.

According to one well-known reading researcher, “there are four components of an effective vocabulary program: (1) wide or extensive independent reading to expand word knowledge, (2) instruction in specific words to enhance comprehension of texts containing those words, (3) instruction in independent word-learning strategies, and (4) word consciousness and word-play activities to motivate and enhance learning.” What are those “independent word-learning strategies” teachers should teach? The National Reading Panel’s 2000 report recommended four “Word-Learning Strategies”: (1) dictionary use, (2) morphemic analysis, (3) cognate awareness for ELL students, and (4) contextual analysis. Yet PARCC stressed only one pedagogical model, with nothing in its documents indicating that the best (though not the most efficient) way to acquire new vocabulary is through wide reading, followed by advice to teachers on ways to stimulate leisure reading. Nor did PARCC (or SBAC) in their practice tests assess dictionary skills, morphemic analysis, or cognate awareness. Assessments that influence classroom teachers to ignore the critical importance of broad independent reading, the need for specific vocabulary instruction, and the information provided in a dictionary do an enormous disservice to those children who most need to expand their vocabulary.

The pedagogy that the vocabulary standards promote, as well as the need for students to locate “evidence” in the text to show that they have determined the meaning of an unknown word from what is in the text (or could determine it if need be), more often than not seem to have led to poorly constructed test items and incorrect information on word meanings. What this pedagogical and assessment model is leading teachers to do in their classrooms as part of test preparation is unknown.

Words/Phrases Selected for Assessment
Although PARCC claims it assesses “words that matter most in the texts, which include words essential to understanding a particular text and academic vocabulary that can be found throughout complex texts,” it is not apparent why many of the chosen words were selected. For example, cross in the grade 3 Thornton Burgess story and drift in the grade 4 selection about children in an elementary classroom are not important to the meaning of these reading selections for children, nor are they apt to be considered part of an academic vocabulary. The plot in these selections helps young readers to understand these words if they don’t already know their meanings by the age of 8 or 9.

Why SBAC seems to have decided to assess few word meanings directly is not known (recall that I was looking only at its practice tests). Maybe SBAC does not view vocabulary teaching as worthy of highlighting by assessments, despite the major conclusion reached many years ago by researchers on how to develop students’ reading and writing vocabularies: that while no one method is superior to other methods, some attention to vocabulary is better than no attention.
In addition, the words selected for the high school grades in SBAC (Grade 11: touts, lethargic, problem, stick, encore, reprise, mass-produced) do not seem to be advanced high school English vocabulary.

It is not known exactly who chose the selections or the vocabulary for PARCC’s and SBAC’s sample tests for high school, and the public can never see all the actual test items at the high school level. Regardless of who chooses the reading passages, there should be a match between what is in Google’s definition (which incorporates what is in major dictionaries) and the correct answer to a test question on the meaning of a word.

Language of Assessment
It is also not clear why the questions in Part B in PARCC were worded as they were. In the early grades, they do not reflect how a teacher talks. Nor were they always precise. The lists of “construct relevant vocabulary” on the SBAC website explain the presence of difficult or cumbersome terminology in PARCC and SBAC practice test items, while the use of the Part A/Part B format in both tests suggests joint planning. But we may never have any definitive body of observational research showing whether teachers embed this terminology (correctly) in their daily instruction and whether students understand it.
