Morgan Polikoff: Looking at State Tests

Mar 18, 2016

An Interview with Morgan Polikoff: Looking at State Tests

Michael F. Shaughnessy

1) Professor Polikoff, your work, Pencils Down: What I Learned from Studying the Quality of State Tests, has just been posted on the Fordham website. What led up to this research?

This blog post is a final summary of my lessons learned from co-leading the project Evaluating the Content and Quality of Next Generation Assessments. That project, which took place over the last two years, was a response to the tremendous demand for evidence about the quality of new state assessments. While a great deal of money and time have been spent developing new assessments to measure the content and skills in the Common Core standards, there has so far been very little evidence about the quality of these new tests. Our report was a first attempt to take a comprehensive look at the quality of the assessments in order to advise policymakers and state leaders.

2) What specific factors did you look at in terms of the quality of state tests?

We used a methodology developed by the Center for Assessment and based on the Council of Chief State School Officers’ Criteria for Procuring and Evaluating High Quality Assessments. The methodology focuses on multiple dimensions of test content and quality, which are broadly classified as “Content” and “Depth.” In general terms, the content ratings reflect the degree to which the assessment measures student knowledge of the material most needed for readiness (in other words, the most important content in the standards). So this includes things like close reading, writing to sources, and coverage of the major work of each grade level. The depth ratings reflect the degree to which the assessment measures the full depth and complexity of the standards.

So this includes things like item and passage quality and the overall cognitive demands required by the assessment. Our study is not a typical validity/reliability study (though content is certainly an element of validity). While such studies are valuable, we thought it was important to focus specifically on content.

3) Let’s look at extremes first—best state and worst state in terms of validity, reliability or any other factors?

We didn’t look at individual states’ tests, but we did examine four assessments: PARCC, Smarter Balanced, ACT Aspire, and Massachusetts’ MCAS. In general, PARCC and Smarter Balanced scored the highest in terms of their coverage of the key content in the standards in both subjects. In contrast, the depth ratings were relatively consistent for all four assessments.

4) Is there variance in terms of the amount of time allocated to these tests?

Yes, PARCC and Smarter Balanced are typically longer than both ACT Aspire and MCAS. So that is a trade-off that states would need to consider. It may be difficult for a test to measure all of the most important content and still be short in duration.

5) Obviously states differ in terms of makeup and heterogeneity. Do state tests seem to take this into account at all?

This is not really something our study looked at. That said, my own opinion is that states may say they do things like this, but almost certainly do not.

6) Funny things, those fonts and formats: are all state tests consistent, or do the fonts, formats, and sizes seem to differ? Or did you not look at these factors?

We did not really look at these factors either. PARCC, Smarter Balanced, and ACT Aspire are all computer-based tests, whereas the old MCAS was not; that is certainly one difference.

7) Weighting (perhaps a new politically incorrect word) is often seen in certain tests in terms of math and reading. Did you look at this factor? What did you find?

We didn’t specifically look at weighting in our study. We did have one criterion in mathematics that was focused on the balance of conceptual understanding, procedural fluency, and application. However, for reasons that are spelled out in the report, our reviewers found it hard to apply this criterion, so we do not report those results. That remains an important area for future work.

8) Education is more than just reading, spelling, and math; it includes written expression as well. Are any states looking at written expression?

Absolutely. This was, in fact, one of the biggest differences between PARCC and Smarter Balanced on the one hand and ACT Aspire and MCAS on the other. PARCC and Smarter Balanced both had writing prompts that required students to read one or more texts and construct a response that drew on textual evidence to support their argument. ACT Aspire’s prompts did not typically require citing evidence from sources, nor did MCAS’s (MCAS also had a very limited range of writing types and only tested writing in certain grades).

9) What have I neglected to ask?

I think these are good questions.

10) Is there a link where our readers can get more information?

I’ve included some key links in the above responses. I encourage readers to read the report (or at least the executive summary) and judge for themselves.
