The Rot Festers: Another National Research Council Report on Testing

Hout, M., & Elliott, S.W. (2011). Incentives and test-based accountability in education. Committee on Incentives and Test-Based Accountability in Public Education. Board on Testing and Assessment, Division of Behavioral and Social Sciences and Education, National Research Council. Washington, DC: The National Academies Press.

reviewed by Richard P. Phelps

In research organizations that have been “captured” by vested interests, the scholars who receive the most attention, praise, and reward are not those who conduct the most accurate or highest quality research, but those who produce results that best advance the interests of the group. Those who produce results that do not advance the interests of the group may be shunned and ostracized, even if their work is well done and accurate.

The prevailing view among the vested interests in education does not oppose all standardized testing; it opposes testing with consequences based on the results that is also “externally administered”—i.e., testing that can be used to make judgments of educators but is out of educators’ direct control. The external entity may be a higher level of government, such as the state in the case of state graduation exams, or a non-governmental entity, such as the College Board or ACT in the case of college entrance exams.

One can easily spot the moment vested interests “captured” the National Research Council’s Board on Testing and Assessment (BOTA). BOTA was headed in the 1980s by a scholar with little background or expertise in testing (Wise, 1998). Perhaps not knowing whom to trust at first, she put her full faith, and that of the NRC, behind the anti-high-stakes testing point of view that had come to dominate graduate schools of education. Proof of that conversion came when the NRC accepted a challenge from the U.S. Department of Labor to evaluate the predictive validity of the General Aptitude Test Battery (GATB) for use in unemployment centers throughout the country.

Fairness in Employment Testing, 1989

From the 1960s to the 1990s, the field of personnel psychology (a.k.a. industrial-organizational psychology) produced an impressive body of technically advanced research on the costs and benefits of testing for personnel selection. Thousands (yes, thousands) of empirical studies were conducted in the United States alone, demonstrating that a fairly general aptitude or achievement test is the best single predictor of performance for the overwhelming majority of jobs, better than all other factors that employers generally used in hiring. The estimated net benefits of using tests for personnel screening were huge, with costs minuscule and benefits enormous.

In the late 1980s, the U.S. Department of Labor considered providing the federal government’s GATB, which was used for hiring in federal jobs, to local employment offices for use in hiring outside the federal government. The test would have been made available to job applicants who wished to take it, and test results would have been made available to employers who wished to review them.

The Labor Department asked the Board on Testing and Assessment at the National Research Council to review the question. Their report is extraordinary. In the face of overwhelming evidence to the contrary, the Board declared the following: there was only negligible evidence to support the predictive power of the GATB, and tests in general provided no benefits in personnel selection. Their conclusions were reached through tortuous illogic and contradiction, and a judicious selection of both committee members and research sources (see Phelps, 1999).

For example, not one of the hundreds of academic psychologists who studied personnel selection was invited to participate in writing the report, whereas several education professors who were well-known opponents of high-stakes testing were. Many of the world’s most-respected personnel and GATB testing experts were appointed to a “Liaison Committee”, but it was never consulted; their names, however, were then published in the final report, as if to imply they approved of the report.

They did not. Members of the Liaison Committee accused the NRC of deliberately choosing a committee it knew would be hostile toward the GATB research.

Moreover, only one of the thousands of empirical studies on personnel selection was discussed. In the face of thousands of predictive validity studies on general aptitude tests in employment, the study committee wrote: “very slim empirical foundation”, “the empirical evidence is slight”, “fragmentary confirming evidence”, “very little evidence”, “no well-developed body of evidence”, and “primitive state of knowledge.”

The Board dismissed the benefits of hiring better qualified applicants for jobs by arguing that if an applicant were rejected for one job, the applicant would simply find another somewhere else in the labor market, as all are employed somewhere. (No matter that the other job might be less well-paid, in an undesirable field or location, part time, temporary, or even nonexistent.)

In the view of the report, “unemployment is a job.” The Board continued with the astounding contradiction that, whereas selection (and allocation) effects should be considered nonexistent because all jobs can be considered equivalent, general tests, like the GATB, cannot be any good as predictors because these tests do not account for the unique character of every job.

Constants on NRC testing study committees for the past quarter century have been the multiple participation of members of the federally funded Center for Research on Educational Standards and Student Testing (CRESST), headquartered at UCLA, and members of an even more radical (anti-) testing research center at Boston College. Committee memberships are then rounded out with scholars known in advance to support CRESST biases and a few others with recognizable names and ideological sympathies, but little familiarity with the study topic. The many scholars who disagree with CRESST’s point of view are neither invited to participate nor cited in the study reports.

High Stakes, 1999

The most revealing aspect of the National Research Council’s 1999 report, High stakes: Testing for tracking, promotion, and graduation (Heubert & Hauser), is its choice of source material. Sources that buttressed the views of BOTA were included, while hundreds of sources that did not were ignored. The majority of citations went to CRESST research and CRESST researchers. At the time, NRC’s Board was chaired by a CRESST director. The “Committee for Appropriate Test Use”, the entity responsible for the particular study, included three CRESST grandees and one from Boston College.


