Pass or Fail: Is Testing a Valid Way to Measure Student Progress?
In this multi-part series, I dissect the phenomenon of retention and social promotion. I also describe the many methods that, if combined effectively, would improve instruction in classrooms and eliminate the need for retention and social promotion.
While reading this series, periodically ask yourself this question: Why are educators, parents and the American public complicit in a practice that does demonstrable harm to children and the competitive future of the country?
What if the measures we use to determine passing or failing grades are completely skewed? Is standardized testing, or any testing for that matter, the right way to determine student progress?
One of the first and most significant concerns about the application of standardized tests is that they are not consistent with the standards for fair and appropriate testing. Of course, educators must first define the standards themselves and demonstrate them to be relevant. In this instance, we are referring to the standards for fair and appropriate testing as defined by the NRC Report, which says that measurement validity refers to the extent to which evidence supports a proposed interpretation and use of test scores for a particular purpose.
For instance, the reading section of the SAT I would be judged to have reasonable validity for assessing an individual’s reading comprehension skills, knowledge of grammar rules, and ability to make inferences from texts. Using scores from this test to determine an individual’s preparedness for entry into a particular college program would also be reasonably valid. The component of appropriate testing usually overlaps with this second issue of validity, too; the NRC Report Standards also outline it, and the findings of various other organizations back it up.
To go back to the more formal parameters, the general rule is that the internal structure of the test, the content of the test, the relationship of the test to other criteria, and the psychological processes and cognitive operations used by the examinee in responding to the test items must all support the purpose of the test.
A test assessing knowledge and skill should target that knowledge and those skills specifically, and it should also ensure that the knowledge and skills being assessed are those obtained from appropriate instruction. In some instances, a score might reflect poor instruction or factors that are unrelated to the skills under review. For instance, a student might score poorly on the SAT reading test because their teachers didn’t transfer the necessary knowledge and skill (the student may not have been taught proper grammar, for instance, or may have received inadequate instruction on how to read critically).
Another example would be an individual who scores badly on the SAT reading test not because they lack the reading comprehension skills that the test intends to assess but because they face significant language barriers or because cultural differences have some bearing on the test. For instance, a reading comprehension passage on American history that in some way relies upon presupposed knowledge of American history or customs might be problematic, undermining both the validity and fairness of test scores and the attribution of cause.
Disabilities can also be an issue for the attribution of cause. Several types of cognitive or even physical disabilities can undermine an individual’s performance in a testing scenario if appropriate interventions are not provided to support the student’s exceptionalities.
In the context of K-12 assessments, the cause component also concerns the extent to which students receive adequate opportunity to learn the material for the test. Adequate quality and quantity of instruction become important, as does the alignment of test content and curriculum.
Students also need adequate opportunity within the testing scenario to demonstrate their knowledge. If tests contain irrelevant language or content, for instance, students may not have adequate opportunity to perform, and test developers will have compromised the fairness and relevance of the test.
Furthermore, many of the criteria for fairness in testing standards overlap with attribution of cause. In the Standards, overlapping elements include the investigation of bias and differential item functioning, determining whether construct-irrelevant variance differentially affects different groups of examinees, and equal treatment during the testing process.
Curricular validity lies within the cause component in the sense that it relates to the alignment between test content and the curriculum taught in class. Chapter 13 of the Standards states that “There should be evidence that the test adequately covers only the specific or generalized content and skills that students have had an opportunity to learn.”
This goes beyond the criteria outlined here and applies to a broader interpretation of opportunity to learn: one that is not restricted to curricular validity but also includes instructional quality as a predictor of student test scores.
Certain policies within the K-12 setting make high-stakes student decisions dependent upon evidence that the student has had the educational experience and opportunity to acquire relevant knowledge and skill. Where students have lacked sufficient opportunity to acquire desired skills in an educational context, they may not meet the criteria for grade promotion or graduation.
At the same time, though, it is hardly fair that the student be held accountable for the deficit in their learning. At what point do we say: this portion of education is the responsibility of the schools, of the system and the stakeholders, not just the individual student?
The effectiveness of treatment is the final component of the fair and appropriate test criteria. It relates to whether test scores lead to consequences that are educationally beneficial in a given context. Consequences could include placement in a particular academic grouping based on ability, or advancement from one level of learning to a higher level based on test achievement. Accountability plays a part here, too, as the criteria for effective treatment hold that it is inappropriate to use tests to make placements that are not educationally beneficial.
When tests are used in placement decisions, they must be fair and appropriate. Students must be “better off in the setting in which they are placed than they would be in a different available setting.” With all of these factors in mind, though, can testing ever truly be trusted as a placement option for students?