As I touched on in How Grades Are Damaging Students, I have become a firm believer in educational reform and in getting rid of our current notion of grading altogether. Using some of the same mathematics as in my post last week, Does God Play Dice?, I wanted to explore the role random error plays in the grades students receive.

I wanted to simulate classes of students receiving grades from teachers, and teachers then using those grades, for instance, to rank a Top 10. This practice is still common in many high schools for graduation purposes. But this exploration goes far beyond a Top 10 and touches on every single decision made with grades, from class placement, to demotion/promotion in school, to college choices, career options, and beyond. Truly, grades can have lifelong consequences for young human beings, which is why teachers should be even more cognizant of the actions they take.

Using R, I created classes of various sizes (25, 50, 100, and 500) and, in each, picked a Top 10, with the following caveats. Student grades were based on two components. The first is the student's true grade, which is modeled by a logit-normal distribution (as recommended here). The second is “random” error, which we should discuss. In theory, this component is purely random (it’s modeled by a continuous uniform distribution), but in practice it stands in for teachers’ implicit biases, feelings about the students, and explicit and implicit ideas about what work is “excellent”, “good”, “fair”, “poor”, <insert arbitrary quality indicator here>, etc. This “random” error also takes into account confounding factors contributing to the grade, such as distractions, test anxiety, confidence, conscientiousness, stresses, and so on.
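To make the two components concrete, here is a minimal sketch of how one class might be drawn. It is written in Python rather than R and is not the code behind this post. The logit-normal parameters `mu` and `sigma`, the 0-100 scaling, the seed, and the function names are all assumptions chosen for illustration, since the post does not state them.

```python
import math
import random

def sample_true_grade(rng, mu=1.5, sigma=0.75):
    """Draw a true grade on a 0-100 scale from a logit-normal distribution.

    mu and sigma are illustrative guesses; the post does not give them.
    """
    z = rng.gauss(mu, sigma)              # normal draw on the logit scale
    return 100.0 / (1.0 + math.exp(-z))   # logistic transform maps it into (0, 100)

def sample_error_score(rng):
    """Draw the "random" error component, uniform on the 0-100 scale."""
    return rng.uniform(0.0, 100.0)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
true_grades = [sample_true_grade(rng) for _ in range(25)]
error_scores = [sample_error_score(rng) for _ in range(25)]
```

With these assumed parameters, most true grades cluster in the 70s to 90s, while the error score is equally likely to land anywhere between 0 and 100.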

The model weighted the overall grade as 95%, 90%, or 85% true-ability grade and 5%, 10%, or 15% “random” error, respectively. Each simulation was then repeated 10,000 times, and the results averaged.
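The weighting can be written as a one-line blend. The sketch below is in Python, not the R used for this post, and assumes both components sit on the same 0-100 scale; the function name `observed_grade` is hypothetical.

```python
def observed_grade(true_grade, error_score, error_weight):
    """Blend the true grade and the error score, both on a 0-100 scale."""
    if not 0.0 <= error_weight <= 1.0:
        raise ValueError("error_weight must be between 0 and 1")
    return (1.0 - error_weight) * true_grade + error_weight * error_score

# A student with a true grade of 92 whose error component happens to be 40:
for w in (0.05, 0.10, 0.15):
    print(f"{w:.0%} error: {observed_grade(92.0, 40.0, w):.1f}")
```

For this hypothetical student, the recorded grade falls from 92 to 89.4, 86.8, and 84.2 as the error share rises from 5% to 15%, even though nothing about the student's ability has changed.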

The two main metrics I looked at are as follows: 1) How many students in the Top 10 would NOT have made the Top 10 if the error had played no role? (listed as #1 in the table below); and, 2) How many grade points (on a 100-point scale), on average, did the Top 10 gain from the error factor? (listed as #2 in the table below). The results are shown:

| % Error Allowed | Class Size | #1: Top 10 spots owed to error (approx.) | #2: Points gained from error (approx.) |
|---|---|---|---|
| 5% | 25 | 0.5 | 2.1 |
| 5% | 50 | 0.75 | 2.4 |
| 5% | 100 | 1 | 2.9 |
| 5% | 500 | 1.5 | 3.2 |
| 10% | 25 | 1 | 5.6 |
| 10% | 50 | 1.5 | 6 |
| 10% | 100 | 2 | 6.4 |
| 10% | 500 | 2.5 | 7.1 |
| 15% | 25 | 1 | 8 |
| 15% | 50 | 2.5 | 9.4 |
| 15% | 100 | 3.5 | 10.1 |
| 15% | 500 | 5 | 12.6 |
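For readers who want to experiment, the two metrics can be sketched roughly as follows. This is a Python approximation, not the R code used for the table, and it rests on several assumptions: the logit-normal parameters `mu` and `sigma` are guesses, the error is uniform on 0-100, and metric #2 is read as the average number of error points held by the students in the observed Top 10. Because of those assumptions, its output should not be expected to match the table.

```python
import math
import random

TOP_N = 10

def simulate(class_size, error_weight, reps=1000, seed=0, mu=1.5, sigma=0.75):
    """Return (avg. Top 10 spots owed to error, avg. error points held by the Top 10)."""
    rng = random.Random(seed)
    displaced_total = 0
    bonus_total = 0.0
    for _ in range(reps):
        # True grades: logit-normal on a 0-100 scale (mu and sigma are guesses).
        true = [100.0 / (1.0 + math.exp(-rng.gauss(mu, sigma)))
                for _ in range(class_size)]
        # Points contributed by the uniform "random" error component.
        bonus = [error_weight * rng.uniform(0.0, 100.0) for _ in range(class_size)]
        observed = [(1.0 - error_weight) * t + b for t, b in zip(true, bonus)]

        students = range(class_size)
        top_observed = set(sorted(students, key=lambda i: observed[i], reverse=True)[:TOP_N])
        top_true = set(sorted(students, key=lambda i: true[i], reverse=True)[:TOP_N])

        # Metric 1: observed Top 10 members who are not in the error-free Top 10.
        displaced_total += len(top_observed - top_true)
        # Metric 2: mean error points among the observed Top 10.
        bonus_total += sum(bonus[i] for i in top_observed) / TOP_N
    return displaced_total / reps, bonus_total / reps

# Same grid of error shares and class sizes as the table, with fewer repetitions.
for weight in (0.05, 0.10, 0.15):
    for size in (25, 50, 100, 500):
        swapped, points = simulate(size, weight, reps=200)
        print(f"{weight:.0%} | {size} | {swapped:.2f} | {points:.2f}")
```

The loop uses 200 repetitions per cell to keep the run quick; raising `reps` toward the 10,000 used in the post should give steadier averages.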

Thus, even in a class of 25 and assuming a modest 5% error, those who earn the top grades gain over 2 points on a 100-point scale from error alone! Think about how many students have an 88 versus a 90, or how often the Valedictorian is chosen by just tenths of a point, and you see how, even in this smallest scenario, the results are shocking!

That’s to say nothing about when the class size gets larger. Imagine a college lecture hall of 100 students, and let’s say 10% of the grade is based on “error” (or the arbitrariness and capriciousness of the professor). If this professor grades in such a way that only the Top 10 will receive an A, those Top 10 will have benefitted by an average of 6.4 points (out of 100) just by whim and fancy alone, not by skills they showcased, and a full 2 of them wouldn’t even be in the Top 10 if error played no role!

This becomes devastating when you think of all the college and career choices people make over the 20+ years of schooling they endure based on the results of grades, GPA, and class rankings.

See, here’s the sad and dubious truth about grades: They use mathematics and statistics to produce the illusion of objectivity. In reality, grades are ALWAYS subjective; unless measuring rote memorization or simple fact-and-recall (which arguably shouldn’t be graded anyway), there is ALWAYS human judgement involved in the grading process. The veneer of grades simply allows administrators, policy-makers, and sometimes teachers the façade of evidence needed to keep others happy and to satisfy parents and students.

Note, I’m not advocating for a wholesale scrapping of the education system altogether. Rather, I think society needs to reimagine what mastery looks like. If a student has complete mastery of a concept, what does that look like? What evidence can you show that isn’t on a scale, rubric, or number line? Describe what you know and how you know it.

Until society can see grading for the ghost of a solution that it is, the injustice we parade as objective scoring will continue to deliver both lucky breaks and heartaches to students who are none the wiser.

(P.S. The R code I used is almost a facsimile of the code in my Does God Play Dice? post with a few label modifications, changing the distribution assumptions, and such. Feel free to play with it if you want.)