Testing is still on my mind. (Josie’s teacher emailed me yesterday morning and asked me to pick up some gum for the class; kids are encouraged to chew it during the tests if it helps them focus. Josie wailed, “BUT I HATE GUM!” I said, “You don’t HAVE to chew it!” She said, “But I want help concentrating! Should I make myself chew gum? Should I get a lollipop instead? Can you write a note and ask if I can have a lollipop??” Upshot: I gave her a sucking candy and sent her off into the wilds of Standardized Testland. But you see what I’m up against.)
Anyway, this deeply disturbing op-ed is by a guy named Todd who used to grade statewide 4th grade English Language Arts tests. He had no special expertise, he says; he just walked in off the street and got the $8/hour gig after a five-minute interview. Unlike the 3rd grade test Josie is taking today, the 4th grade tests Todd graded weren’t fill-in-the-ovals thingies. Scoring them required subjective judgment calls from Todd and his fellow graders. To quote one of my fave TV networks, Watch What Happens!
One of the tests I scored had students read a passage about bicycle safety. They were then instructed to draw a poster that illustrated a rule that was indicated in the text. We would award one point for a poster that included a correct rule and zero for a drawing that did not.
The first poster I saw was a drawing of a young cyclist, a helmet tightly attached to his head, flying his bike over a canal filled with flaming oil, his two arms waving wildly in the air. I stared at the response for minutes. Was this a picture of a helmet-wearing child who understood the basic rules of bike safety? Or was it meant to portray a youngster killing himself on two wheels?
I was not the only one who was confused. Soon several of my fellow scorers — pretty much people off the street, like me — were debating my poster, some positing that it clearly showed an understanding of bike safety while others argued that it most certainly did not. I realized then — an epiphany confirmed over a decade and a half of experience in the testing industry — that the score any student would earn mostly depended on which temporary employee viewed his response.
Oy. Another year, Todd was scoring a 9th grade test in which the kids had to write movie reviews. One kid snarkily reviewed Debbie Does Dallas. (In high school, that’s the kid I would have wanted for a boyfriend. But I digress.) Todd thought the essay was hilarious and well-written; another scorer agreed but thought it should be penalized for being “inappropriate.”
All of the 100 or so scorers in the room soon became embroiled in the debate. Eventually we came to the “consensus” that the essay deserved a 6 (“genius”), or 4 (well-written but “naughty”), or a zero (“filth”). The essay was ultimately given a zero.
Todd also writes about scoring tests while well into his own personal happy hour. For more horrific details, I’ll have to buy his book.
My point, and I do have one: We’ve already discussed the fact that the test questions themselves blow some serious chunks. Now we see that the scoring system is also flawed. At some point I’ll go into more detail about how the data gleaned from these fakakta tests is being misused. (BTW my husband would say it should be “data are being misused.” Let’s ask Todd!) The upshot: These tests should not be as high-stakes as they are, because they are Made of Doody.