Robo-Readers Used to Grade Test Essays

A rather complacent article in Inside Higher Ed touts a study out of the University of Akron comparing the grades that humans and computers assign to standardized test essays. The news that the study found no significant difference is sure to be greeted with great joy by the Educational Testing Service (ETS), which develops and/or administers the SAT, GRE, and TOEFL. This is because “the automated reader developed by ETS, e-Rater, can grade 16,000 essays in 20 seconds,” whereas human graders can barely be pushed past 30 essays an hour. What a marvelous increase in efficiency!

Fortunately, ETS was foolhardy enough to allow Les Perelman, a director of writing at MIT, a month’s access to e-Rater. He found that:

the automated reader can be easily gamed, is vulnerable to test prep, sets a very limited and rigid standard for what good writing is, and will pressure teachers to dumb down writing instruction.

The e-Rater’s biggest problem, he says, is that it can’t identify truth. He tells students not to waste time worrying about whether their facts are accurate, since pretty much any fact will do as long as it is incorporated into a well-structured sentence. “E-Rater doesn’t care if you say the War of 1812 started in 1945,” he said.

Mr. Perelman found that e-Rater prefers long essays. A 716-word essay he wrote that was padded with more than a dozen nonsensical sentences received a top score of 6; a well-argued, well-written essay of 567 words was scored a 5.

An automated reader can count, he said, so it can set parameters for the number of words in a good sentence and the number of sentences in a good paragraph. “Once you understand e-Rater’s biases,” he said, “it’s not hard to raise your test score.”

E-Rater, he said, does not like short sentences.

Or short paragraphs.

Or sentences that begin with “or.” And sentences that start with “and.” Nor sentence fragments.

However, he said, e-Rater likes connectors, like “however,” which serve as programming proxies for complex thinking. Moreover, “moreover” is good, too.

Gargantuan words are indemnified because e-Rater interprets them as a sign of lexical complexity. “Whenever possible,” Mr. Perelman advises, “use a big word. ‘Egregious’ is better than ‘bad.’ ”

The substance of an argument doesn’t matter, he said, as long as it looks to the computer as if it’s nicely argued.
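To make Perelman’s point concrete: a scorer that looks only at countable surface features can be sketched in a few lines. This is purely a hypothetical illustration, not ETS’s actual e-Rater algorithm; the features and weights below are invented to mirror the biases he describes (essay length, long sentences, connector words, big vocabulary) and, crucially, the sketch has no notion of whether anything it reads is true.

```python
import re

# Connectors that, per Perelman, serve as "programming proxies for complex thinking".
CONNECTORS = {"however", "moreover", "furthermore", "therefore", "consequently"}

def toy_score(essay: str) -> int:
    """Score an essay 1-6 using only surface features (hypothetical weights)."""
    words = re.findall(r"[a-zA-Z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    if not words or not sentences:
        return 1

    score = 1.0
    score += min(len(words) / 250, 2.0)              # longer essays score higher
    avg_len = len(words) / len(sentences)
    score += 1.0 if avg_len >= 15 else 0.0           # short sentences get no credit
    connector_hits = sum(w in CONNECTORS for w in words)
    score += min(connector_hits * 0.5, 1.0)          # "however" and "moreover" pay off
    big_words = sum(len(w) >= 9 for w in words)
    score += min(big_words / len(words) * 10, 1.0)   # "egregious" beats "bad"
    return min(round(score), 6)
```

A terse, accurate paragraph scores near the bottom of this toy scale, while a padded essay stuffed with connectors and nine-letter words scores near the top, regardless of whether its facts are nonsense.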

For a question asking students to discuss why college costs are so high, Mr. Perelman wrote that the No. 1 reason is excessive pay for greedy teaching assistants.

“The average teaching assistant makes six times as much money as college presidents,” he wrote. “In addition, they often receive a plethora of extra benefits such as private jets, vacations in the south seas, starring roles in motion pictures.”

E-Rater gave him a 6. He tossed in a line from Allen Ginsberg’s “Howl,” just to see if he could get away with it.

He could.

Robo-Readers Used to Grade Test Essays – NY Times

