Man and machine: Better writers, better grades

The University of Akron News

04/12/2012

In a direct comparison, human graders and software designed to score student essays achieved virtually identical levels of accuracy, with the software in some cases proving to be more reliable, a groundbreaking study has found.

The study, underwritten by the William and Flora Hewlett Foundation, was conducted by experts in educational measurement and assessment, led by Dr. Mark Shermis, dean of the College of Education at The University of Akron.

Dean Mark Shermis

Dr. Mark Shermis, dean of the College of Education, was the principal investigator of a study that found little difference in accuracy between essays graded by humans and those graded by software.

“The demonstration showed conclusively that automated essay scoring systems are fast, accurate, and cost-effective,” said Tom Vander Ark, CEO of Open Education Solutions, which provides consulting services related to digital learning, and co-director of the study.

That’s important because writing essays is one way for students to learn critical reasoning, but teachers don’t assign essays often because grading them is both expensive and time-consuming.

The key to better writing

"After 50 years of writing research, we know that if you want to be a better writer, you need to write more. And the major impediment to better writing is that someone has to go through the essays that students create," said Shermis, the principal investigator of the study. "My father was a high school English teacher. Every time he gave a writing assignment, he would come home with a stack of 150 papers to grade. Usually he did this on a Friday, and when that happened, I knew our weekend was shot.

"This technology will help teachers go through more essays and give more writing assignments to improve student writing," Shermis said.

Education experts believe that critical reasoning and writing are part of a suite of skills that students need to be competitive in the 21st century. Others are working collaboratively, communicating effectively and learning how to learn, as well as mastering core academic content. The Hewlett Foundation calls this suite of skills Deeper Learning and is making grants to encourage its adoption at schools throughout the country.

“Better tests support better learning,” says Barbara Chow, education program director at the Hewlett Foundation. “This demonstration of rapid and accurate automated essay scoring will encourage states to include more writing in their state assessments. And, the more we can use essays to assess what students have learned, the greater the likelihood they’ll master important academic content, critical thinking, and effective communication.”

First comprehensive test of scoring software

For more than 20 years, companies that provide automated essay scoring software have claimed that their systems can perform as effectively as other available methods of essay scoring, and do so faster and more affordably.

The study was the first comprehensive multi-vendor trial to test those claims. The study challenged nine companies that constitute more than 97 percent of the market of commercial providers of automated essay scoring to compare capabilities.

More than 16,000 essays were released from six participating state departments of education, with each set of essays varying in length, type, and grading protocols. The essays were already hand scored according to state standards. The challenge was for companies to approximate established scores by using software.
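As an illustration of how software scores might be compared with the established hand scores, here is a minimal sketch in Python. The article does not say which statistics the study used; exact agreement and quadratic weighted kappa are assumed here only as two common measures of rater agreement, and the function names and sample scores are hypothetical.

```python
from collections import Counter


def exact_agreement(human, machine):
    """Fraction of essays on which the two scores match exactly."""
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Agreement corrected for chance, penalizing large disagreements more.

    Returns 1.0 for perfect agreement, about 0.0 for chance-level
    agreement, and negative values for worse-than-chance agreement.
    """
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n_categories = max_score - min_score + 1
    if n_categories < 2:
        raise ValueError("the score scale needs at least two points")

    n = len(human)
    observed = Counter(zip(human, machine))  # how often each score pair occurs
    human_hist = Counter(human)              # how often humans gave each score
    machine_hist = Counter(machine)          # how often software gave each score

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Penalty grows with the square of the distance between scores.
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Count expected if the two raters scored independently.
            weighted_expected += weight * human_hist[i] * machine_hist[j] / n

    if weighted_expected == 0:
        return 1.0  # every essay received the same single score from both
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a 1-to-4 scale, invented for illustration only.
hand_scores = [1, 2, 3, 4, 2, 3]
software_scores = [1, 2, 3, 3, 2, 4]
print(exact_agreement(hand_scores, software_scores))
print(quadratic_weighted_kappa(hand_scores, software_scores, 1, 4))
```

The kappa statistic is included because raw agreement alone can look high when most essays receive the same score; kappa discounts the agreement expected by chance.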

Solid results for most writing samples

Shermis said some critics of automated grading assert the technology will homogenize writing.

"To some degree, that's true; that is, automatic grading doesn’t do well on very creative kinds of writing, such as haiku," he said. "But this technology works well for about 95 percent of all the writing that's out there, and it will provide unlimited feedback to how you can improve what you have generated, 24 hours a day, seven days a week. If you are writing at 2 am, which many college students do, it's there to tend to your needs."

At a time when the U.S. Department of Education is funding states to design and develop new forms of high-stakes testing, the study introduces important data. Many states are limited to multiple-choice formats, because more sophisticated measures of academic performance cost too much to grade and take too long to process. Forty-five states are already actively overhauling testing standards, and many are considering the use of machine scoring systems.

Contest to spur innovations

The study grows from a contest called the Automated Student Assessment Prize, or ASAP, which the Hewlett Foundation is sponsoring to evaluate the current state of automated testing and to encourage further developments in the field.

In addition to looking at commercial vendors, the contest is offering $100,000 in cash prizes in a competition open to anyone to develop new automated essay scoring techniques. The open competition is under way now and scheduled to close on April 30. The pool of $100,000 will be awarded to the best performers. Details of the public competition.

The open competition website includes the prize rules, an active leaderboard with regularly updated results, and discussion threads among competitors.

The goal of ASAP is to offer a series of impartial competitions in which a fair, open and transparent participation process will allow key participants in the world of education and testing to understand the value of automated student assessment technologies.

ASAP is being conducted with the support of the Partnership for Assessment of Readiness for College and Careers and Smarter Balanced Assessment Consortium, two multi-state consortia funded by the U.S. Department of Education to develop next-generation assessments. ASAP is aligned with the aspirations of the Common Core State Standards and seeks to accelerate assessment innovation to help more students graduate from college and to become career ready.

Breakthrough anticipated

Jaison Morgan, CEO of The Common Pool, a consulting business that specializes in developing effective incentive models for solving problems, and co-director of the study, said the prize and studies will raise broader awareness of the current capabilities of automated scoring of essays.

“By offering a private demonstration of current capabilities, we can reveal to our state partners what is already commercially available,” Morgan said. “But, by complementing it with a public competition, we will attract new participants to the field and investment from new players. We believe that the public competition will trigger major breakthroughs.”

ASAP is preparing to introduce a second study, in which private providers and public competitors will be challenged to reveal the capabilities of automated scoring systems for grading short-answer questions. The second study will be conducted this summer. Three more ASAP studies are in development.

Shermis, a noted expert on automated scoring, is author of "Classroom Assessment in Action" and co-editor of "Automated Essay Scoring: A Cross-Disciplinary Approach."

ASAP was designed by The Common Pool, LLC, and is managed by Open Education Solutions, Inc.

Kaggle helps companies, governments, researchers, and other organizations identify solutions to some of the world's hardest problems by posting them as competitions to a community of more than 33,000 PhD-level data scientists located around the world.

News media coverage:

Akron Beacon Journal: "UA dean instrumental in automated grading study"

NPR | State Impact Ohio: "Computers can score student essays as well as humans, a study finds"

Media contact: Eileen Korey, 330-972-8589 or korey@uakron.edu

Video

Evaluating content

Dean Shermis reflects on the software's ability to assess the quality of written ideas.

Customized learning for students

Dean Shermis discusses how automated scoring of essays could lead to more tailored learning for students.

Assessing famous speeches

How did famous speeches by Abe Lincoln and John F. Kennedy fare when evaluated by the software?