Dear Ms. Sullivan:
- As The Boston Globe reports today, it is hardly “a number of states” that have adopted automated scoring systems, but rather Utah and, covertly, Ohio.
- To my knowledge, the statement that “automated scoring engines have become increasingly reliable and refined, particularly over the last several years” has no basis in fact.
- Machines do not understand meaning; they just count. All automated essay scoring engines operate by counting “proxies” that substitute for higher-level traits. For example, the frequency of infrequently used words is often used as a proxy for verbal felicity, and essay length as a proxy for overall development. See my book chapter “Construct validity, length, score, and time in holistically graded writing assessments: The case against Automated Essay Scoring (AES)” <https://wac.colostate.edu/books/wrab2011/chapter7.pdf>. (A toy sketch illustrating such proxy counting appears at the end of this letter.)
- Machines are extremely poor at identifying grammatical errors in English. As I note in my 2016 article “Grammar checkers do not work” <http://lesperelman.com/wpcontent/uploads/2016/05/Perelman-Grammar-Checkers-Do-Not-Work.pdf>, when analyzing 5,000 words of an essay by Noam Chomsky originally published in The New York Review of Books, the grammar checker modules of ETS’s e-rater falsely identified 62 grammatical or usage errors, including 15 article errors and 5 preposition errors. The performance of another grammar checker was similarly flawed.
- AES engines also appear to privilege some linguistic and/or ethnic groups while unfairly penalizing others. In two studies by researchers at the Educational Testing Service, essays written by native Mandarin speakers were scored significantly higher by ETS’s engine than they were by human readers, while essays by African-Americans were scored significantly lower by machines than they were by humans: Bridgeman, B., Trapani, C., & Attali, Y. (2012). “Comparison of human and machine scoring of essays: Differences by gender, ethnicity, and country.” Applied Measurement in Education, 25(1), 27–40; and Ramineni, C., Trapani, C. S., Williamson, D. M., Davey, T., & Bridgeman, B. (2012). “Evaluation of the e-rater® scoring engine for the GRE® issue and argument prompts.” ETS RR-12-02. <https://www.ets.org/Media/Research/pdf/RR-12-02.pdf>
- Last year the Federal Education Minister of Australia proposed that the essay portion of NAPLAN, the Australian equivalent of MCAS, be scored by AES engines.
- I was commissioned by the New South Wales Teachers Federation to write a report on the proposal, “Automated Essay Scoring and NAPLAN: A summary report” <https://www.nswtf.org.au/files/automated_essay_scoring_and_naplan.pdf>. All the arguments made here are elaborated in much greater detail in that document.
- The response in Australia was highly supportive of my position. The editorial in the Sydney Morning Herald, “NAPLAN robo-marking plan does not compute” <https://www.smh.com.au/national/naplan-robomarking-plan-does-not-compute-20171012-gyzpl4.html>, is a particularly eloquent and perceptive summary of the defects of machine scoring.
- In December 2017, the Australian Education Council, a body consisting of the Education Ministers of all the Australian states and territories, unanimously overruled the Federal Education Minister and prohibited the use of AES machines in scoring the NAPLAN, including a proposal to have the essays read both by machines and human readers. <http://www.educationcouncil.edu.au/site/DefaultSite/filesystem/documents/EC%20Communiques%20and%20media%20releases/Education%20Council%20media%20release%20%20-%20automated%20essay%20scoring%20of%20writing%20scripts.pdf>
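
A toy sketch, as promised above: the following Python fragment illustrates the kind of surface “proxy” counting described in this letter. It is purely illustrative; the word list, feature names, and weights are invented here and are not drawn from e-rater or any other actual engine.

```python
# Illustrative only: counts surface "proxies" an AES engine might use.
# The word list and weights below are invented, not any vendor's model.

import re

COMMON_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that"}  # hypothetical

def proxy_features(essay: str) -> dict:
    words = re.findall(r"[A-Za-z']+", essay.lower())
    length = len(words)                                   # proxy for "development"
    rare = [w for w in words if w not in COMMON_WORDS and len(w) > 7]
    rare_ratio = len(rare) / length if length else 0.0    # proxy for "verbal felicity"
    return {"length": length, "rare_word_ratio": rare_ratio}

def toy_score(essay: str) -> float:
    f = proxy_features(essay)
    # Invented linear weighting: longer essays stuffed with long, uncommon words
    # score higher, regardless of whether the essay means anything at all.
    return 0.01 * f["length"] + 10.0 * f["rare_word_ratio"]

if __name__ == "__main__":
    nonsense = ("Incontrovertibly, the epistemological ramifications of quotidian "
                "circumstances necessitate multitudinous considerations. ") * 20
    print(round(toy_score(nonsense), 2))  # verbose nonsense earns a high "score"
```

The point of the sketch is simply that nothing in such a computation engages with meaning, argument, or truth.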